Closed Bug 450858 Opened 16 years ago Closed 11 years ago

[th] Default Character Encoding for Thai build should be set appropriately

Categories

(Mozilla Localizations :: th / Thai, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: kengggg, Unassigned)

References

(Blocks 1 open bug)

Attachments

(4 files)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1
Build Identifier: 

Refer to bug 284265.

Since Thai fonts are set appropriately, the default character set for the Thai build should be set appropriately as well.

The default charset of the Thai build is currently set to "Western ISO-8859-1", which causes garbled text when visiting a web page that does not specify a charset. The default character set of the Thai build should therefore be "Thai TIS-620".

Reproducible: Always

Steps to Reproduce:
1. Change default character encoding to "Western ISO-8859-1"
2. Visit http://www.nectec.or.th/it-standards/std620/std620.htm
Actual Results:  
The header of the page will be displayed as "Áҵðҹ¼ÅÔµÀѳ±ìÍصÊÒË¡ÃÃÁ"

Expected Results:  
The header of the page should be displayed as "มาตรฐานผลิตภัณฑ์อุตสาหกรรม"
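
The mismatch in the report can be reproduced outside the browser. The following is a minimal sketch, assuming Python's built-in "tis-620" and "iso-8859-1" codecs as stand-ins for the browser's decoders (the browser's own tables may differ): it encodes the expected header as a TIS-620 page would send it, then decodes the same bytes with each charset.

```python
# Thai header from the report, as it should appear.
expected = "มาตรฐานผลิตภัณฑ์อุตสาหกรรม"

# Bytes a TIS-620 page would send on the wire (one byte per character).
raw = expected.encode("tis-620")

# Decoding with the wrong fallback charset turns each byte into a Latin-1 character.
garbled = raw.decode("iso-8859-1")

# Decoding with the proposed default recovers the Thai text.
recovered = raw.decode("tis-620")

print(garbled)
print(recovered == expected)
```

Each Thai character occupies a single byte in the 0xA1-0xFB range, which ISO-8859-1 interprets as accented Latin letters and symbols.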
Blocks: thai
depends on bug 65896
I think the better default is ISO-8859-11? In practice, it's TIS-620 plus special set of characters (only few).

Also cc Samphan
TIS-620 same as Isriya.
(In reply to comment #2)
> I think the better default is ISO-8859-11? In practice, it's TIS-620 plus
> special set of characters (only few).

The only difference is character 'A0'
* ISO 8859-11 defines it as a non-breaking space
* TIS-620 leaves it undefined

For comparison, Windows 874 is also based on TIS-620,
with additions of Control Characters (e.g. shift, substitute, bell, acknowledge, delete, ...) and these "modern text" characters [1]:

80 = U+20AC : EURO SIGN
85 = U+2026 : HORIZONTAL ELLIPSIS
91 = U+2018 : LEFT SINGLE QUOTATION MARK
92 = U+2019 : RIGHT SINGLE QUOTATION MARK
93 = U+201C : LEFT DOUBLE QUOTATION MARK
94 = U+201D : RIGHT DOUBLE QUOTATION MARK
95 = U+2022 : BULLET
96 = U+2013 : EN DASH
97 = U+2014 : EM DASH
A0 = U+00A0 : NO-BREAK SPACE
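
The differences listed above can be checked mechanically. This is only a sketch, assuming Python's bundled "tis-620", "iso8859-11" and "cp874" codecs are a fair approximation of the three charsets; the helper name differing_bytes is made up for illustration.

```python
def differing_bytes(codec_a, codec_b):
    """Return the high bytes (0x80-0xFF) that the two codecs decode differently."""
    diffs = {}
    for value in range(0x80, 0x100):
        byte = bytes([value])
        decoded = []
        for codec in (codec_a, codec_b):
            try:
                decoded.append(byte.decode(codec))
            except UnicodeDecodeError:
                decoded.append(None)  # byte is undefined in this codec
        if decoded[0] != decoded[1]:
            diffs[value] = tuple(decoded)
    return diffs


tis_vs_iso = differing_bytes("tis-620", "iso8859-11")
iso_vs_win = differing_bytes("iso8859-11", "cp874")

for value, (a, b) in sorted(tis_vs_iso.items()):
    print(f"{value:02X}: tis-620={a!r} iso8859-11={b!r}")
for value, (a, b) in sorted(iso_vs_win.items()):
    print(f"{value:02X}: iso8859-11={a!r} cp874={b!r}")
```

A value of None in the output means the codec leaves that byte undefined.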


Only two Thai character sets have been assigned by IANA:
the 'historic' IBM-Thai (CP838) and TIS-620 [2].
As CP838 is proprietary and found on use on the Internet,
for the sake of an information exchange standard, we should make it TIS-620.


[1] http://www.microsoft.com/globaldev/reference/sbcs/874.mspx
[2] http://www.iana.org/assignments/character-sets
Status: UNCONFIRMED → NEW
Ever confirmed: true
I agree with Arthit's argument. The difference between TIS-620 and ISO-8859-11 is negligible (NON-BREAKING SPACE is rarely used in practice), so we should stick with the IANA-assigned character set (TIS-620).
Flags: blocking1.9.0.3?
Flags: blocking1.9.0.2?
Flags: blocking-thunderbird3.0b1?
Flags: blocking-firefox3.1?
Flags: blocking1.9.0.2?
Flags: blocking1.9.0.3?
Flags: blocking-thunderbird3.0b1?
Flags: blocking-firefox3.1?
I agree with Arthit and Phisite. We should stick with TIS-620.

Is the config file /l10n/th/toolkit/chrome/global/intl.properties? If that is correct, I will verify and submit a patch.
Simon, do we even have different implementations for the thai charsets?

Also, would it make sense to set the intl.charset.detector instead?
For decoding we treat all three Thai charsets as if they were Windows-874. For encoding we distinguish between them. 

In other words, if a web page whose encoding is declared as TIS-620 or ISO-8859-11 contains e.g. 0x85, we will display it as U+2026 HORIZONTAL ELLIPSIS, but if we output TIS-620 or ISO-8859-11 we will treat U+2026 as an undefined character and will only encode it as 0x85 when explicitly outputting Windows-874.
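
The asymmetry described above can be modelled roughly as follows. This is an illustrative sketch only, not the actual Gecko code: the function names decode_thai and encode_thai and the label table are hypothetical, and Python's codecs stand in for the real converters.

```python
# One shared decoder for every Thai label, as described in the comment above.
DECODE_AS = "cp874"

# Encoding keeps the three charsets distinct (label -> Python codec name).
ENCODERS = {
    "TIS-620": "tis-620",
    "ISO-8859-11": "iso8859-11",
    "windows-874": "cp874",
}


def decode_thai(data, label):
    """Decode bytes labelled with any Thai charset using the Windows-874 table."""
    if label not in ENCODERS:
        raise ValueError(f"not a Thai charset label: {label}")
    return data.decode(DECODE_AS)


def encode_thai(text, label):
    """Encode with the labelled charset; unencodable characters become '?'."""
    return text.encode(ENCODERS[label], errors="replace")


ellipsis = decode_thai(b"\x85", "TIS-620")
print(repr(ellipsis))                          # decoded leniently
print(encode_thai(ellipsis, "TIS-620"))        # not representable in TIS-620
print(encode_thai(ellipsis, "windows-874"))    # round-trips to 0x85
```

The replacement character used on encoding is a choice made for this sketch; the comment above only says the character is treated as undefined.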

All that said, I agree with the comments here that we should use the IANA-assigned TIS-620, if only to keep up appearances ;-)

There is currently no support for Thai in the charset detector.
Attached image Thai default charset. (deleted) —
Could we set TIS-620 as the default charset in Firefox's preferences?
AFAICT, the Thai localization uses TIS-620 on 1.9.1. Or is that not working for you?

Or are you talking about something else?
Attached image Default charset on ThaiL10n. (deleted) —
Axel,

my point is that TIS-620 should be set as the default charset in the Thai L10n build. Currently the default value is ISO-8859-1.
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/th/toolkit/chrome/global/intl.properties?mark=33-33#33 says that that's the case.

Mind testing on a fresh profile? I would expect that you're having left-overs switching to ISO-8859-1
Attached image Thai default charset on 3.0.8 (deleted) —
Well, following comment #1, step 1 of the steps to reproduce should be to start the Firefox Thai L10n build with a fresh profile, instead of changing the default charset to ISO-8859-1.

So I tested the Firefox 3.0.8 Thai L10n build with a fresh profile: the default charset is ISO-8859-1 instead of TIS-620.
Probably you should fix the charset entries in http://mxr.mozilla.org/l10n/source/th/toolkit/chrome/global-platform/win/intl.properties#2, and mac/intl.properties#2
They still have ISO-8859-1 instead of TIS-620.
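
A quick way to spot which intl.properties files still carry the old value is sketched below. The file contents here are made-up samples for illustration, not copies of the real files, and the helper names read_properties and stale_defaults are hypothetical; only the key intl.charset.default comes from this bug.

```python
def read_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def stale_defaults(files, wanted="TIS-620"):
    """Return the names of files whose intl.charset.default is not the wanted value."""
    return sorted(
        name
        for name, text in files.items()
        if read_properties(text).get("intl.charset.default") != wanted
    )


# Hypothetical sample contents, keyed by directory name.
files = {
    "global": "intl.charset.default=TIS-620\n",
    "win": "# platform override\nintl.charset.default=ISO-8859-1\n",
    "mac": "intl.charset.default=ISO-8859-1\n",
}

print(stale_defaults(files))
```

In practice the texts would be read from the locale's checkout rather than hard-coded.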
Yeah, what Alexander said.

Simon, do you remember why we have that entry in both global-platform and global?
Here is the patch, following Alexander's suggestion.

It covers all 3 platforms at once.
Comment on attachment 372919 [details] [diff] [review]
Patch for setting default charset on 3 platforms.

This likely is bitrotted by now...  but Axel, who would review this? Is this change still needed?
Dietrich, this change is still needed. I'm wondering whether this patch still works on fx4...
Pull the latest source and try it! If you update it very soon and we can find a reviewer, maybe we can get it in Firefox 4 :)
Sure :)
Simon, why do we actually have intl.charset.default in global/intl.properties, it seems we're only using the platform ones?

As for the review, this is a patch to the thai locale, so the folks there can just apply it.
I don't even understand why we have the platform ones in the first place, or if we still need them. Apparently you already asked me that in bug 488433 and I never answered. Let's continue that discussion there and not diffuse this bug.
There is no problem in this case with Firefox 4.0b6 on Mac OS X 10.6. I think this bug has been solved.
This bug no longer occurs in the current version of Firefox (en-US/Thai). Should we close this bug?

Screenshot of http://www.nectec.or.th/it-standards/std620/std620.htm in Firefox 20.0.1: http://imgur.com/PWxG2OQ
Cool! Do we know what fixed it?
WFM per comments 23 & 24.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME