Closed
Bug 450858
Opened 16 years ago
Closed 12 years ago
[th] Default Character Encoding for Thai build should be set appropriately
Categories
(Mozilla Localizations :: th / Thai, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: kengggg, Unassigned)
References
(Blocks 1 open bug)
Details
Attachments
(4 files)
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1
Build Identifier:
Refer to bug 284265.
Since the Thai fonts are now set appropriately, the default character set for the Thai build should be set appropriately as well.
The default charset of the Thai build is currently "Western ISO-8859-1", which causes distorted text when visiting a web page that does not specify a charset. The default character set of the Thai build should therefore be "Thai TIS-620".
Reproducible: Always
Steps to Reproduce:
1. Change default character encoding to "Western ISO-8859-1"
2. Visit http://www.nectec.or.th/it-standards/std620/std620.htm
Actual Results:
The header of the page will be displayed as "
Áҵðҹ¼ÅÔµÀѳ±ìÍصÊÒË¡ÃÃÁ"
Expected Results:
The header of the page should be displayed as "มาตรฐานผลิตภัณฑ์อุตสาหกรรม"
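The mismatch described above can be reproduced outside the browser. The following is a minimal Python sketch, assuming the page serves the heading as TIS-620 bytes with no declared charset; `render` is a hypothetical helper standing in for the browser's fallback decoding, not Firefox code.

```python
# Heading text taken from the Expected Results above.
EXPECTED = "มาตรฐานผลิตภัณฑ์อุตสาหกรรม"

def render(raw: bytes, fallback: str) -> str:
    """Decode an unlabelled page's bytes with the browser's fallback charset."""
    return raw.decode(fallback)

# Bytes as a TIS-620 page would serve them: one byte per Thai character.
raw = EXPECTED.encode("tis_620")

as_western = render(raw, "iso8859_1")  # fallback currently shipped
as_thai = render(raw, "tis_620")       # fallback requested in this bug

print(as_western == EXPECTED)  # False: Thai bytes shown as accented Latin letters
print(as_thai == EXPECTED)     # True
```

The exact garbled glyphs depend on which Western table the browser applies, so the sketch only checks that the ISO-8859-1 reading differs from the Thai text.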
Comment 2•16 years ago
I think the better default is ISO-8859-11? In practice, it's TIS-620 plus special set of characters (only few).
Also cc Samphan
Comment 3•16 years ago
TIS-620 same as Isriya.
Comment 4•16 years ago
(In reply to comment #2)
> I think the better default is ISO-8859-11? In practice, it's TIS-620 plus
> special set of characters (only few).
The only difference is the character at 0xA0:
* ISO 8859-11 defines it as a non-breaking space
* TIS-620 leaves it undefined
For comparison, Windows-874 is also based on TIS-620, with the addition of control characters (e.g. shift, substitute, bell, acknowledge, delete, ...) and these "modern text" characters [1]:
80 = U+20AC : EURO SIGN
85 = U+2026 : HORIZONTAL ELLIPSIS
91 = U+2018 : LEFT SINGLE QUOTATION MARK
92 = U+2019 : RIGHT SINGLE QUOTATION MARK
93 = U+201C : LEFT DOUBLE QUOTATION MARK
94 = U+201D : RIGHT DOUBLE QUOTATION MARK
95 = U+2022 : BULLET
96 = U+2013 : EN DASH
97 = U+2014 : EM DASH
A0 = U+00A0 : NO-BREAK SPACE
Only two Thai character sets have been assigned by IANA: the 'historic' IBM-Thai (CP838) and TIS-620 [2].
As CP838 is proprietary and not found in use on the Internet, we should make it TIS-620 for the sake of the information exchange standard.
[1] http://www.microsoft.com/globaldev/reference/sbcs/874.mspx
[2] http://www.iana.org/assignments/character-sets
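The differences listed in this comment can be checked against the codec tables that ship with Python. This is a hedged sketch: it reports what Python's `tis_620`, `iso8859_11` and `cp874` codecs do with a few bytes, which is an assumption about Python's mapping tables and not a statement about Gecko's converters. `decode_byte` and `describe` are illustrative names.

```python
CODECS = ("tis_620", "iso8859_11", "cp874")

def decode_byte(value, codec):
    """Return the decoded character, or None if the codec leaves the byte undefined."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None

def describe(value):
    """Decode one byte with each of the three Thai codecs."""
    return {codec: decode_byte(value, codec) for codec in CODECS}

# 0xA1 is a Thai letter; 0xA0, 0x85 and 0x80 are the bytes discussed above.
for value in (0xA1, 0xA0, 0x85, 0x80):
    cells = ", ".join(
        f"{codec}=" + ("undefined" if ch is None else f"U+{ord(ch):04X}")
        for codec, ch in describe(value).items()
    )
    print(f"0x{value:02X}: {cells}")
```

Thai letters decode identically in all three tables; only the bytes outside the Thai block differ.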
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 5•16 years ago
I agree with Arthit's argument. The difference between TIS-620 and ISO-8859-11 is negligible (NON-BREAKING SPACE is rarely used in practice), so we should stick with the IANA-assigned character set (TIS-620).
Updated•16 years ago
Flags: blocking1.9.0.3?
Flags: blocking1.9.0.2?
Flags: blocking-thunderbird3.0b1?
Flags: blocking-firefox3.1?
Updated•16 years ago
Flags: blocking1.9.0.2?
Updated•16 years ago
Flags: blocking1.9.0.3?
Flags: blocking-thunderbird3.0b1?
Flags: blocking-firefox3.1?
Reporter
Comment 6•16 years ago
I agree with Arthit and Phisite. We should stick with TIS-620.
Is the config file /l10n/th/toolkit/chrome/global/intl.properties? If that is correct, I will verify and submit a patch.
Comment 7•16 years ago
Simon, do we even have different implementations for the Thai charsets?
Also, would it make sense to set the intl.charset.detector instead?
Comment 8•16 years ago
For decoding we treat all three Thai charsets as if they were Windows-874. For encoding we distinguish between them.
In other words, if a web page whose encoding is declared as TIS-620 or ISO-8859-11 contains e.g. 0x85, we will display it as U+2026 HORIZONTAL ELLIPSIS, but if we output TIS-620 or ISO-8859-11 we will treat U+2026 as an undefined character and will only encode it as 0x85 when explicitly outputting Windows-874.
All that said, I agree with the comments here that we should use the IANA-assigned TIS-620, if only to keep up appearances ;-)
There is currently no support for Thai in the charset detector.
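The decode/encode asymmetry described in this comment can be sketched in a few lines of Python. This only models the behaviour as stated here, using Python's own codec tables as a stand-in; `ENCODERS`, `decode_thai` and `encode_thai` are hypothetical names and the label spellings are illustrative.

```python
# Charset label -> Python codec used when *encoding* (illustrative mapping).
ENCODERS = {
    "TIS-620": "tis_620",
    "ISO-8859-11": "iso8859_11",
    "windows-874": "cp874",
}

def decode_thai(label: str, data: bytes) -> str:
    """Decoding: every Thai label is read with the Windows-874 table."""
    if label not in ENCODERS:
        raise LookupError(f"not a Thai charset label: {label}")
    return data.decode("cp874")

def encode_thai(label: str, text: str) -> bytes:
    """Encoding: each label keeps its own, stricter table."""
    return text.encode(ENCODERS[label])

# A page declared as TIS-620 that contains 0x85 still shows an ellipsis...
print(decode_thai("TIS-620", b"\x85") == "\u2026")  # True

# ...but U+2026 is only written back as 0x85 for Windows-874.
print(encode_thai("windows-874", "\u2026"))  # b'\x85'
try:
    encode_thai("TIS-620", "\u2026")
except UnicodeEncodeError:
    print("U+2026 is not encodable as TIS-620")
```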
Reporter
Comment 9•16 years ago
Could we set TIS-620 as the default charset in Firefox's preferences?
Comment 10•16 years ago
AFAICT, the Thai localization uses TIS-620 on 1.9.1. Or is that not working for you?
Or are you talking about something else?
Reporter
Comment 11•16 years ago
Axel,
my point is that TIS-620 should be set as the default charset in the Thai L10n build. Currently the default value is ISO-8859-1.
Comment 12•16 years ago
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/th/toolkit/chrome/global/intl.properties?mark=33-33#33 says that that's the case.
Mind testing on a fresh profile? I would expect that you have a left-over setting that switches the default to ISO-8859-1.
Reporter
Comment 13•16 years ago
Well, going by comment #1, step 1 of the steps to reproduce should be to start the Firefox Thai L10n build with a fresh profile, instead of changing the default charset to ISO-8859-1.
So I've tested the Firefox 3.0.8 Thai L10n build with a fresh profile, and the default charset is ISO-8859-1 instead of TIS-620.
Comment 14•16 years ago
You should probably fix the charset entries in http://mxr.mozilla.org/l10n/source/th/toolkit/chrome/global-platform/win/intl.properties#2,0 and mac/intl.properties#2
They still have ISO-8859-1 instead of TIS-620.
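Since the same setting can live in several intl.properties files, a quick scan of a locale checkout shows which copies still disagree. The following is a minimal Python sketch, assuming the key is `intl.charset.default` (the name used later in this bug) and that each file uses simple `key=value` lines; `read_charset_default` and `charset_defaults` are hypothetical helpers, and the directory layout is whatever the checkout contains.

```python
from pathlib import Path

KEY = "intl.charset.default"  # assumed key name, as mentioned later in this bug

def read_charset_default(text: str):
    """Return the value of intl.charset.default in a .properties text, or None."""
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and .properties comments.
        if not line or line.startswith(("#", "!")):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == KEY:
            return value.strip()
    return None

def charset_defaults(locale_root):
    """Map each intl.properties under locale_root to its default charset."""
    root = Path(locale_root)
    return {
        path.relative_to(root).as_posix(): read_charset_default(
            path.read_text(encoding="utf-8")
        )
        for path in sorted(root.rglob("intl.properties"))
    }
```

Calling `charset_defaults` on the root of a locale checkout returns one entry per file, so a platform-specific copy that still says ISO-8859-1 stands out next to the global one.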
Comment 15•16 years ago
Yeah, what Alexander said.
Simon, do you remember why we have that entry in both global-platform and global?
Reporter
Comment 16•16 years ago
Here is a patch that follows Alexander's suggestion.
It covers all 3 platforms at once.
Comment 17•14 years ago
Comment on attachment 372919 [details] [diff] [review]
Patch for setting default charset on 3 platforms.
This likely is bitrotted by now... but Axel, who would review this? Is this change still needed?
Reporter
Comment 18•14 years ago
Dietrich, this change is still needed. I'm wondering whether this patch still works on fx4...
Comment 19•14 years ago
Pull the latest source and try it! If you update it very soon and we can find a reviewer, maybe we can get it in Firefox 4 :)
Reporter
Comment 20•14 years ago
Sure :)
Comment 21•14 years ago
Simon, why do we actually have intl.charset.default in global/intl.properties? It seems we're only using the platform ones.
As for the review, this is a patch to the Thai locale, so the folks there can just apply it.
Comment 22•14 years ago
I don't even understand why we have the platform ones in the first place, or if we still need them. Apparently you already asked me that in bug 488433 and I never answered. Let's continue that discussion there and not diffuse this bug.
Comment 23•14 years ago
There is no problem in this case with Firefox 4.0b6 on Mac OS X 10.6. I think this bug has been solved.
Comment 24•12 years ago
This bug no longer occurs in the current version of Firefox (en-US/Thai). Should we close this bug?
Screenshot of http://www.nectec.or.th/it-standards/std620/std620.htm in Firefox 20.0.1: http://imgur.com/PWxG2OQ
Comment 25•12 years ago
Cool! Do we know what fixed it?
Comment 26•12 years ago
WFM per comments 23 & 24.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME