Closed Bug 802030 Opened 12 years ago Closed 11 years ago

Stop treating us-ascii, iso-8859-1, and Windows-1252 as distinct encodings

Categories

(Core :: Internationalization, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED DUPLICATE of bug 936466

People

(Reporter: hsivonen, Unassigned)

References

(Blocks 1 open bug)

Details

At present, our character encoding infrastructure treats iso-8859-1 and Windows-1252 as distinct encodings even though they have identical decoders and having a true iso-8859-1 encoder is kind of pointless. In the Encoding Standard, iso-8859-1 is merely an alias for Windows-1252. We should get rid of the separate iso-8859-1 encoding and make its labels aliases for Windows-1252. Risk: it is possible that there exists a site that reads an encoding label supplied by Gecko and expects it to say iso-8859-1 and can't deal if it says Windows-1252.
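The proposal amounts to resolving labels through one table in which the iso-8859-1 (and, per the later summary change, us-ascii) labels point at windows-1252, so content carrying any of those labels goes through the same decoder. The following is a minimal Python sketch under stated assumptions: `LABELS` reproduces only a small subset of the Encoding Standard's label table, and `resolve_label`, `decode_windows_1252` and `decode` are hypothetical illustrative names, not Gecko APIs.

```python
from typing import Optional

# Assumption: a small, hand-picked subset of the Encoding Standard's label
# table. Keys are stored already normalized (ASCII-lowercased, no whitespace).
LABELS = {
    "windows-1252": "windows-1252",
    "cp1252": "windows-1252",
    "x-cp1252": "windows-1252",
    "iso-8859-1": "windows-1252",
    "iso8859-1": "windows-1252",
    "latin1": "windows-1252",
    "l1": "windows-1252",
    "us-ascii": "windows-1252",
    "ascii": "windows-1252",
    "utf-8": "utf-8",
    "utf8": "utf-8",
}

ASCII_WHITESPACE = "\t\n\f\r "


def ascii_lower(s: str) -> str:
    """Lowercase A-Z only; every other character is left untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def resolve_label(label: str) -> Optional[str]:
    """Return the canonical encoding name for a label, or None if unknown."""
    return LABELS.get(ascii_lower(label.strip(ASCII_WHITESPACE)))


# Python's cp1252 codec leaves these five bytes undefined; the Encoding
# Standard's windows-1252 index maps each to the code point of the same
# value (e.g. 0x81 -> U+0081), so every byte decodes to something.
CP1252_HOLES = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def decode_windows_1252(data: bytes) -> str:
    """Decode bytes as windows-1252, filling cp1252's holes as above."""
    return "".join(
        chr(b) if b in CP1252_HOLES else bytes([b]).decode("cp1252")
        for b in data
    )


def decode(data: bytes, label: str) -> str:
    """Decode `data` with whatever encoding `label` resolves to (sketch only)."""
    encoding = resolve_label(label)
    if encoding == "windows-1252":
        return decode_windows_1252(data)
    raise LookupError(f"no decoder in this sketch for label {label!r}")
```

With this table, `decode(b"caf\xe9", "ISO-8859-1")` and `decode(b"caf\xe9", "us-ascii")` both return "café", and byte 0x80 decodes to "€" (U+20AC) rather than the C1 control U+0080 that a strict ISO-8859-1 decoder would produce.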
(In reply to Henri Sivonen (:hsivonen) from comment #0)
> and 10 deal

and can't deal
Blocks: 562096
This also applies to the following:
* iso-8859-11 is the same as windows-874 in the spec and in IE/WebKit.
* tis-620 is the same as windows-874 in the spec and in IE/WebKit.
* us-ascii is the same as windows-1252 in the spec, but not in any browser.
* iso-8859-9 is the same as windows-1254 in the spec and WebKit, but not in IE.
* gbk is the same as gb2312 in the spec and WebKit, but not in IE.
* big5-hkscs is the same as big5 in the spec and IE, but not in WebKit.
* euc-kr is the same as x-windows-949 in the spec and in IE/WebKit.
* iso-8859-6-e and iso-8859-6-i are the same as iso-8859-6 in the spec and WebKit. IE seems not to recognize them at all.
* iso-8859-8-e is the same as iso-8859-8 in the spec and WebKit. IE seems not to recognize it.

Some or all of these should probably be in different bugs, though. In particular, all of them except us-ascii/windows-1252 are already implemented in at least one browser, so they should be safer than this bug.
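To see which of these aliasings no tested browser had adopted yet, the per-browser observations in the list can be tabulated. A hedged Python sketch: `OBSERVATIONS` is a hypothetical hand transcription of the list (not Mozilla data), "WebKit" stands for Chrome 23 dev as a later comment clarifies, and the data reflects only `.characterSet`, not decoder behavior.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical transcription of the list:
# (label, encoding the spec treats it as, browsers already reporting that encoding)
Observation = Tuple[str, str, Set[str]]

OBSERVATIONS: List[Observation] = [
    ("iso-8859-11", "windows-874", {"IE", "WebKit"}),
    ("tis-620", "windows-874", {"IE", "WebKit"}),
    ("us-ascii", "windows-1252", set()),
    ("iso-8859-9", "windows-1254", {"WebKit"}),
    ("gbk", "gb2312", {"WebKit"}),
    ("big5-hkscs", "big5", {"IE"}),
    ("euc-kr", "x-windows-949", {"IE", "WebKit"}),
    ("iso-8859-6-e", "iso-8859-6", {"WebKit"}),
    ("iso-8859-6-i", "iso-8859-6", {"WebKit"}),
    ("iso-8859-8-e", "iso-8859-8", {"WebKit"}),
]


def unimplemented_anywhere(obs: List[Observation]) -> List[str]:
    """Labels whose spec aliasing none of the tested browsers implements."""
    return [label for label, _target, browsers in obs if not browsers]


def missing_in(obs: List[Observation], browser: str) -> List[str]:
    """Labels whose spec aliasing the given browser does not implement."""
    return [label for label, _target, browsers in obs if browser not in browsers]


def aliases_by_target(obs: List[Observation]) -> Dict[str, List[str]]:
    """Group labels by the encoding the spec folds them into."""
    groups: Dict[str, List[str]] = {}
    for label, target, _browsers in obs:
        groups.setdefault(target, []).append(label)
    return groups
```

With the data as listed, `unimplemented_anywhere(OBSERVATIONS)` returns `["us-ascii"]`, while `missing_in(OBSERVATIONS, "WebKit")` returns `["us-ascii", "big5-hkscs"]`.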
I should add that the data from the previous comment comes only from .characterSet, and didn't involve analysis of encoders or decoders. But I hope that if .characterSet is the same in a browser, the encoder/decoder is the same too.
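Matching `.characterSet` values only show that two labels map to the same name. For single-byte encodings, decoder equivalence can be checked directly by decoding all 256 byte values and diffing the results. A minimal Python sketch, assuming Python's built-in codecs stand in for the browser decoders being compared (they are not identical tables; Python's `cp1252`, for instance, leaves five bytes unmapped). `decode_table` and `differing_bytes` are hypothetical helper names. The approach does not extend to multi-byte encodings such as gbk or big5, where probing single bytes misses most of the mapping.

```python
from typing import Dict, List, Optional


def decode_table(codec: str) -> Dict[int, Optional[str]]:
    """Map every single byte to what `codec` decodes it to (None if unmapped)."""
    table: Dict[int, Optional[str]] = {}
    for b in range(256):
        try:
            table[b] = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            table[b] = None  # the codec rejects this byte under strict errors
    return table


def differing_bytes(codec_a: str, codec_b: str) -> List[int]:
    """Byte values that the two codecs decode differently."""
    a, b = decode_table(codec_a), decode_table(codec_b)
    return [x for x in range(256) if a[x] != b[x]]
```

`differing_bytes("latin-1", "cp1252")` returns exactly the 32 bytes 0x80 to 0x9F, the range where a true ISO-8859-1 decoder and a Windows-1252 decoder disagree.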
And I also should add that by "WebKit" I mean "Chrome 23 dev". Anne tells me Safari uses a different ICU version.
This should cover us-ascii too. I'll open a new bug for the other ones, since they're more likely to be web-compatible.
Summary: Stop treating iso-8859-1 and Windows-1252 as distinct encodings → Stop treating us-ascii, iso-8859-1, and Windows-1252 as distinct encodings
The browser side label handling is done. Blocking on mailnews as far as getting rid of the extra code goes.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE