Closed Bug 953946 Opened 11 years ago Closed 6 years ago

Implement automatic character set decoding

Categories

(Chat Core :: General, enhancement)

Type: enhancement
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: clokep, Unassigned)

Details

*** Original post on bio 509 at 2010-09-15 19:36:00 UTC *** This would be useful (at least in IRC), where the specification neither mandates a character encoding for sent messages nor provides the character encoding of each message. To accurately display messages in a variety of languages in the same conversation, it is necessary to guess the proper charset.

A suggestion from Mic: we could build a log for each user of all of their messages. This should increase the accuracy of our guess at that user's charset.

Useful links:
Summary of Charset Detectors - http://www.mozilla.org/projects/intl/chardet.html
Using Universal Charset Detector from Mozilla in a standalone project - http://www.mozilla.org/projects/intl/detectorsrc.html
Paper on the theory - http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html

There's some code available (neither is scriptable):
Universal charset detector - http://mxr.mozilla.org/mozilla-central/source/extensions/universalchardet/
Charset detector - http://mxr.mozilla.org/mozilla-central/source/intl/chardet/

Mozilla bugs of interest:
Universal autodetect needs to be on by default - https://bugzilla.mozilla.org/show_bug.cgi?id=264871
Remove the universal charset detector - https://bugzilla.mozilla.org/show_bug.cgi?id=361289

Unfortunately, a lot of the code seems really old. Perhaps contacting the Mozilla Internationalization team would get us some more information?
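To make the per-user log idea concrete, here is a minimal sketch in Python. It is not the Mozilla charset detector and does no statistical analysis; it only checks which candidate encoding can strictly decode every message a sender has produced so far. The class name, the candidate list, and the Latin-1 last resort are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical candidate list, tried in order; not taken from this bug.
CANDIDATES = ("utf-8", "cp1252")
# Latin-1 maps every byte to a character, so decoding with it never fails.
LAST_RESORT = "latin-1"


class PerUserCharsetGuesser:
    """Guess a charset per sender from all raw messages seen so far."""

    def __init__(self, candidates=CANDIDATES):
        self.candidates = candidates
        self.history = defaultdict(list)  # nick -> list of raw byte strings

    def guess(self, nick):
        """Return the first candidate that strictly decodes every logged message."""
        for encoding in self.candidates:
            try:
                for raw in self.history[nick]:
                    raw.decode(encoding)  # strict mode raises on invalid bytes
            except UnicodeDecodeError:
                continue
            return encoding
        return LAST_RESORT

    def decode(self, nick, raw):
        """Log the raw bytes, then decode them with the sender's current guess."""
        self.history[nick].append(raw)
        encoding = self.guess(nick)
        return raw.decode(encoding), encoding


guesser = PerUserCharsetGuesser()
# b"caf\xe9" is not valid UTF-8, so this sender is treated as cp1252.
first = guesser.decode("carol", b"caf\xe9")
# These two bytes are valid UTF-8 on their own, but carol's log rules UTF-8 out.
second = guesser.decode("carol", b"\xc3\xa9")
# A sender with no history gets the UTF-8 reading of the same bytes.
fresh = guesser.decode("dave", b"\xc3\xa9")
```

The second call shows why the log might help: a message that is ambiguous in isolation is decoded consistently with what the same sender wrote earlier. A real implementation would likely cap the log size and use a proper detector instead of the strict-decode check.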
*** Original post on bio 509 at 2010-09-16 07:57:42 UTC *** Hey, you have done some great research on this! :) From a quick look, I think we will want to make the nsIStringCharsetDetector interface (http://mxr.mozilla.org/mozilla-central/source/intl/chardet/public/nsIStringCharsetDetector.h) scriptable, or at least understand why it currently isn't. Asking Mozilla developers on IRC is likely to help.
*** Original post on bio 509 at 2010-12-14 17:58:26 UTC *** This does not block my JS-IRC work; it would just be nice to have.
No longer blocks: 953944
I don't think we've really had anyone ask for this. I think most IRC networks are UTF-8 now anyway, which is what we use by default. If someone needs this, please speak up!
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME