Closed
Bug 953946
Opened 11 years ago
Closed 6 years ago
Implement automatic character set decoding
Categories
(Chat Core :: General, enhancement)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: clokep, Unassigned)
Details
*** Original post on bio 509 at 2010-09-15 19:36:00 UTC ***
This would be useful, at least for IRC, where the specification neither mandates a character encoding for sent messages nor provides a way to indicate the encoding of each message. To accurately display messages in a variety of languages within the same conversation, we need to guess the charset of each message.
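As a rough illustration of the simplest form of guessing, the sketch below tries strict UTF-8 first and falls back to a legacy charset when the bytes are not valid UTF-8. The function name `decodeIrcBytes` and the choice of ISO-8859-1 as the fallback are assumptions made for this example; this is not the universal charset detector linked below, and it is not existing chat core code.

```javascript
// Hypothetical helper: decode the raw bytes of an IRC message that carries
// no charset information.
// Assumption: ISO-8859-1 is the legacy fallback; a real client would
// probably make this configurable per network or per conversation.
function decodeIrcBytes(bytes) {
  try {
    // With fatal: true, TextDecoder throws a TypeError on malformed UTF-8
    // instead of silently inserting U+FFFD replacement characters.
    const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return { text, charset: "utf-8" };
  } catch (e) {
    // ISO-8859-1 maps every byte to the code point with the same value,
    // so this fallback can never fail.
    let text = "";
    for (const b of bytes) {
      text += String.fromCharCode(b);
    }
    return { text, charset: "iso-8859-1" };
  }
}
```

This works reasonably well because text in a single-byte legacy encoding rarely happens to be valid UTF-8 once it contains non-ASCII bytes. It cannot tell two legacy charsets apart, which is where a real detector would come in.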
A suggestion from Mic: we could build a log of all messages for each user. Having more text from a given user should increase the accuracy of our guess at that user's charset.
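One possible reading of Mic's suggestion, sketched under assumptions: rather than storing full logs, keep a per-nick tally of how many non-ASCII messages were valid UTF-8 and how many were not, then use the majority as that user's guess. The class name `NickCharsetTracker`, the ISO-8859-1 fallback label, and the majority rule are all hypothetical choices for this example.

```javascript
// Hypothetical per-nick tracker: accumulates evidence about each user's
// charset across messages instead of guessing from one message alone.
class NickCharsetTracker {
  constructor() {
    // nick -> { utf8: count, legacy: count }
    this.counts = new Map();
  }

  // Record the raw bytes of one message sent by `nick`.
  record(nick, bytes) {
    // Pure ASCII decodes identically in UTF-8 and in most legacy
    // charsets, so it carries no evidence either way.
    if (bytes.every((b) => b < 0x80)) {
      return;
    }
    const c = this.counts.get(nick) ?? { utf8: 0, legacy: 0 };
    try {
      // Strict decoding throws on malformed UTF-8.
      new TextDecoder("utf-8", { fatal: true }).decode(bytes);
      c.utf8 += 1;
    } catch (e) {
      c.legacy += 1;
    }
    this.counts.set(nick, c);
  }

  // Best guess for `nick`; defaults to UTF-8 when there is no evidence
  // or the evidence is tied.
  // Assumption: "iso-8859-1" stands in for whatever legacy charset applies.
  guess(nick) {
    const c = this.counts.get(nick);
    if (!c) {
      return "utf-8";
    }
    return c.legacy > c.utf8 ? "iso-8859-1" : "utf-8";
  }
}
```

A usage example: call `tracker.record(nick, bytes)` for each incoming message, then `tracker.guess(nick)` when choosing how to decode the next one. The sketch does not normalize nick case or handle nick changes.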
Useful links:
Summary of Charset Detectors - http://www.mozilla.org/projects/intl/chardet.html
Using Universal Charset Detector from Mozilla in a standalone project - http://www.mozilla.org/projects/intl/detectorsrc.html
Paper on the theory - http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
There's some code available (neither is scriptable):
Universal charset detector - http://mxr.mozilla.org/mozilla-central/source/extensions/universalchardet/
Charset detector - http://mxr.mozilla.org/mozilla-central/source/intl/chardet/
Mozilla Bugs of interest:
Universal autodetect needs to be on by default - https://bugzilla.mozilla.org/show_bug.cgi?id=264871
Remove the universal charset detector - https://bugzilla.mozilla.org/show_bug.cgi?id=361289
Unfortunately, a lot of the code seems really old. Contacting the Mozilla Internationalization team might get us some more information.
Comment 1•11 years ago
*** Original post on bio 509 at 2010-09-16 07:57:42 UTC ***
Hey, you have done some great research on this! :)
From a quick look, I think we will want to make the nsIStringCharsetDetector interface (http://mxr.mozilla.org/mozilla-central/source/intl/chardet/public/nsIStringCharsetDetector.h) scriptable, or at least understand why it currently isn't. Asking Mozilla developers on IRC is likely to help.
Reporter
Comment 2•11 years ago
*** Original post on bio 509 at 2010-12-14 17:58:26 UTC ***
This does not block my JS-IRC work; it would just be nice to have.
No longer blocks: 953944
Reporter
Comment 3•6 years ago
I don't think we've really had anyone ask for this. I think most IRC networks are UTF-8 now anyway, which is what we use by default.
If someone needs this, please speak up!
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME