Closed Bug 1177830 Opened 9 years ago Closed 9 years ago

remove obsolete Chinese encoding options

Categories

(MailNews Core :: Internationalization, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED WONTFIX
Thunderbird 45.0

People

(Reporter: mkmelin, Unassigned)

Details

(Whiteboard: [relnote])

Attachments

(1 file, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #1174580 +++
Bug 1174580 comment 61: "The latest build wfm. And setting GB18030 as the outgoing default in Display-Formatting-Advanced makes replies also encode properly and roundtrip fine. However, using the GB2312 [edit: I assume GBK was meant here] menuitem causes encoding in UTF8; is there any reason that option is still kept, given 1) it doesn't do what it advertises, 2) it's a subset of 18030 and officially superseded by it anyway. And for incoming, the label is GBK, which means 18030 but doesn't say it and isn't consistent with outgoing, thus confusing. To be sure, the advice of the mozilla zh-cn localizer should be sought."
Bug 1174580 comment 7-8: The gb18030 decoder is a superset of the old gbk decoder, which is why Gecko *decodes* content labeled as GB2312, gbk and gb18030 *exactly* the same way (as gb18030).
Shaohua Wen: any opinion on this?
I think it is true that there is no need to keep the GBK option. GB18030 is sufficient.
Attached patch gbk.patch (obsolete) (deleted)
Assignee: nobody → alta88
Attachment #8690192 - Flags: review?(mkmelin+mozilla)
Comment on attachment 8690192
gbk.patch
Review of attachment 8690192:
-----------------------------------------------------------------
Please also adjust MigrateDefaultCharsets(), bump the version... and make it check a version. And also adjust the suite/ version of the properties file the same way.
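A rough sketch of what a version-checked migration of this kind could look like, written in Python purely for illustration. The function name, the preference keys, and the version number are all hypothetical and are not Thunderbird's actual MigrateDefaultCharsets() code; the replacement mapping is taken from the release note proposed later in this bug.

```python
# Hypothetical sketch only: names and keys are illustrative, not Thunderbird's code.
MIGRATION_VERSION = 2  # assumed value after "bumping the version"

# Obsolete label -> replacement, as listed in this bug's proposed release note.
OBSOLETE_CHARSETS = {
    "gb2312": "gb18030",
    "gbk": "gb18030",
    "big5-hkscs": "big5",
}

# Hypothetical preference keys holding the default outgoing/incoming charsets.
CHARSET_KEYS = ("send_default_charset", "view_default_charset")
VERSION_KEY = "charset.migration_version"


def migrate_default_charsets(prefs: dict) -> dict:
    """Return a copy of prefs with obsolete charsets replaced.

    The stored version is checked first so the migration runs at most once.
    """
    if prefs.get(VERSION_KEY, 0) >= MIGRATION_VERSION:
        return dict(prefs)  # already migrated, nothing to do

    migrated = dict(prefs)
    for key in CHARSET_KEYS:
        value = migrated.get(key)
        if isinstance(value, str):
            # Keep the original value unless it is one of the obsolete labels.
            migrated[key] = OBSOLETE_CHARSETS.get(value.lower(), value)
    migrated[VERSION_KEY] = MIGRATION_VERSION
    return migrated


if __name__ == "__main__":
    print(migrate_default_charsets({"send_default_charset": "GBK"}))
```

The version check is what makes the migration safe to call on every startup: a profile that has already been migrated is returned unchanged, so a user who later picks a different encoding is not overridden.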
Attachment #8690192 - Flags: review?(mkmelin+mozilla)
Attached patch gbk.patch (deleted)
@ewong, for suite strings part.
Attachment #8690192 - Attachment is obsolete: true
Attachment #8690408 - Flags: review?(mkmelin+mozilla)
Attachment #8690408 - Flags: review?(ewong)
Comment on attachment 8690408
gbk.patch
Review of attachment 8690408:
-----------------------------------------------------------------
Aside from the big5-hkscs.title removal question, the others look ok.
::: mail/locales/en-US/chrome/messenger/charsetTitles.properties
@@ -35,2 @@
> big5.title = Chinese Traditional (Big5)
> -big5-hkscs.title = Chinese Traditional (Big5-HKSCS)
Out of curiosity, is there a reason why you're also removing Big5-HKSCS?
::: suite/locales/en-US/chrome/mailnews/charsetTitles.properties
@@ -35,2 @@
> big5.title = Chinese Traditional (Big5)
> -big5-hkscs.title = Chinese Traditional (Big5-HKSCS)
Why does this need to be removed?
(In reply to Edmund Wong (:ewong) from comment #5)
> Comment on attachment 8690408
> gbk.patch
Note that this doesn't deal with the case where someone has chosen gbk in per-server NNTP settings. Also, it's unclear if using gb18030 provides any compatibility benefit over UTF-8. The theory was that some ancient MUA might understand GB2312 but not UTF-8. If you no longer want to label outgoing email as GB2312 or gbk, it doesn't follow that using gb18030 makes sense (as opposed to just using UTF-8 already). As usual, reiterating bug 862292 and especially bug 862292 comment 14.
> ::: mail/locales/en-US/chrome/messenger/charsetTitles.properties
> @@ -35,2 @@
> > big5.title = Chinese Traditional (Big5)
> > -big5-hkscs.title = Chinese Traditional (Big5-HKSCS)
>
> Out of curiosity, is there a reason why you're also removing Big5-HKSCS?
Big5-HKSCS went away in bug 912470.
(In reply to Henri Sivonen (:hsivonen) from comment #6)
> (In reply to Edmund Wong (:ewong) from comment #5)
> > Comment on attachment 8690408
> > gbk.patch
>
> Note that this doesn't deal with the case where someone has chosen gbk in
> per-server NNTP settings.
No, nor any other folder where encoding has been set in Properties. I don't know that in the past any per folder migration code has been used when removing encodings. It would involve opening and testing every folder for everyone. A relnote and self help might suffice. Magnus?
> Also, it's unclear if using gb18030 provides any compatibility benefit over
> UTF-8. The theory was that some ancient MUA might understand GB2312 but not
> UTF-8. If you no longer want to label outgoing email as GB2312 or gbk, it
> doesn't follow that using gb18030 makes sense (as opposed to just using
> UTF-8 already).
>
> As usual, reiterating bug 862292 and especially bug 862292 comment 14.
The goal of this patch is to make the labels reflect the latest gbk variant, not make a policy decision on removing gbk for outgoing. Believe me, I personally think it should be all utf8 all the time. But unilaterally pulling gbk without at least getting input on ramifications from a zh-CN locale expert doesn't seem right. Shaohua Wen, what do you think about removing gb as an outgoing mail encoding and using only utf8?
Flags: needinfo?(wenbins)
(In reply to alta88 from comment #7)
> Shaohua Wen, what do you think about removing gb as an outgoing mail
> encoding and using only utf8?
GB18030 is the current official standard, I think we should keep it.
(In reply to alta88 from comment #7)
> No, nor any other folder where encoding has been set in Properties. I don't
> know that in the past any per folder migration code has been used when
> removing encodings. It would involve opening and testing every folder for
> everyone. A relnote and self help might suffice. Magnus?
Yeah we seem to have gotten away with that. Not a single complaint AFAIR. I think the per-folder encoding settings are a misfeature at least for mail and should be removed. The only potentially legitimate use I've heard is for some old broken Chinese/Russian(?) news servers - but that may be just hearsay...
> > As usual, reiterating bug 862292 and especially bug 862292 comment 14.
Joshua has a complete send-code rewrite (in js) in the works. I don't think there's much stopping us from going all-utf8 for sending except there's a fair amount of code to purge/touch for that, with a large part of that likely going down the drain in the rewrite, so basically status quo has been time best spent.
Attachment #8690408 - Flags: review?(mkmelin+mozilla) → review+
(In reply to Shaohua Wen from comment #8)
> (In reply to alta88 from comment #7)
>
> > Shaohua Wen, what do you think about removing gb as an outgoing mail
> > encoding and using only utf8?
>
> GB18030 is the current official standard, I think we should keep it.
The official standard is irrelevant. The relevant question is: Does there exist a significant population of recipients whose MUA can ingest gb18030-encoded email labeled as gb18030 but cannot ingest UTF-8-encoded email labeled as UTF-8?
(In reply to alta88 from comment #7)
> Believe me, I personally think it should be all utf8 all the time. But
> unilaterally pulling gbk without at least getting input on ramifications
> from a zh-CN locale expert doesn't seem right.
Thunderbird has significant technical debt to deal with. It doesn't make sense to keep tweaking features that no longer have value (sending email in non-UTF-8 encodings or configuring what encoding to use for sending)--especially when, as Magnus says, this stuff will soon be purged.
(In reply to Henri Sivonen (:hsivonen) from comment #10)
> Does there exist a significant population of recipients whose MUA can ingest
> gb18030-encoded email labeled as gb18030 but cannot ingest UTF-8-encoded
> email labeled as UTF-8?
No, I believe any modern email clients are able to ingest utf-8 encoded email.
(In reply to Shaohua Wen from comment #11)
> (In reply to Henri Sivonen (:hsivonen) from comment #10)
> > Does there exist a significant population of recipients whose MUA can ingest
> > gb18030-encoded email labeled as gb18030 but cannot ingest UTF-8-encoded
> > email labeled as UTF-8?
> No, I believe any modern email clients are able to ingest utf-8 encoded
> email.
The question I'm getting at, apparently poorly, is whether there is an official requirement for email communications in the PRC to be gb18030 encoded (to the exclusion of utf8), or a widespread practice and technical inability to communicate or transact with PRC organizations in anything other than gb18030.
Flags: needinfo?(luoyonggang)
(In reply to alta88 from comment #12)
> The question I'm getting at, apparently poorly, is whether there is an
> official requirement for email communications in the PRC to be gb18030
> encoded (to the exclusion of utf8), or a widespread practice and technical
> inability to communicate or transact with PRC organizations in anything
> other than gb18030.
AFAIK there is no such official requirement, or at least I've never heard of such a thing. But there are some very old Chinese news group servers that can only communicate with emails encoded with gb2312/gb18030.
I believe this is not a good idea: more clients support utf8 than gb18030, and more clients support gb2312/gbk than utf8.
Flags: needinfo?(luoyonggang)
(In reply to Yonggang Luo from comment #14)
> I believe this is not a good idea: more clients support utf8 than gb18030,
> and more clients support gb2312/gbk than utf8.
I don't think any consideration whatsoever should be given to clients that support gb2312/gbk but not its successor gb18030, published almost 10 yrs ago. I agree with Magnus that the compose rewrite will best deal with a utf8 only policy. @ewong, review ping? If suite wants to keep the obsolete strings, I'll remove the suite part from the patch.
Comment on attachment 8690408 [details] [diff] [review] gbk.patch Looks good.
Attachment #8690408 - Flags: review?(ewong) → review+
Relnote: The following Chinese character encodings are obsolete: GB2312 and GBK (superseded by GB18030); Big5-HKSCS (superseded by Big5). News (nntp) servers should be updated in Account Settings, Server Settings to a supported Default Text Encoding. Any account subfolder should be updated in folder Properties to a supported Fallback Text Encoding.
Keywords: checkin-needed
Whiteboard: [relnote]
Summary: consider removing Chinese Simplified (GBK) from outgoing mail encoding options → remove obsolete Chinese encoding options
(In reply to Magnus Melin from comment #9)
> Joshua has a complete send-code rewrite (in js) in the works. I don't think
> there's much stopping us from going all-utf8 for sending except there's a
> fair amount of code to purge/touch for that, with a large part of that
> likely going down the drain in the rewrite, so basically status quo has
> been time best spent.
I've been told that the JP locale still desires its please-mojibake-me ISO-2022-JP support, so the rewrite will support non-UTF-8 bodies, although quite possibly not to the same degree that it does now.
Status: NEW → RESOLVED
Closed: 9 years ago
Keywords: checkin-needed
OS: Unspecified → All
Hardware: Unspecified → All
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 45.0
(In reply to alta88 from comment #15)
> I don't think any consideration whatsoever should be given to clients that
> support gb2312/gbk but not its successor gb18030, published almost 10 yrs
> ago.
It's fine if you don't want to care about clients that only support gbk anymore, but changing to gb18030 for sending instead of changing to UTF-8 makes no sense. GB18030 postdates UTF-8. There's no inherent value in UTF-8 avoidance. The reason for UTF-8 avoidance is compatibility. When GB18030 isn't more compatible than UTF-8, you should use UTF-8. It's an error to assume that you have to have specific non-UTF-8 menu items for Chinese forever just for the sake of having menu items for Chinese. (The menu item for Armenian didn't make sense and got removed, for example.)
> I agree with Magnus that the compose rewrite will best deal with a utf8 only
> policy.
So why did this patch land, then? I think it's reprehensible to create a new non-UTF-8 configuration to complicate the ecosystem with. Also, if TB is trying to move from XUL + XPCOM JS to becoming a (local) HTML + Web API JS app, tweaking the non-UTF-8 sending options (as opposed to going to UTF-8-only for sending right away) seems like rearranging the deck chairs.
(In reply to Joshua Cranmer [:jcranmer] from comment #18)
> I've been told that the JP locale still desires its please-mojibake-me
> ISO-2022-JP support, so the rewrite will support non-UTF-8 bodies, although
> quite possibly not to the same degree that it does now.
Seems like a bad idea to cater to such desires when Gmail no longer does.
(In reply to Henri Sivonen (:hsivonen) from comment #20)
> Seems like a bad idea to cater to such desires when Gmail no longer does.
Sadly, people already complain about that. Example: https://productforums.google.com/forum/#!topic/gmail-ja/ONkkit3FULU;context-place=topicsearchin/gmail-ja/category$3Afirefox Their current workaround is "Use Thunderbird". Although Apple Mail.app dropped non-UTF-8 composing, a plug-in to re-enable iso-2022-jp is still being developed to avoid mojibake: https://osdn.jp/projects/letter-fix/
(In reply to Henri Sivonen (:hsivonen) from comment #20)
> (In reply to alta88 from comment #15)
> > I don't think any consideration whatsoever should be given to clients that
> > support gb2312/gbk but not its successor gb18030, published almost 10 yrs
> > ago.
>
> It's fine if you don't want to care about clients that only support gbk
> anymore, but changing to gb18030 for sending instead of changing to UTF-8
> makes no sense. GB18030 postdates UTF-8. There's no inherent value in UTF-8
> avoidance. The reason for UTF-8 avoidance is compatibility. When GB18030
> isn't more compatible than UTF-8, you should use UTF-8. It's an error to
> assume that you have to have specific non-UTF-8 menu items for Chinese
> forever just for the sake of having menu items for Chinese. (The menu item
> for Armenian didn't make sense and got removed, for example.)
The error, however, is entirely your error in assuming 'menu items in Chinese' is of any concern here.
> > I agree with Magnus that the compose rewrite will best deal with a utf8 only
> > policy.
>
> So why did this patch land, then?
>
> I think it's reprehensible to create a new non-UTF-8 configuration to
> complicate the ecosystem with.
I suggest you reread comment 7. The label is merely being changed to reflect the encoding actually being used, according to your own self in the parent bug. Meaning a request to encode in gbk/gb2312 is really emitting gb18030. Is this incorrect? Then, I further suggest you knock it off with the hyperventilationary tone.
> Also, if TB is trying to move from XUL + XPCOM JS to becoming a (local) HTML
> + Web API JS app, tweaking the non-UTF-8 sending options (as opposed to
> going to UTF-8-only for sending right away) seems like rearranging the deck
> chairs.
>
> (In reply to Joshua Cranmer [:jcranmer] from comment #18)
> > I've been told that the JP locale still desires its please-mojibake-me
> > ISO-2022-JP support, so the rewrite will support non-UTF-8 bodies, although
> > quite possibly not to the same degree that it does now.
>
> Seems like a bad idea to cater to such desires when Gmail no longer does.
Killing usage for a large number of users in this locale, regardless of how righteous your desire may be, seems like a bad idea. Worse (and I'd even say embarrassing) is using gmail as some sort of beacon on the hill for your argument.
(In reply to alta88 from comment #22)
> The label is merely being changed to reflect the
> encoding actually being used, according to your own self in the parent bug.
> Meaning a request to encode in gbk/gb2312 is really emitting gb18030. Is this
> incorrect?
Incorrect, yes. When you request gbk/gb2312 *decode*, you get the same behavior as by requesting gb18030 *decode*. When you request gbk *encode*, you get different behavior compared to requesting gb18030 *encode*. So changing the pref for outgoing from gbk to gb18030 changes behavior and changing the pref for incoming from gbk to gb18030 changes behavior when replying to unlabeled messages.
> Worse (and I'd even say
> embarrassing) is using gmail as some sort of beacon on the hill for your
> argument.
The point is that major MUAs have already taken the lead to force recipients to deal. This means that those MUAs already believe they can get away with it, and, more to the point, unless all Japanese senders migrate away from Gmail and (unextended) Mail.app, broken gateways have to be fixed regardless of what TB does at which point TB gets no value out of UTF-8 avoidance.
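The decode/encode asymmetry described in the comment above can be illustrated with a small sketch. It uses Python's built-in codecs as a stand-in; they are not Gecko's encoders and may differ from them in detail, and the sample strings and helper names are chosen here for illustration only. The general point it shows is that the three labels agree when decoding GB2312-range bytes, while the gbk and gb18030 encoders diverge for text outside GBK's repertoire.

```python
# Illustration only: Python's codecs stand in for Gecko's encoders/decoders.
from typing import Optional

SAMPLE = "中文"               # representable in GB2312, GBK and GB18030
OUTSIDE_GBK = "\U0001F600"    # an emoji beyond the BMP, not representable in GBK


def decode_with_all(raw: bytes) -> set:
    """Decode the same bytes under each of the three labels."""
    return {raw.decode(label) for label in ("gb2312", "gbk", "gb18030")}


def try_encode(text: str, label: str) -> Optional[bytes]:
    """Return the encoded bytes, or None if the encoder cannot represent text."""
    try:
        return text.encode(label)
    except UnicodeEncodeError:
        return None


# Decoding: all three labels yield the same string for these bytes.
decoded = decode_with_all(SAMPLE.encode("gb2312"))

# Encoding: the gbk encoder fails where the gb18030 encoder succeeds.
gbk_bytes = try_encode(OUTSIDE_GBK, "gbk")
gb18030_bytes = try_encode(OUTSIDE_GBK, "gb18030")

if __name__ == "__main__":
    print(decoded, gbk_bytes, gb18030_bytes)
```

What a mail client does when the chosen encoder cannot represent the text (for example, falling back to UTF-8) is a separate policy decision that this sketch does not model.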
(In reply to Henri Sivonen (:hsivonen) from comment #23)
> (In reply to alta88 from comment #22)
>
> > Worse (and I'd even say
> > embarrassing) is using gmail as some sort of beacon on the hill for your
> > argument.
>
> The point is that major MUAs have already taken the lead to force recipients
> to deal. This means that those MUAs already believe they can get away with
> it, and, more to the point, unless all Japanese senders migrate away from
> Gmail and (unextended) Mail.app, broken gateways have to be fixed regardless
> of what TB does at which point TB gets no value out of UTF-8 avoidance.
At least 1 major client doesn't support the encoding and there may be some gateways, likely to not include any in ja, that break. But imo this is an important point for the locale, as they've said many times, and it's highly unfriendly to unilaterally remove it. And pulling a thing that works is very different than never implementing the thing. If you don't want to support it (unclear what that really means after your encodings renovation work), you don't have to. If no one wants to support it in mailnews/, then it's up to the locale to do so. But 'I do not want to support it' should not mean 'you will not use it' if there are others who will. And really, Tb gets no value from.. having users, does it? Since ISO-2022-JP clearly works on some level and is of value to Tb's ja users, you probably mean something else there.
(In reply to Henri Sivonen (:hsivonen) from comment #23)
> (In reply to alta88 from comment #22)
> > The label is merely being changed to reflect the
> > encoding actually being used, according to your own self in the parent bug.
> > Meaning a request to encode in gbk/gb2312 is really emitting gb18030. Is this
> > incorrect?
>
> Incorrect, yes. When you request gbk/gb2312 *decode*, you get the same
> behavior as by requesting gb18030 *decode*. When you request gbk *encode*,
> you get different behavior compared to requesting gb18030 *encode*. So
> changing the pref for outgoing from gbk to gb18030 changes behavior and
> changing the pref for incoming from gbk to gb18030 changes behavior when
> replying to unlabeled messages.
I see, that's my misunderstanding then, sorry. Let sleeping dogs lie.. Magnus should we backout?
Flags: needinfo?(mkmelin+mozilla)
(In reply to Henri Sivonen (:hsivonen) from comment #23)
> unless all Japanese senders migrate away from
> Gmail and (unextended) Mail.app, broken gateways have to be fixed regardless
> of what TB does at which point TB gets no value out of UTF-8 avoidance.
Not all senders have to migrate, because not all senders send mail to utf-8-unaware recipients.
Yes I guess we should back this out then.
Flags: needinfo?(mkmelin+mozilla)
I'm tracking just to make sure this does not get lost, so remove the tracking when the backout is done.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: alta88 → nobody
Flags: needinfo?(wenbins)
Nothing more to do here then.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Keywords: regression
Resolution: --- → WONTFIX