"Edit as new message" not working of addressing header contains raw UTF-8, was: Non-ASCII characters in e-mail address don't survive round trip of "save draft" and "edit draft"
Categories
(Thunderbird :: Message Compose Window, defect, P2)
Tracking
(thunderbird_esr78 fixed, thunderbird80 fixed)
People
(Reporter: jorgk-bmo, Assigned: jorgk-bmo)
References
(Regression)
Details
(Keywords: regression)
Attachments
(1 file, 2 obsolete files)
(deleted),
patch
|
jorgk-bmo
:
review+
gds
:
feedback+
wsmwk
:
approval-comm-beta+
wsmwk
:
approval-comm-esr78+
|
Details | Diff | Splinter Review |
Gene detected this in bug 1563891 and tracked it down to this regression range:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f83f2771414cf5938a46f44e1b11cdcd5181ea0f&tochange=192e0e33eb597e8d923eb89f6d49bf42654e9d11
See bug 1563891 comment #35
STR:
Compose a message to
a) öö@öö.com
b) jøran@blåbærsyltetøy.gulbrandsen.priv.nǿ
Save draft, edit draft.
a) Recipient lost
b) Recipient corrupted: j<?>ran@bl<?>b<?>rsyltet<?>y.gulbrandsen.priv.n<?> (where <?> is the UTF-8 replacement character that BMO doesn't display)
You will notice that in case a) there is no To: header at all, in case b) we have To: jøran@blåbærsyltetøy.gulbrandsen.priv.nǿ in the message encoded in UTF-8.
Gene tracked it down to a truncation that happens from a "high bit" UTF-16 character like ǿ which is U+01FF, down to U+00FF in the JS engine, so that smells like something running through an IDL-defined API of type string
. Apparently moving some of those APIs in attachment 9169034 [details] [diff] [review] of the said bug doesn't help. I noticed that the recipients are all AString in nsIMsgCompFields.idl, so hard to tell where the information loss happens.
The round trip was working in TB 68. Whilst this is not a terrible issue for TB 78.x, it's blocking further development in bug 1563891.
Comment 1•4 years ago
|
||
(In reply to Jorg K (CEST = GMT+2) from comment #0)
Gene detected this in bug 1563891 and tracked it down to this regression range:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f83f2771414cf5938a46f44e1b11cdcd5181ea0f&tochange=192e0e33eb597e8d923eb89f6d49bf42654e9d11
See bug 1563891 comment #35
Actually, the reporter of bug 1563891 Alessandro V. originally found the problem.
STR:
Compose a message to
a) öö@öö.com
b) jøran@blåbærsyltetøy.gulbrandsen.priv.nǿ
Save draft, edit draft.
a) Recipient lost
b) Recipient corrupted: j<?>ran@bl<?>b<?>rsyltet<?>y.gulbrandsen.priv.n<?> (where <?> is the UTF-8 replacement character that BMO doesn't display)You will notice that in case a) there is no To: header at all, in case b) we have To: jøran@blåbærsyltetøy.gulbrandsen.priv.nǿ in the message encoded in UTF-8.
I tried with both a) and b) as two (2) TO fields and see them both on edit after saving the draft. Same with just a single TO of a) or b). Of course the non-ascii chars are replaced with <?> as you show. This is with about Jul 26 trunk and with your patch attachment 9169034 [details] [diff] [review] so that might matter.
Gene tracked it down to a truncation that happens from a "high bit" UTF-16 character like ǿ which is U+01FF, down to U+00FF in the JS engine, so that smells like something running through an IDL-defined API of type
string
. Apparently moving some of those APIs in attachment 9169034 [details] [diff] [review] of the said bug doesn't help. I noticed that the recipients are all AString in nsIMsgCompFields.idl, so hard to tell where the information loss happens.
AFAICT, the problem is not exclusively caused by the high-byte getting chopped which only affect chars with code point U+01FF and higher. I see this getting called twice: https://searchfox.org/comm-central/rev/085526b22865ea90738d779fa14bf711d46f69d3/mailnews/mime/jsmime/jsmime.js#684. The first call returns properly encoded UTF-16 (converted from UTF-8). The 2nd call receives UTF-16 as input and fails when trying to convert any non-ascii from UTF-8 to UTF-16 resulting in the <?> chars displayed. The 2nd (redundant?) call is not a problem when the address to convert to UTF-16 is pure ascii.
The round trip was working in TB 68. Whilst this is not a terrible issue for TB 78.x, it's blocking further development in bug 1563891.
Yes, the fix for bug 1563891 (support SMTPUTF8) allows the recipient address to be non-ascii which reveals the problem.
Assignee | ||
Comment 2•4 years ago
|
||
Gene, maybe you're still around today. I took a look here:
https://hg.mozilla.org/comm-central/rev/019eaf0f9cf72621a436ce06b2607bc5a3532fae#l6.264
Can you revert this to the original, like restore the, for example, let msgReplyTo = msgCompFields.replyTo
. I can't see why the parseEncodedHeader()
call is necessary when it wasn't necessary before. It will lead to decoding raw UTF-8 from a header (encoded header) to a UTF-16 JS string. But I assume msgCompFields.replyTo
is already a UTF-16 JS string, so why decode and parse it again? However, should parsing be required at that point, try parseDecodedHeader()
instead since the JS string already has a decoded header and no longer an UTF-8 header in it.
So two options: Original code, or parseDecodedHeader()
.
I can try it in the morning unless you get there first.
Assignee | ||
Comment 3•4 years ago
|
||
Looking at the context, it looks like parsing is necessary if you have multiple To: or Cc/Bcc: recipients, see the old code here:
https://hg.mozilla.org/comm-central/rev/019eaf0f9cf72621a436ce06b2607bc5a3532fae#l6.294
So I guess, switching to parseDecodedHeader()
is the go here. There are intricacies of transporting UTF-8 strings as single bytes in UTF-16 JS strings and it's important to use the right parse function. There's actually not much difference between them until you reach one of those UTF-8/UTF-16 points where you need to know exactly what the string contains since interpreting a JS string (UTF-16) as raw UTF-8 is fatal. So it looks like the "2nd call" you've described is the culprit and I bet you'll find that it's gone after the change I suggested.
For more confusing reading I suggest bug 1390337 ;-)
Comment 4•4 years ago
|
||
Yes, I think you found the problem! Work fine now with just one call to the convert8BitHeader()
. One of my earlier attempt to fix the problem was to change this https://searchfox.org/comm-central/rev/45c9a2810df38a7d8b4a9a396e244a9b438f3729/mailnews/mime/src/mimeParser.jsm#235 so that the call to convert8BitHeader()
is skipped when the flag indicates an address. But that's what using the calls to parseDecodedHeader() instead of parseEncodedHeader() does so it looks like the correct fix. (And my fix failed under some conditions.)
Also, with this change not seeing the chopped off upper byte for chars like U+01FF. Didn't test with all the possible field types but OK with TO (including multiple TOs) and BCC.
Updated•4 years ago
|
Assignee | ||
Comment 5•4 years ago
|
||
Comment on attachment 9169356 [details] [diff] [review]
fixes-reflected-addresses.diff
Thanks, Gene.
Assignee | ||
Updated•4 years ago
|
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/345cbe13bb71
Correct use of header parsing function in CompFields2Recipients() after bug 440377. r=jorgk DONTBUILD
Assignee | ||
Comment 7•4 years ago
|
||
[Approval Request Comment]
Regression caused by (bug #): Bug 440377
User impact if declined: "Edit Draft" or "Edit as new message" will result in garbled recipients if recipients field contains raw UTF-8 according to https://tools.ietf.org/html/rfc6532.
Testing completed (on c-c, etc.): Yes.
Risk to taking this patch (and alternatives if risky): This patch corrects a coding error which slipped through the review of bug 440377. Having this fix is also a pre-condition of getting bug 1563891 implemented.
Assignee | ||
Updated•4 years ago
|
Comment 8•4 years ago
|
||
Unfortunately this broke browser_replyAddresses.js. Here's a sample log.
Backed out:
https://hg.mozilla.org/comm-central/rev/034b7e5d52f27b9a3e77048b661297d80883b095
Updated•4 years ago
|
Assignee | ||
Comment 9•4 years ago
|
||
Sorry about that. I did take a look, but obviously I missed it (in a sea of orange) :-( - I can't see it on debug runs.
OK, so the issue is more difficult, parseEncodedHeader() is the wrong function since it will consider the input as raw UTF-8 string and go through here https://searchfox.org/comm-central/rev/51b7d1ea0a96a8c4d52812ac3e27360b2da4b91c/mailnews/mime/src/mimeParser.jsm#238 which is undesired. But the test shows that we want RFC 2047 decoding. I'll have to think of something.
Assignee | ||
Comment 10•4 years ago
|
||
Actually, I see it now. Gene, instead of using parseEncodedHeader() use parseEncodedHeaderW() - W for wide string, so we pass in a wide UTF-16 string. That will work.
Assignee | ||
Comment 11•4 years ago
|
||
OK, now with parseEncodedHeaderW():
Assignee | ||
Comment 12•4 years ago
|
||
Here's the description of the function:
https://searchfox.org/comm-central/rev/51b7d1ea0a96a8c4d52812ac3e27360b2da4b91c/mailnews/mime/public/nsIMsgHeaderParser.idl#97
Comment 13•4 years ago
|
||
Comment on attachment 9169523 [details] [diff] [review]
1658361-fix-header-parsing.patch
Looks good. I tested against the original problem and still fixes it.
Comment 14•4 years ago
|
||
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/27513002dc77
Correct use of header parsing function in CompFields2Recipients() after bug 440377. r=jorgk
Assignee | ||
Comment 15•4 years ago
|
||
Comment on attachment 9169523 [details] [diff] [review]
1658361-fix-header-parsing.patch
[Approval Request Comment]
Regression caused by (bug #): Bug 440377
User impact if declined: "Edit Draft" or "Edit as new message" will result in garbled recipients if recipients field contains raw UTF-8 according to https://tools.ietf.org/html/rfc6532.
Testing completed (on c-c, etc.): Yes.
Risk to taking this patch (and alternatives if risky): This patch corrects a coding error which slipped through the review of bug 440377. Having this fix is also a pre-condition of getting bug 1563891 implemented.
Comment 16•4 years ago
|
||
Comment on attachment 9169523 [details] [diff] [review]
1658361-fix-header-parsing.patch
[Triage Comment]
Approved for beta
Comment 17•4 years ago
|
||
Comment on attachment 9169523 [details] [diff] [review]
1658361-fix-header-parsing.patch
[Triage Comment]
Approved for esr78 (hasn't gone through beta, but simple enough)
Comment 18•4 years ago
|
||
bugherder uplift |
Thunderbird 80.0b4:
https://hg.mozilla.org/releases/comm-beta/rev/32f1b6610144
Comment 19•4 years ago
|
||
bugherder uplift |
Thunderbird 78.2.0:
https://hg.mozilla.org/releases/comm-esr78/rev/4a26e69e448d
Description
•