1658361 - "Edit as new message" not working of addressing header contains raw UTF-8, was: Non-ASCII characters in e-mail address don't survive round trip of "save draft" and "edit draft"

Assignee

Description

•

4 years ago

Gene detected this in bug 1563891 and tracked it down to this regression range:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f83f2771414cf5938a46f44e1b11cdcd5181ea0f&tochange=192e0e33eb597e8d923eb89f6d49bf42654e9d11
See bug 1563891 comment #35

STR:
Compose a message to
a) öö@öö.com
b) jøran@blåbærsyltetøy.gulbrandsen.priv.nǿ
Save draft, edit draft.
a) Recipient lost
b) Recipient corrupted: j<?>ran@bl<?>b<?>rsyltet<?>y.gulbrandsen.priv.n<?> (where <?> is the UTF-8 replacement character that BMO doesn't display)

You will notice that in case a) there is no To: header at all, in case b) we have To: jøran@blåbærsyltetøy.gulbrandsen.priv.nǿ in the message encoded in UTF-8.

Gene tracked it down to a truncation that happens from a "high bit" UTF-16 character like ǿ which is U+01FF, down to U+00FF in the JS engine, so that smells like something running through an IDL-defined API of type string. Apparently moving some of those APIs in attachment 9169034 [details] [diff] [review] of the said bug doesn't help. I noticed that the recipients are all AString in nsIMsgCompFields.idl, so hard to tell where the information loss happens.

The round trip was working in TB 68. Whilst this is not a terrible issue for TB 78.x, it's blocking further development in bug 1563891.

Flags: needinfo?(alessandro)

gene smith

Comment 1

•

4 years ago

(In reply to Jorg K (CEST = GMT+2) from comment #0)

Gene detected this in bug 1563891 and tracked it down to this regression range:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f83f2771414cf5938a46f44e1b11cdcd5181ea0f&tochange=192e0e33eb597e8d923eb89f6d49bf42654e9d11
See bug 1563891 comment #35

Actually, the reporter of bug 1563891 Alessandro V. originally found the problem.

STR:
Compose a message to
a) öö@öö.com
b) jøran@blåbærsyltetøy.gulbrandsen.priv.nǿ
Save draft, edit draft.
a) Recipient lost
b) Recipient corrupted: j<?>ran@bl<?>b<?>rsyltet<?>y.gulbrandsen.priv.n<?> (where <?> is the UTF-8 replacement character that BMO doesn't display)

You will notice that in case a) there is no To: header at all, in case b) we have To: jøran@blåbærsyltetøy.gulbrandsen.priv.nǿ in the message encoded in UTF-8.

I tried with both a) and b) as two (2) TO fields and see them both on edit after saving the draft. Same with just a single TO of a) or b). Of course the non-ascii chars are replaced with <?> as you show. This is with about Jul 26 trunk and with your patch attachment 9169034 [details] [diff] [review] so that might matter.

Gene tracked it down to a truncation that happens from a "high bit" UTF-16 character like ǿ which is U+01FF, down to U+00FF in the JS engine, so that smells like something running through an IDL-defined API of type string. Apparently moving some of those APIs in attachment 9169034 [details] [diff] [review] of the said bug doesn't help. I noticed that the recipients are all AString in nsIMsgCompFields.idl, so hard to tell where the information loss happens.

AFAICT, the problem is not exclusively caused by the high-byte getting chopped which only affect chars with code point U+01FF and higher. I see this getting called twice: https://searchfox.org/comm-central/rev/085526b22865ea90738d779fa14bf711d46f69d3/mailnews/mime/jsmime/jsmime.js#684. The first call returns properly encoded UTF-16 (converted from UTF-8). The 2nd call receives UTF-16 as input and fails when trying to convert any non-ascii from UTF-8 to UTF-16 resulting in the <?> chars displayed. The 2nd (redundant?) call is not a problem when the address to convert to UTF-16 is pure ascii.

The round trip was working in TB 68. Whilst this is not a terrible issue for TB 78.x, it's blocking further development in bug 1563891.

Yes, the fix for bug 1563891 (support SMTPUTF8) allows the recipient address to be non-ascii which reveals the problem.

Jorg K (CEST = GMT+2)

Assignee

Comment 2

•

4 years ago

Gene, maybe you're still around today. I took a look here:
https://hg.mozilla.org/comm-central/rev/019eaf0f9cf72621a436ce06b2607bc5a3532fae#l6.264
Can you revert this to the original, like restore the, for example, let msgReplyTo = msgCompFields.replyTo. I can't see why the parseEncodedHeader() call is necessary when it wasn't necessary before. It will lead to decoding raw UTF-8 from a header (encoded header) to a UTF-16 JS string. But I assume msgCompFields.replyTo is already a UTF-16 JS string, so why decode and parse it again? However, should parsing be required at that point, try parseDecodedHeader() instead since the JS string already has a decoded header and no longer an UTF-8 header in it.

So two options: Original code, or parseDecodedHeader().

I can try it in the morning unless you get there first.

Jorg K (CEST = GMT+2)

Assignee

Comment 3

•

4 years ago

Looking at the context, it looks like parsing is necessary if you have multiple To: or Cc/Bcc: recipients, see the old code here:
https://hg.mozilla.org/comm-central/rev/019eaf0f9cf72621a436ce06b2607bc5a3532fae#l6.294
So I guess, switching to parseDecodedHeader() is the go here. There are intricacies of transporting UTF-8 strings as single bytes in UTF-16 JS strings and it's important to use the right parse function. There's actually not much difference between them until you reach one of those UTF-8/UTF-16 points where you need to know exactly what the string contains since interpreting a JS string (UTF-16) as raw UTF-8 is fatal. So it looks like the "2nd call" you've described is the culprit and I bet you'll find that it's gone after the change I suggested.

For more confusing reading I suggest bug 1390337 ;-)

gene smith

Comment 4

•

4 years ago

Attached patch fixes-reflected-addresses.diff (obsolete) (deleted) — Details — Splinter Review

Yes, I think you found the problem! Work fine now with just one call to the convert8BitHeader(). One of my earlier attempt to fix the problem was to change this https://searchfox.org/comm-central/rev/45c9a2810df38a7d8b4a9a396e244a9b438f3729/mailnews/mime/src/mimeParser.jsm#235 so that the call to convert8BitHeader() is skipped when the flag indicates an address. But that's what using the calls to parseDecodedHeader() instead of parseEncodedHeader() does so it looks like the correct fix. (And my fix failed under some conditions.)

Also, with this change not seeing the chopped off upper byte for chars like U+01FF. Didn't test with all the possible field types but OK with TO (including multiple TOs) and BCC.

Assignee: nobody → gds

gene smith

Updated

•

4 years ago

Assignee: gds → nobody

Jorg K (CEST = GMT+2)

Assignee

Comment 5

•

4 years ago

Comment on attachment 9169356 [details] [diff] [review]
fixes-reflected-addresses.diff

Thanks, Gene.

Flags: needinfo?(alessandro)

Attachment #9169356 - Flags: review+

Jorg K (CEST = GMT+2)

Assignee

Updated

•

4 years ago

Assignee: nobody → gds

Status: NEW → ASSIGNED

Target Milestone: --- → 81 Branch

Pulsebot

Comment 6

•

4 years ago

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/345cbe13bb71
Correct use of header parsing function in CompFields2Recipients() after bug 440377. r=jorgk DONTBUILD

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

Resolution: --- → FIXED

Jorg K (CEST = GMT+2)

Assignee

Comment 7

•

4 years ago

Attached patch 1658361-fix-header-parsing.patch (obsolete) (deleted) — Details — Splinter Review

[Approval Request Comment]
Regression caused by (bug #): Bug 440377
User impact if declined: "Edit Draft" or "Edit as new message" will result in garbled recipients if recipients field contains raw UTF-8 according to https://tools.ietf.org/html/rfc6532.
Testing completed (on c-c, etc.): Yes.
Risk to taking this patch (and alternatives if risky): This patch corrects a coding error which slipped through the review of bug 440377. Having this fix is also a pre-condition of getting bug 1563891 implemented.

Attachment #9169356 - Attachment is obsolete: true

Attachment #9169375 - Flags: review+

Attachment #9169375 - Flags: approval-comm-esr78?

Attachment #9169375 - Flags: approval-comm-beta?

Jorg K (CEST = GMT+2)

Assignee

Updated

•

4 years ago

Summary: Non-ASCII characters in e-mail address don't survive round trip of "save draft" and "edit draft" → "Edit as new message" not working of addressing header contains raw UTF-8, was: Non-ASCII characters in e-mail address don't survive round trip of "save draft" and "edit draft"

Geoff Lankow (:darktrojan)

Comment 8

•

4 years ago

Unfortunately this broke browser_replyAddresses.js. Here's a sample log.

Backed out:
https://hg.mozilla.org/comm-central/rev/034b7e5d52f27b9a3e77048b661297d80883b095

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Geoff Lankow (:darktrojan)

Updated

•

4 years ago

Attachment #9169375 - Flags: approval-comm-esr78?

Attachment #9169375 - Flags: approval-comm-beta?

Jorg K (CEST = GMT+2)

Assignee

Comment 9

•

4 years ago

Sorry about that. I did take a look, but obviously I missed it (in a sea of orange) :-( - I can't see it on debug runs.

OK, so the issue is more difficult, parseEncodedHeader() is the wrong function since it will consider the input as raw UTF-8 string and go through here https://searchfox.org/comm-central/rev/51b7d1ea0a96a8c4d52812ac3e27360b2da4b91c/mailnews/mime/src/mimeParser.jsm#238 which is undesired. But the test shows that we want RFC 2047 decoding. I'll have to think of something.

Assignee: gds → jorgk-bmo

Jorg K (CEST = GMT+2)

Assignee

Comment 10

•

4 years ago

Actually, I see it now. Gene, instead of using parseEncodedHeader() use parseEncodedHeaderW() - W for wide string, so we pass in a wide UTF-16 string. That will work.

Jorg K (CEST = GMT+2)

Assignee

Comment 11

•

4 years ago

Attached patch 1658361-fix-header-parsing.patch (deleted) — Details — Splinter Review

OK, now with parseEncodedHeaderW():

https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=8740c6bd542b1f6e8d9556428b4bd9126deeaeeb

Attachment #9169375 - Attachment is obsolete: true

Attachment #9169523 - Flags: review+

Attachment #9169523 - Flags: feedback?(gds)

Jorg K (CEST = GMT+2)

Assignee

Comment 12

•

4 years ago

Here's the description of the function:
https://searchfox.org/comm-central/rev/51b7d1ea0a96a8c4d52812ac3e27360b2da4b91c/mailnews/mime/public/nsIMsgHeaderParser.idl#97

gene smith

Comment 13

•

4 years ago

Comment on attachment 9169523 [details] [diff] [review]
1658361-fix-header-parsing.patch

Looks good. I tested against the original problem and still fixes it.

Attachment #9169523 - Flags: feedback?(gds) → feedback+

Pulsebot

Comment 14

•

4 years ago

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/27513002dc77
Correct use of header parsing function in CompFields2Recipients() after bug 440377. r=jorgk

Status: REOPENED → RESOLVED

Closed: 4 years ago → 4 years ago

Resolution: --- → FIXED

Jorg K (CEST = GMT+2)

Assignee

Comment 15

•

4 years ago

Comment on attachment 9169523 [details] [diff] [review]
1658361-fix-header-parsing.patch

[Approval Request Comment]
Regression caused by (bug #): Bug 440377
User impact if declined: "Edit Draft" or "Edit as new message" will result in garbled recipients if recipients field contains raw UTF-8 according to https://tools.ietf.org/html/rfc6532.
Testing completed (on c-c, etc.): Yes.
Risk to taking this patch (and alternatives if risky): This patch corrects a coding error which slipped through the review of bug 440377. Having this fix is also a pre-condition of getting bug 1563891 implemented.

Attachment #9169523 - Flags: approval-comm-esr78?

Attachment #9169523 - Flags: approval-comm-beta?

Wayne Mery (:wsmwk)

Comment 16

•

4 years ago

Comment on attachment 9169523 [details] [diff] [review]
1658361-fix-header-parsing.patch

[Triage Comment]
Approved for beta

Attachment #9169523 - Flags: approval-comm-beta? → approval-comm-beta+

Wayne Mery (:wsmwk)

Comment 17

•

4 years ago

Comment on attachment 9169523 [details] [diff] [review]
1658361-fix-header-parsing.patch

[Triage Comment]
Approved for esr78 (hasn't gone through beta, but simple enough)

Attachment #9169523 - Flags: approval-comm-esr78? → approval-comm-esr78+

Rob Lemley [:rjl]

Comment 18

•

4 years ago

bugherder uplift

Thunderbird 80.0b4:
https://hg.mozilla.org/releases/comm-beta/rev/32f1b6610144

status-thunderbird80: --- → fixed

Rob Lemley [:rjl]

Comment 19

•

4 years ago

bugherder uplift

Thunderbird 78.2.0:
https://hg.mozilla.org/releases/comm-esr78/rev/4a26e69e448d

status-thunderbird_esr78: --- → fixed

fixes-reflected-addresses.diff 4 years ago gene smith (deleted), patch	jorgk-bmo : review+	Details \| Diff \| Splinter Review
1658361-fix-header-parsing.patch 4 years ago Jorg K (CEST = GMT+2) (deleted), patch	jorgk-bmo : review+	Details \| Diff \| Splinter Review
1658361-fix-header-parsing.patch 4 years ago Jorg K (CEST = GMT+2) (deleted), patch	jorgk-bmo : review+ gds : feedback+ wsmwk : approval-comm-beta+ wsmwk : approval-comm-esr78+	Details \| Diff \| Splinter Review