Daily: First message displayed after start looks garbled (non-ascii messages)
Categories
(MailNews Core :: Internationalization, defect, P1)
Tracking
(thunderbird_esr78 unaffected, thunderbird91+ fixed)
Tracking | Status | |
---|---|---|
thunderbird_esr78 | --- | unaffected |
thunderbird91 | + | fixed |
People
(Reporter: jose, Assigned: darktrojan)
References
(Regression)
Details
(Keywords: regression)
Attachments
(3 files)
(deleted),
text/plain
|
Details | |
(deleted),
message/rfc822
|
Details | |
(deleted),
text/x-phabricator-request
|
wsmwk
:
approval-comm-beta+
|
Details |
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36
Steps to reproduce:
I've come here from bug 1717523 comment #8. There I observed that when importing a message with an incorrectly declared charset, it's displayed "double-garbled" instead of "simple-garbled", that means, a message with a UTF-8 å (0xC3 0xA5) should be displayed as Ã¥ when interpreted as windows-1252, but it instead shows Ã¥, that's the four UTF-8 bytes for Ã¥ displayed as ISO-8859-1.
It turns out that any message displayed after TB started is displayed garbled, for example the attached one.
I think this is a fallout from bug 1717523 and bug 1713627.
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Comment 1•3 years ago
|
||
Simple UTF-8 message.
Alice, could you check this out, please:
STR:
Import/drag the attached message to a local folder.
Click on on it. It displays garbled.
F8 to hide the preview, F8 to show it again. It doesn't display garbled any more.
Restart TB, click on the message. It displays garbled.
F8 to hide the preview, F8 to show it again. It doesn't display garbled any more.
Do you see this effect? Can you confirm the regression around bug 1717523 and bug 1713627?
Of course you can also try with a Japanese message.
Comment 2•3 years ago
|
||
I can also reproduce the issue in Latest Daily91.0a1(20210624120918) Windows 10 with a Japanese message as well as the utf-8.eml.
Regression window:
https://hg.mozilla.org/comm-central/pushloghtml?fromchange=79634195a6bf893848570ab3583ef8eb8657b642&tochange=fd910989e88864d435fe444919a1df53e0b1ae2f
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2ad6d795630df04703a887f12f2fdba219bf7f4f&tochange=997f00815e6bc28806b75448c8829f0259d2cb28
Reporter | ||
Comment 3•3 years ago
|
||
Henri, any idea why https://hg.mozilla.org/comm-central/rev/9941ab12f44ccfa75652bdc6e54d723a3e984b15 and https://hg.mozilla.org/mozilla-central/rev/640ad102b31631a639d0b0e875257e684fe10dea would have caused display issues on first display?
Reporter | ||
Comment 4•3 years ago
|
||
I forgot to thank you, Alice. Many thanks!! I was looking at bug 1717523 and one can get easily confused with too many problems at the same time. So it's good to have the confirmation and a Japanese example as well.
Comment 5•3 years ago
|
||
(In reply to José M. Muñoz from comment #0)
"double-garbled" instead of "simple-garbled", that means, a message with a UTF-8 å (0xC3 0xA5) should be displayed as Ã¥ when interpreted as windows-1252, but it instead shows Ã¥, that's the four UTF-8 bytes for Ã¥ displayed as ISO-8859-1.
That reminds me of Bug 1666146. I am not sure if there is any connection at all. It is, after all, a completely different code path. But the symptoms are similar.
Bug 1666146 occurs when TB reuses an expired news connection. Then it seems that the connection is not initialized completely or not in time.
Reporter | ||
Comment 6•3 years ago
|
||
Since you're asking for a déjà-vu: I have another one: Bug 1287336 (and duplicates), for steps to reproduce, see bug 1287336 comment #33.
Comment 7•3 years ago
|
||
Indeed. Want some more?
Bug 1485469 - circumflexed A ( Â ) is shown in message display that isn't in the message source
Bug 952861 - When switching views, message source is displayed in the preview pane
Assignee | ||
Comment 8•3 years ago
|
||
Confirming this even though I've only seen it in tests. Not sure why it doesn't happen to me as a user.
Assignee | ||
Updated•3 years ago
|
Reporter | ||
Comment 9•3 years ago
|
||
Not sure why it doesn't happen to me as a user.
You need to use a message with non-ASCII content. Dragging attachment 9228820 [details] or attachment 9228871 [details] into a folder will show them garbled, if they are the first message you display in the session. They are also garbled after a restart or even if they are the first message received after a restart. Whilst the "western" message is still somewhat readable, the Japanese message is totally unreadable, so that will not be well received when shipped to a large number of users, Japan being a popular locale for TB.
Updated•3 years ago
|
(In reply to José M. Muñoz from comment #3)
Henri, any idea why https://hg.mozilla.org/comm-central/rev/9941ab12f44ccfa75652bdc6e54d723a3e984b15 and https://hg.mozilla.org/mozilla-central/rev/640ad102b31631a639d0b0e875257e684fe10dea would have caused display issues on first display?
No idea why it's first display only.
However, I suppose on first display there is nothing that communicates the UTF-8ness of the input to the HTML parser. I suggest one of the following:
- Making the channel that mailnews core uses for passing the converted-to-UTF-8 email to Gecko report UTF-8 as the channel charset.
- Starting the converted-to-UTF-8 data with the UTF-8 BOM.
Comment 11•3 years ago
|
||
Just to note, the changes here have caused the Conversations add-on to have a similar issue - except it sees it all the time (with non-ascii messages).
Reporter | ||
Comment 12•3 years ago
|
||
Looks like Henri was spot on, I've just seen this in the console:
The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol.
Reporter | ||
Comment 13•3 years ago
|
||
Likely that this removal should have been a replacement:
https://hg.mozilla.org/comm-central/rev/9941ab12f44ccfa75652bdc6e54d723a3e984b15#l1.12
mDocShell->SetCharset("UTF-8"_ns);
, but what's the replacement?
Comment 14•3 years ago
|
||
(In reply to José M. Muñoz from comment #13)
Likely that this removal should have been a replacement:
https://hg.mozilla.org/comm-central/rev/9941ab12f44ccfa75652bdc6e54d723a3e984b15#l1.12
mDocShell->SetCharset("UTF-8"_ns);
, but what's the replacement?
There is no direct replacement. The items Henri described in comment 10 will need to be implemented.
Reporter | ||
Comment 15•3 years ago
|
||
Henri's comment suggested "one of the following". So surely "Making the channel ... report UTF-8 as the channel charset" would be easier than converting the data to UTF-8 with BOM rather than UTF-8 (without BOM). So dropping in a "replacement" nsIChannel::SetContentCharset()
in the right location will likely fix the issue.
https://searchfox.org/comm-central/search?q=setcontentcharset&path=mime&case=false®exp=false
(In reply to José M. Muñoz from comment #15)
Henri's comment suggested "one of the following". So surely "Making the channel ... report UTF-8 as the channel charset" would be easier than converting the data to UTF-8 with BOM rather than UTF-8 (without BOM). So dropping in a "replacement"
nsIChannel::SetContentCharset()
in the right location will likely fix the issue.https://searchfox.org/comm-central/search?q=setcontentcharset&path=mime&case=false®exp=false
The BOM route is the most robust one and involves prepending 3 bytes to the data, so it should be very easy. The only downside is the expected buffer copy from the prepending operation.
Comment 17•3 years ago
|
||
Thanks for the discussion here. Until upstream finds a better solution, we're going with this fix which hacks into the GetContentCharset() implementations:
https://github.com/Betterbird/thunderbird-patches/blob/main/91/bugs/1718119-first-display-garbled.patch
Could be hard to find the right spot in the MIME code to prepend the UTF-8 BOM.
Comment 18•3 years ago
|
||
Sorry about the noise, we revised our first hacky version. Now it's down to this and two similar hunks:
NS_IMETHODIMP nsMsgProtocol::GetContentCharset(nsACString& aContentCharset) {
- aContentCharset.Assign(mCharset);
+ aContentCharset.Assign(mCharset.IsEmpty() ? "UTF-8"_ns : mCharset);
return NS_OK;
}
Assignee | ||
Comment 19•3 years ago
|
||
Depends on D119842
Updated•3 years ago
|
Assignee | ||
Comment 20•3 years ago
|
||
I considered changing the protocols too, but in the end decided not to.
Comment 21•3 years ago
|
||
Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/8cbcb7686639
When emitting messages for display, start with a UTF-8 BOM. r=mkmelin
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 22•3 years ago
|
||
Comment on attachment 9231094 [details]
Bug 1718119 - When emitting messages for display, start with a UTF-8 BOM. r=mkmelin
[Approval Request Comment]
Regression caused by (bug #): bug 1717523 removed what was previously holding this together
User impact if declined: messages load in the wrong charset and look garbled
Testing completed (on c-c, etc.): on c-c, has a test that fails otherwise
Risk to taking this patch (and alternatives if risky): low
Comment 23•3 years ago
|
||
Comment on attachment 9231094 [details]
Bug 1718119 - When emitting messages for display, start with a UTF-8 BOM. r=mkmelin
[Triage Comment]
Approved for beta
Comment 25•3 years ago
|
||
bugherder uplift |
Thunderbird 91.0b2:
https://hg.mozilla.org/releases/comm-beta/rev/1b5123cb9c29
Updated•3 years ago
|
Description
•