messages.getFull can not decode body of nntp message
Categories
(Thunderbird :: Add-Ons: Extensions API, defect)
Tracking
(thunderbird_esr78 wontfix)
Tracking | Status | |
---|---|---|
thunderbird_esr78 | --- | wontfix |
People
(Reporter: manikulin, Assigned: TbSync)
Details
Attachments
(2 files, 2 obsolete files)
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0
Steps to reproduce:
MimeParser.extractMimeMsg
introduced in Bug #1644027 for messages.getFull
is unable to properly decode UTF-8 body or message part having quoted-printable or base64 transfer encoding. The issue is specific to nntp messages, e.g. messages from local folders are decoded correctly.
Minimal example using code from implementation of messages.getFull
. Inspect output in console after loading of the following extension:
background.js
browser.tst.decode(
`From - Thu May 27 21:23:35 2021
Newsgroups: gmane.comp.mozilla.thundebird.user
From: Bug Reporter <new@thunderbird.bug>
Subject: =?UTF-8?B?zrHOu8+GzqzOss63z4TOvw==?=
Date: Thu, 27 May 2021 21:23:35 +0100
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8;
Content-Transfer-Encoding: base64
zobOu8+GzrEK
`).then(msg => {
console.log(msg);
console.log(msg.headers['subject']);
console.log(msg.parts[0].body);
}).catch(console.error);
experiment.js
var { MimeParser } = ChromeUtils.import("resource:///modules/mimeParser.jsm");
function convertMessagePart(part) {
let partObject = {};
for (let key of [
"body",
"contentType",
"headers",
"name",
"partName",
"size",
]) {
if (key in part) {
partObject[key] = part[key];
}
}
if ("parts" in part && Array.isArray(part.parts) && part.parts.length > 0) {
partObject.parts = part.parts.map(convertMessagePart);
}
return partObject;
}
var tst = class extends ExtensionCommon.ExtensionAPI {
getAPI(context) {
return { tst: {
async decode(raw) {
let mimeMsg = {};
try {
mimeMsg = MimeParser.extractMimeMsg(raw, {
includeAttachments: false,
});
} catch (e) {
console.log(MimeParser);
console.error(e);
throw new Error('Message decode error');
}
return convertMessagePart(mimeMsg);
}
} };
}
};
schema.json
[ {
"namespace": "tst",
"functions": [ {
"name": "decode",
"type": "function",
"async": true,
"parameters": [ { "name": "raw", "type": "string" } ]
} ]
} ]
manifest.json
{
"manifest_version": 2,
"name": "getFull decoder test",
"version": "0.1",
"background": { "scripts": [ "background.js" ] },
"experiment_apis": { "tst": {
"schema": "schema.json",
"parent": {
"scopes": [ "addon_parent" ],
"paths": [ [ "tst" ] ],
"script": "experiment.js"
}
} }
}
Actual results:
- Headers (Subject) are decoded correctly.
- Garbage instead of body text. Representation is different in object inspector (
msg
) and for body printed byconsole.log
.
Expected results:
Body: "Άλφα"
Assignee | ||
Comment 1•3 years ago
|
||
Assignee | ||
Comment 2•3 years ago
|
||
@max: Do you know how to download binaries from the artifact section of "B" tests of the try run? For Windows it is the zip file:
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/IEp0TznQRu2uL_AhqKS1yw/runs/0/artifacts/public/build/target.zip
Could you give that version a spin and check if this bug is fixed and if there are more flaws I have missed?
If it is not Windows, what OS are you running, so I can give you the correct download link?
I hope, I will try it tomorrow. I am a linux user. I noticed the problem when I tried a couple of messages from news.gmane.io. There are plenty of groups with messages where ASCII is not enough. Try e.g. gmane.linux.debian.user.russian or similar one with a more familiar language.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 4•3 years ago
|
||
Thank you, most messages, I have tried, are working correctly. While I was looking for a message in legacy 8 bit encoding, I found some messages with missed body in getFull
result. It may happen even with messages having UTF-8 encoding. Try e.g. X70U3LO/zYt2yoRe@mail.ioffe.ru on news.gmane.io and other messages by the same author.
I have not figured out what can be wrong with such messages (or why they are special ones).
Assignee | ||
Comment 6•3 years ago
|
||
Can you give me the name of the newsgroup and a title so I can find this example message?
thunderbird 'news://news.gmane.io/X70U3LO/zYt2yoRe@mail.ioffe.ru'
I can not say that it works reliably, but first time when thunderbird is already opened, the command opens the message. It is gmane.linux.debian.user.russian.
It seems, the problem is caused by Content-Disposition: inline
. Other headers may be quite usual
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Assignee | ||
Comment 8•3 years ago
|
||
It is the Content-Disposition: inline
header. For some reason the emitter is copying many headers from the main message to the first mime part and thus the body is seen as an attachment, even though it is not. Investigating. Nevertheless, setting content-disposition on the main message seems invalid, but I think WE (Thunderbird) did that wrong, if I understad the wiki article correct:
https://en.wikipedia.org/wiki/MIME#Content-Disposition
Assignee | ||
Comment 9•3 years ago
|
||
Assignee | ||
Comment 10•3 years ago
|
||
Updated•3 years ago
|
Reporter | ||
Comment 11•3 years ago
|
||
(In reply to John Bieling (:TbSync) from comment #8)
It is the
Content-Disposition: inline
header. For some reason the emitter is copying many headers from the main message to the first mime part and thus the body is seen as an attachment, even though it is not. Investigating. Nevertheless, setting content-disposition on the main message seems invalid, but I think WE (Thunderbird) did that wrong, if I understad the wiki article correct:
https://en.wikipedia.org/wiki/MIME#Content-Disposition
I am not familiar with related RFC's or de-facto practice. I think, it should be consisted within thunderbird. If the body is shown in message window then extensions should get it as well.
Recently I was looking for a way to view patches sent as attachments in thunderbird without launching an external viewer (maybe it is not a fool-proof way, but acceptable implementation was done in Bug #1509709). Before I found that preference, I noticed a couple of bugs related to Content-Disposition
issues: Bug #184869 and Bug #182627 (the latter have complains added after it was closed). Unsure if they are relevant to the case found while I was testing your patch.
Please, consider adding a couple of tests with legacy 8-bit encodings in addition to UTF-8. It seems, they work.
Expectations: {body: "Вопрос\n", subject: [ "Алфавит" ]}
From - Sun May 27 21:23:35 2001
Newsgroups: gmane.comp.mozilla.thundebird.user
From: Bug Reporter <new@thunderbird.bug>
Subject: =?koi8-r?B?4czGwdfJ1Ao=?=
Date: Sun, 27 May 2001 21:23:35 +0100
MIME-Version: 1.0
Content-Type: text/plain; charset=koi8-r;
Content-Transfer-Encoding: base64
98/Q0s/TCg==
From - Sun May 27 21:23:35 2001
Newsgroups: gmane.comp.mozilla.thundebird.user
From: Bug Reporter <new@thunderbird.bug>
Subject: =?windows-1251?B?wOv04OLo8go=?=
Date: Sun, 27 May 2001 21:23:35 +0100
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251;
Content-Transfer-Encoding: base64
wu7v8O7xCg==
Assignee | ||
Comment 12•3 years ago
|
||
Updated•3 years ago
|
Assignee | ||
Comment 13•3 years ago
|
||
I found more issues and updated the patch/tests. Try run is here:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=196e3f512c528e4d4c611f33a529fed73f9f3fee
If that is through, you could give it another spin.
Assignee | ||
Updated•3 years ago
|
Comment 14•3 years ago
|
||
Pushed by alessandro@thunderbird.net:
https://hg.mozilla.org/comm-central/rev/9473904d2047
Fix MimeParser.extractMimeMsg() to return the same headers as MsgHdrToMimeMessage, properly return inline body parts and fix multiple decoding issues in the messages.getFull WebExtension API. r=mkmelin
Comment 15•3 years ago
|
||
Is it correct that MimeParser.extractMimeMsg() will now return the MIME parts as unicode instead of raw binary data in the original encoding? See:
https://hg.mozilla.org/comm-central/rev/9473904d2047fc7d42476625eb8f4cf8d5a6f11f#l12.408 ?
This will invalidate the proposed solution in bug 1713786, attachment 9224827 [details] [diff] [review], where I relied on the raw data to be returned so it can be used for charset detection. Or am I reading this incorrectly?
BTW, what does https://hg.mozilla.org/comm-central/rev/9473904d2047fc7d42476625eb8f4cf8d5a6f11f#l12.224 do? Looks like you're encoding header values into UTF-8 now. Why is that necessary? Maybe for headers that where not ASCII/RFC 2047, so they're now safely are made UTF-8 since you can have raw UTF-8 in headers according to RFC 6532.
Assignee | ||
Comment 16•3 years ago
|
||
MimeParser.extractMimeMsg()
behaves now identical to the core libmime based mime parser used for imap and local messages. The API messages.getFull()
now returns the same values for local, imap and nntp. The only remaining difference is that local/imap headers eat the TAB after a CLRF of a multiline header, while nntp headers do not. I opened bug 1715833 for that.
Is it correct that MimeParser.extractMimeMsg() will now return the MIME parts as unicode instead of raw binary data in the original encoding? See:
https://hg.mozilla.org/comm-central/rev/9473904d2047fc7d42476625eb8f4cf8d5a6f11f#l12.408 ?
That is only used internally and was not exposed to the final API. The binary string was later converted, but as it turned out, body parts got mangled. If you search for the enigmail code this is based on, you will find they used unicode
there as well. It was a mistake to switch to binarystring
, but at that time I did not know any better.
This will invalidate the proposed solution in bug 1713786, attachment 9224827 [details] [diff] [review] [diff] [review], where I relied on the raw data to be returned > so it can be used for charset detection. Or am I reading this incorrectly?
messages.getFull() does not (and should not) return a binary string, but properly decoded data, according to the parts charset. I did not invent this, I just aligned the output of nntp messages and imap/local messages. See the added tests which have to be passed for all three account types:
https://searchfox.org/comm-central/rev/8a77f2ff2762bbe436133205d6b98561f3bc81d1/mail/components/extensions/test/xpcshell/test_ext_messages_get.js#163-642
BTW, what does https://hg.mozilla.org/comm-central/rev/9473904d2047fc7d42476625eb8f4cf8d5a6f11f#l12.224 do? Looks like you're encoding
header values into UTF-8 now. Why is that necessary? Maybe for headers that where not ASCII/RFC 2047, so they're now safely are made UTF-8
since you can have raw UTF-8 in headers according to RFC 6532.
TextEncoder is used to turn the header into a binary string, as this is what the original mime parser was doing (internally). Again, I did not invent this. I just alligned the new MimeParser.extractMimeMsg()
mime parser with the already existing one.
Reporter | ||
Comment 17•3 years ago
|
||
Thank you. I have not tried hard it, but a dozen of messages are decoded properly (including "Content-Disposition: inline").
I noticed the following behavior however, I hope, it is expected
Content-Type: multipart/alternative
txt+html (e.g. from gmail) have their parts inparts[0].parts
, not in the top-levelparts
array.- Message generated by Alpine has
Content-Type: multipart/mixed
and
This message is in MIME format. The first part should be readable text,
while the remaining parts are likely unreadable without MIME-aware tools.
--8323329-1640320710-1596988834=:40294
Content-Type: text/plain; CHARSET=UTF-8; format=flowed
Content-Transfer-Encoding: 8BIT
I can not find "This message is in MIME format..." in getFull()
result. news://news.gmane.io/alpine.DEB.2.23.453.2008091859370.40294@jura104.jinr.ru Unsure if I really need it, I am just slightly afraid that some parts of complex messages are unavailable.
Assignee | ||
Comment 18•3 years ago
|
||
Assignee | ||
Comment 19•3 years ago
|
||
This is expected, the core parser is doing the same. I updated the test files to ensure identical behavior.
Assignee | ||
Updated•3 years ago
|
Updated•3 years ago
|
Comment 20•3 years ago
|
||
Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/cdf8bc4921de
Also test content at root part of multipart/alternative emails for messages.getFull API. r=mkmelin
Description
•