Closed
Bug 227290
Opened 21 years ago
Closed 19 years ago
be generous to overlong (invalid) B-encoded words in 2047 encoded header?
Categories
(MailNews Core :: MIME, enhancement)
MailNews Core
MIME
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ilya.konstantinov+future, Assigned: jshin1987)
References
Details
(Keywords: fixed1.8.1, intl)
Attachments
(3 files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
smontagu
:
review+
Bienvenu
:
superreview+
dveditz
:
approval1.8.0.2-
mscott
:
approval1.8.1+
|
Details | Diff | Splinter Review |
(deleted),
message/rfc822
|
Details |
Multiple encoded words (= MIME header values which include charset
specification, as per RFC 2047) are not parsed. Seems like the only encoded word
to get parsed is the encoded word on the first line of the header.
For example:
Subject:
=?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA===?=
=?koi8-r?B?ZXJpY28gRmVsbGluaSAoMTkyMCAtIDE5OTMpIg==?=
does not get parsed at all, and the mangled header is displayed as-is in the GUI
(both Mozilla and Thunderbird).
Assignee | ||
Comment 1•21 years ago
|
||
Hmm. that's strange. What version did you try?
OS: Linux → All
Hardware: PC → All
Assignee | ||
Comment 2•21 years ago
|
||
It seems like it's the second encoded word that is missing. I sent an email to
myself with the following header and the first and the third encoded words are
decoded and shown, but the second is not.
Subject: =?UTF-8?B?6rCA64KY64usIO2VnOq4gCDqsITri6Trnbwg7ZWc6riA44WHIOOEtCDqsA==?=
=?UTF-8?B?gOuCmOuLpOudvCDtlZzquIDqsIDrgpjri6Trnbwg7ZWc6riAIOqwgOuCmA==?=
=?UTF-8?B?64usIOqwgOuCmOuLpOudvCDqsIDrgpjri6TrnoQg6rCA64KY64us6528IA==?=
I'll take a look.
Comment 3•21 years ago
|
||
To reporter :
> Subject:
> =?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA===?=
> =?koi8-r?B?ZXJpY28gRmVsbGluaSAoMTkyMCAtIDE5OTMpIg==?=
Second part was displayed as
> erico Fellini (1920 - 1993)"
This part is properly encoded.
But first part was displayed as is, ie. =?koi8-r?B?7s....
This part is not encoded properly.
Reporter, try the following test.
(1) Create two draft mails in the Drafts folder
(2) "Compact this folder" for the Drafts folder
(3) Shut down Mozilla
(4) Edit the file for the Drafts folder (file named "Draft" instead of "Drafts.msf")
- Paste the first part into the Subject: header of the first mail
- Paste the second part into the Subject: header of the second mail
(5) Delete the file named "Drafts.msf".
(6) Restart Mozilla and look at the Drafts folder.
This is a bug on the mail sender's side.
Probably a bug in splitting a long encoded string across multiple Subject: header lines.
What is the mailer? Mozilla?
To Comment #2 From Jungshik Shin :
In the above test with your UTF-8 Subject: on the Thunderbird 2003-12-23 build,
the first and third parts are displayed properly in Hangul characters (probably; I
can not read Hangul), but the second part was not.
However, WORKSFORME with Mozilla 2003122809-trunk/Win-Me, for a long Subject: in
both ISO-2022-JP encoding and UTF-8 encoding for Japanese characters.
Splitting into multiple lines is done with no problem.
Is there any special condition around the split point?
Assignee | ||
Comment 4•21 years ago
|
||
What led you to believe that the first line of the header in comment #0 is
invalid? Just by inspection, I don't see anything wrong with it. Besides, Pine
(with iconv patch) has no problem rendering both lines correctly:
Subject: Новинки каталога "Феллини Федерико -
Federico Fellini (1920 - 1993)"
However, there's something. There may be an embedded new line (in the first
encoded word) that gets Mozilla into trouble.
As for my case, there's nothing special. I just typed a long enough string to
get Pine to generate multiple encoded words. There's a very low chance that Pine
has a bug in its RFC 2047 implementation. It's the most standards-compliant MUA.
Assignee | ||
Comment 5•21 years ago
|
||
There's no new line embedded in either of two encoded words. The first encoded
word is, when decoded, 'Новинки каталога "Феллини Федерико - Fed' and the second
one is 'erico Fellini (1920 - 1993)"'. Mozilla doesn't decode either of them
as reported. I have to debug it.
Comment 6•21 years ago
|
||
To Comment #4 From Jungshik Shin :
>What led you to believe that the first line of the header in comment #0 is
>invalid? By just inspection, I don't see anything wrong with. Besides, Pine
>(with iconv patch) has no problem rendering both lines correctly:
My test result led me to that conclusion:
Both the Mozilla 2003122809-trunk/Win-Me and Thunderbird 2003-12-23 builds
displayed the following header as an ASCII string.
>Subject: =?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA===?=
But I believe the encoding itself is correct, as you say, since this header is
displayed properly in your environment.
I guess this problem is OS dependent.
You use Pine (with iconv patch) but I use Japanese MS Windows Me.
The MS Windows implementation of Unicode is slightly different from the Unicode
Standard, for example the large tilde.
In addition, the MS Win-9x family's Unicode support is partial, although the MS
Win-NT family's support is nearly full.
The following is the mail source when I pasted your decoded text into the Subject: and body.
>Subject: =?KOI8-R?Q?=EE=CF=D7=C9=CE=CB=C9_=CB=C1=D4=C1=CC=CF=C7=C1_=22?=
> =?KOI8-R?Q?=E6=C5=CC=CC=C9=CE=C9_=E6=C5=C4=C5=D2=C9=CB=CF_-_Federi?=
> =?KOI8-R?Q?co_Fellini_=281920_-_1993=29=22?=
>Content-Type: text/plain; charset=KOI8-R; format=flowed
>Content-Transfer-Encoding: 8bit
>
>Новинки каталога "Феллини Федерико - Federico Fellini >(1920 - 1993)"
Your second UTF-8 portion was displayed as a single strange character, a "?"
surrounded by a diamond shape, by Mozilla under MS Win-Me.
Font specified for Korean : Proportional=Arial Unicode MS, Monospace=GulimChe
Assignee | ||
Comment 7•21 years ago
|
||
Did you use Mozilla to test whether the encoded word Mozilla has trouble with is
valid or not per RFC 2047? Obviously, that doesn't work. How can it work? I
just used Pine as a quick test tool and then independently decoded encoded words
with other tools.
This bug (as reported) has NO platform dependency. It's 100% XP code and I know
where to look. Actually, I'm almost sure Mozilla doesn't have a problem with
'encoded words' themselves, but it has a problem with header fields made of
multiple lines/encoded words in some cases. It has the code to deal with, but
somehow it seems like it fails in some cases (as given here).
Assignee: sspitzer → jshin
Comment 8•21 years ago
|
||
To Comment #7 From Jungshik Shin :
When I changed
>Subject: =?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA===?=
to
>Subject: =?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA==?=
(Removed single "=" before last "?")
Mozilla displayed it as
> Новинки каталога "Феллини Федерико - Fed
Unescape sequence is corrupted.
Jungshik Shin, please do not confuse reporter's case and your case.
Assignee | ||
Comment 9•21 years ago
|
||
I'm completely at a loss as to what you're talking about.
Assignee | ||
Comment 10•21 years ago
|
||
Sorry. I just got what you meant. The last '=' is the 57th (if I didn't
miscount) character and it's kinda redundant (57 being 1 modulo 4). So, Mozilla
has trouble with that. That has to be 'fixed', I guess.
Status: NEW → ASSIGNED
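The length arithmetic in comment 10 can be checked mechanically. The following is only a sketch in Python (not Mozilla's C++ decoder); the helper name `padding_report` is made up for illustration, and the payload is the encoded-text of the first encoded word from comment 0.

```python
# Encoded-text of the first encoded word in comment 0
# (the part between "?B?" and the closing "?=").
payload = "7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA==="


def padding_report(b64_text):
    """Return (total length, length modulo 4, number of trailing '=')."""
    trailing = len(b64_text) - len(b64_text.rstrip("="))
    return len(b64_text), len(b64_text) % 4, trailing


total, remainder, pads = padding_report(payload)
print(total, remainder, pads)
```

A valid B-encoded text has a length that is 0 modulo 4 and at most two trailing '=' characters, so a remainder of 1 together with three trailing '=' points at exactly one superfluous padding character.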
Comment 11•21 years ago
|
||
Sorry for the improper word "unescaping".
That term is invalid for encoding based on RFC 2047/RFC 2045, and meaningless.
It should have been "excess padding character ('=')".
RFC 2045 for Base 64 says :
> Since all base64
> input is an integral number of octets, only the following cases can
> arise: (1) the final quantum of encoding input is an integral
> multiple of 24 bits; here, the final unit of encoded output will be
> an integral multiple of 4 characters with no "=" padding, (2) the
> final quantum of encoding input is exactly 8 bits; here, the final
> unit of encoded output will be two characters followed by two "="
> padding characters, or (3) the final quantum of encoding input is
> exactly 16 bits; here, the final unit of encoded output will be three
> characters followed by one "=" padding character.
I can not find a rule for excess "="(s) after the proper padding of zero, one, or two
"="s.
Mozilla probably considers the whole encoded data "invalid" when an excess padding
character exists (it expects "?=", but finds something else).
I believe this is not a violation of the RFC.
However, in this bug's case, all data from the first byte to just before the excess
padding is valid encoded data.
So I feel that "ignoring" the excess data, or printing it as ASCII characters, is
a kind action for users, since some mailers actually produced the reporter's data.
If Mozilla processes an encoded word from start to end and expects "?=" just after
the proper end of the base64 encoded data, a change of parsing order (external first,
important first) may allow an easy solution, for example:
parse by "=?" and "?=" first, parse by "?"s second to determine the charset and
encoding method, then process only the encoded data portion.
Jungshik Shin, what do you think?
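The "outside first" parsing order proposed in comment 11 might look roughly like the sketch below. It is written in Python for illustration and is not how Mozilla's decoder is structured; the pattern `ENCODED_WORD` and the function `split_encoded_word` are hypothetical names, and the pattern is deliberately looser than the RFC 2047 grammar.

```python
import re

# "=?" charset "?" encoding "?" encoded-text "?=", matched loosely:
# the delimiters are located first, the encoded-text is not validated here.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def split_encoded_word(word):
    """Split one encoded word into (charset, encoding, encoded-text).

    Returns None when the outer "=?...?=" structure is not present.
    """
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        return None
    charset, encoding, text = match.groups()
    return charset.lower(), encoding.upper(), text


word = "=?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA===?="
print(split_encoded_word(word))
```

With the structure separated like this, the decision about what to do with excess "=" characters is confined to the encoded-text and no longer affects whether the closing "?=" is found.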
Assignee | ||
Comment 12•21 years ago
|
||
You're right that it's invalid (you were right at the beginning and I was misled
by Pine and other Mime tools I have that turned out to be more generous than
Mozilla.) It's easy to make Mozilla more generous (just a one-line fix would
suffice), but I'm not sure if I have to. There may be a 'security' issue??
reporter, what's the mail program that generated the header cited in your report?
Assignee | ||
Comment 13•21 years ago
|
||
this patch will "fix" the problem, but as I wrote, we have to think about this
a little.
Assignee | ||
Comment 14•21 years ago
|
||
WADA, you're in favor of the patch, right? David and Seth, what do you think?
Simon, do you see any security implication in accepting overlong base64 encoded
words in the message header? Base64-encoded words (B-encoded words) always have
a number of characters that is a multiple of four and end with one of three
sequences: a) a sequence made entirely of base64 'alphabet' characters, b) two characters
(of the base64 alphabet) followed by '==', c) three characters of the base64 alphabet
followed by '='.
Summary: Multiple encoded words (=?charset?...?=) not parsed → be generous to overlong (invalid) B-encoded words in 2047 encoded header?
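The three valid endings listed in comment 14 can be expressed as a single pattern. This is a sketch in Python, assuming the standard base64 alphabet (A-Z, a-z, 0-9, '+', '/'); `B64_VALID` and `is_valid_b_text` are illustrative names, not part of any Mozilla code.

```python
import re

# Zero or more full 4-character groups, optionally followed by
# "xx==" (ending b) or "xxx=" (ending c). Ending a) is the case
# where the optional tail is absent.
B64_VALID = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*"
    r"(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


def is_valid_b_text(text):
    """True if text is non-empty and has one of the three valid shapes."""
    return bool(text) and B64_VALID.fullmatch(text) is not None


print(is_valid_b_text("ZXJpY28gRmVsbGluaSAoMTkyMCAtIDE5OTMpIg=="))
print(is_valid_b_text("7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA==="))
```

The second encoded word from comment 0 passes this check, while the first one fails it because of the third '='.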
Comment 15•21 years ago
|
||
Being more tolerant makes sense, but I think I would be happier with a more
focused fix to ignore 3 consecutive "=" characters at the end of a B-encoded
word, rather than blindly reducing the length to a multiple of 4.
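The difference between the two approaches in comment 15 shows up on input that is malformed in some other way. The sketch below is in Python and only models the two ideas as described in this thread; it is not the content of either patch, and both function names are made up.

```python
def truncate_to_multiple_of_4(text):
    """Blind approach: cut the text down to a length divisible by four."""
    return text[: len(text) - len(text) % 4]


def drop_third_equals(text):
    """Focused approach: only repair a trailing run of exactly three '='."""
    if text.endswith("===") and not text.endswith("===="):
        return text[:-1]
    return text


overlong = "7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA==="
cut_short = "ZXJpY28"  # 7 characters: data lost in transit, no padding at all

for sample in (overlong, cut_short):
    print(truncate_to_multiple_of_4(sample), drop_third_equals(sample))
```

Both functions repair the header from comment 0 identically. On the 7-character sample, blind truncation silently discards "Y28" and yields something that decodes cleanly, whereas the focused variant leaves the text untouched so that a later validity check can still reject it.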
Comment 16•21 years ago
|
||
I would be happier with a fix that ignores "more than 2" consecutive "="
characters at the end of a B-encoded word.
I have questions.
(Q1) I can not say whether excess "="(s) should be displayed as ASCII "=", in
order to let the mail receiver know about the existence of an invalid header, or whether
excess "="(s) should simply be ignored.
Which should Mozilla do?
(Q2) How about characters other than "=" after the valid end of an encoded word?
(Q3) In replying or forwarding, I can not say whether excess "="(s) or characters
should be removed or should be kept.
Which should Mozilla do?
Assignee | ||
Comment 17•21 years ago
|
||
WADA, I don't want to do anything fancier than this or attachment 138246 [details] [diff] [review].
Comment 18•21 years ago
|
||
> I don't want to do anything fancier than this or attachment 138246 [details] [diff] [review]
(1) If an enhancement for invalidly encoded headers is developed in Mozilla, I
think it should not be only a limited relief for a bug in one or a few
not-well-designed mailers.
It should be a universal enhancement.
At least, the issues I described in Comment #16 should be discussed and cleared up.
(2) I guess the invalidly encoded header of this bug was produced by one or a few
versions of one or a few mailers only.
(3) I believe the bug in the mailer(s) should be fixed first.
So I, as a user, recommend that you, a developer, close this bug as INVALID, or
close it as FUTURE or WONTFIX while changing severity=Enhancement.
Comment 19•21 years ago
|
||
By the way, Jungshik Shin, how did you generate header in your Comment #2?
It seems to be a new problem in folding of mail header encoded with UTF-8.
Comment 20•20 years ago
|
||
See bug 258320.
Updated•20 years ago
|
Product: MailNews → Core
Comment 21•20 years ago
|
||
*** Bug 274156 has been marked as a duplicate of this bug. ***
Comment 22•20 years ago
|
||
*** Bug 274384 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 23•20 years ago
|
||
*** Bug 282439 has been marked as a duplicate of this bug. ***
Comment 24•19 years ago
|
||
*** Bug 244002 has been marked as a duplicate of this bug. ***
Reporter | ||
Comment 25•19 years ago
|
||
Jungshik, what are our goals with this one?
Reporter | ||
Comment 26•19 years ago
|
||
Just verifying: bug still exists on Thunderbird 1.5rc2.
Reporter | ||
Comment 27•19 years ago
|
||
As to "what mailer generated this mail", this is an automated mailing generated by a major Russian online store. Yes, custom mailing apps tend to be written with disregard for standards, but if we can afford a little "be generous in what you accept", why not? (Especially since Pine, Evolution and probably OE too afford it.)
Reporter | ||
Comment 28•19 years ago
|
||
Assignee | ||
Comment 29•19 years ago
|
||
Comment on attachment 207942 [details]
Testcase
ok. let's 'fix' this.
Attachment #207942 -
Flags: superreview?(bienvenu)
Attachment #207942 -
Flags: review?(smontagu)
Comment 30•19 years ago
|
||
Comment on attachment 138283 [details] [diff] [review]
patch that handles one extra '=' only
I assume the review request was supposed to be on this attachment, not the testcase :)
Attachment #138283 -
Flags: review+
Assignee | ||
Comment 31•19 years ago
|
||
Comment on attachment 207942 [details]
Testcase
Thanks for r and catching my stupid mistake. :-)
Attachment #207942 -
Flags: superreview?(bienvenu)
Attachment #207942 -
Flags: review?(smontagu)
Assignee | ||
Updated•19 years ago
|
Attachment #138283 -
Flags: superreview?(bienvenu)
Updated•19 years ago
|
Attachment #138283 -
Flags: superreview?(bienvenu) → superreview+
Assignee | ||
Comment 32•19 years ago
|
||
Fix checked into the trunk.
David, I think this patch is safe enough for TB 1.5 release. For what branch (1.8.0.1, 1.8.0.2) should I ask approval?
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Comment 33•19 years ago
|
||
1.5 is getting released tomorrow - to make a 1.5.0.1 release, I'm not sure what branch you'd want. But definitely do 1.8.1 so it will make 2.0.
Assignee | ||
Comment 34•19 years ago
|
||
Comment on attachment 138283 [details] [diff] [review]
patch that handles one extra '=' only
This is a trivial fix to make our RFC 2047 decoder a bit more generous toward a common mistake of other mail programs. We need to get it into 2.0.
I also want this in TB 1.5.1(?), but I'm not sure which branch I have to ask approval for (1.8.0.1 or 1.8.0.2?). Whichever it may be, it'd be nice to get approval for that, too.
Attachment #138283 -
Flags: approval1.8.1?
Assignee | ||
Comment 35•19 years ago
|
||
Comment on attachment 138283 [details] [diff] [review]
patch that handles one extra '=' only
This is a trivial fix to make our RFC 2047 decoder a bit more generous toward a common
mistake of other mail programs and server-side programs (well, at the moment, we don't interpret the C-D filename parameter in 'browser').
Anyway, we'd better fix this in the next point release of Thunderbird 1.5.1(?)
Attachment #138283 -
Flags: approval1.8.0.2?
Comment 36•19 years ago
|
||
Comment on attachment 138283 [details] [diff] [review]
patch that handles one extra '=' only
do you need a review from Darin (the module owner of netwerk) for this?
Attachment #138283 -
Flags: approval1.8.1? → approval1.8.1+
Assignee | ||
Comment 37•19 years ago
|
||
(In reply to comment #36)
> (From update of attachment 138283 [details] [diff] [review] [edit])
> do you need a review from Darin (the module owner of netwerk) for this?
In principle, I guess, the answer is yes. However, I hope :-) Darin will excuse me for getting away with this especially considering that this part is currently only used by TB (due to bug 299372) Just in case, I'm adding him to cc.
fix landed on the branch for TB 2.0
Keywords: fixed1.8.1
Comment 38•19 years ago
|
||
Can some of the folks concerned about this bug on the cc list help test the 1.8 branch builds so we can see how this fix is looking before we consider it for 1.8.0.x?
Thanks.
ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8
Comment 39•19 years ago
|
||
Comment on attachment 138283 [details] [diff] [review]
patch that handles one extra '=' only
lack of testing (no reply to comment 38), too late for 1.8.0.2
Attachment #138283 -
Flags: approval1.8.0.2? → approval1.8.0.2-
Comment 40•19 years ago
|
||
(In reply to comment #38)
Works for me. Thunderbird 1.5 (Windows/20060222)
Thanks.
Comment 41•18 years ago
|
||
*** Bug 302816 has been marked as a duplicate of this bug. ***
Comment 42•18 years ago
|
||
Bug 302816 is about the same problem as this bug; the header is:
=?koi8-r?B?KioqIPXXxcTPzczFzsnFIM8g08/T1M/RzsnJIOzJw8XXz8fPINPexdTB=?=
That's 56 bytes of base-64 plus one (superfluous) '='. Doesn't that make this
a case of "one extra '=' only"? But it isn't being handled correctly in recent 2a1/3a1 builds (altho the original data in comment 0 *is* handled correctly).
Assignee | ||
Comment 43•18 years ago
|
||
(In reply to comment #42)
> Bug 302816 is about the same problem as this bug; the header is:
> =?koi8-r?B?KioqIPXXxcTPzczFzsnFIM8g08/T1M/RzsnJIOzJw8XXz8fPINPexdTB=?=
>
> That's 56 bytes of base-64 plus one (superfluous) '='. Doesn't that make this
> a case of "one extra '=' only"? But it isn't being handled correctly in recent
> 2a1/3a1 builds (altho the original data in comment 0 *is* handled correctly).
Actually, with my patch TB only tolerates case 2 (of RFC 2045 : comment 11) + one superfluous '='. It doesn't accept case 1 or case 3 + one superfluous '=' (bug 302816 being case 1 + '='). That was because we limited our fix to 'then-known' malformed cases (see comment #15).
Do we have to be more generous now that a new strain of malformed header has been discovered? I'm not sure, but it seems a bit arbitrary that case 2 + extra '=' is accepted while case 1 + '=' or case 3 + '=' is rejected as invalid.
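The distinction drawn in comment 43 can be made concrete by classifying an encoded-text as one of the three RFC 2045 cases plus zero or one superfluous '='. This is a sketch in Python under that reading of the comment; `classify` and `tolerated_per_comment_43` are hypothetical helpers and only restate the behaviour described above, they are not derived from the patch itself.

```python
import re

ALPHA = "[A-Za-z0-9+/]"
CASES = {
    1: re.compile(rf"(?:{ALPHA}{{4}})+"),                # no padding
    2: re.compile(rf"(?:{ALPHA}{{4}})*{ALPHA}{{2}}=="),  # two '=' of padding
    3: re.compile(rf"(?:{ALPHA}{{4}})*{ALPHA}{{3}}="),   # one '=' of padding
}


def classify(text):
    """Return (case number, superfluous '=' count) or None if unrecognized."""
    candidates = [(text, 0)]
    if text.endswith("="):
        candidates.append((text[:-1], 1))
    for body, extra in candidates:
        for case, pattern in CASES.items():
            if pattern.fullmatch(body):
                return case, extra
    return None


def tolerated_per_comment_43(text):
    """Valid text, or case 2 plus exactly one superfluous '='."""
    result = classify(text)
    return result is not None and (result[1] == 0 or result == (2, 1))


comment_0 = "7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA==="
bug_302816 = "KioqIPXXxcTPzczFzsnFIM8g08/T1M/RzsnJIOzJw8XXz8fPINPexdTB="
print(classify(comment_0), tolerated_per_comment_43(comment_0))
print(classify(bug_302816), tolerated_per_comment_43(bug_302816))
```

Under this classification the header from comment 0 is case 2 plus one extra '=', and the header from bug 302816 is case 1 plus one extra '=', which matches the observation that only the former is decoded.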
Assignee | ||
Comment 44•18 years ago
|
||
*** Bug 351203 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 45•18 years ago
|
||
Simon, what do you think of these new 'strains' of malformed encoded words?
Comment 46•18 years ago
|
||
(In reply to comment #43)
> I'm not sure, but it seems a bit arbitrary that case 2 + extra
> '=' is accepted while case 1 + '=' or case 3 + '=' is rejected as invalid.
Yes, I think it would be more consistent to accept case 1 and case 3 as well.
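A decoder that treats all three cases consistently, as suggested in comment 46, could be sketched as follows. This is Python for illustration only, not a proposed patch: `decode_b_word_leniently` is a made-up name, and the policy it encodes (correct padding, or correct padding plus exactly one extra '=') is an assumption drawn from the discussion above.

```python
import base64
import re

# charset, base64 data without padding, and the trailing run of '='.
B_WORD = re.compile(r"=\?([^?\s]+)\?[Bb]\?([A-Za-z0-9+/]*)(=*)\?=")


def decode_b_word_leniently(word):
    """Decode a B-encoded word, tolerating one superfluous trailing '='."""
    match = B_WORD.fullmatch(word)
    if match is None:
        raise ValueError("not a B-encoded word")
    charset, data, pads = match.groups()
    needed = -len(data) % 4  # padding a valid encoder would have emitted
    if needed == 3:
        raise ValueError("data length cannot come from a base64 encoder")
    if len(pads) not in (needed, needed + 1):
        raise ValueError("unexpected amount of '=' padding")
    raw = base64.b64decode(data + "=" * needed)
    return raw.decode(charset)


first = "=?koi8-r?B?7s/Xyc7LySDLwdTBzM/HwSAi5sXMzMnOySDmxcTF0snLzyAtIEZlZA===?="
second = "=?koi8-r?B?ZXJpY28gRmVsbGluaSAoMTkyMCAtIDE5OTMpIg==?="
other = "=?koi8-r?B?KioqIPXXxcTPzczFzsnFIM8g08/T1M/RzsnJIOzJw8XXz8fPINPexdTB=?="

# Adjacent encoded words are concatenated without the folding whitespace.
print(decode_b_word_leniently(first) + decode_b_word_leniently(second))
print(decode_b_word_leniently(other))
```

Two or more superfluous '=' characters, or any other trailing character, are still rejected here, which keeps the leniency narrow while removing the asymmetry between case 2 and cases 1 and 3.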
Updated•17 years ago
|
Severity: normal → enhancement
Updated•16 years ago
|
Product: Core → MailNews Core