MBox to Maildir conversion fails on some mails because it uses a stricter notion of envelope From-line than Thunderbird's mbox parser
Categories
(MailNews Core :: Backend, defect)
Tracking
(Not tracked)
People
(Reporter: cluster15, Assigned: benc)
References
(Blocks 1 open bug)
Details
(Whiteboard: [fixed by bug 1515254])
Attachments
(1 file)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
Steps to reproduce:
I tried to convert a local folder from mbox to maildir. Some of
the mails in the old mbox format had been imported into
Thunderbird years ago from old Unix mboxes and contained envelope
From lines like:
From someone@SOMECOLLEGE.AC.UK Wed Dec 15 21:09:00 GMT 1993
From user Tue Apr 26 12:32 +0200 1995
These lines contain time zone information in the timestamp (note
also the absence of seconds in the second line). All of these
formats have been, and still are, perfectly acceptable to
Thunderbird's mbox parser.
I then converted the mboxes to the one-file-per-mail format
(maildir) in Thunderbird.
To reproduce:
- create a new profile
- copy the attached mbox file into the "Local Folders" directory of the new profile
- convert "Local Folders" to maildir format
Actual results:
In the maildir version of the folders, the mbox mails were not split into the same separate mails as before.
Some mails seemed to be missing because they (including all their header lines) had become part of the previous mail.
Expected results:
The conversion process should have split the mails the same way as
nsParseMailMessageState::IsEnvelopeLine() in
comm/mailnews/local/src/nsParseMailbox.cpp does.
nsParseMailMessageState::IsEnvelopeLine() identifies any line
starting with "From " as an envelope line, regardless of
anything following "From ". A stricter check is apparently
compiled out by an #ifdef, for the reason given in this comment in
comm/mailnews/local/src/nsParseMailbox.cpp:
" DANGER!! The released version of 2.0b1 was (on some systems,
some Unix, some NT, possibly others) writing out envelope lines
like "From - 10/13/95 11:22:33" which STRICT_ENVELOPE will reject!"
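For comparison, here is a minimal JavaScript sketch of the permissive rule described above. The function name isLooseEnvelopeLine is hypothetical. It mirrors only the behaviour described in this report (any line starting with "From " is an envelope line) and is not the actual C++ code in nsParseMailbox.cpp.

```javascript
// Hypothetical sketch of the permissive rule attributed to
// nsParseMailMessageState::IsEnvelopeLine(): any line that starts with
// "From " is treated as an envelope line, whatever follows.
function isLooseEnvelopeLine(line) {
  return line.startsWith("From ");
}

const looseSamples = [
  "From someone@SOMECOLLEGE.AC.UK Wed Dec 15 21:09:00 GMT 1993",
  "From user Tue Apr 26 12:32 +0200 1995",
  "From - 10/13/95 11:22:33",  // format quoted in the DANGER comment
  "From: someone@example.com", // an ordinary header, not an envelope line
];

for (const line of looseSamples) {
  // Prints true for the first three samples and false for the header.
  console.log(isLooseEnvelopeLine(line), JSON.stringify(line));
}
```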
In mboxToMaildir() in /mailnews/base/util/converterWorker.js a
regexp is defined to detect envelope lines:
let sepRE = /^((?:From \r?\n)|(?:From [\S]+ \S{3} \S{3} [ \d]\d \d\d:\d\d:\d\d \d{4}\r?\n))[\x21-\x7E]+:/gm;
It accepts either a line containing only "From " (and nothing
else) or the strict format:
"From sender-identifier Day Mon NN hh:mm:ss YYYY"
- sender-identifier: any sequence of non-whitespace characters
- Day: day-of-week abbreviation (exactly 3 non-whitespace characters)
- Mon: month abbreviation (exactly 3 non-whitespace characters)
- NN: day of month as 2 characters, space-padded (" 5" or "15")
- hh:mm:ss: time of day, including seconds
- YYYY: 4-digit year
In both cases the following line must look like a header field
(one or more printable non-space characters followed by ":").
Each separator between tokens in the above regexp must be
exactly one space character!
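The effect of the strict regexp can be checked directly. The sketch below copies sepRE verbatim from mboxToMaildir() and appends a made-up "Subject:" line after each candidate From line, because the regexp also requires a header line to follow. The helper strictAccepts and the test lines are hypothetical. They are modelled on the variants in this report.

```javascript
// Separator regexp as quoted above from mboxToMaildir() in converterWorker.js.
const sepRE = /^((?:From \r?\n)|(?:From [\S]+ \S{3} \S{3} [ \d]\d \d\d:\d\d:\d\d \d{4}\r?\n))[\x21-\x7E]+:/gm;

// Hypothetical helper: would `fromLine`, followed by a header line,
// be recognised as a message separator?
function strictAccepts(fromLine) {
  sepRE.lastIndex = 0; // the /g flag makes test() stateful, so reset it
  return sepRE.test(fromLine + "\nSubject: test\n");
}

const strictCases = [
  "From ",                                                       // bare "From "
  "From someone Wed Dec 15 21:09:00 1993",                       // strict format
  "From someone  Wed Dec 15 21:09:00 1993",                      // two spaces
  "From someone@SOMECOLLEGE.AC.UK Wed Dec 15 21:09:00 GMT 1993", // time zone
  "From user Tue Apr 26 12:32 +0200 1995",                       // no seconds, offset
  "From - 10/13/95 11:22:33",                                    // 2.0b1 format
];

for (const line of strictCases) {
  // Expected: true, true, false, false, false, false
  console.log(strictAccepts(line), JSON.stringify(line));
}
```

Only the bare "From " line and the exact single-space strict format pass. Every variant that Thunderbird's own mbox parser accepts in the attached file is rejected.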
I have attached an mbox file containing 20 mails with anonymised
content and headers. Apart from the anonymisation, the headers are
identical to those of the mails in one of my mboxes. I gave the
mails new subject lines to identify them and edited the From lines
so that all my variations, as well as the variations mentioned in
the comment in nsParseMailMessageState::IsEnvelopeLine(), are present.
In mbox format, all mails are correctly split at their envelope
lines. After conversion to maildir, 8 of the 20 mails seem to be
missing: mails 2 and 3 appear inside mail 1, mails 8 and 9 inside
mail 7, mail 11 inside mail 10, and so on.
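The merging follows from how a separator regexp is used. A new mail starts only where the regexp matches, so an unrecognised From line stays inside the previous mail. This is a minimal sketch, assuming the converter cuts the mbox text at the match positions. The helper splitMbox and the three-mail sample are hypothetical. They are not taken from converterWorker.js or the attachment.

```javascript
// Separator regexp as quoted from mboxToMaildir() in converterWorker.js.
const sepRE = /^((?:From \r?\n)|(?:From [\S]+ \S{3} \S{3} [ \d]\d \d\d:\d\d:\d\d \d{4}\r?\n))[\x21-\x7E]+:/gm;

// Hypothetical splitter: each separator match starts a new mail, which
// runs until the next match (text before the first match is ignored).
function splitMbox(text, re) {
  const starts = [...text.matchAll(re)].map((m) => m.index);
  return starts.map((start, i) => text.slice(start, starts[i + 1] ?? text.length));
}

// Made-up three-mail mbox using envelope-line variants from this report.
const mbox = [
  "From alice Wed Dec 15 21:09:00 1993",
  "Subject: mail 1",
  "",
  "body 1",
  "From someone@SOMECOLLEGE.AC.UK Wed Dec 15 21:09:00 GMT 1993",
  "Subject: mail 2",
  "",
  "body 2",
  "From user Tue Apr 26 12:32 +0200 1995",
  "Subject: mail 3",
  "",
  "body 3",
  "",
].join("\n");

const messages = splitMbox(mbox, sepRE);
console.log(messages.length); // 1: mails 2 and 3 end up inside mail 1
console.log(messages[0].match(/^Subject: .*$/gm)); // all three subjects
```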
As a rough patch, changing the regexp in mboxToMaildir() in /mailnews/base/util/converterWorker.js to:
let sepRE = /^(?:From .*\r?\n)[\x21-\x7E]+:/gm;
would identify the same From lines as
nsParseMailMessageState::IsEnvelopeLine(), while still requiring a header line to follow.
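To compare the two regexps, the sketch below splits the same made-up mbox with each one. It includes a body line starting with "From " that is followed by a blank line, to show that the patched regexp still does not split there. The helper splitMbox and the sample text are hypothetical. As above, the sketch assumes the converter cuts the text at match positions.

```javascript
// Strict separator from converterWorker.js and the rough patch proposed above.
const strictRE = /^((?:From \r?\n)|(?:From [\S]+ \S{3} \S{3} [ \d]\d \d\d:\d\d:\d\d \d{4}\r?\n))[\x21-\x7E]+:/gm;
const patchedRE = /^(?:From .*\r?\n)[\x21-\x7E]+:/gm;

// Hypothetical splitter: each match starts a new mail.
function splitMbox(text, re) {
  const starts = [...text.matchAll(re)].map((m) => m.index);
  return starts.map((start, i) => text.slice(start, starts[i + 1] ?? text.length));
}

const mbox = [
  "From alice Wed Dec 15 21:09:00 1993",
  "Subject: mail 1",
  "",
  "From the start, this line is body text.", // not followed by a header
  "",
  "From - 10/13/95 11:22:33",
  "Subject: mail 2",
  "",
  "body 2",
  "From user Tue Apr 26 12:32 +0200 1995",
  "Subject: mail 3",
  "",
  "body 3",
  "",
].join("\n");

console.log(splitMbox(mbox, strictRE).length);  // 1
const patchedParts = splitMbox(mbox, patchedRE);
console.log(patchedParts.length);               // 3
console.log(patchedParts.map((m) => m.match(/^Subject: .*$/m)[0]));
```

The patched regexp splits at every "From " line that is followed by a header field. The body line stays in mail 1 because the next line is empty.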
Comment 1 (Assignee) • 5 years ago
I stumbled across this issue as part of fixing Bug 1515254. There's a patch over there to address it.
See https://bugzilla.mozilla.org/show_bug.cgi?id=1515254#c18