Closed Bug 1611897 Opened 5 years ago Closed 5 years ago

MBox to Maildir conversion fails on some mails because it uses a stricter notion of envelope From-line than Thunderbird's mbox parser

Categories

(MailNews Core :: Backend, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 76.0

People

(Reporter: cluster15, Assigned: benc)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fixed by bug 1515254])

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0

Steps to reproduce:

I tried to convert a local folder from mbox to maildir. Some of
the mails in the old mbox format had been imported into
Thunderbird years ago from old Unix mboxes and contained envelope
From lines like:

From someone@SOMECOLLEGE.AC.UK Wed Dec 15 21:09:00 GMT 1993

From user Tue Apr 26 12:32 +0200 1995

These contain time zone information in the timestamp (note also
the absence of seconds in the second line). All of these formats
have been perfectly acceptable to Thunderbird (and still are).

I then converted the mboxes to the one-file-per-mail format
(maildir) in Thunderbird.

To reproduce:

  1. create a new profile
  2. copy the attached mbox-format file into the Local Folders directory
  3. convert "Local Folders" to maildir format

Actual results:

In the maildir version of the folders, the mails were not split into the same separate messages as in the mbox version.
Some mails seemed to be missing because they (including all of their header lines) had become part of the previous mail.

Expected results:

The conversion process should have split the mails the same way
as nsParseMailMessageState::IsEnvelopeLine() in
comm/mailnews/local/src/nsParseMailbox.cpp.

nsParseMailMessageState::IsEnvelopeLine() identifies any line
starting with "From " as an envelope line, regardless of what
follows "From ". A stricter check appears to be compiled out by
an #ifdef, because of this comment in
comm/mailnews/local/src/nsParseMailbox.cpp:

" DANGER!! The released version of 2.0b1 was (on some systems,
some Unix, some NT, possibly others) writing out envelope lines
like "From - 10/13/95 11:22:33" which STRICT_ENVELOPE will reject!"
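The lax rule described above can be sketched in JavaScript as follows. This only restates the described behaviour for comparison; it is not a port of the C++ code, and the function name isEnvelopeLineLax is hypothetical.

```javascript
// Hypothetical restatement of the lax rule (not the actual C++ code):
// any line beginning with "From " is an envelope line, whatever follows.
function isEnvelopeLineLax(line) {
  return line.startsWith("From ");
}

const samples = [
  "From someone@SOMECOLLEGE.AC.UK Wed Dec 15 21:09:00 GMT 1993",
  "From user Tue Apr 26 12:32 +0200 1995",
  "From - 10/13/95 11:22:33",
  "Fromage is not an envelope line", // no space directly after "From"
];

// Prints true for the first three samples and false for the last one.
for (const line of samples) {
  console.log(isEnvelopeLineLax(line), line);
}
```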

In mboxToMaildir() in /mailnews/base/util/converterWorker.js a
regexp is defined to detect envelope lines:

let sepRE = /^((?:From \r?\n)|(?:From [\S]+ \S{3} \S{3} [ \d]\d \d\d:\d\d:\d\d \d{4}\r?\n))[\x21-\x7E]+:/gm;

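It accepts either a line containing only "From " (and nothing
else) or the strict format below. In both cases the envelope line
must be immediately followed by a line that starts with a
header-like name and a colon: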

"From sender-identifier Day Mon NN hh:mm:ss YYYY"

sender-identifier: any sequence of non-whitespace chars
Day: Day of week abbreviation (exactly 3 non-whitespace chars)
Mon: Month abbreviation (exactly 3 non-whitespace chars)
NN: Day of month as 2 digits, or a space followed by 1 digit
hh:mm:ss: time of day including seconds
YYYY: 4-digit year

Every whitespace separating tokens in the above regexp must be
exactly one space character!
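To illustrate, the converter's regexp (copied verbatim) can be run against the From lines from this report. The helper acceptedByConverter and the made-up strict-format line for user@example.com are assumptions for the demo; each line is followed by a "Subject:" header because the regexp requires one.

```javascript
// The envelope-line regexp from converterWorker.js, copied verbatim.
const sepRE = /^((?:From \r?\n)|(?:From [\S]+ \S{3} \S{3} [ \d]\d \d\d:\d\d:\d\d \d{4}\r?\n))[\x21-\x7E]+:/gm;

// Non-global copy, so repeated .test() calls do not carry lastIndex over.
const probe = new RegExp(sepRE.source, "m");

// Hypothetical helper: append a header line, since the regexp also
// requires the next line to start with "Name:".
function acceptedByConverter(fromLine) {
  return probe.test(fromLine + "\nSubject: test\n");
}

const samples = [
  "From someone@SOMECOLLEGE.AC.UK Wed Dec 15 21:09:00 GMT 1993", // false: zone before year
  "From user Tue Apr 26 12:32 +0200 1995",                       // false: no seconds
  "From - 10/13/95 11:22:33",                                    // false: 2.0b1 style
  "From user@example.com Mon Jan  1 00:00:00 2024",              // true: strict format
  "From ",                                                       // true: bare "From "
];

for (const line of samples) {
  console.log(acceptedByConverter(line), JSON.stringify(line));
}
```

All three variants from this report are rejected, so the converter does not start a new mail at them.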

I attached an mbox file containing 20 mails with anonymised
content and headers. Apart from the anonymisation, the headers
are identical to those of the mails in one of my mboxes. I gave
the mails new subject lines to identify them and edited the From
lines so that all of my variations, as well as those mentioned in
the comment in nsParseMailMessageState::IsEnvelopeLine(), are
present.

In mbox format, all mails are correctly split at their envelope
lines. After conversion to maildir format, 8 of the 20 mails seem
to be missing because mails no. 2 and 3 appear inside mail no. 1,
mails no. 8 and 9 inside mail no. 7, mail no. 11 inside mail
no. 10, and so on.

As a rough patch, changing the regexp in mboxToMaildir() in
/mailnews/base/util/converterWorker.js to:

let sepRE = /^(?:From .*\r?\n)[\x21-\x7E]+:/gm;

would accept the same From lines as
nsParseMailMessageState::IsEnvelopeLine(), while still requiring
the following line to look like a header.
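The effect of the rough patch can be sketched by splitting a small mbox with both regexps. The splitter splitMbox is hypothetical (each match is simply treated as the start of a new mail) and is not the actual mboxToMaildir() logic; the three messages are made up, using envelope-line variants from this report.

```javascript
// Hypothetical splitter: each match of the separator regexp starts a new
// message. Illustration only, not the real mboxToMaildir() code.
function splitMbox(text, sepRE) {
  const starts = [...text.matchAll(sepRE)].map((m) => m.index);
  return starts.map((start, i) => text.slice(start, starts[i + 1]));
}

// Original converter regexp and the proposed rough patch, both verbatim.
const strictRE = /^((?:From \r?\n)|(?:From [\S]+ \S{3} \S{3} [ \d]\d \d\d:\d\d:\d\d \d{4}\r?\n))[\x21-\x7E]+:/gm;
const patchedRE = /^(?:From .*\r?\n)[\x21-\x7E]+:/gm;

// Three made-up messages with different envelope-line styles.
const mbox = [
  "From user@example.com Mon Jan  1 00:00:00 2024",
  "Subject: mail 1",
  "",
  "body 1",
  "From user Tue Apr 26 12:32 +0200 1995",
  "Subject: mail 2",
  "",
  "body 2",
  "From - 10/13/95 11:22:33",
  "Subject: mail 3",
  "",
  "body 3",
  "",
].join("\n");

console.log(splitMbox(mbox, strictRE).length);  // 1: mails 2 and 3 end up inside mail 1
console.log(splitMbox(mbox, patchedRE).length); // 3
```

Because the patched regexp keeps the header-line requirement, an unescaped body line starting with "From " is only treated as a separator if the next line also happens to look like "Name:".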

Component: Untriaged → Backend
Product: Thunderbird → MailNews Core

I stumbled across this issue as part of fixing Bug 1515254. There's a patch over there to address it.
See https://bugzilla.mozilla.org/show_bug.cgi?id=1515254#c18

Depends on: 1515254
Assignee: nobody → benc
Status: UNCONFIRMED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Whiteboard: [fixed by bug 1515254]
Target Milestone: --- → Thunderbird 76.0