Closed Bug 90161 Opened 23 years ago Closed 20 years ago

URL recognition at the end of line in QP (quoted-printable) misses last character

Categories

(MailNews Core :: MIME, defect)

All
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hcaley, Assigned: anlan)

References

Details

(Whiteboard: See dup bug 242695 for good descr)

Attachments

(5 files, 1 obsolete file)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:0.9.2+) Gecko/20010709
BuildID: 2001070922
URLs included in mail messages are sometimes not completely converted to links. The following URL is the output of "View Source" on one of the messages in question. It ends in two digits; when viewed in Mozilla the entire URL will be part of a link except for the last digit:
http://lana.neomorphic.com/cgi-bin/trouble.pl?type=3Dsearchdetail&format= =3Duser&serial=3D24
Reproducible: Always
Steps to Reproduce:
1. See example above
Actual Results: Last character is left out of the URL link
Expected Results: All characters should have been part of the URL
The program that generates these messages is something I wrote in Perl; I don't know why the CGI module is inserting the "3D"s in the URLs. That's the only thing that looks weird about the URL.
Is the message in format=flowed? The =3Ds should only be inserted if it's format=flowed, I believe...
I can confirm that the same behaviour is seen on Mac OS X, build 2001070515, and Linux x86, build 200107118, and Netscape 6.1 PR1.
Reporter, can you provide the entire source, including headers, of the message you used to reproduce this?
Received: from uxpx01.affymetrix.com ([10.10.5.130]) by ntex01.Affymetrix.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id RPPDXS6Z; Thu, 23 Aug 2001 17:44:12 -0700
Received: from iserver.affymetrix.com (iserver.affymetrix.com [204.162.24.3]) by uxpx01.affymetrix.com (Pro-8.9.3/Pro-8.9.3/CL-INT-20010517-01) with ESMTP id RAA07421 for <Hugh_Caley@affymetrix.com>; Thu, 23 Aug 2001 17:46:20 -0700 (PDT)
Received: from roma.neomorphic.com (hidden-user@firewall.neomorphic.com [205.217.46.68]) by iserver.affymetrix.com (Pro-8.9.3/Pro-8.9.3/CL-EXT-20010517-01) with ESMTP id RAA14052 for <Hugh_Caley@affymetrix.com>; Thu, 23 Aug 2001 17:43:58 -0700 (PDT)
Received: from localhost.localdomain (lana.neomorphic.com [10.60.100.148]) by roma.neomorphic.com (8.9.0/8.9.0) with SMTP id RAA14966 for <Hugh_Caley@affymetrix.com>; Thu, 23 Aug 2001 17:46:18 -0700 (PDT)
Message-Id: <200108240046.RAA14966@roma.neomorphic.com>
Mime-version: 1.0
Content-type: text/plain; charset="iso-8859-1"
Date: Thu, 23 Aug 2001 17:46 -0700
Subject: Trouble - Config novaroma for raid, sendmail, ssh, bastille<?>, RH 7.1
To: Hugh_Caley@affymetrix.com, @affymetrix.com
From: Trouble <trouble@affymetrix.com>
Content-transfer-encoding: quoted-printable
The following trouble ticket has passed it's target date;=20
please update it or close it and inform the owner as to the status:
Owner is project, Primary Assigned Support is Hugh_Caley, Secondary Support is ,
The issue is "Config novaroma for raid, sendmail, ssh, bastille<?>, RH =
7.1"
Target date for completion is "2001-08-13"
http://lana.neomorphic.com/cgi-bin/trouble.pl?type=3Dsearchdetail&form=3D=
edit&serial=3D21
BTW, that last was the complete headers and text of a message that will not display properly when received in Mozilla or Netscape 6. The final URL in the message will be highlighted in blue, except for the last digit.
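For reference, the "=3D" sequences and the "=" at the end of a line are ordinary quoted-printable encoding: "=3D" is an escaped "=" character, and a "=" immediately before a line break is a soft line break that the decoder removes. A minimal sketch using Python's standard quopri module, with the last two lines of the message above as input (the variable names are only illustrative):

```python
import quopri

# Last two lines of the message body as they appear in the raw source.
# The first line ends in "=", a quoted-printable soft line break.
raw = (
    b"http://lana.neomorphic.com/cgi-bin/trouble.pl?type=3Dsearchdetail&form=3D=\n"
    b"edit&serial=3D21"
)

# Decoding turns "=3D" back into "=" and joins the soft-wrapped lines.
decoded = quopri.decodestring(raw).decode("iso-8859-1")
print(decoded)
```

After decoding, the URL is a single line, so the line break in the raw source should not by itself stop the link from covering the whole URL.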
hcaley, you mean that the whole URL is linked *except* for the final "1" character? That's surprising; it appears there's a carriage return just prior to the "edit" portion; this would have been inserted by the mail sender and would result in the behavior seen here in Bugzilla, that only the first line would be linked.
That being said, perhaps it's worth considering whether an additional criterion can be added to the "link" parser for text/plain e-mails; that, if http:// is encountered with a blank line above, search down through text for a blank line below and construct an anchor (link) for that text, removing all carriage returns and line breaks first.
Correct, everything is "linked" except for the last character. The line break seems to be irrelevant.
Reporter, send me a copy of the e-mail so I can confirm.
I too am seeing this problem. It appears to happen when the URL is the last item in the message. For example if the last line of the message was http://www.dtu.ox.ac.uk then the k would not display as part of the link. I'll make this message an example also: http://www.dtu.ox.ac.uk
Well, before someone else says it I will, that worked fine. It must be something different about the way the message is built. The one I'm seeing it on is from http://www.bananaloto.co.uk. If you enter the draw it sends out a message the following day telling you how you did. This is the one that goes wrong. If I forward the problem message the problem goes away. Maybe it's missing the trailing CR/LF or some such.
Confirming as Hugh forwarded me a copy of his e-mail message. In it, the URL, http://lana.neomorphic.com/cgi-bin/trouble.pl?type=searchdetail&form=edit&serial=48 is displayed as a linked anchor with the exception of the final "8" character, which is not.
Status: UNCONFIRMED → NEW
Ever confirmed: true
I forgot to mention that the bug is confirmed under both Mac/2001083008 and Mac/ 2001080214 (0.9.3).
*** Bug 110434 has been marked as a duplicate of this bug. ***
I am not sure bug 110434 is a dupe. In that case the URL was simple, but at the very end of a message. In this bug the problem seems to be a more complicated structure of the URL (possibly made worse by screwed-up quoted-printable). pi
Here's one: news://news.mozilla.org:119/3CA615A6.C063DB82@webaccess.net the last letter in the signature is not linkified
*** Bug 127840 has been marked as a duplicate of this bug. ***
Bug 127840 contains a hexdump of such an email; the reason seems to be only one CR/LF at the end of the mail instead of two.
Is 133016 a dup?
*** Bug 140970 has been marked as a duplicate of this bug. ***
problem exists in freebsd build 2002090806. echoing comment #11 - this seems to be something to do with how the message is built: in particular the headers. i sent a version of an example email back to myself and the link showed up fine. the only differences between the original email, and the one i resent to myself, were in the headers. stripping out the smtp server timestamps, here are the differing headers:
one that has the problem:
< X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3
< content-class: urn:content-classes:message
< MIME-Version: 1.0
< Content-Type: text/plain;
< charset="iso-8859-1"
< Content-Transfer-Encoding: quoted-printable
< Subject:
< Date: Wed, 11 Sep 2002 12:29:42 -0400
< Message-ID: <89D097CC1D003643BB9F13714BA9723B027816C8@OCCLUST01EVS1.snip>
< X-MS-Has-Attach:
< X-MS-TNEF-Correlator:
< Thread-Index: AcJZsFeTR0sb4cD/EdaiqQDQtwj42Q==
< From: <snipped>
< To: <snipped>
the one that was fine:
> Date: Wed, 11 Sep 2002 12:35:58 -0400 (EDT)
> From: <snipped>
> Message-Id: <200209111635.g8BGZwC66237@snip>
> To: <snipped>
>
note that the one that worked also has an additional newline after the end of headers.
folks, isn't this an important enough problem to act on? it was created 7/10 and is still NEW. outlook (which unfortunately is used by lots of people) has a "send web link" option that sends the URL of the web page in the message and no text after. every time i receive such a message i am unable to click on the link to view the page. i can live with it, but i would guess this would be a major annoyance for end-users?
Guys, it's been a long time on this one. Any progress at all? I'm using build 200392250 on MacOSX and it STILL has this problem.
*** Bug 185377 has been marked as a duplicate of this bug. ***
Narrowing summary, assuming this was really a dup.
Summary: Some URL's in mail messages are not completely converted to link → URL recognition at the end of line in QP (quoted-printable) misses last character
Hi, I have the same issue. Here is another test case.
Source of message:
From - Thu Apr 03 09:42:52 2003
X-UIDL: $PV"!f3l"!TnP!!p77!!
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
Envelope-to: olivier.vit@duke-interactive.com
Received: from [10.42.0.3] (helo=assurancetourix) by mail.duke-interactive.com with esmtp (Exim 3.35 #1 (Debian)) id 190t6X-0001TU-00 for <olivier.vit@duke-interactive.com>; Thu, 03 Apr 2003 03:01:21 +0200
Received: from assurancetourix ([127.0.0.1]) by assurancetourix with esmtp (Exim 3.35 #1 (Debian)) id 190t5Y-0006JL-00 for <olivier.vit@duke-interactive.com>; Thu, 03 Apr 2003 03:00:20 +0200
Message-ID: <7832149.1049331620422.JavaMail.root@assurancetourix>
From: intranet@duke-interactive.com
To: olivier.vit@duke-interactive.com
Subject: feuille de temps
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_Part_31_162178.1049331620403"
Date: Thu, 03 Apr 2003 03:00:20 +0200
X-MailScanner: Found to be clean, Found to be clean
X-UIDL: $PV"!f3l"!TnP!!p77!!
Status: U
------=_Part_31_162178.1049331620403
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Bonjour,=20
=09=09Tes feuilles de temps n'ont pas =E9t=E9 remplies.
=09=09Merci de les compl=E9ter.
=09=09Pour cela, il te suffit de cliquer sur le lien :=20
=09=09 http://intranet.duke
------=_Part_31_162178.1049331620403--
using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.3) Gecko/20030312
I'm adding a screenshot
reproduced in 1.4b windows 2003050714
Can't someone take a crack at this, please? It's been years! Still showing up in 20030826 nightly build for MacOSX, and I'm sure in others.
This bug also affects Firebird for Linux, at least the trunk builds.
I'm sorry, I meant thunderbird
Coming up on 3 years for this problem ...
still an issue in 1.7b 2004042409 (post rc1)
Is anyone from Mozilla still associated with this bug? I note that the QA Contact seems to be a non-valid address.
*** Bug 242695 has been marked as a duplicate of this bug. ***
Dup bug 242695 contains good descr.
Whiteboard: See dup bug 242695 for good descr
could those of you who are cc'ed on this bug please vote for it, if you think it's important? the really annoying part of the bug for me is that the behaviour chops the "l" off of ".html" making the link look like "file.htm". the latter is the 3 letter suffix format used by microsoft!!! seems like we are [inadvertently] promoting microsoft conventions.
> could those of you who are cc'ed on this bug please vote for it Asking others to vote will only make me treat the votes as faked (they are out of balance in comparison to other bugs). I'm not dumb, I can see the cc list. FYI, I consider this to be a bug and I want to have it fixed (I don't like bugs in my code), but I haven't had the motivation to fix any bugs in this code (libmime and mozTXT*) at all in the last time. Sorry.
Assignee: sspitzer → ben.bucksch
Attachment #147760 - Attachment description: sample message where the url is badly truncated at the end of the message → sample message where the url is badly truncated at the end of the message corresponds to the screenshot also attached to this bug report
ben, not calling you dumb. i was not aware that canvassing for votes is disallowed (i have seen others do it, hence my attempt). my apologies if that is the case. i do not know exactly how the mozilla organization (for lack of a better word) treats votes. i am assuming that the votes against a bug will be given some consideration w.r.t bug priority. many of the users on the Cc: list may not know about voting, and i was hoping to point them to it. i am looking at the source too, to see if i can perhaps find the problem and fix it.
I think bug #242817 ("Last character of QP message is displayed in a new line") has something to do with this bug. Maybe the same base problem in the code results in both bugs. Bug #242817 should be checked after fixing this one.
Bug 242817 (or anything like it) could very well be the cause of this bug, in which case this bug is a dup of that one.
*** Bug 242933 has been marked as a duplicate of this bug. ***
Attached file Minimal message testcase (deleted) —
Ok, this is as small as it gets. It is a (very cropped) message from Pine. I poked around a bit in the code, and confirmed that the txt2html stuff was ok. It is simply the QP decoder that splits the last line in two parts instead of one. The url finder is then fed the two parts after each other, and of course only the url in the first part gets proper mark up. There are even a couple of comments in the code that each line must be passed on as one whole chunk to avoid stuff like this. :-) The last line splitting probably happens always, it is just that one doesn't notice it unless it is an url. It is likely that bug 242817 is caused by this somehow, but that symptom is very different in that it gets an additional linebreak inserted. This one (with the minimal testcase) should be easier to debug - hopefully it resolves both.
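The symptom described above can be sketched with a toy link finder that is handed the last line once as a whole and once in two pieces. This is only an illustration in Python: the regular expression and the linkify helper are assumptions made up for this example, not the mozTXTToHTMLConv code, and the URL is a placeholder.

```python
import re

# Hypothetical, much simplified URL matcher used only for this sketch.
URL_RE = re.compile(r"https?://[^\s<>]+")

def linkify(chunk):
    """Wrap every URL found in one chunk of text in an anchor tag."""
    return URL_RE.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), chunk
    )

# The whole last line handed over in one piece.
whole = linkify("http://example.org/page.html")

# The same line handed over as two chunks, with the final
# character arriving separately.
split = linkify("http://example.org/page.htm") + linkify("l")

print(whole)
print(split)
```

In the split case the trailing "l" ends up outside the anchor, which matches what the screenshots in this bug show.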
If one cares, the testcase can get one line smaller by removing the char above the url (just used it to provoke QP in the first place). :-)
After getting over the not so low threshold of understanding the MIME code, I think I get why this happens. The QP decoder parses what it can, and then leaves the rest (one or two chars) in a buffer waiting for more data. When it is destroyed, this buffer is appended to what was finished before. The problem here is that the URL recognition is done before the decoder destruction.
Decoder shutdown sequence currently:
1. MimeInlineTextPlain_parse_eof
2. MimeInlineText_parse_eof
3. MimeInlineText_rotate_convert_and_parse_line
4. MimeInlineTextPlain_parse_line
5. MimeLeaf_parse_eof
6. MimeDecoderDestroy
In (1), we have an explicit comment saying we need to go up to make sure we have emptied all buffers. In (2), we have an explicit comment saying that we avoid just that (http://lxr.mozilla.org/mozilla/source/mailnews/mime/src/mimetext.cpp#234). Thus we end up in (4) where we find the faulty URL. Then (5) triggers (6), where the last chars are added to the buffer...
If we change (2) to also go up, we get this instead:
a. MimeInlineTextPlain_parse_eof
b. MimeInlineText_parse_eof
c. MimeLeaf_parse_eof
d. MimeDecoderDestroy
e. mime_LineBuffer
f. MimeInlineText_rotate_convert_and_parse_line
g. MimeInlineTextPlain_parse_line
In (g) we handle the whole line at once! If it weren't for the comment mentioned above I would be satisfied. One problem is that the bug # cited either is wrong or is in Netscape's bugtool. It would be nice to have more testcases to see if anything (and what) breaks...
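The two shutdown orders described above can be modelled roughly as follows. This is a toy Python model, not libmime: ToyDecoder, its HOLD_BACK value of one character, and both shutdown functions are hypothetical names invented for the sketch, and the real decoder's rules for what it holds back are more involved.

```python
class ToyDecoder:
    """Toy streaming decoder that keeps trailing input until destroyed."""

    HOLD_BACK = 1  # assumption: one trailing character is retained

    def __init__(self):
        self.pending = ""

    def write(self, data):
        # Emit everything except the held-back tail.
        data = self.pending + data
        cut = max(len(data) - self.HOLD_BACK, 0)
        self.pending = data[cut:]
        return data[:cut]

    def destroy(self):
        # Flush whatever was held back.
        rest, self.pending = self.pending, ""
        return rest


def old_shutdown(last_line):
    """Current order: parse the last line first, destroy the decoder after."""
    decoder = ToyDecoder()
    line_buffer = decoder.write(last_line)
    seen = [line_buffer]            # steps 3-4: line parser runs now
    leftover = decoder.destroy()    # steps 5-6: tail arrives too late
    if leftover:
        seen.append(leftover)
    return seen


def new_shutdown(last_line):
    """Proposed order: destroy the decoder first, then parse the last line."""
    decoder = ToyDecoder()
    line_buffer = decoder.write(last_line)
    line_buffer += decoder.destroy()   # steps c-e: tail joins the line buffer
    return [line_buffer]               # steps f-g: line parser sees one line


print(old_shutdown("http://example.org/file.html"))
print(new_shutdown("http://example.org/file.html"))
```

Each returned list holds the chunks that the line parser would see, in order. With the old order the URL finder gets the final character as a separate chunk; with the new order it gets the complete line.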
Ok, here is a simple patch for testing - call the parent before continuing. This needs to be tested with more examples, primarily rot13 messages. Either this patch is the wrong approach, or there should be a new patch that also updates the comment in the code... Comments from those who know the code?
Thanks for the investigation (and patch)! Great work :) I don't think there's *any* active Mozilla contributor who really understands libmime in these depths. This is *very* old code originally from jwz. The bug reference in the comment probably still refers to the old Netscape 4.x bug database. I don't understand the possible side effects your patch could have. ducarroz is the owner (I think), so could you, J-F, review? Seth superreview? anlan, did you test the View|Body|Simple HTML and As Plaintext modes? They inherit indirectly from mimetext, and I had to fiddle with parse_line and _eof, so this is a possible area of regression.
Assignee: ben.bucksch → anlan
Component: Mail Window Front End → MIME
Attachment #150361 - Flags: superreview?(sspitzer)
Attachment #150361 - Flags: review?(ducarroz)
This is a simple but scary patch as it's hard to figure out the potential side effects. Have you tested it against bug 124941 to make sure it does not regress it? Also, does this patch fix bug 242817?
Yes, this fixes bug 242817 as well. The common thing is that messages are QP-encoded. I suspect that the difference that makes messages in that bug receive an extra linebreak is that they are "format=flowed". Simple HTML / Plain text seems fine so far. I'll investigate bug 124941 and a few other examples tomorrow.
Status: NEW → ASSIGNED
*** Bug 242817 has been marked as a duplicate of this bug. ***
Attachment #147760 - Attachment filename: testcaseQT.txt → testcaseQT.eml
Attachment #147760 - Attachment mime type: text/plain → message/rfc822
Thanks for pointing me to bug 124941. The patch does not quite regress it, but I think something happens... Scary patch with subtle changes, wasn't it? :-) Oh well, I'll dig deeper. Any other testcases while I'm at it?
Attachment #150361 - Attachment is obsolete: true
Attachment #150361 - Flags: superreview?(sspitzer)
Attachment #150361 - Flags: review?(ducarroz)
Attached patch Better patch (deleted) — Splinter Review
Ok, here is a second try. This patch looks a bit larger, but that is because it touches three files, refactors some code and updates the relevant comments. There is really only one new line of code (in MimeText_parse_eof()). The problem is that we both need to close down the QPdecoder (as in the first patch) _and_ do charset detection/conversion (which might fail with the first patch). This is solved by refactoring and exposing the decoder destruction in MimeLeaf so we can access it from MimeText without changing the codepath from bug 124941. Is this an acceptable modification?
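The shape of this second approach, as described above, is roughly: the leaf class exposes a "close the decoder" step that is safe to call more than once, and the text class calls it before handling the final line. The Python sketch below only illustrates that idea; ToyLeaf, ToyText, close_decoder and the attribute names are hypothetical and are not the names or the code of the actual patch.

```python
class ToyLeaf:
    """Toy stand-in for the leaf part that owns the decoder."""

    def __init__(self, pending):
        self.decoder_pending = pending  # None once the decoder is destroyed
        self.line_buffer = ""

    def close_decoder(self):
        # Flush the decoder exactly once; later calls are no-ops.
        if self.decoder_pending is None:
            return 0
        self.line_buffer += self.decoder_pending
        self.decoder_pending = None
        return 0

    def parse_eof(self):
        return self.close_decoder()


class ToyText(ToyLeaf):
    """Toy stand-in for the text part that parses complete lines."""

    def __init__(self, line_buffer, pending):
        super().__init__(pending)
        self.line_buffer = line_buffer
        self.parsed_lines = []

    def parse_eof(self):
        self.close_decoder()  # the single added call in this sketch
        # Charset conversion and line parsing would happen here.
        self.parsed_lines.append(self.line_buffer)
        self.line_buffer = ""
        return super().parse_eof()  # second close is harmless


part = ToyText("http://example.org/file.htm", "l")
part.parse_eof()
print(part.parsed_lines)
```

Because the close step clears its own state, the parent's end-of-file handling can still run afterwards without appending the leftover characters a second time.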
Attachment #150522 - Flags: review?(ducarroz)
Comment on attachment 150522 Better patch: Looks good. R=ducarroz
Attachment #150522 - Flags: superreview?(bienvenu)
Attachment #150522 - Flags: review?(ducarroz)
Attachment #150522 - Flags: review+
Comment on attachment 150522 Better patch: there's an indentation problem here that should be fixed before checkin
+ if (leaf->decoder_data)
+ {
+ int status = MimeDecoderDestroy(leaf->decoder_data, PR_FALSE);
+ leaf->decoder_data = 0;
+ return status;
+ }
Attachment #150522 - Flags: superreview?(bienvenu) → superreview+
The indentation is a bit odd (and inconsistent) in those files, but I hope this is an improvement. Could someone with CVS access take care of getting this in?
I haven't tested the patch - I just want to add an observation about the original problem: I tried hand-sending (using telnet to port 25) a mail containing only an email address. The problem described in this bug occurred when I was using Content-Type: text/plain; charset="ISO-8859-1" and Content-Transfer-Encoding: quoted-printable. However, it did not occur when either of these headers was left out.
It is a confirmed QP problem for messages ending without a newline. For text/plain, the only problem is that URL recognition in the last line will break. For format=flowed, the problem is worse - one or more chars at the end of the last line will end up after an extra linebreak, not pretty at all. I have used the patch for the last couple of weeks without any noticeable regressions. The only thing it needs is someone to get it into CVS for further testing. Would be nice to get it into Thunderbird 1.0 as well...
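A message of the kind described above (quoted-printable, with the final line not terminated by a newline) can be built by hand for testing. The sketch below uses Python's standard email package only to show what a correct decode of such a message looks like; the headers and URL are placeholder values, not taken from any attachment on this bug.

```python
from email import message_from_bytes

# Hand-built test message: quoted-printable body whose final line
# carries a URL and is NOT followed by CRLF.
raw = (
    b"Mime-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="iso-8859-1"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"See:=20\r\n"
    b"http://example.org/page.html"
)

msg = message_from_bytes(raw)
body = msg.get_payload(decode=True).decode("iso-8859-1")
last_line = body.splitlines()[-1]
print(repr(last_line))
```

A decoder that handles the end of input correctly yields the complete URL as the last line, which is the text the URL recognition should be given in one piece.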
Checked in on trunk by timeless.
Fix also on aviary branch as of today. Marking as fixed.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
*** Bug 256967 has been marked as a duplicate of this bug. ***
*** Bug 252292 has been marked as a duplicate of this bug. ***
*** Bug 225552 has been marked as a duplicate of this bug. ***
Product: MailNews → Core
*** Bug 241811 has been marked as a duplicate of this bug. ***
*** Bug 140831 has been marked as a duplicate of this bug. ***
Product: Core → MailNews Core