Closed Bug 133016 Opened 23 years ago Closed 17 years ago

[mozTXTToHTMLConv] Freetext url recognition stops at ")" and "'" (apostrophe)

Categories

(MailNews Core :: Backend, defect, P2)

defect

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.9alpha6

People

(Reporter: jthg, Assigned: BenB)

References

()

Details

(Keywords: verified1.8.1.8)

Attachments

(8 files, 9 obsolete files)

(deleted), message/rfc822
Details
(deleted), application/octet-stream
Details
(deleted), image/png
Details
(deleted), patch
mscott
: review+
Biesinger
: superreview+
Details | Diff | Splinter Review
(deleted), text/plain
Details
(deleted), image/jpeg
Details
(deleted), image/png
Details
(deleted), patch
Details | Diff | Splinter Review
From Bugzilla Helper: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.9+) Gecko/20020318 BuildID: 2002031803 When reading a e-mail message, Mozilla turns all web addresses to clickable links. However, some links are not correctly highlighted; they are either segmented leaving the hyperlinks useless or the hyper link is not extended enough. Two examples: http://www.finaid.vt.edu/flash/Mar02flash.pdf http://filebox.vt.edu/users/jeisinge/frisbee/09%20(at%2013h07m02s).JPG Reproducible: Always Steps to Reproduce: 1. recieve an e-mail with an above mentioned link in the body Actual Results: With http://www.finaid.vt.edu/flash/Mar02flash.pdf, the link is segmented so that http://www.finaid becomes a hyper link, .vt. just becomes text, and edu/flash/Mar02flash.pdf becomes a hyper link. With http://filebox.vt.edu/users/jeisinge/frisbee/09%20(at%2013h07m02s).JPG, http://filebox.vt.edu/users/jeisinge/frisbee/09%20(at%2013h07m02s becomes a hyper link and ).JPG becomes just text. Expected Results: the whole address should become a hyperlink
The following information may or may not be related. It may also be a problem at the senders end. I received the following mail message from the bugzilla system with a URL that was not fully recognised as a URL. Only the first line of the field "URL:" was recognised as a clickable URL - the rest was just text. http://bugzilla.mozilla.org/show_bug.cgi?id=140851 Summary: LDAP query in browser loads entries into address book Product: Browser Version: other Platform: PC URL: http://bugzilla.mozilla.org/buglist.cgi?bug_status=NEW&b ug_status=ASSIGNED&bug_status=REOPENED&email1=&emailtype 1=substring&emailassigned_to1=1&email2=&emailtype2=subst ring&emailreporter2=1&bugidtype=include&bug_id=&changedi n=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&short _desc=ldap&short_desc_type=allwordssubstr&long_desc=&lon g_desc_type=allwordssubstr&bug_file_loc=&bug_file_loc_ty pe=allwordssubstr&status_whiteboard=&status_whiteboard_t ype=allwordssubstr&keywords=&keywords_type=anywords&fiel d0-0-0=noop&type0-0-0=noop&value0-0- 0=&cmdtype=doit&order=Reuse+same+sort+as+last+time OS/Version: Windows 2000 Status: UNCONFIRMED Severity: enhancement Priority: -- Component: URL Bar AssignedTo: hewitt@netscape.com ReportedBy: matthew.jurgens@solnet.com.au QAContact: claudius@netscape.com >From Bugzilla Helper: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.0+) Gecko/20020428 BuildID: 2002042808
Status: UNCONFIRMED → NEW
Ever confirmed: true
Please attach the complete message in question to this bug for further inspection. Possibly a dupe of bug 90161. pi
Attached file message with bad link (deleted) —
The first link seems to work now! However, when parenthesis are used, the link terminates before the ending parenthesis ( ')' ).
*** Bug 145703 has been marked as a duplicate of this bug. ***
*** Bug 145473 has been marked as a duplicate of this bug. ***
Besides that, arbitrary text can be hilited when it should not be =( E.g. here PDF::parse: cannot find pdf parser /usr/local/bin/acroread PDF::parse: is shown as a link (since Mozilla 1.3a, 1.2 didn't have this effect). I think protocol should be verified with getservbyname() or similiar call.
*** Bug 200216 has been marked as a duplicate of this bug. ***
To stop this bug from morphing I am changing the summary to reflect URL 2 in comment O, which is what the dupes are about too. This bug affects normal ")", square "]", and sqiugly "}" brackets. Comment 7 is an other bug, which I belive is fixed.
Summary: mail message does not correctly identify URI / URL / web address / hyper link → mail message does not correctly identify URI / URL / web address / hyper link containing right bracket ")", "]", or "}"
Adding URL from dupped bug that demonstrates problem. Also, placing a "<" and a ">" around the URL fixes the problem, as long as the URL is not wrapped to multiple lines.
There was some work done with bug 183111, which may affect this bug as Torben said. RFC 2396 at one point has this to say about delimiters: 2.4.3. Excluded US-ASCII Characters Although they are disallowed within the URI syntax, we include here a description of those US-ASCII characters that have been excluded and the reasons for their exclusion. The control characters in the US-ASCII coded character set are not used within a URI, both because they are non-printable and because they are likely to be misinterpreted by some control mechanisms. control = <US-ASCII coded characters 00-1F and 7F hexadecimal> The space character is excluded because significant spaces may disappear and insignificant spaces may be introduced when URI are transcribed or typeset or subjected to the treatment of word- processing programs. Whitespace is also used to delimit URI in many contexts. space = <US-ASCII coded character 20 hexadecimal> The angle-bracket "<" and ">" and double-quote (") characters are excluded because they are often used as the delimiters around URI in text documents and protocol fields. The character "#" is excluded because it is used to delimit a URI from a fragment identifier in URI references (Section 4). The percent character "%" is excluded because it is used for the encoding of escaped characters. delims = "<" | ">" | "#" | "%" | <"> Other characters are excluded because gateways and other transport agents are known to sometimes modify such characters, or they are used as delimiters. unwise = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`" Data corresponding to excluded characters must be escaped in order to be properly represented within a URI. The code that marks a url in a text does also recognize () and {} as delimiters, maybe it can be made a little bit smarter. Also there is the general problem with linebreaks, RFC 2396 is very clear about that: E. Recommendations for Delimiting URI in Context URI are often transmitted through formats that do not provide a clear context for their interpretation. For example, there are many occasions when URI are included in plain text; examples include text sent in electronic mail, USENET news messages, and, most importantly, printed on paper. In such cases, it is important to be able to delimit the URI from the rest of the text, and in particular from punctuation marks that might be mistaken for part of the URI. In practice, URI are delimited in a variety of ways, but usually within double-quotes "http://test.com/", angle brackets <http://test.com/>, or just using whitespace http://test.com/ These wrappers do not form part of the URI. In the case where a fragment identifier is associated with a URI reference, the fragment would be placed within the brackets as well (separated from the URI with a "#" character). In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may need to be added to break long URI across lines. The whitespace should be ignored when extracting the URI. No whitespace should be introduced after a hyphen ("-") character. Because some typesetters and printers may (erroneously) introduce a hyphen at the end of line when breaking a line, the interpreter of a URI containing a line break immediately after a hyphen should ignore all unescaped whitespace around the line break, and should be aware that the hyphen may or may not actually be part of the URI. Using <> angle brackets around each URI is especially recommended as a delimiting style for URI that contain whitespace. The prefix "URL:" (with or without a trailing space) was recommended as a way to used to help distinguish a URL from other bracketed designators, although this is not common in practice. For robustness, software that accepts user-typed URI should attempt to recognize and strip both delimiters and embedded whitespace. For example, the text: Yes, Jim, I found it under "http://www.w3.org/Addressing/", but you can probably pick it up from <ftp://ds.internic. net/rfc/>. Note the warning in <http://www.ics.uci.edu/pub/ ietf/uri/historical.html#WARNING>. contains the URI references http://www.w3.org/Addressing/ ftp://ds.internic.net/rfc/ http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING Mozilla does not conform to the linebreak handling described here.
> Mozilla does not conform to the linebreak handling described here (comment 11) For the record, that is bug 5351.
OK, so RFC3296 has certain rules governing what can be in a URI. However, may I suggest that if a user receives an eMail that contains some "unwise" characters in the URI, that Mozilla should be able to link to the complete URI. Of course, sending an eMail with these "unwise" characters is a totally different matter.
As I said, our detection code probably needs to be a little bit smarter, for example: If the url does not start with a bracket then counting brackets and stopping when the count reaches zero makes no sense as it seems to happen with the second example, but that's just a guess.
Thanks for ccing me Andreas. Most of the comments on this bug are actually wrong or offtopic, so I am focussing on the problem mentioned in comment 9. Shortening summary. For the spec, see <http://www.bucksch.com/1/projects/mozilla/16507/>. Andreas, what are you suggesting? My opinion is basically: if you're sending freaky URLs like those with (), then wrap them in <>. If the sender didn't, fix the sender (if possible) or copy the URL manually (happens rarely enough IMHO). But, what would you do? IIRC, I didn't count spaces, I just stop at any bracket when in freetext mode.
Summary: mail message does not correctly identify URI / URL / web address / hyper link containing right bracket ")", "]", or "}" → Freetext url recognition stops at right bracket ")", "]", or "}"
narrowing summary even further, because } and ] are illegal in URLs and should IMHO not be linked. It's a recognizer, after all, it's not expected to get it right in all cases, esp. not the abuse ones.
Summary: Freetext url recognition stops at right bracket ")", "]", or "}" → Freetext url recognition stops at ")"
Code is in mozTXTToHTMLConv::FindURLEnd <http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp#299> (sorry for the spam).
Interesting... I have a couple of observations/info. I am the one who posted the newsgroup message currently listed in the URL of this bug. I obtained this URL as follows. Using Mozilla, goto microsoft.com navigate to the search and select search knowledgebase by id. I typed in the id# 815411, the corresponding page is loaded and the URL in Mozilla location bar is the one in the message. I then simply selected and copied it from my location bar, and pasted it to my news post, to get: http://support.microsoft.com/default.aspx?scid=kb;[LN];815411 - I find it interesting that when posting this to bugzilla it handles it "correctly" (i.e. it highlights the entire url). I pasted the above URI string the same way. - As one would expect, since Microsoft is the originator of above URL Outlook also highlights the whole URL in the posting.
Perhapse there needs to be a "900lbs-gorilla" tag for bugs which highlight bugs which are technically compliant with public specs but M$ ignores thus promulgating bad behavior. I leave it up to others to make the political/pragmatic decisions if the prevalence needs to be accommodated or not.
Take a look at RFC 2396 (2.3): mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")" There is described that ´)' and ´'' are also allowed within an URI. But both aren't working within MailNews. Like the comment [1] tells me, it was forgotten to update and RFC2396E sounds like a draft for me. Within the Browser you see this url [2] completly because Bugzilla creates a hyperlink and so its shown correctly. Instead MailNews renders the URI for itself. [1] http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp#308 [2] http://support.microsoft.com/default.aspx?scid=kb;[LN];815411
Bug 119963 is about the apostrophe case. See also bug 218287.
Summary: Freetext url recognition stops at ")" → [mozTXTToHTMLConv] Freetext url recognition stops at ")"
*** Bug 119963 has been marked as a duplicate of this bug. ***
I marked bug 119963, which is about apostrophes, a dup of this, because they are so similar. I am tending towards WONTFIX. This is a recognizer, designed to get most valid cases right. Cases outside the spec (e.g. "[" as in comment 18) generally are not supported. Comment 18 asks to allow "[" in URLs, while bug 218287 asks *not* to support it and gives a good example. This shows that the recognizer is just guesswork and a judgement call. "(" and apostrophe are allowed in URL per spec. But we're not in a situation where we know for sure that it's supposed a URL, we need to *find* it in a bunch of human language text (like http://example.com). This is why I excluded brackets from the URL and probably apostrophe as well. For consistency and for bug 218287, I should probably actually disallow both opening and closing brackets everywhere in the URL.
Summary: [mozTXTToHTMLConv] Freetext url recognition stops at ")" → [mozTXTToHTMLConv] Freetext url recognition stops at ")" and "'" (apostrophe)
A quick point of clarification (since I added comment #18): I am not advocating support for parsing characters in URI's such as the "[" if it is outside the spec. I was simply adding contextual information on where I came across it in the field and how from an end user perspective there "seemed to be" differing behaviours between Mozilla, IE, an Bugzilla. If end users dont come across these non-standard cases very often in practice, its probably not a big deal, and Mozilla should do the right thing.
I know this is polluting the menus, but perhaps on selecting the full URL by hand you could add 'open selection as link' on the context menu? Would be less painful than the 'cut and paste and open a browser window and and and ...' solution that's required otherwise.
we should not do that. but i can today select text and drag it to a tab or window if i want it treated as a url. if i'm using x11 i can probably middle click on my selection to open it somewhere.
*** Bug 230972 has been marked as a duplicate of this bug. ***
*** Bug 248319 has been marked as a duplicate of this bug. ***
*** Bug 260714 has been marked as a duplicate of this bug. ***
I think that & (ampersant) and | (pipe) are included in standard but it still doesn't work with them.
Product: Browser → Seamonkey
*** Bug 271374 has been marked as a duplicate of this bug. ***
Moving a bunch of mozTXTToHTMLConv bugs into Core / MailNews Backend to live with their bretheren.
Component: MailNews: Main Mail Window → MailNews: Backend
Product: Mozilla Application Suite → Core
*** Bug 275183 has been marked as a duplicate of this bug. ***
*** Bug 236915 has been marked as a duplicate of this bug. ***
*** Bug 303590 has been marked as a duplicate of this bug. ***
Domino produces links like: https://lsd.memach.com/ConditionReports.nsf/(lookup)/CC8CB6274C8EEBE385257062005E54A6?OpenDocument and the user has to copy the link and paste it into the browser in order to view the link. It would really be nice if the behavior of thunderbird would use the entire link vice stopping at the letter before the ")". I cant change the way domino produces the links. Thanks for your efforts
(In reply to comment #32) > Moving a bunch of mozTXTToHTMLConv bugs into Core / MailNews Backend to live > with their bretheren. What does this mean? That the bug is going to die for lack of attention or it is going to be looked at and action taken. I was planning on deploying a bunch of thunderbird apps but since we use domino for one of our main application and domino insists on using ( and ) in the links I am stuck unless this is resolved. I will have to find an alternative email program... How can I find out what is being done. Do I need to look for an alternative? Thanks for all your efforts... Lyman
(In reply to comment #37) > (In reply to comment #32) > > Moving a bunch of mozTXTToHTMLConv bugs into Core / MailNews Backend to live > > with their bretheren. > > What does this mean? That the bug is going to die for lack of attention or it > is going to be looked at and action taken. I was planning on deploying a bunch > of thunderbird apps but since we use domino for one of our main application > and domino insists on using ( and ) in the links I am stuck unless this is > resolved. Comment 23, written by the person who implemented URL recognition in the first place, gives the arguments *against* allowing parentheses in a recognized URL. The main issue here is that someone might write a message such as --- ... and you can visit my homepage (http://homepage.somewhere.tld) to see. --- In this case, the closing parenthesis should *not* be part of the URL. Making the code smart enough to recognize parentheses in context is harder than it sounds -- particularly since there could be a legal URL which contains *only* a closing or opening parenthesis, e.g. http://paranoia.tld/kso03)cldke.html You say you can't change how Domino works, but unless you can write some code for Mozilla (or convince/pay someone to do so for you), you can't change how TB works either. Why not file a bug with them and see if they can change the URL generation to either enclose the URL within angle brackets, or escape the parentheses? That change on their part would be much easier to implement than the change required on Mozilla's part.
Why isn't it simple enough to include a paren if there's no whitespace after it?
Both (http:example.com) and http://paranoia.tld/kso03)cldke.html are correctly recognized here. I guess that Bugzilla's recognition is better than Thunderbird's :p
*** Bug 301363 has been marked as a duplicate of this bug. ***
In response to Lyman Byrd, I created a simple patch which does what I think is the only thing I can do here: URL recognition won't stop at "(", ")" or "'" in the middle of the URL, but will exclude ")" and "'", if they appear at the end. The patch is completely untested (it does compile, though). I am still not sure, if this bug is a good idea in the first place or should be WONTFIXed. Please read my comments above before applying the patch. And please do contact IBM/Lotus to fix their broken software - it's highly unwise to use things like brackets and apostrophs in URLs, exactly because of cases like these.
Assignee: sspitzer → ben.bucksch
Status: NEW → ASSIGNED
Ops, last patch had an obvious error. (Also: Remove "broken" from last paragraph of last comment. Rest stays.)
Attachment #194943 - Attachment is obsolete: true
*** Bug 313252 has been marked as a duplicate of this bug. ***
I assume this is the catch-all bug for automatic URL detection problems. This URL is also segmented at a strange place: http://www.modpython.org/FAQ/faqw.py?query=expat&querytype=simple&casefold=yes&req=search Specifically, right before the "?". Now, obviously "?" is a pretty common character to be in a URL, so this really shouldn't happen. It is hard to imagine a regular sentence which is a question which ends with a URL... e.g.: Have you looked at: http://www.example.com/? In fact, if you wanted to catch that sort of thing, the logic should be fairly straight forward -- just see if there is any whitespace after the ?. I think the same goes for the parenthesis examples. I'm using Thunderbird 1.5beta1.
> I assume this is the catch-all bug for automatic URL detection problems It is most definitely not and has a specific summary. > Specifically, right before the "?" WFM, verbatim in the bugmail from your comment.
*** Bug 316428 has been marked as a duplicate of this bug. ***
*** Bug 326189 has been marked as a duplicate of this bug. ***
If ) brackets are not to be recognized as part of the URL, then why does this link exist? http://kb.mozillazine.org/Reducing_memory_usage_(Firefox) It cannot be copy/pasted or typed reliably with either Thunderbird, or reproduced with Firefox, because the final ) is rendered as text. If ) closing parenthesis are not to be used in a link, then why has mozillazine.org used them? The problem in Thunderbird 1.5 is even worse none of the following show as links, all failing at the ) character in TB 1.5, yet they render in Firefox! http://test(no.org/test(situation( http://test(no.org/test(situation) http://test(no.org/test)situation) http://test)no.org/test(situation) As read here in Firefox, items 2 thru 4 render incompletely on the LAST ) but in Thunderbird 1.5, the link breaks at the FIRST ) in each url. Thunderbird 1.5 and Firefox render the links differently
It is clear that there are cases when a round bracket might be a legitimate part of a URL. The fact of the matter is that USUALLY it won't be, so the auto-sensing URL checker should go for the most regular use case. Basically, it should get it right *most* of the time. This, in my experience, will mean ignoring closing brackets (of any sort), or punctuation *other than* a slash as the last character in the string.
(In reply to comment #50) That is understood as the rational behind not rendering the LAST ) as part of a link. But even mozillazine uses such as links. And, in Thunderbird 1.5 even a ) within the body of a link breaks it. while in Firefox ONLY a trailing ) breaks it. According to RFC http://www.faqs.org/rfcs/rfc1738.html From section 2.2 Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL. typing http://test)this.is.a.test here in Firefox, and the link is NOT broken try sending that in Thunderbird 1.5 and it comes out as http://test )this.is.a.test (space added after test for display only) The program is ignoring a valid character in a URL, simply because some users MIGHT use the character as a delimiter. Would it not be better to include the ) in links, and correct the improper use of such as a delimiter when it arises? And once again, the problem is more profound in Thunderbird 1.5 which renders ANY ) as an invalid character (in a URL)
Just a doubt: Am I wrong or with the rfc 3986 the chars ( ) and ' are considered as Reserved Characters (par. 2.2) and so must be escaped?
You are right that they are now "reserved" (they weren't in RFC2396). This means that they *must* be escaped when not used as delimiters, but *must not* be escaped when used in their origial purpose as delimiter. I'm not sure if the Lotus Notes use falls into the category of a proper use as delimiter.
Re last paragraph in comment 51: No, this is a *recognizer* mainly targetted at text that *humans* write. This is something totally different from an HTTP header parser. "Go to Google (http://www.google.com)" is far more often in emails than the rare URLs which make use of () inside the URL. See my other comments above.
*** Bug 348012 has been marked as a duplicate of this bug. ***
OS: Windows NT → All
Hardware: PC → All
*** Bug 349531 has been marked as a duplicate of this bug. ***
The workaround to send a good clickable link is to replace ) with %29 Compare: http://en.wikipedia.org/wiki/O_(Cyrillic) http://en.wikipedia.org/wiki/O_(Cyrillic%29 http://en.wikipedia.org/wiki/O_%28Cyrillic%29 Does anyone know the % codes for ' and " ?
Another workaround is to use <> brackets observe http://en.wikipedia.org/wiki/O_(Cyrillic) <http://en.wikipedia.org/wiki/O_(Cyrillic)> Firefox doesnt have this problem, what code do they use? Workarounds are good but they detract from the fact that in this case Thunderbirds actions are non-standard. a ) is a legitimate character in urls, and one shouldnt have to use a 'workaround' to get it to work in Thunderbird
*** Bug 353013 has been marked as a duplicate of this bug. ***
*** Bug 356291 has been marked as a duplicate of this bug. ***
It's not clear what to do there. traditionally (even netscape 4), we assumed that if you write "my company (http://www.foobar.com) does ...", you would not want the () included in the URL. wikipedia unfortunately makes very liberal use of that, which I think is very unwise. But it poses the question: which case do we favor? The only compromise I see is to look whether there's a ( after the URL start, and if so, include the ) as well. I like that. That doesn't solve the apostrophe proble, though, but I guess that's less severe, we could just include it? It (only) breaks people using ' as quote.
Comment on attachment 194945 [details] [diff] [review] Possible Patch, v2 (not sure, if good idea or WONTFIX) Example: http://en.wikipedia.org/wiki/Bulb_(disambiguation) (very common in wikipedia) http://en.wikipedia.org/wiki/Lemarchand's_box
Attachment #194945 - Attachment is obsolete: true
OK, I want to get this fixed.
Severity: minor → normal
Priority: -- → P2
Its not only common in Wikipedia, Mozillazine uses it quite often as well i.e. http://kb.mozillazine.org/Reducing_memory_usage_(Firefox) The essence imho is; Thunderbird currently DOESNT follow web standards in that () are approved characters in urls. Instead, it is 'fudging' the standards to meet developers expectations. Funny, when Microsoft does this, its propeity code and wrong, but here is Thunderbird doing the same exact thing. Currently you have to tell people they CANNOT quote a legimate website http://kb.mozillazine.org/Reducing_memory_usage_(Firefox) because Thunderbird wont render it properly - even tho there is NOTHING wrong with the web site at all! I would MUCH rather have Thunderbird properly render the () as authorized url characters which they in fact are, and then tell someone why HIS rendition of a url (http://mozilla.com/html) doesnt work (the extra )). Simply tell the person NOT to use () to quote.
(In reply to comment #65) When this bug (and so many related to it) was logged I would have agreed with you wholeheartedly, since RFC 2396 [Aug 1998] was in effect and "(" and ")" characters were permissible as data characters in URI's. Four years later, though, Thunderbird's behaviour is now "accidentally compliant" with 2396's successor, RFC 3986 [Jan 2005], which moves "(" and ")" into sub-delims, meaning that they should be pct-encoded as %28 and %29 when appearing as data characters in URI's. Ref: http://www.ietf.org/rfc/rfc3986.txt It seems that we should now be chasing the web masters of popular sites like Google Maps, Wikipedia and Mozillazine, to get them to fix their URI encoding. Or we could wait another four years and see what standards are in effect then. :)
Guys, this is not about standards. They are the ground rule, but this is a fuzzy recognizer, see my comments above. Unless somebody finds a problem with my idea in comment 62 or has better ideas: Please stop from making comments.
Here's another idea (in addition to comment 62): bug 356657. Only helps, if sender used Thunderbird/Seamonkey.
I frequently receive correspondance from companies which include ( and ) as part of their URL. I had a survey which I filled out this past week for a company wanting to know how I felt about their customer service, and I was directed to the below URL, and whether this is wise from a standards perspective seems to be moot when compared to the fact that the links do not get properly interpreted as would be reasonably expected by a human. I should add, though, that there was a bunch of other junk added to allow the browser to interpret it as a possible HTML link but since I display in plain text, I was unable to click without a bunch of cut and paste. (I just tested this with the message in question, and it was never intended to be an HTML e-mail, so perhaps the sender just tried to get fancy or something and Thunderbird didn't know what to properly do with it? Should Thunderbird even try to interpret such thing?) The link I received was as follows: http://surveys.benchmarkportal.com/ebay/survey.taf?survey_id=1283&amp;use r_id=19B0E8D6-9BFE-4FB7-9996-90BF83F5AD10 Within its context, though, it actually looked like this mess: &lt;br&gt;&lt;br&gt;&lt;a href="http://surveys.benchmarkportal.com/ebay/survey.taf?survey_id=1283&amp;use r_id=19B0E8D6-9BFE-4FB7-9996-90BF83F5AD10"&gt;http://surveys.benchmarkportal.c om/ebay/survey.taf?survey_id=1283&amp;user_id=19B0E8D6-9BFE-4FB7-9996-90BF83F5A D10&lt;/a&gt;&lt;br&gt;&lt;br&gt; The URL was spread across multiple lines and even with the quotes around the URL, it was misinterpreted by thunderbird. I think it is PROBABLY safe to suggest that anything beginning with a quotation mark followed by the string http:// is a URL which should be interpreted as a URL until the next occurrence of a quotation mark, and one can probably make the same assumption with other network protocols like ftp:// or https:// for example. (Whether this is easy to code for or not, I don't know.) What I was able to click was only everything up until a line break occurred in Thunderbird.
Attached patch proposed fix (obsolete) (deleted) — Splinter Review
This fixes it for all cases I have found.
Attachment #251951 - Flags: review?(ben.bucksch)
Nit: separate keywords from parentheses with a space: while (...) ...; if (....) ...; ^ ^ You could perform the same matching-parenthesis test at the end for matching apostrophe. I don't think this is common, so maybe it's not worth testing for, but it's possible someone would put in the URL as: 'http://mozilla.org'
Comment on attachment 251951 [details] [diff] [review] proposed fix Thanks for the patch, but the code in the |if (state[check] == endok)| doesn't fit the design of the code. Any reason not to put it in the code you commented as "back up a bit"? No reason to change the loop from for to while. And it *is* iterating. Comments are not entirely correct, but comments would make sense, yes. Unless you have a reason why the above proposal won't work, I'll attach a patch for that.
Attachment #251951 - Flags: review?(ben.bucksch) → review-
> Any reason not to put it in the code you commented as "back up a bit"? Nevermind, I realized it. However, your code - apart from not being in FindURL where is belongs - would not recognize (for example http://www.yahoo.com) correctly, it only looks at the char before the URL start. See comment 62.
Attached patch Fix, v4, broken (obsolete) (deleted) — Splinter Review
Here's what I mean: - Changes for loop to be more understandable - Allows ' in middle of URL, but not at end - Allowes ) in middle of URL, but not at end, unless there was a ( earlier However, it doesn't work as intended. I don't see why. Magnus, when looking at the code, can you see what's wrong?
Attachment #251951 - Attachment is obsolete: true
Attachment #251988 - Flags: review?(mkmelin+mozilla)
> However, it doesn't work as intended. I don't see why. 2 reasons: - should be "&& !haveOpeningBracket" (forgot the ! ) - I stupidly built static, so my changes were not picked up. rebuilding.
Attachment #251988 - Flags: review?(mkmelin+mozilla)
Attached patch Fix, v5 (obsolete) (deleted) — Splinter Review
Yup, that was it. Fix works as intended. I couldn't find anything it breaks either. Magnus, please review.
Attachment #251988 - Attachment is obsolete: true
Attachment #251997 - Flags: review?(mkmelin+mozilla)
Attachment #251997 - Attachment is patch: true
Attachment #251997 - Attachment mime type: text/x-patch → text/plain
I'll be away for a couple of days, so I wont be able to try it out until monday. Looks like it could work though. Assuming it works, it sure would be nice to get this in for tb2.0... Do we want to fix the 'http://mozilla.org' case too? Don't know how usual that is, but it would be pretty straight forward fix.
Attached patch Fix, v6. Alternative to v5: ) only after ( (obsolete) (deleted) — Splinter Review
With the patch v5, http://test)foo is recognized. This patch is an alternative, allowing ) only after a (. I.e. http://test_(Comp) and http://test(lookup)bla are recognized, but http://text)foo is not. > Do we want to fix the 'http://mozilla.org' Works before and after the patch (both versions).
Attachment #252040 - Flags: review?(mkmelin+mozilla)
Attachment #252040 - Attachment is patch: true
Attachment #252040 - Attachment mime type: text/x-patch → text/plain
Attachment #251997 - Flags: review?(mkmelin+mozilla)
Attached file My old test folders (ZIP with mbox) (deleted) —
Here are my old test mails. Quite a lot. For thorough testing, they should be checked with all relevant pref combinations.
Attached file Test mail (obsolete) (deleted) —
Ops, the last attachment was actually intended for another bug, but may be useful here as well. Here's a mail with testcases specifically for the brackets and similar. Clicking on it should open it in the browser and render it with the browser's libmime, if available. You can also copy the file to LocalFolders and open it in Thunderbird.
Attached image Test mail screenshot (with patch v6) (deleted) —
Comment on attachment 251997 [details] [diff] [review] Fix, v5 Yes, I like v6 better
Attachment #251997 - Attachment is obsolete: true
Comment on attachment 252040 [details] [diff] [review] Fix, v6. Alternative to v5: ) only after ( Seems good except for one case I happend to stumble upon. In a bug it said: per foo@bar.com's suggestion. For emails apostrophes should mark the end, no? And hypothetical cases like foo@(bar).com probably shouldn't be links either. I'm not a reviewer, but with that fixed, seems ok to me.
This is fix v6 + don't allow ' and ( in email address recognition.
Attachment #252040 - Attachment is obsolete: true
Attachment #253854 - Flags: review?(ben.bucksch)
Attachment #252040 - Flags: review?(mkmelin+mozilla)
Ben: have you had time to look at this?
Attached patch Fix, v8 (deleted) — Splinter Review
Yes. The 2-line check you added was in the wrong place (there's a similar check later, should be added there). I just fixed that. I'll also disallow [ and {. Currently, ] and } are disallowed (reasons see above), so including the opening ones doesn't make any sense whatsoever.
Attachment #253854 - Attachment is obsolete: true
Attachment #254729 - Flags: review?(mscott)
Attachment #253854 - Flags: review?(ben.bucksch)
Attached file Test mail (deleted) —
Attachment #252195 - Attachment is obsolete: true
Attachment #254729 - Flags: review?(mscott) → review+
Attachment #254729 - Flags: superreview?(mscott)
Attachment #254729 - Flags: superreview?(mscott) → superreview?(bienvenu)
Comment on attachment 254729 [details] [diff] [review] Fix, v8 ops, this is netwerk, so I guess I need a netwerker sr. trying biesi.
Attachment #254729 - Flags: superreview?(bienvenu) → superreview?(cbiesinger)
Comment on attachment 254729 [details] [diff] [review] Fix, v8 sr=biesi
Attachment #254729 - Flags: superreview?(cbiesinger) → superreview+
Fix v8 checked into trunk. Will bake on trunk for a few days, then ask for permission for branch.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Can somebody QA this on trunk, please, so that we have a chance to get it into TB2.0? For expected results, see screenshot in comment 82.
Attachment #254730 - Attachment mime type: message/rfc822 → text/plain
Attached image Screenshot with patch v8 on trunk (deleted) —
Tested with Thunderbird Version 3 alpha 1 (20070226) on Windows. Ben, have a look at the screenshot if the output is that what we want to have.
Comment on attachment 254729 [details] [diff] [review] Fix, v8 Thanks Henrik. Yes, that output is as expected. Requesting TB2 branch approval.
Attachment #254729 - Flags: approval-thunderbird2?
Attachment #256518 - Attachment description: Screenshot with patch v6 on fixed trunk → Screenshot with patch v8 on trunk
Attachment #254729 - Flags: approval-thunderbird2? → approval-thunderbird2+
Whiteboard: [checkin needed]
QA Contact: olgam → backend
Whiteboard: [checkin needed]
checked into branch.
WFM with Tb 2.0pre 2007030604. Good job guys! - especially as this bug was about to get Wontfix'ed 3 years ago :)
^ wrong clipboard content: the build id is 2007030703 (Lin)
No, IIRC that was not intentional.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Branch apparently got an older version of the fix. Trunk seems OK. I don't know what the branch rules are. Are correctness fixes allowed or only critical and security fixes? If not allowed, I'll mark the bug fixed again.
Ben, if trunk is fine we could leave it as fixed. Due to the wrong patch checked into 1.8.1.3 we should ask for approval 1.8.1.7. Could you attach the patch which corrects the mistake?
Status: REOPENED → RESOLVED
Closed: 18 years ago17 years ago
Flags: blocking1.8.1.7?
Keywords: fixed1.8.1.3
Resolution: --- → FIXED
Bug 390779 comment 8 is the result of the bad branch check-in.
Attached patch Branch fixup, v1 (obsolete) (deleted) — Splinter Review
This is the diff between branch and trunk on my disk, only the relevant part for this bug. I have *not tested* it yet, will do later, or somebody else can if you want to.
No longer depends on: 390779
Attached image Branch fixup screenshot (deleted) —
Attachment #275242 - Flags: approval-thunderbird2?
Tested Branch fixup, v1 with current branch, it indeed brings it on par with trunk, see screenshot. mscott: please approve. This is to fix bug 390779.
Comment on attachment 275242 [details] [diff] [review] Branch fixup, v1 >--- 2.0-branch/mozilla/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp 2007-01-21 04:53:17.806002000 +0100 >+++ trunk/mozilla/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp 2007-06-17 13:27:56.999825000 +0200 Dunno if we need a real diff against CVS 1.8 branch here. You only supplied your local diff between trunk and 1.8 branch. Adjusting approval flag to 1.8.1.7. No review needed?
Attachment #275242 - Flags: approval-thunderbird2? → approval1.8.1.7?
Attached patch Branch fixup, v1 (deleted) — Splinter Review
Same thing, but if you insist on it. The patch already had review and checkin-approval for branch. I simply checked in the wrong patch.
Attachment #275242 - Attachment is obsolete: true
Attachment #275313 - Flags: approval1.8.1.7?
Attachment #275242 - Flags: approval1.8.1.7?
Flags: blocking1.8.1.7? → blocking1.8.1.7+
Comment on attachment 275313 [details] [diff] [review] Branch fixup, v1 approved for 1.8.1.7, a=dveditz for release-drivers
Attachment #275313 - Flags: approval1.8.1.7? → approval1.8.1.7+
Ben, can you check-in attachment 275313 [details] [diff] [review] on MOZILLA_1_8_BRANCH, please?
Checked in on branch.
Keywords: checkin-needed
Keywords: fixed1.8.0.7
Target Milestone: --- → mozilla1.9alpha6
hskupin: Ben checked this into the 1.8 (.1) branch, not the 1.8.0 branch. Also, please do not change the target milestone unless it's your bug, that's for developer's use. But I'm not sure what you were getting at anyway since this was apparently already fixed on the trunk before you set it.
Keywords: fixed1.8.0.7
Target Milestone: mozilla1.9alpha6 → ---
Daniel, sorry for the fixed1.8.0.7 keyword. Yes, I ment fixed1.8.1.7. Something got wrong. I set the target milestone after the bug was fixed and which should give a hint for which version this issue is resolved. I don't think that this is inconvenient because Ben forgot that, isn't it? At least it will give us better results when doing QA.
Is the target milestone really used that way? Ok, guess that's all right if the bug's actually fixed.
Target Milestone: --- → mozilla1.9alpha6
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: