Closed
Bug 591953
Opened 14 years ago
Closed 13 years ago
"Open link" context menu should be activated for urls with dashes.
Categories
(Firefox :: Menus, defect)
Firefox
Menus
Tracking
()
VERIFIED
FIXED
Firefox 7
People
(Reporter: sylvain.pasche, Assigned: dao)
References
(Depends on 1 open bug, )
Details
Attachments
(1 file, 1 obsolete file)
(deleted),
patch
|
Gavin
:
review+
|
Details | Diff | Splinter Review |
It's valid to have dashes in urls, and selecting such urls (such as foo-bar.com) should show the "Open link" context menu, like it does for urls without dashes (google.com).
Updated•14 years ago
|
Assignee | ||
Comment 3•13 years ago
|
||
Comment 4•13 years ago
|
||
Does this patch fixes this bug for IDN urls?
(In reply to comment #3)
> Created attachment 537631 [details] [diff] [review] [review]
> accept hyphen and reject trailing non-alpha character
If you want to be really pedantic (er, I mean, accurate), it ought to be:
/^(?:[a-z\d](?:[a-z\d-]*[a-z\d])*)(?:\.[a-z\d](?:[a-z\d-]*[a-z\d])*)+$/i
Underscores are invalid (so so \w), and the hyphen could not be the first or last character of each dot-limited section.
> Underscores are invalid (so so \w), and the hyphen could not be the first or
> last character of each dot-limited section.
s/so so/so no/
Assignee | ||
Comment 7•13 years ago
|
||
a-z\d is too restrictive. \w may match special alphabetical characters beyond a-z.
(In reply to comment #7)
> a-z\d is too restrictive. \w may match special alphabetical characters
> beyond a-z.
Such as? In both Perl, \w is equivalent to [a-z0-9_], and that appears to be the case for Gecko's JS engine as well:
alert(/\w/.test("eee")); -> true
alert(/\w/.test("ééé")); -> false
(I assume you meant accented characters by "special alphabetical characters"? That would be an IDN, which would be unwieldy to cover using re.)
Ack, s/both//. I really should proofread before hitting that button!
Assignee | ||
Comment 10•13 years ago
|
||
Such as ü:
data:text/html;charset=UTF-8,www.günteramendt.de
é works as well.
Assignee | ||
Comment 11•13 years ago
|
||
Err, I think günteramendt.de in www.günteramendt.de matches because of \D\S*. I'm not sure under which conditions SpiderMonkey expands \w, or if it stopped doing so. Anyway, I don't think we need to exclude '_', prevent '-' from leading in a segment, etc.
Assignee | ||
Comment 12•13 years ago
|
||
This handles umlauts, accents, etc.. Since this makes the character class more complex anyway, I also replaced \w with a-z\d.
Attachment #537631 -
Attachment is obsolete: true
Attachment #537664 -
Flags: review?(gavin.sharp)
Attachment #537631 -
Flags: review?(gavin.sharp)
Comment 13•13 years ago
|
||
I still don't think that re is the best way to handle IDNs: there's more that we need to handle beyond accented Latin characters.
E.g., президент.рф (bug 612146) or the .中国 TLD that the Chinese recently started using.
I think the best way to do it is something along the lines of:
try {
linkText = Components.classes["@mozilla.org/network/idn-service;1"].
getService(Components.interfaces.nsIIDNService).
convertUTF8toACE(linkText);
} catch(e) {
linkText = "";
}
// Standard regexp test here
convertUTF8toACE will throw an exception if there are invalid characters or if it can't un-IDN the string and then we can run re on the results as usual to catch the things that convertUTF8toACE doesn't catch (e.g., the presence of whitespace).
This would properly fix bug 612146 and make Alexander L. Slovesnik (comment 4) happy.
Comment 14•13 years ago
|
||
Why not implement bug 540787 solution and use the method? I mean, if we're doing this already...
Comment 15•13 years ago
|
||
(In reply to comment #14)
> Why not implement bug 540787 solution and use the method? I mean, if we're
> doing this already...
The most that could do is replace the re portion of this thing. When I skimmed through the implementation of findURLInPlaintext, it did not appear to be IDN-aware. And I am not convinced that it would do a much better job than a re (the one in comment 5 conforms precisely to the relevant RFCs; you can't really improve on that). It may be slightly more efficient than a re, but in situation like this, that is just background noise (since this is matching over a short string and only when the context menu is activated).
Assignee | ||
Comment 16•13 years ago
|
||
(In reply to comment #13)
> I still don't think that re is the best way to handle IDNs: there's more
> that we need to handle beyond accented Latin characters.
>
> E.g., президент.рф (bug 612146) or the .中国 TLD that the Chinese recently
> started using.
[...]
> This would properly fix bug 612146 and make Alexander L. Slovesnik (comment
> 4) happy.
I didn't set out to fix bug 612146. Bug 612146 can deal with that.
Comment 17•13 years ago
|
||
(In reply to comment #16)
> I didn't set out to fix bug 612146. Bug 612146 can deal with that.
Okay, that makes sense organizationally; the reason I suggested that was because you were making accommodations for a subset of IDN domains, and I thought it would be easier to get it all done in one swoop instead of making a re change here and then immediately changing it again for 612146.
Comment 18•13 years ago
|
||
Comment on attachment 537664 [details] [diff] [review]
accept hyphens and umlauts/accents/etc., reject trailing non-alpha character
\xc0-\xff seems kind of arbitrary (why stop at 0xff?). Can we omit that for the moment and leave it to bug 612146?
Attachment #537664 -
Flags: review?(gavin.sharp) → review+
Assignee | ||
Comment 19•13 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → Firefox 7
Comment 20•13 years ago
|
||
Mozilla/5.0 (Windows NT 5.1; rv:7.0a1) Gecko/20110627 Firefox/7.0a1
Verified issue on Ubuntu 11, WinXP, Mac OS X 10.6, Win 7. Selecting URL <foo-bar.com> show "Open link" in the context menu.
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•