Closed Bug 7399 Opened 25 years ago Closed 25 years ago

Escaping illegal chars in URLs

Categories

(Core :: Networking, defect, P3)

x86
Windows NT
defect

VERIFIED DUPLICATE of bug 10373

People

(Reporter: hjtoi-bugzilla, Assigned: gagan)

Details

Recently http://www.biztalk.org had spaces in links. They worked in IE and Opera, but not in Netscape or Gecko. They later changed the spaces to underscores. In the XML world at least, the browser should escape ALL illegal characters in URLs (I just read a mail about that today, but can't remember which list it was on). So if there are spaces in URLs, the browser should automatically escape them as %20. Gecko understands escaped URLs; it is just a matter of doing the escaping... The URL has a doc that contains one link that points to a file with a space in its name. IE handles that fine; NS and Gecko fail.
There are some problems with this: 1) Different URL RFCs have different ideas of what the illegal characters are. 2) Should the URL, as given in the document, already be legal? Is it the job of the browser to correct a URL when the correction might mess up the server? (What do current browsers do here?) I think one may end up having to stick to tradition on this, but I'm not really sure what the URL RFCs say about correcting URLs. (When the site you mention above had spaces in links, was the whole thing in quotes? If not, then the problem was with parsing.)
It took some time to find where I had read that piece about illegal characters in URIs (note, _URI_). The URLs below should answer your questions. The discussion happened on XML-DEV. Here is a link to the archived thread you should read: http://www.lists.ic.ac.uk/hypermail/xml-dev/xml-dev-May-1999/0573.html Here are some relevant URLs extracted from the discussion: http://www.w3.org/TR/WD-charmod#URIs http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2
According to those last two links, which point to the latest working draft of the W3C Character Model and to HTML 4 section B.2.1 respectively, we should indeed be escaping URIs. 1) We should probably take the superset of the illegal characters. That way all bases are covered. 2) Yes, the URI in the document should indeed be legal. No, I would say that it is not our job to correct it. However, we should certainly not be sending invalid URIs to servers, so I suggest that encoding would be best. Currently, we are dropping spaces in URIs altogether (this happens somewhere in the content sink, see bug 8319). We should certainly not be doing this.
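For illustration only, here is a minimal Python sketch of the escaping suggested above: characters outside a "safe" set are encoded as UTF-8 and each byte is written as %HH, as HTML 4 section B.2.1 recommends for non-ASCII characters. The function name escape_illegal and the SAFE set are assumptions made for this example, not Gecko's code or a list taken from any particular RFC. "%" is treated as safe so that URLs that are already escaped are not escaped twice.

```python
from urllib.parse import quote

# Hypothetical "safe" set for this sketch: URI delimiters and sub-delimiters,
# plus "%" so that existing %HH escapes pass through unchanged.
# Letters, digits and "_.-~" are never escaped by quote().
SAFE = ";/?:@&=+$,#%!*'()[]"

def escape_illegal(url: str) -> str:
    """Percent-escape every character outside SAFE, using UTF-8 bytes."""
    return quote(url, safe=SAFE)

if __name__ == "__main__":
    # A link with a space in the file name, like the one in this report.
    print(escape_illegal("http://example.org/my file.html"))
```

With this set, a space becomes %20 instead of being dropped, while a URL that already contains %20 is left as it is. A stray "%" that does not start a valid escape is also left alone, which a stricter implementation might want to handle differently.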
Pushed past necko landing...
Changing all Networking Library/Browser bugs to Networking-Core component for Browser. Occasionally, Bugzilla will burp and cause Verified bugs to reopen when I do this in a bulk change. If this happens, I will fix. ;-)
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → DUPLICATE
*** This bug has been marked as a duplicate of 10373 ***
Status: RESOLVED → VERIFIED
Bulk move of all Networking-Core (to be deleted component) bugs to new Networking component.