Closed Bug 85500 Opened 23 years ago Closed 23 years ago

Mozilla includes # anchors in GET URI in some cases

Categories

(Core :: Networking: HTTP, defect, P4)

Tracking

()

VERIFIED FIXED
mozilla0.9.4

People

(Reporter: sharding, Assigned: neeti)

Details

Attachments

(1 file)

When Mozilla does an HTTP GET, if the URL includes more than one '#' character, it includes everything before the last '#' in the URI sent to the server. For example, loading http://foo.example.com/foo.html#bar# will result in:

    GET /foo.html#bar HTTP/1.1

etc., etc. Some servers handle this gracefully and ignore the anchor, but others spit back a 404. Either way, this doesn't seem like correct behavior. Shouldn't it be lopping off everything from the first '#' onward? That appears to be what Netscape 4.7x, IE and Opera do.
Well... a URI with two # characters in it is illegal, no? The second # should be escaped. So you'll get different results depending on whether the browser looks for # from the front (as IE/Opera/4.x seem to) or from the back (as we seem to). Confirming bug, though. We should try to deal with this invalid case in my opinion....
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: FreeBSD → All
Hardware: PC → All
Is it illegal? That wasn't clear to me. I'd agree that it should be avoided, but I wasn't able to find anywhere (specifically looking in RFC 2396) that said there can only be one '#'. Either way, it does exist in the wild, so Mozilla might as well deal with it cleanly.
RFC 2396, Section 2.4.3 -- Excluded US-ASCII Characters:

    The character "#" is excluded because it is used to delimit a URI from a fragment identifier in URI references (Section 4).

    Data corresponding to excluded characters must be escaped in order to be properly represented within a URI.

I agree with you, though. We should consider being compatible with other browsers on this.
    The character "#" is excluded because it is used to delimit a URI from a fragment identifier in URI references (Section 4)

    Data corresponding to excluded characters must be escaped in order to be properly represented within a URI.

Right. I saw that, but I read that as saying that a "#" can't be in a URI. If there are two "#"s, the first one would be delimiting the URI from the fragment identifier and the second one would be part of the fragment identifier. So the real question is whether or not "#" is allowed in fragment identifiers. It turns out that it isn't; I just didn't see that part the first time I looked:

    The character restrictions described in Section 2 for URI also apply to the fragment in a URI-reference.

So, you're right. It's not legal. We agree that Mozilla's handling of it should change, but I'll go ahead and contact the maintainers of the site I first saw this on to let them know about the problem as well.
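To make the rule reached above concrete, here is a minimal sketch (not Mozilla code) of a check for the condition being discussed: under RFC 2396 a URI reference may contain at most one raw '#', and any further '#' has to be escaped as %23. The function name HasAtMostOneFragmentDelimiter is hypothetical and exists only for this illustration.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper, not part of Mozilla: returns true when the URI
// reference contains at most one raw '#'. Under RFC 2396 the single
// allowed '#' delimits the URI from the fragment; any other '#' must
// appear in escaped form ("%23"), which this count does not match.
bool HasAtMostOneFragmentDelimiter(const std::string& uriRef) {
    return std::count(uriRef.begin(), uriRef.end(), '#') <= 1;
}
```

With this check, the URL from the report (http://foo.example.com/foo.html#bar#) is rejected, while the same URL with the second '#' written as %23 is accepted.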
Priority: -- → P4
Target Milestone: --- → mozilla0.9.3
If this could be a long discussion, let's discuss this in a newsgroup. If we decide this is unsupported, I'll send this to evangelism.
Target Milestone: mozilla0.9.3 → mozilla0.9.4
I don't know where the observed behaviour would come from; the urlparser does the right thing, as can be seen with urltest (updated version):

    urltest http://foo.example.com/foo.html#bar#

gives

    http://foo.example.com/foo.html#bar#
    http,,,foo.example.com,-1,/,foo,html,,,bar#,http://foo.example.com/foo.html#bar#

Does anyone have any real live examples? The only possibility I see is that, if this really happens, somewhere down in the http protocol someone does its own parsing ... not good ...
The problem is inside nsHttpChannel::SetupTransaction():

    ...
    // use the URI path if not proxying (transparent proxying such as SSL proxy
    // or socks does not count here).
    nsXPIDLCString requestURIStr;
    const char* requestURI;
    if (!mConnectionInfo->ProxyHost() ||
        mConnectionInfo->UsingSSL() ||
        !PL_strcmp(mConnectionInfo->ProxyType(), "socks") ||
        !PL_strcmp(mConnectionInfo->ProxyType(), "socks4")) {
        rv = mURI->GetPath(getter_Copies(requestURIStr));
        if (NS_FAILED(rv)) return rv;
        requestURI = requestURIStr.get();
    }
    else
        requestURI = mSpec.get();

    // trim off the #ref portion if any...
    char *p = PL_strrchr(requestURI, '#');
    if (p) *p = 0;
    ...

This should be char *p = PL_strchr(requestURI, '#'). Any '#' that is legitimately part of the path or spec is escaped, so a search from the left will never find it and can only hit the fragment delimiter.
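The difference between the two searches can be reproduced outside the tree. The following is a minimal sketch, assuming the standard library's std::strrchr and std::strchr behave like PL_strrchr and PL_strchr for this purpose; the helper names TrimRefFromRight and TrimRefFromLeft are hypothetical and are not the actual patch.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the current behaviour: cut the string at the
// LAST '#', as the PL_strrchr-based code quoted above does.
std::string TrimRefFromRight(std::string requestURI) {
    const char* start = requestURI.c_str();
    const char* p = std::strrchr(start, '#');
    if (p) {
        const std::size_t keep = static_cast<std::size_t>(p - start);
        requestURI.resize(keep);  // drop the last '#' and what follows it
    }
    return requestURI;
}

// Hypothetical stand-in for the proposed behaviour: cut the string at the
// FIRST '#', as a PL_strchr-based search would.
std::string TrimRefFromLeft(std::string requestURI) {
    const char* start = requestURI.c_str();
    const char* p = std::strchr(start, '#');
    if (p) {
        const std::size_t keep = static_cast<std::size_t>(p - start);
        requestURI.resize(keep);  // drop the first '#' and what follows it
    }
    return requestURI;
}
```

For the path from the original report, /foo.html#bar#, TrimRefFromRight yields /foo.html#bar (the request URI the server was seeing), while TrimRefFromLeft yields /foo.html. For a path with a single '#' or none at all, both helpers return the same result.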
Attached patch: patch to fix the problem (deleted)
cc-ing darin, who last worked on that code
Keywords: review
Whiteboard: seeking r/sr
r=darin on the patch
sr=rpotts
a=dbaron (on behalf of drivers)
fix checked in.
Status: NEW → RESOLVED
Closed: 23 years ago
Keywords: review
Resolution: --- → FIXED
Whiteboard: seeking r/sr
Verified per andreas' comment.
Status: RESOLVED → VERIFIED
QA Contact: benc → junruh