Closed
Bug 659569
Opened 14 years ago
Closed 14 years ago
Unable to connect to Ubiquiti Network radio equipment w/o getting stuck in redirect loop.
Categories
(Core :: Networking: Cache, defect)
Core
Networking: Cache
Tracking
()
People
(Reporter: jst, Assigned: bjarne)
References
Details
(Keywords: regression, Whiteboard: [qa-])
Attachments
(6 files)
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
bzbarsky
:
review+
jst
:
approval-mozilla-aurora+
christian
:
approval-mozilla-beta-
|
Details | Diff | Splinter Review |
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
bzbarsky
:
review+
mcmanus
:
review+
briansmith
:
feedback-
|
Details | Diff | Splinter Review |
This is so far something I've only ever seen when trying to log into the admin interface on a Ubiquiti wifi radio and but so far it seems like something that can happen elsewhere as well. The problem is that Firefox gets stuck in a redirect loop and ends up displaying the error for that. Hitting try again gets past that and all the way to the login screen, but logging in is unsuccessful. If I at this point clear the cache, I'm able to log in and things work fine for days until I get kicked out and need to log in again.
If I test with a clean profile and disable both the disk cache and the memory cache before navigating to the radio interface then I don't see the above redirect problem.
This is a regression since 4, so I think we should track this for 5 and 6 as this could be a more widespread problem than what I'm seeing.
Unfortunately I so far can only reproduce this issue with one of these radios locally, but I'm bisecting right now to see what changeset caused this, and maybe that'll shed some light on what might be going on here. I can attach logs etc too, if it's not clear from the changeset itself.
Reporter | ||
Comment 1•14 years ago
|
||
This regressed with revision http://hg.mozilla.org/mozilla-central/rev/6fe1e7832dc8. Cc:ing Bjarne.
Reporter | ||
Comment 3•14 years ago
|
||
This is a URILoader:5,LoadGroup:5,DocLoader:5,nsHttp:5,nsSocketTransport:5 log from before this broke (changeset b490b2316729, with s/PRUInt32/PRUint32/ in nsHttpTransaction.cpp to make it build).
Reporter | ||
Comment 4•14 years ago
|
||
Reporter | ||
Comment 5•14 years ago
|
||
... and in those logs the failing load is from 192.168.1.1.
Assignee | ||
Comment 6•14 years ago
|
||
This redirection uses a pattern which causes nsHttpChannel::CheckCache() to clear |mRedirectedCachekeys| prematurely, hence we lose the redirect-chain and the solution doesn't work anymore. Actually a quite stupid mistake in the original fix, but that's life once in a while...
I was able to reproduce by modifying test for bug #561276 to use the same pattern. Fix is to *not* clear |mRedirectedCachekeys| at the wrong time - see patch.
Passes local testing. Pushed to tryserver.
Updated•14 years ago
|
Comment 7•14 years ago
|
||
Comment on attachment 535036 [details] [diff] [review]
Proposed solution
r=me
Attachment #535036 -
Flags: review?(bzbarsky) → review+
Reporter | ||
Comment 8•14 years ago
|
||
Thanks for the quick fix Bjarne! Seems to work wonderfully for the initial page that ends up stuck in the redirect loop, but the need to clear the cache before being able to log in remains. I'll capture logs for that as well.
Reporter | ||
Comment 9•14 years ago
|
||
This is a log with Bjarne's patch applied of the same working build where the login page loads fine, but the login itself fails. The behavior here is that the login page loads, I enter the credentials and hit enter and I'm presented with the same login page again instead of actually getting logged in etc.
The radio name in this question is ap-bud instead of an IP address (sorry, had to switch radios here).
Reporter | ||
Comment 10•14 years ago
|
||
Er, sorry, I confused myself there... The previous log is of a working login, where the login succeeds. The upcoming log is of a session where the login itself does not work.
Reporter | ||
Comment 11•14 years ago
|
||
Here the login page loads fine (no redirect loop), but logging in brings me back to the login page rather than logging in. If I clear the cache, I can successfully log in.
Comment 12•14 years ago
|
||
can you guys get this into central so we can get this tested? will consider for 5 and 6.
Reporter | ||
Comment 13•14 years ago
|
||
Asa, this is only partially fixed by the attached patch. We could probably land that separately (and I'll go ahead and do that), but we need more work here.
Bjarne, if it's not clear from the logs what's going on here I could bring in one of these radios and hook it up to the network in the office so you can test it yourself. Let me know if that would help.
Reporter | ||
Comment 14•14 years ago
|
||
Proposed solution pushed as:
http://hg.mozilla.org/mozilla-central/rev/f584467626df
But there's more to do here, so leaving this bug open.
Assignee | ||
Comment 15•14 years ago
|
||
(In reply to comment #13)
> Bjarne, if it's not clear from the logs what's going on here I could bring
> in one of these radios and hook it up to the network in the office so you
> can test it yourself. Let me know if that would help.
That would be infinitely useful, please... :)
Reporter | ||
Comment 16•14 years ago
|
||
Ok, will do. I'll send you the address and login info etc once it's set up (hopefully tomorrow).
Assignee | ||
Comment 17•14 years ago
|
||
Thanks for the test-setup. Makes life a lot simpler... :)
The login-issue can be described as follows: From the login-page we POST the credentials. The response is a 302 pointing to "/" and necko translates this into a "GET /" request (see bug #598304). We have already loaded "/" and the result is cached, and it happens to be a 301 pointing to the login-page. Necko follows the redirect to the login-page and we're back to the beginning. The redirect-chain doesn't help here because the response to the login-page is a 200.
This worked previously because necko always validated cached redirects if a cookie was set in the request, hence it didn't use the cached 301-response. The real problem is that the 301-response is cached and it could easily be resolved on the server by specifying no-cache or must-validate.
(Note btw that changing the request-method on a 302 (or 301 for that matter) is dead wrong (see RFC2616, notes at the end of sections 10.3.2 and 10.3.3) but according to bug #598304 it might be necessary to avoid breaking the web.)
But how do we fix this? The original code used the presence of a cookie in the request to detect the situation, but bug #561276 decided that this was too coarse-grained. IMO this situation may occur only for replacement-channels where the request-method has changed. I.e. we could carry fwd the request-method of an originating channel to the replacement-channel and compare.
Comment 18•14 years ago
|
||
So the server should be sending a "Vary: Cookie" in this case and is not, right?
Assignee | ||
Comment 19•14 years ago
|
||
Not really. The cookie is also sent in the earlier, cached request. I assume the server applies a design-pattern which uses the cookie to identify a session but maintains state internally in the session, returning 301's to different pages depending on state. The real, uhh, confusing bit is why the server uses a 301 ("moved permanently") for this. By just replacing 301 with 302 the problem would be resolved since 302s are not cached unless explicitly specified... :(
See e.g. attachment #535107 [details]: The first request for "/" results in a 301 to "cookiechecker". The request for "cookiechecker" results in a 301 back to "/". Then, the request for "/" is found in the cache but CheckCache() decides to validate it (probably because it's captured without the redirect-chain-patch). The subsequent request for "/" sets the cookie, but now the server responds with a 301 to "login". The request for "login" also sets the cookie and results in a 200 and at this point we have a cached 301-response for "/" with the cookie set. After this we see a number of req/res for resources probably displayed on the login-page, before we get to the POST with credentials. Server responds with a 302 to "/" and observe that CheckCache() decides to validate the cached 301-entry for "/" again. This validation saves us because we get a 200 when requesting "/" with cookie set again. Normally we should have re-used the cached 301 which would send us back to "login". This is what happens in attachment #535110 [details].
The question is whether we should hack around this or not. I have a patch which does as suggested in last para of comment #17, but it essentially breaks RFC2616 (i.e. in this case makes the 301 non-cacheable) in order to allow servers to use a design-pattern and misuse 301 like described above. IMO we should go back to whoever writes this stuff and tell them to use 302 instead in these cases.
Reporter | ||
Comment 20•14 years ago
|
||
Fwiw, I tested IE and Chrome with one of these radios and neither browser has a problem logging into them. So I think we need to do *something* here, but I don't know enough about this to say what.
Assignee | ||
Comment 21•14 years ago
|
||
(In reply to comment #19)
> Not really. The cookie is also sent in the earlier, cached request.
To clarify: The cookie-values in this case are the same so "Vary: Cookie" wouldn't help, unfortunately.
(In reply to comment #20)
> Fwiw, I tested IE and Chrome with one of these radios and neither browser
> has a problem logging into them. So I think we need to do *something* here,
> but I don't know enough about this to say what.
Would be interesting to know if IE/Opera receives the same content we do... Otoh I perfectly agree that we must do something - IMO we have 3 options:
1) apply the hack I'll attach in a little while (deliberately allowing such behaviour from servers)
2) back out the fix for bug #561276 (losing the advantage of caching 3xx's which are used correctly)
3) use the power of our current market-share and communicate with whoever writes this software and make them change it
I've encountered the dilemma 1 vs 3 in a number of other bugs and I'd expect to see it in also in future issues, hence it might be worth to take this discussion on a principal level in order to get clear guidelines (or maybe there are some already that I'm unaware of..?)
I don't know how widespread this software is (or whether it is maintained and upgraded) so I'll leave the decision of what to do to others.
Assignee | ||
Comment 22•14 years ago
|
||
I don't like this solution but it allows me to log into the wireless and navigate it. If we want to follow this path, and if considered necessary, I could probably put together a test.
Attachment #535589 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 23•14 years ago
|
||
Comment on attachment 535589 [details] [diff] [review]
Hack to detect and fix pattern as described in comment #17
Btw, it passes local testing and has been sent to tryserver.
Comment 24•14 years ago
|
||
Comment on attachment 535589 [details] [diff] [review]
Hack to detect and fix pattern as described in comment #17
Looks ok to me from a code point of view. I think Patrick or Jason are more qualified to review the HTTP interaction "should we do this?" end of things.
Attachment #535589 -
Flags: review?(mcmanus)
Attachment #535589 -
Flags: review?(jduell.mcbugs)
Attachment #535589 -
Flags: review?(bzbarsky)
Attachment #535589 -
Flags: review+
Comment 25•14 years ago
|
||
Comment on attachment 535589 [details] [diff] [review]
Hack to detect and fix pattern as described in comment #17
I would not be surprised if other browsers are invalidating the target of the redirect from the POST, since the spec says [1]:
"Some HTTP methods MUST cause a cache to invalidate an entity. This is either the entity referred to by the Request-URI, or by the Location or Content-Location headers (if present). These methods are: PUT, DELETE, POST. In order to prevent denial of service attacks, an invalidation based on the URI in a Location or Content-Location header MUST only be performed if the host part is the same as in the Request-URI."
I think we should do precisely what the spec says: If the host in the Location response header field of a POST (or PUT or DELETE) is the same as the host in the request URI, then doom the entries mentioned in the Location header field (and, similarly for Content-Location) when processing the POST (or PUT or DELETE).
[1] http://tools.ietf.org/html/draft-ietf-httpbis-p6-cache-14#section-2.5
Attachment #535589 -
Flags: feedback-
Comment 26•14 years ago
|
||
Comment on attachment 535589 [details] [diff] [review]
Hack to detect and fix pattern as described in comment #17
I don't have a problem with what the patch does, so I'm happy to r+ it. We more or less want to bring the not-from-cache semantics of the POST through to the rewritten request.
If I understand Brian's suggestion it boils down to invalidating any cached resource (probably from a GET) that has POST/PUT/etc performed on it even for future GETs.. that has a certain logical attraction to it, but I would want to ponder how that interaction impacts the cache for deployed websites and imo we should study that in a different bug.
Attachment #535589 -
Flags: review?(mcmanus) → review+
Assignee | ||
Comment 27•14 years ago
|
||
(In reply to comment #25)
> I would not be surprised if other browsers are invalidating the target of
> the redirect from the POST
A fresh pair of eyes is often fruitful - this would be a far better solution (if it applies, which I believe it will)!
I remember reviewing a patch doing something like this some time ago but I cannot find it in any of my searches... Iirc the patch was ok but should be extended with more details so I believe I cleared the r-request. I'll try to dig it up somehow...
Assignee | ||
Comment 28•14 years ago
|
||
Yeah - this is probably also covered by RFC2616 section 13.10 as stated in bug #618835, comment #5. Wonder why I didn't recall this one - good catch Brian. :)
I'll verify that the approach from #618835 fixes the remaining issue of this bug.
Assignee | ||
Comment 29•14 years ago
|
||
WIP patch from bug #618835 fixes this problem. Since comment #14 pushed the fix for the original problem described in this bug, I resolve the remaining issue as a dup of bug #618835. Feel free to correct me if this is wrong procedure.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
Reporter | ||
Comment 30•14 years ago
|
||
Updating title to match what was fixed here. Thanks again Bjarne for digging in here! We'll continue tracking the remaining issue in bug 618835. And I'm going to mark this fixed instead, not that it makes much difference...
Resolution: DUPLICATE → FIXED
Summary: Unable to log into Ubiquiti Network radio equipment w/o clearing the cache. → Unable to connect to Ubiquiti Network radio equipment w/o getting stuck in redirect loop.
Reporter | ||
Updated•14 years ago
|
Reporter | ||
Comment 31•14 years ago
|
||
Comment on attachment 535036 [details] [diff] [review]
Proposed solution
This needs to land on 5 beta and 6 aurora.
Attachment #535036 -
Flags: approval-mozilla-beta+
Attachment #535036 -
Flags: approval-mozilla-aurora+
Comment 32•14 years ago
|
||
Even though this is a Firefox 5 regression, we are going to remove tracking and ship without it because:
1. We don't think the issue is widespread
2. The workaround is pretty simple (just clicking the try again button)
3. The changes are in http code and we are late in the beta cycle
Clearing version-specific tracking flags.
status-firefox5:
--- → affected
status-firefox6:
--- → affected
Attachment #535036 -
Flags: approval-mozilla-beta+ → approval-mozilla-beta-
Comment 33•13 years ago
|
||
Bjarne, do you need to request aurora review to land this for Fx6?
Assignee | ||
Comment 34•13 years ago
|
||
This is not really a prerequisite for bug #618835 but it is related. Please observe that attachment #535036 [details] [diff] [review] was already approved for aurora (afaics) so I guess I can just request check-in?
Reporter | ||
Comment 35•13 years ago
|
||
status-firefox7:
--- → fixed
Comment 36•13 years ago
|
||
Is bug 666021 referring to the same issue as the one that was fixed here? Thanks.
Assignee | ||
Comment 37•13 years ago
|
||
(In reply to comment #36)
> Is bug 666021 referring to the same issue as the one that was fixed here?
Most likely. Could you try the latest nightly build...?
Comment 39•13 years ago
|
||
(In reply to Christian Legnitto [:LegNeato] from comment #32)
> 2. The workaround is pretty simple (just clicking the try again button)
I'm not sure it's worth pointing out at this late date, but the Try Again button is not sufficient to resolve this issue. It gets past the redirect warning, but then one gets stuck on an infinite loop on the device's login page. That is, the login page is presented, user enters a username/password and clicks the submit button (Login), and the login page is again presented, ad-infinitum.
My solution has been to use another browser to manage this device, which is kind of annoying, especially because most of the time when I want to manage it it is because there is a network problem in-progress...
Updated•13 years ago
|
Attachment #535589 -
Flags: review?(jduell.mcbugs)
Comment 41•13 years ago
|
||
qa-: QA is unable to verify this fix, Johnny can you please verify the fix?
Whiteboard: [qa-]
Comment 42•13 years ago
|
||
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #41)
> qa-: QA is unable to verify this fix, Johnny can you please verify the fix?
Sorry for the delay, I wasn't sure if this was targetted at mortals. Under WinXP, Aurora 9.02a2 seems to work just fine connecting to a Ubiquity device that FF6 could not talk to (running Firmware Version: XM.v5.2.1). The Aurora particulars:
Build identifier: Mozilla/5.0 (Windows NT 5.1; rv:9.0a2) Gecko/20110930 Firefox/9.0a2
Built from http://hg.mozilla.org/releases/mozilla-aurora/rev/79eab065f0d0
So looks good!!
Comment 43•13 years ago
|
||
> Aurora 9.02a2 seems to work just fine connecting to a Ubiquity device
> that FF6 could not talk to
I suppose I need to amend that. I'm afraid I lost track of what version I saw the failure under. It was probably FF4, which appears to have been upgraded away from this system. It appears that FF7.0.1 also works just fine to connect to this device.
Reporter | ||
Comment 44•13 years ago
|
||
Yup, this problem is now a thing of the past. Verified.
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•