Closed
Bug 770243
Opened 12 years ago
Closed 12 years ago
MOZ_ASSERT(mCacheInputStream) failing @ nsHttpChannel::ReadFromCache
Categories
(Core :: Networking: Cache, defect)
Core
Networking: Cache
Tracking
()
RESOLVED
FIXED
mozilla18
People
(Reporter: mayhemer, Assigned: briansmith)
References
Details
(Keywords: regression, Whiteboard: [qa-])
Attachments
(4 files, 2 obsolete files)
(deleted),
patch
|
briansmith
:
review+
akeybl
:
approval-mozilla-aurora+
akeybl
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
mayhemer
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
mayhemer
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
mayhemer
:
review+
|
Details | Diff | Splinter Review |
This should be a higher priority to check on, since when this fails in a release build we don't display any content (cannot read the cache entry and not having a content response from the server). STR, if I remember correctly: 1. configure proxy for authentication (basic), e.g. squid, for all protocols 2. load http://www.websocket.org/echo.html 3. [optional] try to connect (doesn't matter what the result is) 4. change the proxy settings to just set proxy for http, leave all other protocols blank 5. restart Fx 6. load http://www.websocket.org/echo.html => assertion failure > xul.dll!mozilla::net::nsHttpChannel::ReadFromCache(bool alreadyMarkedValid=false) Line 3491 + 0x2c bytes C++ xul.dll!mozilla::net::nsHttpChannel::ProcessNotModified() Line 2210 + 0xd bytes C++ xul.dll!mozilla::net::nsHttpChannel::ProcessResponse() Line 1265 + 0x8 bytes C++ xul.dll!mozilla::net::nsHttpChannel::OnStartRequest(nsIRequest * request=0x086e01e0, nsISupports * ctxt=0x00000000) Line 4784 + 0xe bytes C++ xul.dll!nsInputStreamPump::OnStateStart() Line 416 + 0x2c bytes C++
Reporter | ||
Comment 1•12 years ago
|
||
Just confirming that I am able to reproduce using the STR from comment 0. Hint: you may want to delete the cache before step 1, but I think it is not strictly necessary. Also, the assertion fails only after the proxy auth dialog is confirmed. (I have remembered the password.)
Reporter | ||
Comment 2•12 years ago
|
||
BTW: looks like we have to fix the logs in HttpCacheQuery. Also a log that channel X opens a query Y would be useful...
Reporter | ||
Comment 3•12 years ago
|
||
Simpler STR: 1. configure proxy for authentication (basic), e.g. squid, for all protocols 2. load http://www.websocket.org/echo.html 3. restart Fx 4. load http://www.websocket.org/echo.html Culprit is the following code at [1]: if (httpStatus != 304 && httpStatus != 206) { mCacheInputStream.CloseAndRelease(); } httpStatus is 407. I had to catch this at the review time. [1] http://hg.mozilla.org/mozilla-central/annotate/9c6ad99de47a/netwerk/protocol/http/nsHttpChannel.cpp#l1183
Reporter | ||
Comment 4•12 years ago
|
||
Assignee: nobody → honzab.moz
Attachment #638401 -
Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #638410 -
Flags: review?(bsmith)
Assignee | ||
Comment 5•12 years ago
|
||
Honza, your change looks reasonable but in order to understand why it works it would be good to have test cases: 1. 401 followed by 304 2. 401 followed by cacheable 200 OK 3. 407 followed by 304 4. 407 followed by cacheable 200 OK I guess the cases that are failing now are #1 and #3. We need to test #2 and #4 to test that we're closing the input stream before we try to open an output stream for the cache entry.
Reporter | ||
Comment 7•12 years ago
|
||
Added a test. Quit self explaining.
Attachment #638410 -
Attachment is obsolete: true
Attachment #638410 -
Flags: review?(bsmith)
Attachment #642096 -
Flags: review?(bsmith)
Comment 8•12 years ago
|
||
We either need to take this fix in Aurora 16, or backout bug 722034. Tracking for that release.
tracking-firefox16:
--- → +
Assignee | ||
Comment 10•12 years ago
|
||
Honza, here's an alternate fix that I think is better because: (a) It does not special-case any status codes (b) It avoids acquiring the cache service lock at that point in the code (this may have be responsible for the CACHE_SERVICE_LOCK_WAIT_MAINTHREAD regression.) If you don't think this alternate fix is correct, then we can use yours, which I also think is correct. Note that this patch applies on top of yours. You can just qfold it onto yours.
Attachment #643594 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 11•12 years ago
|
||
Instead of listing the nits, I just created a patch to fix them. Up to you whether you want to fold this onto your patch.
Attachment #643595 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 12•12 years ago
|
||
Comment on attachment 642096 [details] [diff] [review] v1 + test Review of attachment 642096 [details] [diff] [review]: ----------------------------------------------------------------- Honza, thanks for fixing the bug. Your patch seems reasonable to me. I would prefer us to use the alternate fix that I attached, if possible, but if that's problematic then your patch is also good as-is.
Attachment #642096 -
Flags: review?(bsmith) → review+
Reporter | ||
Comment 13•12 years ago
|
||
Comment on attachment 643595 [details] [diff] [review] Address nits in test code Review of attachment 643595 [details] [diff] [review]: ----------------------------------------------------------------- :D funny...
Attachment #643595 -
Flags: review?(honzab.moz) → review+
Reporter | ||
Comment 14•12 years ago
|
||
Comment on attachment 643594 [details] [diff] [review] Close cache input stream only when we're sure we don't need it Review of attachment 643594 [details] [diff] [review]: ----------------------------------------------------------------- Yep, this could work. But aren't you actually removing the optimization of throwing away the stream ASAP we know we won't need it? Test passes for me, but w/o the "funny" patch since I cannot apply it.
Attachment #643594 -
Flags: review?(honzab.moz) → review+
Assignee | ||
Comment 15•12 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #14) > Yep, this could work. But aren't you actually removing the optimization of > throwing away the stream ASAP we know we won't need it? Throwing away the input stream could be expensive because that process takes the cache service lock on the main thread, so it is not necessarily an optimization. Plus, the optimization would only work if we released our descriptor (not just the input stream) so another channel could open a descriptor.
Assignee | ||
Comment 16•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/dcc9891978e1
Target Milestone: --- → mozilla17
Comment 17•12 years ago
|
||
Sorry, but I had to back this out due to xpcshell test failures. https://hg.mozilla.org/integration/mozilla-inbound/rev/a36544756614 https://tbpl.mozilla.org/php/getParsedLog.php?id=13690848&tree=Mozilla-Inbound TEST-UNEXPECTED-FAIL | /home/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_bug770243.js | test failed (with xpcshell return code: 0), see following log: TEST-UNEXPECTED-FAIL | /home/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_bug770243.js | Response body 2 == Response body 3 - See following stack:
Target Milestone: mozilla17 → ---
Assignee | ||
Comment 18•12 years ago
|
||
Out of time to investigate tonight, but I believe the root cause is actually bug 767277; i.e. I think we should have backed out bug 767277 instead (which I will do shortly). See the comment I am about to add to that bug.
Assignee | ||
Comment 19•12 years ago
|
||
Re-pushed to inbound after backing out bug 767277: https://hg.mozilla.org/integration/mozilla-inbound/rev/64b30d7e3932
Backed out again https://hg.mozilla.org/integration/mozilla-inbound/rev/ecb953611462 seems to have been failing a test here quite consistently.
Reporter | ||
Comment 21•12 years ago
|
||
What about the v1 (attachment 642096 [details] [diff] [review])? Does it also cause these problems?
Reporter | ||
Comment 22•12 years ago
|
||
Could we please move here forward? This makes diagnoses of some auth bugs much more complicated. Thanks.
Reporter | ||
Comment 23•12 years ago
|
||
Comment on attachment 642096 [details] [diff] [review] v1 + test https://tbpl.mozilla.org/?tree=Try&rev=b9f5c4565ae2
Assignee | ||
Updated•12 years ago
|
Assignee | ||
Comment 24•12 years ago
|
||
Honza, let me know if you want me to take this back. The test is failing in the same way as it did when these patches got backed out of mozilla-inbound. Bug 767277 wasn't the problem after all. Here's the comment I added to bug 767277 previously about it: (In reply to Brian Smith (:bsmith) from bug 767277 comment #12) > Backed out because I suspect it is causing the test for bug 770243 to fail: > https://hg.mozilla.org/integration/mozilla-inbound/rev/71854c060f90 > > The problem case is this: > > We async dispatch a runnable to close the cache entry "later," thus leaving > the cache entry open for a while after we've marked it valid. During that > time between when we marked the cache entry valid and the time we close it, > another channel decides to open the cache entry. That channel gets a > ACCESS_READ descriptor instead of an ACCESS_READ_WRITE descriptor, and so it > skips the validation of the cached response. > > Bug 770243's tests passed previously when this patch was not applied (we > should verify this last claim). But, with this patch, case 4 of that bug's > tests fails. If I modify that case to use do_execute_soon, then the test > succeeds. Debugging the failure, I saw that the cache is returning an > ACCESS_READ descriptor. This causes HttpCacheQuery::CheckCache() to skip the > validation of the entry on the assumption that it is valid. But, the test > only works when each channel opened does revalidation.
Reporter | ||
Comment 25•12 years ago
|
||
Ah! My new test is failing! As usual, the behavior is from some reason different on windows (the test passes). Why didn't I read that comment sooner more carefully... We are blocked all the time here just because of bad timing. I will add execute-soon or sync-on-cache-thread to the test. If there is anything we *really* need to fix, then let's do it in a different bug (bug 775830?) with reference to this one to have a tool for reproduce/check.
Comment 26•12 years ago
|
||
I've asked in bug 722034 if we should consider performing the backout in https://bugzilla.mozilla.org/show_bug.cgi?id=722034#c79 for FF16 as well. If so, we'll untrack this bug.
Updated•12 years ago
|
Assignee: honzab.moz → bsmith
Assignee | ||
Comment 28•12 years ago
|
||
We have a patch for this already that landed once but got backed out because it was causing a test to fail. We believe that the problem is with the test and that the patches already attached to this bug are safe enough to land in Nightly, Aurora, and Beta as soon as we tweak that test, which I am planning to do today.
Comment 29•12 years ago
|
||
(In reply to Brian Smith (:bsmith) from comment #28) > We have a patch for this already that landed once but got backed out because > it was causing a test to fail. We believe that the problem is with the test > and that the patches already attached to this bug are safe enough to land in > Nightly, Aurora, and Beta as soon as we tweak that test, which I am planning > to do today. Thanks for the update. Sounds like this will make it into beta 5, going to build on Tuesday.
Assignee | ||
Comment 30•12 years ago
|
||
All four patches in this bug need to be qfolded together before landing. Part 4 does three things: 1. Rebases the test (from Parts 1-3) to the current mozilla-central. 2. For each case, adds a check that each response is or is not from the cache, as appropriate. 3. Syncs with the cache between each case to avoid race conditions.
Attachment #664232 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 32•12 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=6336b7c8acb4 It looks shockingly green, except for the Linux "B" failures and the normal Android rainbow. I just retriggered the Linux builds to see if they will go green.
Reporter | ||
Updated•12 years ago
|
Attachment #664232 -
Flags: review?(honzab.moz) → review+
Assignee | ||
Comment 33•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/cbfe6a468a12
Keywords: regression
Target Milestone: --- → mozilla18
Assignee | ||
Comment 34•12 years ago
|
||
Comment on attachment 642096 [details] [diff] [review] v1 + test Try runs for -aurora and -beta are here: https://tbpl.mozilla.org/?tree=Try&rev=5c787fc0db0d https://tbpl.mozilla.org/?tree=Try&rev=be55a62862e4 Note that this approval request is for all four patches in this bug, folded together. [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 746018 User impact if declined: Some pages and sub-resources of pages will not load correctly, especially when HTTP proxy authentication is used. Testing completed (on m-c, etc.): Automated test case, landed today on m-i. Risk to taking this patch (and alternatives if risky): Moderate. There is some possibility that closing the cache entry later could cause unforseen loading problems similar to the problems we encountered and fixed with the automated test. String or UUID changes made by this patch: none
Attachment #642096 -
Flags: approval-mozilla-beta?
Attachment #642096 -
Flags: approval-mozilla-aurora?
Comment 35•12 years ago
|
||
(In reply to Brian Smith (:bsmith) from comment #34) > Risk to taking this patch (and alternatives if risky): Moderate. There is > some possibility that closing the cache entry later could cause unforseen > loading problems similar to the problems we encountered and fixed with the > automated test. We're leaning towards backing out bug 722034 and bug 746018 for FF16 at this point, given the moderate risk evaluation here, the other regression tracked in bug 788365, and the fact that a fix would go into our final beta for the first time (this didn't make beta 5). Can you make the case for taking a forward fix at this point?
Comment 36•12 years ago
|
||
Comment on attachment 642096 [details] [diff] [review] v1 + test [Triage Comment] Approving for Aurora in preparation for Beta 16 consideration.
Attachment #642096 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 37•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/cbfe6a468a12
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Assignee | ||
Comment 38•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/f59da8ed70ea
status-firefox17:
--- → fixed
status-firefox18:
--- → fixed
Assignee | ||
Comment 40•12 years ago
|
||
Bug 766274, especially bug 766274 comment 2 and bug 766274 comment16, show that this happens with regular (not proxy) HTTP auth and that this patch fixes the problem when applied to beta 16.
Comment 41•12 years ago
|
||
Comment on attachment 642096 [details] [diff] [review] v1 + test [Triage Comment] This bug has a medium risk evaluation, so I want to be very clear with why we've decided to approve for Beta. 1) We've seen reports here and in bug 766274, with very standard browser configurations 2) Our alternatives are to back out bug 722034 (deemed risky) or wontfix 3) Brian believes we'll see regressions on Nightly/Aurora over the next week, if any Given this, we're willing to take the fix on mozilla-beta with an escape valve of backing out and wontfixing before next Friday's final release build. In the future, bugs like this need to get attention much sooner in the release.
Attachment #642096 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Assignee | ||
Comment 43•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/e34ef88d3b1e (In reply to Alex Keybl [:akeybl] from comment #41) > In the future, bugs like this need to get attention much sooner in the > release. I agree. And, also, Alex did everything he could to help push this along. The delay was mostly my fault.
status-firefox16:
--- → fixed
Assignee | ||
Comment 44•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/bf691ea80fa4 Add syncWithCacheIOThread to head_cache.js from bug 737615 (written by michal and r=honzab) because the tests require it, a=bustage
Reporter | ||
Comment 45•12 years ago
|
||
Alex, even you were not very happy to approve this patch, I have to say again it is quit important, I was hitting this issue very often. Also, for Beta I personally more trust my patch (the first patch "v1 + test" added to this bug) since it only and only adds 401 and 407 to an already well tested code path that simply doesn't throw away cache input stream that early. The patch has been reviewed, however it hasn't spent any time on m-c or m-a for real testing... Hence it may not be a very good option either...
Comment 46•12 years ago
|
||
No QA verification since this has in-testsuite coverage. Please add verifyme keyword to request manual verification.
Whiteboard: [qa-]
You need to log in
before you can comment on or make changes to this bug.
Description
•