Closed Bug 761228 Opened 12 years ago Closed 12 years ago

Startup Crash - Assertion failure: mCacheAsyncInputStream, at ../mozilla/netwerk/protocol/http/nsHttpChannel.cpp:3450

Categories

(Core :: Networking: HTTP, defect)

15 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla16
Tracking Status
firefox13 --- unaffected
firefox14 --- unaffected
firefox15 + disabled
firefox16 + fixed

People

(Reporter: mconley, Assigned: briansmith)

References

Details

(Keywords: crash, regression, Whiteboard: [startupcrash])

Attachments

(3 files, 1 obsolete file)

On a Linux debug build of comm-central.

STR:

1. Start up Thunderbird, and open up the "What's New" tab.
2. Close Thunderbird with the "What's New" tab still open. 
3. Restart Thunderbird. The "What's New" tab should restore itself.
4. Wait a few seconds.

What happens?

Crash!

What's expected?

No crashes please.

Backtrace:

#0  0xb78ce424 in __kernel_vsyscall ()
#1  0xb7672c16 in nanosleep () at ../sysdeps/unix/syscall-template.S:82
#2  0xb7672a0f in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
#3  0xb4428669 in ah_crap_handler (signum=6) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsSigHandlers.cpp:87
#4  0xb442f07b in nsProfileLock::FatalSignalHandler (signo=6, info=0xbfb7297c, context=0xbfb729fc) at /media/Projects/mozilla/objdir-thunderbird-patches/mozilla/toolkit/profile/nsProfileLock.cpp:190
#5  <signal handler called>
#6  0xb78ce424 in __kernel_vsyscall ()
#7  0xb78a7dde in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#8  0xb45005fe in mozilla::net::nsHttpChannel::ReadFromCache (this=0x9f116c00, alreadyMarkedValid=false) at /media/Projects/mozilla/thunderbird/mozilla/netwerk/protocol/http/nsHttpChannel.cpp:3450
#9  0xb4501538 in mozilla::net::nsHttpChannel::ProcessNotModified (this=0x9f116c00) at /media/Projects/mozilla/thunderbird/mozilla/netwerk/protocol/http/nsHttpChannel.cpp:2186
#10 0xb450475a in mozilla::net::nsHttpChannel::ProcessResponse (this=0x9f116c00) at /media/Projects/mozilla/thunderbird/mozilla/netwerk/protocol/http/nsHttpChannel.cpp:1250
#11 0xb4504b46 in mozilla::net::nsHttpChannel::OnStartRequest (this=0x9f116c00, request=0x9f13e640, ctxt=0x0) at /media/Projects/mozilla/thunderbird/mozilla/netwerk/protocol/http/nsHttpChannel.cpp:4747
#12 0xb4451e15 in nsInputStreamPump::OnStateStart (this=0x9f13e640) at /media/Projects/mozilla/thunderbird/mozilla/netwerk/base/src/nsInputStreamPump.cpp:416
#13 0xb4452568 in nsInputStreamPump::OnInputStreamReady (this=0x9f13e640, stream=0x9f19853c) at /media/Projects/mozilla/thunderbird/mozilla/netwerk/base/src/nsInputStreamPump.cpp:367
#14 0xb55fd8eb in nsInputStreamReadyEvent::Run (this=0x9f13f680) at /media/Projects/mozilla/thunderbird/mozilla/xpcom/io/nsStreamUtils.cpp:81
#15 0xb56125f7 in nsThread::ProcessNextEvent (this=0xb734cf20, mayWait=true, result=0xbfb730bf) at /media/Projects/mozilla/thunderbird/mozilla/xpcom/threads/nsThread.cpp:624
#16 0xb55ceec8 in NS_ProcessNextEvent_P (thread=<optimized out>, mayWait=true) at /media/Projects/mozilla/objdir-thunderbird-patches/mozilla/xpcom/build/nsThreadUtils.cpp:213
#17 0xb54f37b4 in mozilla::ipc::MessagePump::Run (this=0xb2619490, aDelegate=0xb735b900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/glue/MessagePump.cpp:113
#18 0xb563fddc in MessageLoop::RunInternal (this=0xb735b900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/chromium/src/base/message_loop.cc:208
#19 0xb563fe01 in RunHandler (this=0xb735b900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/chromium/src/base/message_loop.cc:201
#20 MessageLoop::Run (this=0xb735b900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/chromium/src/base/message_loop.cc:175
#21 0xb50fd715 in nsBaseAppShell::Run (this=0xb12512e0) at /media/Projects/mozilla/thunderbird/mozilla/widget/xpwidgets/nsBaseAppShell.cpp:163
#22 0xb4f42a74 in nsAppStartup::Run (this=0xb1278130) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/components/startup/nsAppStartup.cpp:256
#23 0xb4423298 in XREMain::XRE_mainRun (this=0xbfb733e0) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsAppRunner.cpp:3786
#24 0xb4423546 in XREMain::XRE_main (this=0xbfb733e0, argc=1, argv=0xbfb746a4, aAppData=0xb73158c0) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsAppRunner.cpp:3863
#25 0xb44237ad in XRE_main (argc=1, argv=0xbfb746a4, aAppData=0xb73158c0, aFlags=0) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsAppRunner.cpp:3939
#26 0x0804a0ee in do_main (argv=0xbfb746a4, argc=1, exePath=0xbfb735dc "/media/Projects/mozilla/objdir-thunderbird-patches/mozilla/dist/bin/") at /media/Projects/mozilla/thunderbird/mail/app/nsMailApp.cpp:111
#27 main (argc=<error reading variable: Cannot access memory at address 0x6c9>, argv=<error reading variable: Cannot access memory at address 0x6cd>) at /media/Projects/mozilla/thunderbird/mail/app/nsMailApp.cpp:200
Summary: Startup Crash - Assertion failure: mCacheAsyncInputStream, at /media/Projects/mozilla/thunderbird/mozilla/netwerk/protocol/http/nsHttpChannel.cpp:3450 → Startup Crash - Assertion failure: mCacheAsyncInputStream, at ../mozilla/netwerk/protocol/http/nsHttpChannel.cpp:3450
I see this on windows as well.
OS: Linux → All
This assertion that we're failing seems to have been introduced via bug 746018.
I'm seeing it maybe 1/3 of the time on Mac
This is how it looks in my terminal when it crashes:

Failed to load file:///Users/florian/buildhg/comm-central/obj-tbird/mozilla/dist/DailyDebug.app/Contents/MacOS/chrome/messenger/content/messenger/AccountManager.js
++DOMWINDOW == 28 (0x22deb3ec) [serial = 28] [outer = 0x1e7260f0]
WARNING: NS_ENSURE_TRUE(mScriptGlobalObject) failed: file /Users/florian/buildhg/comm-central/mozilla/content/xul/document/src/nsXULDocument.cpp, line 3598
WARNING: NS_ENSURE_TRUE((mCachedResponseHead->Status() / 100 != 3) || isCachedRedirect) failed: file /Users/florian/buildhg/comm-central/mozilla/netwerk/protocol/http/nsHttpChannel.cpp, line 2941
WARNING: cache check failed: file /Users/florian/buildhg/comm-central/mozilla/netwerk/protocol/http/nsHttpChannel.cpp, line 2857
--DOMWINDOW == 27 (0x22de992c) [serial = 27] [outer = 0x1e7260f0] [url = chrome://messenger/content/msgAccountCentral.xul]
--DOMWINDOW == 26 (0x22d9975c) [serial = 17] [outer = 0x228894d0] [url = about:blank]
--DOMWINDOW == 25 (0x22d7e4cc) [serial = 16] [outer = 0x2288f4b0] [url = about:blank]
--DOMWINDOW == 24 (0x22dd3e6c) [serial = 18] [outer = 0x1e7260f0] [url = about:blank]
Assertion failure: mCacheAsyncInputStream, at /Users/florian/buildhg/comm-central/mozilla/netwerk/protocol/http/nsHttpChannel.cpp:3450


And the same part of the log when it doesn't crash:
Failed to load file:///Users/florian/buildhg/comm-central/obj-tbird/mozilla/dist/DailyDebug.app/Contents/MacOS/chrome/messenger/content/messenger/AccountManager.js
++DOMWINDOW == 28 (0x24d1beac) [serial = 28] [outer = 0x1e726060]
WARNING: NS_ENSURE_TRUE(mScriptGlobalObject) failed: file /Users/florian/buildhg/comm-central/mozilla/content/xul/document/src/nsXULDocument.cpp, line 3598
WARNING: NS_ENSURE_TRUE(mCachedResponseHead) failed: file /Users/florian/buildhg/comm-central/mozilla/netwerk/protocol/http/nsHttpChannel.cpp, line 2126
--DOMWINDOW == 27 (0x22de970c) [serial = 27] [outer = 0x1e726060] [url = chrome://messenger/content/msgAccountCentral.xul]
--DOMWINDOW == 26 (0x22d9969c) [serial = 17] [outer = 0x228894d0] [url = about:blank]
--DOMWINDOW == 25 (0x22d7e40c) [serial = 16] [outer = 0x2288f4c0] [url = about:blank]
--DOMWINDOW == 24 (0x22dd3d9c) [serial = 18] [outer = 0x1e726060] [url = about:blank]
Note: The 2 logs in comment 4 were with -purgecaches on the command line. Without it, I see an additional assertion, immediately after the "Failed to load" message:

Failed to load file:///Users/florian/buildhg/comm-central/obj-tbird/mozilla/dist/DailyDebug.app/Contents/MacOS/chrome/messenger/content/messenger/AccountManager.js
###!!! ASSERTION: Existing entry in disk StartupCache.: 'zipItem == nsnull', file /Users/florian/buildhg/comm-central/mozilla/startupcache/StartupCache.cpp, line 344
I've just tried with a fresh profile, and some of the messages in comment 4 and 5 don't appear.

In my terminal before the crash I see:
WARNING: NS_ENSURE_TRUE((mCachedResponseHead->Status() / 100 != 3) || isCachedRedirect) failed: file /Users/florian/buildhg/comm-central/mozilla/netwerk/protocol/http/nsHttpChannel.cpp, line 2941
WARNING: cache check failed: file /Users/florian/buildhg/comm-central/mozilla/netwerk/protocol/http/nsHttpChannel.cpp, line 2857
Assertion failure: mCacheAsyncInputStream, at /Users/florian/buildhg/comm-central/mozilla/netwerk/protocol/http/nsHttpChannel.cpp:3450


And when there's no crash I see:
WARNING: NS_ENSURE_TRUE(mCachedResponseHead) failed: file /Users/florian/buildhg/comm-central/mozilla/netwerk/protocol/http/nsHttpChannel.cpp, line 2126
I reported this yesterday before the merge, so I'm pretty sure this crash just got merged over to comm-aurora.

I'm gonna take a run at this today to see what's going on.
Assignee: nobody → mconley
Having read into the patch a bit more, I'm starting to suspect that this is indeed caused by bug 746018. I'm compiling without those patches to confirm.
Yep, the patches in bug 746018 introduce the assertion, and our debug builds crash with them, and don't without them.

So either we've hit an edge case, or we're somehow abusing nsHttpChannel and need to change our ways. Either way, marking this as blocking 746018.
Blocks: 746018
I filed the platform bug, bug 761660 to track the fixing of the platform bug that is causing this crash. Even if you are doing something unusual with nsHttpChannel, we should not be crashing.

It would very very useful to get a log of this, as I cannot reproduce. Please run Daily with NSPR_LOG_MODULES=nsHttp:5 and add the log to this bug or to bug 761660. Thanks!
(In reply to Brian Smith (:bsmith) from comment #10)
> I filed the platform bug, bug 761660 to track the fixing of the platform bug
> that is causing this crash. Even if you are doing something unusual with
> nsHttpChannel, we should not be crashing.
> 

Note that this crash is due to an assertion failure, and I think we want to crash on those - so the crash, while unpleasant, is not necessarily unexpected / unwanted.

> It would very very useful to get a log of this, as I cannot reproduce.
> Please run Daily with NSPR_LOG_MODULES=nsHttp:5 and add the log to this bug
> or to bug 761660. Thanks!

Sure, posting in bug 761660.
Attached file Debug log (deleted) —
(In reply to Brian Smith (:bsmith) from comment #10)

> It would very very useful to get a log of this, as I cannot reproduce.
> Please run Daily with NSPR_LOG_MODULES=nsHttp:5 and add the log to this bug
> or to bug 761660. Thanks!

The build I used to generate this log has the
MOZ_ASSERT(mCacheAsyncInputStream); line commented out so that I could
work on something else, but I added a comment in the log at the point where the crash would occur.
Note to Thunderbird developers annoyed by this crash:

The debug log I attached shows that we crash during the HTTP request for testpilot.mozillalabs.com/testcases/index-tb.json so I think we can workaround the crash by disabling the Test Pilot add-on.
(In reply to Florian Quèze from comment #13)
> Note to Thunderbird developers annoyed by this crash:
> 
> The debug log I attached shows that we crash during the HTTP request for
> testpilot.mozillalabs.com/testcases/index-tb.json so I think we can
> workaround the crash by disabling the Test Pilot add-on.

Interesting - I wonder if TP (at least under Thunderbird) is trying to initialize before all the infrastructure is set up? If so, it would be nice to get a more human-readable error message.

Is there a well defined point in the platform startup when the HTTP channel caching becomes ready to use?
I don't think this has to do with using the HTTP cache before it is ready to be used.

The issue seems to be related to this check failing:

   NS_ENSURE_TRUE((mCachedResponseHead->Status() / 100 != 3) || isCachedRedirect)

It looks like we wrote a 3xx response into the cache and this new stricter check I added is refusing to cache that. But, then we get a 304 response from the server and we start to read from the cache entry input stream that we didn't create.

Still haven't determined why we're sending a conditional request. I wonder if TestPilot is doing a custom conditional request; if so, the problem is probably that we do not set mCustomConditionalRequest before we do the above check.
Test Pilot is trying to explicitly manage this particular file download to use a local cached version if it is unable to retrieve the on-line version; see

http://mxr.mozilla.org/comm-central/source/mail/app/profile/extensions/tbtestpilot@labs.mozilla.com/modules/remote-experiment-loader.js#448

and

http://mxr.mozilla.org/comm-central/source/mail/app/profile/extensions/tbtestpilot@labs.mozilla.com/modules/remote-experiment-loader.js#85

Copy Gregg Lind on the bug in case he can help with the specifics of the test pilot experiment loader.
Attached patch test case (obsolete) (deleted) — Splinter Review
Here is the test case. If you run it on an unpatched build, it will crash. With the upcoming patch, there is no crash and we correctly process the response.

make -C netwerk/test
SOLO_FILE=test_customConditionalRequest_304.js make -C netwerk/test check-one
Attachment #632111 - Flags: review?(jduell.mcbugs)
Comment on attachment 632111 [details] [diff] [review]
test case

Will update the test case to include another case.
Attachment #632111 - Flags: review?(jduell.mcbugs)
Attached patch test case [v2] (deleted) — Splinter Review
Here's the test case that can be used to reproduce the crash.
Assignee: mconley → bsmith
Attachment #632111 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #632187 - Flags: review?(jduell.mcbugs)
The bug was mainly that we returned from CheckCache() before setting mCustomConditionalRequest. This patch fixes the bug by remembering that that the request is a custom conditional request before we would ever return from CheckCache.
Attachment #632188 - Flags: review?(jduell.mcbugs)
No longer depends on: 761660
Assignee: bsmith → nobody
Component: General → Networking: HTTP
Product: Thunderbird → Core
QA Contact: general → networking.http
Hardware: x86 → All
Target Milestone: --- → mozilla16
Version: unspecified → 15 Branch
Assignee: nobody → bsmith
Comment on attachment 632187 [details] [diff] [review]
test case [v2]

Review of attachment 632187 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/test/unit/test_304_responses.js
@@ +33,5 @@
> +  response.setHeader("Returned-From-Handler", "1");
> +}
> +
> +function run_test() {
> +  evict_cache_entries();

As mentioned on IRC, we don't currently store cache entries across XPCshell tests, so no need to evict at startup.

@@ +64,5 @@
> +
> +// Test that we can cope with a 304 response that was (erroneously) stored in
> +// the cache.
> +add_test(function test_304_stored_in_cache() {
> +  asyncOpenCacheEntry(

It's really nice that we have this new ability to create cache entries in tests.  Good stuff.
Attachment #632187 - Flags: review?(jduell.mcbugs) → review+
Comment on attachment 632188 [details] [diff] [review]
Fix 304 response handling for custom conditional responses and clarify handling of unexpected 304 responses

Review of attachment 632188 [details] [diff] [review]:
-----------------------------------------------------------------

Irving/Gregg:

> Test Pilot is trying to explicitly manage this particular file download to use a 
> local cached version if it is unable to retrieve the on-line version

That code does it's own, custom store of the response data into a file that has nothing to do with the HTTP cache.  AFACIT from glancing over it should work OK with this patch (you seem to handle the 304 case explicitly and use your copy).
Attachment #632188 - Flags: review?(jduell.mcbugs) → review+
Component: Networking: HTTP → Networking
Product: Core → MailNews Core
Target Milestone: mozilla16 → ---
Version: 15 Branch → 15
Apologies for noise, need to clear the now not-necessary Thunderbird tracking flag.
Component: Networking → Networking: HTTP
Product: MailNews Core → Core
Target Milestone: --- → mozilla16
Version: 15 → 15 Branch
Whiteboard: [startupcrash]
Bug 746018 has been in the release since FF10. Why should this be tracked for release now? Is this a high volume crash?
(In reply to Alex Keybl [:akeybl] from comment #25)
> Bug 746018 has been in the release since FF10. Why should this be tracked
> for release now? Is this a high volume crash?

No, that landed in FF15 and this is a regression since FF15.
(In reply to Brian Smith (:bsmith) from comment #26)
> (In reply to Alex Keybl [:akeybl] from comment #25)
> > Bug 746018 has been in the release since FF10. Why should this be tracked
> > for release now? Is this a high volume crash?
> 
> No, that landed in FF15 and this is a regression since FF15.

Mistook the "version" for the "target milestone". We'll track this new startup crash regression for FF15 in that case.
https://hg.mozilla.org/mozilla-central/rev/28d3f3778315
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
(In reply to Brian Smith (:bsmith) from comment #28)
> https://hg.mozilla.org/mozilla-central/rev/28d3f3778315

Is this good to uplift to FF15 at this point? Or does the patch carry significant risk of regression?
(In reply to Alex Keybl [:akeybl] from comment #29)
> (In reply to Brian Smith (:bsmith) from comment #28)
> > https://hg.mozilla.org/mozilla-central/rev/28d3f3778315
> 
> Is this good to uplift to FF15 at this point? Or does the patch carry
> significant risk of regression?

This was caused by the patches for bug 746018, which got backed out of Aurora 15 in bug 722034 comment 79.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: