Closed Bug 589296 Opened 14 years ago Closed 14 years ago

Crash in [@ nsHttpChannel::ContinueProcessResponse(unsigned int) ][@ nsHttpChannel::ContinueProcessResponse]

Categories: Core :: Networking: HTTP (defect)
Priority: Not set
Severity: critical
Status: RESOLVED FIXED
Tracking Status: blocking2.0 --- beta7+

People

(Reporter: jst, Assigned: mayhemer)

References

Details

(4 keywords, Whiteboard: [mozmill], [4b5])

Crash Data

Attachments

(1 file)

I've seen this crash on startup randomly twice now (recent trunk builds), but I haven't been able to reproduce it. This was in a build with the patches from bug 513008 and bug 549767 applied, so it could in theory be caused by those patches. Anyway, here's the stack:

#2  <signal handler called>
#3  0x00007f1dc7e9d7fa in nsHttpChannel::ContinueProcessResponse (this=
    0x7f1db87f1800, rv=2147500035)
    at ../../../../mozilla/netwerk/protocol/http/nsHttpChannel.cpp:1033
#4  0x00007f1dc7e9889f in nsHttpChannel::OnRedirectVerifyCallback (this=
    0x7f1db87f1800, result=2147500035)
    at ../../../../mozilla/netwerk/protocol/http/nsHttpChannel.cpp:4374
#5  0x00007f1dc7e2e50c in nsAsyncVerifyRedirectCallbackEvent::Run (
    this=<value optimized out>)
    at ../../../../mozilla/netwerk/base/src/nsAsyncRedirectVerifyHelper.cpp:73
#6  0x00007f1dc88207e4 in nsThread::ProcessNextEvent (this=0x7f1dc60491f0, 
    mayWait=0, result=0x7fffb728169c)
    at ../../../mozilla/xpcom/threads/nsThread.cpp:547

The crash is on the following line of code:

    if (mTransaction->SSLConnectFailed()) {

and mTransaction is null. mURI and mOriginalURI in the channel are both at this point "http://en-us.start3.mozilla.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official". Here's what p *this says:

(gdb) p *this
$12 = (nsHttpChannel) {<mozilla::net::HttpBaseChannel> = 
    {<nsHashPropertyBag> = {<nsIWritablePropertyBag> = {<nsIPropertyBag> = 
    {<nsISupports> = {_vptr.nsISupports = 
    0x7f1dc92a7490}, <No data fields>}, <No data fields>}, <nsIWritablePropertyBag2> = {<nsIPropertyBag2> = {<nsIPropertyBag> = {<nsISupports> = {
              _vptr.nsISupports = 
    0x7f1dc92a7900}, <No data fields>}, <No data fields>}, <No data fields>}, 
      mRefCnt = {mValue = 7}, mPropertyHash = 
    {<nsBaseHashtable<nsStringHashKey, nsCOMPtr<nsIVariant>, nsIVariant*>> = 
    {<nsTHashtable<nsBaseHashtableET<nsStringHashKey, nsCOMPtr<nsIVariant> > >> = {mTable = {ops = 0x7f1dc9466c90, data = 0x0, hashShift = 28, maxAlphaFrac = 
    192 '\300', minAlphaFrac = 64 '@', entrySize = 32, entryCount = 1, 
              removedCount = 0, generation = 0, entryStore = 
    0x7f1db6c44e00 ""}}, <No data fields>}, <No data fields>}}, <nsIHttpChannel> = {<nsIChannel> = {<nsIRequest> = {<nsISupports> = {_vptr.nsISupports = 
    0x7f1dc92a79e8}, <No data fields>}, <No data fields>}, <No data fields>}, <nsIHttpChannelInternal> = {<nsISupports> = {_vptr.nsISupports = 
    0x7f1dc92a7b78}, <No data fields>}, <nsIUploadChannel> = {<nsISupports> = {
        _vptr.nsISupports = 
    0x7f1dc92a7be8}, <No data fields>}, <nsIUploadChannel2> = {<nsISupports> = 
    {_vptr.nsISupports = 
    0x7f1dc92a7c20}, <No data fields>}, <nsISupportsPriority> = 
    {<nsISupports> = {_vptr.nsISupports = 
    0x7f1dc92a7c50}, <No data fields>}, <nsIResumableChannel> = 
    {<nsISupports> = {_vptr.nsISupports = 0x7f1dc92a7c90}, <No data fields>}, 
    mURI = {<nsCOMPtr_base> = {mRawPtr = 0x7f1db592c3c0}, <No data fields>}, 
    mOriginalURI = {<nsCOMPtr_base> = {mRawPtr = 
    0x7f1db592c3c0}, <No data fields>}, mDocumentURI = {<nsCOMPtr_base> = {
        mRawPtr = 0x7f1db592c3c0}, <No data fields>}, mListener = 
    {<nsCOMPtr_base> = {mRawPtr = 0x0}, <No data fields>}, mListenerContext = 
    {<nsCOMPtr_base> = {mRawPtr = 0x0}, <No data fields>}, mLoadGroup = 
    {<nsCOMPtr_base> = {mRawPtr = 0x7f1db5b9e1a0}, <No data fields>}, mOwner = 
    {<nsCOMPtr_base> = {mRawPtr = 0x0}, <No data fields>}, mCallbacks = 
    {<nsCOMPtr_base> = {mRawPtr = 0x0}, <No data fields>}, mProgressSink = 
    {<nsCOMPtr_base> = {mRawPtr = 0x0}, <No data fields>}, mReferrer = 
    {<nsCOMPtr_base> = {mRawPtr = 0x0}, <No data fields>}, mApplicationCache = 
    {<nsCOMPtr_base> = {mRawPtr = 0x0}, <No data fields>}, mRequestHead = {
      mHeaders = {mHeaders = {<nsTArray_base> = {static sEmptyHdr = {mLength = 
    0, mCapacity = 0, mIsAutoArray = 0}, mHdr = 
    0x7f1db59b5ec0}, <No data fields>}}, mMethod = {_val = 
    0x7f1dc8b52b43 "GET"}, mVersion = 11 '\v', mRequestURI = 
    {<nsACString_internal> = {mData = 
    0x7f1db386d388 "/firefox?client=firefox-a&rls=org.mozilla:en-US:official", 
          mLength = 56, mFlags = 5}, <No data fields>}}, mUploadStream = 
    {<nsCOMPtr_base> = {mRawPtr = 0x0}, <No data fields>}, mResponseHead = {
      mRawPtr = 0x7f1db4eb1ba0}, mConnectionInfo = {mRawPtr = 0x7f1db59d7340}, 
    mSpec = {<nsACString_internal> = {mData = 
    0x7f1db59d8108 "http://en-us.start3.mozilla.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official", mLength = 87, mFlags = 5}, <No data fields>}, 
    mContentTypeHint = {<nsACString_internal> = {mData = 0x7f1dc94d9c90 "", 
        mLength = 0, mFlags = 1}, <No data fields>}, mContentCharsetHint = 
    {<nsACString_internal> = {mData = 0x7f1dc94d9c90 "", mLength = 0, mFlags = 
    1}, <No data fields>}, mUserSetCookieHeader = {<nsACString_internal> = {
        mData = 0x7f1dc94d9c90 "", mLength = 0, mFlags = 
    1}, <No data fields>}, mEntityID = {<nsACString_internal> = {mData = 
    0x7f1dc94d9c90 "", mLength = 0, mFlags = 1}, <No data fields>}, 
    mStartPos = 18446744073709551615, mStatus = 2147549183, mLoadFlags = 
    6881280, mPriority = 0, mCaps = 1 '\001', mRedirectionLimit = 20 '\024', 
    mCanceled = 0, mIsPending = 0, mWasOpened = 1, mResponseHeadersModified = 
    0, mAllowPipelining = 1, mForceAllowThirdPartyCookie = 0, 
    mUploadStreamHasHeaders = 0, mInheritApplicationCache = 0, 
    mChooseApplicationCache = 0, mLoadedFromApplicationCache = 
    0}, <nsIStreamListener> = {<nsIRequestObserver> = {<nsISupports> = {
        _vptr.nsISupports = 
    0x7f1dc92a7cc8}, <No data fields>}, <No data fields>}, <nsICachingChannel> = {<nsICacheInfoChannel> = {<nsISupports> = {_vptr.nsISupports = 
    0x7f1dc92a7d08}, <No data fields>}, <No data fields>}, <nsICacheListener> = {<nsISupports> = {_vptr.nsISupports = 
    0x7f1dc92a7db8}, <No data fields>}, <nsIEncodedChannel> = {<nsISupports> = 
    {_vptr.nsISupports = 
    0x7f1dc92a7de8}, <No data fields>}, <nsITransportEventSink> = 
    {<nsISupports> = {_vptr.nsISupports = 
    0x7f1dc92a7e28}, <No data fields>}, <nsIProtocolProxyCallback> = 
    {<nsISupports> = {_vptr.nsISupports = 
    0x7f1dc92a7e58}, <No data fields>}, <nsIHttpAuthenticableChannel> = 
    {<nsIProxiedChannel> = {<nsISupports> = {_vptr.nsISupports = 
    0x7f1dc92a7e88}, <No data fields>}, <No data fields>}, <nsITraceableChannel> = {<nsISupports> = {_vptr.nsISupports = 
    0x7f1dc92a7f30}, <No data fields>}, <nsIApplicationCacheChannel> = 
    {<nsIApplicationCacheContainer> = {<nsISupports> = {_vptr.nsISupports = 
    0x7f1dc92a7f60}, <No data fields>}, <No data fields>}, <nsIAsyncVerifyRedirectCallback> = {<nsISupports> = {_vptr.nsISupports = 
    0x7f1dc92a7fc0}, <No data fields>}, mSecurityInfo = {<nsCOMPtr_base> = {
      mRawPtr = 0x0}, <No data fields>}, mProxyRequest = {<nsCOMPtr_base> = {
      mRawPtr = 0x0}, <No data fields>}, mTransactionPump = {mRawPtr = 0x0}, 
  mTransaction = {mRawPtr = 0x0}, mLogicalOffset = 278, mCacheEntry = 
    {<nsCOMPtr_base> = {mRawPtr = 0x0}, <No data fields>}, mCachePump = {
    mRawPtr = 0x0}, mCachedResponseHead = {mRawPtr = 0x0}, mCacheAccess = 0, 
  mPostID = 0, mRequestTime = 1281980612, mOnCacheEntryAvailableCallback = 
    NULL, mAsyncCacheOpen = 1, mOfflineCacheEntry = {<nsCOMPtr_base> = {
      mRawPtr = 0x0}, <No data fields>}, mOfflineCacheAccess = 65280, 
  mOfflineCacheClientID = {<nsACString_internal> = {mData = 0x7f1dc94d9c90 "", 
      mLength = 0, mFlags = 1}, <No data fields>}, mAuthProvider = 
    {<nsCOMPtr_base> = {mRawPtr = 0x0}, <No data fields>}, 
  mPendingAsyncCallOnResume = NULL, mTargetProxyInfo = {<nsCOMPtr_base> = {
      mRawPtr = 0x0}, <No data fields>}, mSuspendCount = 0, mFallbackKey = 
    {<nsACString_internal> = {mData = 0x7f1dc94d9c90 "", mLength = 0, mFlags = 
    1}, <No data fields>}, mRedirectURI = {<nsCOMPtr_base> = {mRawPtr = 
    0x7f1db592c580}, <No data fields>}, mRedirectChannel = {<nsCOMPtr_base> = {
      mRawPtr = 0x0}, <No data fields>}, mRedirectType = 301, 
  mApplyConversion = 1, mCachedContentIsValid = 0, mCachedContentIsPartial = 
    0, mTransactionReplaced = 0, mAuthRetryPending = 0, mResuming = 0, 
  mInitedCacheEntry = 0, mCacheForOfflineUse = 0, mCachingOpportunistically = 
    0, mFallbackChannel = 0, mTracingEnabled = 1, mCustomConditionalRequest = 
    0, mFallingBack = 0, mWaitingForRedirectCallback = 0, mRemoteChannel = 0, 
  mRequestTimeInitialized = 1, mRedirectFuncStack = {<nsTArray_base> = {
      static sEmptyHdr = {mLength = 0, mCapacity = 0, mIsAutoArray = 0}, 
      mHdr = 0x7f1dc94d9788}, <No data fields>}}

That's about all the information I have about this for now, but if specific information would be useful here I can try to dig more out of the core file that I do have saved.
Severity: normal → critical
This happens at startup with no stored tabs, just loading the default startpage? IPC-build?
Keywords: crash
Summary: Crash on startup in nsHttpChannel::ContinueProcessResponse(). → Crash on startup in [@ nsHttpChannel::ContinueProcessResponse()]
(coming from bug 589843...) I hit this restoring a session on 32bit OS X 10.5.8
http://crash-stats.mozilla.com/report/index/bp-dab7e8a6-5f45-4572-bf9c-ad6862100823
OS: Linux → All
Hardware: x86_64 → All
Blocking since others are seeing this as well.

Bjarne, I hit this on startup with a regular trunk Firefox build, so IPC enabled, but not actually doing any IPC for networking. I was not restoring a session, just loading the default start page.
blocking2.0: --- → final+
Keywords: regression
Do any of you who are experiencing this problem have Firebug installed?
Ah, in the p *this output I see mTracingEnabled = PR_TRUE, which is used by Firebug, but it is enabled by default.
To anyone who has a chance to reproduce this: please set the following env vars to always produce logs; it may help us figure this out:

NSPR_LOG_FILE=somepath/log.log
NSPR_LOG_MODULES=URILoader:5,LoadGroup:5,DocLoader:5,nsHttp:5,nsSocketTransport:5
http://tinyurl.com/23mcb2z shows this is currently the #4 top crash on the trunk, but there are no comments in any of these reports.
(In reply to comment #5)
> Are any of you experiencing this problem have Firebug installed?

I have it installed but not enabled.

(In reply to comment #7)
> To all having chance to reproduce this, please set following env vars to always
> produce logs, it may help to figure out:
> 
> NSPR_LOG_FILE=somepath/log.log
> NSPR_LOG_MODULES=URILoader:5,LoadGroup:5,DocLoader:5,nsHttp:5,nsSocketTransport:5

Will this work in normal nightlies?
Per the crash reports this does not appear to be a startup-only crash; most happen thousands of seconds after startup.
Summary: Crash on startup in [@ nsHttpChannel::ContinueProcessResponse()] → Crash in [@ nsHttpChannel::ContinueProcessResponse()]
(In reply to comment #9)
...
> Will this work in normal nightlies?

Yes.
To me this looks like nsHttpChannel::OnStopRequest() has been called before the nsHttpChannel is informed about the decision for the redirect (by nsAsyncVerifyRedirectCallbackEvent::Run()). Even though all the sinks currently respond in a synchronous way, the decision is sent to the nsHttpChannel in an event.

OnStopRequest() will clear a number of members, including mTransaction. See e.g. mCallbacks, mProgressSink and mIsPending.

I can try to whip up a test which triggers this problem.
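
To make the suspected ordering easier to see, here is a minimal, standalone C++ sketch of the race described above. This is my own simplified model with made-up names (Channel, Transaction, an explicit event queue), not Gecko code: the redirect decision is delivered as a queued event, so if an OnStopRequest()-like teardown runs first, the continuation dereferences a pointer that has already been released unless it null-checks.

    #include <cstdio>
    #include <functional>
    #include <memory>
    #include <queue>

    // Simplified stand-ins for nsHttpTransaction / nsHttpChannel (illustration only).
    struct Transaction {
        bool SSLConnectFailed() const { return false; }
    };

    struct Channel {
        std::unique_ptr<Transaction> mTransaction = std::make_unique<Transaction>();

        // Models the teardown done in OnStopRequest(): per-request state is released.
        void OnStopRequest() { mTransaction.reset(); }

        // Models ContinueProcessResponse() for a failed redirect: without the
        // null-check this would dereference a cleared pointer.
        void ContinueProcessResponse(int rv) {
            if (rv != 0) {
                if (mTransaction && mTransaction->SSLConnectFailed()) {
                    std::puts("handle failed SSL connect");
                    return;
                }
                std::puts("transaction already gone (or SSL ok); bail out safely");
            }
        }
    };

    int main() {
        Channel chan;
        std::queue<std::function<void()>> eventLoop;

        // The redirect verification result arrives asynchronously, as an event
        // (cf. nsAsyncVerifyRedirectCallbackEvent::Run in the stack above).
        eventLoop.push([&] { chan.ContinueProcessResponse(/*rv=*/-1); });

        // Hypothesis: some abort/stop path runs first and clears mTransaction...
        chan.OnStopRequest();

        // ...so when the queued callback finally runs, the transaction is null.
        while (!eventLoop.empty()) { eventLoop.front()(); eventLoop.pop(); }
        return 0;
    }

Running the sketch takes the "transaction already gone" branch; removing the mTransaction check turns it into the same kind of null dereference seen in the stack above.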
Summary: Crash in [@ nsHttpChannel::ContinueProcessResponse()] → Crash in [@ nsHttpChannel::ContinueProcessResponse(unsigned int) ]
I'm slowly starting to think that bug 513008 is the culprit here.

It is the only check-in between nightly 20100821040844 and the first crashing build 20100822040607 (between 5c5c3bf8dfeb and 29ac0213b54c).

I suspect the channel gets resumed or somehow canceled (cleaned) before we get the callback, same as Bjarne suspects.
Blocks: 513008
Checked that http://hg.mozilla.org/mozilla-central/rev/72d2863f43c7 is contained in 20100824040950 nightly and http://hg.mozilla.org/mozilla-central/rev/643758385088 in 20100822040607.

Also jst reported the patch was in his repo when the crash occurred.
Mozilla/5.0 (X11; Linux i686; rv:2.0b5pre) Gecko/20100826 Minefield/4.0b5pre

Just hit this on Linux: bp-a987478f-6264-4741-8eb3-a1f952100826

Frame 1 is nsHttpChannel::ProcessFailedSSLConnect

I middle-clicked a link and it had just started to load in a background tab, but trying again with that exact same link doesn't crash. This is the first time I've hit this crash (late into using this nightly) and I didn't hit it in yesterday's build which also had the fix for bug 513008, so it's difficult to narrow down when it happens.
(In reply to comment #4)
> Bjarne, I hit this on startup with a regular trunk Firefox build, so IPC
> enabled, but not actually doing any IPC for networking. I was not restoring a
> session, just loading the default start page.

Do you know if your cache is clean when you start? Phrased differently: Do you normally start up with a clean profile/cache, or could it be that cached content was being used when this crash was observed. (Just gathering more information.)

(In reply to comment #16)
> Just hit this on Linux: bp-a987478f-6264-4741-8eb3-a1f952100826
> 
> Frame 1 is nsHttpChannel::ProcessFailedSSLConnect

This stack is a little confusing to me...  could anyone explain how

nsAsyncVerifyRedirectCallbackEvent::Run()  can call
nsHttpChannel::ProcessFailedSSLConnect()   which then calls
nsHttpChannel::ContinueProcessResponse()   ?

If this is not explainable I guess this can be the root of the problem we see...
This is the top crasher.

We should collect logs from people.  Anybody experiencing this crash, please set the following environment variables in a shell and start Firefox from there:

NSPR_LOG_FILE=somepath/log.log
NSPR_LOG_MODULES=URILoader:5,LoadGroup:5,DocLoader:5,nsHttp:5,nsSocketTransport:5,cache:5

If the crash occurs, then BEFORE you RESTART Firefox, please attach the log from 'somepath' to this bug or just email it to my bug mail.

Thanks a lot, we have to figure this out soon.

Also, I vote for backing bug 513008 out for a few days to see if it is triggering this crash.  JST, agree?
Keywords: topcrash
(In reply to comment #17)
> Do you know if your cache is clean when you start? Phrased differently: Do you
> normally start up with a clean profile/cache, or could it be that cached
> content was being used when this crash was observed. (Just gathering more
> information.)

This was not with a clean profile, the start page could very well have been in the cache.
This is the #1 topcrasher for 4.0b5pre, tentatively marking blocking beta5, unless beltzner disagrees. The earliest I can see in crash-stats is 27-August on Windows, and 23-August on Mac. This has different signatures for Mac/Linux and Windows.
blocking2.0: final+ → beta5+
Summary: Crash in [@ nsHttpChannel::ContinueProcessResponse(unsigned int) ] → Crash in [@ nsHttpChannel::ContinueProcessResponse(unsigned int) ][@ nsHttpChannel::ContinueProcessResponse]
I had this crash only happen once for me today, with last night's nightly, as I was opening a link from within Echofon.  http://crash-stats.mozilla.com/report/index/bp-2813c142-9628-4f6a-a4a9-6ab782100829

Haven't been able to reproduce since.
(In reply to comment #14)
> I slowly start thinking of bug 513008 is the culprit here.

After digging through this all day I tend to agree that this is probably caused by bug 513008. Not sure we should put all blame on it, though. :)

Some observations from the printout in comment #0:

mAsyncCacheOpen is set, meaning that OpenCacheEntry() found an entry. It was most likely found in the "normal" cache (let's ignore the app-cache for now) which means OpenNormalCacheEntry() was called. Furthermore, mOnCacheEntryAvailableCallback is cleared, meaning that OnCacheEntryAvailable() was called (before mIsPending was cleared). OnCacheEntryAvailable() has then called OnNormalCacheEntryAvailable(), which eventually calls Connect(FALSE).

Note that if OnNormalCacheEntryAvailable returns a failure-code, OnCacheEntryAvailable() will call CloseCacheEntry() and then AsyncAbort() which in turn calls HandleAsyncNotifyListener(), DoNotifyListener() and finally OnStopRequest(). Perhaps this is what happens here...?

Continuing, mRequestTime and mRequestTimeInitialized are set, meaning that SetupTransaction() was called. This probably happened from the call to Connect(FALSE) mentioned above. But it means that mCacheEntry at that point was not set, or it was set but CheckCache() fails.

What I don't see at the moment is how ProcessResponse() gets called. It means that OnStartRequest() is invoked somehow while mTransaction is set, but I don't see how. A little stuck for the moment...
--> beta6+
blocking2.0: beta5+ → -
did you mean to set "2.0-" ?   renominating.
blocking2.0: - → ?
blocking2.0: ? → beta6+
I got this crash once more, and this time I had the logging enabled, but the log file was empty! :( Repeated runs of firefox do produce data in the logs, but not that one time when I hit the crash. I can't imagine the logging isn't started before we hit this crash...
(In reply to comment #25)
> I got this crash once more, and this time I had the logging enabled, but the
> log file was empty! :( 

That just makes it a bit more interesting, no...?  :)  (Maybe we should file a new bug for this?)

Seriously: I'll try to summarize results from yesterday into a theory of what happens to the channel causing this crash.

The channel starts loading default startpage for minefield (mURI in dump from comment #0). Connect(TRUE) is called, entering the firstTime-block and calling OpenCacheEntry(). IMO (since |mAsyncCacheOpen| is set) OpenCacheEntry() calls down to OpenNormalCacheEntry() which returns success (if-block which sets |mAsyncCacheOpen| and |mOnCacheEntryAvailableCallback|). Hence, Connect(TRUE) returns success at the end of the firstTime-block.

So, default startpage is found in the cache, and since |mOnCacheEntryAvailableCallback| is cleared I believe OnCacheEntryAvailable() was called. OnNormalCacheEntryAvailable() is therefore called, which finally calls Connect(FALSE). Connect(FALSE) skips firstTime-block and I assume |mCacheEntry| has been set at this point, causing a call to CheckCache().

Now, since |mRequestTime| is set in comment #0, SetupTransaction() must have been called. I believe this is because CheckCache() decides that the entry must be validated, and Connect(FALSE) ends up calling SetupTransaction(). Loading the URI manually from the server confirms that there is no header setting the entry valid for any amount of time. (There are two concerns though: The response-code is 302 as opposed to 301 in comment #0, and Cache-Control: private is set. But let's ignore these for now...)

The channel now goes to the net to load the URI and on the response, ProcessResponse() is called which triggers calls to redirection-sinks, which eventually results in the callback to OnRedirectVerifyCallback(), which crashes.

Honza, Michal: Does this make sense or am I wrong in this analysis? Is there anything in async-reading-from-cache which can cause a call to OnStopRequest() after we have set up the transaction?
(In reply to comment #22)
> mAsyncCacheOpen is set, meaning that OpenCacheEntry() found an entry. It was

That's wrong. mAsyncCacheOpen==PR_TRUE means that AsyncOpenCacheEntry() succeeded and posted a request to the background thread.


> Note that if OnNormalCacheEntryAvailable returns a failure-code,
> OnCacheEntryAvailable() will call CloseCacheEntry() and then AsyncAbort() which
> in turn calls HandleAsyncNotifyListener(), DoNotifyListener() and finally
> OnStopRequest(). Perhaps this is what happens here...?

OnNormalCacheEntryAvailable() fails only in specific cases:

- channel was canceled before OnCacheEntryAvailable()
- cache entry wasn't found and LOAD_ONLY_FROM_CACHE was specified
- Connect(PR_FALSE) failed

Could it be one of these cases?
(In reply to comment #26)
> So, default startpage is found in the cache, and since
> |mOnCacheEntryAvailableCallback| is cleared I believe OnCacheEntryAvailable()
> was called. OnNormalCacheEntryAvailable() is therefore called, which finally
> calls Connect(FALSE). Connect(FALSE) skips firstTime-block and I assume
> |mCacheEntry| has been set at this point, causing a call to CheckCache().

But it could also happen that the startpage isn't found in the cache...


> Honza, Michal: Does this make sense or am I wrong in this analysis? Is there
> anything in async-reading-from-cache which can cause a call to OnStopRequest()
> after we have set up the transaction?

That makes sense with the possibility that the cache entry wasn't found. Regarding the OnStopRequest(), IMO the only possibility is via AsyncAbort() in OnCacheEntryAvailable().
(In reply to comment #25)
> I can't imagine the logging isn't
> started before we hit this crash...

Please add 'sync' to the list of modules in NSPR_LOG_MODULES.  Sorry, I forgot to mention it...

I have set my env to do this as well, but I have not been able to reproduce the crash in 3 days of intensive Minefield usage.

Bjarne: I would like to get an answer to a different question - how is it possible that we are in the middle of opening a cache entry (waiting for OnCacheEntryAvailable) and also in the middle of redirecting (waiting for a redirect callback)?  I don't think it is possible, but I might be wrong...
(In reply to comment #29)
> Bjarne: I would like to get answer to a different question - how is that
> possible that we are in the middle of opening a cache entry (waiting for
> OnCacheEntryAvaialble) and also in the middle of redirecting (waiting for a
> redirect callback).  I don't think it is possible, but I might be wrong...

I actually think OnCacheEntryAvailable() has been called since |mOnCacheEntryAvailableCallback| is cleared. It must have been set since |mAsyncCacheOpen| is set, and there is only one place it is cleared afaics.
(In reply to comment #30)
> (In reply to comment #29)
> > Bjarne: I would like to get answer to a different question - how is that
> > possible that we are in the middle of opening a cache entry (waiting for
> > OnCacheEntryAvaialble) and also in the middle of redirecting (waiting for a
> > redirect callback).  I don't think it is possible, but I might be wrong...
> 
> I actually think OnCacheEntryAvailable() has been called since
> |mOnCacheEntryAvailableCallback| is cleared. It must have been set since
> |mAsyncCacheOpen| is set, and there is only one place it is cleared afaics.

Exactly. mAsyncCacheOpen is set to PR_FALSE only in OpenCacheEntry() and it's not an error.
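
To illustrate the inference being made here, a toy standalone model (my own hypothetical names and simplifications, not the real nsHttpChannel code): if the callback member is only ever cleared inside the OnCacheEntryAvailable() path, then seeing mAsyncCacheOpen set together with a cleared mOnCacheEntryAvailableCallback in the dump implies that callback has already run.

    #include <cassert>
    #include <cstdio>

    // Toy model of the async cache-open bookkeeping (illustration only).
    struct Channel {
        using Callback = void (Channel::*)(int status);

        bool mAsyncCacheOpen = false;
        Callback mOnCacheEntryAvailableCallback = nullptr;

        // Models a successful async OpenCacheEntry(): record the callback and
        // flag that an asynchronous open request was posted.
        void OpenCacheEntry() {
            mOnCacheEntryAvailableCallback = &Channel::OnNormalCacheEntryAvailable;
            mAsyncCacheOpen = true;
        }

        // Models OnCacheEntryAvailable(): the only place the callback is cleared.
        void OnCacheEntryAvailable(int status) {
            Callback cb = mOnCacheEntryAvailableCallback;
            mOnCacheEntryAvailableCallback = nullptr;
            (this->*cb)(status);
        }

        void OnNormalCacheEntryAvailable(int status) {
            std::printf("cache entry callback ran, status=%d\n", status);
        }
    };

    int main() {
        Channel chan;
        chan.OpenCacheEntry();
        chan.OnCacheEntryAvailable(0);

        // The state seen in the crash dump (mAsyncCacheOpen = 1, callback = NULL)
        // is only reachable after the cache callback has already fired.
        assert(chan.mAsyncCacheOpen && !chan.mOnCacheEntryAvailableCallback);
        return 0;
    }

In the sketch the final assert holds only because OnCacheEntryAvailable() already ran, which is the same conclusion being drawn from the dump in comment #0.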
Got this one today: http://crash-stats.mozilla.com/report/index/bp-515d31ca-4b59-4fc7-952b-60e4f2100901
on Mozilla/5.0 (Windows NT 6.0; rv:2.0b6pre) Gecko/20100901 Firefox/4.0b6pre - is it this bug?
I'm finding it hard/impossible to reproduce. FWIW, I got it shortly after trying to reproduce bug 590247.
I'll enable logging from now on in case it happens again.
Partially backing out the nsHttpChannel-only changes and disabling the ProcessRequest call on the background thread.
Attachment #471285 - Flags: review?(michal.novotny)
Ok, here's a theory:

Say ContinueProcessResponse() is pushed on the manual callstack. Then, a new function f() is pushed on the stack, sinks are invoked followed by WaitForRedirectCallback(). When OnRedirectVerifyCallback() is called it pops f() and executes it. Say now that f() for whatever reason pushes another function g() on the stack and invokes sinks and WaitForRedirectCallback() again. When OnRedirectVerifyCallback() is called the second time it pops and executes g(). Here's the snag: Even if g() for some reason fails and somehow triggers OnStopRequest(), the manual callstack will pop and execute ContinueProcessResponse().

Could this be what we see? I.e. are there functions f() and g() which may cause this scenario? (Too late for me now to read code to figure out - will give it a shot tomorrow if still unresolved.)
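
A standalone sketch of that scenario (again my own simplified model, with f() and g() collapsed into a single ContinueG() for brevity; none of these names are the real Gecko functions): because the continuations live on a manual stack rather than the native call stack, an inner continuation that fails and tears the channel down does not prevent the outer ContinueProcessResponse() frame from being popped and run afterwards.

    #include <cstdio>
    #include <vector>

    // Simplified model of a channel with a manual continuation stack similar in
    // spirit to mRedirectFuncStack (illustration only, not nsHttpChannel).
    struct Channel {
        using ContinueFunc = int (Channel::*)(int result);

        std::vector<ContinueFunc> mRedirectFuncStack;
        bool mHasTransaction = true;

        void PushRedirectAsyncFunc(ContinueFunc f) { mRedirectFuncStack.push_back(f); }

        // Models OnRedirectVerifyCallback(): unwinds the manual stack.  Even if an
        // inner continuation fails, the outer frames are still popped and executed.
        void OnRedirectVerifyCallback(int result) {
            while (!mRedirectFuncStack.empty()) {
                ContinueFunc f = mRedirectFuncStack.back();
                mRedirectFuncStack.pop_back();
                result = (this->*f)(result);
            }
        }

        // Stands in for g(): it fails and triggers an OnStopRequest()-like teardown.
        int ContinueG(int) {
            mHasTransaction = false;   // teardown clears the transaction
            return -1;                 // failure
        }

        // Outer frame: still runs after the teardown above.
        int ContinueProcessResponse(int rv) {
            if (!mHasTransaction)
                std::puts("transaction already torn down: would crash without a null-check");
            return rv;
        }
    };

    int main() {
        Channel chan;
        chan.PushRedirectAsyncFunc(&Channel::ContinueProcessResponse);  // pushed first (outermost)
        chan.PushRedirectAsyncFunc(&Channel::ContinueG);                // pushed later (innermost)

        // One redirect callback arrives: ContinueG() runs, fails and tears things
        // down, yet ContinueProcessResponse() is still popped and executed.
        chan.OnRedirectVerifyCallback(-1);
        return 0;
    }

The point of the model is only the ordering: whatever f() and g() turn out to be, the bottom frame runs last and must tolerate a channel that was already cleaned up.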
Adding relnote keyword since I hit this testing Beta 5.
Keywords: relnote
Comment on attachment 471285 [details] [diff] [review]
v1 - backout bug 513008 [Check-in comment 39]

Seems to be correct.
Attachment #471285 - Flags: review?(michal.novotny) → review+
(In reply to comment #34)
> Could this be what we see? I.e. are there functions f() and g() which may cause
> this scenario? (Too late for me now to read code to figure out - will give it a
> shot tomorrow if still unresolved.)

I have not been able to identify candidates for f() and g() yet. Any ideas (or arguments why this is impossible) are appreciated. It's worth mentioning that ContinueProcessResponse() is called with failure-status, otherwise we wouldn't need |mTransaction| to be set.
Comment on attachment 471285 [details] [diff] [review]
v1 - backout bug 513008 [Check-in comment 39]

Not sure this backout/disabling patch needs a=2.0...
Attachment #471285 - Flags: approval2.0?
Attachment #471285 - Flags: approval2.0? → approval2.0+
Comment on attachment 471285 [details] [diff] [review]
v1 - backout bug 513008 [Check-in comment 39]

http://hg.mozilla.org/mozilla-central/rev/a93c9118f16d
Attachment #471285 - Attachment description: v1 - backout bug 513008 → v1 - backout bug 513008 [Check-in comment 39]
We are getting this crash nearly every day in our Mozmill test run against trunk builds. See http://crash-stats.mozilla.com/report/index/4d04336a-9dd8-41fc-aa0c-388f42100905
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Whiteboard: [mozmill]
(In reply to comment #40)
> We are getting this crash mostly each day for our Mozmill testrun against trunk
> builds. See
> http://crash-stats.mozilla.com/report/index/4d04336a-9dd8-41fc-aa0c-388f42100905

After the backout? Any chance of getting a log?
Looks like the crash happened during our update tests and we weren't able to update our builds on that machine. I have now updated the builds manually, which means the build id is now 100906. I'm sure that will fix the crashes.
Backing out bug 513008 apparently helped get rid of this crash on nightly builds.  Curiously, this crash is now exclusively a Windows crash.

We cannot get the log from people now.  If anyone is willing to get a log, please use this nightly build, and disable auto-update:
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2010-08-31-04-mozilla-central/

I'll try to figure out using a human brain debugger what's going on here.
(In reply to comment #43)
> Curious is that this crash is exceptionally only windows crash.

This bug was filed on Linux and comment 3 confirmed it on OS X.
(In reply to comment #42)
> Looks like the crash happened for our update tests and we weren't able to
> update our builds on that machine. I have now updated the builds manually,
> means the build id is now 100906. I'm sure that will fix the crashes.

Any chance you can identify which part of your tests caused this crash consistently and report it here so we can try to reproduce? We're pretty much in the dark here...
Looks like the backout change did not make it into beta5.  The crash volume is growing fast with this signature and is about 8 times the volume of the #2 crash in early b5 testing before release.  I wonder if we should consider respinning b5 to pick up this change?
Whiteboard: [mozmill] → [mozmill], [4b5]
(In reply to comment #46)
> looks like the backout change did not make it into beta5.

I assumed that that was intentional given comment 23.
Beta 5 was just released; will have to wait for beta 6. I may have screwed this up, but based on the comments at the time of code freeze, it didn't look like a solution was readily apparent, or a known set of triggers was, either.

We should keep an eye on this as the beta goes out there; are we convinced that it's happening randomly / through normal use, or is it exacerbated by a certain type of page or software addition? Based on the volume I would have expected to experience it, and to date have not, though I'm not a perfect litmus test for our userbase (I also missed the Farmville problems, woe!)
(In reply to comment #45)
> (In reply to comment #42)
> > Looks like the crash happened for our update tests and we weren't able to
> > update our builds on that machine. I have now updated the builds manually,
> > means the build id is now 100906. I'm sure that will fix the crashes.
> 
> Any chance to identify which part of your tests which caused this crash
> consistently and report it here so we could try to reproduce? We're pretty much
> in the dark here..

I'm at a conference right now and can't check our Mozmill tests when this crash happens. Anthony, can you have a look at this? You should take a nightly build before Aug 31 and simply check when the build crashes. If you can reproduce it on qa-set I can try to get a stack and local variable output end of this week.
Henric and/or Anthony, if you guys try to reproduce this then please make sure that you run with the environment variables set as Honza mentions earlier in this bug; getting logs from a crashing build would be very valuable here, even more so than getting this in a debugger. To recap, here are the environment variables you should make sure are set:

NSPR_LOG_FILE=firefox.log
NSPR_LOG_MODULES=URILoader:5,LoadGroup:5,DocLoader:5,nsHttp:5,nsSocketTransport:5,sync

and if you hit this crash, make firefox.log available to Honza and others (ideally attach it to this bug, or if it contains private info we can work out other ways to get the necessary parts to the people who need it).
Choff: how has this crash reacted to our wider beta audience? Still near the top?
http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/4.0b5 shows this is currently the #1 crash for B5, with 9399 crashes on Windows.

(In reply to comment #51)
> Choff: how has this crash reacted to our wider beta audience? Still near the
> top?
(In reply to comment #49)
> Anthony, can you have a look at this? You should take a nightly build
> before Aug 31 and simply check when the build crashes. If you can reproduce it
> on qa-set I can try to get a stack and local variable output end of this week.

So far, I've been unable to reproduce this crash.
Honza: When I review trunk crash-stats, I still see a handful of crashes such as http://crash-stats.mozilla.com/report/index/381abd8a-4146-4764-8b88-e0ba92100908. That person was using yesterday's build. If you look at http://tinyurl.com/32e2a78 and select the "Table" tab, you can see that overall the numbers have dwindled significantly since the 2010090100 build.

(In reply to comment #43)
> Back out of bug 513008 apparently helped to get rid of this crash on nightly
> builds.  Curious is that this crash is exceptionally only windows crash.
> 
> We cannot get the log from people now.  If anyone is willing to get log, please
> use this nightly build, and disable auto-update:
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2010-08-31-04-mozilla-central/
> 
> I'll try to figure out using a human brain debugger what's going on here.
Marcia, those are tracemonkey builds.  See:
http://hg.mozilla.org/tracemonkey/shortlog/8d80930e10f3

- the latest tracemonkey changeset is 9ce0b72525ba (2010-09-08 01:13 -0700)
- after that you can see merge with mozilla-central where a93c9118f16d is present.

I cannot find the tracemonkey nightly build ID that contains this fix, but I believe this is it.  I cannot see a different cause of the bug than the patch for bug 513008.
I *think* we can call this fixed, but Honza would know with more certainty as he was looking into crash data after this change landed, etc. Honza, what are your thoughts - can we close this and open a new bug to track getting the backed-out changes re-landed once we figure out what the problem is?
I think we already have that tracking bug.  And I cannot create a fix until we re-land the backed-out code, as the fix is part of that code.  So, I'll fix and re-land in that followup bug.

Closing this one.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Can we get this landed on mozilla-central GECKO20b5_20100831_RELBRANCH as well, please? We may want to do a beta6 based on this.
If we get the patch from bug 594882 reviewed soon, we may rather land the true fix.  Otherwise I'll do that.
Please land the fix we know works, which is the backout. We can take the true fix in Beta 6.
This was also fixed on the relbranch (see bug 513008 for cset)
Crash Signature: [@ nsHttpChannel::ContinueProcessResponse(unsigned int) ] [@ nsHttpChannel::ContinueProcessResponse]