Closed Bug 1375659 Opened 7 years ago Closed 7 years ago

Intermittent test_quit_restart.py TestServerQuitApplication.test_attempt_quit | application crashed [@ mozilla::dom::FetchBodyConsumer<mozilla::dom::Response>::BeginConsumeBodyMainThread]

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 --- unaffected
firefox56 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: baku)

References

Details

(Keywords: crash, intermittent-failure, regression)

Crash Data

Attachments

(1 file, 2 obsolete files)

Not sure why this got filed under Headless. It's happening for all kind of builds during Marionette jobs. Here the stack details: [task 2017-06-22T15:24:09.981435Z] 15:24:09 INFO - Crash reason: SIGSEGV [task 2017-06-22T15:24:09.982957Z] 15:24:09 INFO - Crash address: 0x58 [task 2017-06-22T15:24:09.984698Z] 15:24:09 INFO - Process uptime: not available [task 2017-06-22T15:24:09.986195Z] 15:24:09 INFO - [task 2017-06-22T15:24:09.987801Z] 15:24:09 INFO - Thread 0 (crashed) [task 2017-06-22T15:24:09.989655Z] 15:24:09 INFO - 0 libxul.so!mozilla::dom::FetchBodyConsumer<mozilla::dom::Response>::BeginConsumeBodyMainThread [Response.h:0000dd6746c4 : 107 + 0x3] [task 2017-06-22T15:24:09.990874Z] 15:24:09 INFO - rax = 0x00007ffea920b1d0 rdx = 0x0000000000000000 [task 2017-06-22T15:24:09.992140Z] 15:24:09 INFO - rcx = 0x0000000000000003 rbx = 0x00007f5499837dd0 [task 2017-06-22T15:24:09.993801Z] 15:24:09 INFO - rsi = 0x00007ffea920b1d0 rdi = 0x0000000000000000 [task 2017-06-22T15:24:09.994974Z] 15:24:09 INFO - rbp = 0x00007ffea920b250 rsp = 0x00007ffea920b1b0 [task 2017-06-22T15:24:09.995641Z] 15:24:09 INFO - r8 = 0x0000000000000000 r9 = 0x00000000000007ca [task 2017-06-22T15:24:09.996262Z] 15:24:09 INFO - r10 = 0x00007ffea920b300 r11 = 0x0000000000000000 [task 2017-06-22T15:24:09.996408Z] 15:24:09 INFO - r12 = 0x00007ffea920b1d0 r13 = 0x00007ffea920b38f [task 2017-06-22T15:24:09.996994Z] 15:24:09 INFO - r14 = 0x00007ffea920b1c0 r15 = 0x00007f54c285b710 [task 2017-06-22T15:24:09.997179Z] 15:24:09 INFO - rip = 0x00007f54be8ee094 [task 2017-06-22T15:24:09.997734Z] 15:24:09 INFO - Found by: given as instruction pointer in context [task 2017-06-22T15:24:09.997979Z] 15:24:09 INFO - 1 libxul.so!BeginConsumeBodyRunnable<mozilla::dom::Response>::Run [FetchConsumer.cpp:0000dd6746c4 : 67 + 0x5] [task 2017-06-22T15:24:09.998689Z] 15:24:09 INFO - rbx = 0x00007f54cd064500 rbp = 0x00007ffea920b260 [task 2017-06-22T15:24:09.999260Z] 15:24:09 INFO - rsp = 0x00007ffea920b260 r12 = 0x00007f54b406fc28 [task 2017-06-22T15:24:09.999987Z] 15:24:09 INFO - r13 = 0x00007ffea920b38f r14 = 0x00007ffea920b2d0 [task 2017-06-22T15:24:10.000514Z] 15:24:09 INFO - r15 = 0x00007f54c285b710 rip = 0x00007f54be8ee3a7 [task 2017-06-22T15:24:10.001038Z] 15:24:10 INFO - Found by: call frame info [task 2017-06-22T15:24:10.001099Z] 15:24:10 INFO - 2 libxul.so!nsThread::ProcessNextEvent(bool, bool*) + 0x36f [task 2017-06-22T15:24:10.001585Z] 15:24:10 INFO - rbx = 0x00007f54cd064500 rbp = 0x00007ffea920b370 [task 2017-06-22T15:24:10.002123Z] 15:24:10 INFO - rsp = 0x00007ffea920b270 r12 = 0x00007f54b406fc28 [task 2017-06-22T15:24:10.002648Z] 15:24:10 INFO - r13 = 0x00007ffea920b38f r14 = 0x00007ffea920b2d0 [task 2017-06-22T15:24:10.003194Z] 15:24:10 INFO - r15 = 0x00007f54c285b710 rip = 0x00007f54bfd5088f [task 2017-06-22T15:24:10.003732Z] 15:24:10 INFO - Found by: call frame info [task 2017-06-22T15:24:10.004277Z] 15:24:10 INFO - 3 libxul.so!NS_ProcessPendingEvents(nsIThread*, unsigned int) + 0x3b [task 2017-06-22T15:24:10.004804Z] 15:24:10 INFO - rbx = 0x0000000000000000 rbp = 0x00007ffea920b3c0 [task 2017-06-22T15:24:10.005315Z] 15:24:10 INFO - rsp = 0x00007ffea920b380 r12 = 0x00007f54cd064500 [task 2017-06-22T15:24:10.005392Z] 15:24:10 INFO - r13 = 0x00007ffea920b38f r14 = 0x00000000ffffffff [task 2017-06-22T15:24:10.005988Z] 15:24:10 INFO - r15 = 0x00000000001e6fc2 rip = 0x00007f54c060a2bb [task 2017-06-22T15:24:10.006487Z] 15:24:10 INFO - Found by: call frame info [task 2017-06-22T15:24:10.008246Z] 15:24:10 INFO - 4 libxul.so!mozilla::ShutdownXPCOM(nsIServiceManager*) + 0xf1 [task 2017-06-22T15:24:10.008774Z] 15:24:10 INFO - rbx = 0x00007f54cd064500 rbp = 0x00007ffea920b440 [task 2017-06-22T15:24:10.008842Z] 15:24:10 INFO - rsp = 0x00007ffea920b3d0 r12 = 0x00007ffea920b400 [task 2017-06-22T15:24:10.009332Z] 15:24:10 INFO - r13 = 0x00007f54cd0b3298 r14 = 0x00007ffea920b3f0 [task 2017-06-22T15:24:10.009846Z] 15:24:10 INFO - r15 = 0x00007f54bfd39040 rip = 0x00007f54c0610601 [task 2017-06-22T15:24:10.010335Z] 15:24:10 INFO - Found by: call frame info [task 2017-06-22T15:24:10.010412Z] 15:24:10 INFO - 5 libxul.so!ScopedXPCOMStartup::~ScopedXPCOMStartup [nsAppRunner.cpp:0000dd6746c4 : 1467 + 0x8] [task 2017-06-22T15:24:10.010924Z] 15:24:10 INFO - rbx = 0x00007f54cd0bd1f8 rbp = 0x00007ffea920b470 [task 2017-06-22T15:24:10.011463Z] 15:24:10 INFO - rsp = 0x00007ffea920b450 r12 = 0x0000000000000000 [task 2017-06-22T15:24:10.011549Z] 15:24:10 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000 [task 2017-06-22T15:24:10.011663Z] 15:24:10 INFO - r15 = 0x00007f54bfd39040 rip = 0x00007f54c09fb832 [task 2017-06-22T15:24:10.012212Z] 15:24:10 INFO - Found by: call frame info [task 2017-06-22T15:24:10.012804Z] 15:24:10 INFO - 6 libxul.so!mozilla::DefaultDelete<ScopedXPCOMStartup>::operator() [UniquePtr.h:0000dd6746c4 : 528 + 0x5] [task 2017-06-22T15:24:10.013425Z] 15:24:10 INFO - rbx = 0x00007f54cd0bd1f8 rbp = 0x00007ffea920b490 [task 2017-06-22T15:24:10.013534Z] 15:24:10 INFO - rsp = 0x00007ffea920b480 r12 = 0x0000000000000000 [task 2017-06-22T15:24:10.014501Z] 15:24:10 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000 [task 2017-06-22T15:24:10.015044Z] 15:24:10 INFO - r15 = 0x00007f54bfd39040 rip = 0x00007f54c09fb7a3 [task 2017-06-22T15:24:10.015165Z] 15:24:10 INFO - Found by: call frame info [task 2017-06-22T15:24:10.015786Z] 15:24:10 INFO - 7 libxul.so!XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0x3a5 [task 2017-06-22T15:24:10.016425Z] 15:24:10 INFO - rbx = 0x00007ffea920b530 rbp = 0x00007ffea920b520 [task 2017-06-22T15:24:10.017047Z] 15:24:10 INFO - rsp = 0x00007ffea920b4a0 r12 = 0x0000000000000000 [task 2017-06-22T15:24:10.017220Z] 15:24:10 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000 [task 2017-06-22T15:24:10.018086Z] 15:24:10 INFO - r15 = 0x00007f54bfd39040 rip = 0x00007f54c09f8655 [task 2017-06-22T15:24:10.018581Z] 15:24:10 INFO - Found by: call frame info [task 2017-06-22T15:24:10.019257Z] 15:24:10 INFO - 8 libxul.so!XRE_main [nsAppRunner.cpp:0000dd6746c4 : 4869 + 0x5] [task 2017-06-22T15:24:10.019324Z] 15:24:10 INFO - rbx = 0x00007ffea920b530 rbp = 0x00007ffea920b700 [task 2017-06-22T15:24:10.019880Z] 15:24:10 INFO - rsp = 0x00007ffea920b530 r12 = 0x00007ffea920b680 [task 2017-06-22T15:24:10.020473Z] 15:24:10 INFO - r13 = 0x0000000000000005 r14 = 0x00007ffea920c868 [task 2017-06-22T15:24:10.021108Z] 15:24:10 INFO - r15 = 0x00007ffea920b720 rip = 0x00007f54c09f827f [task 2017-06-22T15:24:10.021164Z] 15:24:10 INFO - Found by: call frame info [task 2017-06-22T15:24:10.021735Z] 15:24:10 INFO - 9 firefox!do_main(int, char**, char**) + 0x8a [task 2017-06-22T15:24:10.022342Z] 15:24:10 INFO - rbx = 0x0000000000421a73 rbp = 0x00007ffea920c750 [task 2017-06-22T15:24:10.022407Z] 15:24:10 INFO - rsp = 0x00007ffea920b710 r12 = 0x00007ffea920e7bc [task 2017-06-22T15:24:10.023005Z] 15:24:10 INFO - r13 = 0x00007ffea920c868 r14 = 0x00007f54c2791050 [task 2017-06-22T15:24:10.023630Z] 15:24:10 INFO - r15 = 0x0000000000000005 rip = 0x000000000041b34a [task 2017-06-22T15:24:10.023724Z] 15:24:10 INFO - Found by: call frame info [task 2017-06-22T15:24:10.024255Z] 15:24:10 INFO - 10 firefox!main + 0x74 [task 2017-06-22T15:24:10.024885Z] 15:24:10 INFO - rbx = 0x00007ffea920c868 rbp = 0x00007ffea920c780 [task 2017-06-22T15:24:10.025487Z] 15:24:10 INFO - rsp = 0x00007ffea920c760 r12 = 0x0000000000000005 [task 2017-06-22T15:24:10.025550Z] 15:24:10 INFO - r13 = 0x00007ffea920c898 r14 = 0x000001cf98e5226c [task 2017-06-22T15:24:10.026278Z] 15:24:10 INFO - r15 = 0x0000000000000000 rip = 0x0000000000411854 [task 2017-06-22T15:24:10.026397Z] 15:24:10 INFO - Found by: call frame info Maybe this is the same underlying crash as bug 1374922? Andrea, can you please have a look? Thanks.
Component: Headless → DOM
Flags: needinfo?(amarchesini)
Product: Firefox → Core
(In reply to Henrik Skupin (:whimboo) from comment #1) > Maybe this is the same underlying crash as bug 1374922? Andrea, can you > please have a look? Thanks. Andrea, can you please check?
Attached patch crash.patch (obsolete) (deleted) — Splinter Review
This patch does several things: 1. mBody and mGlobal are nullified only in the DTOR of FetchConsumer. 2. mShuttingDown is a boolean that is set to true when the consuming is done, or aborted. 3. mShuttingDown is used to avoid the beginning of the body consumption in case Fetch has been already aborted. 4. mConsumeBodyPump is nullified differently, directly when mShuttingDown is set to true.
Assignee: nobody → amarchesini
Flags: needinfo?(amarchesini)
Attachment #8882749 - Flags: review?(bkelly)
Attached patch crash.patch (obsolete) (deleted) — Splinter Review
Same as before, but nsProxyRelease needs to use CancelableRunnables.
Attachment #8882749 - Attachment is obsolete: true
Attachment #8882749 - Flags: review?(bkelly)
Attachment #8882803 - Flags: review?(bkelly)
Crash Signature: [@ mozilla::dom::FetchBodyConsumer<mozilla::dom::Response>::BeginConsumeBodyMainThread] → [@ mozilla::dom::FetchBodyConsumer<mozilla::dom::Response>::BeginConsumeBodyMainThread] [@ mozilla::dom::FetchBodyConsumer<T>::BeginConsumeBodyMainThread]
This is top #14 of Nightly 20170702030204 on Windows.
Comment on attachment 8882803 [details] [diff] [review] crash.patch Review of attachment 8882803 [details] [diff] [review]: ----------------------------------------------------------------- I think we need to better handle the case where the worker thread is shutdown. I don't think the NS_ProxyRelease() stuff will work reliably since its not covered within a WorkerHolder, AFAICT. Also, our event targets on workers are a bit of a mess and make it harder to do this correctly. ::: dom/fetch/FetchConsumer.cpp @@ +391,1 @@ > : mTargetThread(NS_GetCurrentThread()) I think its better to use aGlobalObject->EventTargetFor(TaskCategory::Other) instead of NS_GetCurrentThread(). This will ensure the runnable is labeled properly. @@ +408,5 @@ > { > NS_ProxyRelease("FetchBodyConsumer::mBody", > mTargetThread, mBody.forget()); > + NS_ProxyRelease("FetchBodyConsumer::mGlobal", > + mTargetThread, mGlobal.forget()); So, this has a reasonable chance of leaking on a Worker thread. I think we should at least have a WorkerHolder here to ensure the thread is still alive. Also, I have some patches I am working on to make the WorkerPrivate::EventTarget class continue to accept runnables as long as a WorkerHolder is active. This would avoid these leaks. Alternatively, use the WorkerPrivate ControlEventTarget for the mTargetThread here. So should the consumer be its own Holder instead of being ref'd from FetchBodyWorkerHolder? ::: xpcom/threads/nsProxyRelease.h @@ +31,4 @@ > { > public: > ProxyReleaseEvent(const char* aName, already_AddRefed<T> aDoomed) > + : CancelableRunnable(aName), mDoomed(aDoomed.take()) {} I think this should override Cancel() to call Run(), right? We must execute the release on the target thread.
Attachment #8882803 - Flags: review?(bkelly) → review-
Attached patch crash.patch (deleted) — Splinter Review
Better approach without extra WorkerHolders.
Attachment #8882803 - Attachment is obsolete: true
Attachment #8884251 - Flags: review?(bkelly)
Comment on attachment 8884251 [details] [diff] [review] crash.patch Review of attachment 8884251 [details] [diff] [review]: ----------------------------------------------------------------- I'd like to chat in IRC before r+'ing this. Are you around today? ::: dom/fetch/FetchConsumer.cpp @@ +414,5 @@ > void > FetchBodyConsumer<Derived>::AssertIsOnTargetThread() const > { > + MOZ_ASSERT_IF(mGlobal, > + mGlobal->EventTargetFor(TaskCategory::Other) == mTargetThread); Can you write this as `MOZ_ASSERT(mTargetThread->IsOnCurrentThread())`? I think we sometimes return different event targets from EventTargetFor() on the main thread if we've entered xpcom shutdown, etc. Note, if its a WorkerPrivate::EventTarget then this will return false if the worker has started shutting down. I think we should probably make that work through shutdown, though.
Attachment #8884251 - Flags: review?(bkelly)
See this bug for something that might help here.
Comment on attachment 8884251 [details] [diff] [review] crash.patch I'm going to go ahead and r+ this since its a top crasher. I don't have any specific objections to the patch. I would like to chat at some point, though, as its not 100% clear to me how these changes fix the crash. You could also simply post a description of what the crash was and how this fixes in this bug. Also, please consider if you need bug 1379243 for the mGlobal->EventTargetFor() stuff to work right on worker threads.
Attachment #8884251 - Flags: review+
Pushed by amarchesini@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/e6a2348df217 FetchConsumer must check if the operation has been aborted before starting, r=bkelly
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Please request Beta/ESR52 approval on this (and associated bugs) whenever we're feeling confident that the crashes are solved.
Flags: needinfo?(amarchesini)
Comment on attachment 8884251 [details] [diff] [review] crash.patch Approval Request Comment [Feature/Bug causing the regression]: Fetch API [User impact if declined]: UAF in the Fetch Body consumption. [Is this code covered by automated tests?]: no, it's racy. [Has the fix been verified in Nightly?]: it was an intermittent failure. [Needs manual test from QE? If yes, steps to reproduce]: none [List of other uplifts needed for the feature/fix]: 1374922 [Is the change risky?]: not too much. [Why is the change risky/not risky?]: There is a special boolean value to avoid the creation of a fetch stream pump in case the operation has been already aborted before starting. [String changes made/needed]: none
Flags: needinfo?(amarchesini)
Attachment #8884251 - Flags: approval-mozilla-esr52?
Attachment #8884251 - Flags: approval-mozilla-beta?
Comment on attachment 8884251 [details] [diff] [review] crash.patch Should fix a crash, let's uplift this for beta 9.
Attachment #8884251 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 8884251 [details] [diff] [review] crash.patch Since this is fallout from bug 1371889 let's uplift to ESR 53.3.0.
Attachment #8884251 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
This needs rebased patches for Beta/ESR52.
Flags: needinfo?(amarchesini)
Actually, we need to uplift bugs 1373555, 1374922 and 1375749. Do we want to proceed? the crash seems gone in m-i/m-c.
Flags: needinfo?(amarchesini) → needinfo?(ryanvm)
Do we need bug 1380604 uplifted as well? That fix seems related to the FetchConsumer stuff here.
Flags: needinfo?(amarchesini)
That one as well, yes.
Flags: needinfo?(amarchesini)
Keywords: regression
Flags: needinfo?(ryanvm)
This won't be landing on 55.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: