Closed
Bug 1450874
Opened 7 years ago
Closed 7 years ago
Crash in nsTArray_Impl<T>::AppendElement<T> | mozilla::dom::WorkerLoadInfo::InterfaceRequestor::MaybeAddTabChild
Categories
(Core :: DOM: Service Workers, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla61
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox59 | --- | unaffected |
firefox60 | --- | unaffected |
firefox61 | + | fixed |
People
(Reporter: calixte, Assigned: bkelly)
References
(Blocks 1 open bug, )
Details
(Keywords: crash, regression, topcrash)
Crash Data
Attachments
(2 files)
(deleted),
patch
|
asuth
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
asuth
:
review+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-4a398844-871f-4b8c-ab8e-c4c870180403.
=============================================================
Top 10 frames of crashing thread:
0 xul.dll nsTArray_Impl<RefPtr<mozilla::ComputedStyle>, nsTArrayInfallibleAllocator>::AppendElement<already_AddRefed<mozilla::ComputedStyle>, nsTArrayInfallibleAllocator> xpcom/ds/nsTArray.h:2286
1 xul.dll mozilla::dom::WorkerLoadInfo::InterfaceRequestor::MaybeAddTabChild dom/workers/WorkerLoadInfo.cpp:472
2 xul.dll mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded dom/serviceworkers/ServiceWorkerPrivate.cpp:1822
3 xul.dll mozilla::dom::ServiceWorkerPrivate::SendFetchEvent dom/serviceworkers/ServiceWorkerPrivate.cpp:1757
4 xul.dll static nsresult mozilla::dom::`anonymous namespace'::ContinueDispatchFetchEventRunnable::Run dom/serviceworkers/ServiceWorkerManager.cpp:2328
5 xul.dll nsPermissionManager::WhenPermissionsAvailable extensions/cookie/nsPermissionManager.cpp:3444
6 xul.dll static void <lambda_499c2de420bd99c0430356c54ec813d7>::operator dom/serviceworkers/ServiceWorkerManager.cpp:2496
7 xul.dll nsresult mozilla::detail::RunnableFunction<<lambda_499c2de420bd99c0430356c54ec813d7> >::Run xpcom/threads/nsThreadUtils.h:551
8 xul.dll mozilla::net::HttpBaseChannel::EnsureUploadStreamIsCloneable netwerk/protocol/http/HttpBaseChannel.cpp:918
9 xul.dll mozilla::dom::ServiceWorkerManager::DispatchFetchEvent dom/serviceworkers/ServiceWorkerManager.cpp:2509
=============================================================
There are 223 crashes (from 17 installations) in nightly 61 with buildid 20180402220122. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1448141.
[1] https://hg.mozilla.org/mozilla-central/rev?node=c22501d278b4
Flags: needinfo?(bkelly)
Reporter | ||
Updated•7 years ago
|
Crash Signature: [@ nsTArray_Impl<T>::AppendElement<T> | mozilla::dom::WorkerLoadInfo::InterfaceRequestor::MaybeAddTabChild] → [@ nsTArray_Impl<T>::AppendElement<T> | mozilla::dom::WorkerLoadInfo::InterfaceRequestor::MaybeAddTabChild]
[@ mozilla::dom::WorkerLoadInfo::InterfaceRequestor::MaybeAddTabChild]
[@ libxul.so@0x1bf53d7 | mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIf…
Comment 1•7 years ago
|
||
This is the #4 top crash for Firefox 61.0a1 on Windows:
https://crash-stats.mozilla.com/signature/?product=Firefox&version=61.0a1&signature=nsTArray_Impl%3CT%3E%3A%3AAppendElement%3CT%3E%20%7C%20mozilla%3A%3Adom%3A%3AWorkerLoadInfo%3A%3AInterfaceRequestor%3A%3AMaybeAddTabChild&date=%3E%3D2018-03-27T10%3A54%3A15.000Z&date=%3C2018-04-03T10%3A54%3A15.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#summary
Windows 7 206 79.5%
Windows 10 49 18.9%
Windows 8.1 4 1.5%
Comment 2•7 years ago
|
||
I can easily reproduce this by attempting to load:
https://www.cnet.com/news/mozilla-new-firefox-reality-browser-is-where-vr-meets-the-web/
Reporter | ||
Updated•7 years ago
|
Crash Signature: mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded] → mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded]
[@ libxul.so@0x1be2eb7 | mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded]
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Flags: needinfo?(bkelly)
Updated•7 years ago
|
Priority: -- → P1
Assignee | ||
Comment 3•7 years ago
|
||
Mike, does it reproduce in a clean profile for you? Any chance you can save off the profile for me to look at?
Flags: needinfo?(mconley)
Assignee | ||
Updated•7 years ago
|
Blocks: ServiceWorkers-stability
Comment 4•7 years ago
|
||
(In reply to Ben Kelly [Part-time, not reviewing][:bkelly] from comment #3)
> Mike, does it reproduce in a clean profile for you? Any chance you can save
> off the profile for me to look at?
Hm, no, it doesn't.
This is my default profile - I'm a little reluctant to send it anywhere, really. I trust you, certainly, but I'm a bit antsy about putting it anywhere, mailing it somehow, etc.
Is there some debugging information I can give you instead?
Flags: needinfo?(mconley) → needinfo?(bkelly)
Assignee | ||
Comment 5•7 years ago
|
||
I think this is perhaps related to bug 1448141 P4. If the service worker script fails to load we don't actually clear our mWorkerPrivate. So we can try dispatching to a shutting down worker.
Assignee | ||
Comment 6•7 years ago
|
||
Mike, can you send me these files and folders from your profile?
serviceworker.txt
storage/default/https+++www.cnet.com
The cnet.com folder may have a userContextId part as well if you are using containers.
Flags: needinfo?(bkelly) → needinfo?(mconley)
Assignee | ||
Comment 7•7 years ago
|
||
Also, if you could at least save your current profile off that would help me. I want to provide you a test build later today.
Assignee | ||
Comment 8•7 years ago
|
||
This is kind of a stab in the dark since I haven't reproduced the issue myself yet.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4b22577b77790d6de49f28011bd3b1ea8d11a13c
Reporter | ||
Comment 9•7 years ago
|
||
I can always reproduce the crash with the cnet.com url:
https://crash-stats.mozilla.com/report/index/e9e5610e-6d94-4baa-b78e-940b50180403
Assignee | ||
Comment 10•7 years ago
|
||
Comment 11•7 years ago
|
||
Affects outlook.live.com as well. Just try opening any received mail.
https://crash-stats.mozilla.com/report/index/d620645e-37f6-46fa-b383-02ead0180403
Comment 12•7 years ago
|
||
Hey bkelly - the try build appears to fix the crash for me!
Flags: needinfo?(mconley)
Assignee | ||
Comment 13•7 years ago
|
||
(In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and needinfos) from comment #12)
> Hey bkelly - the try build appears to fix the crash for me!
Ok. It appears to still fail in a debug build, though... trying to get that sorted and then I'll post for review. I'll also try to write a test.
Assignee | ||
Comment 14•7 years ago
|
||
Assignee | ||
Comment 15•7 years ago
|
||
Comment on attachment 8964592 [details] [diff] [review]
P1 If a service worker script self closes due to a compilation error, then don't re-use the worker for further events. r=asuth
Andrew, this patch makes us follow our TerminateWorker() path if we detect that the service worker thread started shutting down on its own.
Self-shutdown was not possible in previous code, but was introduced in bug 1448141. When the worker script cannot be compiled we now immediately enter the closing state on the worker thread. This prevents the service worker code from accidentally holding it alive while it dispatched functional events.
The ServiceWorkerPrivate code, however, was never written with the expectation that worker thread would terminate itself. So this patch catches the condition by checking for a non-running status and treating it like a normal terminate.
I believe we are hitting this path because navigating to a controlled page effectively queues up multiple fetch events before we are able to run the update to clear the bad service worker. The navigation tries to start the service worker, fails, but then loads from the network. Then a style sheet or script resource tries to hit the service worker and triggers the crash.
Note, I can't completely explain the last frame in the crash stack. It seems the strong ref from SWP should keep the WorkerPrivate and it's info's nsTArray at least in memory. Its possible there is something further wrong due to WorkerRef, etc, or maybe the crash stack is just bogus.
In any case, I think this check is reasonably safe and local testing shows that it fixes the crash.
Attachment #8964592 -
Flags: review?(bugmail)
Assignee | ||
Comment 16•7 years ago
|
||
Comment on attachment 8964618 [details] [diff] [review]
P2 Don't save service worker time stamps if the fetch event didn't actually dispatch. r=asuth
This patch makes us skip trying to set the various time stamps after resuming a failed channel. Ideally we would avoid calling this method, but fixing that would require additional plumbing I would prefer not to rush for this top crasher.
Attachment #8964618 -
Flags: review?(bugmail)
Assignee | ||
Updated•7 years ago
|
Assignee | ||
Comment 17•7 years ago
|
||
BTW, my local STR are:
1. Visit https://blog.wanderview.com/blog/2017/03/13/firefox-52-settimeout-changes/
2. Close the browser
3. Go to your profile dir and delete storage/default/https+++blog.wanderview.com
4. Open the browser and visit the URL in (1) again.
I'm going to see if I can trigger in a test now.
Assignee | ||
Comment 18•7 years ago
|
||
I can't seem to get an automated test to trigger this crash. I think its too timing dependent. I'll probably just go without a test.
Comment 19•7 years ago
|
||
Ben, did you try by using a Marionette test? Not sure which harness you were using but only Marionette offers a real restart of the browser.
Comment 20•7 years ago
|
||
(In reply to Ben Kelly [Part-time, not reviewing][:bkelly] from comment #15)
> Note, I can't completely explain the last frame in the crash stack. It
> seems the strong ref from SWP should keep the WorkerPrivate and it's info's
> nsTArray at least in memory. Its possible there is something further wrong
> due to WorkerRef, etc, or maybe the crash stack is just bogus.
Disassembly for the linux crasher https://crash-stats.mozilla.com/report/index/e9e5610e-6d94-4baa-b78e-940b50180403 via radare2 seems to show that we're dying dereferencing mLoadInfo.mInterfaceRequestor and it's null. Specifically, we're in InterfaceRequestor::MaybeAddTabChild, we've returned from the call to do_GetWeakReference(tabChild) and are doing [(first arg to InterfaceRequestor::MaybeAddTabChild which should be the 'this') + 0x20], and the assembly then does another dereference on that with +0x20 before calling the actual nsTArray append method.
I need to go to dinner but will finish review this evening.
Assignee | ||
Comment 21•7 years ago
|
||
Ok. I think that makes sense. This clears the mInterfaceRequestor if we have started closing:
https://searchfox.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#2858
Assignee | ||
Comment 22•7 years ago
|
||
Sorry, its more likely cleared here:
https://searchfox.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#288
In any case, I still think checking the running status is a fine fix.
Updated•7 years ago
|
Attachment #8964592 -
Flags: review?(bugmail) → review+
Updated•7 years ago
|
Attachment #8964618 -
Flags: review?(bugmail) → review+
Comment 23•7 years ago
|
||
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d602877c654a
P1 If a service worker script self closes due to a compilation error, then don't re-use the worker for further events. r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/b054a5905c4b
P2 Don't save service worker time stamps if the fetch event didn't actually dispatch. r=asuth
Comment 24•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/d602877c654a
https://hg.mozilla.org/mozilla-central/rev/b054a5905c4b
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Updated•7 years ago
|
Crash Signature: mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded]
[@ libxul.so@0x1be2eb7 | mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded] → mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded]
[@ libxul.so@0x1be2eb7 | mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded]
[@ libxul.so@0x1bd9997 | mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded]
Assignee | ||
Comment 25•7 years ago
|
||
We've gone a few buildid's without this crash triggering, so I believe this fix worked.
Updated•7 years ago
|
tracking-firefox61:
--- → +
You need to log in
before you can comment on or make changes to this bug.
Description
•