Closed
Bug 956284
Opened 11 years ago
Closed 11 years ago
Fault in cycle collector: overflowing refcount
Categories
(Core :: DOM: Workers, defect)
Tracking
()
VERIFIED
FIXED
mozilla29
Tracking | Status | |
---|---|---|
firefox26 | --- | unaffected |
firefox27 | --- | unaffected |
firefox28 | + | verified |
firefox29 | + | verified |
firefox-esr24 | --- | unaffected |
b2g18 | --- | unaffected |
b2g-v1.1hd | --- | unaffected |
b2g-v1.2 | --- | unaffected |
b2g-v1.3 | --- | fixed |
b2g-v1.3T | --- | fixed |
b2g-v1.4 | --- | fixed |
People
(Reporter: whimboo, Assigned: ttaubert)
References
Details
(5 keywords, Whiteboard: [mozmill])
Attachments
(1 file)
(deleted),
patch
|
khuey
:
review+
bajaj
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
By running our Mozmill tests for Firefox Aurora we noticed a lot of aborts caused by overflowing refcounts in Firefox:
Report:
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/mozilla-aurora_functional/16324/console
Message:
Fault in cycle collector: overflowing refcount (ptr: 0x10fd748b0)
###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1226
###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1226
So far this problem only seems to exist on OS X across different platform versions and always seem to happen on exit.
Reporter | ||
Updated•11 years ago
|
Whiteboard: [mozmill]
Comment 1•11 years ago
|
||
My guess is that we end up having refcnt 0 - 1 for some reason. Don't or didn't we have some
worker related bug for this?
Reporter | ||
Comment 2•11 years ago
|
||
Also happens for holly branch builds with slightly different lines of code:
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/project_functional/233/console
Fault in cycle collector: overflowing refcount (ptr: 0x10de89870)
###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1248
###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1248
status-firefox27:
--- → ?
status-firefox28:
--- → affected
status-firefox29:
--- → affected
tracking-firefox28:
--- → ?
tracking-firefox29:
--- → ?
There's bug 937220.
Reporter | ||
Comment 4•11 years ago
|
||
Finally I got a crash report for the holly build: bp-fe34fe81-c94a-4e50-a419-62d612140103
Stack:
0 libmozalloc.dylib mozalloc_abort(char const*) memory/mozalloc/mozalloc_abort.cpp
1 XUL Abort /builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/xpcom/base/../../../../xpcom/base/nsDebugImpl.cpp
2 XUL NS_DebugBreak /builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/xpcom/base/../../../../xpcom/base/nsDebugImpl.cpp
3 XUL GCGraphBuilder::DescribeRefCountedNode(unsigned int, char const*) xpcom/base/nsCycleCollector.cpp
4 XUL nsDOMEventTargetHelper::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) content/events/src/nsDOMEventTargetHelper.cpp
5 XUL nsXHREventTarget::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) content/base/src/nsXMLHttpRequest.cpp
6 XUL mozilla::dom::workers::XMLHttpRequest::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) dom/workers/XMLHttpRequest.cpp
7 XUL GCGraphBuilder::Traverse(PtrInfo*) xpcom/base/nsCycleCollector.cpp
8 XUL nsCycleCollector::MarkRoots(js::SliceBudget&) xpcom/base/nsCycleCollector.cpp
9 XUL nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*) xpcom/base/nsCycleCollector.cpp
10 XUL nsCycleCollector_collect(nsICycleCollectorListener*) xpcom/base/nsCycleCollector.cpp
11 XUL mozilla::CycleCollectedJSRuntime::OnGC(JSGCStatus) xpcom/base/CycleCollectedJSRuntime.cpp
12 XUL Collect /builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/js/src/../../../../js/src/jsgc.cpp
13 XUL mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal(JSContext*, bool, bool) dom/workers/WorkerPrivate.cpp
14 XUL (anonymous namespace)::GarbageCollectRunnable::WorkerRun(JSContext*, mozilla::dom::workers::WorkerPrivate*) dom/workers/WorkerPrivate.cpp
15 XUL mozilla::dom::workers::WorkerRunnable::Run() dom/workers/WorkerRunnable.cpp
16 XUL mozilla::dom::workers::WorkerPrivate::ProcessAllControlRunnablesLocked() dom/workers/WorkerPrivate.cpp
17 XUL mozilla::dom::workers::WorkerPrivate::DoRunLoop(JSContext*) dom/workers/WorkerPrivate.cpp
18 XUL (anonymous namespace)::WorkerThreadPrimaryRunnable::Run() dom/workers/RuntimeService.cpp
19 XUL nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp
20 XUL NS_ProcessNextEvent(nsIThread*, bool) /builds/slave/m-in-osx64-0000000000000000000/build/xpcom/glue/nsThreadUtils.cpp
21 XUL mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/ipc/glue/../../../../ipc/glue/MessagePump.cpp
22 XUL MessageLoop::Run() /builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/ipc/chromium/../../../../ipc/chromium/src/base/message_loop.cc
23 XUL nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp
24 libnss3.dylib _pt_root
25 libsystem_pthread.dylib libsystem_pthread.dylib@0x1899
26 libsystem_pthread.dylib libsystem_pthread.dylib@0x172a
27 libsystem_pthread.dylib libsystem_pthread.dylib@0x5fc9
28 libnss3.dylib null_signal_handler
Keywords: crashreportid
Reporter | ||
Comment 5•11 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #3)
> There's bug 937220.
So does it mean it is a dupe? Given the stack I would say so. If it helps I could try to find a pattern to get it reproduced.
Comment 6•11 years ago
|
||
This actually also looks like it could be a top mac crash, as it shows up overall 24 in the top crash list, which has to be fairly bad as it is OSX only.
https://crash-stats.mozilla.com/report/index/ba3b3008-31b7-4310-b521-2e9d32140102
Depends on: 937220
Comment 7•11 years ago
|
||
Oops, the other bug talks about that already, never mind.
Updated•11 years ago
|
Component: XUL → DOM: Workers
Comment 8•11 years ago
|
||
What are the steps to reproduce this? Maybe that would help with a fix.
Group: core-security
Comment 9•11 years ago
|
||
Probably just a dupe, so I'm going to set it as sec-other.
Keywords: sec-other
Reporter | ||
Comment 10•11 years ago
|
||
I can see this on and off during our Mozmill tests. I don't have a pattern yet but I will try to nail it down, and hopefully get manual steps.
Reporter | ||
Comment 13•11 years ago
|
||
No,sorry. I'm completely filled up with work on bug 956315. I asked Cosmin to do the investigation on the test side via bug 959562. You might want to trigger him.
Flags: needinfo?(hskupin)
Reporter | ||
Comment 14•11 years ago
|
||
Could this bug in some way be related to bug 937220 which also has been started with Firefox 28?
Comment 15•11 years ago
|
||
I think there's a reasonable chance it is the same issue.
Reporter | ||
Updated•11 years ago
|
Comment 16•11 years ago
|
||
Henrik, have you made any headway with the investigation here? If not, can this be offloaded to someone else on your team?
Reporter | ||
Comment 17•11 years ago
|
||
Nope, sorry. I was hoping to get an update from bug 937220. Cosmin, would you be able to continue here in finding a minimzed testcase for our testrun?
Flags: needinfo?(cosmin.malutan)
Assignee | ||
Comment 18•11 years ago
|
||
I tried to bisect this and it looks like https://hg.mozilla.org/mozilla-central/rev/8196cbbcadcd from bug 935032 might be the culprit. Does that sound plausible or is my testcase broken?
Comment 19•11 years ago
|
||
That seems a little unlikely to me.
This is almost certainly a regression from bug 928312.
Assignee | ||
Comment 21•11 years ago
|
||
The refcount overflows for a XMLHttpRequest instance that tries to request "resource://gre/modules/workers/lz4_internal.js".
Assignee | ||
Comment 22•11 years ago
|
||
It looks like XMLHttpRequest::Unpin() is getting called even when XMLHttpRequest::MaybePin() fails. Should Unpin() check mRooted?
MaybePin() bails out after calling mWorkerPrivate->AddFeature() unsuccessfully.
(In reply to Tim Taubert [:ttaubert] from comment #22)
> It looks like XMLHttpRequest::Unpin() is getting called even when
> XMLHttpRequest::MaybePin() fails. Should Unpin() check mRooted?
>
> MaybePin() bails out after calling mWorkerPrivate->AddFeature()
> unsuccessfully.
Mmm, that was something I had convinced myself couldn't happen. AddFeature fails because the worker is shutting down, presumably?
I think the fix is probably to change http://mxr.mozilla.org/mozilla-central/source/dom/workers/XMLHttpRequest.cpp#1642 to set aRv to a failure code before returning.
Assignee | ||
Comment 25•11 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #23)
> Mmm, that was something I had convinced myself couldn't happen. AddFeature
> fails because the worker is shutting down, presumably?
Yes, this is all happening on shutdown.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #24)
> I think the fix is probably to change
> http://mxr.mozilla.org/mozilla-central/source/dom/workers/XMLHttpRequest.
> cpp#1642 to set aRv to a failure code before returning.
Yeah, I wondered why we check aRv even though MaybePin() doesn't touch it.
Great debugging work btw.
Assignee | ||
Comment 27•11 years ago
|
||
Let's give it a shot.
Attachment #8366748 -
Flags: review?(khuey)
Attachment #8366748 -
Flags: review?(khuey) → review+
Assignee | ||
Comment 28•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → ttaubert
Status: NEW → ASSIGNED
Flags: needinfo?(cosmin.malutan)
Reporter | ||
Updated•11 years ago
|
Comment 29•11 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Comment 30•11 years ago
|
||
I checked the latest tinderbox build and passed our tests. It didn't failed in 30 runs so I can say is fixed.
Reporter | ||
Comment 31•11 years ago
|
||
Thanks Cosmin! Lets see how the Nightly build from today will behave. If all is fine please mark this bug as verified, and also update status-firefox29 to verified.
Assignee | ||
Updated•11 years ago
|
Assignee | ||
Comment 32•11 years ago
|
||
Comment on attachment 8366748 [details] [diff] [review]
0001-Bug-956284-Set-error-code-for-MaybePin-if-AddFeature.patch
[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 928312
User impact if declined: Possible crash on shutdown.
Testing completed (on m-c, etc.): First reports after landing are looking good.
Risk to taking this patch (and alternatives if risky): Small patch with a low risk.
String or IDL/UUID changes made by this patch: None.
Attachment #8366748 -
Flags: approval-mozilla-aurora?
Comment 33•11 years ago
|
||
I ran the tests that failed against Nightly build for 50 times and it passes.
Comment 34•11 years ago
|
||
This is where the work actually happened, so I'll set this to sec-high.
Updated•11 years ago
|
Keywords: regression
Updated•11 years ago
|
Attachment #8366748 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Assignee | ||
Comment 35•11 years ago
|
||
Updated•11 years ago
|
status-b2g18:
--- → unaffected
status-b2g-v1.1hd:
--- → unaffected
status-b2g-v1.2:
--- → unaffected
status-b2g-v1.3:
--- → fixed
status-b2g-v1.4:
--- → fixed
Updated•11 years ago
|
status-firefox-esr24:
--- → unaffected
Reporter | ||
Comment 36•11 years ago
|
||
No more crashes happened today for our Mozmill testruns for Aurora builds on OS X. So all is working fine!
Status: RESOLVED → VERIFIED
Updated•11 years ago
|
status-b2g-v1.3T:
--- → fixed
Updated•10 years ago
|
Group: core-security
Updated•8 years ago
|
Keywords: csectype-intoverflow
You need to log in
before you can comment on or make changes to this bug.
Description
•