Closed Bug 956284 Opened 11 years ago Closed 11 years ago

Fault in cycle collector: overflowing refcount

Categories

(Core :: DOM: Workers, defect)

28 Branch
x86_64
macOS
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla29
Tracking Status
firefox26 --- unaffected
firefox27 --- unaffected
firefox28 + verified
firefox29 + verified
firefox-esr24 --- unaffected
b2g18 --- unaffected
b2g-v1.1hd --- unaffected
b2g-v1.2 --- unaffected
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: whimboo, Assigned: ttaubert)

References

Details

(5 keywords, Whiteboard: [mozmill])

Attachments

(1 file)

By running our Mozmill tests for Firefox Aurora we noticed a lot of aborts caused by overflowing refcounts in Firefox: Report: http://mm-ci-master.qa.scl3.mozilla.com:8080/job/mozilla-aurora_functional/16324/console Message: Fault in cycle collector: overflowing refcount (ptr: 0x10fd748b0) ###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1226 ###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1226 So far this problem only seems to exist on OS X across different platform versions and always seem to happen on exit.
Whiteboard: [mozmill]
My guess is that we end up having refcnt 0 - 1 for some reason. Don't or didn't we have some worker related bug for this?
Also happens for holly branch builds with slightly different lines of code: http://mm-ci-master.qa.scl3.mozilla.com:8080/job/project_functional/233/console Fault in cycle collector: overflowing refcount (ptr: 0x10de89870) ###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1248 ###!!! ABORT: cycle collector fault: file ../../../../xpcom/base/nsCycleCollector.cpp, line 1248
Finally I got a crash report for the holly build: bp-fe34fe81-c94a-4e50-a419-62d612140103 Stack: 0 libmozalloc.dylib mozalloc_abort(char const*) memory/mozalloc/mozalloc_abort.cpp 1 XUL Abort /builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/xpcom/base/../../../../xpcom/base/nsDebugImpl.cpp 2 XUL NS_DebugBreak /builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/xpcom/base/../../../../xpcom/base/nsDebugImpl.cpp 3 XUL GCGraphBuilder::DescribeRefCountedNode(unsigned int, char const*) xpcom/base/nsCycleCollector.cpp 4 XUL nsDOMEventTargetHelper::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) content/events/src/nsDOMEventTargetHelper.cpp 5 XUL nsXHREventTarget::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) content/base/src/nsXMLHttpRequest.cpp 6 XUL mozilla::dom::workers::XMLHttpRequest::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) dom/workers/XMLHttpRequest.cpp 7 XUL GCGraphBuilder::Traverse(PtrInfo*) xpcom/base/nsCycleCollector.cpp 8 XUL nsCycleCollector::MarkRoots(js::SliceBudget&) xpcom/base/nsCycleCollector.cpp 9 XUL nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*) xpcom/base/nsCycleCollector.cpp 10 XUL nsCycleCollector_collect(nsICycleCollectorListener*) xpcom/base/nsCycleCollector.cpp 11 XUL mozilla::CycleCollectedJSRuntime::OnGC(JSGCStatus) xpcom/base/CycleCollectedJSRuntime.cpp 12 XUL Collect /builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/js/src/../../../../js/src/jsgc.cpp 13 XUL mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal(JSContext*, bool, bool) dom/workers/WorkerPrivate.cpp 14 XUL (anonymous namespace)::GarbageCollectRunnable::WorkerRun(JSContext*, mozilla::dom::workers::WorkerPrivate*) dom/workers/WorkerPrivate.cpp 15 XUL mozilla::dom::workers::WorkerRunnable::Run() dom/workers/WorkerRunnable.cpp 16 XUL mozilla::dom::workers::WorkerPrivate::ProcessAllControlRunnablesLocked() dom/workers/WorkerPrivate.cpp 17 XUL mozilla::dom::workers::WorkerPrivate::DoRunLoop(JSContext*) dom/workers/WorkerPrivate.cpp 18 XUL (anonymous namespace)::WorkerThreadPrimaryRunnable::Run() dom/workers/RuntimeService.cpp 19 XUL nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp 20 XUL NS_ProcessNextEvent(nsIThread*, bool) /builds/slave/m-in-osx64-0000000000000000000/build/xpcom/glue/nsThreadUtils.cpp 21 XUL mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/ipc/glue/../../../../ipc/glue/MessagePump.cpp 22 XUL MessageLoop::Run() /builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/ipc/chromium/../../../../ipc/chromium/src/base/message_loop.cc 23 XUL nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp 24 libnss3.dylib _pt_root 25 libsystem_pthread.dylib libsystem_pthread.dylib@0x1899 26 libsystem_pthread.dylib libsystem_pthread.dylib@0x172a 27 libsystem_pthread.dylib libsystem_pthread.dylib@0x5fc9 28 libnss3.dylib null_signal_handler
Keywords: crashreportid
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #3) > There's bug 937220. So does it mean it is a dupe? Given the stack I would say so. If it helps I could try to find a pattern to get it reproduced.
This actually also looks like it could be a top mac crash, as it shows up overall 24 in the top crash list, which has to be fairly bad as it is OSX only. https://crash-stats.mozilla.com/report/index/ba3b3008-31b7-4310-b521-2e9d32140102
Depends on: 937220
Oops, the other bug talks about that already, never mind.
Component: XUL → DOM: Workers
What are the steps to reproduce this? Maybe that would help with a fix.
Group: core-security
Probably just a dupe, so I'm going to set it as sec-other.
Keywords: sec-other
I can see this on and off during our Mozmill tests. I don't have a pattern yet but I will try to nail it down, and hopefully get manual steps.
Tracking in case it's not a dupe, but if it is we can untrack.
Blocks: 959562
Any luck tracking this down yet, Henrik?
Flags: needinfo?(hskupin)
No,sorry. I'm completely filled up with work on bug 956315. I asked Cosmin to do the investigation on the test side via bug 959562. You might want to trigger him.
Flags: needinfo?(hskupin)
Could this bug in some way be related to bug 937220 which also has been started with Firefox 28?
I think there's a reasonable chance it is the same issue.
Henrik, have you made any headway with the investigation here? If not, can this be offloaded to someone else on your team?
Nope, sorry. I was hoping to get an update from bug 937220. Cosmin, would you be able to continue here in finding a minimzed testcase for our testrun?
Flags: needinfo?(cosmin.malutan)
I tried to bisect this and it looks like https://hg.mozilla.org/mozilla-central/rev/8196cbbcadcd from bug 935032 might be the culprit. Does that sound plausible or is my testcase broken?
That seems a little unlikely to me.
This is almost certainly a regression from bug 928312.
The refcount overflows for a XMLHttpRequest instance that tries to request "resource://gre/modules/workers/lz4_internal.js".
It looks like XMLHttpRequest::Unpin() is getting called even when XMLHttpRequest::MaybePin() fails. Should Unpin() check mRooted? MaybePin() bails out after calling mWorkerPrivate->AddFeature() unsuccessfully.
(In reply to Tim Taubert [:ttaubert] from comment #22) > It looks like XMLHttpRequest::Unpin() is getting called even when > XMLHttpRequest::MaybePin() fails. Should Unpin() check mRooted? > > MaybePin() bails out after calling mWorkerPrivate->AddFeature() > unsuccessfully. Mmm, that was something I had convinced myself couldn't happen. AddFeature fails because the worker is shutting down, presumably?
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #23) > Mmm, that was something I had convinced myself couldn't happen. AddFeature > fails because the worker is shutting down, presumably? Yes, this is all happening on shutdown. (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #24) > I think the fix is probably to change > http://mxr.mozilla.org/mozilla-central/source/dom/workers/XMLHttpRequest. > cpp#1642 to set aRv to a failure code before returning. Yeah, I wondered why we check aRv even though MaybePin() doesn't touch it.
Let's give it a shot.
Attachment #8366748 - Flags: review?(khuey)
Assignee: nobody → ttaubert
Status: NEW → ASSIGNED
Flags: needinfo?(cosmin.malutan)
Blocks: 937220
No longer depends on: 937220
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
I checked the latest tinderbox build and passed our tests. It didn't failed in 30 runs so I can say is fixed.
Thanks Cosmin! Lets see how the Nightly build from today will behave. If all is fine please mark this bug as verified, and also update status-firefox29 to verified.
Comment on attachment 8366748 [details] [diff] [review] 0001-Bug-956284-Set-error-code-for-MaybePin-if-AddFeature.patch [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 928312 User impact if declined: Possible crash on shutdown. Testing completed (on m-c, etc.): First reports after landing are looking good. Risk to taking this patch (and alternatives if risky): Small patch with a low risk. String or IDL/UUID changes made by this patch: None.
Attachment #8366748 - Flags: approval-mozilla-aurora?
I ran the tests that failed against Nightly build for 50 times and it passes.
This is where the work actually happened, so I'll set this to sec-high.
Keywords: sec-othersec-high
Keywords: regression
Attachment #8366748 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
No more crashes happened today for our Mozmill testruns for Aurora builds on OS X. So all is working fine!
Status: RESOLVED → VERIFIED
No longer blocks: 937220
Keywords: qawanted
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: