937220 - crash in mozalloc_abort(char const* const) | NS_DebugBreak (###!!! ABORT: cycle collector fault: file e:/builds/moz2_slave/m-cen-w64-ntly-000000000000000/build/xpcom/base/nsCycleCollector.cpp, line 1054)

Reporter

Description

11 years ago

This bug was filed from the Socorro interface and is report bp-321d7a93-a8f5-41da-a53a-d275b2131111.
=============================================================
This increased in volume on 20131107 and has been ramping up in volume since then.

0 mozalloc.dll mozalloc_abort(char const * const) memory/mozalloc/mozalloc_abort.cpp
1 xul.dll NS_DebugBreak xpcom/base/nsDebugImpl.cpp

Andrew McCreight [:mccr8]

Comment 1

11 years ago

Are there reports with any information further up the stack? The first two frames are really generic.

Tracy Walker [:tracy]

Reporter

Comment 2

11 years ago

No, I clicked into a couple dozen random reports and they all look the same.

Robert Kaiser

Comment 3

11 years ago

What we do have is an abort message in the App Notes: xpcom_runtime_abort(###!!! ABORT: cycle collector fault: file e:/builds/moz2_slave/m-cen-w64-ntly-000000000000000/build/xpcom/base/nsCycleCollector.cpp, line 1054)

Summary: crash in mozalloc_abort(char const* const) | NS_DebugBreak → crash in mozalloc_abort(char const* const) | NS_DebugBreak (###!!! ABORT: cycle collector fault: file e:/builds/moz2_slave/m-cen-w64-ntly-000000000000000/build/xpcom/base/nsCycleCollector.cpp, line 1054)

Andrew McCreight [:mccr8]

Comment 4

11 years ago

That's quite peculiar. I'm not sure why it isn't printing out a more useful message.

Robert Kaiser

Comment 5

11 years ago

Given that cycle collector fault and that it rose up on Nightly at the same time as bug 937191, is this another regression from bug 928312?

Andrew McCreight [:mccr8]

Comment 6

11 years ago

Does the file path mean this is a Win64 crash, or is this just built on a Win64 machine or something?

Andrew McCreight [:mccr8]

Comment 7

11 years ago

(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #5)
> Given that cycle collector fault and that it rose up on Nightly at the same
> time as bug 937191, is this another regression from bug 928312?

That sounds plausible, but we don't have much to go on here.

Andrew McCreight [:mccr8]

Comment 8

11 years ago

If I'm reading the crash-stats report correctly, 100% of these 233 crashes are on AMD, which is unusual.

Andrew McCreight [:mccr8]

Comment 9

11 years ago

Also of note, I looked at half a dozen crashes and none of them were on the main thread, which is evidence in favor of comment 5.

Benjamin Smedberg

Comment 10

11 years ago

This is a win64 crash: you can tell by looking for "Build Architecture" in the crash report. I'm looking into whether this is win64-only.

Benjamin Smedberg

Comment 11

11 years ago

https://crash-stats.mozilla.com/search/?app_notes=cycle&app_notes=collector&app_notes=fault&version=28.0a1&_facets=version&_facets=signature&_facets=cpu_name&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=cpu_info&_columns=cpu_name shows that the recent spike is almost entirely win64, but multiple computers (not a single installation crashing repeatedly). Cycle collector aborts across other versions are distributed normally.

(Away)

Comment 12

11 years ago

I'm seeing both Intel and AMD CPUIDs in here, so it doesn't look manufacturer-specific. The "amd64" field just means Win64. What's misleading is that on 64-bit processes, we don't get a "GenuineIntel" or "AuthenticAMD" label in front of the Family/Model/Stepping notes. This seems to be a general problem, not specific to our code; windbg's !cpuid extension seems to suffer from the same problem.

I wonder if this is actually a Win64-only crash or if the stackwalking just happened to make it so that only Win64 builds hit this particular variant of this signature. There appear to be some odd frames between abort and NS_DebugBreak.

Benjamin Smedberg

Comment 13

11 years ago

> I wonder if this is actually a Win64-only crash or if the stackwalking just
> happened to make it so that only Win64 builds hit this particular variant of
> this signature

I'm pretty certain that this is win64-only. The crash-stats search I linked wasn't searching by signature, but was searching for "cycle collector fault" in the abort message, which should catch all the variants.

(Away)

Comment 14

11 years ago

Pointing the debugger at a local copy of the xul image somehow lets it figure out the stack better. It's an overflowing refcount in DescribeRefCountedNode.

xul!NS_DebugBreak
xul!Fault
xul!GCGraphBuilder::DescribeRefCountedNode
xul!nsDOMEventTargetHelper::cycleCollection::Traverse
xul!mozilla::dom::workers::WorkerPrivateParent<mozilla::dom::workers::WorkerPrivate>::cycleCollection::Traverse
xul!mozilla::dom::workers::XMLHttpRequest::cycleCollection::Traverse
xul!GCGraphBuilder::Traverse
xul!nsCycleCollector::MarkRoots
xul!nsCycleCollector::BeginCollection
xul!nsCycleCollector::Collect
xul!nsCycleCollector_collect
xul!`anonymous namespace'::WorkerJSRuntime::CustomGCCallback
mozjs!Collect
mozjs!js::GC
mozjs!js::DestroyContext
xul!`anonymous namespace'::WorkerThreadRunnable::Run
xul!nsThread::ProcessNextEvent
xul!NS_ProcessNextEvent
xul!nsThread::ThreadFunc
nss3!_PR_NativeRunThread
nss3!pr_root
msvcr100!_callthreadstartex

(Away)

Comment 15

11 years ago

I also see the same stack as comment 14 in the signatures containing js::frontend::Parser<js::frontend::FullParseHandler>::noteNameUse. noteNameUse disappears from the stack after the debugger gets to see the full image.

Andrew McCreight [:mccr8]

Comment 16

11 years ago

Thanks for the stack!

The faults in that method are:
  if (refCount == 0)
    Fault("zero refcount", mCurrPi);
  if (refCount == UINT32_MAX)
    Fault("overflowing refcount", mCurrPi);

Presumably it is the first one.

Blocks: 928312

(Away)

Comment 17

11 years ago

(In reply to Andrew McCreight [:mccr8] from comment #16)
> Thanks for the stack!
>
> The faults in that method are:
> if (refCount == 0)
> Fault("zero refcount", mCurrPi);
> if (refCount == UINT32_MAX)
> Fault("overflowing refcount", mCurrPi);
>
> Presumably it is the first one.

Based on the return address and the string pushed, it looks like the second one (overflow).

Andrew McCreight [:mccr8]

Comment 18

11 years ago

Weird.

Andrew McCreight [:mccr8]

Comment 19

11 years ago

This is almost certainly something horrible.

Group: core-security

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Assignee

Comment 20

11 years ago

So it sounds like our refcount is underflowing. Fun. I had an idea of how to rewrite WorkerPrivate to have less crazy ownership. Maybe I should just do that.
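
A minimal sketch of why an underflowing refcount would hit the "overflowing refcount" check rather than the "zero refcount" one. The two comparisons mirror those quoted in comment 16. Everything else is an assumption for illustration: `ReleaseOnce`, `ClassifyRefCount`, and `DemoUnderflow` are hypothetical names, not Gecko functions, and the sketch assumes the count is stored as an unsigned 32-bit value.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: an unchecked release on an unsigned 32-bit refcount.
// Unsigned arithmetic is modular, so releasing at 0 wraps to UINT32_MAX.
uint32_t ReleaseOnce(uint32_t refCount) {
  return static_cast<uint32_t>(refCount - 1u);
}

// Hypothetical stand-in for the two checks quoted in comment 16.
// Returns the fault message that would be reported, or "ok".
std::string ClassifyRefCount(uint32_t refCount) {
  if (refCount == 0) {
    return "zero refcount";
  }
  if (refCount == UINT32_MAX) {
    return "overflowing refcount";
  }
  return "ok";
}

// One release too many on an object whose count is already 0.
void DemoUnderflow() {
  uint32_t refCount = 0;
  refCount = ReleaseOnce(refCount);
  assert(refCount == UINT32_MAX);
  assert(ClassifyRefCount(refCount) == "overflowing refcount");
}
```

Under these assumptions, a single extra release on an object whose count is already 0 is indistinguishable, at the check, from a count that genuinely climbed to `UINT32_MAX`, which would be consistent with the overflow message seen in comment 17.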

Attachments:
Patch - Used for STR (patch), 11 years ago, Steven MacLeod [:smacleod] (deleted)
sessionstore.js for STR (application/x-javascript), 11 years ago, Steven MacLeod [:smacleod] (deleted)