Closed Bug 1007763 Opened 10 years ago Closed 10 years ago

crash in nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::IncrementLength(unsigned int) | nsThread::ProcessNextEvent(bool, bool*)

Categories

(Core :: General, defect)

31 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED DUPLICATE of bug 991845
Tracking Status
firefox31 - verified

People

(Reporter: lizzard, Unassigned)

Details

(Keywords: crash, topcrash)

Crash Data

This bug was filed from the Socorro interface and is report bp-f2ad669f-7969-4c4b-aca8-1f6ef2140505.
=============================================================

This is the #7 topcrasher this week for 31.0a2, with 126 of 5017 crashes. The crash signature appears at a very low volume for Firefox 29 as well. The crashes on 31.0a2 spiked suddenly with the 2014050200 build. I'm not sure that helps us narrow the regression range, since the migration had just happened around the 29th and there may not have been many 31.0a2 users yet.

Possible regression range:
https://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?startdate=2014-04-30&enddate=2014-05-02

stack:

0 	xul.dll 	nsTArray_base<nsTArrayInfallibleAllocator,nsTArray_CopyWithMemutils>::IncrementLength(unsigned int) 	obj-firefox/dist/include/mozilla/ThreadLocal.h
1 	xul.dll 	nsThread::ProcessNextEvent(bool,bool *) 	xpcom/threads/nsThread.cpp
2 	xul.dll 	NS_ProcessNextEvent(nsIThread *,bool) 	xpcom/glue/nsThreadUtils.cpp
3 	xul.dll 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate *) 	ipc/glue/MessagePump.cpp
4 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc
5 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc
6 	xul.dll 	nsThread::ThreadFunc(void *) 	xpcom/threads/nsThread.cpp
7 	nss3.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c
8 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c
9 	msvcr100.dll 	_callthreadstartex 	f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c
10 	msvcr100.dll 	_threadstartex 	f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c
11 	kernel32.dll 	kernel32.dll@0x4ee1c 	
12 	ntdll.dll 	__RtlUserThreadStart 	
13 	ntdll.dll 	_RtlUserThreadStart
Different build IDs give me different frames.

xul!nsXPConnect::GetCurrentNativeCallContext+0x706a4c
xul!nsTArray_Impl<nsAnimation,nsTArrayInfallibleAllocator>::AppendElements+0x78613f
xul!nsTArray_Impl<nsINode const *,nsTArrayInfallibleAllocator>::AppendElements<nsINode const *>+0x8cba8c

Not sure why crash-stats put them under IncrementLength (though I could imagine IncrementLength getting inlined into AppendElements -- dunno about the nsXPConnect one). Ted, does this look like fallout from the dump_syms changes?
Flags: needinfo?(ted)
Yeah, in bug 1003085 comment 10 I found some issues. There may be others.
Flags: needinfo?(ted)
Specifically, if you're seeing function+large offset without source info, it's probably using a PUBLIC record instead of a FUNC record, which is bad. The stackwalker code is supposed to prefer FUNC over PUBLIC, but there could be other issues making that not work.
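For anyone following along, here is a rough sketch of why a PUBLIC fallback tends to produce function+large offset. It assumes a simplified model in which a FUNC record carries a start address and a size while a PUBLIC record carries only a start address; the record tuples, symbol names, addresses, and the `symbolicate` helper are all hypothetical and are not the actual stackwalker code.

```python
from bisect import bisect_right

# Hypothetical, simplified symbol records (not real Breakpad data).
# FUNC:   (start_address, size, name)  -> covers a known address range
# PUBLIC: (start_address, name)        -> start only, no size; sorted by address
FUNCS = [(0x1000, 0x40, "func_a"), (0x2000, 0x80, "func_b")]
PUBLICS = [(0x0800, "public_c"), (0x3000, "public_d")]


def symbolicate(address, funcs=FUNCS, publics=PUBLICS):
    """Return (name, offset, record_kind), preferring FUNC over PUBLIC."""
    # A FUNC record matches only if the address falls inside its range.
    for start, size, name in funcs:
        if start <= address < start + size:
            return name, address - start, "FUNC"

    # Fallback: nearest PUBLIC record at or below the address. With no size
    # to bound the match, the offset can be arbitrarily large.
    starts = [start for start, _ in publics]
    i = bisect_right(starts, address)
    if i == 0:
        return None  # address precedes every known symbol
    start, name = publics[i - 1]
    return name, address - start, "PUBLIC"


if __name__ == "__main__":
    for addr in (0x1010, 0x709A4C):
        name, offset, kind = symbolicate(addr)
        print(f"{name}+{offset:#x} ({kind})")
```

In this toy model an address inside a FUNC range resolves with a small offset, while an address far past the last PUBLIC symbol still "resolves" to that symbol with a huge offset, which is the pattern described above.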
Today this is showing up as a top crasher on nightly 32a1.
Ignore comment #4. I was looking at the wrong set of data in crash stats, thinking this was on 32a1. Removing tracking flags.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #3)
> Specifically, if you're seeing function+large offset without source info,
> it's probably using a PUBLIC record instead of a FUNC record, which is bad.
> The stackwalker code is supposed to prefer FUNC over PUBLIC, but there could
> be other issues making that not work.

Hmm, on second thought this is probably unrelated to the symbol changes. The large offsets are coming from WinDbg rather than Socorro, and they're even present on reports from before bug 1003085.
Topcrash, tracking.
User comments and URLs indicate this often occurs while downloading large files from Mega.

I was able to reproduce the crash twice. The second time, I caught a full heap dump, but optimized code leaves very few clues as to what kind of array is busted and how it got that way. It's also not clear why this is only on Aurora. I will investigate more tomorrow.
This is an OOM crash. AppendElements and IncrementLength were red herrings; PGO folded a bunch of MOZ_CRASH calls together. Judging by the stack and registers, the real crash is a failed JS_NewRuntime here: http://hg.mozilla.org/mozilla-central/annotate/b5bdc1aaf378/xpcom/base/CycleCollectedJSRuntime.cpp#l473

xul!mozilla::CycleCollectedJSRuntime::CycleCollectedJSRuntime+0x61
xul!`anonymous namespace'::WorkerJSRuntime::WorkerJSRuntime+0xe 
xul!`anonymous namespace'::WorkerThreadPrimaryRunnable::Run+0xa2
xul!nsThread::ProcessNextEvent+0x2a0
xul!NS_ProcessNextEvent+0x2d
xul!mozilla::ipc::MessagePumpForNonMainThreads::Run+0xc7
xul!MessageLoop::RunHandler+0x51
xul!MessageLoop::Run+0x19
xul!nsThread::ThreadFunc+0x90

The moderate correlation with Mega is likely due to their use of workers and that they store in-progress downloads entirely in memory.

We don't seem to hit this crash on 29 or 32. In those versions the download just sort of peters out without crashing the browser. In theory the JS_NewRuntime crash could still happen, but I am guessing that timing causes something else to fail first, in a more graceful way. (Andrew, does that sound reasonable?)

It's unfortunate that it's a naked MOZ_CRASH without the usual OOM annotation machinery. That might be something we could improve for this issue, but otherwise this is just another OOM bug.
It's also possible that on other versions the signature moves around. And it looks like we have exactly that in bug 991845.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
Already tracking the duplicate. Untrack.
This was fixed by backing out GGC.
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #12)
> This was fixed by backing out GGC.

Also, no more crashes for this in Firefox 31 since 31.0b9.