Closed Bug 739142 Opened 13 years ago Closed 13 years ago

crash in nsTimerImpl::PostTimerEvent

Categories

(Core :: Networking: HTTP, defect)

14 Branch
All
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla14
Tracking Status
firefox14 + fixed

People

(Reporter: scoobidiver, Unassigned)

References

Details

(Keywords: crash, regression, topcrash)

Crash Data

There's a spike in crashes from 14.0a1/20120324. The regression range for the spike is: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ab2ff3b5611f&tochange=df1f94b2bdee It might be a regression from bug 738334. Signature nsTimerImpl::PostTimerEvent() More Reports Search UUID fc9cad76-9dc4-4873-8d98-a17392120324 Date Processed 2012-03-24 16:53:20 Uptime 86 Last Crash 1.5 minutes before submission Install Age 2.1 hours since version was first installed. Install Time 2012-03-24 14:47:50 Product Firefox Version 14.0a1 Build ID 20120324031100 Release Channel nightly OS Windows NT OS Version 6.1.7601 Service Pack 1 Build Architecture x86 Build Architecture Info AuthenticAMD family 15 model 36 stepping 2 Crash Reason EXCEPTION_ACCESS_VIOLATION_READ Crash Address 0x1000020 User Comments it appears going onto an https site silently crashes firefox App Notes Cisco VPN AdapterVendorID: 0x1002, AdapterDeviceID: 0x5955, AdapterSubsysID: 10921734, AdapterDriverVersion: 8.561.0.0 D3D10 Layers? D3D10 Layers- D3D9 Layers? D3D9 Layers- EMCheckCompatibility True Total Virtual Memory 2147352576 Available Virtual Memory 1911676928 System Memory Use Percentage 83 Available Page File 759312384 Available Physical Memory 163864576 Frame Module Signature [Expand] Source 0 xul.dll nsTimerImpl::PostTimerEvent xpcom/threads/nsTimerImpl.cpp:605 1 xul.dll TimerThread::Run xpcom/threads/TimerThread.cpp:319 2 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:657 3 xul.dll nsThread::ThreadFunc xpcom/threads/nsThread.cpp:289 4 nspr4.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:426 5 nspr4.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:122 6 msvcr100.dll _callthreadstartex f:\\dd\\vctools\\crt_bld\\self_x86\\crt\\src\\threadex.c:314 7 msvcr100.dll _threadstartex f:\\dd\\vctools\\crt_bld\\self_x86\\crt\\src\\threadex.c:292 8 kernel32.dll BaseThreadInitThunk 9 ntdll.dll __RtlUserThreadStart 10 ntdll.dll _RtlUserThreadStart More reports at: https://crash-stats.mozilla.com/report/list?signature=nsTimerImpl%3A%3APostTimerEvent%28%29
There's no web worker running here as far as I can tell, so this probably isn't a regression from 738334.
With combined signatures, it's #8 top crasher in 14.0a1 over the last day.
Crash Signature: [@ nsTimerImpl::PostTimerEvent()] → [@ nsTimerImpl::PostTimerEvent()] [@ RtlEnterCriticalSection | PR_Lock | nsTimerImpl::PostTimerEvent()]
Keywords: topcrash
Some disasembly: 1004E290 mov eax,dword ptr [nsTimerEvent::sAllocator (10DEEC60h)] 1004E295 push ebx 1004E296 mov ebx,dword ptr [esp+8] 1004E29A push ebp 1004E29B push esi 1004E29C push edi 1004E29D lea edi,[eax+20h] 1004E2A0 mov esi,eax 1004E2A2 mov eax,dword ptr [edi] 1004E2A4 push eax 1004E2A5 call dword ptr [__imp__PR_Lock (109B9A40h)] Crash is @ 0x1004E2A2. EAX = 0x00000000 EDI = 0x00000020 ESI = 0x00000000 ESP = 0x07a2f6a8
(In reply to ben turner [:bent] from comment #4) > Perhaps related to bug 733277? It's not in the regression range.
It messes with timer locks and allocators, which is exactly where this crash is.
From comment 3, I bet money that what happens is that sAllocator is null here. Investigating.
Assignee: nobody → ehsan
Should this be Trunk? Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:14.0) Gecko/20120403 Firefox/14.0a1 ID:20120403063158 just crashed like bp-449f0d79-66c2-4332-8dd3-610292120403 on a random Google search results page.
OK, this is really weird. I grabbed the minidumps for two crashes from two different builds. They both get blamed for the nsTimerImpl::PostTimerEvent thread, but the crash is happening on another thread in fact! Here's the stack from the minidump corresponding to https://crash-stats.mozilla.com/report/index/fc9cad76-9dc4-4873-8d98-a17392120324: ntdll.dll!_KiFastSystemCallRet@0() ntdll.dll!_ZwWaitForSingleObject@12() + 0xc bytes ntdll.dll!_RtlpWaitOnCriticalSection@8() + 0xc4 bytes ntdll.dll!_RtlEnterCriticalSection@4() - 0x155c4 bytes > xul.dll!google_breakpad::AutoExceptionHandler::AutoExceptionHandler() Line 461 C++ xul.dll!google_breakpad::ExceptionHandler::HandleException(_EXCEPTION_POINTERS * exinfo) Line 500 C++ msvcr71.dll!__CxxUnhandledExceptionFilter(_EXCEPTION_POINTERS * pPtrs) + 0x46 bytes C++ kernel32.dll!_UnhandledExceptionFilter@4() + 0xf5 bytes ntdll.dll!___RtlUserThreadStart@8() + 0x147ae bytes ntdll.dll!@_EH4_CallFilterFunc@8() + 0x12 bytes ntdll.dll!ExecuteHandler2@20() + 0x26 bytes ntdll.dll!ExecuteHandler@20() + 0x24 bytes ntdll.dll!_RtlDispatchException@8() + 0xd3 bytes ntdll.dll!_KiUserExceptionDispatcher@8() + 0xf bytes msvcr100.dll!memmove(unsigned char * dst, unsigned char * src, unsigned long count) Line 185 Asm xul.dll!nsTArray_base<nsTArrayDefaultAllocator>::ShiftData(unsigned int start, unsigned int oldLen, unsigned int newLen, unsigned int elemSize, unsigned int elemAlign) Line 277 + 0xd bytes C++ xul.dll!nsHttpPipeline::WriteSegments(nsAHttpSegmentWriter * writer, unsigned int count, unsigned int * countWritten) Line 775 C++ xul.dll!nsHttpConnection::OnSocketReadable() Line 1369 C++ xul.dll!nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream * in) Line 1489 C++ xul.dll!nsSocketInputStream::OnSocketReady(unsigned int condition) Line 257 C++ xul.dll!nsSocketTransport::OnSocketReady(PRFileDesc * fd, short outFlags) Line 1578 C++ xul.dll!nsSocketTransportService::DoPollIteration(bool wait) Line 757 C++ xul.dll!nsSocketTransportService::Run() Line 648 C++ xul.dll!nsThread::ProcessNextEvent(bool mayWait, bool * result) Line 658 C++ xul.dll!nsThread::ThreadFunc(void * arg) Line 289 + 0xf bytes C++ nspr4.dll!_PR_NativeRunThread(void * arg) Line 448 C nspr4.dll!pr_root(void * arg) Line 122 + 0xd bytes C msvcr100.dll!_callthreadstartex() Line 314 + 0x6 bytes C msvcr100.dll!_threadstartex(void * ptd) Line 292 + 0x5 bytes C kernel32.dll!@BaseThreadInitThunk@12() + 0x12 bytes ntdll.dll!___RtlUserThreadStart@8() + 0x27 bytes ntdll.dll!__RtlUserThreadStart@8() + 0x1b bytes In both cases, these threads were listed right before the timer thread. These threads don't appear on the breakpad site at all. Looking at exinfo->ExceptionRecord in ExceptionHandler::HandleException, this is indeed an access violation, and the ExceptionAddress field points to the memmove function on the stack. I don't know why Breakpad blames this on the wrong thread. Should this be moved to Necko, and another bug be filed to investigate this on the breakpad side?
Please, fix this quickly. I like the changes on Firefox 14 but it crash per minute and is very inestable
This version is super faster and the loading times are amazing (at least for me). Do you have introduced some changes to Gecko?
When you load the minidump into MSVC does it get the correct crashing thread?
(In reply to Benjamin Smedberg [:bsmedberg] from comment #12) > When you load the minidump into MSVC does it get the correct crashing thread? No, but I think it just grabs the crashing thread from the minidump, which is written by breakpad.
Yunier, can you reproduce this crash reliably on Nightly?
The minidump is not written by breakpad, it's written by the Windows debughlp function MinidumpWriteDump. We do pass in the exception info from the crash site which ends up here: http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc#977
Hmm, OK. Is the crash stats report also generated from the minidump? The weird thing is that the crash report doesn't have any mention of the thread running nsHttpPipeline::WriteSegments. Here's another question: is it possible that a crash on a thread is handled on a different thread by breakpad?
The crash report is generated from the minidump using minidump_stackwalk (in the Breakpad tree). There are a few tricky bits: 1) Breakpad uses a background thread that does the actual dump writing. It saves the ID of that thread and tries to ignore it in the output. Perhaps that's what's failing htere? 2) We use SetUnhandledExceptionFilter to catch crashes, so whatever the behavior of that is. As bsmedberg says, we simply pass on the EXCEPTION_POINTERS that we get from that callback to MinidumpWriteDump.
(In reply to Ted Mielczarek [:ted] from comment #17) > The crash report is generated from the minidump using minidump_stackwalk (in > the Breakpad tree). There are a few tricky bits: > 1) Breakpad uses a background thread that does the actual dump writing. It > saves the ID of that thread and tries to ignore it in the output. Perhaps > that's what's failing htere? > 2) We use SetUnhandledExceptionFilter to catch crashes, so whatever the > behavior of that is. As bsmedberg says, we simply pass on the > EXCEPTION_POINTERS that we get from that callback to MinidumpWriteDump. SetUnhandledExceptionFilter causes the exception handler to be called on the thread raising the exception, always. As I said, when I was examining the exception being handled in the debugger, it was an access violation caused by the memmove call on the thread which ended up getting removed from the crash report, which is why I'm pretty sure that this crash is being misblamed at nsTimerImpl::PostTimerEvent for some reason...
I discussed this with Ted on IRC. One theory is that the memmove call is writing over the address holding sAllocator, which causes the timer thread to crash while that memove is happening on another thread (and then for memmove to crash as well). I'm pretty sure that this crash is also hiding under other seemingly unrelated signatures as well. Moving over to Necko.
Assignee: ehsan → nobody
Component: XPCOM → Networking: HTTP
QA Contact: xpcom → networking.http
I recognize the stack trace in comment 9 as one from bug 741056. The patch for 741056 went to inbound a few hours ago.
bug 738812 also has an odd timer thing going on.
(In reply to Patrick McManus [:mcmanus] from comment #20) > I recognize the stack trace in comment 9 as one from bug 741056. The patch > for 741056 went to inbound a few hours ago. That is very plausible!
Depends on: 741056
(In reply to Patrick McManus [:mcmanus] from comment #21) > bug 738812 also has an odd timer thing going on. That doesn't look relevant to me. The timer code run there runs on the main thread, not on the timer thread.
Ehsan, can you point me to any info on how to do the magic you talk about in comment 9? " I grabbed the minidumps for two crashes from two different builds. They both get blamed for the nsTimerImpl::PostTimerEvent thread, but the crash is happening on another thread in fact! "
(In reply to Patrick McManus [:mcmanus] from comment #24) > Ehsan, can you point me to any info on how to do the magic you talk about in > comment 9? " > I grabbed the minidumps for two crashes from two different builds. They > both get blamed for the nsTimerImpl::PostTimerEvent thread, but the crash is > happening on another thread in fact! " You need special privileges to download minidumps from crash-stats (or know someone who does!) because they can contain user's private data on the stack. I'll forward the minidump for you using email. :-)
(In reply to Ehsan Akhgari [:ehsan] from comment #25) > You need special privileges to download minidumps from crash-stats (or know > someone who does!) because they can contain user's private data on the > stack. I'll forward the minidump for you using email. :-) ah. I'm actually curious if any of the crash data for 740319 shows the same thing..
I don't have access myself. Ping joe (or #developers).
Interesting crash, should be tracked by bug 739762 too. (In reply to Yunier José Sosa Vázquez from comment #11) > This version is super faster and the loading times are amazing (at least for > me). Do you have introduced some changes to Gecko? Pipelining patches has landed.
Blocks: 739762
(In reply to Ehsan Akhgari [:ehsan] from comment #14) > Yunier, can you reproduce this crash reliably on Nightly? When I open Nightly and start to browser this crash per minute. In my PC Nightly during opened more or less one minute only, after this crash. Some times the Windows Error it sohw me, anothers times the Mozilla Crash Reporter informs me about this.
(In reply to Honza Bambas (:mayhemer) from comment #28) > Interesting crash, should be tracked by bug 739762 too. > > (In reply to Yunier José Sosa Vázquez from comment #11) > > This version is super faster and the loading times are amazing (at least for > > me). Do you have introduced some changes to Gecko? > > Pipelining patches has landed. Great !!! this feature many users will note. I have commented with my friends this and have said the same
FYI, if having access to minidumps would be useful, you can file a bug and get that access. It's just an LDAP bit.
Yunier, it would be great if you can test this again in tonight's Nightly and let us know if the crash has been fixed. Thanks!
(In reply to Ehsan Akhgari [:ehsan] from comment #32) > Yunier, it would be great if you can test this again in tonight's Nightly > and let us know if the crash has been fixed. Thanks! Ehsan the problem has been fixed in the last Nightly (2012-04-05) and all work perfect. Congratulations for you work for fix this bug. I see a great interest by the developers to know more about the bug and resolve it quickly
Well, Patrick deserves all of the credit here! :-) Thanks for verifying this, Yunier!
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
(In reply to Ehsan Akhgari [:ehsan] from comment #34) > Well, Patrick deserves all of the credit here! :-) > > Thanks for verifying this, Yunier! Well, congratulations to Patrick too :-). The people of Mozilla is great.
Target Milestone: --- → mozilla14
(In reply to Swarnava Sengupta (:Swarnava) from comment #36) > also there are 28% crash in 18.0.1 With so few crashes, it isn't worth a new bug.
the low volume of leftovers could be a dup of bug 792920
You need to log in before you can comment on or make changes to this bug.