Closed
Bug 739142
Opened 13 years ago
Closed 13 years ago
crash in nsTimerImpl::PostTimerEvent
Categories
(Core :: Networking: HTTP, defect)
Tracking
()
RESOLVED
FIXED
mozilla14
People
(Reporter: scoobidiver, Unassigned)
References
Details
(Keywords: crash, regression, topcrash)
Crash Data
There's a spike in crashes from 14.0a1/20120324.
The regression range for the spike is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ab2ff3b5611f&tochange=df1f94b2bdee
It might be a regression from bug 738334.
Signature nsTimerImpl::PostTimerEvent() More Reports Search
UUID fc9cad76-9dc4-4873-8d98-a17392120324
Date Processed 2012-03-24 16:53:20
Uptime 86
Last Crash 1.5 minutes before submission
Install Age 2.1 hours since version was first installed.
Install Time 2012-03-24 14:47:50
Product Firefox
Version 14.0a1
Build ID 20120324031100
Release Channel nightly
OS Windows NT
OS Version 6.1.7601 Service Pack 1
Build Architecture x86
Build Architecture Info AuthenticAMD family 15 model 36 stepping 2
Crash Reason EXCEPTION_ACCESS_VIOLATION_READ
Crash Address 0x1000020
User Comments it appears going onto an https site silently crashes firefox
App Notes
Cisco VPN
AdapterVendorID: 0x1002, AdapterDeviceID: 0x5955, AdapterSubsysID: 10921734, AdapterDriverVersion: 8.561.0.0
D3D10 Layers? D3D10 Layers- D3D9 Layers? D3D9 Layers-
EMCheckCompatibility True
Total Virtual Memory 2147352576
Available Virtual Memory 1911676928
System Memory Use Percentage 83
Available Page File 759312384
Available Physical Memory 163864576
Frame Module Signature [Expand] Source
0 xul.dll nsTimerImpl::PostTimerEvent xpcom/threads/nsTimerImpl.cpp:605
1 xul.dll TimerThread::Run xpcom/threads/TimerThread.cpp:319
2 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:657
3 xul.dll nsThread::ThreadFunc xpcom/threads/nsThread.cpp:289
4 nspr4.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:426
5 nspr4.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:122
6 msvcr100.dll _callthreadstartex f:\\dd\\vctools\\crt_bld\\self_x86\\crt\\src\\threadex.c:314
7 msvcr100.dll _threadstartex f:\\dd\\vctools\\crt_bld\\self_x86\\crt\\src\\threadex.c:292
8 kernel32.dll BaseThreadInitThunk
9 ntdll.dll __RtlUserThreadStart
10 ntdll.dll _RtlUserThreadStart
More reports at:
https://crash-stats.mozilla.com/report/list?signature=nsTimerImpl%3A%3APostTimerEvent%28%29
There's no web worker running here as far as I can tell, so this probably isn't a regression from 738334.
Reporter | ||
Comment 2•13 years ago
|
||
With combined signatures, it's #8 top crasher in 14.0a1 over the last day.
Crash Signature: [@ nsTimerImpl::PostTimerEvent()] → [@ nsTimerImpl::PostTimerEvent()]
[@ RtlEnterCriticalSection | PR_Lock | nsTimerImpl::PostTimerEvent()]
tracking-firefox14:
--- → ?
Keywords: topcrash
Comment 3•13 years ago
|
||
Some disasembly:
1004E290 mov eax,dword ptr [nsTimerEvent::sAllocator (10DEEC60h)]
1004E295 push ebx
1004E296 mov ebx,dword ptr [esp+8]
1004E29A push ebp
1004E29B push esi
1004E29C push edi
1004E29D lea edi,[eax+20h]
1004E2A0 mov esi,eax
1004E2A2 mov eax,dword ptr [edi]
1004E2A4 push eax
1004E2A5 call dword ptr [__imp__PR_Lock (109B9A40h)]
Crash is @ 0x1004E2A2.
EAX = 0x00000000
EDI = 0x00000020
ESI = 0x00000000
ESP = 0x07a2f6a8
Updated•13 years ago
|
Perhaps related to bug 733277?
Reporter | ||
Comment 5•13 years ago
|
||
(In reply to ben turner [:bent] from comment #4)
> Perhaps related to bug 733277?
It's not in the regression range.
It messes with timer locks and allocators, which is exactly where this crash is.
Comment 7•13 years ago
|
||
From comment 3, I bet money that what happens is that sAllocator is null here. Investigating.
Assignee: nobody → ehsan
Comment 8•13 years ago
|
||
Should this be Trunk?
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:14.0) Gecko/20120403 Firefox/14.0a1 ID:20120403063158 just crashed like bp-449f0d79-66c2-4332-8dd3-610292120403 on a random Google search results page.
Comment 9•13 years ago
|
||
OK, this is really weird. I grabbed the minidumps for two crashes from two different builds. They both get blamed for the nsTimerImpl::PostTimerEvent thread, but the crash is happening on another thread in fact! Here's the stack from the minidump corresponding to https://crash-stats.mozilla.com/report/index/fc9cad76-9dc4-4873-8d98-a17392120324:
ntdll.dll!_KiFastSystemCallRet@0()
ntdll.dll!_ZwWaitForSingleObject@12() + 0xc bytes
ntdll.dll!_RtlpWaitOnCriticalSection@8() + 0xc4 bytes
ntdll.dll!_RtlEnterCriticalSection@4() - 0x155c4 bytes
> xul.dll!google_breakpad::AutoExceptionHandler::AutoExceptionHandler() Line 461 C++
xul.dll!google_breakpad::ExceptionHandler::HandleException(_EXCEPTION_POINTERS * exinfo) Line 500 C++
msvcr71.dll!__CxxUnhandledExceptionFilter(_EXCEPTION_POINTERS * pPtrs) + 0x46 bytes C++
kernel32.dll!_UnhandledExceptionFilter@4() + 0xf5 bytes
ntdll.dll!___RtlUserThreadStart@8() + 0x147ae bytes
ntdll.dll!@_EH4_CallFilterFunc@8() + 0x12 bytes
ntdll.dll!ExecuteHandler2@20() + 0x26 bytes
ntdll.dll!ExecuteHandler@20() + 0x24 bytes
ntdll.dll!_RtlDispatchException@8() + 0xd3 bytes
ntdll.dll!_KiUserExceptionDispatcher@8() + 0xf bytes
msvcr100.dll!memmove(unsigned char * dst, unsigned char * src, unsigned long count) Line 185 Asm
xul.dll!nsTArray_base<nsTArrayDefaultAllocator>::ShiftData(unsigned int start, unsigned int oldLen, unsigned int newLen, unsigned int elemSize, unsigned int elemAlign) Line 277 + 0xd bytes C++
xul.dll!nsHttpPipeline::WriteSegments(nsAHttpSegmentWriter * writer, unsigned int count, unsigned int * countWritten) Line 775 C++
xul.dll!nsHttpConnection::OnSocketReadable() Line 1369 C++
xul.dll!nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream * in) Line 1489 C++
xul.dll!nsSocketInputStream::OnSocketReady(unsigned int condition) Line 257 C++
xul.dll!nsSocketTransport::OnSocketReady(PRFileDesc * fd, short outFlags) Line 1578 C++
xul.dll!nsSocketTransportService::DoPollIteration(bool wait) Line 757 C++
xul.dll!nsSocketTransportService::Run() Line 648 C++
xul.dll!nsThread::ProcessNextEvent(bool mayWait, bool * result) Line 658 C++
xul.dll!nsThread::ThreadFunc(void * arg) Line 289 + 0xf bytes C++
nspr4.dll!_PR_NativeRunThread(void * arg) Line 448 C
nspr4.dll!pr_root(void * arg) Line 122 + 0xd bytes C
msvcr100.dll!_callthreadstartex() Line 314 + 0x6 bytes C
msvcr100.dll!_threadstartex(void * ptd) Line 292 + 0x5 bytes C
kernel32.dll!@BaseThreadInitThunk@12() + 0x12 bytes
ntdll.dll!___RtlUserThreadStart@8() + 0x27 bytes
ntdll.dll!__RtlUserThreadStart@8() + 0x1b bytes
In both cases, these threads were listed right before the timer thread. These threads don't appear on the breakpad site at all. Looking at exinfo->ExceptionRecord in ExceptionHandler::HandleException, this is indeed an access violation, and the ExceptionAddress field points to the memmove function on the stack.
I don't know why Breakpad blames this on the wrong thread. Should this be moved to Necko, and another bug be filed to investigate this on the breakpad side?
Comment 10•13 years ago
|
||
Please, fix this quickly. I like the changes on Firefox 14 but it crash per minute and is very inestable
Comment 11•13 years ago
|
||
This version is super faster and the loading times are amazing (at least for me). Do you have introduced some changes to Gecko?
Comment 12•13 years ago
|
||
When you load the minidump into MSVC does it get the correct crashing thread?
Comment 13•13 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #12)
> When you load the minidump into MSVC does it get the correct crashing thread?
No, but I think it just grabs the crashing thread from the minidump, which is written by breakpad.
Comment 14•13 years ago
|
||
Yunier, can you reproduce this crash reliably on Nightly?
Comment 15•13 years ago
|
||
The minidump is not written by breakpad, it's written by the Windows debughlp function MinidumpWriteDump. We do pass in the exception info from the crash site which ends up here: http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc#977
Comment 16•13 years ago
|
||
Hmm, OK. Is the crash stats report also generated from the minidump? The weird thing is that the crash report doesn't have any mention of the thread running nsHttpPipeline::WriteSegments.
Here's another question: is it possible that a crash on a thread is handled on a different thread by breakpad?
Comment 17•13 years ago
|
||
The crash report is generated from the minidump using minidump_stackwalk (in the Breakpad tree). There are a few tricky bits:
1) Breakpad uses a background thread that does the actual dump writing. It saves the ID of that thread and tries to ignore it in the output. Perhaps that's what's failing htere?
2) We use SetUnhandledExceptionFilter to catch crashes, so whatever the behavior of that is. As bsmedberg says, we simply pass on the EXCEPTION_POINTERS that we get from that callback to MinidumpWriteDump.
Comment 18•13 years ago
|
||
(In reply to Ted Mielczarek [:ted] from comment #17)
> The crash report is generated from the minidump using minidump_stackwalk (in
> the Breakpad tree). There are a few tricky bits:
> 1) Breakpad uses a background thread that does the actual dump writing. It
> saves the ID of that thread and tries to ignore it in the output. Perhaps
> that's what's failing htere?
> 2) We use SetUnhandledExceptionFilter to catch crashes, so whatever the
> behavior of that is. As bsmedberg says, we simply pass on the
> EXCEPTION_POINTERS that we get from that callback to MinidumpWriteDump.
SetUnhandledExceptionFilter causes the exception handler to be called on the thread raising the exception, always.
As I said, when I was examining the exception being handled in the debugger, it was an access violation caused by the memmove call on the thread which ended up getting removed from the crash report, which is why I'm pretty sure that this crash is being misblamed at nsTimerImpl::PostTimerEvent for some reason...
Comment 19•13 years ago
|
||
I discussed this with Ted on IRC. One theory is that the memmove call is writing over the address holding sAllocator, which causes the timer thread to crash while that memove is happening on another thread (and then for memmove to crash as well).
I'm pretty sure that this crash is also hiding under other seemingly unrelated signatures as well. Moving over to Necko.
Assignee: ehsan → nobody
Component: XPCOM → Networking: HTTP
QA Contact: xpcom → networking.http
Comment 20•13 years ago
|
||
I recognize the stack trace in comment 9 as one from bug 741056. The patch for 741056 went to inbound a few hours ago.
Comment 21•13 years ago
|
||
bug 738812 also has an odd timer thing going on.
Comment 22•13 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #20)
> I recognize the stack trace in comment 9 as one from bug 741056. The patch
> for 741056 went to inbound a few hours ago.
That is very plausible!
Depends on: 741056
Comment 23•13 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #21)
> bug 738812 also has an odd timer thing going on.
That doesn't look relevant to me. The timer code run there runs on the main thread, not on the timer thread.
Comment 24•13 years ago
|
||
Ehsan, can you point me to any info on how to do the magic you talk about in comment 9? "
I grabbed the minidumps for two crashes from two different builds. They both get blamed for the nsTimerImpl::PostTimerEvent thread, but the crash is happening on another thread in fact! "
Comment 25•13 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #24)
> Ehsan, can you point me to any info on how to do the magic you talk about in
> comment 9? "
> I grabbed the minidumps for two crashes from two different builds. They
> both get blamed for the nsTimerImpl::PostTimerEvent thread, but the crash is
> happening on another thread in fact! "
You need special privileges to download minidumps from crash-stats (or know someone who does!) because they can contain user's private data on the stack. I'll forward the minidump for you using email. :-)
Comment 26•13 years ago
|
||
(In reply to Ehsan Akhgari [:ehsan] from comment #25)
> You need special privileges to download minidumps from crash-stats (or know
> someone who does!) because they can contain user's private data on the
> stack. I'll forward the minidump for you using email. :-)
ah. I'm actually curious if any of the crash data for 740319 shows the same thing..
Comment 27•13 years ago
|
||
I don't have access myself. Ping joe (or #developers).
Comment 28•13 years ago
|
||
Interesting crash, should be tracked by bug 739762 too.
(In reply to Yunier José Sosa Vázquez from comment #11)
> This version is super faster and the loading times are amazing (at least for
> me). Do you have introduced some changes to Gecko?
Pipelining patches has landed.
Blocks: 739762
Comment 29•13 years ago
|
||
(In reply to Ehsan Akhgari [:ehsan] from comment #14)
> Yunier, can you reproduce this crash reliably on Nightly?
When I open Nightly and start to browser this crash per minute. In my PC Nightly during opened more or less one minute only, after this crash. Some times the Windows Error it sohw me, anothers times the Mozilla Crash Reporter informs me about this.
Comment 30•13 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #28)
> Interesting crash, should be tracked by bug 739762 too.
>
> (In reply to Yunier José Sosa Vázquez from comment #11)
> > This version is super faster and the loading times are amazing (at least for
> > me). Do you have introduced some changes to Gecko?
>
> Pipelining patches has landed.
Great !!! this feature many users will note. I have commented with my friends this and have said the same
Comment 31•13 years ago
|
||
FYI, if having access to minidumps would be useful, you can file a bug and get that access. It's just an LDAP bit.
Comment 32•13 years ago
|
||
Yunier, it would be great if you can test this again in tonight's Nightly and let us know if the crash has been fixed. Thanks!
Comment 33•13 years ago
|
||
(In reply to Ehsan Akhgari [:ehsan] from comment #32)
> Yunier, it would be great if you can test this again in tonight's Nightly
> and let us know if the crash has been fixed. Thanks!
Ehsan the problem has been fixed in the last Nightly (2012-04-05) and all work perfect. Congratulations for you work for fix this bug. I see a great interest by the developers to know more about the bug and resolve it quickly
Comment 34•13 years ago
|
||
Well, Patrick deserves all of the credit here! :-)
Thanks for verifying this, Yunier!
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 35•13 years ago
|
||
(In reply to Ehsan Akhgari [:ehsan] from comment #34)
> Well, Patrick deserves all of the credit here! :-)
>
> Thanks for verifying this, Yunier!
Well, congratulations to Patrick too :-). The people of Mozilla is great.
Updated•13 years ago
|
status-firefox14:
--- → fixed
Reporter | ||
Updated•12 years ago
|
Target Milestone: --- → mozilla14
Comment 36•12 years ago
|
||
This crash was back on Firefox 17.0.1 https://crash-stats.mozilla.com/report/index/db23335a-20ce-45ac-87e4-0e8032121210
also there are 28% crash in 18.0.1
More report:- https://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&query=nsTimerImpl%3A%3APostTimerEvent%28%29&reason_type=contains&date=02%2F08%2F2013%2017%3A54%3A31&range_value=1&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=nsTimerImpl%3A%3APostTimerEvent%28%29
Reporter | ||
Comment 37•12 years ago
|
||
(In reply to Swarnava Sengupta (:Swarnava) from comment #36)
> also there are 28% crash in 18.0.1
With so few crashes, it isn't worth a new bug.
Comment 38•12 years ago
|
||
the low volume of leftovers could be a dup of bug 792920
You need to log in
before you can comment on or make changes to this bug.
Description
•