705154 - Crash in chromehang | mach_msg_trap | ABORT: HangMonitor triggered

Reporter

Description

•

13 years ago

This bug was filed from the Socorro interface and is report bp-a75ba7c2-ab2e-4bc5-8e2c-eff802111124 .
=============================================================
Also bp-00403124-e021-4bfa-b7b2-0a2fd2111124

0 libmozalloc.dylib mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:66
1 XUL NS_DebugBreak_P xpcom/base/nsDebugImpl.cpp:388
2 XUL mozilla::HangMonitor::ThreadMain xpcom/threads/HangMonitor.cpp:111
3 libnspr4.dylib _pt_root nsprpub/pr/src/pthreads/ptthread.c:187
4 libSystem.B.dylib _pthread_start
5 libSystem.B.dylib thread_start

I did have some memory-intensive pages loaded, but I've been doing that for several days and have not seen this abort.

Bob Clary [:bc] (inactive)

Reporter

Updated

•

13 years ago

Component: General → XPCOM

Product: Firefox → Core

QA Contact: general → xpcom

Version: unspecified → Trunk

Bob Clary [:bc] (inactive)

Reporter

Comment 1

•

13 years ago

This started just after I updated to today's Nightly. I've been crashing regularly every few minutes since. I've been disabling extensions one by one to see if that helps. I looked at the push log for the last couple of days but didn't see anything that stood out as a possible cause.

Keywords: regression

Scoobidiver (away)

Comment 2

•

13 years ago

Every crash that has this stack trace has chromehang in its crash signature. The second term in the crash signature is the first frame in thread 0.
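As a rough illustration of the rule described in comment 2, the signature can be thought of as the fixed prefix followed by the top frame of thread 0. The function below is only a sketch: `MakeHangSignature` is a hypothetical name, and this is not how Socorro itself generates signatures.

```cpp
#include <string>
#include <vector>

// Illustration only (hypothetical helper, not Socorro code): builds
// "chromehang | <top frame of thread 0>" from a list of frame names.
std::string MakeHangSignature(const std::vector<std::string>& thread0Frames) {
  std::string signature = "chromehang";
  if (!thread0Frames.empty()) {
    // The second term is the first (innermost) frame of thread 0.
    signature += " | " + thread0Frames.front();
  }
  return signature;
}
```

With the thread 0 frames seen in this bug, the top frame is mach_msg_trap, which is why the summary was changed to "chromehang | mach_msg_trap".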

Summary: crash chromehang | ABORT: HangMonitor triggered → Crash in chromehang | mach_msg_trap

Bob Clary [:bc] (inactive)

Reporter

Updated

•

13 years ago

Summary: Crash in chromehang | mach_msg_trap → Crash in chromehang | mach_msg_trap | ABORT: HangMonitor triggered

sci-fi

Comment 3

•

13 years ago

Hi, Mainly updating to add myself to the CC list for this bug. FWIW, I filed Bug 705003, which might be related. I saw HangMonitor aborts right after the Bug 429592 patches were incorporated into the tinderbox builds. Most of my Nightly crash reports were not sent successfully (about:crashes says it cannot find most of those OOIDs), but one seems to have made it through: https://crash-stats.mozilla.com/report/index/bp-305eb917-2c82-4245-9f0d-ad1ad2111124 ... which apparently now has a pointer to this very bug report. I also posted to the nightly discussion mailing list, hopefully to forewarn people that this would likely be seen in the next "official" Nightly builds. BTW, starting Nightly via CLI with -safe-mode did *not* help with this bug; eventually we still got 'pop'ed. ;) (I went back to a tinderbox build from before the 429592 patches were applied.) HTH

Bob Clary [:bc] (inactive)

Reporter

Comment 4

•

13 years ago

sci-fi, thanks. If you open the crash report from about:crashes and hit reload a few times, it will probably be submitted. That's what I have to do. That definitely looks like a candidate.

Blocks: hang-detector

sci-fi

Comment 5

•

13 years ago

d.a.

Comment 6

•

13 years ago

I only get this crash if I let Firefox sit idle for a few minutes. If I constantly use it, I don't get any crash. If I step away from the computer, it crashes.

Scoobidiver (away)

Updated

•

13 years ago

No longer blocks: hang-detector

Depends on: hang-detector

Scoobidiver (away)

Updated

•

13 years ago

Blocks: hang-detector

No longer depends on: hang-detector

Honza Bambas (:mayhemer)

Comment 7

•

13 years ago

A side note, which might not be related: I also experienced some system crashes today on my 17-inch, Late 2006 iMac with 10.7.2 and ATI Radeon X1600 128 MB. The crash was in plugin-container. Unfortunately I didn't get the info in time, since the system crashed again and didn't give detailed info the second time. Confirming that the crash happens with Nightly in the background.

Benjamin Smedberg

Assignee

Comment 8

•

13 years ago

This bug looks as if the hang detector might be malfunctioning, but none of the crash reports have a usable stack on thread 0 (which is the interesting thread). I propose to disable the hang detector on mac for this weekend's nightlies so that Ted and I can loop back around on Tuesday to figure out why we aren't getting better stacks. It may be that we need to get symbols for OS libraries.

Benjamin Smedberg

Assignee

Comment 9

•

13 years ago

Attached patch: Disable the hang monitor on mac, rev. 1 (deleted)

Tagging a few possible reviewers of the temporary disablement, but if there's somebody else around who can review please feel free.

Attachment #576996 - Flags: review?(smichaud)

Attachment #576996 - Flags: review?(jmathies)

Attachment #576996 - Flags: review?(gavin.sharp)

Benjamin Smedberg

Assignee

Comment 10

•

13 years ago

Please ignore the xpcom/ bits of this patch, they are for a different bug.

Benjamin Smedberg

Assignee

Updated

•

13 years ago

Assignee: nobody → benjamin

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 11

•

13 years ago

Comment on attachment 576996: Disable the hang monitor on mac, rev. 1

(It'd be nice if the #ifndef DEBUG was an #ifdef instead; easier to read that way IMO.)

Attachment #576996 - Flags: review?(gavin.sharp) → review+

Benoit Girard (:BenWa)

Comment 12

•

13 years ago

I'm also getting this quite frequently when I leave firefox in the background. Here are some of my crash signatures:
http://crash-stats.mozilla.com/report/index/bp-41e2788b-82f0-41be-bf84-6e76f2111125
http://crash-stats.mozilla.com/report/index/bp-535b250a-1a4b-4aa2-a630-4a8712111125
http://crash-stats.mozilla.com/report/index/bp-4ab0bfbc-983d-40ea-9a17-dd1792111125

They all have CoreFoundation@0x4c901 in the main thread which translates to:
> atos -l 0x0 -o /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation 0x4c901
CFDictionaryRemoveAllValues (in CoreFoundation) + 17

Benjamin Smedberg

Assignee

Comment 13

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/2729a78cd35e

Once we hit unlabeled addresses we're not walking the stack correctly, and I really want to see what is "above" all this on the stack. Leaving this bug open to track the real problem and re-enable the hang monitor.

Status: NEW → ASSIGNED

John Daggett (:jtd)

Comment 14

•

13 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #8)
> This bug appears as if the hang detector might be malfunctioning, but none
> of the crash reports have a usable stack on thread 0 (which is the
> interesting thread). I propose to disable the hang detector on mac for this
> weekends nightlies so that Ted and I can loop back around on Tuesday to
> figure out why we aren't getting better stacks. It may be that we need to
> get symbols for OS libraries.

Why aren't we backing this out rather than putting in band-aids like this?

Benjamin Smedberg

Assignee

Comment 15

•

13 years ago

Why would we back it out when the pref was specifically designed so that we could disable it? It's still giving valuable data on Windows/Linux.

John Daggett (:jtd)

Comment 16

•

13 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #15)
> Why would we back it out when the pref was specifically designed so that we
> could disable it? It's still giving valuable data on Windows/Linux.

Valuable data == user crashes. This feature has resulted in almost an order of magnitude increase in crashes on Windows: https://bugzilla.mozilla.org/show_bug.cgi?id=429592#c117

Like all regressions, this should be backed out or disabled by default on all platforms.

Jim Mathies [:jimm]

Comment 18

•

13 years ago

Comment on attachment 576996: Disable the hang monitor on mac, rev. 1

http://hg.mozilla.org/mozilla-central/rev/2729a78cd35e

Attachment #576996 - Flags: review?(smichaud)

Attachment #576996 - Flags: review?(jmathies)

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 19

•

13 years ago

(In reply to John Daggett (:jtd) from comment #16)
> Valuable data == user crashes.

Nightly user crashes. This feature's goal was to turn hangs into crashes so that we could track them - that necessarily involves an increase in crash reports. Assuming the functionality is working as expected (an assumption that apparently might not hold true on Mac), there's no reason to back it out solely because the crash count increased. We do need to investigate the crash reports, of course...
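For readers unfamiliar with the idea of turning hangs into crashes, here is a minimal sketch of the general watchdog technique. It is an assumption-laden illustration, not the code in xpcom/threads/HangMonitor.cpp: `HangWatchdog`, `MonitorTick`, the millisecond timestamps, and the timeout value are all hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical names throughout; not the actual Mozilla implementation.
class HangWatchdog {
public:
  explicit HangWatchdog(int64_t timeoutMs) : mTimeoutMs(timeoutMs) {
    assert(timeoutMs > 0);
  }

  // Main thread: record that an event was just processed.
  void NotifyActivity(int64_t nowMs) { mLastActivityMs.store(nowMs); }

  // Monitor thread: true if the main thread has been silent for longer
  // than the timeout.
  bool IsHung(int64_t nowMs) const {
    return nowMs - mLastActivityMs.load() > mTimeoutMs;
  }

private:
  const int64_t mTimeoutMs;
  std::atomic<int64_t> mLastActivityMs{0};
};

// One iteration of the monitor thread's loop. Aborting here is what turns
// a silent hang into a crash report that can be collected and counted.
inline void MonitorTick(const HangWatchdog& watchdog, int64_t nowMs) {
  if (watchdog.IsHung(nowMs)) {
    std::abort();
  }
}
```

A design like this cannot tell a stuck main thread from one that is legitimately idle unless the idle case is reported separately, which is the kind of false positive discussed in this bug.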

Status: ASSIGNED → NEW

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 20

•

13 years ago

(In reply to John Daggett (:jtd) from comment #16)
> https://bugzilla.mozilla.org/show_bug.cgi?id=429592#c117

Sorry, I wasn't up to date on the comments in that bug when I wrote my last reply - it's obviously not a simple tradeoff, and the discussion there is much more nuanced. Forget I said anything!

Status: NEW → ASSIGNED

Benjamin Smedberg

Assignee

Comment 21

•

13 years ago

The stack which I didn't account for is:

#0 0x00007fff863cad7a in mach_msg_trap ()
#1 0x00007fff863cb3ed in mach_msg ()
#2 0x00007fff8060a902 in __CFRunLoopRun ()
#3 0x00007fff80609d8f in CFRunLoopRunSpecific ()
#4 0x00007fff8587574e in RunCurrentEventLoopInMode ()
#5 0x00007fff85875553 in ReceiveNextEventCommon ()
#6 0x00007fff8587540c in BlockUntilNextEventMatchingListInMode ()
#7 0x00007fff83dd6eb2 in _DPSNextEvent ()
#8 0x00007fff83dd6801 in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#9 0x00007fff83d9c68f in -[NSApplication run] ()
#10 0x00000001028b8fd4 in nsAppShell::Run (this=0x100303a20) at /builds/mozilla-central/src/widget/src/cocoa/nsAppShell.mm:780
#11 0x0000000102616df5 in nsAppStartup::Run (this=0x1177ed5b0) at /builds/mozilla-central/src/toolkit/components/startup/nsAppStartup.cpp:220
#12 0x0000000101444ff0 in XRE_main (argc=3, argv=0x7fff5fbff860, aAppData=0x1000071c0) at /builds/mozilla-central/src/toolkit/xre/nsAppRunner.cpp:3558
#13 0x0000000100001aeb in do_main (exePath=0x7fff5fbff430 "/builds/mozilla-central/ff-debug/dist/NightlyDebug.app/Contents/MacOS/", argc=3, argv=0x7fff5fbff860) at /builds/mozilla-central/src/browser/app/nsBrowserApp.cpp:201
#14 0x0000000100001d4a in main (argc=3, argv=0x7fff5fbff860) at /builds/mozilla-central/src/browser/app/nsBrowserApp.cpp:287

So the cocoa version of nsAppShell doesn't delegate to nsBaseAppShell::Run, which means that the XPCOM event loop is not the outermost event loop at all on mac. I think this can be fixed by overriding -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] in a subclass and suspending the hang monitor when appropriate, but I also need to write down how all the different native event loops work, because each one is a little bit different and together they are a nightmare.
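The approach proposed in comment 21 can be sketched roughly as follows. This is a self-contained illustration, not the patch attached to this bug: `SuspendableMonitor`, `NextEvent`, and the injected `blockingWait` and `nowMs` callbacks are hypothetical stand-ins for the hang monitor, the overridden "next event" method, the native blocking wait, and a clock. The point is only that the monitor is told to stand down before the thread blocks waiting for a native event, and is re-armed once an event arrives.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical monitor with an explicit "suspended" state.
class SuspendableMonitor {
public:
  explicit SuspendableMonitor(int64_t timeoutMs) : mTimeoutMs(timeoutMs) {
    assert(timeoutMs > 0);
  }

  // The thread is about to process an event: (re)arm the monitor.
  void NotifyActivity(int64_t nowMs) { mLastActivityMs.store(nowMs); }

  // The thread is about to wait for events: being idle is not a hang.
  void Suspend() { mLastActivityMs.store(kSuspended); }

  // Checked by the monitor thread. Never reports a hang while suspended.
  bool IsHung(int64_t nowMs) const {
    const int64_t last = mLastActivityMs.load();
    return last != kSuspended && nowMs - last > mTimeoutMs;
  }

private:
  static constexpr int64_t kSuspended = -1;
  const int64_t mTimeoutMs;
  std::atomic<int64_t> mLastActivityMs{kSuspended};
};

// Stand-in for the overridden event-fetching method: suspend before the
// blocking wait, then re-arm as soon as there is an event to handle.
inline int NextEvent(SuspendableMonitor& monitor,
                     const std::function<int()>& blockingWait,
                     const std::function<int64_t()>& nowMs) {
  monitor.Suspend();
  const int event = blockingWait();  // may block for a long time when idle
  monitor.NotifyActivity(nowMs());
  return event;
}
```

Under these assumptions, an idle application that sits in the blocking wait for minutes is never reported as hung, while a long stall after an event has been delivered still is.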

Benjamin Smedberg

Assignee

Comment 22

•

13 years ago

Attached patch: Suspend the hang monitor by overriding a method in the event loop, rev. 1 (deleted)

Attachment #578045 - Flags: review?(smichaud)

Steven Michaud [:smichaud] (Retired)

Comment 23

•

13 years ago

Comment on attachment 578045: Suspend the hang monitor by overriding a method in the event loop, rev. 1

I haven't tested this. But it looks reasonable to me, and it should do no harm.

Attachment #578045 - Flags: review?(smichaud) → review+

Matt Brubeck (:mbrubeck)

Comment 24

•

13 years ago

Mac patch checked in for mozilla11. Leaving the bug open for the remaining patch. https://hg.mozilla.org/mozilla-central/rev/1b3f17ffa656

Target Milestone: --- → mozilla11

Benjamin Smedberg

Assignee

Comment 25

•

13 years ago

The other one landed already ;-)

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Disable the hang monitor on mac, rev. 1 (patch, deleted), Benjamin Smedberg, 13 years ago, Gavin: review+
Suspend the hang monitor by overriding a method in the event loop, rev. 1 (patch, deleted), Benjamin Smedberg, 13 years ago, smichaud: review+