Closed Bug 1323207 Opened 8 years ago Closed 8 years ago

Add better crash signatures for known places that call JS during painting

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
mozilla53
Tracking Status
firefox51 --- wontfix
firefox52 --- wontfix
firefox53 --- wontfix

People

(Reporter: mccr8, Assigned: mccr8)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

There are various assertions that check that we don't enter the JS engine during painting. This is useful to find new places that do this, but we have a number of existing unfixed issues (bug 1310335 and various bugs blocking bug 1279086) that hit this, and unfortunately the crash signatures these generate are very generic, and can shift over time. This is bad because it makes it hard for anybody looking at crash lists how frequent these crashes are, or if a crash signature is new or not. One way to improve these signatures would be to set a flag while we're in InterruptCallback, and do a release assert at known points that the flag is not set (like at the start of nsContentUtils::IsPatternMatching). Anybody fixing one of these issues would have to remove the assertion.
This should get backported to branches with bug 1308039 (52) and maybe bug 1279086 (51).
Is the crash reason field not sufficient?
(In reply to Bill McCloskey (:billm) from comment #2) > Is the crash reason field not sufficient? That does not address the problem of distinguishing the different reasons that we are entering the JS engine during painting. (Arguably, the crash reason field is not really sufficient to distinguish painting crashes from other crashes, given that crash triagers mostly look at signatures, but that has not yet been a problem in practice.) For example, how do we tell apart instances of bug 1310335 from instances of bug 1311843 without looking at individual crash reports, and without having to come up with some custom super search that looks at proto signatures? (And more importantly, how do we tell if somebody adds a new place that starts hitting this assert.)
Comment on attachment 8819024 [details] Bug 1323207, part 2 - Assert early if we're painting at various points we enter JS. https://reviewboard.mozilla.org/r/98902/#review99162 ::: xpcom/ds/nsObserverList.cpp:109 (Diff revision 1) > + MOZ_RELEASE_ASSERT(js::AllowGCBarriers(CycleCollectedJSContext::Get()->Context()), > + "Observers can be implement in JS, so they should not be called during painting."); I'm worried you're going to find a lot more stuff with this, but let's see what happens :-).
Attachment #8819024 - Flags: review?(wmccloskey) → review+
Comment on attachment 8819023 [details] Bug 1323207, part 1 - Add method to check AllocGCBarriers. https://reviewboard.mozilla.org/r/98900/#review99372
Attachment #8819023 - Flags: review?(jcoppeard) → review+
(In reply to Bill McCloskey (:billm) from comment #6) > I'm worried you're going to find a lot more stuff with this, but let's see > what happens :-). Yeah, it may have to get backed out, but I figured I'd try it.
Pushed by amccreight@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a04031da95dc part 1 - Add method to check AllocGCBarriers. r=jonco https://hg.mozilla.org/integration/autoland/rev/491237dbcb5d part 2 - Assert early if we're painting at various points we enter JS. r=billm
This breaks, but only on Windows 8 (!?): https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=491237dbcb5d29543e2f9bacf37f7eefbbf650b2 It is hitting the observer service assert, but during shutdown, when it does the NS_XPCOM_SHUTDOWN_OBSERVER_ID notification. I have no idea why it is asserting there.
Oh I see, I think maybe this is happening in the GPU process...
Backed out for mass-crashing on Windows 8 x64 debug e10s: https://hg.mozilla.org/integration/autoland/rev/6cea439a7674fdc4e0f0c6c62f9a035c4f10e633 https://hg.mozilla.org/integration/autoland/rev/e8014faca57975f249d5b4b62331c51c315d7a90 Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=491237dbcb5d29543e2f9bacf37f7eefbbf650b2 Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=8027895&repo=autoland 10:30:09 INFO - PROCESS-CRASH | Main app process exited normally | application crashed [@ nsObserverList::NotifyObservers(nsISupports *,char const *,char16_t const *)] 10:30:09 INFO - Crash dump filename: c:\users\cltbld~1.t-w\appdata\local\temp\tmp8cevqd.mozrunner\minidumps\4cb1637c-5ce6-433c-8757-c42b148dff6b.dmp 10:30:09 INFO - Operating system: Windows NT 10:30:09 INFO - 6.2.9200 10:30:09 INFO - CPU: amd64 10:30:09 INFO - family 6 model 30 stepping 5 10:30:09 INFO - 8 CPUs 10:30:09 INFO - 10:30:09 INFO - GPU: UNKNOWN 10:30:09 INFO - 10:30:09 INFO - Crash reason: EXCEPTION_ACCESS_VIOLATION_READ 10:30:09 INFO - Crash address: 0x120 10:30:09 INFO - Assertion: Unknown assertion type 0x00000000 10:30:09 INFO - Process uptime: 17 seconds 10:30:09 INFO - 10:30:09 INFO - Thread 0 (crashed) 10:30:09 INFO - 0 xul.dll!nsObserverList::NotifyObservers(nsISupports *,char const *,char16_t const *) [nsObserverList.cpp:491237dbcb5d : 110 + 0x7] 10:30:09 INFO - rax = 0x0000000000000000 rdx = 0x0000005989dab188 10:30:09 INFO - rcx = 0x000000000000002a rbx = 0x0000000000000000 10:30:09 INFO - rsi = 0x0000000000000000 rdi = 0x0000005989dab4c8 10:30:09 INFO - rbp = 0x000007fc3814b110 rsp = 0x000000598995f2b0 10:30:09 INFO - r8 = 0x000007fc3814b110 r9 = 0x0000000000000000 10:30:09 INFO - r10 = 0x000007fc38163cb0 r11 = 0x000000598995f240 10:30:09 INFO - r12 = 0x000000598995f6b0 r13 = 0x0000000000000000 10:30:09 INFO - r14 = 0x0000005989dab188 r15 = 0x0000000000000003 10:30:09 INFO - rip = 0x000007fc337113bc 10:30:09 INFO - Found by: given as instruction pointer in context 10:30:09 INFO - 1 xul.dll!nsObserverService::NotifyObservers(nsISupports *,char const *,char16_t const *) [nsObserverService.cpp:491237dbcb5d : 281 + 0x11] 10:30:09 INFO - rbx = 0x0000000000000000 rbp = 0x000007fc3814b110 10:30:09 INFO - rsp = 0x000000598995f2f0 r12 = 0x000000598995f6b0 10:30:09 INFO - r13 = 0x0000000000000000 r14 = 0x0000005989dab188 10:30:09 INFO - r15 = 0x0000000000000003 rip = 0x000007fc33711637 10:30:09 INFO - Found by: call frame info 10:30:09 INFO - 2 xul.dll!mozilla::ShutdownXPCOM(nsIServiceManager *) [XPCOMInit.cpp:491237dbcb5d : 909 + 0x63] 10:30:09 INFO - rbx = 0x0000000000000000 rbp = 0x000007fc3814b110 10:30:09 INFO - rsp = 0x000000598995f330 r12 = 0x000000598995f6b0 10:30:09 INFO - r13 = 0x0000000000000000 r14 = 0x0000005989dab188 10:30:09 INFO - r15 = 0x0000000000000003 rip = 0x000007fc3379e382 10:30:09 INFO - Found by: call frame info 10:30:09 INFO - 3 xul.dll!XRE_InitChildProcess [nsEmbedFunctions.cpp:491237dbcb5d : 760 + 0x9] 10:30:09 INFO - rbx = 0x0000000000000000 rbp = 0x000007fc3814b110 10:30:09 INFO - rsp = 0x000000598995f3c0 r12 = 0x000000598995f6b0 10:30:09 INFO - r13 = 0x0000000000000000 r14 = 0x0000005989dab188 10:30:09 INFO - r15 = 0x0000000000000003 rip = 0x000007fc36c42589 10:30:09 INFO - Found by: call frame info 10:30:09 INFO - 4 firefox.exe!content_process_main(int,char * * const) [plugin-container.cpp:491237dbcb5d : 115 + 0x10] 10:30:09 INFO - rbx = 0x0000000000000000 rbp = 0x000007fc3814b110 10:30:09 INFO - rsp = 0x000000598995f680 r12 = 0x000000598995f6b0 10:30:09 INFO - r13 = 0x0000000000000000 r14 = 0x0000005989dab188 10:30:09 INFO - r15 = 0x0000000000000003 rip = 0x000007f63df61d45 10:30:09 INFO - Found by: call frame info 10:30:09 INFO - 5 firefox.exe!NS_internal_main(int,char * *,char * *) [nsBrowserApp.cpp:491237dbcb5d : 430 + 0xa] 10:30:09 INFO - rbx = 0x0000000000000000 rbp = 0x000007fc3814b110 10:30:09 INFO - rsp = 0x000000598995f6e0 r12 = 0x000000598995f6b0 10:30:09 INFO - r13 = 0x0000000000000000 r14 = 0x0000005989dab188 10:30:09 INFO - r15 = 0x0000000000000003 rip = 0x000007f63df618f9 10:30:09 INFO - Found by: call frame info 10:30:09 INFO - 6 firefox.exe!wmain [nsWindowsWMain.cpp:491237dbcb5d : 115 + 0x14] 10:30:09 INFO - rbx = 0x0000000000000000 rbp = 0x000007fc3814b110 10:30:09 INFO - rsp = 0x000000598995f760 r12 = 0x000000598995f6b0 10:30:09 INFO - r13 = 0x0000000000000000 r14 = 0x0000005989dab188 10:30:09 INFO - r15 = 0x0000000000000003 rip = 0x000007f63df6266f 10:30:09 INFO - Found by: call frame info 10:30:09 INFO - 7 firefox.exe!__scrt_common_main_seh [exe_common.inl : 253 + 0x22] 10:30:09 INFO - rbx = 0x0000000000000000 rbp = 0x000007fc3814b110 10:30:09 INFO - rsp = 0x000000598995f7c0 r12 = 0x000000598995f6b0 10:30:09 INFO - r13 = 0x0000000000000000 r14 = 0x0000005989dab188 10:30:09 INFO - r15 = 0x0000000000000003 rip = 0x000007f63dfa115d 10:30:09 INFO - Found by: call frame info 10:30:09 INFO - 8 kernel32.dll!BaseThreadInitThunk + 0x1a 10:30:09 INFO - rbx = 0x0000000000000000 rbp = 0x000007fc3814b110 10:30:09 INFO - rsp = 0x000000598995f800 r12 = 0x000000598995f6b0 10:30:09 INFO - r13 = 0x0000000000000000 r14 = 0x0000005989dab188 10:30:09 INFO - r15 = 0x0000000000000003 rip = 0x000007fc533f167e 10:30:09 INFO - Found by: call frame info 10:30:09 INFO - 9 ntdll.dll!RtlUserThreadStart + 0x21 10:30:09 INFO - rbp = 0x000007fc3814b110 rsp = 0x000000598995f830 10:30:09 INFO - rip = 0x000007fc557ac3f1 10:30:09 INFO - Found by: stack scanning 10:30:09 INFO - 10 KERNELBASE.dll!GetLegacyComposition + 0x1180 10:30:09 INFO - rbp = 0x000007fc3814b110 rsp = 0x000000598995f860 10:30:09 INFO - rip = 0x000007fc52aa09d0 10:30:09 INFO - Found by: stack scanning
Flags: needinfo?(continuation)
(In reply to Andrew McCreight [:mccr8] from comment #12) > Oh I see, I think maybe this is happening in the GPU process... and it looks like it isn't hitting the assert per se. I'm guessing CycleCollectedJSContext::Get()->Context() is doing a null deref or something.
Flags: needinfo?(continuation)
I added a null check on Context(). Hopefully that will be enough. https://treeherder.mozilla.org/#/jobs?repo=try&revision=253a2861ac5b64c89f929b125c72928485c09e3d
Blocks: 1324237
For now, I'll land it without the observer service changes.
Pushed by amccreight@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/40440a0e3073 part 1 - Add method to check AllocGCBarriers. r=jonco https://hg.mozilla.org/integration/autoland/rev/5f06a88bdc86 part 2 - Assert early if we're painting at various points we enter JS. r=billm
Are you sure CycleCollectedJSContext::Get() isn't returning null?
(In reply to Bill McCloskey (:billm) from comment #21) > Are you sure CycleCollectedJSContext::Get() isn't returning null? The stub XPCOM the GPU process uses does start the CC, or so I thought, but that does seem like the most likely remaining explanation.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
(In reply to Andrew McCreight [:mccr8] from comment #22) > (In reply to Bill McCloskey (:billm) from comment #21) > > Are you sure CycleCollectedJSContext::Get() isn't returning null? > > The stub XPCOM the GPU process uses does start the CC, or so I thought, but > that does seem like the most likely remaining explanation. It doesn't get nulled out at shutdown?
(In reply to Bill McCloskey (:billm) from comment #24) > It doesn't get nulled out at shutdown? The observer service is called much earlier than that. Plus things are fine except for the GPU process.
Blocks: 1324308
Depends on: 1324642
Backout by wmccloskey@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/335c5d74c1a1 Back out, part 2 - Assert early if we're painting at various points we enter JS (a=backout) https://hg.mozilla.org/integration/mozilla-inbound/rev/51d359b72c57 Back out, part 1 - Add method to check AllocGCBarriers (a=backout)
Seems that this was backed out. This is the last week before the merge of FF53. Do we need to get this in?
Flags: needinfo?(continuation)
This was backed out because the patch that was causing this issue was also backed out, so there's no need for it any more.
Flags: needinfo?(continuation)
Resolution: FIXED → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: