Closed Bug 1009675 Opened 11 years ago Closed 10 years ago

Crash when commenting on github.com [@ mozilla::dom::WrapperPromiseCallback::Call(JSContext*, JS::Handle<JS::Value>]

Categories

(Core :: DOM: Core & HTML, defect)

31 Branch
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla33
Tracking Status
firefox30 --- wontfix
firefox31 + verified
firefox32 + verified
firefox33 + verified
firefox-esr24 --- unaffected
b2g-v1.3 --- unaffected
b2g-v1.3T --- unaffected
b2g-v1.4 --- unaffected
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: azakai, Assigned: bzbarsky)

References

(Blocks 1 open bug, )

Details

(5 keywords, Whiteboard: [GGC][testcase in comment 57][adv-main31+])

Crash Data

Attachments

(8 files, 3 obsolete files)

(deleted), patch
abillings
: sec-approval+
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
peterv
: review+
Details | Diff | Splinter Review
(deleted), patch
peterv
: review+
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
This bug was filed from the Socorro interface and is report bp-0e3f20b3-f50e-401f-b1a9-c2c312140513. ============================================================= I've seen this twice in an hour today, both times when clicking to respond on a github issue or pull request. Here's the other report: https://crash-stats.mozilla.com/report/index/d6a754ad-37f8-4ab9-bd2c-f96b22140513
Those are both null-derefs at line 603 of jsfriendapi.h: 600 inline const js::Class * 601 GetObjectClass(JSObject *obj) 602 { 603 return reinterpret_cast<const shadow::Object*>(obj)->type->clasp; 604 } sadly the bits that would tell me who called GetObjectClass are missing from the stack, because of inlining. :( The caller we do have is Promise::RunTask doing callbacks[i]->Call(cx, value), and I guess breakpad thinks this is calling WrapperPromiseCallback::Call. I _think_ that the AnyCallback::Call that calls doesn't do any GetObjectClass calls in its inline part, but I'm not sure. WrapperPromiseCallback::Call definitely does GetObjectClass via UNWRAP_OBJECT, but we're checking isObject() before doing that... Catching this in a debug build that would give us more stack info would be really helpful.
Happening on OS X, too. I merged a pull request on GitHub, and, although it was successful, Nightly (2014-05-13) crashed with this signature. Would a regression window be helpful?
OS: Linux → All
This crash signature first appeared on 2014-02-07 and is the #18 crash for Firefox 31.0a2 with 72 out of 8660 crashes in the last week. From the comments and urls in the crash reports, it appears to commonly happen during pull requests or merges on github. Crashing thread: 0 XUL mozilla::dom::WrapperPromiseCallback::Call(JS::Handle<JS::Value>) obj-firefox/x86_64/dist/include/jsfriendapi.h 1 XUL mozilla::dom::Promise::RunTask() dom/promise/Promise.cpp 2 XUL mozilla::dom::PromiseResolverTask::Run() dom/promise/Promise.cpp 3 XUL nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp 4 XUL NS_ProcessPendingEvents(nsIThread*, unsigned int) xpcom/glue/nsThreadUtils.cpp 5 XUL nsBaseAppShell::NativeEventCallback() widget/xpwidgets/nsBaseAppShell.cpp 6 XUL nsAppShell::ProcessGeckoEvents(void*) widget/cocoa/nsAppShell.mm 7 CoreFoundation CoreFoundation@0x12b31 8 CoreFoundation CoreFoundation@0x12455 9 CoreFoundation CoreFoundation@0x357f5 10 AppKit AppKit@0x119b37
Keywords: topcrash
Interestingly, none of the crashes are on Windows. All are Mac or Linux... That's pretty odd. Logan, a regression range would definitely be helpful.
I can't seem to reproduce the issue, unfortunately. It was a very specific situation for me (accepting a pull request). Commenting on issues doesn't appear to make it crash.
Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 ID:20140511004003 CSet: b122b633fe71 I also hit this crash on github when I clicked the middle-button of my mouse onto the comment field: bp-88979999-6ad3-48ad-89ef-940b62140515 A lot of the comments on crashstats actually talk about github.
Crash Signature: [@ mozilla::dom::WrapperPromiseCallback::Call(JSContext*, JS::Handle<JS::Value>)] → [@ mozilla::dom::WrapperPromiseCallback::Call(JS::Handle<JS::Value>)]
I may have been wrong with my last comment. It actually crashed when I clicked the Submit button to post my comment. I hit one more crash. I will try to figure out a pattern to get it crashing reproducible.
(In reply to Henrik Skupin (:whimboo) from comment #8) > I may have been wrong with my last comment. It actually crashed when I > clicked the Submit button to post my comment. I hit one more crash. I will > try to figure out a pattern to get it crashing reproducible. Unable to reproduce this on 31a2 Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 ID: 20140429004000
I'm going through the list of top crashers on nightly 32a1, and this is currently in the top 10, happening mostly on OS X and Linux (there were at least a couple of reports on Windows). Most of the comments in the reports have to do with interactions in GitHub, like commenting on GitHub; pushing a Git commit; clicking on various anchor tag links on github.com. In crash stats, the column corresponding to "First Appearance" says that this started happening on 32a1 around 5/2, but it looks like it started earlier by looking at the reports in 31a2. 0 XUL mozilla::dom::WrapperPromiseCallback::Call(JSContext*, JS::Handle<JS::Value>) obj-firefox/x86_64/dist/include/jsfriendapi.h 1 XUL mozilla::dom::Promise::RunTask() dom/promise/Promise.cpp 2 XUL mozilla::dom::PromiseResolverTask::Run() dom/promise/Promise.cpp 3 XUL nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp 4 XUL NS_ProcessPendingEvents(nsIThread*, unsigned int) xpcom/glue/nsThreadUtils.cpp 5 XUL nsBaseAppShell::NativeEventCallback() widget/xpwidgets/nsBaseAppShell.cpp 6 XUL nsAppShell::ProcessGeckoEvents(void*) widget/cocoa/nsAppShell.mm 7 CoreFoundation CoreFoundation@0x7f731 8 CoreFoundation CoreFoundation@0x70ea2 9 CoreFoundation CoreFoundation@0x7062f 10 libnss3.dylib PR_GetCurrentThread 11 XUL XPCJSContextStack::Pop() obj-firefox/x86_64/dist/include/nsTArray-inl.h 12 CoreFoundation CoreFoundation@0x700b5 13 HIToolbox HIToolbox@0x2ea0d 14 HIToolbox HIToolbox@0x2e685 15 HIToolbox HIToolbox@0x2e5bc 16 AppKit AppKit@0x243de 17 AppKit AppKit@0xa506d8 18 AppKit AppKit@0xa50795 19 AppKit AppKit@0xa50779 20 AppKit AppKit@0xa5075e 21 libobjc.A.dylib libobjc.A.dylib@0x1d3fe 22 libobjc.A.dylib libobjc.A.dylib@0x51c4 23 XUL nsAutoRetainCocoaObject::~nsAutoRetainCocoaObject() widget/cocoa/nsCocoaUtils.h 24 AppKit AppKit@0x23a2b 25 CoreFoundation CoreFoundation@0x63abb 26 CoreFoundation CoreFoundation@0x78fd2 27 libobjc.A.dylib libobjc.A.dylib@0x5080 28 AppKit AppKit@0x23947 29 AppKit AppKit@0x3bb703 30 AppKit AppKit@0x3bb6d0 31 AppKit AppKit@0x3bb6f0 32 XUL -[ToolbarWindow sendEvent:] widget/cocoa/nsCocoaWindow.mm 33 XUL -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] widget/cocoa/nsAppShell.mm 34 AppKit AppKit@0x17b2c 35 AppKit AppKit@0xa50779 36 AppKit AppKit@0xa50795 37 AppKit AppKit@0xa5078f 38 XUL nsAppShell::Run() widget/cocoa/nsAppShell.mm 39 XUL nsAppStartup::Run() toolkit/components/startup/nsAppStartup.cpp 40 XUL XREMain::XRE_mainRun() toolkit/xre/nsAppRunner.cpp 41 XUL XREMain::XRE_main(int, char**, nsXREAppData const*) toolkit/xre/nsAppRunner.cpp 42 XUL XRE_main toolkit/xre/nsAppRunner.cpp 43 firefox main browser/app/nsBrowserApp.cpp 44 firefox start
Version: 24 Branch → 32 Branch
Just hit this: https://crash-stats.mozilla.com/report/index/f17f2d50-964d-4083-a2e6-1369c2140516 I was submitting a comment on a GitHub pull request
All of that has the issues from comment 1. We need a stack from a debug build or info on regression range or something more to go on.
I created a Mozmill script earlier today to create comments and close/reopen issues via automation. But even after a couple of minutes no crash happened. Not sure about per-requisits we might have to setup first.
Maybe it's unrelated, but this happened after I referenced an issue in the comment box.
What I tried so far on an issue is: * referencing another issue * referencing a user * setting/removing labels * open/close the issue * close the issue with adding a comment Nothing broke it so far. When I re-read the last comments I wonder if that is more PR related.
Top crash, tracking!
Crash Signature: [@ mozilla::dom::WrapperPromiseCallback::Call(JS::Handle<JS::Value>)] → [@ mozilla::dom::WrapperPromiseCallback::Call(JS::Handle<JS::Value>)] [@ mozilla::dom::WrapperPromiseCallback::Call(JSContext*, JS::Handle<JS::Value>) ]
I just saw this commenting on a pull request. [0] As soon as I click submit, FF crashes. I just submitted a second comment and did not experience a crash, so there must be more to it. [0] https://github.com/mozilla/calculator/pull/71/files#r12855957
I'd really appreciate it if people who use github regularly did that with debug builds for a while here. A crash in a debug build might give use some information about what's actually going on here...
Noob question; I just need to add to my .mozconfig: ac_add_options --enable-debug-symbols export MOZ_DEBUG_SYMBOLS=1 export CFLAGS="-gdwarf-2" export CXXFLAGS="-gdwarf-2" and run a build, and paste the link to the crash report here?
nvm, got all set up with a debug build.
haven't had any luck trying to reproduce.
Sorry but I cannot use a debug version of Aurora with that profile. I get segfaults one after the other. I may have to file another bug, and investigate first.
Henrik, are you on Linux and not setting the JS_DISABLE_SLOW_SCRIPT_SIGNALS environment variable? See https://developer.mozilla.org/en-US/docs/Debugging_Mozilla_with_gdb#I_keep_getting_a_SIG32_in_the_debugger._How_do_I_fix_it.3F
Looks like that this wasn't the problem. I still get lots of those SIGSEVs, something similar to: Assertion failure: offset < length(), at /mozilla/code/firefox/aurora/js/src/jsscript.h:942 Program received signal SIGSEGV, Segmentation fault. 0x00007ffff2ea094c in offsetToPC (offset=<optimized out>, this=<optimized out>) at /mozilla/code/firefox/aurora/js/src/jsscript.h:942 942 JS_ASSERT(offset < length()); Any debug symbols seem to have been optimized out, so stacks I get contain 'libxul.so +0x02F0ED67' like lines. That's with debug builds from ftp.m.o, and self-build debug builds.
I hit a lot of SIGSEVs when running Firefox. But I finally hit this crash with a debug build of Aurora. Sadly I hit 'c' in the debugger, so no way to further debug it. Boris, please let me know which commands I have to issue to give you more details. Program received signal SIGSEGV, Segmentation fault. 0x00007ffff292b8ef in mozilla::dom::WrapperPromiseCallback::Call(JS::Handle<JS::Value>) () from /mozilla/bin/aurora/libxul.so (gdb) bt #0 0x00007ffff292b8ef in mozilla::dom::WrapperPromiseCallback::Call(JS::Handle<JS::Value>) () from /mozilla/bin/aurora/libxul.so #1 0x00007ffff292985d in mozilla::dom::Promise::RunTask() () from /mozilla/bin/aurora/libxul.so #2 0x00007ffff292afc0 in mozilla::dom::PromiseResolverMixin::RunInternal() () from /mozilla/bin/aurora/libxul.so #3 0x00007ffff292afda in mozilla::dom::PromiseResolverTask::Run() () from /mozilla/bin/aurora/libxul.so #4 0x00007ffff1bcff41 in nsThread::ProcessNextEvent(bool, bool*) () from /mozilla/bin/aurora/libxul.so #5 0x00007ffff1bba4a9 in NS_ProcessNextEvent(nsIThread*, bool) () from /mozilla/bin/aurora/libxul.so #6 0x00007ffff1be2e3b in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) () from /mozilla/bin/aurora/libxul.so #7 0x00007ffff203fcb3 in MessageLoop::Run() () from /mozilla/bin/aurora/libxul.so #8 0x00007ffff27319f5 in nsBaseAppShell::Run() () from /mozilla/bin/aurora/libxul.so #9 0x00007ffff301e091 in nsAppStartup::Run() () from /mozilla/bin/aurora/libxul.so #10 0x00007ffff2fea98a in XREMain::XRE_mainRun() () from /mozilla/bin/aurora/libxul.so #11 0x00007ffff2fed069 in XREMain::XRE_main(int, char**, nsXREAppData const*) () from /mozilla/bin/aurora/libxul.so #12 0x00007ffff2fed2e8 in XRE_main () from /mozilla/bin/aurora/libxul.so #13 0x00000000004063bb in do_main(int, char**, nsIFile*) () #14 0x0000000000403834 in main ()
Henrik, that backtrace doesn't have any filenames or line numbers. :( In fact, this stack is identical to the breakpad stacks, but breakpad thinks the actual crash is in jsfriendapi... Are you using an --enable-debug _and_ --enable-optimize build or something? :(
Oh, and as far as what to look for, I'd like to know which thing here is actually null and how it got into the code that crashes.
Yeah, that's what I was wondering too. Optimizations are disabled. So I talked with Glandium on IRC, and we setup a better mozconfig for me now. Once the build is ready I will continue.
Summary: crash in mozilla::dom::WrapperPromiseCallback::Call(JSContext*, JS::Handle<JS::Value>) → Crash when commenting on github.com [@ mozilla::dom::WrapperPromiseCallback::Call(JSContext*, JS::Handle<JS::Value>]
Ok, I was able to fix my constant SIGSEVs by putting 'handle SIG32 noprint' in my .gdbinit file. Lets hope the next crash doesn't take that long, given that using a debug build is absolutely slow on my machine and causes lot of hangs.
This is the #9 topcrash for Firefox 31 right now, with 99/4154 crashes in the last 7 days. There don't appear to be crashes with this signature for other versions of Firefox. 77 of the 99 crashes are in Mac OSX 10.9 - maybe that will help reproduce the bug.
I've been trying to reproduce on Mac for weeks now; using debug builds for all my GitHub interactions. But I just don't use github that much. :(
For what it's worth, I have a workable (and portable) Mac ASan build at http://people.mozilla.org/~stmichaud/bmo/firefox-asan.dmg. It's a universal build, though only the 64-bit component actually uses ASan. But we've never had portable ASan builds before -- even semi-official ones like mine. So there's probably a large backlog of bugs waiting to be discovered, which may interfere with diagnosing this one. (One was discovered only the day after I posted my build -- bug 1021612.) But it may be worth a try. I will keep updating my Mac ASan build to current m-c code, roughly once a week.
> I will keep updating my Mac ASan build to current m-c code, roughly once a week. I'll also fix LLVM bugs (locally) that interfere with its usability on the Mac. I've already fixed one.
BTW, pretty sure this is not *just* GitHub. I forgot when it crashed for me, but I'm pretty sure it wasn't while I was using GitHub.
I get it, and I really only use Linux. I have had it without github as well. Non debug build (from source)
> I get it, and I really only use Linux. Then you might want to try one of the official Linux ASan builds, here: https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-asan/
> Non debug build (from source) Were you able to get a better stack than the one breakpad gives here? Note that I'm 99% sure this is not an issue ASan will help with: we have a null deref, not a bogus-memory deref...
I should attach gdb to it. Will do that.
If you're going to attach gdb, don't use a debug build -- debug builds have their symbols stripped. Use a "regular" (non-debug) m-c nightly instead (which doesn't have its symbols stripped, because it needs to support the Gecko profiler). Gdb, like ASan builds, is much more likely to give you a good (complete) stack than Breakpad. Also, I'm not yet convinced these are null-dereferences -- see bug 1018360. But I'm currently checking.
The Windows crash addresses are 0x2b2b2b2b2b2b2b2b ("++++++++") or 0x4b4b4b4b4b4b4b4b ("KKKKKKKK"). I think that's reason enough to mark this bug security sensitive. Those who have permission to download minidumps can do that and then run minidump-stackwalk on them. The latest version of minidump-stackwalk displays the entire "thread state" (the values of all the user-level registers) for the crashing thread. All the minidumps I looked at had either rax = 0x2b2b2b2b2b2b2b2b or rax = 0x4b4b4b4b4b4b4b4b These were the only register values > 0x7fffffffffffffff -- so the crashes must have happened dereferencing the "address" in rax. Here's how to download and build the latest minidump-stackwalk (on OS X): 1) Install a reasonably recent XCode (I have 4.5.2) and the latest XCode commandline tools. 2) Visit https://github.com/mozilla/socorro and click "download zip". 3) CC=clang CXX=clang++ make breakpad 4) CC="clang -Wno-switch" CXX="clang++ -Wno-switch" make stackwalker
Group: core-security
> These were the only register values > 0x7fffffffffffffff I meant to say: These were the only register values > 0x7fffffffffff
Needless to say, people should start testing with ASan builds.
At least for the 2014-06-06-03-02-06-mozilla-central nightly on OS X, the crash address is also 0x2b2b2b2b2b2b2b2b ("++++++++"). I loaded XUL from this m-c nightly into a disassembler (Hopper Disassembler, http://www.hopperapp.com/), then found the following assembly code at the address where these crashes happen in that copy of XUL (offset 0x1370818): 0000000001370809 488D8538FEFFFF lea rax, qword [ss:rbp-0x2a0+var_216] 0000000001370810 49894718 mov qword [ds:r15+0x18], rax 0000000001370814 488B4308 mov rax, qword [ds:rbx+0x8] 0000000001370818 488B00 mov rax, qword [ds:rax] 000000000137081b 8B4808 mov ecx, dword [ds:rax+0x8] Then I downloaded minidumps corresponding to several of this bug's crashes in this particular m-c nightly and ran the latest minidump-stackwalk on them. They all showed the following as part of the crashing thread's crash state: rax = 0x2b2b2b2b2b2b2b2b
0x2b2b2b2b2b2b2b2b and 0x4b4b4b4b4b4b4b4b could conceivably be values used for "poisoning" memory that's been freed. But I thought we always used 0x5a5a5a5a5a5a5a5a for memory poisoning.
> If the OS X and Linux crashes are anything like the crashes on Windows They're not. They all seem to list 0x0 as "Address" in the summary table at https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Adom%3A%3AWrapperPromiseCallback%3A%3ACall%28JSContext*%2C+JS%3A%3AHandle%3CJS%3A%3AValue%3E%29&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&hang_type=any&date=2014-06-09+02%3A00%3A00&range_value=1#tab-reports https://crash-stats.mozilla.com/report/index/753e42bd-744a-4498-9b4e-a20892140606 is an example of a 014-06-06-03-02-06 build in crash-stats and it claims 0. All that said, here's what firebot has to say about 0x2b2b2b2b and 0x4b4b4b4b, if that's where we're crashing: *firebot* 0x2b2b2b2b is The poison value written over the nursery after it is swept. When debugging a GGC crash, 0x2b or not 0x2b: that is the question. *firebot* 0x4b4b4b4b is the poison value (JS_SWEPT_TENURED_PATTERN) written over tenured heap things and arenas after they are swept. Terrence, can I somehow enable some sort of "gc more often" mode at runtime after I've loaded github?
Flags: needinfo?(terrence)
> They all seem to list 0x0 as "Address" in the summary table But that's wrong. See bug 1018360.
Oh, good to know! So knowing that we have a swept object, not a null pointer, is actually pretty useful. It means we should be looking for GC hazards in particular, I'd think. I'll do some auditing of the Promise code tomorrow with that in mind.
Was able to catch this in a debug build with bz's suggestion to force GC more often: * thread #1: tid = 0x71f414, 0x00000001015f8860 XUL`js::GetObjectClass(obj=0x00000001141f7638) + 16 at jsfriendapi.h:610, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) * frame #0: 0x00000001015f8860 XUL`js::GetObjectClass(obj=0x00000001141f7638) + 16 at jsfriendapi.h:610 frame #1: 0x00000001015f8965 XUL`mozilla::dom::GetDOMClass(obj=0x00000001141f7638) + 21 at BindingUtils.h:185 frame #2: 0x0000000103b9b16f XUL`tag_nsresult mozilla::dom::UnwrapObject<mozilla::dom::Promise, mozilla::dom::Promise*>(obj=0x00000001141f7638, value=0x00007fff5fbfcb78, protoID=Promise, protoDepth=0) + 31 at BindingUtils.h:223 frame #3: 0x0000000103b97ee7 XUL`tag_nsresult mozilla::dom::UnwrapObject<(obj=0x00000001141f7638, value=0x00007fff5fbfcb78)331, mozilla::dom::Promise, mozilla::dom::Promise*>(JSObject*, mozilla::dom::Promise*&) + 39 at BindingUtils.h:259 frame #4: 0x0000000103ba08d2 XUL`mozilla::dom::WrapperPromiseCallback::Call(this=0x0000000130760340, aCx=0x00000001117df170, aValue=Handle<JS::Value> at 0x00007fff5fbfcc88) + 946 at PromiseCallback.cpp:244 frame #5: 0x0000000103b95b41 XUL`mozilla::dom::Promise::RunTask(this=0x000000012d20cce0) + 897 at Promise.cpp:1025 frame #6: 0x0000000103b96447 XUL`mozilla::dom::Promise::RunResolveTask(this=0x000000012d20cce0, aValue=Handle<JS::Value> at 0x00007fff5fbfcf18, aState=Resolved, aAsynchronous=SyncTask) + 871 at Promise.cpp:1219 frame #7: 0x0000000103b9cc9f XUL`mozilla::dom::PromiseResolverMixin::RunInternal(this=0x000000012b37b658) + 207 at Promise.cpp:116 frame #8: 0x0000000103b9cd5c XUL`mozilla::dom::PromiseResolverTask::Run(this=0x000000012b37b640) + 28 at Promise.cpp:144 frame #9: 0x00000001016b6272 XUL`nsThread::ProcessNextEvent(this=0x0000000100321aa0, aMayWait=false, aResult=0x00007fff5fbfd163) + 1570 at nsThread.cpp:766 frame #10: 0x000000010159f36a XUL`NS_ProcessPendingEvents(thread=0x0000000100321aa0, timeout=20) + 154 at nsThreadUtils.cpp:210 frame #11: 0x0000000103454859 XUL`nsBaseAppShell::NativeEventCallback(this=0x0000000111642600) + 201 at nsBaseAppShell.cpp:98 frame #12: 0x00000001033e03dd XUL`nsAppShell::ProcessGeckoEvents(aInfo=0x0000000111642600) + 445 at nsAppShell.mm:286 It's still open (and I'll try to leave it) so if I can do some more investigation, let me know.
Took me some time but I think I have a solid STR: 1) Open https://github.com/mozilla/rust/issues/414 2) Open the web console 3) Paste and execute this: setInterval(function () $("#discussion_bucket").updateContent("<div id='discussion_bucket'>" + new Array(100).join("<h1>foobar</h1>") + "</div>"), 10) 4) Wait a few seconds until Firefox crashes.
BTW, you don't need to be logged in to try the STR above.
Last good nightly: 2014-03-28 First bad nightly: 2014-03-29 Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6fa163ff81a3&tochange=4f3443da36a1 I guess this has been around since we enabled generational GC in bug 619558, then?
Version: 32 Branch → 31 Branch
Whiteboard: [GGC]
(In reply to Boris Zbarsky [:bz] from comment #47) > > If the OS X and Linux crashes are anything like the crashes on Windows > > They're not. They all seem to list 0x0 as "Address" in the summary table at > https://crash-stats.mozilla.com/report/ > list?signature=mozilla%3A%3Adom%3A%3AWrapperPromiseCallback%3A%3ACall%28JSCon > text*%2C+JS%3A%3AHandle%3CJS%3A%3AValue%3E%29&product=Firefox&query_type=cont > ains&range_unit=weeks&process_type=any&hang_type=any&date=2014-06- > 09+02%3A00%3A00&range_value=1#tab-reports > > https://crash-stats.mozilla.com/report/index/753e42bd-744a-4498-9b4e- > a20892140606 is an example of a 014-06-06-03-02-06 build in crash-stats and > it claims 0. > > All that said, here's what firebot has to say about 0x2b2b2b2b and > 0x4b4b4b4b, if that's where we're crashing: > > *firebot* 0x2b2b2b2b is The poison value written over the nursery after it is > swept. When debugging a GGC crash, 0x2b or not 0x2b: that is the > question. > *firebot* 0x4b4b4b4b is the poison value (JS_SWEPT_TENURED_PATTERN) written > over > tenured heap things and arenas after they are swept. > > Terrence, can I somehow enable some sort of "gc more often" mode at runtime > after I've loaded github? Yes, absolutely. In an --enable-debug or --enable-gczeal build, you can set the environment JS_GC_ZEAL to a flag to trigger several helpful tools. Use JS_GC_ZEAL=help to get a list of the various modes.
Flags: needinfo?(terrence)
> you can set the environment JS_GC_ZEAL to a flag to trigger several helpful tools. That fails the "at runtime after I've loaded github" criterion. ;)
From terrence on IRC: the C interface you want is JS_SetGCZeal(cx, zealMode, frequency)
Tim, thank you for coming up with those STR and for the stack. And Steven, thank you _very_ much for figuring out the right direction to be poking here! So we're on this line in WrapperPromiseCallback::Call: 244 nsresult r = UNWRAP_OBJECT(Promise, valueObj, returnedPromise); and we have: (gdb) p valueObj.ptr->type_.value $7 = ('js::types::TypeObject' *) 0x2b2b2b2b2b2b2b2b valueObj came from here: JS::Rooted<JSObject*> valueObj(aCx, &retValue.toObject()); and retValue came from here: JS::Rooted<JS::Value> retValue(aCx, mCallback->Call(value, rv, CallbackObject::eRethrowExceptions)); What I think we have here is a rooting hazard that the analysis is missing for some reason. In particular, in AnyCallback::Call we do this (with some error handling elided): CallSetup s(this, aRv, aExceptionHandling); return Call(s.GetContext(), JS::UndefinedHandleValue, value, aRv); where the Call() function returns a JS::Value. So between when that inner call returns and when we ourselves return there's an unrooted Value hanging out. But ~CallSetup can GC, and then we end up with a dead object. Here's a testcase that reliably reproduces the crash, in every build going back to Firefox 29 (when we enabled Promise). All you have to do is install https://www.squarefree.com/extensions/domFuzzLite3.xpi and run this: <script> var s = document.querySelector("script"); var obs = new MutationObserver(function() { fuzzPriv.forceGC(); }); obs.observe(s, { childList: true }); Promise.resolve(5).then(function() { s.appendChild(document.createTextNode("x")); return {}; }) </script> or use your favorite less-reliable method of triggering GC if you don't want the extension.
Looks like we only enabled exact rooting in Firefox 28, so builds before that should not be affected at all. Firefox 28 is _probably_ affected only if there is a non-Promise API that's calling WebIDL callbacks that return "any" or "object". Oh, and the reason the testcase in comment 57 crashes is that ~CallSetup runs mutation observers, so that lets us explicitly trigger a gc in ~CallSetup. Steve Fink is going to fix the analysis; I'll fix the bindings bug here.
Assignee: nobody → bzbarsky
While this might show up on the release population, it doesn't currently show up as a topcrasher there and that is likely due to more GitHub devs using pre-release channels. Will track this for FF30 and it could be a potential ride-along, depending on risk, but right now doesn't look like a chemspill driver - let's wait and watch what happens in the next day or two as we absorb users on FF30 release.
For what it's worth, I suspect risk will be pretty low and the bigger worry for me is potential exploitability than stability... I agree we shouldn't block the release on this, but we should strongly consider it as a ride-along.
Peter, if you won't be able to get to this stuff pretty soon, let me know, please. This needs backporting to beta 31.
Attachment #8437929 - Flags: review?(peterv)
Attached patch part 2. Return WebIDL 'any' values as handles (obsolete) (deleted) — — Splinter Review
Attachment #8437930 - Flags: review?(peterv)
Attached patch part 3. Return WebIDL 'object' values as handles (obsolete) (deleted) — — Splinter Review
Attachment #8437931 - Flags: review?(peterv)
Comment on attachment 8437929 [details] [diff] [review] part 1. Change the return value of getRetvalDeclarationForType to allow more than two states for the outparam bit Review of attachment 8437929 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/bindings/Codegen.py @@ +5808,5 @@ > + args.append(CGGeneric("result")) > + else: > + assert resultOutParam is "ptr" > + args.append(CGGeneric("&result")) > + Trailing whitespace.
Attachment #8437929 - Flags: review?(peterv) → review+
Comment on attachment 8437930 [details] [diff] [review] part 2. Return WebIDL 'any' values as handles Review of attachment 8437930 [details] [diff] [review]: ----------------------------------------------------------------- ::: content/html/content/src/nsGenericHTMLElement.h @@ +132,2 @@ > { > + return GetItemValue(aCx, GetWrapperPreserveColor(), aRetval, aError); I'd drop |return |. ::: dom/base/DOMRequest.cpp @@ +82,5 @@ > > NS_IMETHODIMP > DOMRequest::GetResult(JS::MutableHandle<JS::Value> aResult) > { > + GetResult(aResult); This doesn't seem right, I think you wanted to pass null for the context. ::: dom/base/DOMRequest.h @@ +60,5 @@ > + /* void GetResult(JS::MutableHandle<JS::Value> aRetval) const > + { > + GetResult(nullptr, aRetval); > + } > + */ And this can go. ::: dom/base/nsGlobalWindow.cpp @@ +10587,5 @@ > +nsGlobalWindow::GetInterface(JSContext* aCx, nsIJSID* aIID, > + JS::MutableHandle<JS::Value> aRetval, > + ErrorResult& aError) > +{ > + return dom::GetInterface(aCx, this, aIID, aRetval, aError); I'd drop |return |. ::: dom/bindings/BindingUtils.h @@ +1661,2 @@ > { > + return GetInterfaceImpl(aCx, aThis, aThis, aIID, aRetval, aError); I'd drop |return |. ::: dom/events/MessageEvent.cpp @@ +78,2 @@ > JS::Rooted<JS::Value> data(aCx, mData); > if (!JS_WrapValue(aCx, &data)) { Could just do aData.set(mData); if (!JS_WrapValue(aCx, aData)) { ? ::: js/xpconnect/src/event_impl_gen.py @@ +311,2 @@ > fd.write(" nsresult rv = NS_ERROR_UNEXPECTED;\n") > + fd.write(" if (m%s && !XPCVariant::VariantDataToJS(m%s, &rv, aRetval)) {\n" % (firstCap(a.name), firstCap(a.name))) Hmm, this used to return null if the member was null, shouldn't we keep that behaviour?
Attachment #8437930 - Flags: review?(peterv) → review+
> Could just do Yes, done. > Hmm, this used to return null if the member was null Good catch. Fixed like so: fd.write(" nsresult rv = NS_ERROR_UNEXPECTED;\n") fd.write(" if (!m%s) {\n" % firstCap(a.name)) fd.write(" aRetval.setNull();\n") fd.write(" } else if (!XPCVariant::VariantDataToJS(m%s, &rv, aRetval)) {\n" % (firstCap(a.name))) fd.write(" aRv.Throw(NS_ERROR_FAILURE);\n") fd.write(" }\n") Applied the rest of the comment 65 and comment 66 comments.
Comment on attachment 8437931 [details] [diff] [review] part 3. Return WebIDL 'object' values as handles Review of attachment 8437931 [details] [diff] [review]: ----------------------------------------------------------------- ::: content/base/src/nsDocument.cpp @@ +5998,2 @@ > JSObject* constructorObject = JS_GetFunctionObject(constructor); > + if (!constructorObject) { This shouldn't ever happen, no? IIRC it returns the same object. ::: dom/base/nsGlobalWindow.h @@ +1000,5 @@ > { > if (mDoc) { > mDoc->WarnOnceAbout(nsIDocument::eWindow_Content); > } > + return GetContent(aCx, aRetval, aError); I'd drop |return |. ::: dom/nfc/MozNDEFRecord.h @@ +70,1 @@ > } This could just be: if (mType) { JS::ExposeObjectToActiveJS(mType); } retval.set(mType); @@ +76,5 @@ > + JS::ExposeObjectToActiveJS(mId); > + retval.set(mId); > + } else { > + retval.set(nullptr); > + } Same here. @@ +90,1 @@ > } And here.
Attachment #8437931 - Flags: review?(peterv) → review+
> This shouldn't ever happen, no? IIRC it returns the same object. Indeed. Changed to just do: aRetval.set(JS_GetFunctionObject(constructor)); Applied the other comments.
Attachment #8437929 - Attachment is obsolete: true
Attached patch part 2. Return WebIDL 'any' values as handles (deleted) — — Splinter Review
Attachment #8437930 - Attachment is obsolete: true
Attachment #8437931 - Attachment is obsolete: true
Comment on attachment 8438523 [details] [diff] [review] part 1. Change the return value of getRetvalDeclarationForType to allow more than two states for the outparam bit This is actually a sec-approval request for all three patches. [Security approval request comment] How easily could an exploit be constructed based on the patch? Not very easily, imo. Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No. Which older supported branches are affected by this flaw? Anything based on Gecko 28 or later. I think that means just 29, 30, beta 31, aurora 32 If not all supported branches, which bug introduced the flaw? Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? I have backports for Aurora 32 and beta 31. It should be pretty simple to create them for 30 and 29 as needed, I expect. How likely is this patch to cause regressions; how much testing does it need? This has moderate risk of me and Peter missing something about the mass-changes... In some ways the more testing we can get the better.
Attachment #8438523 - Flags: sec-approval?
Attached patch Part 2 for Aurora 32 (deleted) — — Splinter Review
Part 1 and part 3 apply on Aurora as-is
Attachment #8438524 - Attachment is obsolete: true
Comment on attachment 8438523 [details] [diff] [review] part 1. Change the return value of getRetvalDeclarationForType to allow more than two states for the outparam bit sec-approval+ for this to go into trunk. Since there is some risk, putting it in sooner is probably better.
Attachment #8438523 - Flags: sec-approval? → sec-approval+
Attached patch Addendum to part 2 for beta 31 (deleted) — — Splinter Review
Attachment #8438663 - Flags: review?(peterv)
Attached patch Addendum to part 3 for beta 31 (deleted) — — Splinter Review
Attachment #8438664 - Flags: review?(peterv)
Blocks: ExactRooting
Comment on attachment 8438523 [details] [diff] [review] part 1. Change the return value of getRetvalDeclarationForType to allow more than two states for the outparam bit [Approval Request Comment] Bug caused by (feature/regressing bug #): Bug 753203, effectively. User impact if declined: Crashes that are pretty easy to trigger from content and may be exploitable. Testing completed (on m-c, etc.): Passes tests. Risk to taking this patch (and alternatives if risky): Medium-ish risk; there were a lot of changes here... String or IDL/UUID changes made by this patch: None.
Attachment #8438523 - Flags: approval-mozilla-beta?
Attachment #8438523 - Flags: approval-mozilla-aurora?
Attachment #8438558 - Flags: approval-mozilla-aurora?
Attachment #8438526 - Flags: approval-mozilla-aurora?
Attached patch Part 2 for beta 31 (deleted) — — Splinter Review
Attachment #8438558 - Attachment is obsolete: true
Attachment #8438558 - Flags: approval-mozilla-aurora?
Attachment #8438524 - Attachment is obsolete: false
Attachment #8438558 - Attachment is obsolete: false
Attachment #8438558 - Flags: approval-mozilla-aurora?
Attached patch Part 3 for beta 31 (deleted) — — Splinter Review
Attachment #8438680 - Flags: approval-mozilla-beta?
Attachment #8438683 - Flags: approval-mozilla-beta?
Attachment #8438663 - Flags: review?(peterv) → review+
Attachment #8438664 - Flags: review?(peterv) → review+
For the record, this is the updated analysis's hazard output for this bug: Function '_ZN7mozilla3dom11AnyCallback4CallEN2JS6HandleINS2_5ValueEEERNS_11ErrorResultENS0_14CallbackObject17ExceptionHandlingE|JS::Value mozilla::dom::AnyCallback::Call(class JS::Handle<JS::Value>, mozilla::ErrorResult*, uint32)' has unrooted '<returnvalue>' of type 'JS::Value' live across GC call '_ZN7mozilla3dom14CallbackObject9CallSetupD1Ev|void mozilla::dom::CallbackObject::CallSetup::~CallSetup()' at dist/include/mozilla/dom/PromiseBinding.h:126 dist/include/mozilla/dom/PromiseBinding.h:129 GC Function: _ZN7mozilla3dom14CallbackObject9CallSetupD1Ev|void mozilla::dom::CallbackObject::CallSetup::~CallSetup() void mozilla::dom::CallbackObject::CallSetup::~CallSetup(int32) uint8 JS_ReportPendingException(JSContext*) uint8 js_ReportUncaughtException(JSContext*) void js::CallErrorReporter(JSContext*, int8*, JSErrorReport*) IndirectCall: hook
Steve, that looks lovely.
Attachment #8438523 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8438526 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8438558 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Since there are some risks, I've only approved the aurora change. I will approve the beta changes for beta 3.
This hasn't actually landed on beta 31 yet, has it?
Flags: needinfo?(lhenry)
Correct. It has not.
Flags: needinfo?(lhenry)
Beta 2 built. Taking the patches for beta 3.
Attachment #8438680 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8438683 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8438523 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
If it applies cleanly there, might be worth it just to reduce the differences between the branches. It's not high priority, though, since afaik exact marking is not enabled on b2g30.
Flags: needinfo?(bzbarsky)
Lots of tedious unbitrotting trying to get the beta patches to apply to b2g30.
Given that exact rooting is required to hit this, marking v1.4 as unaffected since the benefits of any backport don't justify the effort in getting these patches to apply.
Depends on: 1029652
Whiteboard: [GGC][testcase in comment 57] → [GGC][testcase in comment 57][adv-main31+]
Rerpoduced the original issue using the following build: - http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/05/2014-05-13-03-02-01-mozilla-central/ - Used the STR from comment # 52 and reproduced the issue several times Went through the following verification: Win 8.1: [PASSED] - fx 33.0a1 [20140714030201]: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-07-14-03-02-01-mozilla-central/ - fx 32.0a2 [20140714004001]: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-07-14-00-40-01-mozilla-aurora/ - fx 31.0b9 [20140710141843]: http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/31.0b9/win32/en-US/ Ubuntu 13.10 x64: [PASSED] - fx 33.0a1 [20140714030201]: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-07-14-03-02-01-mozilla-central/ - fx 32.0a2 [20140714004001]: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-07-14-00-40-01-mozilla-aurora/ - fx 31.0b9 [20140710141843]: http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/31.0b9/linux-x86_64/en-US/ OSX 10.9.4 x64: [PASSED] - fx 33.0a1 [20140714030201]: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-07-14-03-02-01-mozilla-central/ - fx 32.0a2 [20140714004001]: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-07-14-00-40-01-mozilla-aurora/ - fx 31.0b9 [20140710141843]: http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/31.0b9/mac/en-US/ - ran through the STR using several different tabs - ran through the STR using several different windows - ran through the STR using private windows/tabs - ran through the STR using e10s windows/tabs
Status: RESOLVED → VERIFIED
QA Whiteboard: [qa!]
Group: core-security → core-security-release
Group: core-security-release
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: