Closed Bug 608021 Opened 14 years ago Closed 13 years ago

Crashes based in nsFrameMessageManager::ReceiveMessage [@ js::Interpret] [@ js::TraceRecorder::record_NativeCallComplete] [@ js_LookupPropertyWithFlags]

Categories

(Core :: JavaScript Engine, defect)

2.0 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
firefox6 --- fixed
blocking2.0 --- -
fennec - ---

People

(Reporter: jdm, Assigned: smaug)

References

Details

(Keywords: crash, topcrash, Whiteboard: fennec-related-jscript-crashers)

Crash Data

Attachments

(1 file, 1 obsolete file)

I've been seeing mysterious JS engine crashes that are all a few frames removed from nsFrameMessageManager::ReceiveMessage, which makes me suspicious. http://crash-stats.mozilla.com/report/index/dd2e2b60-a87d-401d-bc6c-7c76d2101028 0 libxul.so js::Interpret js/src/jsinterp.cpp:2760 1 libxul.so js::Invoke js/src/jsinterp.cpp:637 2 libxul.so js::ExternalInvoke js/src/jsinterp.cpp:855 3 libxul.so JS_CallFunctionValue js/src/jsinterp.h:955 4 libxul.so nsFrameMessageManager::ReceiveMessage content/base/src/nsFrameMessageManager.cpp:468 5 libxul.so nsAsyncMessageToChild::Run content/base/src/nsFrameLoader.cpp:1812 6 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:547 7 libxul.so NS_ProcessNextEvent_P nsThreadUtils.cpp:250 8 libxul.so mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:111 9 libxul.so MessageLoop::RunInternal ipc/chromium/src/base/message_loop.cc:220 10 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:512 11 libxul.so nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:187 12 libxul.so nsAppStartup::Run toolkit/components/startup/src/nsAppStartup.cpp:192 13 libxul.so XRE_main toolkit/xre/nsAppRunner.cpp:3684 14 libxul.so GeckoStart toolkit/xre/nsAndroidStartup.cpp:131 15 libc.so libc.so@0x11453 16 libc.so libc.so@0x10f3f http://crash-stats.mozilla.com/report/index/89f2c194-93ac-49f8-a565-f4e392101027 0 libxul.so js::TraceRecorder::record_NativeCallComplete js/src/jstracer.cpp:8093 1 libxul.so js::Interpret js/src/jstracer.h:981 2 libxul.so js::Invoke js/src/jsinterp.cpp:638 3 libxul.so js::ExternalInvoke js/src/jsinterp.cpp:856 4 libxul.so JS_CallFunctionValue js/src/jsinterp.h:956 5 libxul.so nsFrameMessageManager::ReceiveMessage content/base/src/nsFrameMessageManager.cpp:468 6 libxul.so nsAsyncMessageToChild::Run content/base/src/nsFrameLoader.cpp:1812 7 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:547 8 libxul.so NS_ProcessNextEvent_P nsThreadUtils.cpp:250 9 libxul.so mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:111 10 libxul.so MessageLoop::RunInternal ipc/chromium/src/base/message_loop.cc:220 11 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:512 12 libxul.so nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:187 13 libxul.so nsAppStartup::Run toolkit/components/startup/src/nsAppStartup.cpp:192 14 libxul.so XRE_main toolkit/xre/nsAppRunner.cpp:3684 15 libxul.so GeckoStart toolkit/xre/nsAndroidStartup.cpp:131 16 libc.so libc.so@0xfdd7 17 libc.so libc.so@0xf8a3 http://crash-stats.mozilla.com/report/index/bd647b32-12aa-4a75-9083-f5b5b2101028 0 libxul.so js::NewBuiltinClassInstance js/src/jsgc.h:629 1 libxul.so js_ConsumeJSONText js/src/json.cpp:848 2 libxul.so JS_ConsumeJSONText js/src/jsapi.cpp:5353 3 libxul.so nsFrameMessageManager::ReceiveMessage content/base/src/nsFrameMessageManager.cpp:392 4 libxul.so mozilla::dom::TabParent::ReceiveMessage jsapi.h:757 5 libxul.so mozilla::dom::TabParent::RecvAsyncMessage dom/ipc/TabParent.cpp:286 6 libxul.so mozilla::dom::PBrowserParent::OnMessageReceived PBrowserParent.cpp:573 7 libxul.so mozilla::dom::PContentParent::OnMessageReceived PContentParent.cpp:463 8 libxul.so mozilla::ipc::AsyncChannel::OnDispatchMessage ipc/glue/AsyncChannel.cpp:262 9 libxul.so mozilla::ipc::RPCChannel::OnMaybeDequeueOne ipc/glue/RPCChannel.cpp:440 10 libxul.so RunnableMethod<mozilla::ipc::RPCChannel, bool , Tuple0>::Run ipc/chromium/src/base/task.h:308 11 libxul.so mozilla::ipc::RPCChannel::DequeueTask::Run RPCChannel.h:474 12 libxul.so MessageLoop::RunTask ipc/chromium/src/base/message_loop.cc:344 13 libxul.so MessageLoop::DeferOrRunPendingTask ipc/chromium/src/base/message_loop.cc:354 14 libxul.so MessageLoop::DoWork ipc/chromium/src/base/message_loop.cc:451 15 libxul.so mozilla::ipc::DoWorkRunnable::Run ipc/glue/MessagePump.cpp:71 16 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:547 17 libxul.so NS_InvokeByIndex_P xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp:191 18 libxul.so XPCWrappedNative::CallMethod js/src/xpconnect/src/xpcwrappednative.cpp:3054 19 libxul.so XPC_WN_CallMethod js/src/xpconnect/src/xpcwrappednativejsops.cpp:1631 20 libxul.so js::Interpret js/src/jsinterp.cpp:4704 21 libxul.so js::Execute js/src/jsinterpinlines.h:493 22 libxul.so JS_ExecuteScript js/src/jsapi.cpp:4816 23 libxul.so mozJSComponentLoader::GlobalForLocation js/src/xpconnect/loader/mozJSComponentLoader.cpp:1240 24 libxul.so mozJSComponentLoader::ImportInto nsTHashtable.h:199 25 libxul.so mozJSComponentLoader::Import js/src/xpconnect/loader/mozJSComponentLoader.cpp:1375 26 libxul.so nsXPCComponents_Utils::Import js/src/xpconnect/src/xpccomponents.cpp:3779 27 libxul.so NS_InvokeByIndex_P xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp:191 28 libxul.so XPCWrappedNative::CallMethod js/src/xpconnect/src/xpcwrappednative.cpp:3054 29 libxul.so XPC_WN_CallMethod js/src/xpconnect/src/xpcwrappednativejsops.cpp:1631 30 libxul.so js::Interpret js/src/jsinterp.cpp:4704 31 libxul.so js::Invoke js/src/jsinterp.cpp:638 32 libxul.so js::ExternalGetOrSet js/src/jsinterp.cpp:856 33 libxul.so js_GetPropertyHelper js/src/jsobj.cpp:4856 34 libxul.so js::Interpret js/src/jsinterp.cpp:4114 35 libxul.so js::Invoke js/src/jsinterp.cpp:638 36 libxul.so js::ExternalInvoke js/src/jsinterp.cpp:856 37 libxul.so JS_CallFunctionValue js/src/jsinterp.h:956 38 libxul.so nsXPCWrappedJSClass::CallMethod js/src/xpconnect/src/xpcwrappedjsclass.cpp:1696 39 libxul.so nsXPCWrappedJS::CallMethod js/src/xpconnect/src/xpcwrappedjs.cpp:572 40 libxul.so PrepareAndDispatch xpcom/reflect/xptcall/src/md/unix/xptcstubs_arm.cpp:134 41 libxul.so libxul.so@0x8fba3c 42 libxul.so nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:429 Further crashes: http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2010-10-28%2008%3A00%3A00&signature=js%3A%3AInterpret&version=Fennec%3A4.0b2pre http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2010-10-28%2008%3A00%3A00&signature=js%3A%3ATraceRecorder%3A%3Arecord_NativeCallComplete&version=Fennec%3A4.0b2pre http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2010-10-28%2007%3A00%3A00&signature=js%3A%3ANewBuiltinClassInstance&version=Fennec%3A4.0b2pre
Nothing has changed in message manager for some time. Is it possible that something in JS engine has regressed?
Try cc'ing more JS people. Has anyone reviewed the JS API usage in the message manager code? Such code in our experience needs guru-level code review... /be
JS exception handling in message manager needs still work, I believe. And yes, it would be great to get someone very familiar with JS API to look at the code. But if the crashes are something new, perhaps there is some real JS engine bug, or some change how to use API.
The combined signatures of js::Interpret, js::TraceRecord::record_NativeCallComplete and js_LookupPropertyWithFlags make this one of the higher-volume crashes for Fennec 4.0b2, and we continue to see these stacks in 4.0b3pre. Seeing as this is also one of the few actionable stacks we possess in the top crashes, it would be wonderful for a knowledgeable JSAPI person to peruse the message manager's usage and make sure it's kosher.
Summary: Crashes based in nsFrameMessageManager::ReceiveMessage → Crashes based in nsFrameMessageManager::ReceiveMessage [@ js::Interpret] [@ js::TraceRecorder::record_NativeCallComplete] [@ js_LookupPropertyWithFlags]
js::Interpret crashes are the #7 topcrash for 4.0b3 at the moment. It's interesting to note that every stack I look at has this signature: 0 libxul.so js::Interpret js/src/jsinterp.cpp:2824 1 libxul.so js::Invoke js/src/jsinterp.cpp:657 2 libxul.so js::ExternalInvoke js/src/jsinterp.cpp:858 3 libxul.so JS_CallFunctionValue js/src/jsinterp.h:962 4 libxul.so nsFrameMessageManager::ReceiveMessage content/base/src/nsFrameMessageManager.cpp:459 5 libxul.so nsAsyncMessageToChild::Run content/base/src/nsFrameLoader.cpp:1831 So it appears to be limited to situations in which we're using the message manager in non-remote situations, with uptimes varying from 0 to 1329.
Keywords: topcrash
tracking-fennec: --- → ?
blocking2.0: --- → ?
tracking-fennec: ? → 2.0+
blocking2.0: ? → -
Whiteboard: fennec-related-jscript-crashers
tracking-fennec: 2.0+ → 2.0-
It is #13 top crasher in Fennec 4.0. In Fennec, stack traces now look like: 0 libxul.so js::Interpret js/src/jscompartment.h:553 1 libxul.so js::Invoke js/src/jsinterp.cpp:653 2 libxul.so js::ExternalInvoke js/src/jsinterp.cpp:863 3 libxul.so JS_CallFunctionValue js/src/jsapi.cpp:5173 4 libxul.so nsFrameMessageManager::ReceiveMessage content/base/src/nsFrameMessageManager.cpp:463 5 libxul.so mozilla::dom::TabChild::RecvAsyncMessage dom/ipc/TabChild.cpp:761 6 libxul.so mozilla::dom::PBrowserChild::OnMessageReceived PBrowserChild.cpp:884 7 libxul.so mozilla::dom::PContentChild::OnMessageReceived PContentChild.cpp:1133 8 libxul.so mozilla::ipc::AsyncChannel::OnDispatchMessage ipc/glue/AsyncChannel.cpp:262 9 libxul.so mozilla::ipc::RPCChannel::OnMaybeDequeueOne ipc/glue/RPCChannel.cpp:435 10 libxul.so RunnableMethod<mozilla::ipc::RPCChannel, bool , Tuple0>::Run ipc/chromium/src/base/task.h:308 11 libxul.so mozilla::ipc::RPCChannel::DequeueTask::Run RPCChannel.h:475
OS: Mac OS X → All
Hardware: x86 → All
Version: unspecified → 2.0 Branch
Running my patch queue (which makes some SpecialPowers-related changes) on Linux opt tryserver builds, I am hitting what I think might be this bug seemingly consistently, right at the end of the test run at about the time I would expect things to be winding up for a normal process exit. http://tbpl.mozilla.org/?tree=Try&pusher=cmccormack@mozilla.com&rev=e0065ad9ad00 http://tinderbox.mozilla.org/showlog.cgi?log=Try/1307326012.1307326810.8802.gz
Cameron, can you reproduce the crash easily? If so, what are the steps to reproduce?
I have been able to reproduce the crash with the following method: 1) perform each build step at http://tinderbox.mozilla.org/showlog.cgi?log=Try/1307324450.1307328061.12699.gz&fulltext=1 2) simply running the failing test command |python mochitest/runtests.py --appname=firefox/firefox-bin --utility-path=bin --extra-profile-file=bin/plugins --certificate-path=certs --autorun --close-when-done --console-level=INFO --symbols-path=symbols --setpref=dom.ipc.plugins.enabled=false --ipcplugins| makes the crash reproduce for me. 3) running --debugger=gdb makes the crash non-reproducible, of course 4) attaching gdb to the process near the end of the mochitest run makes it reproducible Now I just need to figure out how to load the symbols.
Update: #5 top crasher in Fennec 5 Betas
In case it's useful, here's another stack trace, this time on a Linux x86 debug build, and this time getting a bit further and hitting an assertion: http://tinderbox.mozilla.org/showlog.cgi?log=Try/1307779731.1307783410.2819.gz#err0 Assertion failure: !regs_, at ../../../js/src/vm/Stack.cpp:376 0 linux-gate.so + 0x424 1 libxul.so!JS_Assert [jsutil.cpp : 89 + 0xb] 2 libxul.so!js::ContextStack::~ContextStack [Stack.cpp : 376 + 0x28] 3 libxul.so!JSContext::~JSContext [jscntxt.cpp : 1466 + 0x2d] 4 libxul.so!js::Foreground::delete_<JSContext> [jsutil.h : 493 + 0x10] 5 libxul.so!js_DestroyContext [jscntxt.cpp : 675 + 0xa] 6 libxul.so!JS_DestroyContext [jsapi.cpp : 1031 + 0x12] 7 libxul.so!XPCCallContext::~XPCCallContext [xpccallcontext.cpp : 402 + 0xd] 8 libxul.so!XPC_WN_CallMethod [xpcwrappednativejsops.cpp : 1597 + 0xd] 9 libxul.so!js::CallJSNative [jscntxtinlines.h : 277 + 0x18] 10 libxul.so!js::Interpret [jsinterp.cpp : 4676 + 0x2b] 11 libxul.so!js::RunScript [jsinterp.cpp : 613 + 0x19] 12 libxul.so!js::Invoke [jsinterp.cpp : 694 + 0x18] 13 libxul.so!js::ExternalInvoke [jsinterp.cpp : 816 + 0x19] 14 libxul.so!js::JSProxyHandler::call [jsproxy.cpp : 273 + 0x27] 15 libxul.so!JSWrapper::call [jswrapper.cpp : 250 + 0x69] 16 libxul.so!JSCrossCompartmentWrapper::call [jswrapper.cpp : 652 + 0x26] 17 libxul.so!js::JSProxy::call [jsproxy.cpp : 839 + 0x32] 18 libxul.so!js::proxy_Call [jsproxy.cpp : 1104 + 0x1f] 19 libxul.so!js::CallJSNative [jscntxtinlines.h : 277 + 0x18] 20 libxul.so!js::Invoke [jsinterp.cpp : 649 + 0x34] 21 libxul.so!js::Interpret [jsinterp.cpp : 4687 + 0x3e] 22 libxul.so!js::RunScript [jsinterp.cpp : 613 + 0x19] 23 libxul.so!js::Invoke [jsinterp.cpp : 694 + 0x18] 24 libxul.so!js::ExternalInvoke [jsinterp.cpp : 816 + 0x19] 25 libxul.so!JS_CallFunctionValue [jsapi.cpp : 5087 + 0x3c] 26 libxul.so!nsFrameMessageManager::ReceiveMessage [nsFrameMessageManager.cpp : 452 + 0x49] 27 libxul.so!nsAsyncMessageToChild::Run [nsFrameLoader.cpp : 1907 + 0x53] 28 libxul.so!nsThread::ProcessNextEvent [nsThread.cpp : 618 + 0x16] 29 libxul.so!NS_ProcessNextEvent_P [nsThreadUtils.cpp : 245 + 0x1f] 30 libxul.so!mozilla::ipc::MessagePump::Run [MessagePump.cpp : 110 + 0x15] 31 libxul.so!MessageLoop::RunInternal [message_loop.cc : 218 + 0x20] 32 libxul.so!MessageLoop::RunHandler [message_loop.cc : 202 + 0xa] 33 libxul.so!MessageLoop::Run [message_loop.cc : 176 + 0xa] 34 libxul.so!nsBaseAppShell::Run [nsBaseAppShell.cpp : 189 + 0xc] 35 libxul.so!nsAppStartup::Run [nsAppStartup.cpp : 222 + 0x19] ...
The assertion indicates that code is running on the stack which is part of the context being destroyed. Since this all involves nsFrameMessageManager, my suspicion would be that nsFrameMessageManager is passing a 'cx' to JS_CallFunctionValue (frame 25) that gets used to do all the calls between frame 25 and the XPC call in frame 8. Then, some content code (called by frame 8) calls ReleaseJSContext (there's only a few call-sites: http://mxr.mozilla.org/mozilla-central/ident?i=releaseJSContext) passing this same context used by frame 8 through 25, hence the assert. Thus, I would look for data-flow paths whereby the 'cx' passed to frame 25 makes its way to ReleaseJSContext before control returns to frame 26. I hope that is helpful.
Thanks Luke. The patches I have applied (both for comment 7 and comment 11), modify the mochitest framework so that at the end of each test, inside TestRunner.testFinished, some extra communication is done with the chrome process: TestRunner.testFinished: * send an async "ping" message from content to chrome * wait for the async "pong" message to be received from chrome * inside the message listener for that "pong" message, do * send a couple of sync messages to chrome asking it to delete some crash dump files * call TestRunner.runNextTest() At the time TestRunner.testFinished is called by each individual test, the chrome process might have got a plugin crash notification, which I get SpecialPowersObserver.js to send on with an async message to the content's SpecialPowers.js object. The sending of the async ping/pong messages at the top of TestRunner.testFinished is done so that any crash notification message can be processed by the test harness before it moves on to the next test. The behaviour of TestRunner.runNextTest when called after the last test has run is to immediately call nsIAppStartup.quit(eForceQuit) (when not in e10s mode). I don't really know how shutdown works, but I am thinking that it must be within that quit() call that things are torn down while the message listener code is still on the stack. If I wrap the contents of my "pong" listener inside a setTimeout(..., 0), so that the quit() is done without the message listener on the stack, then I don't get the assertion and crash.
OK, I'll try to reproduce the bug and if I can, I'll take this.
(In reply to comment #9) > 1) perform each build step at > http://tinderbox.mozilla.org/showlog.cgi?log=Try/1307324450.1307328061.12699. > gz&fulltext=1 Not sure what "each build step" > 2) simply running the failing test command |python mochitest/runtests.py > --appname=firefox/firefox-bin --utility-path=bin > --extra-profile-file=bin/plugins --certificate-path=certs --autorun > --close-when-done --console-level=INFO --symbols-path=symbols > --setpref=dom.ipc.plugins.enabled=false --ipcplugins| makes the crash > reproduce for me. Where should I run this? I tried few obvious ones, and it didn't work
And python runtests.py --autorun --close-when-done --setpref=dom.ipc.plugins.enabled=false --ipcplugins didn't crash.
Attached patch WIP (obsolete) (deleted) — Splinter Review
Something like this might help, but since I don't know how to reproduce this, I can't verify.
(In reply to comment #15) > (In reply to comment #9) > > 1) perform each build step at > > http://tinderbox.mozilla.org/showlog.cgi?log=Try/1307324450.1307328061.12699. > > gz&fulltext=1 > Not sure what "each build step" From the log: ======== BuildStep started ======== download === Output === wget --progress=dot:mega -N http://stage.mozilla.org/pub/mozilla.org/firefox/try-builds/cmccormack@mozilla.com-e0065ad9ad00/try-linux-debug/firefox-7.0a1.en-US.linux-i686.tar.bz2 in dir /home/cltbld/talos-slave/test/build (timeout 1200 secs) I still haven't been able to reproduce in my local build; only by downloading the try build and running the relevant steps. Of course, this is a linux crash, so we can't use the debug symbols provided since they only work on windows.
Olli, trying with your patch gives me this assertion not long after startup of a `make mochitest-ipcplugins` with my patch queue also applied. ###!!! ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file ../../../dist/include/nsCOMPtr.h, line 544 nsCOMPtr<nsIInProcessContentFrameMessageManager>::Assert_NoQueryNeeded() (/home/cam/mozilla/obj-linux64-dbg-gcc4.5/content/base/src/../../../dist/include/nsCOMPtr.h:543) nsCOMPtr<nsIInProcessContentFrameMessageManager>::operator=(nsIInProcessContentFrameMessageManager*) (/home/cam/mozilla/obj-linux64-dbg-gcc4.5/content/base/src/../../../dist/include/nsCOMPtr.h:665) nsFrameLoader::EnsureMessageManager() (/home/cam/mozilla/content/base/src/nsFrameLoader.cpp:2052) nsFrameLoader::MaybeCreateDocShell() (/home/cam/mozilla/content/base/src/nsFrameLoader.cpp:1513) nsFrameLoader::CheckURILoad(nsIURI*) (/home/cam/mozilla/content/base/src/nsFrameLoader.cpp:526) nsFrameLoader::LoadURI(nsIURI*) (/home/cam/mozilla/content/base/src/nsFrameLoader.cpp:414) ... Just to clarify, from the description in comment 13, do you think my use of the message manager is correct or not (i.e., should I not be calling nsIAppStartup.quit() inside a message listener)? BTW, STR should be to apply the following patches on a Linux64 gcc4.5 debug build, and then to do a `make mochitest-ipcplugins`: http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/bug-642175-mochitest-cleanup http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/bug-642175-plugin-crash-cleanup-mochitests http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/bug-642175-plugin-crash-cleanup-mochitest-fixes
(In reply to comment #19) > Olli, trying with your patch gives me this assertion not long after startup > of a `make mochitest-ipcplugins` with my patch queue also applied. Bah. Though the assertion may not be that bad, or easily fixable? I assume the crash is still there? > (i.e., should I not be calling > nsIAppStartup.quit() inside a message listener)? That sounds pretty evil. I wonder if we can really handle that... > BTW, STR should be to apply the following patches on a Linux64 gcc4.5 debug > build, and then to do a `make mochitest-ipcplugins`: In top level objdir? > > http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/ > bug-642175-mochitest-cleanup > http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/ > bug-642175-plugin-crash-cleanup-mochitests > http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/ > bug-642175-plugin-crash-cleanup-mochitest-fixes
(In reply to comment #20) > > (i.e., should I not be calling > > nsIAppStartup.quit() inside a message listener)? > That sounds pretty evil. I wonder if we can really handle that... I guess I need to debug when the cx is being deleted during quit().
(In reply to comment #20) > > Olli, trying with your patch gives me this assertion not long after startup > > of a `make mochitest-ipcplugins` with my patch queue also applied. > Bah. Though the assertion may not be that bad, or easily fixable? > I assume the crash is still there? I am too much of an XPCOM noob to know how to fix it, unfortunately. The assertion happens early on and then crashes, so it doesn't get to the crash we care about. > > (i.e., should I not be calling > > nsIAppStartup.quit() inside a message listener)? > That sounds pretty evil. I wonder if we can really handle that... I am happy with my workaround of dispatching a runnable to do the quit() if it's decided that calling it inside the listener is evil and shouldn't be supported. (Though this doesn't help with the crashes for which this bug was originally filed.) > > BTW, STR should be to apply the following patches on a Linux64 gcc4.5 debug > > build, and then to do a `make mochitest-ipcplugins`: > In top level objdir? Yep.
With regards to Luke's hint, my money is on the frameloader being destroyed by content JS, which in turn calls Disconnect on the child message manager. This posts an event which calls nsFrameScriptExecutor::DestroyCx, which in turn can call ReleaseJSContext. It looks like there's machinery in nsInProcessTabChildGlobal to prevent this in the case of LoadFrameScript (see mLoadingScript) - could we extend this to work with receiving async messages as well?
(In reply to comment #19) > BTW, STR should be to apply the following patches on a Linux64 gcc4.5 debug > build, and then to do a `make mochitest-ipcplugins`: > > http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/ > bug-642175-mochitest-cleanup > http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/ > bug-642175-plugin-crash-cleanup-mochitests > http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/ > bug-642175-plugin-crash-cleanup-mochitest-fixes With the WIP patch and those patches make mochitest-ipcplugins to crash in ###!!! ABORT: file /home/smaug/mozilla/hg/mozilla/ipc/chromium/src/base/pickle.cc, line 98 And in fact, I get that crash even without WIP. So, I still don't know how to reproduce this :(
(In reply to comment #23) > With regards to Luke's hint, my money is on the frameloader being destroyed > by content JS, which in turn calls Disconnect on the child message manager. > This posts an event which calls nsFrameScriptExecutor::DestroyCx, which in > turn can call ReleaseJSContext. It looks like there's machinery in > nsInProcessTabChildGlobal to prevent this in the case of LoadFrameScript > (see mLoadingScript) - could we extend this to work with receiving async > messages as well? Did you look at the WIP? It is delaying DestroyCx
Cameron, I couldn't reproduce the assertion you saw with Olli's patch. Granted, I'm on a Mac, but for the output you pasted, that shouldn't make a difference. Are you sure that you did a thorough rebuild?
You're right, I must not have. I did a clean build and don't get that assertion now. Now I still get a crash on shutdown, but slightly earlier I think: #0 0x00002aaaabddf1b2 in nsFrameMessageManager::ReceiveMessage (this=0x5a5a5a5a5a5a5a5a, aTarget=0x2aaac34313d0, aMessage=..., aSync=0, aJSON=..., aObjectsArray=0x2aaacd0554c8, aJSONRetVal=0x0, aContext=0x5a5a5a5a5a5a5a5a) at /home/cam/mozilla/content/base/src/nsFrameMessageManager.cpp:337 #1 0x00002aaaabddfd0d in nsFrameMessageManager::ReceiveMessage (this=0x2aaac34811d0, aTarget=0x2aaac34313d0, aMessage=..., aSync=0, aJSON=..., aObjectsArray=0x2aaacd0554c8, aJSONRetVal=0x0, aContext=0x0) at /home/cam/mozilla/content/base/src/nsFrameMessageManager.cpp:467 #2 0x00002aaaabd54261 in nsAsyncMessageToChild::Run (this=0x2aaac656b580) at /home/cam/mozilla/content/base/src/nsFrameLoader.cpp:1908 #3 0x00002aaaacdd327c in nsThread::ProcessNextEvent (this=0x2aaab742dc60, mayWait=0, result=0x7fffffffd38c) at /home/cam/mozilla/xpcom/threads/nsThread.cpp:618 ...
That is interesting stack. "this=0x5a5a5a5a5a5a5a5a" /me thinks..
Attached patch WIP2 (deleted) — Splinter Review
Keep message managers alive long enough. Would be great if someone could test this.
Attachment #538746 - Attachment is obsolete: true
That seems to work for me! I just pushed this to try to make sure: http://tbpl.mozilla.org/?tree=Try&rev=683c7ee50b20
Comment on attachment 538855 [details] [diff] [review] WIP2 So there are two parts here. One is the keep message manager alive when calling ReceiveMessage, the other is to make sure cx isn't deleted when calling ReceiveMessage on tabchilds. (In the chrome message manager the cx from chrome XUL is reused.)
Attachment #538855 - Flags: review?(jst)
Crash Signature: [@ js::Interpret] [@ js::TraceRecorder::record_NativeCallComplete] [@ js_LookupPropertyWithFlags]
Johnny, do you think you'll be able to get to this review soon? I'm really hopeful that it'll squash one of the topcrashers for Fennec, and it deserves some baking time before we try to get it into Aurora.
Crash Signature: [@ js::Interpret] [@ js::TraceRecorder::record_NativeCallComplete] [@ js_LookupPropertyWithFlags] → [@ js::Interpret] [@ js::TraceRecorder::record_NativeCallComplete] [@ js_LookupPropertyWithFlags]
Attachment #538855 - Flags: review?(jst) → review+
http://hg.mozilla.org/mozilla-central/rev/97bf954c6970 Marking fixed, but please reopen or file followups if this doesn't help with the Fennec problems.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment on attachment 538855 [details] [diff] [review] WIP2 I have not seen any new crashes with stacks that look like this patch could be a culprit, which is a long way of saying that this patch seems to be, if not helping, at least not harming. It also has the potential to fix a top crasher for us, so I'd like to get it into aurora asap.
Attachment #538855 - Flags: approval-mozilla-aurora?
I found a few crashes on tinderbox this morning which seem to be the one I was having problems with in bug 642175 (quitting the browser from inside a message listener) now that that has landed, but it looks intermittent now. I'm going to put back in my workaround in bug 666060. I don't know if that means anything for the Fennec-related crashes.
Do you have links to those crashes? Could you file a followup bug, and post the links there?
Oh, sorry, I think I misread the logs. It's not a crash in the message manager at all. It is just a real, unexpected crash in a test. Sorry for the noise. (http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1308677954.1308682390.5623.gz&fulltext=1#err7 was one of the logs, btw.)
Assignee: general → Olli.Pettay
(In reply to comment #34) > Comment on attachment 538855 [details] [diff] [review] [review] > WIP2 > > I have not seen any new crashes with stacks that look like this patch could > be a culprit, which is a long way of saying that this patch seems to be, if > not helping, at least not harming. It also has the potential to fix a top > crasher for us, so I'd like to get it into aurora asap. How much of this patch impacts desktop/single-process firefox?
The patch should be pretty safe. But the code is run in single-process Fx, at least with InstallTrigger etc. which is implemented as Messagemanager script.
Attachment #538855 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: