Closed
Bug 1049275
Opened 10 years ago
Closed 10 years ago
Frequent hangs with e10s enabled
Categories
(Core :: IPC, defect)
Tracking
()
RESOLVED
FIXED
mozilla35
Tracking | Status | |
---|---|---|
e10s | m2+ | --- |
People
(Reporter: blassey, Assigned: billm)
References
Details
Attachments
(2 files)
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
the stack implicates a deadlock with CPOWs
+2286 js::jit::DoGetPropFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICGetProp_Fallback*, JS::MutableHan
dle<JS::Value>, JS::MutableHandle<JS::Value>) (in XUL) + 712 [0x104350c18]
+ 2286 js::proxy_GetGeneric(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::Mutab
leHandle<JS::Value>) (in XUL) + 9 [0x10458ec79]
+ 2286 js::Proxy::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHa
ndle<JS::Value>) (in XUL) + 446 [0x10458d58e]
+ 2286 js::CrossCompartmentWrapper::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<
jsid>, JS::MutableHandle<JS::Value>) const (in XUL) + 356 [0x1045aa9e4]
+ 2286 js::proxy_GetGeneric(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS:
:MutableHandle<JS::Value>) (in XUL) + 9 [0x10458ec79]
+ 2286 js::Proxy::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::Mut
ableHandle<JS::Value>) (in XUL) + 446 [0x10458d58e]
+ 2286 CPOWProxyHandler::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>
, JS::MutableHandle<JS::Value>) const (in XUL) + 93 [0x1021c4b4d]
+ 2286 mozilla::jsipc::WrapperOwner::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS
::Handle<jsid>, JS::MutableHandle<JS::Value>) (in XUL) + 248 [0x1021c4c58]
+ 2286 mozilla::jsipc::JavaScriptBase<mozilla::jsipc::PJavaScriptParent>::CallGet(unsigned long lon
g const&, mozilla::jsipc::ObjectVariant const&, nsString const&, mozilla::jsipc::ReturnStatus*, mozilla::jsipc::JSVariant*) (in XUL) + 13 [0x1021c884d]
+ 2286 mozilla::jsipc::PJavaScriptParent::CallGet(unsigned long long const&, mozilla::jsipc::Obje
ctVariant const&, nsString const&, mozilla::jsipc::ReturnStatus*, mozilla::jsipc::JSVariant*) (in XUL) + 608 [0x1020ae110]
+ 2286 mozilla::ipc::MessageChannel::Call(IPC::Message*, IPC::Message*) (in XUL) + 21 [0x101ef9375]
+ 2286 mozilla::ipc::MessageChannel::UrgentCall(IPC::Message*, IPC::Message*) (in XUL) + 262 [0x101ef8e96]
+ 2286 mozilla::ipc::MessageChannel::SendAndWait(IPC::Message*, IPC::Message*) (in XUL) + 200 [0x101ef8cd8]
+ 2286 mozilla::ipc::MessageChannel::WaitForSyncNotify() (in XUL) + 69 [0x101ef8ff5]
+ 2286 PR_WaitCondVar (in libnss3.dylib) + 105 [0x101904219]
+ 2286 _pthread_cond_wait (in libsystem_pthread.dylib) + 727 [0x7fff982d0c3b]
+ 2286 __psynch_cvwait (in libsystem_kernel.dylib) + 10 [0x7fff8f9f6716]
Assignee | ||
Comment 1•10 years ago
|
||
Next time it happens can you also get a stack for the child process?
Assignee | ||
Comment 2•10 years ago
|
||
Also, is this a recent thing or has it always happened? Can you try to see if a particular add-on is triggering it?
Reporter | ||
Comment 3•10 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #2)
> Also, is this a recent thing or has it always happened? Can you try to see
> if a particular add-on is triggering it?
I've been noticing 3 or 4 hangs a day for the last few days.
(In reply to Bill McCloskey (:billm) from comment #1)
> Next time it happens can you also get a stack for the child process?
will do
Reporter | ||
Comment 4•10 years ago
|
||
sampled both this time. The parent looks roughly the same (same last few frames anyway). Here's what the Child process looks like:
2439 nsListControlFrame::Reflow(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned int&) (in XUL) + 399 [0x101b7d47f]
2439 nsListControlFrame::ReflowAsDropdown(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned int&) (in XUL) + 654 [0x101b7db2e]
2439 nsComboboxControlFrame::GetAvailableDropdownSpace(int*, int*, nsPoint*) (in XUL) + 124 [0x101b73fbc]
2439 nsDeviceContext::GetClientRect(nsRect&) (in XUL) + 37 [0x1009bcd75]
2439 nsDeviceContext::ComputeClientRectUsingScreen(nsRect*) (in XUL) + 102 [0x1009bcde6]
2439 nsScreenManagerProxy::ScreenForNativeWidget(void*, nsIScreen**) (in XUL) + 122 [0x1014df5ca]
2439 mozilla::dom::PScreenManagerChild::SendScreenForBrowser(mozilla::dom::PBrowserChild*, mozilla::dom::ScreenDetails*, bool*) (in XUL) + 483 [0x1005efb23]
2439 mozilla::ipc::MessageChannel::Send(IPC::Message*, IPC::Message*) (in XUL) + 197 [0x1003910e5]
2439 mozilla::ipc::MessageChannel::DebugAbort(char const*, int, char const*, char const*, bool) const (in XUL) + 813 [0x10038ff5d]
2439 NS_DebugBreak (in XUL) + 1180 [0x1000719fc]
2439 CrashReporter::AppendAppNotesToCrashReport(nsACString_internal const&) (in XUL) + 315 [0x10205fb6b]
2439 mozilla::dom::PCrashReporterChild::SendAppendAppNotes(nsCString const&) (in XUL) + 441 [0x10048b179]
2439 mozilla::ipc::MessageChannel::Send(IPC::Message*) (in XUL) + 84 [0x100390b74]
2439 PR_Lock (in libnss3.dylib) + 14 [0x107d44f4e]
2439 _pthread_mutex_lock (in libsystem_pthread.dylib) + 372 [0x7fff982d1779]
2439 __psynch_mutexwait (in libsystem_kernel.dylib) + 10 [0x7fff8f9f6746]
Reporter | ||
Comment 5•10 years ago
|
||
So, the child process has crashed and is trying to send crash data to the parent, but the parent is issuing an urgent message. From an irc conversation with billm, the child process needs to "win" in this situation.
Reporter | ||
Updated•10 years ago
|
Blocks: old-e10s-m2
Assignee | ||
Comment 6•10 years ago
|
||
I was able to reproduce this while debugging something else. It turns out that we're not waiting on an IPC message; we're just waiting for a lock. It happens like this:
take a lock in the IPC code
do something that causes the IPC code to assert
try to send crash reporter data to the parent, invoking IPC
try to take the same IPC lock, which is already locked
I'm still unsure how to address this. It looks like we don't send any critical data over the IPC channel--just some textual annotations. What's even stranger is that these annotations look like they get attached to the parent process's crashes. I might be wrong about that though--I'll need to look more into how child process crash dumps are generated.
Assignee | ||
Comment 9•10 years ago
|
||
Please see the explanation in comment 6. This patch just turns off the crash annotations when we abort. I was wrong about these annotations getting attached to the parent--they are attached to the child minidump. However, they don't seem critical. And right now we're not getting any crash reports because of this, so we really need to fix it.
Attachment #8488331 -
Flags: review?(benjamin)
Comment 10•10 years ago
|
||
The other comments here are way above my head.
I found my way here by searching the E10S bugs for the string 'hang' in the title.
I performed that search because that's what was happening. It seemed to happen when I'd open a link or bookmark in a new tab. I've disabled E10S in the meantime.
If I can gather more information, please tell me how. What tool do you use to generate those stack traces?
I am a Java developer, so I know how to produce a stack trace or a thread dump--in my world, but not this one. :)
Comment 11•10 years ago
|
||
Pardon me: re: Comment 10: Whoops, this bug is for OSX. I am on Windows 7, 64bit.
Assignee | ||
Comment 12•10 years ago
|
||
This might be the cause of your crash, but we won't know until the bug is fixed.
OS: Mac OS X → All
Assignee | ||
Comment 13•10 years ago
|
||
er, your hangs, sorry.
Updated•10 years ago
|
Attachment #8488331 -
Flags: review?(benjamin) → review+
Comment 14•10 years ago
|
||
Bill: is your r+'d avoid-ipc-hang patch ready to land? We're trying to close out our remaining M2 bugs today.
Flags: needinfo?(wmccloskey)
Assignee | ||
Comment 15•10 years ago
|
||
Flags: needinfo?(wmccloskey)
Comment 16•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Comment 17•10 years ago
|
||
Small followup. I just compiled from changeset 205234:bf5fcc0c4b27 and the hang is fixed. I now get a tab crash page.
After the crash, the browser is quite slow reloading the other tabs. It probably took a minute to get back to normal. I have no idea if that's normal.
Assignee | ||
Comment 18•10 years ago
|
||
(In reply to Jack Moffitt [:jack] from comment #17)
> Small followup. I just compiled from changeset 205234:bf5fcc0c4b27 and the
> hang is fixed. I now get a tab crash page.
>
> After the crash, the browser is quite slow reloading the other tabs. It
> probably took a minute to get back to normal. I have no idea if that's
> normal.
Usually session restore reloads tabs on demand. But the tab crashed page actually loads them all right away. Maybe that's what you're seeing.
Comment 19•10 years ago
|
||
(In reply to Jack Moffitt [:jack] from comment #17)
> After the crash, the browser is quite slow reloading the other tabs. It
> probably took a minute to get back to normal. I have no idea if that's
> normal.
Jack, you are probably seeing session restore bug 1037179.
You need to log in
before you can comment on or make changes to this bug.
Description
•