Closed
Bug 1126089
Opened 10 years ago
Closed 9 years ago
Investigate allowing messages to be sent from TabChildGlobal's unload event
Categories
(Core :: DOM: Content Processes, defect)
Tracking
()
RESOLVED
FIXED
mozilla39
People
(Reporter: billm, Assigned: billm)
References
Details
Attachments
(5 files, 4 obsolete files)
(deleted),
patch
|
smaug
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
smaug
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
smaug
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
smaug
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
ttaubert
:
review+
|
Details | Diff | Splinter Review |
Right now you're not able to send messages from an "unload" event handler for a TabChildGlobal. The message is never received because the message manager for the TabParent has already been disconnected. However, for bug 1109875 it would be really useful to be able to send messages here. I think me might be able to change TabChild::RecvDestroy to send a reply to the parent after the unload event is dispatched. The parent could then disconnect the message manager at that time.
Comment 1•9 years ago
|
||
This blocks 1109875, which is an m8. I figure this should probably be an m8 then.
tracking-e10s:
--- → m8+
Assignee | ||
Comment 2•9 years ago
|
||
Here are some of the changes I made: When we remove a <browser> element from the DOM, we immediately prohibit sending messages using its message manager. We fire the message-manager-close event at this time. When the child process receives the Destroy message, it fires the unload event on its frame script. Then we destroy the docshell, which fires some more events on the document. After that, we send the __delete__ message to the parent. We're not allowed to send messages after that. When the parent receives the __delete__ message, it shuts down the message manager and notifies message-manager-disconnect observers. While the parent is waiting for the __delete__ message, it can dispatch incoming message manager messages. It needs to keep a strong reference to the frameloader and <browser> element during this time to ensure that they're not destroyed. I've tried to stick to the terminology that "Close" means "sending is no longer allowed" and "Disconnect" means "everything is being torn down, the object is no longer usable." On interesting question is the order of the frame script unload event versus the final pagehide/unload events on the document. It seems a bit strange that we fire unload on the frame script first. However, the only alternative is to fire it after the docshell is torn down. We have a number of unload handlers that would like to use the docshell, so that seems too late.
Assignee | ||
Comment 3•9 years ago
|
||
This patch renames our uses of message-manager-disconnect to message-manager-close, since that seems to be what people want. I'll hold off on getting review for this until Olli approves the naming.
Assignee | ||
Comment 4•9 years ago
|
||
This patch changes the browser-chrome test harness so that it waits until all tabs have been completely unloaded before running the memory leak detection code.
Assignee | ||
Updated•9 years ago
|
Comment 6•9 years ago
|
||
So there is the somewhat unusual case where <browser> is removed and then added back to document. That would lead to a new nsFrameLoader to be created for the <browser>. So there would be <browser> -> fl2, and fl1-><browser> and fl2-><browser> edges for awhile. But I think that is ok.
Comment 7•9 years ago
|
||
Comment on attachment 8570241 [details] [diff] [review] unload-order ># HG changeset patch ># Parent 600f44fd317c145b4f583b7db9358615fd52c70b ># User Bill McCloskey <wmccloskey@mozilla.com> > >diff --git a/dom/base/nsFrameLoader.cpp b/dom/base/nsFrameLoader.cpp >--- a/dom/base/nsFrameLoader.cpp >+++ b/dom/base/nsFrameLoader.cpp >@@ -519,23 +519,84 @@ nsFrameLoader::GetDocShell(nsIDocShell * > > *aDocShell = mDocShell; > NS_IF_ADDREF(*aDocShell); > > return rv; > } > > void >+nsFrameLoader::DestroyComplete() >+{ >+ // We can get here by removing the frameloader from the DOM (in which case >+ // nsFrameLoader::Finalize was already called) or if the child process >+ // crashes. >+ >+ if (mRemoteBrowser) { >+ // Drop the strong references created in Finalize. >+ mOwnerContentStrong = nullptr; >+ mRemoteBrowser->SetFrameLoader(nullptr); >+ >+ // Call TabParent::Destroy if we haven't already (in case of a crash). >+ mRemoteBrowser->SetOwnerElement(nullptr); >+ mRemoteBrowser->Destroy(); >+ mRemoteBrowser = nullptr; >+ } >+ >+ if (mMessageManager) { >+ mMessageManager->Disconnect(); >+ } >+ >+ if (mChildMessageManager) { >+ static_cast<nsInProcessTabChildGlobal*>(mChildMessageManager.get())->Disconnect(); >+ } >+} >+ >+void >+nsFrameLoader::MidDestroy() >+{ >+ // We go through the event loop twice. The first time is so that runnables >+ // dispatched when a frame script unloads have a chance to run. The second >+ // time is so that any messages sent by those runnables have a chance to be >+ // dispatched through the event loop. >+ NS_DispatchToCurrentThread(NS_NewRunnableMethod(this, &nsFrameLoader::DestroyComplete)); >+} >+ >+void > nsFrameLoader::Finalize() > { >+ // This code runs after the frameloader has been detached from the <browser> >+ // element. >+ >+ // Ask the TabChild to fire the frame script "unload" event, destroy its >+ // docshell, and finally destroy the PBrowser actor. This eventually leads to >+ // nsFrameLoader::DestroyComplete being called. >+ if (mRemoteBrowser) { >+ mRemoteBrowser->Destroy(); >+ } >+ >+ // Fire the "unload" event if we're in-process. >+ if (mChildMessageManager) { >+ static_cast<nsInProcessTabChildGlobal*>(mChildMessageManager.get())->FireUnloadEvent(); >+ } >+ >+ // Destroy the docshell. > nsCOMPtr<nsIBaseWindow> base_win(do_QueryInterface(mDocShell)); > if (base_win) { > base_win->Destroy(); > } > mDocShell = nullptr; >+ >+ if (mChildMessageManager) { >+ // Stop handling events in the in-process frame script. >+ static_cast<nsInProcessTabChildGlobal*>(mChildMessageManager.get())->DisconnectEvents(); >+ >+ // Dispatch a runnable that eventually calls nsFrameLoader::DestroyComplete. >+ NS_DispatchToCurrentThread(NS_NewRunnableMethod(this, &nsFrameLoader::MidDestroy)); >+ } > } > > static void > FirePageHideEvent(nsIDocShellTreeItem* aItem, > EventTarget* aChromeEventHandler) > { > nsCOMPtr<nsIDocument> doc = aItem->GetDocument(); > NS_ASSERTION(doc, "What happened here?"); >@@ -1353,51 +1414,50 @@ nsFrameLoader::SwapWithOtherLoader(nsFra > > FirePageShowEvent(ourDocshell, ourEventTarget, true); > FirePageShowEvent(otherDocshell, otherEventTarget, true); > > mInSwap = aOther->mInSwap = false; > return NS_OK; > } > >-void >-nsFrameLoader::DestroyChild() >-{ >- if (mRemoteBrowser) { >- mRemoteBrowser->SetOwnerElement(nullptr); >- mRemoteBrowser->Destroy(); >- mRemoteBrowser = nullptr; >- } >-} >- > NS_IMETHODIMP > nsFrameLoader::Destroy() > { >+ // nsFrameLoader::Destroy is called just before the frameloader is detached >+ // from the <browser> element. >+ > if (mDestroyCalled) { > return NS_OK; > } > mDestroyCalled = true; > >+ // After this point, we return an error when trying to send a message using >+ // the message manager on the frame. > if (mMessageManager) { >- mMessageManager->Disconnect(); >+ mMessageManager->Close(); > } >- if (mChildMessageManager) { >- static_cast<nsInProcessTabChildGlobal*>(mChildMessageManager.get())->Disconnect(); >+ >+ // Retain references to the <browser> element and the frameloader in case we >+ // receive any messages from the message manager on the frame. These >+ // references are dropped in DestroyComplete. >+ if (mRemoteBrowser) { >+ mOwnerContentStrong = mOwnerContent; >+ mRemoteBrowser->SetFrameLoader(this); Destroy can be called from nsFrameLoader's dtor, so you can't pass pointer to it to be kept alive elsewhere. Perhaps you need a flag in dtor and use it here and if it is set, call Finalize or something? > nsFrameMessageManager::AddMessageListener(const nsAString& aMessage, >- nsIMessageListener* aListener) >+ nsIMessageListener* aListener, >+ bool aListenWhenClosed) > { > nsAutoTObserverArray<nsMessageListenerInfo, 1>* listeners = > mListeners.Get(aMessage); > if (!listeners) { > listeners = new nsAutoTObserverArray<nsMessageListenerInfo, 1>(); > mListeners.Put(aMessage, listeners); > } else { > uint32_t len = listeners->Length(); >@@ -315,16 +316,17 @@ nsFrameMessageManager::AddMessageListene > return NS_OK; > } > } > } > > nsMessageListenerInfo* entry = listeners->AppendElement(); > NS_ENSURE_TRUE(entry, NS_ERROR_OUT_OF_MEMORY); > entry->mStrongListener = aListener; >+ entry->mListenWhenClosed = aListenWhenClosed; Why do we need this? >+nsInProcessTabChildGlobal::DisconnectEvents() Odd method names. Don't know what it means. I guess you want DisconnectEventListeners > #ifdef DEBUG > if (mOwner) { > nsCOMPtr<nsIFrameLoaderOwner> owner = do_QueryInterface(mOwner); > nsRefPtr<nsFrameLoader> fl = owner->GetFrameLoader(); > if (fl) { > NS_ASSERTION(this == fl->GetTabChildGlobalAsEventTarget(), > "Wrong event target!"); So this isn't true anymore, if the <browser> is just moved and a new nsFrameLoader created > NS_ASSERTION(fl->mMessageManager == mChromeMessageManager, > "Wrong message manager!"); ditto. I don't actually understand the ownership model in InProcessTabChildGlobal case. Who is keeping the nsFrameLoader alive? Could you explain? (remote case is clear) > already_AddRefed<nsFrameLoader> > TabParent::GetFrameLoader() const Would it make sense to add a parameter to tell whether the caller wants to use also the cached mFrameLoader, not only the one from mFrameElement? That way you wouldn't need to have 'if (!frameLoader || mIsDestroyed) {' everywhere.
Attachment #8570241 -
Flags: review?(bugs) → review+
Comment 8•9 years ago
|
||
The naming "close" and "disconnect" sounds good to me.
Updated•9 years ago
|
Attachment #8570241 -
Flags: review+ → review-
Assignee | ||
Comment 9•9 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #6) > So there is the somewhat unusual case where <browser> is removed and then > added back to document. > That would lead to a new nsFrameLoader to be created for the <browser>. > So there would be <browser> -> fl2, and fl1-><browser> and fl2-><browser> > edges for awhile. > But I think that is ok. Yes, I worried about this too. I think it's okay though. At worst we would confuse some listeners since msg.target would be the same browser element for both frameloaders. The fact that listeners have to opt into getting messages after the tab is closed makes it much less worrisome to me. We actually have a test for this too: browserElement_DOMRequestError.js.
Assignee | ||
Comment 10•9 years ago
|
||
> Destroy can be called from nsFrameLoader's dtor, so you can't pass pointer to it to be kept > alive elsewhere. > Perhaps you need a flag in dtor and use it here and if it is set, call Finalize or something? Still curious about this, but I'll add the flag. > >+ entry->mListenWhenClosed = aListenWhenClosed; > > Why do we need this? I found that lots of browser code fails if we deliver messages after the tab has been closed. The main problem is that the XBL has been detached, so properties stored on the <browser> element via XBL (like the message manager) no longer work. The result is that a lot of message listeners throw exceptions. We could fix all these listeners, but I think most listeners don't want these late messages anyway. I think it's better to ask people to opt into getting these late messages. > So this isn't true anymore, if the <browser> is just moved and a new nsFrameLoader created I might be wrong, but I assumed that we can't attach a new frameloader to the <browser> until nsFrameLoader::Finalize is called (because we don't null out the current frameloader on the <browser> until right after Finalize). Inside nsFrameLoader::Finalize we call nsInProcessTabChildGlobal::DisconnectEventListeners, so I assumed we wouldn't get any more events in nsInProcessTabChildGlobal::PreHandleEvent after that, so this wouldn't be a problem. > I don't actually understand the ownership model in InProcessTabChildGlobal case. > Who is keeping the nsFrameLoader alive? Could you explain? The runnable for MidDestroy and DestroyComplete keep a strong reference to the frameloader. I'll add a comment for this. > Would it make sense to add a parameter to tell whether the caller wants to use also the > cached mFrameLoader, not only the one from mFrameElement? > That way you wouldn't need to have 'if (!frameLoader || mIsDestroyed) {' > everywhere. Sure, I will do that.
Assignee | ||
Comment 11•9 years ago
|
||
Attachment #8570241 -
Attachment is obsolete: true
Attachment #8570696 -
Flags: review?(bugs)
Comment 12•9 years ago
|
||
Comment on attachment 8570696 [details] [diff] [review] Allow messages to be sent after frame script unload event >+void >+nsFrameLoader::MidDestroy() >+{ >+ // We go through the event loop twice. The first time is so that runnables >+ // dispatched when a frame script unloads have a chance to run. The second >+ // time is so that any messages sent by those runnables have a chance to be >+ // dispatched through the event loop. >+ NS_DispatchToCurrentThread(NS_NewRunnableMethod(this, &nsFrameLoader::DestroyComplete)); >+} I don't understand the need for MidDestroy. Feels really odd. Why do we have two levels asynchronousness? Why not 3? Why not 42? Shouldn't we have a marker somewhere in fmm message queue that once all the messages sent during unload have been process we can call DestroyComplete. >+++ b/dom/base/nsIMessageManager.idl >@@ -208,19 +208,26 @@ interface nsIMessageListenerManager : nsISupports update uuid >+ void SetFrameLoader(nsFrameLoader* aFrameLoader); Perhaps call this void CacheFrameLoader > already_AddRefed<nsFrameLoader> GetFrameLoader() const; >+ already_AddRefed<nsFrameLoader> GetFrameLoaderAfterDestroy() const; Odd name for the method, since I would expect the method to return null before Destroy, but that is not what the method does. I would just have GetFrameLoader(bool aUseCachedFrameLoaderAfterDestroy = false)
Attachment #8570696 -
Flags: review?(bugs) → review-
Assignee | ||
Comment 13•9 years ago
|
||
Attachment #8570696 -
Attachment is obsolete: true
Attachment #8571756 -
Flags: review?(bugs)
Assignee | ||
Comment 14•9 years ago
|
||
This isn't exactly what you wanted. The two deviations: 1. I passed a runnable into FinalizeFrameLoader. This makes sense to me because it allows us to use the same Runnable for the mNeedsAsyncDestroy case as for normal destruction. I could switch it back pretty easily. 2. With the changes in this patch, calling Destroy (or StartDestroy or DestroyComplete) from ~nsFrameLoader just seems really broken to me. We can't handle messages correctly if the frameloader is already dead. So I put in the release assert that we've already called Destroy when we run ~nsFrameLoader. It passes on try server. Perhaps we could land this a few days ahead of the full patch to make sure it doesn't get hit by nightly users?
Comment 15•9 years ago
|
||
Comment on attachment 8571756 [details] [diff] [review] Allow messages to be sent after frame script unload event Could you verify this doesn't break unload event handling for iframe. As far as I see, it shouldn't. So, load data:text/html,<iframe> and then in the web console run document.body.firstChild.contentWindow.onunload = function(e) { console.log(e.type); }; console.log("will remove"); document.body.removeChild(document.body.firstChild); console.log("did remove"); 'unload' should be dispatched between the two other console.logs >+class nsFrameLoaderDestroyRunnable : public nsRunnable >+{ >+ enum DestroyPhase >+ { >+ // See the implementation of Run for an explanation of these phases. >+ DESTROY_DOCSHELL, >+ WAIT_FOR_UNLOAD_MESSAGES, >+ DESTROY_COMPLETE >+ }; Please use eFooBar syntax for enums >+nsFrameLoaderDestroyRunnable::Run() >+{ >+ switch (mPhase) { >+ case DESTROY_DOCSHELL: >+ mFrameLoader->DestroyDocShell(); > >- mDocShell = nullptr; >- } >+ // In the out-of-process case, TabParent will eventually call >+ // DestroyComplete once it receives a __delete__ message from the child. What if the child process crashes? We should not still in that case. >+nsInProcessTabChildGlobal::FireUnloadEvent() > { >- // Let the frame scripts know the child is being closed. We do any other >- // cleanup after the event has been fired. See DelayedDisconnect >- nsContentUtils::AddScriptRunner( >- NS_NewRunnableMethod(this, &nsInProcessTabChildGlobal::DelayedDisconnect) >- ); >-} >+ // We're called from nsDocument::MaybeInitializeFinalizeFrameLoaders, so it >+ // should be safe to run script. >+ MOZ_ASSERT(nsContentUtils::IsSafeToRunScript()); > >-void >-nsInProcessTabChildGlobal::DelayedDisconnect() >-{ >- // Don't let the event escape >- mOwner = nullptr; >- >- // Fire the "unload" event >+ // Don't let the unload event propagate to chrome event handlers. >+ mPreventEventsEscaping = true; > DOMEventTargetHelper::DispatchTrustedEvent(NS_LITERAL_STRING("unload")); >+ mPreventEventsEscaping = false; >+} Why do we want to reset the value to false? events shouldn't propagate to <xul:browser> once we're being destroyed. So I'd just set mPreventEventsEscaping = true; in DESTROY_DOCSHELL phase. (you want to check that 'unload' event from content page is propagated to the chrome the same way with and without the patch when a tab is removed.) mOwner is a raw reference, so you must set it to null at some point (before the element has died). Otherwise we'll get almost for certain a security bug later.
Attachment #8571756 -
Flags: review?(bugs) → review-
Assignee | ||
Comment 16•9 years ago
|
||
I did some digging and the existing code *does* fire the pagehide event on the <xul:browser> when a tab is closed. Here's the sequence of events: 1. nsInProcessTabChildGlobal::Disconnect adds a script runner to do the actual disconnect. 2. nsFrameLoader::Destroy calls FinalizeFrameLoader, which adds another script runner to call Finalize on the frame loader. 3. We end up in ~mozAutoDocUpdate, presumably after the <browser> element has been removed from the document. https://dxr.mozilla.org/mozilla-central/source/dom/base/mozAutoDocUpdate.h#35 4. That calls EndUpdate on the nsDocument, which calls MaybeInitializeFinalizeFrameLoaders: https://dxr.mozilla.org/mozilla-central/source/dom/base/nsDocument.cpp#4958 5. That calls nsFrameLoader::Finalize, which tears down the docshell and fires the pagehide event. 6. Back in ~mozAutoDocUpdate, we then call RemoveScriptBlocker, which triggers nsInProcessTabChildGlobal::DelayedDisconnect. In short, we enqueue these delayed runnables in one order but ~mozAutoDocUpdate runs them in the opposite order. The result is that we do fire the pagehide event on the <browser>, but not on anything above it, since it has already been removed from the document. Here is a test to make sure we don't fail to trigger pagehide.
Attachment #8572287 -
Flags: review?(bugs)
Assignee | ||
Comment 17•9 years ago
|
||
Also, given comment 16, I was trying to remember why we don't fire pagehide events on frame script sin non-e10s. It's because of this code: http://mxr.mozilla.org/mozilla-central/source/dom/base/nsInProcessTabChildGlobal.cpp#233 SetChromeEventHandlerInternal resets mParentTarget to null on the nsGlobalWindow. When we try to handle the pagehide event, we try need to recompute mParentTarget, which calls UpdateParentTarget. That can't find the frame script global target because we've already detached the frameloader from the <browser> element.
Assignee | ||
Comment 18•9 years ago
|
||
Attachment #8571756 -
Attachment is obsolete: true
Attachment #8572455 -
Flags: review?(bugs)
Assignee | ||
Comment 19•9 years ago
|
||
Oops, somehow my comment was lost by bz attach. I made the changes you asked for with the enum and nulling out mOwner. I left mPreventEventsEscaping as it was because of comment 16. I added a comment though.
Comment 20•9 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #19) > Oops, somehow my comment was lost by bz attach. It looks like you put the comment in the section for things to be added to the patch commit message, not the section for things to be posted as a comment.
Updated•9 years ago
|
Attachment #8572287 -
Flags: review?(bugs) → review+
Updated•9 years ago
|
Attachment #8572455 -
Flags: review?(bugs) → review+
Assignee | ||
Comment 21•9 years ago
|
||
Landing just the assertion for now. https://hg.mozilla.org/integration/mozilla-inbound/rev/fd928bcff5c0
Keywords: leave-open
Assignee | ||
Comment 23•9 years ago
|
||
I don't see any crashes in ~nsFrameLoader, so it sounds safe to land the rest of the patches here.
Assignee | ||
Comment 24•9 years ago
|
||
Comment on attachment 8570243 [details] [diff] [review] testing-fix The main patch in this bug makes tab closing asynchronous. We get a message-manager-close notification when a tab starts to close and a message-manager-disconnect message when it finishes closing. This patch makes the test harness wait until all tabs have finished closing before it runs the leak detection code. Otherwise it detects leaks.
Attachment #8570243 -
Flags: review?(jmaher)
Assignee | ||
Comment 25•9 years ago
|
||
Comment on attachment 8570242 [details] [diff] [review] rename-mm-disconnect I converted everything to message-manager-close. I think these clients really just want to know when a <browser> is removed from the DOM, so the close message will be fine for that.
Attachment #8570242 -
Flags: review?(bugs)
Comment 26•9 years ago
|
||
Comment on attachment 8570243 [details] [diff] [review] testing-fix Review of attachment 8570243 [details] [diff] [review]: ----------------------------------------------------------------- I really don't feel comfortable reviewing this one. The code here I don't understand. Is there another reviewer for this, or should I spend some more time trying to understand this in detail?
Attachment #8570243 -
Flags: review?(jmaher)
Assignee | ||
Comment 27•9 years ago
|
||
Comment on attachment 8570243 [details] [diff] [review] testing-fix OK, I'll give this one to Olli too.
Attachment #8570243 -
Flags: review?(bugs)
Comment 28•9 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #26) > I really don't feel comfortable reviewing this one. The code here I don't > understand. Is there another reviewer for this, or should I spend some more > time trying to understand this in detail? Just my two cents from working on the leak detection code (without being able to r+ anything in testing/): This looks good to me and should indeed do what it's supposed to.
Updated•9 years ago
|
Attachment #8570243 -
Flags: review?(bugs) → review+
Comment 29•9 years ago
|
||
Comment on attachment 8570242 [details] [diff] [review] rename-mm-disconnect I filed Bug 1142489
Attachment #8570242 -
Flags: review?(bugs) → review+
Assignee | ||
Comment 30•9 years ago
|
||
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/7d9a91ee3d48 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/f63a2cf3fa11
Comment 32•9 years ago
|
||
Backed out for mochitest-bc failures. https://hg.mozilla.org/integration/mozilla-inbound/rev/76aab45b3300 https://treeherder.mozilla.org/logviewer.html#?job_id=7593014&repo=mozilla-inbound
Comment 33•9 years ago
|
||
Had a suspicion what's causing the test failure. We can't use the child process message manager to revive crashed tabs any longer, fixing that to use the frame message manager would have been a follow-up anyway. The problem I'm hitting now are two more test failures: TEST-UNEXPECTED-FAIL | browser/components/sessionstore/test/browser_393716.js | also duplicated text data - Got , expected Another value: 1426274018440 TEST-UNEXPECTED-FAIL | browser/components/sessionstore/test/browser_scrollPositions.js | force-reload resets scroll positions - Got {"scroll":"105,398"}, expected null For at least the second test it looks like some messages don't arrive for whatever reason. As soon as I set addMessageListener()'s third parameter back to false the test works again.
Comment 35•9 years ago
|
||
Comment on attachment 8577404 [details] [diff] [review] WIP (to revive crashed tabs) Eh, this should probably be a follow-up.
Attachment #8577404 -
Attachment is obsolete: true
Comment 36•9 years ago
|
||
Tried to find out what was actually causing the problem with browser_crashedTabs.js. The browser isn't marked as "revived" by sessionstore because somehow the permanentKey seems wrong. If I use the <browser> element itself and put that in the WeakSet the test succeeds.
Comment 37•9 years ago
|
||
aMessage.objects.browser.permanentKey is undefined now when we receive the "RemoveTabRevived" message.
Assignee | ||
Comment 38•9 years ago
|
||
I poked at this more and I want to write down what I found while it's still fresh in my head. This stuff is really complex! First, I think comment 16 is all wrong. We run DelayedDisconnect before nsFrameLoader::Finalize, which is the order they're enqueued in. I think the reason we still run pagehide on the <browser> element is the SetChromeEventHandler call in DelayedDisconnect. The next time an event runs, it causes nsGlobalWindow::UpdateParentTarget to run. If that function can't find a TabChild to use, it default to mChromeEventHandler, which should still be the <browser> element: http://mxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#3008 Now here's why the test failure happens: Without my patch, the XBL binding for the <browser> element is detached after the "unload" event for frame script runs. With my patch, XBL is detached before, so the "unload" event handler can't use XBL. Longer explanation (mostly for me): The code to detach XBL is enqueued as a script runner here: http://mxr.mozilla.org/mozilla-central/source/dom/base/Element.cpp#1695 That happens after nsDocShell::Destroy. Without my patch, nsDocShell::Destroy calls Disconnect on the nsInProcessTabChildGlobal, which enqueues DelayedDisconnect. So DelayedDisconnect will run before XBL is detached. And DelayedDisconnect runs the "unload" event. With my patch, nsDocShell::Destroy calls FinalizeFrameLoader and passes it a runnable. That runnable is also supposed to run before XBL is detached. However, when we run it, we get here: http://mxr.mozilla.org/mozilla-central/source/dom/base/nsDocument.cpp#7357 That branch is taken because mUpdateNestLevel is 1. So we don't finalize the frameloader just yet. Instead, we finish running script runners, which includes detaching XBL. After we detach XBL (and run any other script runners), we get here: http://mxr.mozilla.org/mozilla-central/source/dom/base/nsDocument.cpp#4963 That code decrements mUpdateNestLevel and then finalizes the frame loader, which triggers the "unload" event.
Assignee | ||
Comment 39•9 years ago
|
||
This patch fixes the problem by using an async message for the revived message. It causes us to receive the revived message after the <browser> has been added back to the document and permanentKey is available again. I'm guessing we were using a sync message to ensure that no data gets thrown away between when the message is sent and when it's received. But the only data comes from messages, which are processed in order, so I think an async message should be fine. Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=de8fc269d907
Attachment #8579146 -
Flags: review?(ttaubert)
Comment 40•9 years ago
|
||
Comment on attachment 8579146 [details] [diff] [review] patch Review of attachment 8579146 [details] [diff] [review]: ----------------------------------------------------------------- Seems fine assuming it passes tests :)
Attachment #8579146 -
Flags: review?(ttaubert) → review+
Comment 41•9 years ago
|
||
(In reply to Tim Taubert [:ttaubert] from comment #40) > Seems fine assuming it passes tests :) It does! Missed the try run.
Assignee | ||
Comment 42•9 years ago
|
||
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/3bcebfd13538 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/288efd5e2ca0 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/9af9111e9c27
Keywords: leave-open
Comment 43•9 years ago
|
||
Try push missed doing ASan, though. Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/8491e8e66dfd About a 20% intermittent ASan e10s bc1 "browser_crashedTabs.js | This test exceeded the timeout threshold", and also a 100% in both orange and green runs (apparently at least in e10s browser-chrome on ASan, we actually don't care in the least about crashes, bc2 has a permanent crash but all the runs are green) six instances of "SUMMARY: AddressSanitizer: SEGV /builds/slave/m-in-l64-asan-0000000000000000/build/src/js/src/ctypes/CTypes.cpp:2150 js::ctypes::ConvertToJS(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, void*, bool, bool, JS::MutableHandle<JS::Value>)". While the harness seems not to care about that, I find it rather disturbing, so I backed out all three instead of just backing out the manifest change that started the test running on Linux.
Assignee | ||
Comment 44•9 years ago
|
||
The crashes are intentional crashes, so those are okay. The timeouts don't look too bad either. Given that ASAN probably has to spend a lot of time generating those stack traces, it makes sense that the test will take a while. I'll just increase the timeout.
Assignee | ||
Comment 45•9 years ago
|
||
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/ce1ac10eae4a remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/ca6d6b665bc1 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/8cadd6b561b3
Assignee | ||
Comment 46•9 years ago
|
||
Actually, now I'm wondering why this test generates ASAN warnings for its intentional crashes while others don't. Is there something we need to do in the test to signal that crashes are expected?
Flags: needinfo?(jmaher)
Comment 47•9 years ago
|
||
https://dxr.mozilla.org/mozilla-central/search?q=_expectingProcessCrash&case=true :)
Flags: needinfo?(jmaher)
Assignee | ||
Comment 48•9 years ago
|
||
I think that only applies to mochitests-plain. I looked around a bit more and it seems that ASAN builds are compiled without the crash reporter. So I added skip-if = !crashreporter to the test annotation to avoid running it on ASAN. That's what we do for browser_thumbnails_bg_crash_during_capture.js, which is a similar test.
https://hg.mozilla.org/mozilla-central/rev/ce1ac10eae4a https://hg.mozilla.org/mozilla-central/rev/ca6d6b665bc1 https://hg.mozilla.org/mozilla-central/rev/8cadd6b561b3 https://hg.mozilla.org/mozilla-central/rev/ce3fea0cb25b
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox39:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
You need to log in
before you can comment on or make changes to this bug.
Description
•