Closed Bug 1126089 Opened 10 years ago Closed 9 years ago

Investigate allowing messages to be sent from TabChildGlobal's unload event

Categories

(Core :: DOM: Content Processes, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
e10s m5+ ---
firefox39 --- fixed

People

(Reporter: billm, Assigned: billm)

References

Details

Attachments

(5 files, 4 obsolete files)

(deleted), patch
smaug
: review+
Details | Diff | Splinter Review
(deleted), patch
smaug
: review+
Details | Diff | Splinter Review
(deleted), patch
smaug
: review+
Details | Diff | Splinter Review
(deleted), patch
smaug
: review+
Details | Diff | Splinter Review
(deleted), patch
ttaubert
: review+
Details | Diff | Splinter Review
Right now you're not able to send messages from an "unload" event handler for a TabChildGlobal. The message is never received because the message manager for the TabParent has already been disconnected. However, for bug 1109875 it would be really useful to be able to send messages here.

I think me might be able to change TabChild::RecvDestroy to send a reply to the parent after the unload event is dispatched. The parent could then disconnect the message manager at that time.
This blocks 1109875, which is an m8. I figure this should probably be an m8 then.
tracking-e10s: --- → m8+
Attached patch unload-order (obsolete) (deleted) — Splinter Review
Here are some of the changes I made:

When we remove a <browser> element from the DOM, we immediately prohibit sending messages using its message manager. We fire the message-manager-close event at this time.

When the child process receives the Destroy message, it fires the unload event on its frame script. Then we destroy the docshell, which fires some more events on the document. After that, we send the __delete__ message to the parent. We're not allowed to send messages after that.

When the parent receives the __delete__ message, it shuts down the message manager and notifies message-manager-disconnect observers.

While the parent is waiting for the __delete__ message, it can dispatch incoming message manager messages. It needs to keep a strong reference to the frameloader and <browser> element during this time to ensure that they're not destroyed.

I've tried to stick to the terminology that "Close" means "sending is no longer allowed" and "Disconnect" means "everything is being torn down, the object is no longer usable."

On interesting question is the order of the frame script unload event versus the final pagehide/unload events on the document. It seems a bit strange that we fire unload on the frame script first. However, the only alternative is to fire it after the docshell is torn down. We have a number of unload handlers that would like to use the docshell, so that seems too late.
Assignee: nobody → wmccloskey
Status: NEW → ASSIGNED
Attachment #8570241 - Flags: review?(bugs)
Attached patch rename-mm-disconnect (deleted) — Splinter Review
This patch renames our uses of message-manager-disconnect to message-manager-close, since that seems to be what people want. I'll hold off on getting review for this until Olli approves the naming.
Attached patch testing-fix (deleted) — Splinter Review
This patch changes the browser-chrome test harness so that it waits until all tabs have been completely unloaded before running the memory leak detection code.
Might as well bump this to M5 since two other M5s depend on it.
So there is the somewhat unusual case where <browser> is removed and then added back to document.
That would lead to a new nsFrameLoader to be created for the <browser>.
So there would be <browser> -> fl2, and fl1-><browser> and fl2-><browser> edges for awhile.
But I think that is ok.
Comment on attachment 8570241 [details] [diff] [review]
unload-order

># HG changeset patch
># Parent 600f44fd317c145b4f583b7db9358615fd52c70b
># User Bill McCloskey <wmccloskey@mozilla.com>
>
>diff --git a/dom/base/nsFrameLoader.cpp b/dom/base/nsFrameLoader.cpp
>--- a/dom/base/nsFrameLoader.cpp
>+++ b/dom/base/nsFrameLoader.cpp
>@@ -519,23 +519,84 @@ nsFrameLoader::GetDocShell(nsIDocShell *
> 
>   *aDocShell = mDocShell;
>   NS_IF_ADDREF(*aDocShell);
> 
>   return rv;
> }
> 
> void
>+nsFrameLoader::DestroyComplete()
>+{
>+  // We can get here by removing the frameloader from the DOM (in which case
>+  // nsFrameLoader::Finalize was already called) or if the child process
>+  // crashes.
>+
>+  if (mRemoteBrowser) {
>+    // Drop the strong references created in Finalize.
>+    mOwnerContentStrong = nullptr;
>+    mRemoteBrowser->SetFrameLoader(nullptr);
>+
>+    // Call TabParent::Destroy if we haven't already (in case of a crash).
>+    mRemoteBrowser->SetOwnerElement(nullptr);
>+    mRemoteBrowser->Destroy();
>+    mRemoteBrowser = nullptr;
>+  }
>+
>+  if (mMessageManager) {
>+    mMessageManager->Disconnect();
>+  }
>+
>+  if (mChildMessageManager) {
>+    static_cast<nsInProcessTabChildGlobal*>(mChildMessageManager.get())->Disconnect();
>+  }
>+}
>+
>+void
>+nsFrameLoader::MidDestroy()
>+{
>+  // We go through the event loop twice. The first time is so that runnables
>+  // dispatched when a frame script unloads have a chance to run. The second
>+  // time is so that any messages sent by those runnables have a chance to be
>+  // dispatched through the event loop.
>+  NS_DispatchToCurrentThread(NS_NewRunnableMethod(this, &nsFrameLoader::DestroyComplete));
>+}
>+
>+void
> nsFrameLoader::Finalize()
> {
>+  // This code runs after the frameloader has been detached from the <browser>
>+  // element.
>+
>+  // Ask the TabChild to fire the frame script "unload" event, destroy its
>+  // docshell, and finally destroy the PBrowser actor. This eventually leads to
>+  // nsFrameLoader::DestroyComplete being called.
>+  if (mRemoteBrowser) {
>+    mRemoteBrowser->Destroy();
>+  }
>+
>+  // Fire the "unload" event if we're in-process.
>+  if (mChildMessageManager) {
>+    static_cast<nsInProcessTabChildGlobal*>(mChildMessageManager.get())->FireUnloadEvent();
>+  }
>+
>+  // Destroy the docshell.
>   nsCOMPtr<nsIBaseWindow> base_win(do_QueryInterface(mDocShell));
>   if (base_win) {
>     base_win->Destroy();
>   }
>   mDocShell = nullptr;
>+
>+  if (mChildMessageManager) {
>+    // Stop handling events in the in-process frame script.
>+    static_cast<nsInProcessTabChildGlobal*>(mChildMessageManager.get())->DisconnectEvents();
>+
>+    // Dispatch a runnable that eventually calls nsFrameLoader::DestroyComplete.
>+    NS_DispatchToCurrentThread(NS_NewRunnableMethod(this, &nsFrameLoader::MidDestroy));
>+  }
> }
> 
> static void
> FirePageHideEvent(nsIDocShellTreeItem* aItem,
>                   EventTarget* aChromeEventHandler)
> {
>   nsCOMPtr<nsIDocument> doc = aItem->GetDocument();
>   NS_ASSERTION(doc, "What happened here?");
>@@ -1353,51 +1414,50 @@ nsFrameLoader::SwapWithOtherLoader(nsFra
> 
>   FirePageShowEvent(ourDocshell, ourEventTarget, true);
>   FirePageShowEvent(otherDocshell, otherEventTarget, true);
> 
>   mInSwap = aOther->mInSwap = false;
>   return NS_OK;
> }
> 
>-void
>-nsFrameLoader::DestroyChild()
>-{
>-  if (mRemoteBrowser) {
>-    mRemoteBrowser->SetOwnerElement(nullptr);
>-    mRemoteBrowser->Destroy();
>-    mRemoteBrowser = nullptr;
>-  }
>-}
>-
> NS_IMETHODIMP
> nsFrameLoader::Destroy()
> {
>+  // nsFrameLoader::Destroy is called just before the frameloader is detached
>+  // from the <browser> element.
>+
>   if (mDestroyCalled) {
>     return NS_OK;
>   }
>   mDestroyCalled = true;
> 
>+  // After this point, we return an error when trying to send a message using
>+  // the message manager on the frame.
>   if (mMessageManager) {
>-    mMessageManager->Disconnect();
>+    mMessageManager->Close();
>   }
>-  if (mChildMessageManager) {
>-    static_cast<nsInProcessTabChildGlobal*>(mChildMessageManager.get())->Disconnect();
>+
>+  // Retain references to the <browser> element and the frameloader in case we
>+  // receive any messages from the message manager on the frame. These
>+  // references are dropped in DestroyComplete.
>+  if (mRemoteBrowser) {
>+    mOwnerContentStrong = mOwnerContent;
>+    mRemoteBrowser->SetFrameLoader(this);
Destroy can be called from nsFrameLoader's dtor, so you can't pass pointer to it to be kept alive elsewhere.
Perhaps you need a flag in dtor and use it here and if it is set, call Finalize or something?
> nsFrameMessageManager::AddMessageListener(const nsAString& aMessage,
>-                                          nsIMessageListener* aListener)
>+                                          nsIMessageListener* aListener,
>+                                          bool aListenWhenClosed)
> {
>   nsAutoTObserverArray<nsMessageListenerInfo, 1>* listeners =
>     mListeners.Get(aMessage);
>   if (!listeners) {
>     listeners = new nsAutoTObserverArray<nsMessageListenerInfo, 1>();
>     mListeners.Put(aMessage, listeners);
>   } else {
>     uint32_t len = listeners->Length();
>@@ -315,16 +316,17 @@ nsFrameMessageManager::AddMessageListene
>         return NS_OK;
>       }
>     }
>   }
> 
>   nsMessageListenerInfo* entry = listeners->AppendElement();
>   NS_ENSURE_TRUE(entry, NS_ERROR_OUT_OF_MEMORY);
>   entry->mStrongListener = aListener;
>+  entry->mListenWhenClosed = aListenWhenClosed;

Why do we need this?

>+nsInProcessTabChildGlobal::DisconnectEvents()
Odd method names. Don't know what it means.
I guess you want DisconnectEventListeners

> #ifdef DEBUG
>   if (mOwner) {
>     nsCOMPtr<nsIFrameLoaderOwner> owner = do_QueryInterface(mOwner);
>     nsRefPtr<nsFrameLoader> fl = owner->GetFrameLoader();
>     if (fl) {
>       NS_ASSERTION(this == fl->GetTabChildGlobalAsEventTarget(),
>                    "Wrong event target!");
So this isn't true anymore, if the <browser> is just moved and a new nsFrameLoader created


>       NS_ASSERTION(fl->mMessageManager == mChromeMessageManager,
>                    "Wrong message manager!");
ditto.


I don't actually understand the ownership model in InProcessTabChildGlobal case.
Who is keeping the nsFrameLoader alive? Could you explain?
(remote case is clear)



> already_AddRefed<nsFrameLoader>
> TabParent::GetFrameLoader() const
Would it make sense to add a parameter to tell whether the caller wants to use also the
cached mFrameLoader, not only the one from mFrameElement? 
That way you wouldn't need to have 'if (!frameLoader || mIsDestroyed) {'
everywhere.
Attachment #8570241 - Flags: review?(bugs) → review+
The naming "close" and "disconnect" sounds good to me.
Attachment #8570241 - Flags: review+ → review-
(In reply to Olli Pettay [:smaug] from comment #6)
> So there is the somewhat unusual case where <browser> is removed and then
> added back to document.
> That would lead to a new nsFrameLoader to be created for the <browser>.
> So there would be <browser> -> fl2, and fl1-><browser> and fl2-><browser>
> edges for awhile.
> But I think that is ok.

Yes, I worried about this too. I think it's okay though. At worst we would confuse some listeners since msg.target would be the same browser element for both frameloaders. The fact that listeners have to opt into getting messages after the tab is closed makes it much less worrisome to me.

We actually have a test for this too: browserElement_DOMRequestError.js.
> Destroy can be called from nsFrameLoader's dtor, so you can't pass pointer to it to be kept
> alive elsewhere.
> Perhaps you need a flag in dtor and use it here and if it is set, call Finalize or something?

Still curious about this, but I'll add the flag.

> >+  entry->mListenWhenClosed = aListenWhenClosed;
> 
> Why do we need this?

I found that lots of browser code fails if we deliver messages after the tab has been closed. The main problem is that the XBL has been detached, so properties stored on the <browser> element via XBL (like the message manager) no longer work. The result is that a lot of message listeners throw exceptions.

We could fix all these listeners, but I think most listeners don't want these late messages anyway. I think it's better to ask people to opt into getting these late messages.

> So this isn't true anymore, if the <browser> is just moved and a new nsFrameLoader created

I might be wrong, but I assumed that we can't attach a new frameloader to the <browser> until nsFrameLoader::Finalize is called (because we don't null out the current frameloader on the <browser> until right after Finalize). Inside nsFrameLoader::Finalize we call nsInProcessTabChildGlobal::DisconnectEventListeners, so I assumed we wouldn't get any more events in nsInProcessTabChildGlobal::PreHandleEvent after that, so this wouldn't be a problem.

> I don't actually understand the ownership model in InProcessTabChildGlobal case.
> Who is keeping the nsFrameLoader alive? Could you explain?

The runnable for MidDestroy and DestroyComplete keep a strong reference to the frameloader. I'll add a comment for this.

> Would it make sense to add a parameter to tell whether the caller wants to use also the
> cached mFrameLoader, not only the one from mFrameElement? 
> That way you wouldn't need to have 'if (!frameLoader || mIsDestroyed) {'
> everywhere.

Sure, I will do that.
Attachment #8570241 - Attachment is obsolete: true
Attachment #8570696 - Flags: review?(bugs)
Comment on attachment 8570696 [details] [diff] [review]
Allow messages to be sent after frame script unload event

>+void
>+nsFrameLoader::MidDestroy()
>+{
>+  // We go through the event loop twice. The first time is so that runnables
>+  // dispatched when a frame script unloads have a chance to run. The second
>+  // time is so that any messages sent by those runnables have a chance to be
>+  // dispatched through the event loop.
>+  NS_DispatchToCurrentThread(NS_NewRunnableMethod(this, &nsFrameLoader::DestroyComplete));
>+}
I don't understand the need for MidDestroy. Feels really odd. Why do we have two levels asynchronousness? Why not 3? Why not 42?
Shouldn't we have a marker somewhere in fmm message queue that once all the messages sent during unload have been process we can call DestroyComplete.


>+++ b/dom/base/nsIMessageManager.idl
>@@ -208,19 +208,26 @@ interface nsIMessageListenerManager : nsISupports
update uuid


>+    void SetFrameLoader(nsFrameLoader* aFrameLoader);
Perhaps call this
void CacheFrameLoader


>     already_AddRefed<nsFrameLoader> GetFrameLoader() const;
>+    already_AddRefed<nsFrameLoader> GetFrameLoaderAfterDestroy() const;
Odd name for the method, since I would expect the method to return null before Destroy, but that is not what the method does.
I would just have GetFrameLoader(bool aUseCachedFrameLoaderAfterDestroy = false)
Attachment #8570696 - Flags: review?(bugs) → review-
Attachment #8570696 - Attachment is obsolete: true
Attachment #8571756 - Flags: review?(bugs)
This isn't exactly what you wanted. The two deviations:

1. I passed a runnable into FinalizeFrameLoader. This makes sense to me because it allows us to use the same Runnable for the mNeedsAsyncDestroy case as for normal destruction. I could switch it back pretty easily.

2. With the changes in this patch, calling Destroy (or StartDestroy or DestroyComplete) from ~nsFrameLoader just seems really broken to me. We can't handle messages correctly if the frameloader is already dead. So I put in the release assert that we've already called Destroy when we run ~nsFrameLoader. It passes on try server. Perhaps we could land this a few days ahead of the full patch to make sure it doesn't get hit by nightly users?
Comment on attachment 8571756 [details] [diff] [review]
Allow messages to be sent after frame script unload event

Could you verify this doesn't break unload event handling for iframe. As far as I see, it shouldn't.
So, load
data:text/html,<iframe>
and then in the web console run
document.body.firstChild.contentWindow.onunload = function(e) { console.log(e.type); }; console.log("will remove"); document.body.removeChild(document.body.firstChild); console.log("did remove");
'unload' should be dispatched between the two other console.logs

>+class nsFrameLoaderDestroyRunnable : public nsRunnable
>+{
>+  enum DestroyPhase
>+  {
>+    // See the implementation of Run for an explanation of these phases.
>+    DESTROY_DOCSHELL,
>+    WAIT_FOR_UNLOAD_MESSAGES,
>+    DESTROY_COMPLETE
>+  };
Please use eFooBar syntax for enums


>+nsFrameLoaderDestroyRunnable::Run()
>+{
>+  switch (mPhase) {
>+  case DESTROY_DOCSHELL:
>+    mFrameLoader->DestroyDocShell();
> 
>-    mDocShell = nullptr;
>-  }
>+    // In the out-of-process case, TabParent will eventually call
>+    // DestroyComplete once it receives a __delete__ message from the child.
What if the child process crashes? We should not still in that case.

>+nsInProcessTabChildGlobal::FireUnloadEvent()
> {
>-  // Let the frame scripts know the child is being closed. We do any other
>-  // cleanup after the event has been fired. See DelayedDisconnect
>-  nsContentUtils::AddScriptRunner(
>-     NS_NewRunnableMethod(this, &nsInProcessTabChildGlobal::DelayedDisconnect)
>-  );
>-}
>+  // We're called from nsDocument::MaybeInitializeFinalizeFrameLoaders, so it
>+  // should be safe to run script.
>+  MOZ_ASSERT(nsContentUtils::IsSafeToRunScript());
> 
>-void
>-nsInProcessTabChildGlobal::DelayedDisconnect()
>-{
>-  // Don't let the event escape
>-  mOwner = nullptr;
>-
>-  // Fire the "unload" event
>+  // Don't let the unload event propagate to chrome event handlers.
>+  mPreventEventsEscaping = true;
>   DOMEventTargetHelper::DispatchTrustedEvent(NS_LITERAL_STRING("unload"));
>+  mPreventEventsEscaping = false;
>+}
Why do we want to reset the value to false? events shouldn't propagate to 
<xul:browser> once we're being destroyed.
So I'd just set mPreventEventsEscaping = true; in DESTROY_DOCSHELL phase.
(you want to check that 'unload' event from content page is propagated to the chrome the same way with and without the patch when a tab is removed.)

mOwner is a raw reference, so you must set it to null at some point (before the element has died).
Otherwise we'll get almost for certain a security bug later.
Attachment #8571756 - Flags: review?(bugs) → review-
I did some digging and the existing code *does* fire the pagehide event on the <xul:browser> when a tab is closed. Here's the sequence of events:

1. nsInProcessTabChildGlobal::Disconnect adds a script runner to do the actual disconnect.
2. nsFrameLoader::Destroy calls FinalizeFrameLoader, which adds another script runner to call Finalize on the frame loader.
3. We end up in ~mozAutoDocUpdate, presumably after the <browser> element has been removed from the document.
https://dxr.mozilla.org/mozilla-central/source/dom/base/mozAutoDocUpdate.h#35
4. That calls EndUpdate on the nsDocument, which calls MaybeInitializeFinalizeFrameLoaders:
https://dxr.mozilla.org/mozilla-central/source/dom/base/nsDocument.cpp#4958
5. That calls nsFrameLoader::Finalize, which tears down the docshell and fires the pagehide event.
6. Back in ~mozAutoDocUpdate, we then call RemoveScriptBlocker, which triggers nsInProcessTabChildGlobal::DelayedDisconnect.

In short, we enqueue these delayed runnables in one order but ~mozAutoDocUpdate runs them in the opposite order. The result is that we do fire the pagehide event on the <browser>, but not on anything above it, since it has already been removed from the document.

Here is a test to make sure we don't fail to trigger pagehide.
Attachment #8572287 - Flags: review?(bugs)
Also, given comment 16, I was trying to remember why we don't fire pagehide events on frame script sin non-e10s. It's because of this code:
http://mxr.mozilla.org/mozilla-central/source/dom/base/nsInProcessTabChildGlobal.cpp#233
SetChromeEventHandlerInternal resets mParentTarget to null on the nsGlobalWindow. When we try to handle the pagehide event, we try need to recompute mParentTarget, which calls UpdateParentTarget. That can't find the frame script global target because we've already detached the frameloader from the <browser> element.
Oops, somehow my comment was lost by bz attach. I made the changes you asked for with the enum and nulling out mOwner. I left mPreventEventsEscaping as it was because of comment 16. I added a comment though.
(In reply to Bill McCloskey (:billm) from comment #19)
> Oops, somehow my comment was lost by bz attach.

It looks like you put the comment in the section for things to be added to the patch commit message, not the section for things to be posted as a comment.
Attachment #8572287 - Flags: review?(bugs) → review+
Attachment #8572455 - Flags: review?(bugs) → review+
I don't see any crashes in ~nsFrameLoader, so it sounds safe to land the rest of the patches here.
Comment on attachment 8570243 [details] [diff] [review]
testing-fix

The main patch in this bug makes tab closing asynchronous. We get a message-manager-close notification when a tab starts to close and a message-manager-disconnect message when it finishes closing.

This patch makes the test harness wait until all tabs have finished closing before it runs the leak detection code. Otherwise it detects leaks.
Attachment #8570243 - Flags: review?(jmaher)
Comment on attachment 8570242 [details] [diff] [review]
rename-mm-disconnect

I converted everything to message-manager-close. I think these clients really just want to know when a <browser> is removed from the DOM, so the close message will be fine for that.
Attachment #8570242 - Flags: review?(bugs)
Comment on attachment 8570243 [details] [diff] [review]
testing-fix

Review of attachment 8570243 [details] [diff] [review]:
-----------------------------------------------------------------

I really don't feel comfortable reviewing this one.  The code here I don't understand.  Is there another reviewer for this, or should I spend some more time trying to understand this in detail?
Attachment #8570243 - Flags: review?(jmaher)
Comment on attachment 8570243 [details] [diff] [review]
testing-fix

OK, I'll give this one to Olli too.
Attachment #8570243 - Flags: review?(bugs)
(In reply to Joel Maher (:jmaher) from comment #26)
> I really don't feel comfortable reviewing this one.  The code here I don't
> understand.  Is there another reviewer for this, or should I spend some more
> time trying to understand this in detail?

Just my two cents from working on the leak detection code (without being able to r+ anything in testing/): This looks good to me and should indeed do what it's supposed to.
Attachment #8570243 - Flags: review?(bugs) → review+
Comment on attachment 8570242 [details] [diff] [review]
rename-mm-disconnect

I filed Bug 1142489
Attachment #8570242 - Flags: review?(bugs) → review+
Attached patch WIP (to revive crashed tabs) (obsolete) (deleted) — Splinter Review
Had a suspicion what's causing the test failure. We can't use the child process message manager to revive crashed tabs any longer, fixing that to use the frame message manager would have been a follow-up anyway.

The problem I'm hitting now are two more test failures:

TEST-UNEXPECTED-FAIL | browser/components/sessionstore/test/browser_393716.js | also duplicated text data - Got , expected Another value: 1426274018440
TEST-UNEXPECTED-FAIL | browser/components/sessionstore/test/browser_scrollPositions.js | force-reload resets scroll positions - Got {"scroll":"105,398"}, expected null

For at least the second test it looks like some messages don't arrive for whatever reason. As soon as I set addMessageListener()'s third parameter back to false the test works again.
Comment on attachment 8577404 [details] [diff] [review]
WIP (to revive crashed tabs)

Eh, this should probably be a follow-up.
Attachment #8577404 - Attachment is obsolete: true
Tried to find out what was actually causing the problem with browser_crashedTabs.js. The browser isn't marked as "revived" by sessionstore because somehow the permanentKey seems wrong. If I use the <browser> element itself and put that in the WeakSet the test succeeds.
aMessage.objects.browser.permanentKey is undefined now when we receive the "RemoveTabRevived" message.
I poked at this more and I want to write down what I found while it's still fresh in my head. This stuff is really complex!

First, I think comment 16 is all wrong. We run DelayedDisconnect before nsFrameLoader::Finalize, which is the order they're enqueued in. I think the reason we still run pagehide on the <browser> element is the SetChromeEventHandler call in DelayedDisconnect. The next time an event runs, it causes nsGlobalWindow::UpdateParentTarget to run. If that function can't find a TabChild to use, it default to mChromeEventHandler, which should still be the <browser> element:
http://mxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#3008

Now here's why the test failure happens:
Without my patch, the XBL binding for the <browser> element is detached after the "unload" event for frame script runs. With my patch, XBL is detached before, so the "unload" event handler can't use XBL.

Longer explanation (mostly for me):
The code to detach XBL is enqueued as a script runner here:
http://mxr.mozilla.org/mozilla-central/source/dom/base/Element.cpp#1695
That happens after nsDocShell::Destroy.

Without my patch, nsDocShell::Destroy calls Disconnect on the nsInProcessTabChildGlobal, which enqueues DelayedDisconnect. So DelayedDisconnect will run before XBL is detached. And DelayedDisconnect runs the "unload" event.

With my patch, nsDocShell::Destroy calls FinalizeFrameLoader and passes it a runnable. That runnable is also supposed to run before XBL is detached. However, when we run it, we get here:
http://mxr.mozilla.org/mozilla-central/source/dom/base/nsDocument.cpp#7357
That branch is taken because mUpdateNestLevel is 1. So we don't finalize the frameloader just yet. Instead, we finish running script runners, which includes detaching XBL. After we detach XBL (and run any other script runners), we get here:
http://mxr.mozilla.org/mozilla-central/source/dom/base/nsDocument.cpp#4963
That code decrements mUpdateNestLevel and then finalizes the frame loader, which triggers the "unload" event.
Attached patch patch (deleted) — Splinter Review
This patch fixes the problem by using an async message for the revived message. It causes us to receive the revived message after the <browser> has been added back to the document and permanentKey is available again. I'm guessing we were using a sync message to ensure that no data gets thrown away between when the message is sent and when it's received. But the only data comes from messages, which are processed in order, so I think an async message should be fine.

Try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=de8fc269d907
Attachment #8579146 - Flags: review?(ttaubert)
Comment on attachment 8579146 [details] [diff] [review]
patch

Review of attachment 8579146 [details] [diff] [review]:
-----------------------------------------------------------------

Seems fine assuming it passes tests :)
Attachment #8579146 - Flags: review?(ttaubert) → review+
(In reply to Tim Taubert [:ttaubert] from comment #40)
> Seems fine assuming it passes tests :)

It does! Missed the try run.
Try push missed doing ASan, though. Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/8491e8e66dfd

About a 20% intermittent ASan e10s bc1 "browser_crashedTabs.js | This test exceeded the timeout threshold", and also a 100% in both orange and green runs (apparently at least in e10s browser-chrome on ASan, we actually don't care in the least about crashes, bc2 has a permanent crash but all the runs are green) six instances of "SUMMARY: AddressSanitizer: SEGV /builds/slave/m-in-l64-asan-0000000000000000/build/src/js/src/ctypes/CTypes.cpp:2150 js::ctypes::ConvertToJS(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, void*, bool, bool, JS::MutableHandle<JS::Value>)". While the harness seems not to care about that, I find it rather disturbing, so I backed out all three instead of just backing out the manifest change that started the test running on Linux.
The crashes are intentional crashes, so those are okay.

The timeouts don't look too bad either. Given that ASAN probably has to spend a lot of time generating those stack traces, it makes sense that the test will take a while. I'll just increase the timeout.
Actually, now I'm wondering why this test generates ASAN warnings for its intentional crashes while others don't. Is there something we need to do in the test to signal that crashes are expected?
Flags: needinfo?(jmaher)
I think that only applies to mochitests-plain.

I looked around a bit more and it seems that ASAN builds are compiled without the crash reporter. So I added skip-if = !crashreporter to the test annotation to avoid running it on ASAN. That's what we do for browser_thumbnails_bg_crash_during_capture.js, which is a similar test.
Depends on: 1250871
Depends on: 1370015
No longer depends on: 1370015
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: