Closed Bug 1126089 Opened 10 years ago Closed 9 years ago

Investigate allowing messages to be sent from TabChildGlobal's unload event

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla39

Tracking Flags:

Tracking

Status

e10s

m5+

---

firefox39

---

fixed

People

(Reporter: billm, Assigned: billm)

References

Details

Attachments

(5 files, 4 obsolete files)

unload-order 9 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) (deleted), patch	smaug : review-	Details \| Diff \| Splinter Review
rename-mm-disconnect 9 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) (deleted), patch	smaug : review+	Details \| Diff \| Splinter Review
testing-fix 9 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) (deleted), patch	smaug : review+	Details \| Diff \| Splinter Review
Allow messages to be sent after frame script unload event 9 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) (deleted), patch	smaug : review-	Details \| Diff \| Splinter Review
Allow messages to be sent after frame script unload event 9 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) (deleted), patch	smaug : review-	Details \| Diff \| Splinter Review
Add test for pageload event on browser element when tab closed 9 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) (deleted), patch	smaug : review+	Details \| Diff \| Splinter Review
Allow messages to be sent after frame script unload event 9 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) (deleted), patch	smaug : review+	Details \| Diff \| Splinter Review
WIP (to revive crashed tabs) 9 years ago Tim Taubert [:ttaubert] (inactive) (deleted), patch		Details \| Diff \| Splinter Review
patch 9 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) (deleted), patch	ttaubert : review+	Details \| Diff \| Splinter Review

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Description

•

10 years ago

Right now you're not able to send messages from an "unload" event handler for a TabChildGlobal. The message is never received because the message manager for the TabParent has already been disconnected. However, for bug 1109875 it would be really useful to be able to send messages here.

I think me might be able to change TabChild::RecvDestroy to send a reply to the parent after the unload event is dispatched. The parent could then disconnect the message manager at that time.

Tim Taubert [:ttaubert] (inactive)

Updated

•

9 years ago

Blocks: 1109875

Mike Conley (:mconley) (:⚙️)

Comment 1

•

9 years ago

This blocks 1109875, which is an m8. I figure this should probably be an m8 then.

tracking-e10s: --- → m8+

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 2

•

9 years ago

Attached patch unload-order (obsolete) (deleted) — Details — Splinter Review

Here are some of the changes I made:

When we remove a <browser> element from the DOM, we immediately prohibit sending messages using its message manager. We fire the message-manager-close event at this time.

When the child process receives the Destroy message, it fires the unload event on its frame script. Then we destroy the docshell, which fires some more events on the document. After that, we send the __delete__ message to the parent. We're not allowed to send messages after that.

When the parent receives the __delete__ message, it shuts down the message manager and notifies message-manager-disconnect observers.

While the parent is waiting for the __delete__ message, it can dispatch incoming message manager messages. It needs to keep a strong reference to the frameloader and <browser> element during this time to ensure that they're not destroyed.

I've tried to stick to the terminology that "Close" means "sending is no longer allowed" and "Disconnect" means "everything is being torn down, the object is no longer usable."

On interesting question is the order of the frame script unload event versus the final pagehide/unload events on the document. It seems a bit strange that we fire unload on the frame script first. However, the only alternative is to fire it after the docshell is torn down. We have a number of unload handlers that would like to use the docshell, so that seems too late.

Assignee: nobody → wmccloskey

Status: NEW → ASSIGNED

Attachment #8570241 - Flags: review?(bugs)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 3

•

9 years ago

Attached patch rename-mm-disconnect (deleted) — Details — Splinter Review

This patch renames our uses of message-manager-disconnect to message-manager-close, since that seems to be what people want. I'll hold off on getting review for this until Olli approves the naming.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 4

•

9 years ago

Attached patch testing-fix (deleted) — Details — Splinter Review

This patch changes the browser-chrome test harness so that it waits until all tabs have been completely unloaded before running the memory leak detection code.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Updated

•

9 years ago

Blocks: 1130215, 1131313

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 5

•

9 years ago

Might as well bump this to M5 since two other M5s depend on it.

tracking-e10s: m8+ → m5+

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 6

•

9 years ago

So there is the somewhat unusual case where <browser> is removed and then added back to document.
That would lead to a new nsFrameLoader to be created for the <browser>.
So there would be <browser> -> fl2, and fl1-><browser> and fl2-><browser> edges for awhile.
But I think that is ok.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 7

•

9 years ago

Comment on attachment 8570241 [details] [diff] [review]
unload-order

># HG changeset patch
># Parent 600f44fd317c145b4f583b7db9358615fd52c70b
># User Bill McCloskey <wmccloskey@mozilla.com>
>
>diff --git a/dom/base/nsFrameLoader.cpp b/dom/base/nsFrameLoader.cpp
>--- a/dom/base/nsFrameLoader.cpp
>+++ b/dom/base/nsFrameLoader.cpp
>@@ -519,23 +519,84 @@ nsFrameLoader::GetDocShell(nsIDocShell *
> 
>   *aDocShell = mDocShell;
>   NS_IF_ADDREF(*aDocShell);
> 
>   return rv;
> }
> 
> void
>+nsFrameLoader::DestroyComplete()
>+{
>+  // We can get here by removing the frameloader from the DOM (in which case
>+  // nsFrameLoader::Finalize was already called) or if the child process
>+  // crashes.
>+
>+  if (mRemoteBrowser) {
>+    // Drop the strong references created in Finalize.
>+    mOwnerContentStrong = nullptr;
>+    mRemoteBrowser->SetFrameLoader(nullptr);
>+
>+    // Call TabParent::Destroy if we haven't already (in case of a crash).
>+    mRemoteBrowser->SetOwnerElement(nullptr);
>+    mRemoteBrowser->Destroy();
>+    mRemoteBrowser = nullptr;
>+  }
>+
>+  if (mMessageManager) {
>+    mMessageManager->Disconnect();
>+  }
>+
>+  if (mChildMessageManager) {
>+    static_cast<nsInProcessTabChildGlobal*>(mChildMessageManager.get())->Disconnect();
>+  }
>+}
>+
>+void
>+nsFrameLoader::MidDestroy()
>+{
>+  // We go through the event loop twice. The first time is so that runnables
>+  // dispatched when a frame script unloads have a chance to run. The second
>+  // time is so that any messages sent by those runnables have a chance to be
>+  // dispatched through the event loop.
>+  NS_DispatchToCurrentThread(NS_NewRunnableMethod(this, &nsFrameLoader::DestroyComplete));
>+}
>+
>+void
> nsFrameLoader::Finalize()
> {
>+  // This code runs after the frameloader has been detached from the <browser>
>+  // element.
>+
>+  // Ask the TabChild to fire the frame script "unload" event, destroy its
>+  // docshell, and finally destroy the PBrowser actor. This eventually leads to
>+  // nsFrameLoader::DestroyComplete being called.
>+  if (mRemoteBrowser) {
>+    mRemoteBrowser->Destroy();
>+  }
>+
>+  // Fire the "unload" event if we're in-process.
>+  if (mChildMessageManager) {
>+    static_cast<nsInProcessTabChildGlobal*>(mChildMessageManager.get())->FireUnloadEvent();
>+  }
>+
>+  // Destroy the docshell.
>   nsCOMPtr<nsIBaseWindow> base_win(do_QueryInterface(mDocShell));
>   if (base_win) {
>     base_win->Destroy();
>   }
>   mDocShell = nullptr;
>+
>+  if (mChildMessageManager) {
>+    // Stop handling events in the in-process frame script.
>+    static_cast<nsInProcessTabChildGlobal*>(mChildMessageManager.get())->DisconnectEvents();
>+
>+    // Dispatch a runnable that eventually calls nsFrameLoader::DestroyComplete.
>+    NS_DispatchToCurrentThread(NS_NewRunnableMethod(this, &nsFrameLoader::MidDestroy));
>+  }
> }
> 
> static void
> FirePageHideEvent(nsIDocShellTreeItem* aItem,
>                   EventTarget* aChromeEventHandler)
> {
>   nsCOMPtr<nsIDocument> doc = aItem->GetDocument();
>   NS_ASSERTION(doc, "What happened here?");
>@@ -1353,51 +1414,50 @@ nsFrameLoader::SwapWithOtherLoader(nsFra
> 
>   FirePageShowEvent(ourDocshell, ourEventTarget, true);
>   FirePageShowEvent(otherDocshell, otherEventTarget, true);
> 
>   mInSwap = aOther->mInSwap = false;
>   return NS_OK;
> }
> 
>-void
>-nsFrameLoader::DestroyChild()
>-{
>-  if (mRemoteBrowser) {
>-    mRemoteBrowser->SetOwnerElement(nullptr);
>-    mRemoteBrowser->Destroy();
>-    mRemoteBrowser = nullptr;
>-  }
>-}
>-
> NS_IMETHODIMP
> nsFrameLoader::Destroy()
> {
>+  // nsFrameLoader::Destroy is called just before the frameloader is detached
>+  // from the <browser> element.
>+
>   if (mDestroyCalled) {
>     return NS_OK;
>   }
>   mDestroyCalled = true;
> 
>+  // After this point, we return an error when trying to send a message using
>+  // the message manager on the frame.
>   if (mMessageManager) {
>-    mMessageManager->Disconnect();
>+    mMessageManager->Close();
>   }
>-  if (mChildMessageManager) {
>-    static_cast<nsInProcessTabChildGlobal*>(mChildMessageManager.get())->Disconnect();
>+
>+  // Retain references to the <browser> element and the frameloader in case we
>+  // receive any messages from the message manager on the frame. These
>+  // references are dropped in DestroyComplete.
>+  if (mRemoteBrowser) {
>+    mOwnerContentStrong = mOwnerContent;
>+    mRemoteBrowser->SetFrameLoader(this);
Destroy can be called from nsFrameLoader's dtor, so you can't pass pointer to it to be kept alive elsewhere.
Perhaps you need a flag in dtor and use it here and if it is set, call Finalize or something?
> nsFrameMessageManager::AddMessageListener(const nsAString& aMessage,
>-                                          nsIMessageListener* aListener)
>+                                          nsIMessageListener* aListener,
>+                                          bool aListenWhenClosed)
> {
>   nsAutoTObserverArray<nsMessageListenerInfo, 1>* listeners =
>     mListeners.Get(aMessage);
>   if (!listeners) {
>     listeners = new nsAutoTObserverArray<nsMessageListenerInfo, 1>();
>     mListeners.Put(aMessage, listeners);
>   } else {
>     uint32_t len = listeners->Length();
>@@ -315,16 +316,17 @@ nsFrameMessageManager::AddMessageListene
>         return NS_OK;
>       }
>     }
>   }
> 
>   nsMessageListenerInfo* entry = listeners->AppendElement();
>   NS_ENSURE_TRUE(entry, NS_ERROR_OUT_OF_MEMORY);
>   entry->mStrongListener = aListener;
>+  entry->mListenWhenClosed = aListenWhenClosed;

Why do we need this?

>+nsInProcessTabChildGlobal::DisconnectEvents()
Odd method names. Don't know what it means.
I guess you want DisconnectEventListeners

> #ifdef DEBUG
>   if (mOwner) {
>     nsCOMPtr<nsIFrameLoaderOwner> owner = do_QueryInterface(mOwner);
>     nsRefPtr<nsFrameLoader> fl = owner->GetFrameLoader();
>     if (fl) {
>       NS_ASSERTION(this == fl->GetTabChildGlobalAsEventTarget(),
>                    "Wrong event target!");
So this isn't true anymore, if the <browser> is just moved and a new nsFrameLoader created


>       NS_ASSERTION(fl->mMessageManager == mChromeMessageManager,
>                    "Wrong message manager!");
ditto.


I don't actually understand the ownership model in InProcessTabChildGlobal case.
Who is keeping the nsFrameLoader alive? Could you explain?
(remote case is clear)



> already_AddRefed<nsFrameLoader>
> TabParent::GetFrameLoader() const
Would it make sense to add a parameter to tell whether the caller wants to use also the
cached mFrameLoader, not only the one from mFrameElement? 
That way you wouldn't need to have 'if (!frameLoader || mIsDestroyed) {'
everywhere.

Attachment #8570241 - Flags: review?(bugs) → review+

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 8

•

9 years ago

The naming "close" and "disconnect" sounds good to me.

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

9 years ago

Attachment #8570241 - Flags: review+ → review-

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 9

•

9 years ago

(In reply to Olli Pettay [:smaug] from comment #6)
> So there is the somewhat unusual case where <browser> is removed and then
> added back to document.
> That would lead to a new nsFrameLoader to be created for the <browser>.
> So there would be <browser> -> fl2, and fl1-><browser> and fl2-><browser>
> edges for awhile.
> But I think that is ok.

Yes, I worried about this too. I think it's okay though. At worst we would confuse some listeners since msg.target would be the same browser element for both frameloaders. The fact that listeners have to opt into getting messages after the tab is closed makes it much less worrisome to me.

We actually have a test for this too: browserElement_DOMRequestError.js.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 10

•

9 years ago

> Destroy can be called from nsFrameLoader's dtor, so you can't pass pointer to it to be kept
> alive elsewhere.
> Perhaps you need a flag in dtor and use it here and if it is set, call Finalize or something?

Still curious about this, but I'll add the flag.

> >+  entry->mListenWhenClosed = aListenWhenClosed;
> 
> Why do we need this?

I found that lots of browser code fails if we deliver messages after the tab has been closed. The main problem is that the XBL has been detached, so properties stored on the <browser> element via XBL (like the message manager) no longer work. The result is that a lot of message listeners throw exceptions.

We could fix all these listeners, but I think most listeners don't want these late messages anyway. I think it's better to ask people to opt into getting these late messages.

> So this isn't true anymore, if the <browser> is just moved and a new nsFrameLoader created

I might be wrong, but I assumed that we can't attach a new frameloader to the <browser> until nsFrameLoader::Finalize is called (because we don't null out the current frameloader on the <browser> until right after Finalize). Inside nsFrameLoader::Finalize we call nsInProcessTabChildGlobal::DisconnectEventListeners, so I assumed we wouldn't get any more events in nsInProcessTabChildGlobal::PreHandleEvent after that, so this wouldn't be a problem.

> I don't actually understand the ownership model in InProcessTabChildGlobal case.
> Who is keeping the nsFrameLoader alive? Could you explain?

The runnable for MidDestroy and DestroyComplete keep a strong reference to the frameloader. I'll add a comment for this.

> Would it make sense to add a parameter to tell whether the caller wants to use also the
> cached mFrameLoader, not only the one from mFrameElement? 
> That way you wouldn't need to have 'if (!frameLoader || mIsDestroyed) {'
> everywhere.

Sure, I will do that.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 11

•

9 years ago

Attached patch Allow messages to be sent after frame script unload event (obsolete) (deleted) — Details — Splinter Review

Attachment #8570241 - Attachment is obsolete: true

Attachment #8570696 - Flags: review?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 12

•

9 years ago

Comment on attachment 8570696 [details] [diff] [review]
Allow messages to be sent after frame script unload event

>+void
>+nsFrameLoader::MidDestroy()
>+{
>+  // We go through the event loop twice. The first time is so that runnables
>+  // dispatched when a frame script unloads have a chance to run. The second
>+  // time is so that any messages sent by those runnables have a chance to be
>+  // dispatched through the event loop.
>+  NS_DispatchToCurrentThread(NS_NewRunnableMethod(this, &nsFrameLoader::DestroyComplete));
>+}
I don't understand the need for MidDestroy. Feels really odd. Why do we have two levels asynchronousness? Why not 3? Why not 42?
Shouldn't we have a marker somewhere in fmm message queue that once all the messages sent during unload have been process we can call DestroyComplete.


>+++ b/dom/base/nsIMessageManager.idl
>@@ -208,19 +208,26 @@ interface nsIMessageListenerManager : nsISupports
update uuid


>+    void SetFrameLoader(nsFrameLoader* aFrameLoader);
Perhaps call this
void CacheFrameLoader


>     already_AddRefed<nsFrameLoader> GetFrameLoader() const;
>+    already_AddRefed<nsFrameLoader> GetFrameLoaderAfterDestroy() const;
Odd name for the method, since I would expect the method to return null before Destroy, but that is not what the method does.
I would just have GetFrameLoader(bool aUseCachedFrameLoaderAfterDestroy = false)

Attachment #8570696 - Flags: review?(bugs) → review-

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 13

•

9 years ago

Attached patch Allow messages to be sent after frame script unload event (obsolete) (deleted) — Details — Splinter Review

Attachment #8570696 - Attachment is obsolete: true

Attachment #8571756 - Flags: review?(bugs)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 14

•

9 years ago

This isn't exactly what you wanted. The two deviations:

1. I passed a runnable into FinalizeFrameLoader. This makes sense to me because it allows us to use the same Runnable for the mNeedsAsyncDestroy case as for normal destruction. I could switch it back pretty easily.

2. With the changes in this patch, calling Destroy (or StartDestroy or DestroyComplete) from ~nsFrameLoader just seems really broken to me. We can't handle messages correctly if the frameloader is already dead. So I put in the release assert that we've already called Destroy when we run ~nsFrameLoader. It passes on try server. Perhaps we could land this a few days ahead of the full patch to make sure it doesn't get hit by nightly users?

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 15

•

9 years ago

Comment on attachment 8571756 [details] [diff] [review]
Allow messages to be sent after frame script unload event

Could you verify this doesn't break unload event handling for iframe. As far as I see, it shouldn't.
So, load
data:text/html,<iframe>
and then in the web console run
document.body.firstChild.contentWindow.onunload = function(e) { console.log(e.type); }; console.log("will remove"); document.body.removeChild(document.body.firstChild); console.log("did remove");
'unload' should be dispatched between the two other console.logs

>+class nsFrameLoaderDestroyRunnable : public nsRunnable
>+{
>+  enum DestroyPhase
>+  {
>+    // See the implementation of Run for an explanation of these phases.
>+    DESTROY_DOCSHELL,
>+    WAIT_FOR_UNLOAD_MESSAGES,
>+    DESTROY_COMPLETE
>+  };
Please use eFooBar syntax for enums


>+nsFrameLoaderDestroyRunnable::Run()
>+{
>+  switch (mPhase) {
>+  case DESTROY_DOCSHELL:
>+    mFrameLoader->DestroyDocShell();
> 
>-    mDocShell = nullptr;
>-  }
>+    // In the out-of-process case, TabParent will eventually call
>+    // DestroyComplete once it receives a __delete__ message from the child.
What if the child process crashes? We should not still in that case.

>+nsInProcessTabChildGlobal::FireUnloadEvent()
> {
>-  // Let the frame scripts know the child is being closed. We do any other
>-  // cleanup after the event has been fired. See DelayedDisconnect
>-  nsContentUtils::AddScriptRunner(
>-     NS_NewRunnableMethod(this, &nsInProcessTabChildGlobal::DelayedDisconnect)
>-  );
>-}
>+  // We're called from nsDocument::MaybeInitializeFinalizeFrameLoaders, so it
>+  // should be safe to run script.
>+  MOZ_ASSERT(nsContentUtils::IsSafeToRunScript());
> 
>-void
>-nsInProcessTabChildGlobal::DelayedDisconnect()
>-{
>-  // Don't let the event escape
>-  mOwner = nullptr;
>-
>-  // Fire the "unload" event
>+  // Don't let the unload event propagate to chrome event handlers.
>+  mPreventEventsEscaping = true;
>   DOMEventTargetHelper::DispatchTrustedEvent(NS_LITERAL_STRING("unload"));
>+  mPreventEventsEscaping = false;
>+}
Why do we want to reset the value to false? events shouldn't propagate to 
<xul:browser> once we're being destroyed.
So I'd just set mPreventEventsEscaping = true; in DESTROY_DOCSHELL phase.
(you want to check that 'unload' event from content page is propagated to the chrome the same way with and without the patch when a tab is removed.)

mOwner is a raw reference, so you must set it to null at some point (before the element has died).
Otherwise we'll get almost for certain a security bug later.

Attachment #8571756 - Flags: review?(bugs) → review-

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 16

•

9 years ago

Attached patch Add test for pageload event on browser element when tab closed (deleted) — Details — Splinter Review

I did some digging and the existing code *does* fire the pagehide event on the <xul:browser> when a tab is closed. Here's the sequence of events:

1. nsInProcessTabChildGlobal::Disconnect adds a script runner to do the actual disconnect.
2. nsFrameLoader::Destroy calls FinalizeFrameLoader, which adds another script runner to call Finalize on the frame loader.
3. We end up in ~mozAutoDocUpdate, presumably after the <browser> element has been removed from the document.
https://dxr.mozilla.org/mozilla-central/source/dom/base/mozAutoDocUpdate.h#35
4. That calls EndUpdate on the nsDocument, which calls MaybeInitializeFinalizeFrameLoaders:
https://dxr.mozilla.org/mozilla-central/source/dom/base/nsDocument.cpp#4958
5. That calls nsFrameLoader::Finalize, which tears down the docshell and fires the pagehide event.
6. Back in ~mozAutoDocUpdate, we then call RemoveScriptBlocker, which triggers nsInProcessTabChildGlobal::DelayedDisconnect.

In short, we enqueue these delayed runnables in one order but ~mozAutoDocUpdate runs them in the opposite order. The result is that we do fire the pagehide event on the <browser>, but not on anything above it, since it has already been removed from the document.

Here is a test to make sure we don't fail to trigger pagehide.

Attachment #8572287 - Flags: review?(bugs)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 17

•

9 years ago

Also, given comment 16, I was trying to remember why we don't fire pagehide events on frame script sin non-e10s. It's because of this code:
http://mxr.mozilla.org/mozilla-central/source/dom/base/nsInProcessTabChildGlobal.cpp#233
SetChromeEventHandlerInternal resets mParentTarget to null on the nsGlobalWindow. When we try to handle the pagehide event, we try need to recompute mParentTarget, which calls UpdateParentTarget. That can't find the frame script global target because we've already detached the frameloader from the <browser> element.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 18

•

9 years ago

Attached patch Allow messages to be sent after frame script unload event (deleted) — Details — Splinter Review

Attachment #8571756 - Attachment is obsolete: true

Attachment #8572455 - Flags: review?(bugs)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 19

•

9 years ago

Oops, somehow my comment was lost by bz attach. I made the changes you asked for with the enum and nulling out mOwner. I left mPreventEventsEscaping as it was because of comment 16. I added a comment though.

Andrew McCreight [:mccr8]

Comment 20

•

9 years ago

(In reply to Bill McCloskey (:billm) from comment #19)
> Oops, somehow my comment was lost by bz attach.

It looks like you put the comment in the section for things to be added to the patch commit message, not the section for things to be posted as a comment.

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

9 years ago

Attachment #8572287 - Flags: review?(bugs) → review+

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

9 years ago

Attachment #8572455 - Flags: review?(bugs) → review+

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 21

•

9 years ago

Landing just the assertion for now.
https://hg.mozilla.org/integration/mozilla-inbound/rev/fd928bcff5c0

Keywords: leave-open

Carsten Book [:Tomcat]

Comment 22

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/fd928bcff5c0

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 23

•

9 years ago

I don't see any crashes in ~nsFrameLoader, so it sounds safe to land the rest of the patches here.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 24

•

9 years ago

Comment on attachment 8570243 [details] [diff] [review]
testing-fix

The main patch in this bug makes tab closing asynchronous. We get a message-manager-close notification when a tab starts to close and a message-manager-disconnect message when it finishes closing.

This patch makes the test harness wait until all tabs have finished closing before it runs the leak detection code. Otherwise it detects leaks.

Attachment #8570243 - Flags: review?(jmaher)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 25

•

9 years ago

Comment on attachment 8570242 [details] [diff] [review]
rename-mm-disconnect

I converted everything to message-manager-close. I think these clients really just want to know when a <browser> is removed from the DOM, so the close message will be fine for that.

Attachment #8570242 - Flags: review?(bugs)

Joel Maher ( :jmaher ) (UTC -8)

Comment 26

•

9 years ago

Comment on attachment 8570243 [details] [diff] [review]
testing-fix

Review of attachment 8570243 [details] [diff] [review]:
-----------------------------------------------------------------

I really don't feel comfortable reviewing this one.  The code here I don't understand.  Is there another reviewer for this, or should I spend some more time trying to understand this in detail?

Attachment #8570243 - Flags: review?(jmaher)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 27

•

9 years ago

Comment on attachment 8570243 [details] [diff] [review]
testing-fix

OK, I'll give this one to Olli too.

Attachment #8570243 - Flags: review?(bugs)

Tim Taubert [:ttaubert] (inactive)

Comment 28

•

9 years ago

(In reply to Joel Maher (:jmaher) from comment #26)
> I really don't feel comfortable reviewing this one.  The code here I don't
> understand.  Is there another reviewer for this, or should I spend some more
> time trying to understand this in detail?

Just my two cents from working on the leak detection code (without being able to r+ anything in testing/): This looks good to me and should indeed do what it's supposed to.

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

9 years ago

Attachment #8570243 - Flags: review?(bugs) → review+

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 29

•

9 years ago

Comment on attachment 8570242 [details] [diff] [review]
rename-mm-disconnect

I filed Bug 1142489

Attachment #8570242 - Flags: review?(bugs) → review+

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 30

•

9 years ago

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/7d9a91ee3d48
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/f63a2cf3fa11

Ryan VanderMeulen [:RyanVM]

Comment 32

•

9 years ago

Backed out for mochitest-bc failures.
https://hg.mozilla.org/integration/mozilla-inbound/rev/76aab45b3300

https://treeherder.mozilla.org/logviewer.html#?job_id=7593014&repo=mozilla-inbound

Tim Taubert [:ttaubert] (inactive)

Comment 33

•

9 years ago

Attached patch WIP (to revive crashed tabs) (obsolete) (deleted) — Details — Splinter Review

Had a suspicion what's causing the test failure. We can't use the child process message manager to revive crashed tabs any longer, fixing that to use the frame message manager would have been a follow-up anyway.

The problem I'm hitting now are two more test failures:

TEST-UNEXPECTED-FAIL | browser/components/sessionstore/test/browser_393716.js | also duplicated text data - Got , expected Another value: 1426274018440
TEST-UNEXPECTED-FAIL | browser/components/sessionstore/test/browser_scrollPositions.js | force-reload resets scroll positions - Got {"scroll":"105,398"}, expected null

For at least the second test it looks like some messages don't arrive for whatever reason. As soon as I set addMessageListener()'s third parameter back to false the test works again.

Tim Taubert [:ttaubert] (inactive)

Comment 35

•

9 years ago

Comment on attachment 8577404 [details] [diff] [review]
WIP (to revive crashed tabs)

Eh, this should probably be a follow-up.

Attachment #8577404 - Attachment is obsolete: true

Tim Taubert [:ttaubert] (inactive)

Comment 36

•

9 years ago

Tried to find out what was actually causing the problem with browser_crashedTabs.js. The browser isn't marked as "revived" by sessionstore because somehow the permanentKey seems wrong. If I use the <browser> element itself and put that in the WeakSet the test succeeds.

Tim Taubert [:ttaubert] (inactive)

Comment 37

•

9 years ago

aMessage.objects.browser.permanentKey is undefined now when we receive the "RemoveTabRevived" message.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 38

•

9 years ago

I poked at this more and I want to write down what I found while it's still fresh in my head. This stuff is really complex!

First, I think comment 16 is all wrong. We run DelayedDisconnect before nsFrameLoader::Finalize, which is the order they're enqueued in. I think the reason we still run pagehide on the <browser> element is the SetChromeEventHandler call in DelayedDisconnect. The next time an event runs, it causes nsGlobalWindow::UpdateParentTarget to run. If that function can't find a TabChild to use, it default to mChromeEventHandler, which should still be the <browser> element:
http://mxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#3008

Now here's why the test failure happens:
Without my patch, the XBL binding for the <browser> element is detached after the "unload" event for frame script runs. With my patch, XBL is detached before, so the "unload" event handler can't use XBL.

Longer explanation (mostly for me):
The code to detach XBL is enqueued as a script runner here:
http://mxr.mozilla.org/mozilla-central/source/dom/base/Element.cpp#1695
That happens after nsDocShell::Destroy.

Without my patch, nsDocShell::Destroy calls Disconnect on the nsInProcessTabChildGlobal, which enqueues DelayedDisconnect. So DelayedDisconnect will run before XBL is detached. And DelayedDisconnect runs the "unload" event.

With my patch, nsDocShell::Destroy calls FinalizeFrameLoader and passes it a runnable. That runnable is also supposed to run before XBL is detached. However, when we run it, we get here:
http://mxr.mozilla.org/mozilla-central/source/dom/base/nsDocument.cpp#7357
That branch is taken because mUpdateNestLevel is 1. So we don't finalize the frameloader just yet. Instead, we finish running script runners, which includes detaching XBL. After we detach XBL (and run any other script runners), we get here:
http://mxr.mozilla.org/mozilla-central/source/dom/base/nsDocument.cpp#4963
That code decrements mUpdateNestLevel and then finalizes the frame loader, which triggers the "unload" event.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 39

•

9 years ago

Attached patch patch (deleted) — Details — Splinter Review

This patch fixes the problem by using an async message for the revived message. It causes us to receive the revived message after the <browser> has been added back to the document and permanentKey is available again. I'm guessing we were using a sync message to ensure that no data gets thrown away between when the message is sent and when it's received. But the only data comes from messages, which are processed in order, so I think an async message should be fine.

Try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=de8fc269d907

Attachment #8579146 - Flags: review?(ttaubert)

Tim Taubert [:ttaubert] (inactive)

Comment 40

•

9 years ago

Comment on attachment 8579146 [details] [diff] [review]
patch

Review of attachment 8579146 [details] [diff] [review]:
-----------------------------------------------------------------

Seems fine assuming it passes tests :)

Attachment #8579146 - Flags: review?(ttaubert) → review+

Tim Taubert [:ttaubert] (inactive)

Comment 41

•

9 years ago

(In reply to Tim Taubert [:ttaubert] from comment #40)
> Seems fine assuming it passes tests :)

It does! Missed the try run.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 42

•

9 years ago

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/3bcebfd13538
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/288efd5e2ca0
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/9af9111e9c27

Keywords: leave-open

Phil Ringnalda (:philor)

Comment 43

•

9 years ago

Try push missed doing ASan, though. Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/8491e8e66dfd

About a 20% intermittent ASan e10s bc1 "browser_crashedTabs.js | This test exceeded the timeout threshold", and also a 100% in both orange and green runs (apparently at least in e10s browser-chrome on ASan, we actually don't care in the least about crashes, bc2 has a permanent crash but all the runs are green) six instances of "SUMMARY: AddressSanitizer: SEGV /builds/slave/m-in-l64-asan-0000000000000000/build/src/js/src/ctypes/CTypes.cpp:2150 js::ctypes::ConvertToJS(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, void*, bool, bool, JS::MutableHandle<JS::Value>)". While the harness seems not to care about that, I find it rather disturbing, so I backed out all three instead of just backing out the manifest change that started the test running on Linux.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 44

•

9 years ago

The crashes are intentional crashes, so those are okay.

The timeouts don't look too bad either. Given that ASAN probably has to spend a lot of time generating those stack traces, it makes sense that the test will take a while. I'll just increase the timeout.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 45

•

9 years ago

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/ce1ac10eae4a
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/ca6d6b665bc1
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/8cadd6b561b3

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 46

•

9 years ago

Actually, now I'm wondering why this test generates ASAN warnings for its intentional crashes while others don't. Is there something we need to do in the test to signal that crashes are expected?

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 47

•

9 years ago

https://dxr.mozilla.org/mozilla-central/search?q=_expectingProcessCrash&case=true :)

Flags: needinfo?(jmaher)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 48

•

9 years ago

I think that only applies to mochitests-plain.

I looked around a bit more and it seems that ASAN builds are compiled without the crash reporter. So I added skip-if = !crashreporter to the test annotation to avoid running it on ASAN. That's what we do for browser_thumbnails_bg_crash_during_capture.js, which is a similar test.

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 49

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/ce1ac10eae4a
https://hg.mozilla.org/mozilla-central/rev/ca6d6b665bc1
https://hg.mozilla.org/mozilla-central/rev/8cadd6b561b3
https://hg.mozilla.org/mozilla-central/rev/ce3fea0cb25b

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

status-firefox39: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla39

Tim Taubert [:ttaubert] (inactive)

Updated

•

9 years ago

Depends on: 1145942

Mike de Boer [:mikedeboer]

Updated

•

8 years ago

Depends on: 1250871

Mats Palmgren (inactive)

Updated

•

7 years ago

Depends on: 1370015

Mats Palmgren (inactive)

Updated

•

7 years ago

No longer depends on: 1370015

You need to log in before you can comment on or make changes to this bug.