Leak from MessagePort::UpdateMustKeepAlive()
Categories
(Core :: DOM: Core & HTML, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox66 | --- | wontfix |
firefox67 | --- | fix-optional |
firefox68 | --- | affected |
People
(Reporter: tsmith, Unassigned)
References
(Blocks 3 open bugs)
Details
(Keywords: memory-leak, testcase, Whiteboard: [MemShrink:P2])
Attachments
(6 files)
Reporter | ||
Comment 1•6 years ago
|
||
Reporter | ||
Comment 2•6 years ago
|
||
Updated•6 years ago
|
Comment 3•6 years ago
|
||
Reporter | ||
Comment 4•6 years ago
|
||
Reporter | ||
Updated•6 years ago
|
Comment 5•6 years ago
|
||
Hi Andrew, could you please take a look at the full output on comment 4 and see if it still looks like the JS engine leaking? Thanks!
Comment 6•6 years ago
|
||
Oops, I actually looked over the full output, but then I forgot to comment here. Anyways, it just looks like a regular DOM leak, so I think DOM is the right place for this bug.
The test case involves MessageChannel, and I think I recall some possible issues with that causing leaks, so maybe that's at fault.
Updated•6 years ago
|
Comment 7•6 years ago
|
||
MessageChannel/MessagePort do not use JS slots, and I don't see these 2 classes in the leak-Sanitizier output.
Andrew, I think your first idea of JS leaking slots seems reasonable.
Comment 8•6 years ago
|
||
There's some DOM stuff in the output. I'll try running the test case in a debug build. The BloatLog is easier to interpret for big leaks like this.
Comment 9•6 years ago
|
||
I looked at the BloatView. There is a document and window leaking, and 2 MessagePorts, but no MessageChannel. I'll see what the CC logs say.
Updated•6 years ago
|
Updated•6 years ago
|
Comment 10•6 years ago
|
||
To investigate this leak, I first looked at the cycle collector log. The DOM window is being held alive by a MessagePort.
0x7ffb2187a400 [MessagePort file:///home/amccreight/tests/1517667.html]
Root 0x7ffb2187a400 is a ref counted object with 1 unknown edge(s).
known edges:
0x7ffb2187a640 [MessagePort file:///home/amccreight/tests/1517667.html] --[mUnshippedEntangledPort]--> 0x7ffb2187a400
0x7ffb216d1400 [JS Object (MessagePort)] --[UnwrapDOMObject(obj)]--> 0x7ffb2187a400
There is a known reference from the reflector, another from the other MessagePort via the mUnshippedEntangledPort field, and one unknown reference.
Then I used DMD heap scan mode to find all references to one of the leaking MessagePorts in memory. There are only the two strong references known to the CC.
I then looked at the refcount log for this MessagePort, in an unoptimized build to get better stacks. I have attached an annotated version of this log. I also removed some of the stack frames that didn't seem useful.
Anyways, it looks like this message port is leaking because MessagePort::UpdateMustKeepAlive() calls AddRef() on the object, but Release() is never called. I looked at the refcount log for the other MessagePort, and it is basically the same thing, except that the other port has no reflector.
Comment 11•6 years ago
|
||
I think I've gotten as far as I can without knowing anything about MessagePort. Could you take a look again, Baku? Thanks.
Comment 12•6 years ago
|
||
I've debugged this issue a bit. What happens is this:
When the form is submitted, we free the inner objects:
#0 0x00007fa565b62248 in nsGlobalWindowInner::FreeInnerObjects(bool) (this=0x55e7d8e33c00, aForDocumentOpen=false) at /home/baku/Sources/m/barfoo/src/dom/base/nsGlobalWindowInner.cpp:1091
...
#30 0x00007fa56b3f3f14 in nsDocShell::OnLinkClickSync(nsIContent*, nsIURI*, nsTSubstring<char16_t> const&, nsTSubstring<char16_t> const&, nsIInputStream*, nsIInputStream*, bool, nsIDocShell**, nsIRequest**, bool, nsIPrincipal*) (this=0x55e7d94c6660, aContent=0x55e7d8d9d080, aURI=0x55e7d942fb70, aTargetSpec=..., aFileName=..., aPostDataStream=0x0, aHeadersDataStream=0x0, aNoOpenerImplied=false, aDocShell=0x7ffe912777c0, aRequest=0x55e7d8d9d1c0, aIsUserTriggered=false, aTriggeringPrincipal=0x0) at /home/baku/Sources/m/barfoo/src/docshell/base/nsDocShell.cpp:12654
...
#32 0x00007fa5678ba183 in mozilla::dom::HTMLFormElement::SubmitSubmission(mozilla::dom::HTMLFormSubmission*) (this=0x55e7d8d9d080, aFormSubmission=0x55e7d9433490)
at /home/baku/Sources/m/barfoo/src/dom/html/HTMLFormElement.cpp:710
But then we create the MessagePort, which creates a loop with the entangled port:
#0 0x00007fa5687253e8 in mozilla::dom::MessagePort::UnshippedEntangle(mozilla::dom::MessagePort*) (this=0x55e7d95a71d0, aEntangledPort=0x55e7d8bfa7f0) at /home/baku/Sources/m/barfoo/src/dom/messagechannel/MessagePort.cpp:233
#1 0x00007fa568725213 in mozilla::dom::MessageChannel::Constructor(nsIGlobalObject*, mozilla::ErrorResult&) (aGlobal=0x55e7d8e33cf8, aRv=...) at /home/baku/Sources/m/barfoo/src/dom/messagechannel/MessageChannel.cpp:76
Note that CheckInnerWindowCorrectness() returns NS_OK because the MessagePort's window (globalObject) is the current inner window.
The problem is that, after this point, FreeInnerObjects() is not called again and the window is leaked.
I suspect that this problem can be reproduced with many other DOM APIs. In particular, anything that is released in FreeInnerObjects:
Smaug, should FreeInnerObjects() be called (also) at the end of the current JS execution instead?
Comment 13•6 years ago
|
||
Why should FreeInnerObjects be called the end of current JS execution?
Creating object should be work, I think, since throwing might cause unexpected issues. But worth to test what other browsers do. But perhaps the created object should be somehow dummy if the owner global is being destroyed.
Comment 14•6 years ago
|
||
Why should FreeInnerObjects be called the end of current JS execution?
Right now, FreeInnerObjects() is called when JS executes form.submit(), but more things happen after that. New objects are created, localStorage and indexedDB can be used for instance. What's the point of this cleanup: https://searchfox.org/mozilla-central/rev/4587d146681b16ff9878a6fdcba53b04f76abe1d/dom/base/nsGlobalWindowInner.cpp#1246-1282 ... if all of that can be re-initialize immediately after?
Creating object should be work, I think, since throwing might cause unexpected issues. But worth to test what other browsers do. But perhaps the created object should be somehow dummy if the owner global is being destroyed.
I don't think we should move this extra complexity in any possible object.
Btw, this problem is extremely similar to the worker shutdown. Maybe we can try to find a way to unify the 2 paths (long term project).
Comment 15•6 years ago
|
||
(In reply to Andrea Marchesini [:baku] from comment #14)
Why should FreeInnerObjects be called the end of current JS execution?
Right now, FreeInnerObjects() is called when JS executes form.submit(), but
more things happen after that. New objects are created, localStorage and
indexedDB can be used for instance. What's the point of this cleanup:
https://searchfox.org/mozilla-central/rev/
4587d146681b16ff9878a6fdcba53b04f76abe1d/dom/base/nsGlobalWindowInner.
cpp#1246-1282 ... if all of that can be re-initialize immediately after?
But at which point would you call FreeInnerObjects. In principal cycle collector should be able to unlink everything, not some explicit "FreeFooBar" call
Comment 16•6 years ago
|
||
But at which point would you call FreeInnerObjects. In principal cycle collector should be able to unlink everything, not some explicit "FreeFooBar" call
This is right. But FreeInnerObjects() is what calls DisconnectEventTargetObjects():
https://searchfox.org/mozilla-central/rev/b10ae6b7a50d176a813900cbe9dc18c85acd604b/dom/base/nsGlobalWindowInner.cpp#1216
And in DisconnectEventTargetObjects() we execute DisconnectFromOwner():
https://searchfox.org/mozilla-central/rev/b10ae6b7a50d176a813900cbe9dc18c85acd604b/dom/base/nsIGlobalObject.cpp#158-160
Many objects relay on ::DisconnectFromOwner() to break circular references.
It seems to me that actually the issue is in how we execute the form.submit(). This should be async:
https://html.spec.whatwg.org/#submit-mutate-action
"[...] Plan to navigate to parsed action."
https://html.spec.whatwg.org/#plan-to-navigate
"Each form element has a planned navigation, which is either null or a task; when the form is first created, its planned navigation must be set to null. In the behaviors described below, when the user agent is required to plan to navigate to a particular resource destination, it must run the following steps:
[...]
If the form has a non-null planned navigation, remove it from its task queue.
Let the form's planned navigation be a new task that consists of running the following steps:
Let the form's planned navigation be null.
Navigate target browsing context to destination. If replace is true, then target browsing context must be navigated with replacement enabled.
[...]
Queue the task that is the form's new planned navigation."
Comment 17•6 years ago
|
||
This is what I wrote so far. It makes the form submission async, as the spec says. But it introduces an extra about:blank load event and that breaks a few tests. Maybe somebody has time to continue this patch.
Comment 18•6 years ago
|
||
I might be able to take it after bug 1518442.
Assignee | ||
Updated•6 years ago
|
Comment 20•5 years ago
|
||
Andrea and I looked at this a bit today. What's happening is that the form submit is trying to find a window target named "x", there isn't one, so we go to open a new window/tab. That calls into the window provider machinery, which spins the event loop before returning a window. That event loop spin ends up processing the timeout and doing window.close(), which tears down the window; then we unwind and create the MessageChannel.
So a simpler testcase might be:
<iframe></iframe>
<script>
var win = frames[0];
win.frameElement.remove();
var chan = new win.MessageChannel();
</script>
or something along those lines.
I filed bug 1552505 on the event loop spinning thing.
Comment 21•5 years ago
|
||
This has been fixed by bug 1552751.
Description
•