Closed
Bug 1238707
Opened 9 years ago
Closed 9 years ago
[e10s] Frequent shutdown leak default process: 2482603 bytes leaked (AnimationTimeline, AsyncLatencyLogger, AsyncStatement, AtomImpl, BackstagePass, ...) on Win7 and OSX
Categories
(Core :: DOM: Service Workers, defect)
Core
DOM: Service Workers
Tracking
()
RESOLVED
FIXED
mozilla48
People
(Reporter: jgriffin, Assigned: bkelly)
References
(Blocks 2 open bugs)
Details
(Keywords: memory-leak, Whiteboard: [e10s-orangeblockers] btpp-followup-2016-03-04)
Attachments
(2 files, 1 obsolete file)
(deleted),
patch
|
smaug
:
review+
ritu
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
RyanVM
:
review+
|
Details | Diff | Splinter Review |
On Windows 7 debug, mochitest-plain chunk 4 in e10s mode frequently suffers from a shutdown leak, default process: 2482603 bytes leaked (AnimationTimeline, AsyncLatencyLogger, AsyncStatement, AtomImpl, BackstagePass, ...)
example log: https://treeherder.mozilla.org/logviewer.html#?job_id=15286412&repo=try
This is similar to but distinct from bug 1186675.
Updated•9 years ago
|
Flags: needinfo?(mrbkap)
I think I get this message when this happens:
WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME. FIX THIS!
Comment 2•9 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #1)
> I think I get this message when this happens:
> WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything
> alive inside it, that is) AT JS_ShutDown TIME. FIX THIS!
That's just a symptom of a huge leak like this. If you leak tons of pages, then you can end up not shutting down the JS engine correctly, which then prints out this message.
Comment 3•9 years ago
|
||
I think RyanVM already filed something for this, and bkelly has been needinfo'd on that.
Flags: needinfo?(mrbkap)
Updated•9 years ago
|
Summary: [e10s] Frequent shutdown leak default process: 2482603 bytes leaked (AnimationTimeline, AsyncLatencyLogger, AsyncStatement, AtomImpl, BackstagePass, ...) → [e10s] Frequent shutdown leak default process: 2482603 bytes leaked (AnimationTimeline, AsyncLatencyLogger, AsyncStatement, AtomImpl, BackstagePass, ...) on Win7 and OSX
Updated•9 years ago
|
Updated•9 years ago
|
Whiteboard: btpp-followup-2016-03-04
Assignee | ||
Comment 6•9 years ago
|
||
Digging out from PTO and my windows machine is in a box at the moment. I hope to get that setup over the weekend so I can look at these leak bugs next week.
Assignee | ||
Comment 7•9 years ago
|
||
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Flags: needinfo?(bkelly)
Updated•9 years ago
|
Comment 8•9 years ago
|
||
FWIW, there's a try run from kats from last week that shows this issue:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9cd7ed658735&selectedJob=17714419
Comment 9•9 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #7)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=8add5c2a0841
FYI, you left a global Windows debug e10s skip-if in the top of the manifest.
Comment 10•9 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #8)
> FWIW, there's a try run from kats from last week that shows this issue:
You also might like the Ash branch, where I've got e10s tests running across all desktop platforms now. Plan is to keep it in sync with m-c, but haven't gotten that part automated yet.
https://treeherder.mozilla.org/#/jobs?repo=ash
Assignee | ||
Comment 11•9 years ago
|
||
There are known issues with http cache shutdown at the moment. I'm inclined to wait for bug 1251130 to be fixed before spending time here. I don't see any ServiceWorker objects in the leaked list, but I do see http cache objects.
Depends on: 1251130
Assignee | ||
Comment 12•9 years ago
|
||
I guess the dates don't make sense there.
No longer depends on: 1251130
Comment 13•9 years ago
|
||
I'll look at this locally and do a basic CC log analysis.
Updated•9 years ago
|
tracking-e10s:
--- → +
Comment 14•9 years ago
|
||
Windows debug serviceworker tests were disabled across the board due to this bug. Interestingly, they never got disabled on OSX and appear to be running green on Ash. Might be worth a fresh Try push to see where we stand with them re-enabled on Windows too.
Whiteboard: btpp-followup-2016-03-04 → [e10s-orangeblockers] btpp-followup-2016-03-04
Comment 15•9 years ago
|
||
Yeah, I ran the test suite a few times locally on OSX and I wasn't able to reproduce the leak. I could look at CC logs from a Windows run if somebody generates them.
Comment 16•9 years ago
|
||
This will prevent us from running debug sw test suites on a number of platforms. Ben, can we get this assigned to someone and looked at?
Flags: needinfo?(bkelly)
Assignee | ||
Comment 17•9 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #16)
> This will prevent us from running debug sw test suites on a number of
> platforms. Ben, can we get this assigned to someone and looked at?
This is a major pain to work on when the whole platform is disabled in automation. It would be helpful if we could enable the platform with the tests disabled. We can then investigate the tests without fighting automation as well.
Flags: needinfo?(bkelly)
Reporter | ||
Comment 18•9 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #17)
> (In reply to Jim Mathies [:jimm] from comment #16)
> > This will prevent us from running debug sw test suites on a number of
> > platforms. Ben, can we get this assigned to someone and looked at?
>
> This is a major pain to work on when the whole platform is disabled in
> automation. It would be helpful if we could enable the platform with the
> tests disabled. We can then investigate the tests without fighting
> automation as well.
These suites are running and enabled on ash; feel free to work on debugging there.
Comment 19•9 years ago
|
||
Here's a Try push log with CC shutdown logging enabled. Sorry, Treeherder lost track of the jobs, but the log is still there.
http://archive.mozilla.org/pub/firefox/try-builds/ryanvm@gmail.com-be73f8e02191cbc098deb8fadcc2e617fb3d7bd3/try-win32-debug/try_win7-all-debug_test-mochitest-4-bm110-tests1-windows-build454.txt.gz
I believe that the cc-edges.2868 logs are the ones you want.
Flags: needinfo?(continuation)
Comment 20•9 years ago
|
||
These logs are weird. There are 7 CC logs. The first 5 don't have anything in the CC graph.
Anyways, all the stuff in those last two CC logs that I see that is being held alive from a single nsXPCWrappedJS (nsIBrowserDOMWindow), which has 1 missing reference. As far as I can see, there are only two places that hold nsIBrowserDOMWindow alive from C++. One is nsGlobalWindow, which is properly traced so that can't be it. The other is TabParent.
I also see some code related to nsIBrowserDOMWindow in OpenWindowRunnable::OpenWindow(), which is a service worker class, so maybe that is related? The test case is leaking for nsRunnable, so perhaps those are OpenWindowRunnable.
Flags: needinfo?(continuation)
Comment 21•9 years ago
|
||
Hmm, didn't we have some case where js is keeping nsITabParent alive too long. That would lead to
keeping mBrowserDOMWindow alive too long.
Looks like destroying TabParent is too messy atm.
The sanest please to clear the member variable is probably in
TabParent::Destroy() *before* if (mIsDestroyed) { check, so that we clear it even if the child process
has crashed and mIsDestroyed already set.
Assignee | ||
Comment 22•9 years ago
|
||
This implements Olli's suggestion from comment 21. Ryan, can you see if this helps in automation?
Attachment #8735503 -
Flags: feedback?(ryanvm)
Assignee | ||
Comment 23•9 years ago
|
||
Here's a try to see if it breaks any other existing tests:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=47c531053883
Assignee | ||
Comment 24•9 years ago
|
||
I think try is showing some issues with this patch.
Assignee | ||
Comment 25•9 years ago
|
||
It seems with this patch we are consistently hitting this on linux debug in M-e10s(2):
12:42:20 INFO - [Parent 4238] WARNING: '!browserDOMWin', file /builds/slave/try-l64-d-00000000000000000000/build/src/dom/ipc/ContentParent.cpp, line 5424
12:42:20 INFO - [Child 4296] ###!!! ASSERTION: attempted to open a new window with no WindowCreator: 'mWindowCreator', file /builds/slave/try-l64-d-00000000000000000000/build/src/embedding/components/windowwatcher/nsWindowWatcher.cpp, line 792
12:43:06 INFO - #01: nsWindowWatcher::OpenWindow2 [embedding/components/windowwatcher/nsWindowWatcher.cpp:453]
12:43:06 INFO - #02: nsGlobalWindow::OpenInternal [dom/base/nsGlobalWindow.cpp:11560]
12:43:06 INFO - #03: nsGlobalWindow::OpenNoNavigate [dom/base/nsGlobalWindow.cpp:7847]
12:43:06 INFO - #04: nsDocShell::InternalLoad [xpcom/string/nsTSubstring.h:95]
12:43:06 INFO - #05: nsDocShell::OnLinkClickSync [xpcom/string/nsTSubstring.h:95]
12:43:06 INFO - #06: OnLinkClickEvent::Run [docshell/base/nsDocShell.cpp:13496]
12:43:06 INFO - #07: nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:994]
12:43:06 INFO - #08: NS_ProcessNextEvent [xpcom/glue/nsThreadUtils.cpp:297]
12:43:06 INFO - #09: mozilla::ipc::MessagePump::Run [ipc/glue/MessagePump.cpp:98]
12:43:06 INFO - #10: MessageLoop::RunInternal [ipc/chromium/src/base/message_loop.cc:235]
12:43:06 INFO - #11: MessageLoop::Run [ipc/chromium/src/base/message_loop.cc:520]
12:43:06 INFO - #12: nsBaseAppShell::Run [widget/nsBaseAppShell.cpp:158]
12:43:06 INFO - #13: XRE_RunAppShell [toolkit/xre/nsEmbedFunctions.cpp:801]
12:43:06 INFO - #14: mozilla::ipc::MessagePumpForChildProcess::Run [ipc/glue/MessagePump.cpp:263]
12:43:06 INFO - #15: MessageLoop::RunInternal [ipc/chromium/src/base/message_loop.cc:235]
12:43:06 INFO - #16: MessageLoop::Run [ipc/chromium/src/base/message_loop.cc:520]
12:43:06 INFO - #17: XRE_InitChildProcess [toolkit/xre/nsEmbedFunctions.cpp:641]
12:43:06 INFO - #18: content_process_main [ipc/contentproc/plugin-container.cpp:240]
12:43:06 INFO - #19: libc.so.6 + 0x2176d
12:43:06 INFO - #20: _start
Comment 26•9 years ago
|
||
Comment on attachment 8735503 [details] [diff] [review]
Release the window immediately in TabParent::Destroy() to avoid leaks. r=smaug
I have good news and bad news. The good news is that it indeed appears to fix the leak-the-world situation in the serviceworkers tests!
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ae444245de83&group_state=expanded
The bad news is that it appears there's another cache-related leak lurking behind those:
https://treeherder.mozilla.org/logviewer.html#?job_id=18936889&repo=try
I ran another push with leak stack walking enabled, and the leaked CacheEntry has the following stack:
19:00:51 INFO - Serial Numbers of Leaked Objects:
19:00:51 INFO - 123 @1D2FB9D0 (1 references)
19:00:51 INFO - allocation stack:
19:01:06 INFO - #00: mozilla::net::CacheEntry::CacheEntry(nsACString_internal const &,nsIURI *,nsACString_internal const &,bool,bool,bool) [netwerk/cache2/CacheEntry.cpp:225]
19:01:06 INFO - #01: mozilla::net::CacheStorageService::AddStorageEntry(nsACString_internal const &,nsIURI *,nsACString_internal const &,bool,bool,bool,bool,mozilla::net::CacheEntryHandle * *) [netwerk/cache2/CacheStorageService.cpp:1493]
19:01:06 INFO - #02: mozilla::net::CacheStorageService::AddStorageEntry(mozilla::net::CacheStorage const *,nsIURI *,nsACString_internal const &,bool,mozilla::net::CacheEntryHandle * *) [netwerk/cache2/CacheStorageService.cpp:1411]
19:01:06 INFO - #03: mozilla::net::CacheStorage::OpenTruncate(nsIURI *,nsACString_internal const &,nsICacheEntry * *) [netwerk/cache2/CacheStorage.cpp:131]
19:01:06 INFO - #04: mozilla::net::nsHttpChannel::OpenCacheEntry(bool) [netwerk/protocol/http/nsHttpChannel.cpp:3110]
19:01:06 INFO - #05: mozilla::net::nsHttpChannel::Connect() [netwerk/protocol/http/nsHttpChannel.cpp:380]
19:01:06 INFO - #06: mozilla::net::nsHttpChannel::ContinueBeginConnectWithResult() [netwerk/protocol/http/nsHttpChannel.cpp:5560]
19:01:06 INFO - #07: mozilla::net::nsHttpChannel::BeginConnect() [netwerk/protocol/http/nsHttpChannel.cpp:5463]
19:01:06 INFO - #08: mozilla::net::nsHttpChannel::OnProxyAvailable(nsICancelable *,nsIChannel *,nsIProxyInfo *,nsresult) [netwerk/protocol/http/nsHttpChannel.cpp:5630]
19:01:06 INFO - #09: nsAsyncResolveRequest::DoCallback() [netwerk/base/nsProtocolProxyService.cpp:253]
19:01:06 INFO - #10: nsAsyncResolveRequest::OnQueryComplete(nsresult,nsCString const &,nsCString const &) [netwerk/base/nsProtocolProxyService.cpp:214]
19:01:06 INFO - #11: ExecuteCallback::Run() [netwerk/base/nsPACMan.cpp:87]
19:01:06 INFO - #12: nsThread::ProcessNextEvent(bool,bool *) [xpcom/threads/nsThread.cpp:994]
19:01:06 INFO - #13: NS_ProcessNextEvent(nsIThread *,bool) [xpcom/glue/nsThreadUtils.cpp:297]
19:01:06 INFO - #14: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) [ipc/glue/MessagePump.cpp:98]
19:01:06 INFO - #15: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:236]
19:01:06 INFO - #16: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:230]
19:01:06 INFO - #17: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:204]
19:01:06 INFO - #18: nsBaseAppShell::Run() [widget/nsBaseAppShell.cpp:158]
19:01:06 INFO - #19: nsAppShell::Run() [widget/windows/nsAppShell.cpp:264]
19:01:06 INFO - #20: nsAppStartup::Run() [toolkit/components/startup/nsAppStartup.cpp:282]
19:01:06 INFO - #21: XREMain::XRE_mainRun() [toolkit/xre/nsAppRunner.cpp:4342]
19:01:06 INFO - #22: XREMain::XRE_main(int,char * * const,nsXREAppData const *) [toolkit/xre/nsAppRunner.cpp:4439]
19:01:06 INFO - #23: XRE_main [toolkit/xre/nsAppRunner.cpp:4545]
19:01:06 INFO - #24: do_main [browser/app/nsBrowserApp.cpp:220]
19:01:06 INFO - #25: NS_internal_main(int,char * *,char * *) [browser/app/nsBrowserApp.cpp:360]
19:01:06 INFO - #26: wmain [toolkit/xre/nsWindowsWMain.cpp:138]
19:01:06 INFO - #27: __scrt_common_main_seh [f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:264]
19:01:06 INFO - #28: kernel32.dll + 0x53c45
19:01:06 INFO - #29: ntdll.dll + 0x637f5
19:01:06 INFO - #30: ntdll.dll + 0x637c8
19:00:51 INFO - Leaked URLs:
19:00:51 INFO - http://mochi.test:8888/tests/dom/workers/test/serviceworkers/fetch/synthesized-redirect-twice-real-file.txt
19:00:51 INFO - http://mochi.test:8888/tests/dom/workers/test/serviceworkers/fetch/index.html
19:00:51 INFO - http://mochi.test:8888/tests/dom/workers/test/serviceworkers/fetch/fetch_worker_script.js
19:00:51 INFO - http://mochi.test:8888/tests/dom/workers/test/serviceworkers/fetch/fetch_worker_script.js
19:00:51 INFO - http://mochi.test:8888/tests/dom/workers/test/serviceworkers/synthesized-redirect-real-file.txt
19:00:51 INFO - http://mochi.test:8888/tests/dom/workers/test/serviceworkers/fetch/synthesized-redirect-twice-real-file.txt
19:00:51 INFO - http://mochi.test:8888/tests/dom/workers/test/serviceworkers/fetch/fetch_worker_script.js
19:00:51 INFO - http://mochi.test:8888/tests/dom/workers/test/serviceworkers/fetch/synthesized-redirect-twice-real-file.txt
Attachment #8735503 -
Flags: feedback?(ryanvm) → feedback+
Assignee | ||
Updated•9 years ago
|
Flags: needinfo?(bkelly)
Assignee | ||
Comment 27•9 years ago
|
||
Let's see if this fixes the issue from the previous try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a0340b4eca8d
I'm also looking at the revealed leak as well. Its more interesting in that the leaked URLs are mainly from a single test case. That test case performs a number of redirects which bounces back and forth between parent and child. It would not surprise me if we had some corner case in there that did not close a cache entry properly.
Flags: needinfo?(bkelly)
Assignee | ||
Comment 28•9 years ago
|
||
Olli, this implements your suggestion from comment 21.
It also adjusts the error value returned if code tries to open a window after we clear the reference. We need to use NS_ERROR_ABORT in order to signal a hard failure back to the nsWindowWatcher. It checks for this value here:
https://dxr.mozilla.org/mozilla-central/source/embedding/components/windowwatcher/nsWindowWatcher.cpp#738
Without adjusting the error value it would continue on its way and later hit the mWindowCreator assertion at:
https://dxr.mozilla.org/mozilla-central/source/embedding/components/windowwatcher/nsWindowWatcher.cpp#792
Attachment #8735503 -
Attachment is obsolete: true
Attachment #8738143 -
Flags: review?(bugs)
Comment 29•9 years ago
|
||
It would be nice to backport your patch to at least Aurora, in case it is responsible for some of the mysterious high memory usage we're seeing with e10s enabled on the beta experiments. (We're pretty late in beta, so it probably isn't worth the risk to get it there.)
Assignee | ||
Comment 30•9 years ago
|
||
[Tracking Requested - why for this release]:
See comment 29.
tracking-firefox47:
--- → ?
Updated•9 years ago
|
Attachment #8738143 -
Flags: review?(bugs) → review+
Assignee | ||
Comment 31•9 years ago
|
||
Attachment #8738219 -
Flags: review?(ryanvm)
Comment 32•9 years ago
|
||
Comment on attachment 8738219 [details] [diff] [review]
Remove blanket disable of serviceworker mochitests on windows debug e10s. r=ryanvm
Review of attachment 8738219 [details] [diff] [review]:
-----------------------------------------------------------------
This looks good. We can handle the individual subtests in the follow-up bug you're filing for the Necko leak.
Updated•9 years ago
|
Attachment #8738219 -
Flags: review?(ryanvm) → review+
Assignee | ||
Comment 33•9 years ago
|
||
The additional leak from comment 26 will be handled in bug 1262224.
Comment 34•9 years ago
|
||
Comment 35•9 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/281636a40854
https://hg.mozilla.org/mozilla-central/rev/5ea9c875d871
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox48:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Assignee | ||
Comment 36•9 years ago
|
||
Comment on attachment 8738143 [details] [diff] [review]
Release the window immediately in TabParent::Destroy() to avoid leaks. r=smaug
Approval Request Comment
[Feature/regressing bug #]: e10s
[User impact if declined]: Memory leaks in automation due to windows being leaked by tabs that are not shut down cleanly. Comment 29 suggests it might be useful to have this fix since the beta e10s experiments showed increased OOMs.
[Describe test coverage new/current, TreeHerder]: Existing mochitests and leak checks.
[Risks and why]: Minimal.
[String/UUID change made/needed]: None.
Attachment #8738143 -
Flags: approval-mozilla-aurora?
Updated•9 years ago
|
Comment on attachment 8738143 [details] [diff] [review]
Release the window immediately in TabParent::Destroy() to avoid leaks. r=smaug
Mem leak fix that might help reduce e10s OOMs, Aurora47+
Attachment #8738143 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 38•9 years ago
|
||
bugherder uplift |
You need to log in
before you can comment on or make changes to this bug.
Description
•