Closed
Bug 1254029
Opened 9 years ago
Closed 9 years ago
Running e10s mochitest locally takes an extensive time to start
Categories
(Testing :: Mochitest, defect)
Tracking
(e10s+, firefox48 fixed)
RESOLVED
FIXED
mozilla48
People
(Reporter: xidorn, Assigned: sotaro)
References
(Blocks 1 open bug)
Details
Attachments
(2 files, 2 obsolete files)
(deleted),
text/x-log
|
Details | |
(deleted),
patch
|
nical
:
review+
|
Details | Diff | Splinter Review |
When I run e10s mochitest on my Windows, it takes a very long time (~1min) before the test starts. In most of the time, the browser window freezes.
Non-e10s mochitest is not affected, e10s reftest is fine as well. Not sure what's wrong there.
Reporter | ||
Comment 1•9 years ago
|
||
This is the log before the test actually started (the last line is the start of the test).
The browser window froze after line
> 1457335296294 Marionette TRACE conn1 <- [1,12,null,{"value":null}]
After that, the log was still going but very slow.
Reporter | ||
Comment 2•9 years ago
|
||
This makes debugging issues in e10s tests really annoying... Hopefully we can have this fixed soon.
tracking-e10s:
--- → ?
Updated•9 years ago
|
Blocks: e10s-tests
Reporter | ||
Comment 3•9 years ago
|
||
It seems if I remove both sanity-test.* and extensions.* from prefs.js, this would happen.
Comment 4•9 years ago
|
||
Is this on debug or opt builds? I have seen a freeze similar to your description on my debug builds, though I just blamed it on "debug being debug", though that may have been naive. I have not seen any similar slow-down in my opt builds.
Flags: needinfo?(quanxunzhen)
Reporter | ||
Comment 5•9 years ago
|
||
I have no idea. My local build is debug build (--enable-debug with --enable-optimize). But it seems I can reproduce this issue on neither opt build nor debug build from the server :/
I think a freeze up to ~1min is far too long even for debug build.
Flags: needinfo?(quanxunzhen)
Reporter | ||
Updated•9 years ago
|
Summary: Running e10s mochitest takes an extensive time to start → Running e10s mochitest locally takes an extensive time to start
Reporter | ||
Comment 6•9 years ago
|
||
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #3)
> It seems if I remove both sanity-test.* and extensions.* from prefs.js, this
> would happen.
I meant, if remove those prefs from the test profile, and execute |mach run|, the freeze also happens.
Reporter | ||
Comment 7•9 years ago
|
||
It seems this gets stuck at this loop: https://dxr.mozilla.org/mozilla-central/rev/55d557f4d73ee58664bdf2fa85aaab555224722e/ipc/glue/GeckoChildProcessHost.cpp#439-441
And when the loop finishes, it reports several pipe error:
> [Parent 764] WARNING: pipe error: 109: file central/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 333
> [Parent 764] WARNING: pipe error: 232: file central/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 446
> [Parent 764] WARNING: pipe error: 232: file central/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 446
> [Parent 764] WARNING: pipe error: 109: file central/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 333
> [Parent 764] WARNING: pipe error: 109: file central/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 333
Reporter | ||
Comment 8•9 years ago
|
||
Actually it gets stuck only at the lock.Wait() rather than the whole loop. And it seems to me this happens only if there is another run of LaunchAdnWaitForProcessHandle before.
Reporter | ||
Comment 9•9 years ago
|
||
It seems that is waiting for some notification from the IOThread, and the IOThread is blocked on ProcessWatcher::EnsureProcessTerminated for ~GeckoChildProcessHost().
Reporter | ||
Comment 10•9 years ago
|
||
ProcessWatcher::EnsureProcessTerminated waits for the child process to terminate, and the child process is blocked on MediaShutdown on this loop: https://dxr.mozilla.org/mozilla-central/rev/55d557f4d73ee58664bdf2fa85aaab555224722e/dom/media/MediaShutdownManager.cpp#142-144
Reporter | ||
Comment 11•9 years ago
|
||
And the MediaShutdown seems to be blocked on this stack:
> xul.dll!mozilla::layers::AsyncTransactionWaiter::WaitComplete() Line 29 C++
> xul.dll!mozilla::layers::ImageBridgeChild::FlushAllImages(mozilla::layers::ImageClient * aClient, mozilla::layers::ImageContainer * aContainer) Line 675 C++
> xul.dll!mozilla::layers::ImageContainer::ClearAllImages() Line 283 C++
> xul.dll!mozilla::VideoFrameContainer::ClearCurrentFrame() Line 96 C++
> xul.dll!mozilla::MediaFormatReader::ReleaseMediaResources() Line 1626 C++
> xul.dll!mozilla::MediaDecoderReader::Shutdown() Line 371 C++
> xul.dll!mozilla::MediaFormatReader::Shutdown() Line 142 C++
> xul.dll!mozilla::detail::MethodCallInvokeHelper<RefPtr<mozilla::MozPromise<bool,bool,0> >,mozilla::MediaDecoderReader>(RefPtr<mozilla::MozPromise<bool,bool,0> >(mozilla::MediaDecoderReader::*)() aMethod, mozilla::MediaDecoderReader * aThisVal, mozilla::Tuple<> & aArgs, mozilla::IndexSequence<> __formal) Line 917 C++
AsyncTransactionWaiter waits for a complete 50s. According to the stack, it seems the waiter is waiting for FlushAllImagesSync to finish. However, if I understand correctly, that function is never executed, and thus the waiter waits it for the whole timeout and finally abandons.
Reporter | ||
Comment 12•9 years ago
|
||
Anthony, could you have a look at this issue?
This happens reliably for me on my Windows 10 machine with debug build and e10s. The simplest steps to reproduce are:
1. remove $OBJDIR\tmp\scratch_user (the profile directory)
2. mach run
Then a small window appears and disappears, and then the main window freezes there for ~1min.
It seems to me it only happens if the sanity test window appears before the main browser window. According to the information I got above, I believe it is because the chrome process waits for the content process of sanity test to finish. And that content process is blocked on waiting for media shutdown.
It seems to me builds from servers do not run sanity test, and thus this issue is not triggered. I guess I can disable the sanity test to workaround this.
Flags: needinfo?(ajones)
Disabling the sanity test during testing seems reasonable to me.
Flags: needinfo?(ajones)
Reporter | ||
Comment 14•9 years ago
|
||
But it is still an issue that the window freeze ~1min every time when I run a debug build with fresh profile.
Why does the MediaShutdown take such a long time (and AsyncTransactionWaiter actually can wait forever if there is no timeout)? Is that expected behavior?
Flags: needinfo?(ajones)
Comment 15•9 years ago
|
||
MediaShutdown waits until all media related objects have shut down, then gfx is shutdown. If not for the wait, and as all media objects are typically asynchronous and running on secondary threads you would find gfx being shutdown and media still attempting to use it causing nasty crashes.
Why gfx is hanging waiting on something apparently non useful is something gfx related, and I'd say what needs fixing.
Flags: needinfo?(nical.bugzilla)
Comment 16•9 years ago
|
||
AsyncTransactionWaiter lets us wait for something specific to happen on the compositor side like the creation and delivery to the content side of Fence objects. It looks like the compositor should have sent a message specific to the content side but didn't, and it indeed very much needs fixing.
Passing the needinfo to Sotaro who knows best the ins and outs of AsyncTransactionWaiter (And is in the process of refactoring it pretty heavily).
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(ajones)
Assignee | ||
Comment 17•9 years ago
|
||
ImageBridgeChild::FlushAllImages() waits flush completion synchronously. It is originally implemented for b2g. It could be remove except b2g gonk.
Assignee: nobody → sotaro.ikeda.g
Assignee | ||
Comment 18•9 years ago
|
||
Flags: needinfo?(sotaro.ikeda.g)
Assignee | ||
Updated•9 years ago
|
Attachment #8739821 -
Flags: review?(nical.bugzilla)
Comment 19•9 years ago
|
||
Comment on attachment 8739821 [details] [diff] [review]
patch - Do not wait in ImageBridgeChild::FlushAllImages() except gonk
Review of attachment 8739821 [details] [diff] [review]:
-----------------------------------------------------------------
We indeed don't seem to need this on non-gralloc platforms AFAIK, but it'd be very useful to know why we are missing that notification from the compositor to the content process.
Attachment #8739821 -
Flags: review?(nical.bugzilla) → review+
Assignee | ||
Updated•9 years ago
|
Assignee | ||
Comment 20•9 years ago
|
||
(In reply to Nicolas Silva [:nical] from comment #19)
> Comment on attachment 8739821 [details] [diff] [review]
> We indeed don't seem to need this on non-gralloc platforms AFAIK, but it'd
> be very useful to know why we are missing that notification from the
> compositor to the content process.
I tried with my laptop but I did not see the above. Nor "Waiting async transaction complete" log. I suspect that debug build with E10s is just very heavy weight an slow. Anyway, the patch remove one bottleneck.
Assignee | ||
Comment 21•9 years ago
|
||
Assignee | ||
Comment 22•9 years ago
|
||
attachment 8739821 [details] [diff] [review] removes all wait. But ImageBridgeChild::FlushAllImages() needs to wait FlushAllImagesSync() call to ensure ImageContainer alive.
Assignee | ||
Comment 23•9 years ago
|
||
Attachment #8739821 -
Attachment is obsolete: true
Assignee | ||
Comment 24•9 years ago
|
||
Hmm, attachment 8740320 [details] [diff] [review] does not work well when e10s is disabled.
Assignee | ||
Comment 25•9 years ago
|
||
Addressed the problem.
Attachment #8740320 -
Attachment is obsolete: true
Assignee | ||
Comment 26•9 years ago
|
||
Comment on attachment 8740356 [details] [diff] [review]
patch - Do not wait in ImageBridgeChild::FlushAllImages() except gonk
:nical, can you review the patch again? It was changed since the review.
Attachment #8740356 -
Flags: review?(nical.bugzilla)
Updated•9 years ago
|
Attachment #8740356 -
Flags: review?(nical.bugzilla) → review+
Assignee | ||
Comment 27•9 years ago
|
||
Reporter | ||
Comment 28•9 years ago
|
||
I can confirm the patch does fix this issue. Thanks!
Assignee | ||
Comment 29•9 years ago
|
||
Thanks for the confirmation.
Comment 30•9 years ago
|
||
Comment 31•9 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox48:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
You need to log in
before you can comment on or make changes to this bug.
Description
•