Closed Bug 1254029 Opened 9 years ago Closed 9 years ago

Running e10s mochitest locally takes an extensive time to start

Tracking

(e10s+, firefox48 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla48

Tracking Flags:

Tracking

Status

e10s

---

firefox48

---

fixed

People

(Reporter: xidorn, Assigned: sotaro)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 2 obsolete files)

log 9 years ago Xidorn Quan [:xidorn] UTC+11 (deleted), text/x-log		Details
patch - Do not wait in ImageBridgeChild::FlushAllImages() except gonk 9 years ago Sotaro Ikeda [:sotaro] (deleted), patch	nical : review+	Details \| Diff \| Splinter Review
patch - Do not wait in ImageBridgeChild::FlushAllImages() except gonk 9 years ago Sotaro Ikeda [:sotaro] (deleted), patch		Details \| Diff \| Splinter Review
patch - Do not wait in ImageBridgeChild::FlushAllImages() except gonk 9 years ago Sotaro Ikeda [:sotaro] (deleted), patch	nical : review+	Details \| Diff \| Splinter Review

Xidorn Quan [:xidorn] UTC+11

Reporter

Description

•

9 years ago

When I run e10s mochitest on my Windows, it takes a very long time (~1min) before the test starts. In most of the time, the browser window freezes. Non-e10s mochitest is not affected, e10s reftest is fine as well. Not sure what's wrong there.

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 1

•

9 years ago

Attached file log (deleted) — Details

This is the log before the test actually started (the last line is the start of the test). The browser window froze after line > 1457335296294 Marionette TRACE conn1 <- [1,12,null,{"value":null}] After that, the log was still going but very slow.

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 2

•

9 years ago

This makes debugging issues in e10s tests really annoying... Hopefully we can have this fixed soon.

tracking-e10s: --- → ?

Tracy Walker [:tracy]

Updated

•

9 years ago

Blocks: e10s-tests

tracking-e10s: ? → +

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 3

•

9 years ago

It seems if I remove both sanity-test.* and extensions.* from prefs.js, this would happen.

Jared Wein [:jaws] (please needinfo? me)

Comment 4

•

9 years ago

Is this on debug or opt builds? I have seen a freeze similar to your description on my debug builds, though I just blamed it on "debug being debug", though that may have been naive. I have not seen any similar slow-down in my opt builds.

Flags: needinfo?(quanxunzhen)

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 5

•

9 years ago

I have no idea. My local build is debug build (--enable-debug with --enable-optimize). But it seems I can reproduce this issue on neither opt build nor debug build from the server :/ I think a freeze up to ~1min is far too long even for debug build.

Flags: needinfo?(quanxunzhen)

Xidorn Quan [:xidorn] UTC+11

Reporter

Updated

•

9 years ago

Summary: Running e10s mochitest takes an extensive time to start → Running e10s mochitest locally takes an extensive time to start

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 6

•

9 years ago

(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #3) > It seems if I remove both sanity-test.* and extensions.* from prefs.js, this > would happen. I meant, if remove those prefs from the test profile, and execute |mach run|, the freeze also happens.

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 7

•

9 years ago

It seems this gets stuck at this loop: https://dxr.mozilla.org/mozilla-central/rev/55d557f4d73ee58664bdf2fa85aaab555224722e/ipc/glue/GeckoChildProcessHost.cpp#439-441 And when the loop finishes, it reports several pipe error: > [Parent 764] WARNING: pipe error: 109: file central/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 333 > [Parent 764] WARNING: pipe error: 232: file central/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 446 > [Parent 764] WARNING: pipe error: 232: file central/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 446 > [Parent 764] WARNING: pipe error: 109: file central/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 333 > [Parent 764] WARNING: pipe error: 109: file central/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 333

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 8

•

9 years ago

Actually it gets stuck only at the lock.Wait() rather than the whole loop. And it seems to me this happens only if there is another run of LaunchAdnWaitForProcessHandle before.

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 9

•

9 years ago

It seems that is waiting for some notification from the IOThread, and the IOThread is blocked on ProcessWatcher::EnsureProcessTerminated for ~GeckoChildProcessHost().

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 10

•

9 years ago

ProcessWatcher::EnsureProcessTerminated waits for the child process to terminate, and the child process is blocked on MediaShutdown on this loop: https://dxr.mozilla.org/mozilla-central/rev/55d557f4d73ee58664bdf2fa85aaab555224722e/dom/media/MediaShutdownManager.cpp#142-144

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 11

•

9 years ago

And the MediaShutdown seems to be blocked on this stack: > xul.dll!mozilla::layers::AsyncTransactionWaiter::WaitComplete() Line 29 C++ > xul.dll!mozilla::layers::ImageBridgeChild::FlushAllImages(mozilla::layers::ImageClient * aClient, mozilla::layers::ImageContainer * aContainer) Line 675 C++ > xul.dll!mozilla::layers::ImageContainer::ClearAllImages() Line 283 C++ > xul.dll!mozilla::VideoFrameContainer::ClearCurrentFrame() Line 96 C++ > xul.dll!mozilla::MediaFormatReader::ReleaseMediaResources() Line 1626 C++ > xul.dll!mozilla::MediaDecoderReader::Shutdown() Line 371 C++ > xul.dll!mozilla::MediaFormatReader::Shutdown() Line 142 C++ > xul.dll!mozilla::detail::MethodCallInvokeHelper<RefPtr<mozilla::MozPromise<bool,bool,0> >,mozilla::MediaDecoderReader>(RefPtr<mozilla::MozPromise<bool,bool,0> >(mozilla::MediaDecoderReader::*)() aMethod, mozilla::MediaDecoderReader * aThisVal, mozilla::Tuple<> & aArgs, mozilla::IndexSequence<> __formal) Line 917 C++ AsyncTransactionWaiter waits for a complete 50s. According to the stack, it seems the waiter is waiting for FlushAllImagesSync to finish. However, if I understand correctly, that function is never executed, and thus the waiter waits it for the whole timeout and finally abandons.

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 12

•

9 years ago

Anthony, could you have a look at this issue? This happens reliably for me on my Windows 10 machine with debug build and e10s. The simplest steps to reproduce are: 1. remove $OBJDIR\tmp\scratch_user (the profile directory) 2. mach run Then a small window appears and disappears, and then the main window freezes there for ~1min. It seems to me it only happens if the sanity test window appears before the main browser window. According to the information I got above, I believe it is because the chrome process waits for the content process of sanity test to finish. And that content process is blocked on waiting for media shutdown. It seems to me builds from servers do not run sanity test, and thus this issue is not triggered. I guess I can disable the sanity test to workaround this.

Flags: needinfo?(ajones)

Xidorn Quan [:xidorn] UTC+11

Reporter

Updated

•

9 years ago

Blocks: 1243083

Anthony Jones (:ajones, :kentuckyfriedtakahe, :k17e)

Comment 13

•

9 years ago

Disabling the sanity test during testing seems reasonable to me.

Flags: needinfo?(ajones)

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 14

•

9 years ago

But it is still an issue that the window freeze ~1min every time when I run a debug build with fresh profile. Why does the MediaShutdown take such a long time (and AsyncTransactionWaiter actually can wait forever if there is no timeout)? Is that expected behavior?

Flags: needinfo?(ajones)

Jean-Yves Avenard [:jya]

Comment 15

•

9 years ago

MediaShutdown waits until all media related objects have shut down, then gfx is shutdown. If not for the wait, and as all media objects are typically asynchronous and running on secondary threads you would find gfx being shutdown and media still attempting to use it causing nasty crashes. Why gfx is hanging waiting on something apparently non useful is something gfx related, and I'd say what needs fixing.

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Comment 16

•

9 years ago

AsyncTransactionWaiter lets us wait for something specific to happen on the compositor side like the creation and delivery to the content side of Fence objects. It looks like the compositor should have sent a message specific to the content side but didn't, and it indeed very much needs fixing. Passing the needinfo to Sotaro who knows best the ins and outs of AsyncTransactionWaiter (And is in the process of refactoring it pretty heavily).

Flags: needinfo?(sotaro.ikeda.g)

Flags: needinfo?(nical.bugzilla)

Flags: needinfo?(ajones)

Sotaro Ikeda [:sotaro]

Assignee

Comment 17

•

9 years ago

ImageBridgeChild::FlushAllImages() waits flush completion synchronously. It is originally implemented for b2g. It could be remove except b2g gonk.

Assignee: nobody → sotaro.ikeda.g

Sotaro Ikeda [:sotaro]

Assignee

Comment 18

•

9 years ago

Attached patch patch - Do not wait in ImageBridgeChild::FlushAllImages() except gonk (obsolete) (deleted) — Details — Splinter Review

Flags: needinfo?(sotaro.ikeda.g)

Sotaro Ikeda [:sotaro]

Assignee

Updated

•

9 years ago

Attachment #8739821 - Flags: review?(nical.bugzilla)

Nicolas Silva [:nical]

Comment 19

•

9 years ago

Comment on attachment 8739821 [details] [diff] [review] patch - Do not wait in ImageBridgeChild::FlushAllImages() except gonk Review of attachment 8739821 [details] [diff] [review]: ----------------------------------------------------------------- We indeed don't seem to need this on non-gralloc platforms AFAIK, but it'd be very useful to know why we are missing that notification from the compositor to the content process.

Attachment #8739821 - Flags: review?(nical.bugzilla) → review+

Sotaro Ikeda [:sotaro]

Assignee

Updated

•

9 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1244883

Sotaro Ikeda [:sotaro]

Assignee

Comment 20

•

9 years ago

(In reply to Nicolas Silva [:nical] from comment #19) > Comment on attachment 8739821 [details] [diff] [review] > We indeed don't seem to need this on non-gralloc platforms AFAIK, but it'd > be very useful to know why we are missing that notification from the > compositor to the content process. I tried with my laptop but I did not see the above. Nor "Waiting async transaction complete" log. I suspect that debug build with E10s is just very heavy weight an slow. Anyway, the patch remove one bottleneck.

Sotaro Ikeda [:sotaro]

Assignee

Comment 21

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=00a655e3cc78

Sotaro Ikeda [:sotaro]

Assignee

Comment 22

•

9 years ago

attachment 8739821 [details] [diff] [review] removes all wait. But ImageBridgeChild::FlushAllImages() needs to wait FlushAllImagesSync() call to ensure ImageContainer alive.

Sotaro Ikeda [:sotaro]

Assignee

Comment 23

•

9 years ago

Attached patch patch - Do not wait in ImageBridgeChild::FlushAllImages() except gonk (obsolete) (deleted) — Details — Splinter Review

Attachment #8739821 - Attachment is obsolete: true

Sotaro Ikeda [:sotaro]

Assignee

Comment 24

•

9 years ago

Hmm, attachment 8740320 [details] [diff] [review] does not work well when e10s is disabled.

Sotaro Ikeda [:sotaro]

Assignee

Comment 25

•

9 years ago

Attached patch patch - Do not wait in ImageBridgeChild::FlushAllImages() except gonk (deleted) — Details — Splinter Review

Addressed the problem.

Attachment #8740320 - Attachment is obsolete: true

Sotaro Ikeda [:sotaro]

Assignee

Comment 26

•

9 years ago

Comment on attachment 8740356 [details] [diff] [review] patch - Do not wait in ImageBridgeChild::FlushAllImages() except gonk :nical, can you review the patch again? It was changed since the review.

Attachment #8740356 - Flags: review?(nical.bugzilla)

Nicolas Silva [:nical]

Updated

•

9 years ago

Attachment #8740356 - Flags: review?(nical.bugzilla) → review+

Sotaro Ikeda [:sotaro]

Assignee

Comment 27

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a6bc4a136239

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 28

•

9 years ago

I can confirm the patch does fix this issue. Thanks!

Sotaro Ikeda [:sotaro]

Assignee

Comment 29

•

9 years ago

Thanks for the confirmation.

Pulsebot

Comment 30

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/0dcdf655c75e

Carsten Book [:Tomcat]

Comment 31

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/0dcdf655c75e

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox48: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla48

You need to log in before you can comment on or make changes to this bug.