Closed Bug 1335906 Opened 8 years ago Closed 7 years ago

Investigate hang when running jetpack mochitests

Categories

(Core :: Graphics: WebRender, defect, P3)

Other Branch
x86_64
Linux
defect

Tracking


RESOLVED WONTFIX

People

(Reporter: kats, Unassigned)


Details

(Whiteboard: [gfx-noted])

When I try to enable the jetpack mochitests on QR builds in automation, they often seem to hang. In at least one instance, the last two lines before the hang were:

[task 2017-01-30T15:22:08.776040Z] 15:22:08 INFO - TEST-INFO | executing 'jetpack-package/addon-sdk/source/test/test-tabs.js.testTabContentTypeAndReload'
[task 2017-01-30T15:22:11.228494Z] 15:22:11 INFO - OpenGL version new 3.3 (Core Profile) Mesa 11.2.0

Annoyingly, the logs seem to disappear after the failure.

I was able to reproduce a similar-looking hang locally by running the following on my Linux64 QR (opt) build:

xr ./mach mochitest --disable-e10s -f jpp --keep-open=false addon-sdk/source/test/test-tabs.js

(where "xr" is just a wrapper that uses xvfb-run [1]). The hang occurs after this output:

TEST-START | jetpack-package/addon-sdk/source/test/test-tabs.js.test tabs ready and close after window.open
OpenGL version new 3.3 (Core Profile) Mesa 11.2.0
OpenGL version new 3.3 (Core Profile) Mesa 11.2.0

I tried attaching gdb at this point, but all the threads seem to just be waiting for events. Of note, there were 6 RenderBackend threads, each with 5 GlyphCache threads, i.e. 30 GlyphCache threads in total, which is a lot of threads. gw filed [2] to consolidate the threadpools across RenderBackend threads so that we don't spawn so many. However, that may not be related to the actual hang in this case; it still needs more investigation.

[1] https://github.com/staktrace/moz-scripts/blob/master/xr
[2] https://github.com/servo/webrender/issues/819
Just putting a braindump here before I forget: I spent yesterday trying to chase down this problem. The test seems to hang at [1], waiting for the 'ready' event. In theory, that 'ready' event is triggered by the DOMContentLoaded event dispatched by Gecko. I added some logging to that event and ran the test both with and without WebRender. In both cases, the sequence of DOMContentLoaded events was identical. I then tried to find the code where DOMContentLoaded gets mapped to the 'ready' event, but was unsuccessful. I asked on #jetpack as well, and while I got some help, I wasn't able to nail down exactly where this happens. It's complicated by the fact that this is JS code, which I can't easily debug, and while printf-debugging seems to work, I'm not 100% sure all the "dump" output I added is showing up.

[1] http://searchfox.org/mozilla-central/rev/b1aadb3572eaf7d2c70e19a2ba5413809d9ac698/addon-sdk/source/test/tabs/test-firefox-tabs.js#1233
I spent some more time yesterday tracking this down. The most interesting finding was that I can reproduce the hang even without WR enabled if I add some dumps/busyloops to the content script at [1]. I suspect there is an inherent race condition in this test, and that it gets hit because opening windows is slower with WR than with Gecko. I'm trying to dig into what exactly the race condition is.

[1] http://searchfox.org/mozilla-central/rev/b1aadb3572eaf7d2c70e19a2ba5413809d9ac698/addon-sdk/source/test/tabs/test-firefox-tabs.js#1228
Narrowed it down further to [1]. In the failure case, the "domWindow.gBrowserInit.delayedStartupFinished" check is false, so the event is dropped and the test stalls.

[1] http://searchfox.org/mozilla-central/rev/d3307f19d5dac31d7d36fc206b00b686de82eee4/addon-sdk/source/lib/sdk/tabs/tab-firefox.js#314
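To make the dropped-event race concrete, here is a minimal, self-contained TypeScript sketch. It is a hypothetical model, not the addon-sdk code at tab-firefox.js#314: FakeBrowserWindow, makeRacyReadyHandler, makeDeferringReadyHandler and simulate are invented for illustration, and the delayedStartupFinished field only mirrors the role of gBrowserInit.delayedStartupFinished. It assumes a slow window open in which DOMContentLoaded arrives before delayed startup has finished, and compares a handler that drops the 'ready' notification in that case with one that defers it until startup completes.

```typescript
// Hypothetical model of the race; none of these names are addon-sdk or Firefox APIs.
type Listener = () => void;

class FakeBrowserWindow {
  // Plays the role of gBrowserInit.delayedStartupFinished (assumption):
  // false until the simulated delayed startup has run.
  delayedStartupFinished = false;
  private startupListeners: Listener[] = [];

  // Run the listener now if startup is done, otherwise queue it.
  onDelayedStartup(listener: Listener): void {
    if (this.delayedStartupFinished) {
      listener();
    } else {
      this.startupListeners.push(listener);
    }
  }

  // Mark startup as finished and flush any queued listeners.
  finishDelayedStartup(): void {
    this.delayedStartupFinished = true;
    const pending = this.startupListeners;
    this.startupListeners = [];
    for (const listener of pending) listener();
  }
}

// Racy pattern: a notification arriving before delayed startup finishes is
// silently dropped, so anything waiting for 'ready' waits forever.
function makeRacyReadyHandler(win: FakeBrowserWindow, emitReady: Listener): Listener {
  return () => {
    if (win.delayedStartupFinished) {
      emitReady();
    }
  };
}

// Deferring pattern: hold the notification until delayed startup completes.
function makeDeferringReadyHandler(win: FakeBrowserWindow, emitReady: Listener): Listener {
  return () => win.onDelayedStartup(emitReady);
}

// Simulate a slow window open: DOMContentLoaded fires before delayed startup.
function simulate(makeHandler: (win: FakeBrowserWindow, emit: Listener) => Listener): number {
  const win = new FakeBrowserWindow();
  let readyCount = 0;
  const onDOMContentLoaded = makeHandler(win, () => { readyCount++; });
  onDOMContentLoaded();       // content finishes loading first...
  win.finishDelayedStartup(); // ...then the window's delayed startup runs
  return readyCount;
}

console.log(simulate(makeRacyReadyHandler));      // 0: 'ready' was dropped
console.log(simulate(makeDeferringReadyHandler)); // 1: 'ready' delivered late
```

Deferring is just one assumed way of avoiding the drop; whether porting the BrowserTestUtils logic mentioned in the following comment would take this exact shape is not established here.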
So I talked this over with :mconley, :zombie, and :Mossop. It seems that the use of "delayedStartupFinished" in the addon-sdk is not quite correct, but works well enough for Gecko at the moment. It doesn't work for QR, so it would be good to fix it eventually. There were suggestions to take the logic from the BrowserTestUtils functions ([1] or [2]) and use it in the addon-sdk instead to get the desired behaviour. However, doing this is non-trivial, partly because the code is not well maintained and few people know what's going on there. Because of this, :billm and :mconley suggested that we deprioritize getting the jetpack mochitests running for now, which sounds fine to me. I'll leave this bug open and go work on getting other test suites running on QR builds. Maybe in the future QR builds will be faster at opening new windows, and this issue won't affect us any more. *fingers crossed*

[1] http://searchfox.org/mozilla-central/rev/d3307f19d5dac31d7d36fc206b00b686de82eee4/testing/mochitest/BrowserTestUtils/BrowserTestUtils.jsm#346
[2] http://searchfox.org/mozilla-central/rev/d3307f19d5dac31d7d36fc206b00b686de82eee4/testing/mochitest/BrowserTestUtils/BrowserTestUtils.jsm#185
Don't care about jetpack.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX