Closed Bug 797263 Opened 12 years ago Closed 12 years ago

Intermittent browser_586068-reload.js | Test timed out

Categories

(Firefox :: Session Restore, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 21
Tracking Status
b2g18 --- disabled
b2g18-v1.0.1 --- disabled

People

(Reporter: philor, Assigned: ttaubert)

References

Details

(Keywords: intermittent-failure)

Attachments

(3 files, 1 obsolete file)

https://tbpl.mozilla.org/php/getParsedLog.php?id=15762886&tree=Mozilla-Inbound Rev3 WINNT 6.1 mozilla-inbound debug test mochitest-other on 2012-10-02 19:30:54 PDT for push acf91f25f228 slave: talos-r3-w7-005 TEST-PASS | chrome://mochitests/content/browser/browser/components/sessionstore/test/browser_586068-reload.js | load 6 - correct tab was reloaded WARNING: NS_ENSURE_TRUE(mMutable) failed: file e:/builds/moz2_slave/m-in-w32-dbg/build/netwerk/base/src/nsSimpleURI.cpp, line 273 WARNING: NS_ENSURE_TRUE(mMutable) failed: file e:/builds/moz2_slave/m-in-w32-dbg/build/netwerk/base/src/nsSimpleURI.cpp, line 273 WARNING: No outer window available!: file e:/builds/moz2_slave/m-in-w32-dbg/build/dom/base/nsGlobalWindow.cpp, line 9197 WARNING: No outer window available!: file e:/builds/moz2_slave/m-in-w32-dbg/build/dom/base/nsGlobalWindow.cpp, line 9197 WARNING: No outer window available!: file e:/builds/moz2_slave/m-in-w32-dbg/build/dom/base/nsGlobalWindow.cpp, line 9197 WARNING: No outer window available!: file e:/builds/moz2_slave/m-in-w32-dbg/build/dom/base/nsGlobalWindow.cpp, line 9197 WARNING: No outer window available!: file e:/builds/moz2_slave/m-in-w32-dbg/build/dom/base/nsGlobalWindow.cpp, line 9197 WARNING: No outer window available!: file e:/builds/moz2_slave/m-in-w32-dbg/build/dom/base/nsGlobalWindow.cpp, line 9197 WARNING: No outer window available!: file e:/builds/moz2_slave/m-in-w32-dbg/build/dom/base/nsGlobalWindow.cpp, line 9197 WARNING: No outer window available!: file e:/builds/moz2_slave/m-in-w32-dbg/build/dom/base/nsGlobalWindow.cpp, line 9197 WARNING: No outer window available!: file e:/builds/moz2_slave/m-in-w32-dbg/build/dom/base/nsGlobalWindow.cpp, line 9197 ++DOMWINDOW == 50 (0E56D518) [serial = 4247] [outer = 1EFB6D50] WARNING: NS_ENSURE_TRUE(mMutable) failed: file e:/builds/moz2_slave/m-in-w32-dbg/build/netwerk/base/src/nsSimpleURI.cpp, line 273 TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/components/sessionstore/test/browser_586068-reload.js | Test timed out (screenshot of a window claiming to be about:blank (and "Connecting...") but showing the index of the root of the mochitest server) INFO TEST-END | chrome://mochitests/content/browser/browser/components/sessionstore/test/browser_586068-reload.js | finished in 31408ms
https://tbpl.mozilla.org/php/getParsedLog.php?id=15737409&tree=Fx-Team Rev3 WINNT 6.1 fx-team debug test mochitest-other on 2012-10-02 02:28:07 PDT for push 85dd8e346102 slave: talos-r3-w7-099 TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/components/sessionstore/test/browser_586068-reload.js | Test timed out
Now one of our top (non-linux weirdness) oranges. Test was happily running for a week (after landing in bug 789102), before the first occurrence here: https://tbpl.mozilla.org/?rev=85dd8e346102&tree=Fx-Team No idea what regressed it...
(In reply to Ed Morley [:edmorley UTC+1] from comment #110) > Now one of our top (non-linux weirdness) oranges. > > Test was happily running for a week (after landing in bug 789102), before > the first occurrence here: > https://tbpl.mozilla.org/?rev=85dd8e346102&tree=Fx-Team > > No idea what regressed it... Believe the cause to have been bug 731974; we'd actually been permaorange with this on WinXP debug for 3 days, but due to coalescing it hadn't been overly obvious :-( Anyway, bug 731974 is now backed out, so hoping this will go away on all platforms.
Has not gone away... I'll take a look at the test.
Going to assert that bug 731974 was not the cause, since there are plenty of examples even after the backout. I'm going to reland that (with the crash fix). I did some digging into this, however. The "testCascade" part is what's failing. Two randomly-chosen failures and a success look like: fail-1: ... TEST-PASS | browser_586068-reload.js | load 4 - correct tab was reloaded TEST-PASS | browser_586068-reload.js | load 5 - correct tab was reloaded TEST-UNEXPECTED-FAIL | browser_586068-reload.js | Test timed out fail-2: ... TEST-PASS | browser_586068-reload.js | load 5 - correct tab was reloaded TEST-PASS | browser_586068-reload.js | load 6 - correct tab was reloaded TEST-UNEXPECTED-FAIL | browser_586068-reload.js | Test timed out success: ... TEST-PASS | browser_586068-reload.js | load 3 - correct tab was reloaded TEST-PASS | browser_586068-reload.js | load 4 - correct tab was reloaded TEST-PASS | browser_586068-reload.js | load 5 - correct tab was reloaded TEST-PASS | browser_586068-reload.js | load 6 - correct tab was reloaded TEST-INFO | browser_586068-reload.js | running test_finish INFO TEST-END | browser_586068-reload.js | finished in 30620ms Note that in fail-1, "load 6" never happened. In fail-2, "load 6" did, but we never executed the test_finish callback! In between those two is a waitForBrowserState() of the state at the start of the test. In the attached screenshots after the two failing states, in fail-1, there are 6 tabs visible, but tab 6 is still reloading (throbber is throbbing). In fail-2, there's only one tab visible, the URL is about:blank, and the throbber is throbbing. I don't know what the saved state looks like, but in "load 6", the reload is of example.com/#6. I know we redirect example.com to our own local http server, but could there be load issues there that are causing the load to never finish, either on the 6th tab, or on resetting things back to the state before the test started? This test is heavily triggered off those state changes, so if the loads never finish, that would cause the test to time out.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #213) > Going to assert that bug 731974 was not the cause, since there are plenty of > examples even after the backout. I'm going to reland that (with the crash > fix). Please do not reland yet. Whilst this still occurs with that bug backed out, the frequency is *much* less: http://brasstacks.mozilla.com/orangefactor/?display=Bug&tree=trunk&startday=2012-10-04&endday=2012-10-11&bugid=797263 (or just look at the bug comment frequency)
(And thank you for looking at this :-))
Erf. Too late, relanded already :/ I guess let's see what happens? I'll keep looking...
Perhaps if we do a bunch of retriggers on m-oth (particularly winxp debug) we can tell for sure :-)
Yeah, I'll watch it throughout the night, probably kicking off some retriggers if needed. Having the frequency go up might actually be a good thing, if it increases the chances of someone reproducing it locally. I really wish we could have replay debugging for this stuff on the tinderboxes :(
It actually fails more than you see here - while you were out of the tree, we weren't having the https://tbpl.mozilla.org/php/getParsedLog.php?id=16036887&tree=Mozilla-Inbound pattern of failing several hundred more tests after the timeout in this test, but when we do have it, tbplbot fails at commenting in the bug.
Regressed again, will back out bug 731974 once inbound's .hg store is fixed.
I don't really know what to do about this test -- it's clearly unstable (with or without my patch) and a way that's hard to debug. Based on the screenshots, my theory is that it's network related.
Whiteboard: [orange]
This has suddenly been cropping up with alarming frequency on b2g18 since the 14th.
I'll have a look into that test next week. Feel free to disable it on b2g18 since it's not relevant for B2G anyway.
Attached patch start making the test suite async (obsolete) (deleted) — — Splinter Review
I couldn't get the exact 'intermittent orange signature' on my machine but I realized that some (most?) of the sessionstore test don't run in single mode anymore because reading the session store file now seems a lot more async and we should wait for SR to be initialized before running the test suite. I added a TestRunner that would allow us to migrate all existing tests to use the task approach in combination with yield, etc. instead of all this callback mess. That should pave the way for async tests as we rewrite SR to be more async. Note: the |function test () { ...| definition should be moved to head.js when all tests are migrated but for now there's now other way to keep the old tests with the new TestRunner ones.
Assignee: nobody → ttaubert
Status: NEW → ASSIGNED
Attachment #704632 - Flags: review?(dteller)
David, if you still want to give feedback on the general approach for the test suite, please do. The patch unfortunately doesn't fix the specific problem here: https://tbpl.mozilla.org/?tree=Try&rev=2757d095ffd7
Fun, looks like the (mostly Windows) debug machines are just too slow to run the test in the default 30s time frame. Requesting a longer timeout should fix this. Pushed to try.
Yep, that's it. > browser/components/sessionstore/test/browser_586068-reload.js | finished in 37157ms requestLongerTimeout() to the rescue!
Attachment #704632 - Attachment is obsolete: true
Attachment #704632 - Flags: review?(dteller)
Attachment #704818 - Flags: review?(dteller)
Comment on attachment 704818 [details] [diff] [review] request a longer timeout and start making the test suite async Review of attachment 704818 [details] [diff] [review]: ----------------------------------------------------------------- At a quick glance, the patch looks good. I have not had time to review this properly yet. I will try and do so tomorrow.
Comment on attachment 704818 [details] [diff] [review] request a longer timeout and start making the test suite async Review of attachment 704818 [details] [diff] [review]: ----------------------------------------------------------------- ::: browser/components/sessionstore/src/SessionStore.jsm @@ +110,5 @@ > } > > this.SessionStore = { > + get promiseInitialized() { > + return SessionStoreInternal.promiseInitialized; I'd prefer if we returned |SessionStoreInternal.promiseInitialized.promise|. ::: browser/components/sessionstore/test/browser_586068-reload.js @@ +10,3 @@ > > +function runTests() { > + requestLongerTimeout(2); Nit: Could you document why a longer timeout is needed? ::: browser/components/sessionstore/test/head.js @@ +292,5 @@ > + > +/** > + * The test runner that controls the execution flow of our tests. > + */ > +let TestRunner = { Nit: Could you add a field |_iter: null|, just for the sake of documentation? @@ +293,5 @@ > +/** > + * The test runner that controls the execution flow of our tests. > + */ > +let TestRunner = { > + backupState: {}, Not sure what that backup state is for. Could you document this?
Attachment #704818 - Flags: review?(dteller) → review+
Made all requested changes. Thanks for the review! https://hg.mozilla.org/integration/fx-team/rev/3d578ca8e4b3
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → Firefox 21
Since this fix was more than test-only, I've disabled this test on Windows builds on the b2g18 branches due high failure rate. If you feel that this test is critical enough on those branches to need re-enabling, feel free to request branch approval for the fix here. https://hg.mozilla.org/releases/mozilla-b2g18/rev/7d5debc1c9c4 https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/79289b90cf7f
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: