Closed
Bug 505718
Opened 15 years ago
Closed 15 years ago
mochitest-plain timeout (hanging?) during shutdown on Windows
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
RESOLVED
FIXED
mozilla1.9.3a1
Tracking | Status | |
---|---|---|
status1.9.2 | --- | beta1-fixed |
status1.9.1 | --- | unaffected |
People
(Reporter: dbaron, Assigned: bent.mozilla)
References
Details
(Keywords: intermittent-failure)
Attachments
(4 files)
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
mrbkap
:
review+
jst
:
superreview+
|
Details | Diff | Splinter Review |
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
bent.mozilla
:
review+
bent.mozilla
:
superreview+
ted
:
approval1.9.2+
|
Details | Diff | Splinter Review |
In the last few mochitest-plain runs, there's been a form of random orange that I haven't seen recently. During the shutdown of the run, after printing:
92847 INFO Passed: 86162
92848 INFO Failed: 1
92849 INFO Todo: 1272
92850 INFO SimpleTest FINISHED
instead of recording the shutdown properly, we get:
command timed out: 1200 seconds without output
program finished with exit code 1
This makes the run orange.
This occurred on:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1248256012.1248265493.6079.gz
WINNT 5.2 mozilla-central unit test on 2009/07/22 02:46:52
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1248259996.1248266626.19205.gz
WINNT 5.2 mozilla-central unit test on 2009/07/22 03:53:16
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1248263042.1248266049.12600.gz
WINNT 5.2 mozilla-central test mochitests on 2009/07/22 04:44:02
(The timeout is 300 seconds on the "unit test" boxes on Firefox and 1200 seconds on the "mochitests" boxes on Firefox-Unittest.)
Reporter | ||
Comment 1•15 years ago
|
||
(It's possible this is related to bug 497053, since all three occurred in builds that also had that, although there have been many many more builds with that in the past 12 hours without this.)
Reporter | ||
Updated•15 years ago
|
Whiteboard: [orange]
Reporter | ||
Comment 2•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1248262168.1248269456.18193.gz
WINNT 5.2 mozilla-central unit test on 2009/07/22 04:29:28
(also showed bug 497053, but the last cycle that should)
Reporter | ||
Comment 3•15 years ago
|
||
Also happened in:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1248269038.1248271997.14905.gz
WINNT 5.2 mozilla-central test mochitests on 2009/07/22 06:23:58
which is post-bug 497053 (so did not show it), but did show bug 505217.
Reporter | ||
Comment 4•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1248336876.1248347373.12002.gz
WINNT 5.2 mozilla-central unit test on 2009/07/23 01:14:36
... happened here in a log with no other test failures.
Reporter | ||
Comment 5•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1248353844.1248361667.16751.gz
WINNT 5.2 mozilla-central unit test on 2009/07/23 05:57:24
Reporter | ||
Comment 6•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1248363340.1248366155.4215.gz
WINNT 5.2 mozilla-central test mochitests on 2009/07/23 08:35:40
Reporter | ||
Comment 7•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1248359994.1248368762.1702.gz
WINNT 5.2 mozilla-central unit test on 2009/07/23 07:39:54
Comment 8•15 years ago
|
||
WINNT 5.2 mozilla-central unit test on 2009/07/23
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1248385103.1248391746.5955.gz&fulltext=1
Comment 9•15 years ago
|
||
Looking at the buildbot logs we have going back to July 18, the problem is
* is win32 only
* has occurred 21 times in mozilla-central unittest builds
* first example is Mon Jul 20 22:01:00 2009 PDT (rev 682906e2a48b)
* happens in roughly one third of the builds
* has occurred 20 times in mozilla-central packaged mochitest runs
* first example is Tue Jul 21 04:07:41 2009 PDT (rev e50cbaef1d8f)
* about 1/3 of builds again
* occurred once on tracemonkey, at Thu Jul 23 14:24:16 2009 (rev 9cf9a10f7e49, looks like a merge from mozilla-central), only 5 builds since then so the stats aren't super strong
Which points to the first few checkins at the top of
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d6895cb5ac3b&tochange=682906e2a48b
Comment 10•15 years ago
|
||
Fwiw, the symptom reminds me of bug 471085;
but I don't know if there is a "video" regression here.
Comment 12•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1248687427.1248690417.10991.gz
WINNT 5.2 mozilla-central test mochitests on 2009/07/27 02:37:07
Comment 13•15 years ago
|
||
This has happened a bunch this morning already, in addition to the ones that Serge reported.
WINNT 5.2 mozilla-central test mochitests on 2009/07/27 07:42:56
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1248705776.1248708876.31326.gz
WINNT 5.2 mozilla-central unit test on 2009/07/27 08:00:58
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1248706858.1248715866.13640.gz
WINNT 5.2 mozilla-central test mochitests on 2009/07/27 09:52:18
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1248713538.1248716755.23655.gz
Comment 14•15 years ago
|
||
WINNT 5.2 mozilla-central test mochitests on 2009/07/28 12:26:20
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1248809180.1248812108.11796.gz
Comment 15•15 years ago
|
||
Do any of the changes in comment #9 seem likely to cause problems ? Would it be helpful to repeat the analysis there ?
Comment 16•15 years ago
|
||
I _may_ have caught this with windbg. Firefox was chugging away at 100% CPU, and after attaching and .dump /mf there is this
http://people.mozilla.org/~nthomas/misc/mochitest-hang.dmp.bz2
Reporter | ||
Comment 17•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1249128753.1249133431.30496.gz
WINNT 5.2 mozilla-central unit test on 2009/08/01 05:12:33
Reporter | ||
Comment 18•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1249144446.1249147004.28560.gz
WINNT 5.2 mozilla-central test mochitests on 2009/08/01 09:34:06
Summary: mochitest-plain timing out (hanging?) during shutdown on Windows → mochitest-plain timeout (hanging?) during shutdown on Windows
Comment 21•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1249483284.1249489687.30260.gz
WINNT 5.2 mozilla-central unit test on 2009/08/05 07:41:24
Reporter | ||
Comment 22•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1249516254.1249522258.3598.gz
WINNT 5.2 mozilla-central unit test on 2009/08/05 16:50:54
(FWIW, I looked at the file in comment 16 but I have no idea what kind of file it is or what to do with it; maybe somebody with more Windows knowledge does.)
Comment 23•15 years ago
|
||
It's a minidump, you should be able to load it in WinDBG or the Visual C++ debugger, load the symbols from the symbol server, and get a stack trace.
Comment 24•15 years ago
|
||
Oh, crap. I realized this is from a unittest build, which means the symbols aren't on the symbol server. :-/
Nick: how did you notice this? Just dumb luck? If you can grab this again it would probably be better if you could just follow the steps here:
https://developer.mozilla.org/en/How_to_get_a_stacktrace_with_WinDbg
Comment 25•15 years ago
|
||
Indeed, that was the doc what I was trying to use, with the symbol path updated for the location of the unpacked crashreporter-symbols.zip. WinDbg was failing to resolve the symbols regardless, falling back to export symbols. It's not hard to reproduce this so if someone would to hold my hand on IRC we can try to get some info.
Comment 26•15 years ago
|
||
Oh, one difference from that doc was attaching to the process while it was churning away post-mochitest, rather than launching it from the debugger.
Comment 27•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1249529101.1249531761.8411.gz
WINNT 5.2 mozilla-central test mochitests on 2009/08/05 20:25:01
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1249545017.1249547678.16295.gz
WINNT 5.2 mozilla-central test mochitests on 2009/08/06 00:50:17
Comment 28•15 years ago
|
||
I'd be happy to hand-hold, but our timezone difference is generally fail. Regardless, if you have questions, you know where to find me.
Reporter | ||
Comment 29•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1249643014.1249650889.22696.gz
WINNT 5.2 mozilla-central unit test on 2009/08/07 04:03:34
...which means that http://hg.mozilla.org/mozilla-central/rev/bae405b94b96 didn't help.
Comment 30•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1249751531.1249760853.13878.gz
WINNT 5.2 mozilla-central unit test on 2009/08/08 10:12:11
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1249769074.1249775146.5829.gz
WINNT 5.2 mozilla-central unit test on 2009/08/08 15:04:34
Comment 31•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1249918855.1249921962.2540.gz
WINNT 5.2 mozilla-central test mochitests on 2009/08/10 08:40:55
Comment 32•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1249924446.1249927188.28212.gz
WINNT 5.2 mozilla-central test mochitests on 2009/08/10 10:14:06
Comment 33•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1249949013.1249951952.9560.gz
WINNT 5.2 mozilla-central test mochitests on 2009/08/10 17:03:33
Reporter | ||
Comment 34•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1249945547.1249953894.30668.gz
WINNT 5.2 mozilla-central unit test on 2009/08/10 16:05:47
Comment 35•15 years ago
|
||
Courtesy of WinDbg and timeless's help.
<timeless_mbp> your problem is deadlock
<timeless_mbp> thread 11 is more interesting
<timeless_mbp> afaict thread 1 is trying to have threads like 11 die
<timeless_mbp> but it's unclear as to whether thread 11 was sent or missed the message
So it's DOM workers
Comment 37•15 years ago
|
||
Suggestions for reproducing the problem locally:
* from http://ftp.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win32-unittest/ d/l the build, tests and crash symbols
* unpack them so that you have a firefox/, mochitest/ and symbols/
* expand the symbols files, eg find symbols/ -name '*.pd_' | perl -nle '$a=$_;$a=~s/_$/b/; print $a;system("expand $_ $a")'
* build slaves have the following defined in the environment (although I doubt all are relevant)
export MOZ_AIRBAG=1
export MOZ_CRASHREPORTER_NO_REPORT=1
export MOZ_NO_REMOTE=1
export MOZ_NO_RESET_PATH=1
export NO_EM_RESTART=1
export NO_FAIL_ON_TEST_ERRORS=1
export XPCOM_DEBUG_BREAK=warn
* build slaves run mochitest like this:
python mochitest/runtests.py --appname=firefox/firefox.exe --utility-path=bin --extra-profile-file=bin/plugins --certificate-path=certs --autorun --close-when-done --console-level=INFO --symbols-path=symbols --leak-threshold=484
* in windbg, attach to firefox, set your sympath, symfix it, reload (a la https://developer.mozilla.org/en/How_to_get_a_stacktrace_with_WinDbg)
Reporter | ||
Updated•15 years ago
|
Component: General → DOM: Mozilla Extensions
QA Contact: general → general
Assignee | ||
Comment 38•15 years ago
|
||
I'll certainly look at this tomorrow, but two things I want to point out/ask:
1. Originally the report was that firefox was churning up to 100% cpu. Is that what was happening here before you attached to the process? If so that means that something else must be sending lots of windows messages... I would suspect java somehow but see #2...
2. There are lots of threads that have warnings saying their stacks may be incorrect (all the JVM ones, for instance). Also, thread 11 doesn't start with |kernel32!BaseThreadStart| like all the others. What do we make of that? The first few frames are obviously wrong, |PL_HashTableRawLookup| doesn't ever call |nsDOMWorkerMessageHandler::DispatchEvent|. Could that be a weird result of JIT being enabled?
Comment 39•15 years ago
|
||
(In reply to comment #38)
> 1. Originally the report was that firefox was churning up to 100% cpu. Is that
> what was happening here before you attached to the process?
That's right, using all the time the system would give it. Note that Java was updated to Java SE 6 Update 14 a month before this issue appeared (bug 495533).
Comment 40•15 years ago
|
||
Comment 41•15 years ago
|
||
Comment 42•15 years ago
|
||
Comment 43•15 years ago
|
||
Comment 44•15 years ago
|
||
Comment 45•15 years ago
|
||
Comment 46•15 years ago
|
||
Assignee | ||
Comment 47•15 years ago
|
||
Ok, I think I know what is wrong.
Assignee: nobody → bent.mozilla
Status: NEW → ASSIGNED
Assignee | ||
Comment 48•15 years ago
|
||
I <3 threads.
Attachment #394114 -
Flags: superreview?(jst)
Attachment #394114 -
Flags: review?(jst)
Comment 49•15 years ago
|
||
Comment on attachment 394114 [details] [diff] [review]
Fix
This looks good to me. There's a subtle hint that bent gave me over IRC: the worker can only become canceled if the pool's monitor is held, which eliminates the race.
Attachment #394114 -
Flags: review?(jst) → review+
Comment 50•15 years ago
|
||
Comment on attachment 394114 [details] [diff] [review]
Fix
sr=jst. Please add the subtle hint you gave Blake in a comment in this code when landing...
Attachment #394114 -
Flags: superreview?(jst) → superreview+
Assignee | ||
Comment 51•15 years ago
|
||
Pushed changeset 340fe75c03dd to mozilla-central.
Crap, I pushed without seeing your request for a comment. I'll add one.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 52•15 years ago
|
||
Pushed the comment to mozilla-central, changeset e53964cf3d20.
Updated•15 years ago
|
Target Milestone: --- → mozilla1.9.2b1
Reporter | ||
Comment 53•15 years ago
|
||
I hate to say it, but it looks like this hasn't fixed the problems on tinderbox; it's happened 4 times since the fix landed:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1250166961.1250170279.31431.gz
WINNT 5.2 mozilla-central test mochitests on 2009/08/13 05:36:01
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1250151248.1250159957.14134.gz
WINNT 5.2 mozilla-central unit test on 2009/08/13 01:14:08
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1250150308.1250160956.24829.gz
WINNT 5.2 mozilla-central unit test on 2009/08/13 00:58:28
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1250154536.1250157623.21324.gz
WINNT 5.2 mozilla-central test mochitests on 2009/08/13 02:08:56
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 54•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6/1250171957.1250181773.12666.gz
WINNT 5.2 mozilla-1.9.2 unit test on 2009/08/13 06:59:17
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6/1250178579.1250188593.26996.gz
WINNT 5.2 mozilla-1.9.2 unit test on 2009/08/13 08:49:39
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1250172703.1250182275.18351.gz
WINNT 5.2 mozilla-central unit test on 2009/08/13 07:11:43
Reporter | ||
Comment 55•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1250176437.1250179666.19299.gz
WINNT 5.2 mozilla-central test mochitests on 2009/08/13 08:13:57
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1250182257.1250185320.21382.gz
WINNT 5.2 mozilla-central test mochitests on 2009/08/13 09:50:57
Comment 56•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1250197052.1250206221.29593.gz
WINNT 5.2 mozilla-central unit test on 2009/08/13 13:57:32
Comment 57•15 years ago
|
||
This is from a build with rev 93a7af291db1, it took about 4 attempts to get it to hang on shutdown.
Reporter | ||
Comment 58•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1250210922.1250214059.20275.gz
WINNT 5.2 mozilla-central test mochitests on 2009/08/13 17:48:42
Reporter | ||
Comment 59•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1250203551.1250206219.29584.gz
WINNT 5.2 mozilla-central test mochitests on 2009/08/13 15:45:51
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1250214571.1250217372.27737.gz
WINNT 5.2 mozilla-central test mochitests on 2009/08/13 18:49:31
Comment 60•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1250259243.1250269115.3661.gz
WINNT 5.2 mozilla-central unit test on 2009/08/14 07:14:03
Comment 61•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1250271600.1250279506.26053.gz
WINNT 5.2 mozilla-central unit test on 2009/08/14 10:40:00
Reporter | ||
Comment 62•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1250273088.1250283600.6762.gz
WINNT 5.2 mozilla-central unit test on 2009/08/14 11:04:48
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1250279366.1250282607.27944.gz
WINNT 5.2 mozilla-central test mochitests on 2009/08/14 12:49:26
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1250274727.1250283146.1277.gz
WINNT 5.2 mozilla-central unit test on 2009/08/14 11:32:07
Comment 63•15 years ago
|
||
Ben asked for a ".dump /ma" from the build slave, which is here
http://people.mozilla.org/~nthomas/bug505718/hang1/
with symbols etc. The stack in firefox-debug_16FC_2009-08-16_17-03-19-643.log looks different to attachment 394430 [details], so let me know if you want more dumps to explore the variations.
Comment 64•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1250501950.1250511186.15188.gz
WINNT 5.2 mozilla-central unit test on 2009/08/17 02:39:10
Comment 65•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1250551085.1250554119.22605.gz
WINNT 5.2 mozilla-central test mochitests on 2009/08/17 16:18:05
Assignee | ||
Comment 66•15 years ago
|
||
I pushed changeset bb1b6c42d78a, fingers crossed!
Comment 67•15 years ago
|
||
(In reply to comment #66)
> I pushed changeset bb1b6c42d78a, fingers crossed!
Fired off 8 extra packaged-mochitest runs (M on tbpl) on this revision to try to get good stats. Results in 30 minutes or so.
Comment 68•15 years ago
|
||
How does 9 green runs grab you ?
Assignee | ||
Comment 69•15 years ago
|
||
Thanks Nick! Couldn't have figured that out without your help.
Status: REOPENED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
Updated•15 years ago
|
Target Milestone: mozilla1.9.2b1 → mozilla1.9.3a1
Assignee | ||
Comment 70•15 years ago
|
||
This had r+sr=jst over irc.
Attachment #395073 -
Flags: superreview+
Attachment #395073 -
Flags: review+
Assignee | ||
Updated•15 years ago
|
Attachment #394114 -
Flags: approval1.9.2?
Assignee | ||
Updated•15 years ago
|
Attachment #395073 -
Flags: approval1.9.2?
Comment 72•15 years ago
|
||
Who can approve these two patches for 1.9.2 ?
Updated•15 years ago
|
Flags: blocking1.9.2?
Comment 73•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1252860938.1252863977.20393.gz
WINNT 5.2 mozilla-1.9.2 test mochitests on 2009/09/13 09:55:38
Comment 74•15 years ago
|
||
Comment 75•15 years ago
|
||
sorry, second line in my previous comment was supposed to be:
WINNT 5.2 mozilla-1.9.2 unit test on 2009/09/17 10:42:14
Comment 76•15 years ago
|
||
(In reply to comment #72)
> Who can approve these two patches for 1.9.2 ?
Maybe jst could?
I think component owners have approval-granting power. (At least, I remember someone telling roc in a recent platform meeting that roc could grant a1.9.2 for layout bugs.)
Updated•15 years ago
|
Attachment #394114 -
Flags: approval1.9.2? → approval1.9.2+
Comment 77•15 years ago
|
||
Comment on attachment 394114 [details] [diff] [review]
Fix
a=me to stop the orange
Updated•15 years ago
|
Attachment #395073 -
Flags: approval1.9.2? → approval1.9.2+
Assignee | ||
Updated•15 years ago
|
Attachment #394114 -
Flags: approval1.9.2+
Assignee | ||
Comment 78•15 years ago
|
||
Comment on attachment 394114 [details] [diff] [review]
Fix
This landed before 1.9.2 branching. Oops.
Assignee | ||
Updated•15 years ago
|
Assignee | ||
Comment 79•15 years ago
|
||
Comment on attachment 395073 [details] [diff] [review]
Followup patch
Pushed changeset 10a7ac8d7038 to mozilla-1.9.2.
Assignee | ||
Comment 80•15 years ago
|
||
Should be all done now.
Comment 81•15 years ago
|
||
See also Bug 523319, which appears to be a new occurrence of this type of bug.
Updated•12 years ago
|
Keywords: intermittent-failure
Updated•12 years ago
|
Whiteboard: [orange]
Updated•12 years ago
|
Component: DOM: Mozilla Extensions → DOM
Updated•6 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•