Closed Bug 524014 Opened 15 years ago Closed 15 years ago

Mac OS X debug mochitest-plain frequently times out on 4 consecutive mochitests and gives up, reporting 22 failures (on xserves)

Categories

(Core :: General, defect)

x86
macOS
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla1.9.3a1

People

(Reporter: dholbert, Assigned: cjones)

References

()

Details

(Keywords: intermittent-failure)

At least three times today, the Mac "mochitests-4/5" tinderbox has timed out on 4 consecutive mochitests and given up.

For some reason, it reports 22 failures, even though there were just 4 timeouts.

The timing out tests aren't always the same, but they're always consecutive.  Maybe this is due to a sudden CPU spike on the machine?

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256235034.1256238242.10437.gz
OS X 10.5.2 mozilla-central test debug mochitests-4/5 on 2009/10/22 11:10:34
{
703 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug66619.html | Test timed out.
706 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_scrolling.html | Test timed out.
709 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/forms/test/test_bug287446.html | Test timed out.
712 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/forms/test/test_bug345267.html | Test timed out.
713 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | 4 test timeouts, giving up.
714 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | Skipping 149 remaining tests.
}

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256256634.1256258227.4637.gz
OS X 10.5.2 mozilla-central test debug mochitests-4/5 on 2009/10/22 17:10:34
{
83 ERROR TEST-UNEXPECTED-FAIL | /tests/js/src/xpconnect/tests/mochitest/test_bug390488.html | Test timed out.
86 ERROR TEST-UNEXPECTED-FAIL | /tests/js/src/xpconnect/tests/mochitest/test_bug393269.html | Test timed out.
89 ERROR TEST-UNEXPECTED-FAIL | /tests/js/src/xpconnect/tests/mochitest/test_bug396851.html | Test timed out.
92 ERROR TEST-UNEXPECTED-FAIL | /tests/js/src/xpconnect/tests/mochitest/test_bug428021.html | Test timed out.
93 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | 4 test timeouts, giving up.
94 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | Skipping 253 remaining tests.
}

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256253893.1256255425.5821.gz
OS X 10.5.2 mozilla-central test debug mochitests-4/5 on 2009/10/22 16:24:53
{
22 ERROR TEST-UNEXPECTED-FAIL | /tests/extensions/universalchardet/tests/test_bug421271.html | Test timed out.
25 ERROR TEST-UNEXPECTED-FAIL | /tests/extensions/universalchardet/tests/test_bug426271-euc-jp.html | Test timed out.
28 ERROR TEST-UNEXPECTED-FAIL | /tests/extensions/universalchardet/tests/test_bug426271-utf-8.html | Test timed out.
31 ERROR TEST-UNEXPECTED-FAIL | /tests/extensions/universalchardet/tests/test_bug431054-japanese.html | Test timed out.
32 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | 4 test timeouts, giving up.
33 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | Skipping 268 remaining tests.
}
OS: Linux → Mac OS X
Whiteboard: [orange]
Blocks: 523385
No longer blocks: 438871
Summary: Mac mochitests-4/5 box times out on 4 consecutive mochitests and gives up, reporting 22 failures → Mac OS X mochitest-plain frequently times out on 4 consecutive mochitests and gives up, reporting 22 failures
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256269556.1256270974.14455.gz
OS X 10.5.2 mozilla-central test debug mochitests-3/5 on 2009/10/22 20:45:56  
20 ERROR TEST-UNEXPECTED-FAIL | Test timed out. | undefined
23 ERROR TEST-UNEXPECTED-FAIL | Test timed out. | undefined
26 ERROR TEST-UNEXPECTED-FAIL | Test timed out. | undefined
29 ERROR TEST-UNEXPECTED-FAIL | Test timed out. | undefined
30 ERROR TEST-UNEXPECTED-FAIL | 4 test timeouts, giving up. | undefined
31 ERROR TEST-UNEXPECTED-FAIL | Skipping 1033 remaining tests. | undefined


http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256266016.1256267607.10523.gz
OS X 10.5.2 mozilla-central test debug mochitests-3/5 on 2009/10/22 19:46:56  
183 ERROR TEST-UNEXPECTED-FAIL | Test timed out. | undefined
186 ERROR TEST-UNEXPECTED-FAIL | Test timed out. | undefined
189 ERROR TEST-UNEXPECTED-FAIL | Test timed out. | undefined
192 ERROR TEST-UNEXPECTED-FAIL | Test timed out. | undefined
193 ERROR TEST-UNEXPECTED-FAIL | 4 test timeouts, giving up. | undefined
194 ERROR TEST-UNEXPECTED-FAIL | Skipping 1014 remaining tests. | undefined


http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256266016.1256268343.18636.gz
OS X 10.5.2 mozilla-central test debug mochitests-4/5 on 2009/10/22 19:46:56  
1171 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug508115.xul | Test timed out.
1174 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug514732.html | Test timed out.
1177 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_movement_by_characters.html | Test timed out.
1180 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_movement_by_words.html | Test timed out.
1181 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | 4 test timeouts, giving up.
1182 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | Skipping 107 remaining tests.


http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256262050.1256264664.10810.gz
OS X 10.5.2 mozilla-central test debug mochitests-1/5 on 2009/10/22 18:40:50  
1894 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_CrossSiteXHR.html | Test timed out.
1903 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_CrossSiteXHR_cache.html | Test timed out.
1912 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_NodeIterator_basics_filters.xhtml | Test timed out.
1921 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_NodeIterator_mutations_1.xhtml | Test timed out.
1922 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | 4 test timeouts, giving up.
1923 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | Skipping 427 remaining tests.



http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256262392.1256265007.14399.gz
OS X 10.5.2 mozilla-central test debug mochitests-4/5 on 2009/10/22 18:46:32  
18 ERROR TEST-UNEXPECTED-FAIL | /tests/extensions/universalchardet/tests/test_bug421271.html | Test timed out.
21 ERROR TEST-UNEXPECTED-FAIL | /tests/extensions/universalchardet/tests/test_bug426271-euc-jp.html | Test timed out.
24 ERROR TEST-UNEXPECTED-FAIL | /tests/extensions/universalchardet/tests/test_bug426271-utf-8.html | Test timed out.
27 ERROR TEST-UNEXPECTED-FAIL | /tests/extensions/universalchardet/tests/test_bug431054-japanese.html | Test timed out.
28 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | 4 test timeouts, giving up.
29 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | Skipping 268 remaining tests.
buildbot.slave.commands.TimeoutError: command timed out: 1200 seconds without output, killing pid 527


http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256275212.1256278589.31206.gz
OS X 10.5.2 mozilla-central test debug mochitests-4/5 on 2009/10/22 22:20:12  
1921 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_acid3_test46.html | Test timed out.
1924 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_at_rule_parse_serialize.html | Test timed out.
1927 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_bug160403.html | Test timed out.
1930 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_bug221428.html | Test timed out.
1931 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | 4 test timeouts, giving up.
1932 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | Skipping 92 remaining tests.
buildbot.slave.commands.TimeoutError: command timed out: 1200 seconds without output, killing pid 640






I'm going to hide the mac debug mochitest-plain boxes until this is fixed.
We just hit this on an opt Linux mochitest-plain tinderbox... :-/

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256340460.1256341889.11536.gz
Linux mozilla-central test opt mochitests-5/5 on 2009/10/23 16:27:40
{
565 INFO Running /tests/security/ssl/mixedcontent/test_bug329869.html...
NEXT ERROR 566 ERROR TEST-UNEXPECTED-FAIL | /tests/security/ssl/mixedcontent/test_bug329869.html | Test timed out.
568 INFO Running /tests/security/ssl/mixedcontent/test_bug383369.html...
569 INFO Error: Unable to restore focus, expect failures and timeouts.
NEXT ERROR 570 ERROR TEST-UNEXPECTED-FAIL | /tests/security/ssl/mixedcontent/test_bug383369.html | Test timed out.
572 INFO Running /tests/security/ssl/mixedcontent/test_bug455367.html...
573 INFO Error: Unable to restore focus, expect failures and timeouts.
574 ERROR TEST-UNEXPECTED-FAIL | /tests/security/ssl/mixedcontent/test_bug455367.html | Test timed out.
576 INFO Running /tests/security/ssl/mixedcontent/test_bug472986.html...
577 INFO Error: Unable to restore focus, expect failures and timeouts.
578 ERROR TEST-UNEXPECTED-FAIL | /tests/security/ssl/mixedcontent/test_bug472986.html | Test timed out.
579 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | 4 test timeouts, giving up.
580 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | Skipping 141 remaining tests.
}
OS: Mac OS X → All
Hardware: x86 → All
Summary: Mac OS X mochitest-plain frequently times out on 4 consecutive mochitests and gives up, reporting 22 failures → mochitest-plain frequently times out on 4 consecutive mochitests and gives up, reporting 22 failures
FWIW, the Linux failure (from comment 6) only reported 6 failures on the tinderbox page, though -- not 22 like on the mac boxes.  This one says "428/6/69".
Summary: mochitest-plain frequently times out on 4 consecutive mochitests and gives up, reporting 22 failures → mochitest-plain frequently times out on 4 consecutive mochitests and gives up (reporting 22 failures on mac, 6 on linux)
Assignee: nobody → ted.mielczarek
I think that Linux thing should probably be a separate bug.
Ok - I've filed bug 525012 on the linux issue.
OS: All → Mac OS X
Hardware: All → x86
Summary: mochitest-plain frequently times out on 4 consecutive mochitests and gives up (reporting 22 failures on mac, 6 on linux) → Mac OS X mochitest-plain frequently times out on 4 consecutive mochitests and gives up, reporting 22 failures
Summary: Mac OS X mochitest-plain frequently times out on 4 consecutive mochitests and gives up, reporting 22 failures → Mac OS X debug mochitest-plain frequently times out on 4 consecutive mochitests and gives up, reporting 22 failures
catlee was looking into something else, and noticed that our unittests often ran faster on the mac minis than on the xserves. The root cause was that the tests would hang and timeout on the xserves. I wonder if that's the same thing we're seeing here? If so, then it could be a bad threading issue, since the xserves have more physical cpus than the minis (I believe they have 4 cores vs. the minis' two cores.)
Every single one of the listed failures on this bug is on an xserve:
(In reply to comment #0)
> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256235034.1256238242.10437.gz
> OS X 10.5.2 mozilla-central test debug mochitests-4/5 on 2009/10/22 11:10:34
'echo Building on: bm-xserve18'

> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256256634.1256258227.4637.gz
> OS X 10.5.2 mozilla-central test debug mochitests-4/5 on 2009/10/22 17:10:34
echo Building on: bm-xserve11

> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256253893.1256255425.5821.gz
> OS X 10.5.2 mozilla-central test debug mochitests-4/5 on 2009/10/22 16:24:53
echo Building on: bm-xserve16

(In reply to comment #2)
> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256269556.1256270974.14455.gz
> OS X 10.5.2 mozilla-central test debug mochitests-3/5 on 2009/10/22 20:45:56  
echo Building on: bm-xserve11

> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256266016.1256267607.10523.gz
> OS X 10.5.2 mozilla-central test debug mochitests-3/5 on 2009/10/22 19:46:56  
echo Building on: bm-xserve07

> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256266016.1256268343.18636.gz
> OS X 10.5.2 mozilla-central test debug mochitests-4/5 on 2009/10/22 19:46:56  
echo Building on: bm-xserve16

> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256262050.1256264664.10810.gz
> OS X 10.5.2 mozilla-central test debug mochitests-1/5 on 2009/10/22 18:40:50  
echo Building on: bm-xserve07

> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256262392.1256265007.14399.gz
> OS X 10.5.2 mozilla-central test debug mochitests-4/5 on 2009/10/22 18:46:32  
echo Building on: bm-xserve19

> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256275212.1256278589.31206.gz
> OS X 10.5.2 mozilla-central test debug mochitests-4/5 on 2009/10/22 22:20:12  
echo Building on: bm-xserve16
Summary: Mac OS X debug mochitest-plain frequently times out on 4 consecutive mochitests and gives up, reporting 22 failures → Mac OS X debug mochitest-plain frequently times out on 4 consecutive mochitests and gives up, reporting 22 failures (on xserves)
(In reply to comment #7)
> catlee was looking into something else, and noticed that our unittests often
> ran faster on the mac minis than on the xserves. The root cause was that the
> tests would hang and timeout on the xserves. I wonder if that's the same thing
> we're seeing here? If so, then it could be a bad threading issue, since the
> xserves have more physical cpus than the minis (I believe they have 4 cores vs.
> the minis' two cores.)


If this is an xserve-vs-mini problem, is this blocking switching from unittest builds over to unittest on debug builds?

 
We're having to run unittests both ways until this is fixed, and the extra load is hurting wait times.
Ted's comment#8 is great data. An intermittent xserve-only test failure. Lovely. 

Next time this happens, please let us know on irc. We'll pull the machine from production, hopefully before it starts another job, and change keys to give access for debugging.

Raising priority on this, as it is not just specific to one machine, and is also blocking us reducing load on pool-o-slaves.
Severity: normal → major
Do we have a spare xserve or mac pro we could run some tests on? Considering how often this happens it seems like we might be able to get it in a debugger if we ran mochitest in a loop.
(In reply to comment #14)
> Do we have a spare xserve or mac pro we could run some tests on? Considering
> how often this happens it seems like we might be able to get it in a debugger
> if we ran mochitest in a loop.

We could steal one from the production pool for a few hours if this would be helpful.
Just to sanity check, looked at the logs from comment 10, 11, 12:
echo Building on: bm-xserve12
echo Building on: bm-xserve11
echo Building on: bm-xserve11
echo Building on: bm-xserve18
echo Building on: bm-xserve16
echo Building on: bm-xserve17

I'm going to get someone to pull an xserve from the slave pool this week, and try running Mochitest in a loop and see if I can catch this in a debugger.
Ted, did you get a chance to look into this?
Not yet, sorry.
This was fixed by
http://hg.mozilla.org/mozilla-central/rev/7c5e126880cf

Yay!
Assignee: ted.mielczarek → jones.chris.g
Status: NEW → RESOLVED
Closed: 15 years ago
Depends on: 529134
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a1
(My guess is that the test server was crashing due to bug 529134.)
Whiteboard: [orange]
You need to log in before you can comment on or make changes to this bug.