Closed Bug 686143 Opened 13 years ago Closed 11 years ago

Intermittent Android jsreftests "command timed out: 2400 seconds without output"

Categories

(Testing :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(firefox16 wontfix, firefox17 wontfix, firefox18 wontfix, firefox19 wontfix, firefox-esr17 wontfix)

RESOLVED WORKSFORME

People

(Reporter: philor, Unassigned)

Details

(Keywords: intermittent-failure, Whiteboard: [android_tier_1])

I haven't looked at every one of them, and I probably won't, but the four I did look at were all in the enormous hunk of ecma/Date/15.9.5.9.js where it does getUTCMonth() over and over and over and over and over.
https://tbpl.mozilla.org/php/getParsedLog.php?id=6365327&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=6365430&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=6363168&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=6359329&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=6367476&full=1 in the midst of passing ecma/GlobalObject/15.1.2.4.js escape(String.fromCharCode(1020)), so perhaps we're just on to a whole new set of things to need to disable.
https://tbpl.mozilla.org/php/getParsedLog.php?id=6378254&full=1 in ecma/Expressions/11.10-2.js and if anyone actually wanted to see them, there's probably a bunch of these in bug 686084 since the actual failure is miles of foopy GC away from the parsed failure, so I often forget that they are this rather than that.
https://tbpl.mozilla.org/php/getParsedLog.php?id=6369315&full=1 - jsreftest-2 in js1_8_1/jit/math-jit-tests.js
Summary: Intermittent Android jsreftest-1 "command timed out: 2400 seconds without output" probably mostly in ecma/Date/15.9.5.9.js → Intermittent Android jsreftests "command timed out: 2400 seconds without output"
https://tbpl.mozilla.org/php/getParsedLog.php?id=6381914&full=1 - ecma/Date/15.9.5.9.js | (new Date(1117584000001)).getUTCMonth() item 382
I say we just skip 15.9.5.9.js - it's total crap. It thinks it's being all cunning by testing UTC_FEB_29_2000, but the way the test works is to take a date, add the number of milliseconds in January to it, test that, 1 ms before, 1 ms after, then add the number of milliseconds in February to it, .... So that round of the test makes sure we know what month it was on the 31st of March 2000, which would have been the 30th if we didn't know 2000 was a leap year, then on the 29th of April, ....
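For anyone who wants to see that walk spelled out, here is a rough Python sketch of the pattern as described in the comment above. It is not the test's actual code: the function name month_walk, the 12-month loop, and the use of calendar.monthrange for the month lengths are all assumptions made for illustration. It starts at a given UTC time, adds each month's length in milliseconds in turn, and probes 1 ms before, at, and 1 ms after each resulting time.

```python
import calendar
from datetime import datetime, timedelta, timezone

MS_PER_DAY = 86_400_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# 2000-02-29T00:00:00Z in milliseconds since the epoch
UTC_FEB_29_2000 = calendar.timegm((2000, 2, 29, 0, 0, 0)) * 1000


def month_walk(start_ms, year):
    """Yield (probe_ms, utc_datetime) for each probed instant.

    Hypothetical reconstruction: for each calendar month of `year`,
    add that month's length to the running time, then probe the
    instant 1 ms before, the instant itself, and 1 ms after.
    """
    t = start_ms
    for month in range(1, 13):
        days_in_month = calendar.monthrange(year, month)[1]
        t += days_in_month * MS_PER_DAY
        for probe in (t - 1, t, t + 1):
            yield probe, EPOCH + timedelta(milliseconds=probe)


if __name__ == "__main__":
    for probe_ms, when in month_walk(UTC_FEB_29_2000, 2000):
        # JS getUTCMonth() is zero-based, so subtract 1 for comparison
        print(probe_ms, when.isoformat(), "getUTCMonth ->", when.month - 1)
```

Starting from UTC_FEB_29_2000, the first round lands on 31 March 2000 and the second on 29 April 2000, which matches the dates in the comment; the probes never actually sit on 29 February itself.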
I agree that dropping 15.9.5.9.js is a good idea.
https://tbpl.mozilla.org/php/getParsedLog.php?id=6399139&tree=Mozilla-Inbound&full=1#error0 - js1_5/Regress/regress-4047... (whatever comes after js1_5/Regress/regress-398609.js)
Assignee: general → jmaher
We continue to see this in different spots, not just in ecma/Date/15.9.5.9.js. I am considering splitting this into 3 chunks instead of 2. I also need to do some memory profiling.
https://tbpl.mozilla.org/php/getParsedLog.php?id=6621122&tree=Firefox&full=1#error0 - ecma_3/RegExp/regress-223273.js (or maybe whatever test is next after that, not sure if it was over already)
hmm, I might have reproduced this. I was running tests on a tegra in the staging environment and now I cannot connect to it. I had adb connected via tcp doing a logcat while also running tests via SUT. This is my 5th or 6th run and the first failure I have seen. I cannot ping the device, telnet to sut, or connect via adb. Looking at some other logs, it appears that the device really does go offline when we run into this error. Now to figure out why the network drops.
oh, el tegra really goes offline, I reproduced this at home and my usb adb cable provides no help :( Looking on a screen, I see fennec displayed, but no mouse input is available.
https://tbpl.mozilla.org/php/getParsedLog.php?id=6626838&tree=Mozilla-Inbound&full=1#error0 - ecma/Date/15.9.5.9.js Were you OOM, after pointlessly and ridiculously counting each millisecond back to 1/1/0000 over and over?
https://tbpl.mozilla.org/php/getParsedLog.php?id=6627468&tree=Firefox&full=1#error0 - ecma/Date/15.9.5.13-1.js (wonder whether it counts milliseconds back to year zero repeatedly, too?)
ok, I repeated this twice more: 1) not sure what happened, but it looks like my dhcp server refreshed the ip addresses, although my tegra got an ip only a few hours earlier 2) my tegra did a soft reboot.
Actually, peak memory consumption normally isn't during the 15.9.5.9.js test; it is during the tests immediately following it. So maybe the failures we see in the following tests are a side effect of our memory usage during these silly date tests.
Yeah, because of the way the logs just get cut off in the middle, and the way (according to my vague understanding, anyway) the log we see is just what it was at the last time the log was successfully polled, there's no guarantee that what I think was running when it died was actually what was running. Could well be that there's something in the next test that I hate even more, and I just haven't seen it yet :)
https://tbpl.mozilla.org/php/getParsedLog.php?id=6635587&tree=Mozilla-Inbound&full=1#error0 - ecma/Date/dst-offset-caching-8-of-8.js (oddly, looking exactly like the log from comment 67, and not entirely like all the others).
https://tbpl.mozilla.org/php/getParsedLog.php?id=6641548&tree=Mozilla-Inbound&full=1#error0 - ecma/Date/dst-offset-caching-8-of-8.js Be sort of interesting to know what timing out during ecma/Date/dst-offset-caching-8-of-8.js puts in the log, that results in those but no others having several "failed to validate file when downloading /mnt/sdcard/tests/reftest/reftest.log!" in them. Well, okay, maybe it wouldn't be interesting, but it might.
Because I just can't help wondering what will happen when I put a fork in the light socket, pushed https://hg.mozilla.org/integration/mozilla-inbound/rev/bbd483aa8883 skipping ecma/Date/15.9.5.9.js. Best hypothesis is "frequency remains exactly the same, location shifts downward to either 15.9.5.js or one of the dst-offset ones."
Unfortunately, I don't actually know what the frequency was, so while
https://tbpl.mozilla.org/php/getParsedLog.php?id=6643134&tree=Mozilla-Inbound&full=1#error0 - ecma/Expressions/11.10-2.js
https://tbpl.mozilla.org/php/getParsedLog.php?id=6643560&tree=Mozilla-Inbound&full=1#error0 - ecma/Date/dst-offset-caching-8-of-8.js
https://tbpl.mozilla.org/php/getParsedLog.php?id=6643791&tree=Mozilla-Inbound&full=1#error0 - ecma/Date/dst-offset-caching-8-of-8.js
https://tbpl.mozilla.org/php/getParsedLog.php?id=6643934&tree=Mozilla-Inbound&full=1#error0 - ecma_5/Types/8.12.5-01.js
and three of bug 681855 and one of bug 691117 and 13 green seems like "omg a million times better!" it might not really be more than half a million times better.
It's kind of creepy that WOO keeps graphs of when I'm on vacation. 2011-09-27 through 2011-10-02 is the only range through which you have actual valid data, and because I don't generally look at jobs starred by other people, I don't know how much of that range has residual "whatever, retriggered" starring. I'm afraid it's equally possible that that graph shows "the frequency has increased" and that it shows "the percentage of the constant frequency which is correctly starred has increased."
https://tbpl.mozilla.org/php/getParsedLog.php?id=6653682&tree=Mozilla-Inbound&full=1#error0 - js1_6/Array/regress-304828.js For that matter, the frequency is strongly affected by my perception of the frequency, since if I'm sure that retriggering will just net me another one of these, and the one I'm starring was a non-JS non-Android push I'm less likely to retrigger after failure.
Ordinarily, I'd just call https://tbpl.mozilla.org/php/getParsedLog.php?id=6658922&tree=Mozilla-Inbound the oh so useful bug 689856, but right in the suspicious part of ecma/Date/dst-offset-caching-8-of-8.js?
Did you miss me? It's been a long time! https://tbpl.mozilla.org/php/getParsedLog.php?id=6694839&tree=Mozilla-Inbound&full=1 - ecma/Date/dst-offset-caching-8-of-8.js
https://tbpl.mozilla.org/php/getParsedLog.php?id=6900468&tree=Mozilla-Inbound (yes, it is sort of a problem that I don't actually know how I would tell the difference between someone breaking jsreftest and just the normal noises in here)
Depends on: 690311
Depends on: 697470
https://tbpl.mozilla.org/php/getParsedLog.php?id=7906703&tree=Firefox
REFTEST FINISHED: Slowest test took 44430ms (http://10.250.48.217:30145/jsreftest/tests/jsreftest.html?test=e4x/Regress/regress-308111.js)
REFTEST INFO | Result summary:
REFTEST INFO | Successful: 44629 (44629 pass, 0 load only)
REFTEST INFO | Unexpected: 0 (0 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 unexpected fixed asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 1131 (56 known fail, 0 known asserts, 1027 random, 48 skipped, 0 slow)
REFTEST INFO | Total canvas count = 0
REFTEST TEST-START | Shutdown
INFO | automation.py | Application ran for: 0:13:13.701436
INFO | automation.py | Reading PID log: /tmp/tmpSm9NBnpidlog
getting files in '/mnt/sdcard/tests/reftest/profile/minidumps/'
WARNING | automationutils.processLeakLog() | refcount logging is off, so leaks can't be detected!
REFTEST INFO | runreftest.py | Running tests: end.
command timed out: 2400 seconds without output, killing pid 35761
These all seem to be caused not by any individual test but by memory pressure/GC, since the reftest framework loads the tests sequentially without restarting the browser. Can you limit the amount of memory used by the test process on Android to prevent low memory from causing the device to fall over? If the test run aborts due to exceeding the memory limit, at least you have that datum. njn, is there anything you can suggest to help improve or diagnose this situation?
Depends on: 725500
Assignee: jmaher → nobody
Component: JavaScript Engine → General
Product: Core → Testing
QA Contact: general → general
No longer blocks: 750959
jmaher: I think your no-reboot fix for jstestbrowser fixed this.
https://tbpl.mozilla.org/php/getParsedLog.php?id=12425487&tree=Fx-Team (It probably is quite a bit better, but mostly we just all gave up and stopped pasting links for Android failures.)
Whiteboard: [orange][android_tier_1] → [android_tier_1]
Resolving WFM keyword:intermittent-failure bugs last modified >3 months ago, whose whiteboard contains none of: {random,disabled,marked,fuzzy,todo,fails,failing,annotated,time-bomb,leave open} There will inevitably be some false positives; for that (and the bugspam) I apologise. Filter on orangewfm.
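The rule in that bulk-resolution note can be read as a simple predicate. The Python below is only a sketch of that reading, not the script that was actually run: the function name should_resolve_wfm, the 90-day stand-in for "3 months", and the case-insensitive substring match on the whiteboard are all assumptions.

```python
from datetime import datetime, timedelta

# Whiteboard terms listed in the note; a bug carrying any of them is left alone.
EXEMPT_TERMS = (
    "random", "disabled", "marked", "fuzzy", "todo", "fails",
    "failing", "annotated", "time-bomb", "leave open",
)


def should_resolve_wfm(keywords, whiteboard, last_modified, now,
                       max_age=timedelta(days=90)):
    """Return True if a bug matches the bulk WORKSFORME rule.

    Assumptions (not from the note): "3 months" is approximated as
    90 days, and whiteboard terms are matched as case-insensitive
    substrings.
    """
    if "intermittent-failure" not in keywords:
        return False
    if now - last_modified <= max_age:
        return False
    board = whiteboard.lower()
    return not any(term in board for term in EXEMPT_TERMS)
```

Under that reading, a bug with the intermittent-failure keyword and a whiteboard of just [android_tier_1], untouched for more than three months, qualifies, which is consistent with what happened here.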
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME