Intermittent asan TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 370 seconds with no output
Categories
(Core :: Sanitizers, defect, P5)
Tracking
()
Tracking | Status | |
---|---|---|
firefox78 | --- | disabled |
People
(Reporter: intermittent-bug-filer, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: intermittent-failure, test-disabled)
Attachments
(1 file)
(deleted),
text/x-phabricator-request
|
Details |
Filed by: hskupin [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=297446937&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/IplYzIsaQbyN0RK9SxutFA/runs/0/artifacts/public/logs/live_backing.log
This bug covers the ASAN specific subset for bug 1414495, which is an early hang of Firefox during startup.
Comment 1•5 years ago
|
||
From the above log:
[task 2020-04-14T01:10:57.721Z] 01:10:57 INFO - Application command: Z:\task_1586825398\build\application\firefox\firefox.exe -marionette --wait-for-browser -foreground -profile c:\users\task_1586825398\appdata\local\temp\tmpigabhm.mozrunner
[task 2020-04-14T01:10:57.726Z] 01:10:57 INFO - runtests.py | Application pid: 5452
[task 2020-04-14T01:10:57.726Z] 01:10:57 INFO - TEST-INFO | started process GECKO(5452)
[task 2020-04-14T01:13:58.043Z] 01:13:58 INFO - runtests.py | Waiting for browser...
[task 2020-04-14T01:17:07.743Z] 01:17:07 INFO - Buffered messages finished
[task 2020-04-14T01:17:07.743Z] 01:17:07 ERROR - TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 370 seconds with no output
Usually it would look like:
[task 2020-04-14T01:10:35.490Z] 01:10:35 INFO - Application command: Z:\task_1586825398\build\application\firefox\firefox.exe -marionette --wait-for-browser -foreground -profile c:\users\task_1586825398\appdata\local\temp\tmpv2ndif.mozrunner
[task 2020-04-14T01:10:35.492Z] 01:10:35 INFO - runtests.py | Application pid: 6272
[task 2020-04-14T01:10:35.492Z] 01:10:35 INFO - TEST-INFO | started process GECKO(6272)
[task 2020-04-14T01:10:37.699Z] 01:10:37 INFO - GECKO(6272) | 1586826637693 Marionette TRACE Marionette enabled
[task 2020-04-14T01:10:37.942Z] 01:10:37 INFO - GECKO(6272) | 1586826637941 Marionette TRACE Received observer notification toplevel-window-ready
Marionette enabled
gets logged for the command-line-startup
notification, or when the marionette.enabled
preference or environment variable is set. As you can see above in this case it's for the command line argument, so it means this notification never gets sent.
Comment 2•5 years ago
|
||
The first failure of this type is actually on March 16th:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=293425763&repo=autoland&lineNumber=1783
And the number of failures increased on March 19th.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 7•5 years ago
|
||
This seems to be a perma fail on the TV jobs for Windows 10 x64 asan on this try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=787cafdb4679d41cda6f8ac1cef2af6125949b2b
This might help to debug the intermittent failures where the last message in the log is "Waiting until startup recorder finished recording startup scripts..."
Comment 8•5 years ago
|
||
Florian, for reference it would be good to know which changes you actually pushed to get the test jobs into that state, maybe also based on which revision on central.
Comment 9•5 years ago
|
||
Depends on D75104
Updated•5 years ago
|
Comment 10•5 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #8)
Florian, for reference it would be good to know which changes you actually pushed to get the test jobs into that state, maybe also based on which revision on central.
I don't think any of what I pushed is responsible for making this fail more often. It's just flaky enough that TV's chaos mode makes it almost perma-fail. I actually had one green run yesterday, after several retriggers. For the runs that weren't green, sometimes we hang after the "Waiting until startup recorder finished recording startup scripts..." message and I get a profile uploaded, sometimes we fail with no output at all (similar to your first quote in comment 1). I also did a try push with the mainthreadio feature disabled, and the TV job still failed.
I suspect the underlying bug is that startup profiling doesn't work well on Windows asan, and we either hang during early startup (when enabling the profiler?) or never resolve the promise when trying to capture the profile. I don't see fixing this being a top priority soon, so I attached a patch to skip this entire test manifest for Windows asan.
I haven't looked at the logs of all the intermittent failures reported in the last few weeks to get a sense of how much of them were when running this folder of tests, but I suspect a lot were.
Comment 11•5 years ago
|
||
Comment 12•5 years ago
|
||
Backed out 4 changesets (bug 1580390, bug 1637603, bug 1629824) for Browser-chrome failures in performance/browser_startup_images.js. CLOSED TREE
Log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=302310802&repo=autoland&lineNumber=16761
Push with failures:
https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&revision=7a0696cf87a3256f13dbeddb620ed783f4dbeb90
Backout:
https://hg.mozilla.org/integration/autoland/rev/8714cc14400550c3ef4f69b8b6ef8d844e830a93
Comment hidden (Intermittent Failures Robot) |
Comment 14•5 years ago
|
||
Updated•5 years ago
|
Comment 15•5 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Comment 17•4 years ago
|
||
This is not fixed but the test was disabled. As such reopening this bug.
Comment 18•4 years ago
|
||
Note that I can also see the hang on the linux1804 machines in CI. It's perma failing there for the wdspec tests:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=504cf49688f54117ee871e0575768ba70a0107ab
Christian, have you seen such a bad interaction of asan builds with the gecko profiler yet? Any idea what could be done to further investigate that?
Comment 20•4 years ago
|
||
ASan is slightly slower and consumes more memory than the regular builds. This can cause different timings (and make certain hangs more likely) as well as hit resource limits and timeouts that are hit less frequently in regular builds (or only barely reached there).
In any case, it is highly unlikely that this is an ASan specific problem. If you think there is a hang, then it probably makes sense to investigate it further as there is a chance that the hang can also occur in regular builds. Otherwise, I'd stick with disabling the test as a fix.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 26•4 years ago
|
||
Florian, is the test which was causing the problem still disabled? If yes, do we have other tests that can cause the same problem?
Comment 27•4 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #26)
Florian, is the test which was causing the problem still disabled? If yes, do we have other tests that can cause the same problem?
The tests we disabled were the browser chrome tests that used startup profiling. When looking at the failures from comment 25, none of them is for a browser-chrome test job.
A few of the asan failure logs I opened seem to have an actionable error message:
==1241==ERROR: AddressSanitizer: heap-use-after-free on address 0x604000093732 at pc 0x7fe8eaa4728b bp 0x7ffc3b8dbcf0 sp 0x7ffc3b8dbce8
READ of size 1 at 0x604000093732 thread T0
#0 0x7fe8eaa4728a in SplitSingleCharHelper<unsigned char> /builds/worker/checkouts/gecko/js/src/builtin/String.cpp:3508:31
#1 0x7fe8eaa4728a in SplitSingleCharHelper /builds/worker/checkouts/gecko/js/src/builtin/String.cpp:3543:12
#2 0x7fe8eaa4728a in js::StringSplitString(JSContext*, JS::Handle<js::ObjectGroup*>, JS::Handle<JSString*>, JS::Handle<JSString*>, unsigned int) /builds/worker/checkouts/gecko/js/src/builtin/String.cpp:3573:12
#3 0x7fe8eacc2cd5 in js::intrinsic_StringSplitString(JSContext*, unsigned int, JS::Value*) /builds/worker/checkouts/gecko/js/src/vm/SelfHosting.cpp:1608:20
#4 0x1f77c0dbaf7f (<unknown module>)
0x604000093732 is located 18 bytes inside of 31-byte region [0x604000093720,0x60400009373f)
freed by thread T7 (JS Helper) here:
#0 0x558792deb0ad in free /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:123:3
#1 0x7fe8eb356584 in js_free /builds/worker/workspace/obj-build/dist/include/js/Utility.h:432:3
#2 0x7fe8eb356584 in freeUntracked /builds/worker/checkouts/gecko/js/src/gc/FreeOp.h:75:33
#3 0x7fe8eb356584 in js::gc::GCRuntime::freeFromBackgroundThread(js::AutoLockHelperThreadState&) /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:3492:12
#4 0x7fe8eb3560f9 in js::gc::BackgroundFreeTask::run() /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:3468:7
#5 0x7fe8eb384286 in js::GCParallelTask::runTask() /builds/worker/checkouts/gecko/js/src/gc/GCParallelTask.cpp:146:3
#6 0x7fe8eb384065 in js::GCParallelTask::runFromHelperThread(js::AutoLockHelperThreadState&) /builds/worker/checkouts/gecko/js/src/gc/GCParallelTask.cpp:131:5
Jon, is this a GC bug?
Comment 28•4 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #27)
This may be related to nursery string deduplication which landed recently. Steve, can you take a look?
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•4 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 92•2 years ago
|
||
Lots of these recent failures are wrongly classified and should be bug 1784591. Maybe a fix there could also help here.
Comment hidden (Intermittent Failures Robot) |
Comment 94•2 years ago
|
||
(In reply to Intermittent Failures Robot from comment #94)
15 failures in 2916 pushes (0.005 failures/push) were associated with this bug in the last 7 days.
- windows10-64-2004-asan-qr: 2
- opt: 2
Note that only 2 of these failures are actually ASAN build related. All others have been wrongly classified.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 98•2 years ago
|
||
Tests for ASAN builds on Linux seem to also have issues with Firefox not opening an initial browser window within a 6 minutes timeout:
[task 2022-09-26T15:04:13.780Z] 15:04:13 INFO - GECKO(17874) | 1664204653779 Marionette DEBUG Waiting for initial application window
[task 2022-09-26T15:04:17.633Z] 15:04:17 INFO - GECKO(17874) | console.warn: SearchSettings: "get: No settings file exists, new profile?" (new NotFoundError("Could not open the file at /tmp/tmpzrxqzmzx.mozrunner/search.json.mozlz4", (void 0)))
[task 2022-09-26T15:10:13.705Z] 15:10:13 INFO - runtests.py | Waiting for browser...
[task 2022-09-26T15:10:27.636Z] 15:10:27 INFO - Buffered messages finished
[task 2022-09-26T15:10:27.636Z] 15:10:27 ERROR - TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 370 seconds with no output
Florian, should we extend your already landed patch and also disable Linux or finally get someone to investigate?
Comment 99•2 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #99)
Florian, should we extend your already landed patch and also disable Linux
The patch I landed was specifically related to the startup tests that rely on startup profiling, and were near perma fail on Windows asan. I see no indication that the recent failures involved startup profiling.
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Description
•