Intermittent TEST-UNEXPECTED-TIMEOUT | dom/tests/mochitest/fetch/test_fetch_basic_http_sw_reroute.html | application timed out after 370 seconds with no output
Categories
(Core :: DOM: Service Workers, defect, P5)
Tracking
()
People
(Reporter: intermittent-bug-filer, Assigned: edenchuang)
References
Details
(Keywords: intermittent-failure)
Attachments
(3 files)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 13•7 years ago
|
||
Comment 14•6 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 17•6 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
Comment 19•6 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 23•6 years ago
|
||
Comment 24•6 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 29•6 years ago
|
||
Recent occurrences for bug 1492340, bug 1531158 and this one occurred on the 27th of February.
Could the recent failures be from bug 1525959? James, can you please take a look?
Bug 1492340 https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-02-23&endday=2019-03-02&tree=trunk&bug=1492340
Bug 1531158 https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-02-23&endday=2019-03-02&tree=trunk&bug=1531158
Bug 1427396 https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-02-23&endday=2019-03-02&tree=trunk&bug=1427396
Comment hidden (Intermittent Failures Robot) |
(In reply to Andreea Pavel [:apavel] from comment #29)
Recent occurrences for bug 1492340, bug 1531158 and this one occurred on the 27th of February.
Could the recent failures be from bug 1525959? James, can you please take a look?
We did start running this test in that bug, but it's not the root cause of the problem. Andrew, maybe some SW folks could try to make this more reliable? Otherwise I guess we can skip it, but I'd rather not.
Comment 32•6 years ago
|
||
Sure. I think Eden may have a bit of time to look.
I briefly glanced and am confused about where the relevant stack trace is. At https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=231507730&repo=autoland&lineNumber=3748 I see lots of frames in AOSP code. Is there a better place to be looking?
Comment 33•6 years ago
|
||
Oh, and I'll ask the silly question: can one reproduce this locally? :)
You can probably ignore the stack. The stack is just whatever we were running when the harness kills the process due to the timeout. You can probably repro locally with mach mochitest --app org.mozilla.geckoview.test <path to test>
.
Assignee | ||
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
Comment 36•6 years ago
|
||
In the last 7 days there have been 34 occurrences on Android 7 and Linux 64, build type opt.
Comment hidden (Intermittent Failures Robot) |
Comment 38•6 years ago
|
||
Over the last 7 days there are 44 crashes on this bug. These happen on linux32, linux64-pgo-qr.
Here is the latest failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=235663378&repo=autoland&lineNumber=9420
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 45•6 years ago
|
||
Sorry for so late response. This bug can not be reproduced in my local. :(
All timeout bugs under dom/tests/mochitest/fetch/ are the same reason, I set them as "duplicate" of this bug.
After analyzing the log from the try runs, the root cause of timeout is failing to load the script of <script> element in dom/tests/mochitest/fetch/fetch_test_framework.js::windowTest. Since loading failed, no SimpleTest.finish() is called, then the test goes into timeout status.
Here is the warning message
Web Content: [JavaScript Warning: "Loading failed for the <script> with source “http://mochi.test:8888/tests/dom/tests/mochitest/fetch/test_formdataparsing.jsâ€." {file: "http://mochi.test:8888/tests/dom/tests/mochitest/fetch/reroute.html?test_formdataparsing" line: 1}]
Compare to the test of non-sw_reroute version, the special characters, “ and â€, are prepended and appended on loading URI, it probably is the reason why loading failed.
I am figuring out the actual reason why script loading fail.
Comment hidden (Intermittent Failures Robot) |
Comment 47•6 years ago
|
||
(In reply to Eden Chuang[:edenchuang] from comment #45)
Web Content: [JavaScript Warning: "Loading failed for the <script> with source “http://mochi.test:8888/tests/dom/tests/mochitest/fetch/test_formdataparsing.jsâ€." {file: "http://mochi.test:8888/tests/dom/tests/mochitest/fetch/reroute.html?test_formdataparsing" line: 1}]
That reminds me very much of https://bugzilla.mozilla.org/show_bug.cgi?id=1534732#c7 ... although that is apparently not related to particular tests, while this is particular to the sw_reroute tests.
Compare to the test of non-sw_reroute version, the special characters, “ and â€, are prepended and appended on loading URI, it probably is the reason why loading failed.
I think those special characters are "normal" for android, but I'm not 100% sure.
Comment 48•6 years ago
|
||
Let's skip all these sw_reroute tests temporarily, while we continue investigating,
so that fewer failures are reported and fewer alerts/needinfos are generated!
Comment hidden (Intermittent Failures Robot) |
Comment 50•6 years ago
|
||
Updated•6 years ago
|
Comment 51•6 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Comment 53•6 years ago
|
||
There are 150 total failures in the last 7 days, however, since yesterday, failure rate has significantly decreased.
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=237054254&repo=autoland&lineNumber=9583
[task 2019-03-30T00:46:40.875Z] 00:46:40 INFO - TEST-START | dom/tests/mochitest/fetch/test_fetch_basic_http_sw_reroute.html
[task 2019-03-30T00:52:50.884Z] 00:52:50 INFO - Buffered messages finished
[task 2019-03-30T00:52:50.885Z] 00:52:50 ERROR - TEST-UNEXPECTED-TIMEOUT | dom/tests/mochitest/fetch/test_fetch_basic_http_sw_reroute.html | application timed out after 370 seconds with no output
[task 2019-03-30T00:52:50.887Z] 00:52:50 ERROR - Force-terminating active process(es).
[task 2019-03-30T00:52:50.888Z] 00:52:50 INFO - Determining child pids from psutil...
[task 2019-03-30T00:52:50.894Z] 00:52:50 INFO - [5902, 5925, 5974, 5998]
[task 2019-03-30T00:52:50.894Z] 00:52:50 INFO - ==> process 5837 launched child process 5862
[task 2019-03-30T00:52:50.895Z] 00:52:50 INFO - ==> process 5837 launched child process 5902
[task 2019-03-30T00:52:50.897Z] 00:52:50 INFO - ==> process 5837 launched child process 5925
[task 2019-03-30T00:52:50.898Z] 00:52:50 INFO - ==> process 5837 launched child process 5974
[task 2019-03-30T00:52:50.899Z] 00:52:50 INFO - ==> process 5837 launched child process 5998
[task 2019-03-30T00:52:50.900Z] 00:52:50 INFO - Found child pids: set([5862, 5974, 5925, 5902, 5998])
[task 2019-03-30T00:52:50.901Z] 00:52:50 INFO - Failed to get child procs
[task 2019-03-30T00:52:50.901Z] 00:52:50 INFO - Killing process: 5862
[task 2019-03-30T00:52:50.902Z] 00:52:50 INFO - TEST-INFO | started process screentopng
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 56•6 years ago
|
||
After more analyzing on the log, I found the HttpChannelChild::Cancel(NS_BINDING_ABORTED) is called unexpected, although the data from the parent process are the same with the normal case. However, the Cancel makes the script loading fail and times out the test.
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 61•6 years ago
|
||
ServiceWorker Termination cancels the HttpChannel and causes the script loading failed.
Comparing to the situation in other platforms, the ServiceWorker termination is expected since no jobs for ServiceWorker. However, the difference is the termination job will be executed after HttpChannel::OnStopRequest, therefore, there is no loading fail in other platforms.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 66•6 years ago
|
||
Over the last 7 days there are 38 failures present on this bug. These happen on linux32-shippable, linux64, linux64-qr, linux64-shippable, linux64-shippable-qr, windows7-32-shippable
Here is the most recent log example: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=240860942&repo=autoland&lineNumber=9481
[task 2019-04-17T06:42:28.311Z] 06:42:28 INFO - SimpleTest START
[task 2019-04-17T06:42:28.319Z] 06:42:28 INFO - TEST-START | dom/tests/mochitest/fetch/test_fetch_basic.html
[task 2019-04-17T06:42:30.124Z] 06:42:30 INFO - GECKO(5875) | MEMORY STAT vsizeMaxContiguous not supported in this build configuration.
[task 2019-04-17T06:42:30.126Z] 06:42:30 INFO - GECKO(5875) | MEMORY STAT | vsize 2497MB | residentFast 117MB | heapAllocated 15MB
[task 2019-04-17T06:42:30.147Z] 06:42:30 INFO - TEST-OK | dom/tests/mochitest/fetch/test_fetch_basic.html | took 1828ms
[task 2019-04-17T06:42:30.196Z] 06:42:30 INFO - TEST-START | dom/tests/mochitest/fetch/test_fetch_basic_http.html
[task 2019-04-17T06:42:31.109Z] 06:42:31 INFO - GECKO(5875) | MEMORY STAT | vsize 2500MB | residentFast 119MB | heapAllocated 15MB
[task 2019-04-17T06:42:31.126Z] 06:42:31 INFO - TEST-OK | dom/tests/mochitest/fetch/test_fetch_basic_http.html | took 929ms
[task 2019-04-17T06:42:31.166Z] 06:42:31 INFO - TEST-START | dom/tests/mochitest/fetch/test_fetch_basic_http_sw_empty_reroute.html
[task 2019-04-17T06:42:32.158Z] 06:42:32 INFO - GECKO(5875) | MEMORY STAT | vsize 2501MB | residentFast 120MB | heapAllocated 14MB
[task 2019-04-17T06:42:32.178Z] 06:42:32 INFO - TEST-OK | dom/tests/mochitest/fetch/test_fetch_basic_http_sw_empty_reroute.html | took 1009ms
[task 2019-04-17T06:42:32.199Z] 06:42:32 INFO - TEST-START | dom/tests/mochitest/fetch/test_fetch_basic_http_sw_reroute.html
[task 2019-04-17T06:48:42.216Z] 06:48:42 INFO - Buffered messages finished
[task 2019-04-17T06:48:42.217Z] 06:48:42 ERROR - TEST-UNEXPECTED-TIMEOUT | dom/tests/mochitest/fetch/test_fetch_basic_http_sw_reroute.html | application timed out after 370 seconds with no output
[task 2019-04-17T06:48:42.217Z] 06:48:42 ERROR - Force-terminating active process(es).
[task 2019-04-17T06:48:42.218Z] 06:48:42 INFO - Determining child pids from psutil...
[task 2019-04-17T06:48:42.222Z] 06:48:42 INFO - [6040, 5939, 5962, 5997]
[task 2019-04-17T06:48:42.224Z] 06:48:42 INFO - ==> process 5875 launched child process 5900
[task 2019-04-17T06:48:42.227Z] 06:48:42 INFO - ==> process 5875 launched child process 5939
[task 2019-04-17T06:48:42.229Z] 06:48:42 INFO - ==> process 5875 launched child process 5962
[task 2019-04-17T06:48:42.235Z] 06:48:42 INFO - ==> process 5875 launched child process 5997
[task 2019-04-17T06:48:42.236Z] 06:48:42 INFO - ==> process 5875 launched child process 6040
[task 2019-04-17T06:48:42.237Z] 06:48:42 INFO - Found child pids: set([6040, 5962, 5939, 5900, 5997])
[task 2019-04-17T06:48:42.238Z] 06:48:42 INFO - Failed to get child procs
[task 2019-04-17T06:48:42.240Z] 06:48:42 INFO - Killing process: 6040
[task 2019-04-17T06:48:42.241Z] 06:48:42 INFO - TEST-INFO | started process screentopng
[task 2019-04-17T06:48:42.532Z] 06:48:42 INFO - TEST-INFO | screentopng: exit 0
[task 2019-04-17T06:48:42.532Z] 06:48:42 INFO - Killing process: 5962
[task 2019-04-17T06:48:42.535Z] 06:48:42 INFO - Not taking screenshot here: see the one that was previously logged
[task 2019-04-17T06:48:42.537Z] 06:48:42 INFO - Killing process: 5939
[task 2019-04-17T06:48:42.539Z] 06:48:42 INFO - Not taking screenshot here: see the one that was previously logged
[task 2019-04-17T06:48:42.541Z] 06:48:42 INFO - Killing process: 5900
[task 2019-04-17T06:48:42.543Z] 06:48:42 INFO - Not taking screenshot here: see the one that was previously logged
[task 2019-04-17T06:48:42.545Z] 06:48:42 INFO - Can't trigger Breakpad, process no longer exists
[task 2019-04-17T06:48:42.547Z] 06:48:42 INFO - Killing process: 5997
[task 2019-04-17T06:48:42.548Z] 06:48:42 INFO - Not taking screenshot here: see the one that was previously logged
[task 2019-04-17T06:48:42.550Z] 06:48:42 INFO - Killing process: 5875
[task 2019-04-17T06:48:42.552Z] 06:48:42 INFO - Not taking screenshot here: see the one that was previously logged
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 71•6 years ago
|
||
This bug has failed 60 times in the last 7 days. Occurs on linux64-shippable-qr, linux64-qr on opt build type.
Recent log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=242800465&repo=mozilla-inbound&lineNumber=9388
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 76•6 years ago
|
||
Note that all of the recent Android and some of the recent Linux failures are for a different test:
dom/tests/mochitest/fetch/test_formdataparsing_sw_reroute.html
Comment 77•6 years ago
|
||
I see half and half for:
dom/tests/mochitest/fetch/test_fetch_basic_http_sw_reroute.html
and:
dom/tests/mochitest/fetch/test_formdataparsing_sw_reroute.html
most of these are on linux && !debug and split between 64/32 and regular/qr.
I would honestly disable both for linux&&!debug so we could see what remains- 2+ weeks of no mention of a pending fix it is safe to disable.
Comment hidden (Intermittent Failures Robot) |
Comment 79•6 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
Comment 81•6 years ago
|
||
Pushed by ncsoregi@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/418adda2c4ce
disable test_fetch_basic_http_sw_reroute.html and test_formdataparsing_sw_reroute.html on linux and not debug r=jmaher
Comment 82•6 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 90•5 years ago
|
||
The ServiceWorker is terminated by idleWorkerTimer timeout, and in slow platforms/machines, it would cancel the uncompleted fetching and make unexpected behavior during running tests.
The SW idle termination implementation has no defects, so the fix would just double the dom.serviceWorkers.idle_timeout to let fetchings have more chance to finish before idleWorkerTimer timeout.
Assignee | ||
Comment 91•5 years ago
|
||
The try result with the patch
Assignee | ||
Updated•5 years ago
|
Comment 92•5 years ago
|
||
Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/725ea89ab2d2
Twice the dom.serviceWorkers.idle_timeout value for tests under dom/tests/mochitest/fetch/ to reduce the impact SW termination by timeout. r=perry
Comment 93•5 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Updated•5 years ago
|
Comment 104•4 years ago
|
||
The latest failures are all on ESR (since 7 months). We should observe this after ESR 78 is out.
Comment hidden (Intermittent Failures Robot) |
Comment 106•4 years ago
|
||
Hi Geoff, it seems that the tests are enabled again with Eden's patch and are only failing (rarely) on ESR 68. Can we remove the [stockwell disabled] whiteboard then?
Comment 107•4 years ago
|
||
Yes, that's right - done!
Comment hidden (Intermittent Failures Robot) |
Comment 109•4 years ago
|
||
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
For more information, please visit auto_nag documentation.
Description
•