Closed Bug 1392946 Opened 7 years ago Closed 4 years ago

Perma [tier2] dom/media/test/test_webvtt_positionalign.html | application timed out after 330/370 seconds with no output

Categories

(Firefox for Android Graveyard :: Audio/Video, defect, P5)

defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Assigned: nataliaCs)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed])

Attachments

(1 file)

Flags: needinfo?(alwu)
Since its failure rate is super low, I think it's not worth to spend time investigating it. I'll prefer to wait for a long while to see its trend (still happen or disappear).
Flags: needinfo?(alwu)
Summary: Intermittent dom/media/test/test_webvtt_positionalign.html | application timed out after 330 seconds with no output → Perma [tier2] dom/media/test/test_webvtt_positionalign.html | application timed out after 330 seconds with no output
Summary: Perma [tier2] dom/media/test/test_webvtt_positionalign.html | application timed out after 330 seconds with no output → Perma [tier2] dom/media/test/test_webvtt_positionalign.html | application timed out after 330/370 seconds with no output
(In reply to Narcis Beleuzu [:NarcisB] from comment #11)
> Starting with
> https://treeherder.mozilla.org/#/
> jobs?repo=autoland&selectedJob=218864088&searchStr=android%2C8.
> 0%2Cpixel2%2Cdebug%2Cmochitests%2Ctest-android-hw-p2-8-0-arm7-api-16%2Fdebug-
> mochitest-media-
> 1%2Cm%28mda1%29&revision=792bc67640ca3156a647a00f2bdfc4fb87b53cf3 , this
> fail turned into a perma tier2 failure.

That's bug 1514075, an android host-utils update. Bug 1516441 started at the same time and has some commentary.
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/61eca43af2df
Skip test_webvtt_positionalign.html on android-hw; r=me,a=test-only
Keywords: leave-open
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d39f40e695de
backout 61eca43af2dffc3ba19cf343210d851c638694d5 since it was not effective
Let's have a closer look at a recent failure:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=219406473&repo=mozilla-inbound&lineNumber=4095

It looks like the reported test completed, all tests in the suite completed, and then there was a hang:

15:47:06     INFO -  1071 INFO TEST-START | dom/media/test/test_webvtt_positionalign.html
15:47:06     INFO -  1072 INFO TEST-OK | dom/media/test/test_webvtt_positionalign.html | took 452ms
15:47:06     INFO -  1073 INFO TEST-START | Shutdown
15:47:06     INFO -  1074 INFO Passed:  6506
15:47:06     INFO -  1075 INFO Failed:  0
15:47:06     INFO -  1076 INFO Todo:    21
15:47:06     INFO -  1077 INFO Mode:    non-e10s
15:47:06     INFO -  1078 INFO Slowest: 70168ms - /tests/dom/media/test/test_fragment_play.html
15:47:06     INFO -  1079 INFO SimpleTest FINISHED
15:59:39     INFO -  wait for org.mozilla.fennec_aurora complete; top activity=org.mozilla.fennec_aurora
15:59:40     INFO -  org.mozilla.fennec_aurora unexpectedly found running. Killing...
15:59:40     INFO -  TEST-INFO | started process screencap
15:59:41     INFO -  TEST-INFO | screencap: exit 0
15:59:52  WARNING -  TEST-UNEXPECTED-FAIL | dom/media/test/test_webvtt_positionalign.html | application timed out after 370 seconds with no output

So the tests completed, but fennec never closed.
But wait! That's the *second* time those results have been reported

15:46:57     INFO -  558 INFO TEST-START | dom/media/test/test_webvtt_positionalign.html
15:46:57     INFO -  559 INFO TEST-OK | dom/media/test/test_webvtt_positionalign.html | took 452ms
15:46:57     INFO -  560 INFO TEST-START | Shutdown
15:46:57     INFO -  561 INFO Passed:  6506
15:46:57     INFO -  562 INFO Failed:  0
15:46:57     INFO -  563 INFO Todo:    21
15:46:57     INFO -  564 INFO Mode:    non-e10s
15:46:57     INFO -  565 INFO Slowest: 70168ms - /tests/dom/media/test/test_fragment_play.html
15:46:57     INFO -  566 INFO SimpleTest FINISHED
15:46:59     INFO -  wait for org.mozilla.fennec_aurora complete; top activity=com.bitbar.testdroid.monitor
15:46:59     INFO -  remoteautomation.py | Application ran for: 0:07:16.141269

...
15:47:06     INFO -  1071 INFO TEST-START | dom/media/test/test_webvtt_positionalign.html
15:47:06     INFO -  1072 INFO TEST-OK | dom/media/test/test_webvtt_positionalign.html | took 452ms
15:47:06     INFO -  1073 INFO TEST-START | Shutdown
15:47:06     INFO -  1074 INFO Passed:  6506
15:47:06     INFO -  1075 INFO Failed:  0
15:47:06     INFO -  1076 INFO Todo:    21
15:47:06     INFO -  1077 INFO Mode:    non-e10s
15:47:06     INFO -  1078 INFO Slowest: 70168ms - /tests/dom/media/test/test_fragment_play.html

Note that the entire "second run" is reported in just a few seconds. I think the harness is just dumping the same remote test log a second time.

Why is the harness trying to run a second time? Some entries in dom/media/test/mochitest.ini have "scheme=https" entries, so the harness loops over scheme=(http,https):

https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/testing/mochitest/runtests.py#2748

On the https run, the fennec url is incorrect (note the duplication at the end):

15:47:05     INFO -  adb launch_application: am start -W -n org.mozilla.fennec_aurora/org.mozilla.gecko.BrowserApp -a android.intent.action.VIEW --es env9 MOZ_UPLOAD_DIR=/sdcard/tests/mozlog --es env8 R_LOG_DESTINATION=stderr --es args "-no-remote -profile /sdcard/tests/profile//" --es env3 DISABLE_UNSAFE_CPOW_WARNINGS=1 --es env2 R_LOG_VERBOSE=1 --es env1 XPCOM_DEBUG_BREAK=stack --es env0 MOZ_CRASHREPORTER=1 --es env7 MOZ_LOG_FILE=/sdcard/tests/mozlog/moz.log --es env6 MOZ_CRASHREPORTER_SHUTDOWN=1 --es env5 MOZ_IN_AUTOMATION=1 --es env4 MOZ_DISABLE_NONLOCAL_CONNECTIONS=1 --es env12 MOZ_HIDE_RESULTS_TABLE=1 --es env11 R_LOG_LEVEL=6 --es env10 MOZ_CRASHREPORTER_NO_REPORT=1 -d "https://example.com:443/tests?autorun=1&closeWhenDone=1&logFile=%2Fsdcard%2Ftests%2Flogs%2Fmochitest.log&fileLevel=INFO&consoleLevel=INFO&hideResultsTable=1&manifestFile=tests.json&dumpOutputDirectory=%2Fsdcard%2Ftests&autorun=1&closeWhenDone=1&logFile=%2Fsdcard%2Ftests%2Flogs%2Fmochitest.log&fileLevel=INFO&consoleLevel=INFO&hideResultsTable=1&manifestFile=tests.json&dumpOutputDirectory=%2Fsdcard%2Ftests"

I don't know if this is the cause of the timeouts, but it is suspicious.
This craziness is evident in test logs before the host-utils update:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=218756945&repo=mozilla-central&lineNumber=3555

11:18:20     INFO -  558 INFO TEST-START | dom/media/test/test_webvtt_positionalign.html
11:18:20     INFO -  559 INFO TEST-OK | dom/media/test/test_webvtt_positionalign.html | took 448ms
11:18:20     INFO -  560 INFO TEST-START | Shutdown
11:18:20     INFO -  561 INFO Passed:  6515
11:18:20     INFO -  562 INFO Failed:  0
11:18:20     INFO -  563 INFO Todo:    21
11:18:20     INFO -  564 INFO Mode:    non-e10s
11:18:20     INFO -  565 INFO Slowest: 69803ms - /tests/dom/media/test/test_fragment_play.html
11:18:20     INFO -  566 INFO SimpleTest FINISHED
11:18:22     INFO -  wait for org.mozilla.fennec_aurora complete; top activity=com.bitbar.testdroid.monitor
11:18:22     INFO -  remoteautomation.py | Application ran for: 0:07:15.998966
...
11:18:30     INFO -  1071 INFO TEST-START | dom/media/test/test_webvtt_positionalign.html
11:18:30     INFO -  1072 INFO TEST-OK | dom/media/test/test_webvtt_positionalign.html | took 448ms
11:18:30     INFO -  1073 INFO TEST-START | Shutdown
11:18:30     INFO -  1074 INFO Passed:  6515
11:18:30     INFO -  1075 INFO Failed:  0
11:18:30     INFO -  1076 INFO Todo:    21
11:18:30     INFO -  1077 INFO Mode:    non-e10s
11:18:30     INFO -  1078 INFO Slowest: 69803ms - /tests/dom/media/test/test_fragment_play.html
11:18:30     INFO -  1079 INFO SimpleTest FINISHED

but there is no timeout waiting for fennec:

11:18:36     INFO -  wait for org.mozilla.fennec_aurora complete; top activity=com.bitbar.testdroid.monitor
11:18:36     INFO -  remoteautomation.py | Application ran for: 0:00:10.341591

Maybe just a timing difference? Maybe fennec quickly exited with the older host-utils, for some reason?

At any rate, the way forward is likely to fix the multi-scheme handling.
Depends on: 1517169
Depends on: 1517170
Assignee: nobody → ncsoregi
Attachment #9034555 - Flags: review?(jmaher)
Flags: needinfo?(egao)
Comment on attachment 9034555 [details] [diff] [review]
Disabled dom/media/test/test_webvtt_positionalign.html on Android 8 for frequent failures.

Review of attachment 9034555 [details] [diff] [review]:
-----------------------------------------------------------------

There have been no failures since Jan 2: The perma-fail was fixed by the dependent bugs.

We should just wait a bit longer to see if the intermittent failure returns; if not, we can close this bug.
Attachment #9034555 - Flags: review?(jmaher)
Keywords: leave-open
Whiteboard: [stockwell disable-recommended] → [stockwell fixed]
We have completed our launch of our new Firefox on Android. The development of the new versions use GitHub for issue tracking. If the bug report still reproduces in a current version of [Firefox on Android nightly](https://play.google.com/store/apps/details?id=org.mozilla.fenix) an issue can be reported at the [Fenix GitHub project](https://github.com/mozilla-mobile/fenix/). If you want to discuss your report please use [Mozilla's chat](https://wiki.mozilla.org/Matrix#Connect_to_Matrix) server https://chat.mozilla.org and join the [#fenix](https://chat.mozilla.org/#/room/#fenix:mozilla.org) channel.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INCOMPLETE
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: