Closed Bug 1063290 Opened 10 years ago Closed 6 years ago

Intermittent test_getUserMedia_constraints_mobile.html,test_getUserMedia_gumWithinGum.html | Test timed out.

Categories

(Core :: WebRTC, defect, P3)

defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox36 --- wontfix
firefox37 --- disabled
firefox38 --- disabled
firefox-esr31 --- unaffected
b2g-v2.2 --- disabled
b2g-master --- disabled

People

(Reporter: KWierso, Assigned: jib)

Details

(Keywords: intermittent-failure)

Attachments

(2 files, 1 obsolete file)

https://tbpl.mozilla.org/php/getParsedLog.php?id=47424870&tree=Mozilla-Inbound
b2g_emulator_vm mozilla-inbound opt test mochitest-7 on 2014-09-04 12:34:26 PDT for push 65dead1492ba

slave: tst-linux64-spot-119



12:42:25     INFO -  TEST DEVICES: No test devices found (in media.{audio,video}_loopback_dev, using fake streams.
12:42:26     INFO -  46 INFO TEST-OK | /tests/dom/media/tests/mochitest/test_getUserMedia_constraints_mobile.html | took 2886ms
12:42:28     INFO -  47 INFO TEST-START | /tests/dom/media/tests/mochitest/test_getUserMedia_exceptions.html
12:42:29     INFO -  TEST DEVICES: No test devices found (in media.{audio,video}_loopback_dev, using fake streams.
12:42:30     INFO -  48 INFO TEST-OK | /tests/dom/media/tests/mochitest/test_getUserMedia_exceptions.html | took 2813ms
12:42:31     INFO -  49 INFO TEST-START | /tests/dom/media/tests/mochitest/test_getUserMedia_gumWithinGum.html
12:42:33     INFO -  TEST DEVICES: No test devices found (in media.{audio,video}_loopback_dev, using fake streams.
12:42:36     INFO -  ###################################### forms.js loaded
12:42:36     INFO -  ############################### browserElementPanning.js loaded
12:42:36     INFO -  ######################## BrowserElementChildPreload.js loaded
12:47:48     INFO -  dumping last 1 message(s)
12:47:48     INFO -  if you need more context, please use SimpleTest.requestCompleteLog() in your test
12:47:48     INFO -  50 INFO Call getUserMedia for {"video":true,"fake":true}
12:47:48     INFO -  51 INFO TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_getUserMedia_gumWithinGum.html | Test timed out. - expected PASS
12:47:49     INFO -  52 INFO TEST-OK | /tests/dom/media/tests/mochitest/test_getUserMedia_gumWithinGum.html | took 317837ms
12:47:50     INFO -  53 INFO TEST-START | /tests/dom/media/tests/mochitest/test_getUserMedia_playAudioTwice.html
12:47:51     INFO -  TEST DEVICES: No test devices found (in media.{audio,video}_loopback_dev, using fake streams.
12:47:54     INFO -  54 INFO TEST-OK | /tests/dom/media/tests/mochitest/test_getUserMedia_playAudioTwice.html | took 3431ms
12:47:54     INFO -  55 INFO TEST-START | /tests/dom/media/tests/mochitest/test_getUserMedia_playVideoAudioTwice.html
12:47:55     INFO -  TEST DEVICES: No test devices found (in media.{audio,video}_loopback_dev, using fake streams.
12:47:58     INFO -  56 INFO TEST-OK | /tests/dom/media/tests/mochitest/test_getUserMedia_playVideoAudioTwice.html | took 3503ms
12:47:59     INFO -  57 INFO TEST-START | /tests/dom/media/tests/mochitest/test_getUserMedia_playVideoTwice.html
12:48:01     INFO -  TEST DEVICES: No test devices found (in media.{audio,video}_loopback_dev, using fake streams.
12:48:03     INFO -  58 INFO TEST-OK | /tests/dom/media/tests/mochitest/test_getUserMedia_playVideoTwice.html | took 4596ms
12:48:04     INFO -  59 INFO TEST-START | /tests/dom/media/tests/mochitest/test_getUserMedia_stopAudioStream.html
12:48:06     INFO -  TEST DEVICES: No test devices found (in media.{audio,video}_loopback_dev, using fake streams.
12:48:08     INFO -  60 INFO TEST-OK | /tests/dom/media/tests/mochitest/test_getUserMedia_stopAudioStream.html | took 3803ms
12:48:09     INFO -  61 INFO TEST-START | /tests/dom/media/tests/mochitest/test_getUserMedia_stopAudioStreamWithFollowupAudio.html
Martin, the recent spike appears to correlate with your recent changes :(
Flags: needinfo?(martin.thomson)
OS: Gonk (Firefox OS) → All
Summary: Intermittent test_getUserMedia_gumWithinGum.html | Test timed out. - expected PASS → Intermittent test_getUserMedia_constraints_mobile.html,test_getUserMedia_gumWithinGum.html | Test timed out.
Well, hooray for the b2g emulator.  In this case, I'd rather we look at disabling these tests on bog-slow platforms until the underlying issues with gUM are resolved.  I don't know about the specific configuration here, though :jib might.

I just r+'d bug 997365, which removes the first offending test file.  That might help.  Though it looks like there is a persistent timeout in getUserMedia, which is likely systemic.
Flags: needinfo?(martin.thomson) → needinfo?(jib)
(In reply to Martin Thomson [:mt] from comment #34)
> Well, hooray for the b2g emulator.  In this case, I'd rather we look at
> disabling these tests on bog-slow platforms until the underlying issues with
> gUM are resolved.  I don't know about the specific configuration here,
> though :jib might.

On a good run [1] test_getUserMedia_constraints_mobile.html & co take < 12 seconds each. On a bad run (pick any) they takes > 5 minutes each. They're also back-to-back, which makes me agree about systemic, maybe some underlying gonk hang-up?

> I just r+'d bug 997365, which removes the first offending test file.  That might help.

Not likely, as it merely removes a variation of the test. The test is still there (minus the _mobile postfix).

[1] http://ftp.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/mozilla-central-emulator-debug/1422639160/mozilla-central_ubuntu64_vm-b2g-emulator-debug_test-mochitest-debug-12-bm116-tests1-linux64-build18.txt.gz
Flags: needinfo?(jib)
Attached file MozReview Request: bz://1063290/mt (obsolete) (deleted) —
/r/3201 - Bug 1063290 - Disabling extended gUM tests on gonk for intermittent failures, r=jib

Pull down this commit:

hg pull review -r 8b009449201f692d9578175aaeda67bc4bbeb983
Attachment #8557315 - Flags: review?(jib)
Comment on attachment 8557315 [details]
MozReview Request: bz://1063290/mt

https://reviewboard.mozilla.org/r/3199/#review2619

Is there an action plan for turning these on, or do you plan to leave this bug open?

::: dom/media/tests/mochitest/mochitest.ini
(Diff revision 1)
> -[test_getUserMedia_constraints_mobile.html]
> -skip-if = toolkit != 'gonk' && toolkit != 'android' # Bug 907352, backwards-compatible behavior on mobile only

This assumes my patch lands shortly after, or test stops running on android. It'd probably be cleaner to give it the same treatment here as the others.

::: dom/media/tests/mochitest/mochitest.ini
(Diff revision 1)
>  [test_getUserMedia_stopAudioStream.html]
> +skip-if = toolkit == 'gonk' # Bug 1063290, intermittent timeout
>  [test_getUserMedia_stopAudioStreamWithFollowupAudio.html]
> +skip-if = toolkit == 'gonk' # Bug 1063290, intermittent timeout

The stopAudio ones have not stalled. Should we leave them?
Attachment #8557315 - Flags: review?(jib) → review+
(In reply to Jan-Ivar Bruaroey [:jib] from comment #41)
> Is there an action plan for turning these on, or do you plan to leave this
> bug open?

I have no plan currently, but I've marked this leave-open.

> ::: dom/media/tests/mochitest/mochitest.ini
> (Diff revision 1)
> > -[test_getUserMedia_constraints_mobile.html]
> > -skip-if = toolkit != 'gonk' && toolkit != 'android' # Bug 907352, backwards-compatible behavior on mobile only
> 
> This assumes my patch lands shortly after, or test stops running on android.
> It'd probably be cleaner to give it the same treatment here as the others.

Ack, will do.

> ::: dom/media/tests/mochitest/mochitest.ini
> (Diff revision 1)
> >  [test_getUserMedia_stopAudioStream.html]
> > +skip-if = toolkit == 'gonk' # Bug 1063290, intermittent timeout
> >  [test_getUserMedia_stopAudioStreamWithFollowupAudio.html]
> > +skip-if = toolkit == 'gonk' # Bug 1063290, intermittent timeout
> 
> The stopAudio ones have not stalled. Should we leave them?

The tests abort after several failures.  I don't want to disable some and find that the remaining ones are busted too.

> 19:37:05 INFO - 1639 INFO TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | 4 test timeouts, giving up. - expected PASS 
> 19:37:05 INFO - 1640 INFO TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | Skipping 174 remaining tests. - expected PASS 

No sense in tempting fate.  Let's stop screwing other people over with these failures and try to get to root causes without it being a screaming rush.

There's nothing particularly remarkable about any of these tests, so it's hard to identify one as triggering a storm of problems.  Maybe it's just that the constraints tests are leaving too many un-garbage-collected locks on devices or something and subsequent tests just can't get a device.

What is interesting is that we have one audio and one video test remaining on gonk.  Those haven't failed (yet, or much).
https://hg.mozilla.org/mozilla-central/rev/157dbfa50820
Assignee: nobody → martin.thomson
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Well, except you didn't actually mark it leave-open.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Keywords: leave-open
Any news here? This is currently #8 overall on OrangeFactor.
Flags: needinfo?(martin.thomson)
Hi Ryan, I'm going to find a new owner for this.  Martin's busy, and this part of the code is not his expertise.  

FYI: When I reviewed our (WebRTC's) oranges on Monday, this wasn't that bad, but it's gotten horrible since then.  In fact, back in December, I almost closed this out because we had gone 3+ month without a single hit.  I'll note that jesup landed the upgrade to webrtc 40 about a week ago, and the Android gUM code has been very unhappy since.  gcp has been helping to make it happy again.  So this could be fallout from that upgrade.  In any case, I'll find the right owner/solution tomorrow.  Feel free to ping me directly in the meantime.
Flags: needinfo?(martin.thomson)
I've asked jib to add additional logging so we can get more info about what's happening here.  Once we have more info, we'll assess whether we need to put someone on this full-time immediately or if we can temporarily disable the affected tests until folks free up in a few weeks.
Assignee: martin.thomson → jib
Attachment #8557315 - Flags: review+ → review?(jib)
Comment on attachment 8557315 [details]
MozReview Request: bz://1063290/mt

/r/3201 - Bug 1063290 - Disabling most gUM tests on android too, r=jib

Pull down this commit:

hg pull review -r dd4490e4c0dde053cbb27135ee198b2fe2cfa6f0
:jib, can you have a look at this new patch please?

Since we are in awkward time zones, would you mind hitting checkin-needed if this is OK to ship?
Flags: needinfo?(jib)
Attachment #8557315 - Flags: review?(jib) → review+
Flags: needinfo?(jib)
Keywords: checkin-needed
Thanks.  That was quicker than I expected.
NSPR logs on android are tricky, apparently robotium doesn't like commas? :-P

> ERROR buildRobotiumConfig: browserEnv - Found a ',' in our value, unable to process value. 
> key=NSPR_LOG_MODULES,value=GetUserMedia:5,MediaManager:5,MediaStreamGraph:5,TrackUnionStream:5,signaling:5

Trying more pushes with individual flags and with the usual suspects of escape characters...
Sorry for the late update. I've made little headway on getting NSPR logs due to Bug 977589. Without commas, I can only log one module at a time, and I can't add timestamps, so correlating logs with the hang is difficult (I know of no way to merge NSPR with regular logs on try).

This "MediaManager:5" good/bad log-diff from [1] suggests it hangs around here:

>   OnNavigation for 613
> - Audio device 0 deallocated
>   Listener removed by DOM Destroy(), mFinished = 0
>   OnNavigation for 614
>   OnNavigation for 614
> - Selected video device
> - Selected audio device
> - started all sources
> - Returning success for getUserMedia()
> - Selected video device
> - started all sources
> - Returning success for getUserMedia()
>   OnNavigation for 615
> - Listener removed on purpose, mFinished = 0
> - Listener removed on purpose, mFinished = 0
>   Sent recording-window-ended for window 615 (outer 9)

But I think this tells us very little. I'll run it again with :9 and other modules.

I tried "all:5" but this produces 700MB+ logs which seem to cause other things to fail on a good log [2], so I haven't pursued that further for fear of wreaking havoc on the builders [2].

Separately, I tried to roll back the two main suspects, but wasn't able to get them working cleanly on try (with the size of the patches and related patches, it is entirely possible I missed something). Included only for the curious [3][4].

Gcp (or others), any tips on what I can do next?

[1] MediaManager:5 https://treeherder.mozilla.org/#/jobs?repo=try&revision=cb1cc1a5b65b
[2] all:5 https://treeherder.mozilla.org/#/jobs?repo=try&revision=28784d7dc1c4
[3] Bug 1119593 https://treeherder.mozilla.org/#/jobs?repo=try&revision=310d2ed43cb4
[4] Bug 1109248 https://treeherder.mozilla.org/#/jobs?repo=try&revision=1fb0ecc5ea6f
Flags: needinfo?(gpascutto)
I've worked around Bug 977589 so lets see if NSPR logs turns up anything:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=15b47f3d3c92
Flags: needinfo?(gpascutto)
Got one.
Rank: 28
Priority: -- → P2
backlog: --- → webRTC+
Attachment #8557315 - Attachment is obsolete: true
Attachment #8618299 - Flags: review+
Mass change P2->P3 to align with new Mozilla triage process.
Priority: P2 → P3
The leave-open keyword is there and there is no activity for 6 months.
:jib, maybe it's time to close this bug?
Flags: needinfo?(jib)
Thanks, yes. Tests appear enabled and working.
Status: REOPENED → RESOLVED
Closed: 10 years ago6 years ago
Flags: needinfo?(jib)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: