814718 - Intermittent failure in basic gUM tests | canplaythrough event never fired

Reporter

Description

•

12 years ago

Followup on the try run from https://bugzilla.mozilla.org/show_bug.cgi?id=781534#c50: apparently the canplaythrough event never fires in the gum test on OS X 10.7 and 10.8 opt builds. The actual try run results are here: https://tbpl.mozilla.org/?tree=Try&rev=f7e037dc5360. It appears to happen only with the basic video test.

Jason Smith [:jsmith]

Reporter

Updated

•

12 years ago

Blocks: 781534

Jason Smith [:jsmith]

Reporter

Comment 1

•

12 years ago

Blocker because we can't pref on gum automation without it, and gum automation is a blocker as well.

Whiteboard: [getUserMedia] [blocking-gum+]

Jason Smith [:jsmith]

Reporter

Updated

•

12 years ago

Whiteboard: [getUserMedia] [blocking-gum+] → [getUserMedia] [blocking-gum+] [automation-blocked]

Jason Smith [:jsmith]

Reporter

Updated

•

12 years ago

Blocks: 814807

Jason Smith [:jsmith]

Reporter

Updated

•

12 years ago

No longer blocks: 781534

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 2

•

12 years ago

Maire, can we please get this prioritized higher, given that it blocks our automation efforts?

Paul Adenot (:padenot)

Comment 3

•

12 years ago

Taking that per discussion during the 2012/12/04 WebRTC meeting. If this is indeed MacOS specific, I'll receive a Macbook in a couple days. I won't be able to investigate until then.

Assignee: nobody → paul

Status: NEW → ASSIGNED

Paul Adenot (:padenot)

Comment 4

•

12 years ago

A bit more on this. The "canplaythrough" event fires when I run only the test test_getUserMedia_basicVideo.html. However, when I run the whole suite, I can reproduce. Now that I can repro, I'm going to dig a little further.

Paul Adenot (:padenot)

Comment 5

•

12 years ago

Another update: the root problem is at [1]. We don't go into this if(), and we should. Logging in this function shows that when we switch from blocked to not blocked, we don't notify the listener, which in turn does not notify the element, and so on. I'm going to try to fix the problem, if this explanation makes sense.
[1]: http://mxr.mozilla.org/mozilla-central/source/content/media/MediaStreamGraph.cpp#813

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 6

•

12 years ago

yep

Jason Smith [:jsmith]

Reporter

Comment 7

•

12 years ago

Paul says he can't reproduce on today's build. He just kicked off a try build with the basic video test preffed on to see what happens: https://tbpl.mozilla.org/?tree=Try&rev=4a43d50ab3cc

Paul Adenot (:padenot)

Comment 8

•

12 years ago

The try push shows that the problem still occurs, but I still can't reproduce locally. I'll try harder.

Jason Smith [:jsmith]

Reporter

Comment 9

•

12 years ago

So bug 802538's try run reveals this is happening generally with any of the mochitests. And it's not just OS X opt builds either.

Keywords: intermittent-failure

Summary: canplaythrough event on a media element with gum media stream with video fails to fire on OS X 10.7/10.8 opt builds → canplaythrough event on a media element with gum media stream with video or audio fails to fire intermittently

Paul Adenot (:padenot)

Comment 10

•

12 years ago

Here is the current state of understanding of this issue:

What should happen (i.e. how "canplaythrough" is fired):
- Whenever the blocking state of a stream changes (from blocked to not blocked or the inverse), in |MediaStreamGraphImpl::UpdateCurrentTime|, the graph should call |nsHTMLMediaElement::StreamListener::NotifyBlockingChanged|. This dispatches an event on the main thread (after all the streams have been updated) that marks the stream as unblocked in the listener and calls |UpdateReadyStateForData| on the element, with |NEXT_FRAME_AVAILABLE| as parameter when it is not blocked anymore. That ends up calling |nsHTMLMediaElement::ChangeReadyState| with |HAVE_ENOUGH_DATA| as parameter, which fires "canplaythrough".

What actually happens (i.e. why it is not fired):
- On try, and very rarely on my local machine, and only when running the full suite (not the individual test), very frequently on MacOS opt, sometimes on MacOS debug, the MediaStreamGraph experiences a global underrun (the code to detect that is at the beginning of the |MediaStreamGraphImpl::UpdateCurrentTime| method). I failed to notice this at the beginning, because LOG macros don't output on try when doing an opt build. The underrun handling moves |mStateComputedTime| forward to |nextCurrentTime|, and sets all the streams as blocking from |mStateComputedTime|. The stream becomes blocked, and everything is dispatched properly to the element.

Next time the MediaStreamGraph thread runs, the stream is logically not blocked anymore, but should be reported as blocked at |prevCurrentTime|, and we _should_ dispatch an event to the main thread saying that we are not blocked anymore, since we have advanced |mCurrentTime| to compensate for the underrun that made us set the stream as blocked. The problem that seems to happen is that |prevCurrentTime| is wrong for some reason OR there is a bug in TimeVarying::GetAt, and this line reports that the stream was not blocked:

> // Save current blocked status
> bool wasBlocked = stream->mBlocked.GetAt(prevCurrentTime);

Because we only notify the element via the listener if there is a blocking state change from |prevCurrentTime| to |nextCurrentTime| (in the while loop), and because the streams are marked as not blocked during this period, the |wasBlocked != blocked| condition is never true, and we don't notify the element.

When outputting the values, |prevCurrentTime| in the function call where we fail to notify the element is the |nextCurrentTime| of the previous call, so this seems alright to me. Then it is normal that the stream is not blocked at that time, because we specifically updated the graph time to bring it to a point where it is not blocked, to compensate for the underrun. So I believe the underrun compensation code is wrong: it should set the stream as blocked until the next time we run the graph, so that we report that the stream was blocked at |prevCurrentTime| in the next run.

Hopefully this makes sense. I have a 3g connection this week, so I'll be able to check my email. Please tell me if you need clarifications or something.
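To make the edge-triggered behaviour described above concrete, here is a minimal sketch. It is not the MediaStreamGraph code: |BlockedTimeline| and |CollectNotifications| are hypothetical names, the timeline is a toy stand-in for the real time-varying value, and time is walked tick by tick purely for readability. It only shows that a notification is produced when the state differs from the baseline sampled at |prevCurrentTime|.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <vector>

using GraphTime = int64_t;

// Hypothetical stand-in for a time-varying "blocked" flag. It is NOT the real
// TimeVarying class; it only supports the two operations this sketch needs.
class BlockedTimeline {
public:
  BlockedTimeline() { mChanges[0] = false; }  // unblocked from time 0

  // The flag has value aValue from aTime onward; later changes are dropped.
  void SetAtAndAfter(GraphTime aTime, bool aValue) {
    assert(aTime >= 0);
    mChanges.erase(mChanges.lower_bound(aTime), mChanges.end());
    mChanges[aTime] = aValue;
  }

  // Value in effect at aTime (the last change at or before aTime).
  bool GetAt(GraphTime aTime) const {
    assert(aTime >= 0);
    return std::prev(mChanges.upper_bound(aTime))->second;
  }

private:
  std::map<GraphTime, bool> mChanges;
};

// Edge-triggered notification over [aPrev, aNext): the baseline is the state
// at aPrev, and a notification is recorded only when the state differs from it.
std::vector<bool> CollectNotifications(const BlockedTimeline& aBlocked,
                                       GraphTime aPrev, GraphTime aNext) {
  std::vector<bool> sent;
  bool wasBlocked = aBlocked.GetAt(aPrev);
  for (GraphTime t = aPrev; t < aNext; ++t) {
    bool blocked = aBlocked.GetAt(t);
    if (blocked != wasBlocked) {
      sent.push_back(blocked);  // true = "now blocked", false = "now unblocked"
      wasBlocked = blocked;
    }
  }
  return sent;
}
```

With a timeline that is blocked from time 10 and unblocked from time 15, calling |CollectNotifications(timeline, 10, 20)| records one "now unblocked" entry. If the timeline already reads as not blocked over the whole interval, nothing is recorded, whatever the listener was told before.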

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 11

•

12 years ago

I suspect the problem happens like this:
1) The global underrun happens and we set mCurrentTime and mStateComputedTime to the same value. The stream is marked blocking from prevCurrentTime onward and we notify the element.
2) After that, we recompute blocking status for the stream and decide it should not be blocked from mStateComputedTime onward. So we set that in its mBlocking. Note that mBlocking is now false at mCurrentTime!
3) In the next iteration of UpdateCurrentTime, we proceed normally. wasBlocked is set to false since at prevCurrentTime mBlocking is false.

Basically the problem is that we have forgotten our old blocking status; it's been overwritten when mStateComputedTime == mCurrentTime. We should store it separately on each stream instead of utilizing mBlocked.
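The three steps above can be replayed in a small sketch. Everything here is an assumption made for illustration: |SketchStream|, |mNotifiedBlocked|, |UpdateUsingTimeline|, |UpdateUsingNotifiedState| and |ReplayUnderrun| are hypothetical names, the times 5, 10 and 20 are arbitrary, and the attached patch may well be structured differently. The sketch only contrasts taking the baseline from the (overwritten) timeline with taking it from a separately stored "last notified" flag.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <vector>

using GraphTime = int64_t;

// Hypothetical, heavily simplified stream; not the real MediaStream class.
struct SketchStream {
  std::map<GraphTime, bool> mBlocked{{0, false}};  // toy time-varying flag
  bool mNotifiedBlocked = false;  // assumed field: last state sent to the listener
  std::vector<bool> mSent;        // every notification, in order

  // The flag has value aValue from aTime onward; later changes are dropped.
  void SetBlockedAtAndAfter(GraphTime aTime, bool aValue) {
    assert(aTime >= 0);
    mBlocked.erase(mBlocked.lower_bound(aTime), mBlocked.end());
    mBlocked[aTime] = aValue;
  }

  bool BlockedAt(GraphTime aTime) const {
    assert(aTime >= 0);
    return std::prev(mBlocked.upper_bound(aTime))->second;
  }

  void Notify(bool aBlocked) {
    mSent.push_back(aBlocked);
    mNotifiedBlocked = aBlocked;
  }
};

// Baseline read back from the timeline at aPrev (the behaviour described above).
void UpdateUsingTimeline(SketchStream& aStream, GraphTime aPrev, GraphTime aNext) {
  bool wasBlocked = aStream.BlockedAt(aPrev);
  for (GraphTime t = aPrev; t < aNext; ++t) {
    bool blocked = aStream.BlockedAt(t);
    if (blocked != wasBlocked) {
      aStream.Notify(blocked);
      wasBlocked = blocked;
    }
  }
}

// Baseline taken from the separately stored flag instead.
void UpdateUsingNotifiedState(SketchStream& aStream, GraphTime aPrev, GraphTime aNext) {
  for (GraphTime t = aPrev; t < aNext; ++t) {
    bool blocked = aStream.BlockedAt(t);
    if (blocked != aStream.mNotifiedBlocked) {
      aStream.Notify(blocked);
    }
  }
}

// Replays steps 1) to 3) and returns the notifications the listener received.
std::vector<bool> ReplayUnderrun(bool aUseNotifiedState) {
  SketchStream stream;
  // 1) Underrun: blocked from the previous current time (5) onward, listener
  //    told; current time and state-computed time both become 10.
  stream.SetBlockedAtAndAfter(5, true);
  stream.Notify(true);
  // 2) Recomputed as not blocked from 10 onward; the value at 10 is overwritten.
  stream.SetBlockedAtAndAfter(10, false);
  // 3) Next iteration covers [10, 20).
  if (aUseNotifiedState) {
    UpdateUsingNotifiedState(stream, 10, 20);
  } else {
    UpdateUsingTimeline(stream, 10, 20);
  }
  return stream.mSent;
}
```

In this model, |ReplayUnderrun(false)| leaves the listener with only the "blocked" notification, while |ReplayUnderrun(true)| also delivers the "unblocked" one, because the comparison no longer depends on a timeline value that step 2 has overwritten.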

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 12

•

12 years ago

Attached patch: fix (hopefully) (deleted)

https://tbpl.mozilla.org/?tree=Try&rev=05b9fe9c8865

Assignee: paul → roc

Attachment #696822 - Flags: review?(rjesup)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 13

•

12 years ago

Thanks Paul, I wouldn't have figured that out without your data.

fix (hopefully): patch (deleted), 12 years ago, Robert O'Callahan (:roc) (email my personal email if necessary). Flags: jesup: review+, whimboo: checkin+
Remove Incorrect Gum Tests: patch (deleted), 12 years ago, Jason Smith [:jsmith]. Flags: roc: review+, whimboo: checkin+