Closed Bug 996465 Opened 10 years ago Closed 10 years ago

Intermittent test_fragment_play.html | big.wav#t=5 fragment test: seeked currentTime is 5.675918 != 5

Categories

(Core :: Audio/Video, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla33
Tracking Status
firefox31 --- disabled
firefox32 --- disabled
firefox33 --- fixed
firefox-esr24 --- unaffected
b2g-v2.0 --- disabled
b2g-v2.1 --- fixed

People

(Reporter: cbook, Assigned: jwwang)

References

()

Details

(Keywords: intermittent-failure)

Attachments

(3 files, 6 obsolete files)

b2g_emulator_vm mozilla-inbound opt test mochitest-3 on 2014-04-14 22:10:29 PDT for push 4743c6a40048

slave: tst-linux64-spot-310

https://tbpl.mozilla.org/php/getParsedLog.php?id=37815622&tree=Mozilla-Inbound



1278 INFO TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_fragment_play.html | big.wav#t=5 fragment test: seeked currentTime is 5.675918 != 5
04-15 05:33:50.568   720   720 I GeckoDump: 1278 INFO TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_fragment_play.html | big.wav#t=5 fragment test: seeked currentTime is 5.675918 != 5
http://hg.mozilla.org/mozilla-central/file/b85b57f05fda/content/html/content/src/HTMLMediaElement.cpp#l3008
"seeked" event is dispatched asynchronously. Therefore playback position could change again before the event is fired on the main thread.

http://hg.mozilla.org/mozilla-central/file/b85b57f05fda/content/media/MediaDecoderStateMachine.cpp#l2037
But, it looks like MediaDecoderStateMachine is trying to notify "seek stopped" in a synchronous fashion.

http://dev.w3.org/html5/spec-preview/media-elements.html#seeking
However, the spec says in step 14 queuing a task to fire the "seeked" event.

http://hg.mozilla.org/mozilla-central/file/b85b57f05fda/content/media/test/fragment_play.js#l22
So should we relax the test condition a bit?
Flags: needinfo?(cpearce)
In MediaDecoder::SeekingStopped(), it calls PlaybackPositionChanged() to update the currentTime which is the official playback position.

http://hg.mozilla.org/mozilla-central/file/b85b57f05fda/content/media/MediaDecoderStateMachine.cpp#l2026
It is suspicious that MediaDecoderStateMachine would update the playback position to something other than to seek target position before dispatching "seek stopped" event to MediaDecoder which will change the official position read by MediaDecoder.
(In reply to JW Wang [:jwwang] from comment #19)
> http://hg.mozilla.org/mozilla-central/file/b85b57f05fda/content/html/content/
> src/HTMLMediaElement.cpp#l3008
> "seeked" event is dispatched asynchronously. Therefore playback position
> could change again before the event is fired on the main thread.

I think that's correct.


> http://hg.mozilla.org/mozilla-central/file/b85b57f05fda/content/media/
> MediaDecoderStateMachine.cpp#l2037
> But, it looks like MediaDecoderStateMachine is trying to notify "seek
> stopped" in a synchronous fashion.

I can't remember why the SeekingStarted/SeekingStopped calls are dispatched synchronously.

I traced them all the way back to them initially landing in 2010 in nsOggPlayStateMachine.cpp, but there's no rationale there:
http://hg.mozilla.org/mozilla-central/rev/1d00691be5f2

I suspect it's got something to do with wanting the readyState to be up to date while we're seeking.

We could try making those event dispatches async and checking what breaks.



> http://dev.w3.org/html5/spec-preview/media-elements.html#seeking
> However, the spec says in step 14 queuing a task to fire the "seeked" event.
> 
> http://hg.mozilla.org/mozilla-central/file/b85b57f05fda/content/media/test/
> fragment_play.js#l22
> So should we relax the test condition a bit?

What if we called v.play() inside onSeeked() instead of inside onLoadedMetadata()?


(In reply to JW Wang [:jwwang] from comment #20)
> In MediaDecoder::SeekingStopped(), it calls PlaybackPositionChanged() to
> update the currentTime which is the official playback position.
> 
> http://hg.mozilla.org/mozilla-central/file/b85b57f05fda/content/media/
> MediaDecoderStateMachine.cpp#l2026
> It is suspicious that MediaDecoderStateMachine would update the playback
> position to something other than to seek target position before dispatching
> "seek stopped" event to MediaDecoder which will change the official position
> read by MediaDecoder.

If we're doing a fastSeek(), we may not actually end up at the seek target, we'll end up at the keyframe before the target. In the WhatWG version of the spec, which is what we've traditionally followed for media, we're supposed to update the current playback position to be the seek target before seeking, but in the fastSeek case we don't actually know where that is until after we seek. Now that we're integrating our own demuxer for MP4, that could change, but it requires work...

For a time in the non fastSeek case we wouldn't end up exactly at the seek target either, we'd be at a packet boundary, but now our seeking we should accurate down to the granularity of a single audio sample.
Flags: needinfo?(cpearce)
(In reply to Chris Pearce (:cpearce) from comment #23)
> What if we called v.play() inside onSeeked() instead of inside
> onLoadedMetadata()?

With comment 20 presented, the official playback position could still be changed even when we are paused while seeking. I will look into the code to see if it needs to be changed.

Although not specially stated in the spec, I think v.currentTime should equal the seeking target in the "seeked" callback which is expected by the developers. Since v.currentTime is the official playback position which is not necessarily the same as the current playback position, that should allow enough room for us to achieve "v.currentTime == seeking target" when fastSeek is in action.
Summary: Intermittent /tests/content/media/test/test_fragment_play.html | big.wav#t=5 fragment test: seeked currentTime is 5.675918 != 5 → Intermittent test_fragment_play.html | big.wav#t=5 fragment test: seeked currentTime is 5.675918 != 5
JW, do you have any idea why this might have spiked so heavily in the last week and a half or so?
Flags: needinfo?(jwwang)
2 possibilities I can think of:
1. changes in MediaDecoderStateMachine.cpp
2. changes in the machine which run B2G emulator tests on Try

As we can see most failures are on B2G, I guess the changes change the timing of CPU scheduling and make it more likely to happen on B2G.

The root cause is we have to release the lock when we notify the main thread that seek is complete [1]. It is possible for state machine thread to kick in to change the playback position when we are out of the scope of the lock. Therefore, there is a contention between 'seeked' fired on the main thread and changes in official playback position triggered by the state machine thread.

We need a way to synchronize the execution of 3 threads (main, decoding and state machine threads) under some situations. Using a lock won't do the work for we have occasions where we have to do jobs outside the lock. And when we go out of the lock, there is no more guarantee of the execution orders.

I don't have a solution for the synchronization yet which is more profound than I could imagine. But I will try to cook a patch which can somewhat fix the problem we are facing at hand.

[1] http://hg.mozilla.org/mozilla-central/file/606848e8adfc/content/media/MediaDecoderStateMachine.cpp#l2182
Flags: needinfo?(jwwang)
(In reply to JW Wang [:jwwang] from comment #205)
> I don't have a solution for the synchronization yet which is more profound
> than I could imagine. But I will try to cook a patch which can somewhat fix
> the problem we are facing at hand.

Yes, this is a problem we've encountered before. Typically we would just re-check the state machine's state after re-acquiring the monitor, to ensure we're still doing the right thing.

But yes, ideally we'd use a different pattern throughout the code, where issues like this didn't show up.
Disabled test_fragment_play.html on B2G & Android for intermittent failures (bug 996465 & bug 924246):
https://hg.mozilla.org/integration/mozilla-inbound/rev/8486d79fd1a8
Keywords: leave-open
Whiteboard: [test disabled on B2G & Android]
Flags: needinfo?(jwwang)
Still cooking and testing the fix...
Flags: needinfo?(jwwang)
Assignee: nobody → jwwang
Attached patch 996465_fix_seeked_currentTime_part1-v1.patch (obsolete) (deleted) — Splinter Review
Extract code about state machine scheduling into a class so that it is easier for further refactoring.
Attachment #8451463 - Flags: feedback?(jolin)
Attachment #8451463 - Flags: feedback?(ayang)
Attached patch 996465_fix_seeked_currentTime_part2-v1.patch (obsolete) (deleted) — Splinter Review
Add ability to delay running state machine cycles for synchronization between decoding/main and state machine threads. This is used to delay state machine cycles until seek operation is completed for we expect currentTime equals the seek target in 'seeked' event.

The previous attempt was to dispatch a runnable to the state machine thread to wait until seek operation is completed. By doing so, we can ensure no change in playback position before seeking completed. However, I soon came to realize that all state machines share the same state machine thread. Such a sync object will stutter the playback of other state machines. In other words, [1] could stutter playback of other media elements if main thread is busy.

I will file another bug to refactor the state machine code to remove such less-than-ideal code.

[1] http://hg.mozilla.org/mozilla-central/file/1dc6b294800d/content/media/MediaDecoderStateMachine.cpp#l2437
Attachment #8451467 - Flags: feedback?(rlin)
Attachment #8451467 - Flags: feedback?(bechen)
Comment on attachment 8451467 [details] [diff] [review]
996465_fix_seeked_currentTime_part2-v1.patch

Review of attachment 8451467 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM. :)

::: content/media/MediaDecoderStateMachine.cpp
@@ +357,3 @@
>      return NS_ERROR_FAILURE;
> +  case SCHEDULER_STATE_FROZEN:
> +    mState = SCHEDULER_STATE_FROZEN_WITH_PENDING_TASK;

fallthrough and return?
Attachment #8451467 - Flags: feedback?(rlin) → feedback+
Comment on attachment 8451463 [details] [diff] [review]
996465_fix_seeked_currentTime_part1-v1.patch

Review of attachment 8451463 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaDecoderStateMachine.cpp
@@ +245,5 @@
> +  TimeStamp mTimeout;
> +  // The id of timer tasks, timer callback will only run if id matches.
> +  int mTimerId;
> +  // No more state machine cycles in shutdown state.
> +  State mState;

Can you rename it to XXXState and mXXXState since there is a |mState | already in MediaDecoderStateMachine.

@@ +3163,5 @@
>  {
> +  return mScheduler->GetStateMachineThread();
> +}
> +
> +bool MediaDecoderStateMachine::IsStateMachineScheduled() const {

nit. parentheses
Comment on attachment 8451467 [details] [diff] [review]
996465_fix_seeked_currentTime_part2-v1.patch

Review of attachment 8451467 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaDecoderStateMachine.cpp
@@ +226,5 @@
> +    mState = !IsScheduled() ? SCHEDULER_STATE_FROZEN :
> +                              SCHEDULER_STATE_FROZEN_WITH_PENDING_TASK;
> +    // Nullify pending timer task if any.
> +    ++mTimerId;
> +    mTimeout = TimeStamp();

What's the different if call ResetTimer() here?

@@ +228,5 @@
> +    // Nullify pending timer task if any.
> +    ++mTimerId;
> +    mTimeout = TimeStamp();
> +  }
> +

nit, extra new line.
Attachment #8451467 - Flags: feedback?(bechen) → feedback+
Attachment #8451463 - Flags: feedback?(jolin) → feedback+
Comment on attachment 8451467 [details] [diff] [review]
996465_fix_seeked_currentTime_part2-v1.patch

Review of attachment 8451467 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaDecoderStateMachine.cpp
@@ +226,5 @@
> +    mState = !IsScheduled() ? SCHEDULER_STATE_FROZEN :
> +                              SCHEDULER_STATE_FROZEN_WITH_PENDING_TASK;
> +    // Nullify pending timer task if any.
> +    ++mTimerId;
> +    mTimeout = TimeStamp();

ResetTimer() will call nsITimer::Cancel() which is not thread-safe and should only be called in the target thread. However, FreezeScheduling() can be called in any thread provided the lock is hold.

@@ +357,3 @@
>      return NS_ERROR_FAILURE;
> +  case SCHEDULER_STATE_FROZEN:
> +    mState = SCHEDULER_STATE_FROZEN_WITH_PENDING_TASK;

yes.
Comment on attachment 8451463 [details] [diff] [review]
996465_fix_seeked_currentTime_part1-v1.patch

Review of attachment 8451463 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaDecoderStateMachine.cpp
@@ +205,5 @@
> +    mMonitor.AssertCurrentThreadIn();
> +    return !mTimeout.IsNull();
> +  }
> +
> +  void ReleaseTimer() {

TODO: remove this function from the public interface. See the rationale below.

@@ +245,5 @@
> +  TimeStamp mTimeout;
> +  // The id of timer tasks, timer callback will only run if id matches.
> +  int mTimerId;
> +  // No more state machine cycles in shutdown state.
> +  State mState;

Sure, readability outweighs typing effort.

@@ +2492,5 @@
>        // state machine.
>        GetStateMachineThread()->Dispatch(
>          new nsDispatchDisposeEvent(mDecoder, this), NS_DISPATCH_NORMAL);
>  
> +      mScheduler->ReleaseTimer();

Since nsITimer is not thread-safe and StateMachineScheduler will be deleted in the main thread, we would like to release the timer here for it is not needed anymore in the target thread to avoid race. However, this somewhat breaks the encapsulation of StateMachineScheduler by leaking the implementation details to its user. StateMachineScheduler should be smart enough to know when to release |mTimer| without worrying the user of this class.
Attachment #8451463 - Flags: review?(cpearce)
Attachment #8451467 - Flags: review?(cpearce)
Comment on attachment 8451463 [details] [diff] [review]
996465_fix_seeked_currentTime_part1-v1.patch

Review of attachment 8451463 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, but can you please move this new class definition out into its own cpp/h file pair? We should reduce the lines of code in MediaDecoderStateMachine.cpp where possible.

::: content/media/MediaDecoderStateMachine.cpp
@@ +1952,5 @@
>      nsresult rv = mAudioSink->Init();
>      if (NS_FAILED(rv)) {
>        DECODER_LOG(PR_LOG_WARNING, "Changed state to SHUTDOWN because audio sink initialization failed");
>        mState = DECODER_STATE_SHUTDOWN;
> +      mScheduler->Shutdown();

I think we should rename Shutdown() to ScheduleAndShutdown(), as it's not obvious that the state machine will run again otherwise.
Attachment #8451463 - Flags: review?(cpearce) → review-
Comment on attachment 8451467 [details] [diff] [review]
996465_fix_seeked_currentTime_part2-v1.patch

Review of attachment 8451467 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. r=cpearce, after rebase on previous patch after its comments are addressed.
Attachment #8451467 - Flags: review?(cpearce) → review+
Attached patch 996465_fix_seeked_currentTime_part1-v2.patch (obsolete) (deleted) — Splinter Review
Per comment 272, extract code into a new file.
Attachment #8451463 - Attachment is obsolete: true
Attachment #8451463 - Flags: feedback?(ayang)
Attachment #8453583 - Flags: review?(cpearce)
Attached patch 996465_fix_seeked_currentTime_part2-v2.patch (obsolete) (deleted) — Splinter Review
Per comment 273, rebase patch part2. r=cpearce
Attachment #8451467 - Attachment is obsolete: true
Attachment #8453584 - Flags: review+
try on b2g: https://tbpl.mozilla.org/?tree=Try&rev=4a04fb7a4b4d
no failure on test_fragment_play.html in 99 runs.

try on android: https://tbpl.mozilla.org/?tree=Try&rev=cfb4b5594a50
most green.

try on others: https://tbpl.mozilla.org/?tree=Try&rev=692d265cfb97
most green.
Comment on attachment 8453583 [details] [diff] [review]
996465_fix_seeked_currentTime_part1-v2.patch

Review of attachment 8453583 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. Thanks for pulling the code out into a separate smaller file. Small classes are easier to understand, and the StateMachine is not easy to understand; it needs to be smaller.

r=cpearce with the style nits fixed.

Thanks!

::: content/media/MediaDecoderStateMachineScheduler.cpp
@@ +10,5 @@
> +#include "nsITimer.h"
> +#include "nsComponentManagerUtils.h"
> +
> +// Number of microseconds per millisecond.
> +static const int64_t USECS_PER_MS = 1000;

This is already declared in VideoUtils.h.

@@ +41,5 @@
> +
> +NS_IMPL_ISUPPORTS(TimerEvent, nsITimerCallback, nsIRunnable);
> +} // anonymous namespace
> +
> +static already_AddRefed<nsIEventTarget> CreateStateMachineThread() {

The mozilla coding style guidelines say this should be styled as:

static already_AddRefed<nsIEventTarget>
CreateStateMachineThread()
{
  //...
}

All the functions in this file need to be restyled this way.

Consistent style makes it easier for people unfamiliar with the code to read it.

::: content/media/MediaDecoderStateMachineScheduler.h
@@ +46,5 @@
> +
> +private:
> +  void ResetTimer();
> +
> +private:

You don't need this second "private:"
Attachment #8453583 - Flags: review?(cpearce) → review+
Attached patch 996465_fix_seeked_currentTime_part1-v3.patch (obsolete) (deleted) — Splinter Review
Fix nits.
Attachment #8453583 - Attachment is obsolete: true
Attachment #8453669 - Flags: review+
Attached patch 996465_fix_seeked_currentTime_part2-v3.patch (obsolete) (deleted) — Splinter Review
rebased on part1-v3.patch
Attachment #8453584 - Attachment is obsolete: true
Attachment #8453670 - Flags: review+
Almost forgot this patch to enable test_fragment_play.html again.
Try in comment 277 run with test_fragment_play.html enabled. Worry not.
Attachment #8454202 - Flags: review?(cpearce)
Attachment #8454202 - Flags: review?(cpearce) → review+
Keywords: checkin-needed
Hardware: x86 → ARM
Ah, this is a problem caused by unified build...
Ok, I will inline UsecsToDuration for it is only used once in MediaDecoderStateMachineScheduler.cpp.
Fix build error: inline UsecsToDuration() which conflicts the one in MediaDecoderStateMachine.cpp.
Attachment #8453669 - Attachment is obsolete: true
Attachment #8454362 - Flags: review+
rebase on part1-v4.
Attachment #8453670 - Attachment is obsolete: true
Attachment #8454364 - Flags: review+
try builds: https://tbpl.mozilla.org/?tree=Try&rev=c95d801258cb
Sorry for the trouble. Please try to check in the patches again. Thanks.
Keywords: checkin-needed
No. I would expect if this fails again in the future, it would fail for another reason. I would prefer to stop this bug from growing comments...
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: