1188313 - Fast seek (inaccurate seeks) will almost certainly leads to broken A/V Sync

Assignee

Description

•

9 years ago

Was reviewing the seek code as part of bug 1159343. And I took notice of the PrevSyncPoint seek type. So if we seek to any time, the audio seek will likely be very accurate (as all frames are keyframes), but the video could be several seconds behind. It doesn't appear that there is anything in place to dismiss all future video frames until we've caught up with audio. Or maybe I'm reading it wrong.

Jean-Yves Avenard [:jya]

Assignee

Updated

•

9 years ago

Flags: needinfo?(cpearce)

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 1

•

9 years ago

With PrevSyncPoint seeking, the MP4Reader seeks the video stream first, finds the keyframe time, and then seeks the audio to that time. MediaFormatReader should be doing the same thing.

Flags: needinfo?(cpearce)

Jean-Yves Avenard [:jya]

Assignee

Comment 2

•

9 years ago

Attached patch P1. Attempt to seek audio near video. (deleted) — Details — Splinter Review

Do not only rely on the MediaDecoderState machine to keep A/V sync after a seek as should we seeked in fast mode ; it wouldn't

Attachment #8645701 - Flags: review?(cpearce)

JW Wang [:jwwang] [:jw_wang]

Comment 3

•

9 years ago

I don't understand why we can't count on MDSM for A/V sync. Even if the timestamps do not align after seeking, MDSM should still be able to handle that. It is like audio/video tracks have different start times and MDSM would send silent frames for padding.

Jean-Yves Avenard [:jya]

Assignee

Comment 4

•

9 years ago

neither do I. But unless we did so, we had broken A/V sync (bug 1105066) I have managed to reproduce it with some youtube videos recently. Now there probably will be a better fix in MDSM, but in the mean time.. this works well and simply restore a past behaviour available in MP4Reader and MediaSourceReader.

Jean-Yves Avenard [:jya]

Assignee

Comment 5

•

9 years ago

JW, comment https://bugzilla.mozilla.org/show_bug.cgi?id=1105066#c5 is the interesting part for you.

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 6

•

9 years ago

(In reply to Jean-Yves Avenard [:jya] from comment #0) > Was reviewing the seek code as part of bug 1159343. > > And I took notice of the PrevSyncPoint seek type. > > So if we seek to any time, the audio seek will likely be very accurate (as > all frames are keyframes), but the video could be several seconds behind. > > It doesn't appear that there is anything in place to dismiss all future > video frames until we've caught up with audio. > > Or maybe I'm reading it wrong. You are reading it wrong. With PrevSyncPoint seek type, we seek both audio and video streams to the *video* keyframe preceding the seek target. So that means we must seek the video stream to the video keyframe preceding the seek target, and then seek the audio stream to that video keyframe's timestamp. From your comment above you've implied that we should be seeking both streams as close as possible to the audio keyframe preceding the seek target, but that's not the case. The goal of PrevSyncPoint is to seek such that you can playback immediately at the *video* keyframe preceding the seek target as fast as possible. The goal with fastSeek is to eliminate the delay in video decoding when we decode from the previous keyframe up to the seek target in order to fulfil an accurate seek request; this could be very slow on mobile. If you integrated with the demuxer closely (as we probably can now do with the new MediaFormatReader), you can probably calculate in advance of the video seek completing where the target video keyframe is, and so you could seek the audio stream in parallel.

Jean-Yves Avenard [:jya]

Assignee

Comment 7

•

9 years ago

(In reply to Chris Pearce (:cpearce) from comment #6) > With PrevSyncPoint seek type, we seek both audio and video streams to the > *video* keyframe preceding the seek target. So that means we must seek the > video stream to the video keyframe preceding the seek target, and then seek > the audio stream to that video keyframe's timestamp. > > From your comment above you've implied that we should be seeking both > streams as close as possible to the audio keyframe preceding the seek > target, but that's not the case. The goal of PrevSyncPoint is to seek such > that you can playback immediately at the *video* keyframe preceding the seek > target as fast as possible. sure, the problem here is that no such information is available to the MediaDecoderReader. There's nothing that tells it what type of seeking it should be doing. We've always assumed that the MDSM will then do the right thing. > > The goal with fastSeek is to eliminate the delay in video decoding when we > decode from the previous keyframe up to the seek target in order to fulfil > an accurate seek request; this could be very slow on mobile. > > If you integrated with the demuxer closely (as we probably can now do with > the new MediaFormatReader), you can probably calculate in advance of the > video seek completing where the target video keyframe is, and so you could > seek the audio stream in parallel. This is exactly what the patch I've submitted for your review is doing. There is one particular corner case I had to implement however is that you can't always seek the audio exactly to the video keyframe ; this issue was noted in bug 1173792. When you start youtube with a parameter so it seeks exactly at some value (in the bug 61s) ; it loads video with [58-70] and audio with [60-70] The keyframe in the video buffer is at 58s ; it's not possible to seek audio to 58s as there's no such data in the source buffers, and it will never be loaded. So what the MediaFormatReader does instead is attempt to seek to where the video seeked to ; and if that failed due to missing data ; then it will seek instead to the closest buffered point. If the closest buffered point is further than the original seek, then it will enter in WAITING_FOR_DATA mode and will not resolve the seek. Now, we could handle the case where we couldn't exactly seek the audio and video to the same point, like the example in bug 1173792. However, this would required the MediaFormatReader to start decoding frames and drop them before returning to the MDSM. This is a very easy thing to do, but if the aim of the fast seek is to seek as quickly as possible, there's no guarantee the gap between audio and video is going to be small. For all we know, the key frames could be 128 frames appart, requiring several seconds to decode + drop. So as such, I think that not handling this case is better from a performance point of view.

Chris Pearce [:cpearce (Not reading bugmail)]

Updated

•

9 years ago

Attachment #8645701 - Flags: review?(cpearce) → review+

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 8

•

9 years ago

(In reply to Chris Pearce (:cpearce) from comment #6) > You are reading it wrong. My understanding of how seeking worked predated the great MSDM refactoring, where every seek was accurate by default, whereas now every seek is fast by default. My understanding has been updated now.

JW Wang [:jwwang] [:jw_wang]

Comment 9

•

9 years ago

AudioSink will pad silence to fill the gap between the start time and the first audio sample when |start time < sample.mTime|. However, I can't find the code to deal with the case where |sample.mTime < start time|. AudioSink should discard samples predating the start time. Checking...

Jean-Yves Avenard [:jya]

Assignee

Updated

•

9 years ago

Blocks: 1193124

Pulsebot

Comment 10

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/4e6233840d00

Ryan VanderMeulen [:RyanVM]

Comment 11

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/4e6233840d00

Assignee: nobody → jyavenard

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox43: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla43

Jean-Yves Avenard [:jya]

Assignee

Updated

•

9 years ago

Blocks: 1197083

Ralph Giles (:rillian)

Comment 12

•

9 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/174a4342d074

status-firefox42: affected → fixed

Ralph Giles (:rillian)

Comment 13

•

9 years ago

Backed out for a youtube playback regression. See Bug 1199573. https://hg.mozilla.org/releases/mozilla-aurora/rev/5bb661db5c6c

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Ralph Giles (:rillian)

Updated

•

9 years ago

Status: REOPENED → RESOLVED

Closed: 9 years ago → 9 years ago

status-firefox42: fixed → affected

Resolution: --- → FIXED

Ralph Giles (:rillian)

Comment 14

•

9 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/f843378cfeb0

status-firefox42: affected → fixed