Closed Bug 818822 Opened 12 years ago Closed 11 years ago

Need to resample inputs to MediaStreamGraph

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla31

People

(Reporter: jesup, Assigned: padenot)

References

Details

(Whiteboard: [s=fx32][p=13, 1.5:p1, ft:webrtc])

Attachments

(4 files, 2 obsolete files)

Resample all inputs of the MediaStreamGraph to the ideal audio rate. r= 11 years ago Paul Adenot (:padenot) (deleted), patch	roc : review+	Details \| Diff \| Splinter Review
Update AudioConduit so it can work at 44.1kHz. r= 11 years ago Paul Adenot (:padenot) (deleted), patch	jesup : review+	Details \| Diff \| Splinter Review
Resample all inputs of the MediaStreamGraph to the ideal audio rate. r= 11 years ago Paul Adenot (:padenot) (deleted), patch		Details \| Diff \| Splinter Review
Update AudioConduit so it can work at 44.1kHz. r= 11 years ago Paul Adenot (:padenot) (deleted), patch		Details \| Diff \| Splinter Review
Reduce fake audio/video rates on b2g debug only to avoid overloading mochitest emulator VMs 11 years ago Randell Jesup [:jesup] (needinfo me) (deleted), patch	padenot : review+	Details \| Diff \| Splinter Review
Disable problematic test for B2G 11 years ago Randell Jesup [:jesup] (needinfo me) (deleted), patch	drno : review+	Details \| Diff \| Splinter Review

Randell Jesup [:jesup] (needinfo me)

Reporter

Description

•

12 years ago

We need to resample some/most inputs to the MediaStreamGraph in order to avoid drift issues with audio being sampled on clocks other than the CPU clock (or sampled on an entirely different machine). There have been considerable discussion (see below and bug 691234 for art of it), and roc, mreavy, derf and I talked extensively about it at IETF and other points. roc: you may want to summarize your thoughts/plan here. My understanding was that we would try clocking off the playback driver, and resample inputs. Handling mic inputs (which may have a different clock than the audio out!) is tricky, especially with echo cancellation, which really needs to see the actual samples that went out to the hardware. I *believe* the right way to handle that if the mic and speakers don't share a clock is to resample the "far-end" sample (going to the speakers) to match the input clock rate, since that's effectively what the differing hardware sample rates are doing to you and them anyways. From bug 691234 comment 41: Randell Jesup [:jesup] 2012-06-21 11:03:38 PDT Right - jitter buffers move a lot more than any reasonable hardware clock, so they need to do more than resample. The Time Scale Modification (TSM) stuff in NetEQ is a common and quite effective trick. (For anyone who didn't know.) I certainly would be tempted to use the hardware capture clock for incoming mic data, and the playout clock (not guaranteed to the be same as the capture clock!) for data from the PeerConnection, and if the two are connected directly (audio.src = getUserMedia_stream) then force the resample; but I realize that's not how MediaStreams work. An example for a videophone (DSP details omitted). Buffer nodes rebuffer into 10ms frames. Recording taps, VAD, control, etc all are omitted. A lot of this is inside the webrtc code, but it helps to understand the complexity of stream management. The wonder of Emacs artist-mode! ------------------------------------------------------------------------- RTP | | +-------------+ | Send audio thread | Jitter Buff | RTP reception thread | | Insert | | RTP +------+------+ | | | | +-----------------+ -------------+------------------------------------+ | Speech Encode | +------+------+ Receive audio | | | | Jitter Buff | thread | +--------+--------+ | Dequeue | | +--------+--------+ +------+------+ | | Mixer +- Other +------+------+ | +--------+--------+ sources | Decode | | +-------+--------+ | | | | Buffer | +------+------+ | +-------+--------+ +------+------+ | | |Time Scale | | +------+----------+ |Modification | | | Conference | +------+------+ | /--+Bridge (optional)| +------+------+ +-------------+ /---+- +-------+---------+ | Buffer +-------+ audio +--- | | | | | conf. bridge| | +------+-------+ +------+------+ +-------------+ | |AGC and output| +------+------+ | | level adjust | | Mixer +-- other sources | |& DTMF insert | | | | +------+-------+ +------+------+ | | +------+------+ | +------+-------+ |AGC, DC Filt | +---------------+| | Main | | & volume +------------+ fsb queue ++-----+ AEC | +------+------+ | || +-------+------+ +------+------+ +---------------+| +------+------+ |Playout audio| | |Capture audio| | driver | | | driver | +------+------+ | +------+------+ --------------+------------------------------------+-------------+--------- +------+------+ Linux Kernel | | Buffers | +------+-------+ +-------------+ | Buffers | | | +--------------+ +-------------+ | | | | +--------------+ +----------+--+ | | | +-------+------+ +--+-----------------------------------------------+-------+ | | | Hardware Audio Codec | | Note: Use HW Biquad filters to flatten response if avail | +--+-----------------------------------------+-------------+ | | ..|. | .. .. --+-- .. .. --+-- .. .. ( ) .......... ----- This let the output (mic -> RTP) thread run with microphone timestamps, and the input thread (after the TSM and associated buffer node) run with audio playout timestamps. RTP input to TSM runs with RTP source timestamps; the TSM takes the place of a resampler. The only complexity is at the cross input/output thread links, which would need resampling if the mic and playout clocks aren't synced (mine were), and mixing of other sources (ditto). --- It's probably helpful to label your clock sources; merging two streams with the same clock source doesn't need a resample; merging them with different sources does. The complication here is that MediaStreams run on internal (CPU/NTP) clock time; the only sources that do so are probably pre-recorded sources. So, for all sources that aren't internal (sourced at a video or audio decoder element), the inputs will need associated resamplers or timestamp transcoders. For audio, the sources will need to monitor input samples versus nominal frequency (at CPU/NTP clock rate), perhaps with a PLL, and use that to drive the settings for the resampler. For video, I would simply use the PLL to convert input timestamps to internal CPU timestamps. This might need some tweakery, as many RTP sinks assume video sources are at consistent cadences and timestamp deltas in order to provide smooth playback, generally some integer frame rate, or some integer number of frames each on a 30 (or 25!) fps boundary (i.e. timestamp deltas of 3000 at 90KHz timestamp rate.) Note that many audio sources may require a resampler anyways to feed into the core, so this would just affect settings for it. The PLL aspect of it does mean it may take a short time to converge after starting. We probably need this to be a core part of MediaStream data inputs, with an option to disable it if we know the source is synced to the CPU. Sorry for the long message...

Timothy B. Terriberry (:derf)

Comment 1

•

12 years ago

(In reply to Randell Jesup [:jesup] from comment #0) > I *believe* the right way to handle that if the mic and speakers don't share > a clock is to resample the "far-end" sample (going to the speakers) to match > the input clock rate, since that's effectively what the differing hardware > sample rates are doing to you and them anyways. I think we should do things almost exactly the other way around (and I think I've said this before): the audio output should be the master clock, and we should resample at the microphones to compensate for drift. Part of the reason for this is that while we will basically always have a single audio output device, we won't always have exactly one input device. In fact we will often have zero. In a system with multiple microphones, none of which are in use, which clock do you sync to? In any case, from a practical point of view, the Speex resampler was designed to handle exactly this case (small, continuously changing sample rate differences), and is already in the tree, so we should probably just use that. After bug 818327 it should even work.

Jean-Marc Valin

Comment 2

•

12 years ago

I tend to agree with Tim that syncing with the playback is better than syncing with capture, for exactly the same reasons. I don't think it makes any difference to the AEC whether it's the capture or the playback that's resampled. The other possibility is to go with the "global clock" (RTC/NTP-based I assume) that's used for the RTP timestamps. The advantage of this is that you never have to deal with any clock drift and/or RTP timestamp adjustments. I think the RTC is also much more accurate than what a soudcard gives you. Of course, the downside is that it means twice the resampling.

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 3

•

12 years ago

My apologies - it was late when I entered that bug. My discussions with roc involved using the output clock as the master, not the input clock - I got it backwards when writing it down. And as mentioned, that would mean resampling the input, not the output. I think locking to the playout clock also may help us control playout latency. Note that the output frequency may jump if the output destination changes.

Timothy B. Terriberry (:derf)

•

11 years ago

Blocks: 958090

Karl Tomlinson (:karlt)

Comment 6

•

11 years ago

When there are two different output cards connected to the MSG, I wonder whether there is anything ensuring that they are in sync. If nothing guarantees this, then something needs to be done on every output except one, to avoid drift.

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 7

•

11 years ago

Correct: nothing guarantees two outputs are synced. In theory the "right" way to do it is to have the graph self-divide (graph nodes are driven by the output), and when there's a cross-connect, force a resample. Most graphs will end in one output, and not cross-connect. In some rare instances a source might get played on both outputs, in which case at the join point (the source) you'd need to resample. It may be simpler to choose one output as "main", and resample anything sent to the other one to that (if both are in use). It gets trickier when there's a switch from two to one outputs active, or one to another, but probably not too problematic modulo that you might need to leave a small buffer in place after updating the resample rate which you wouldn't need if it had been originally run without a resampler. (ie. if A is direct (no resample), B is added (and resampled to match A), then A stops, B will still need a buffer even if the resampler is turned off (though maybe you can drift it slowly until the buffer underflows and you can remove the buffer...)

Randell Jesup [:jesup] (needinfo me)

Reporter

Updated

•

11 years ago

Assignee: roc → paul

Blocks: 970426, 694814, 929128

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 9

•

11 years ago

If we're going to provide a single mixed stream from the MediaStreamGraph back to the AEC (and to support driving the MSG from the outputs), we need to resample to 48000Hz at all inputs that aren't 48000Hz, even if they're don't need it for clock domain reasons (like gUM audio does). gUM (or anything else with a different clockbase) will need to be resampled/buffered.

Paul Adenot (:padenot)

Assignee

Comment 10

•

11 years ago

Attached patch Resample all inputs of the MediaStreamGraph to the ideal audio rate. r= (obsolete) (deleted) — Details — Splinter Review

Comment 11

•

11 years ago

Attached patch Update AudioConduit so it can work at 44.1kHz. r= (obsolete) (deleted) — Details — Splinter Review

This is necessary for the rest to work.

Attachment #8397196 - Flags: review?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 12

•

•

11 years ago

Blocks: MediaSegmentBase

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 16

•

11 years ago

Backed out for perma-orange on b2g emulator M10: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=cb894b5d342f http://hg.mozilla.org/integration/mozilla-inbound/rev/c0eaa844f2f0

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 17

•

11 years ago

Also https://tbpl.mozilla.org/php/getParsedLog.php?id=37161532&tree=Mozilla-Inbound may be new 13:16:47 INFO - [2358] ###!!! ASSERTION: Buffer underran: 'bufferEnd >= mCurrentTime', file /builds/slave/m-in-lx-d-00000000000000000000/build/content/media/MediaStreamGraph.cpp, line 473

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 18

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/965c62289427 https://hg.mozilla.org/mozilla-central/rev/b0ba65997aba

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 19

•

•

11 years ago

Attachment #8402226 - Flags: review?(paul) → review+

Karl Tomlinson (:karlt)

Comment 30

•

11 years ago

Comment on attachment 8402226 [details] [diff] [review] Reduce fake audio/video rates on b2g debug only to avoid overloading mochitest emulator VMs >+#if defined(MOZ_WIDGET_GONK) && defined(DEBUG) >+// B2G emulator debug is very, very slow and has problems dealing with realtime audio inputs >+ mTimer->InitWithCallback(this, (1000 / mOpts.mFPS)/10, nsITimer::TYPE_REPEATING_SLACK); >+#else > mTimer->InitWithCallback(this, 1000 / mOpts.mFPS, nsITimer::TYPE_REPEATING_SLACK); >+#endif I would have naively guessed that reducing the interval to 1/10th would mean timer is called 10x as often, *increasing* the video rate.

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 31

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/974c4db3003e with nit fix for karlt - thanks. The fact that the timer is SLACK makes this far less critical. https://hg.mozilla.org/integration/mozilla-inbound/rev/5349ecd9c313 https://hg.mozilla.org/integration/mozilla-inbound/rev/33072f5b4c66

Randell Jesup [:jesup] (needinfo me)

Reporter

Updated

•

11 years ago

No longer depends on: 992436

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 32

•

11 years ago

backed out: https://hg.mozilla.org/integration/mozilla-inbound/rev/423df46d8d57 https://hg.mozilla.org/integration/mozilla-inbound/rev/670cb6d1750a https://hg.mozilla.org/integration/mozilla-inbound/rev/0e094298eaef

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 33

•

11 years ago

Attached patch Disable problematic test for B2G (deleted) — Details — Splinter Review

This test a) has issues with missing preload/loops which makes it flaky, b) even with that fixed, on b2g emulator opt (both local and on tbpl) randomly fails with timeouts with this patch. Likely this is a timing issue simply provoked by the changes in this bug. See bug 994351

Randell Jesup [:jesup] (needinfo me)

Reporter

Updated

•

11 years ago

Attachment #8404286 - Flags: review?(drno)

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 34

•

11 years ago

Comment on attachment 8404286 [details] [diff] [review] Disable problematic test for B2G We'll add the annotation to bug 994351 as well

Nils Ohlmeier [:drno]

Comment 35

•

11 years ago

Comment on attachment 8404286 [details] [diff] [review] Disable problematic test for B2G Review of attachment 8404286 [details] [diff] [review]: ----------------------------------------------------------------- One nit. Otherwise looks good to me. ::: content/media/test/crashtests/crashtests.list @@ +65,5 @@ > load 952756.html > load 986901.html > load buffer-source-ended-1.html > load offline-buffer-source-ended-1.html > +skip-if(B2G) HTTP load media-element-source-seek-1.html Can we a comment referring to bug 994351 please? Otherwise looks good to me.

Attachment #8404286 - Flags: review?(drno) → review+

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 36

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/5c57a363fe25 https://hg.mozilla.org/integration/mozilla-inbound/rev/c5b652aacb37 https://hg.mozilla.org/integration/mozilla-inbound/rev/d8cc47051879

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 37

•

11 years ago

Forgot to re-push the "slow down inputs for B2G emulator debug" patch this time: https://hg.mozilla.org/integration/mozilla-inbound/rev/7ebc60820032

Ryan VanderMeulen [:RyanVM]

Comment 38

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/5c57a363fe25 https://hg.mozilla.org/mozilla-central/rev/c5b652aacb37 https://hg.mozilla.org/mozilla-central/rev/d8cc47051879 https://hg.mozilla.org/mozilla-central/rev/7ebc60820032

Status: REOPENED → RESOLVED

Closed: 11 years ago → 11 years ago

Resolution: --- → FIXED

Karl Tomlinson (:karlt)

Updated

•

11 years ago

No longer blocks: MediaSegmentBase

Depends on: MediaSegmentBase

Randell Jesup [:jesup] (needinfo me)

Reporter

Updated

•

11 years ago

Depends on: 996853

Karl Tomlinson (:karlt)

Updated

•

11 years ago

Depends on: 999267

:shell escalante

Updated

•

11 years ago

Whiteboard: [s=fx32]

Maire Reavy [:mreavy]

Comment 39

•

11 years ago