Closed
Bug 818822
Opened 12 years ago
Closed 11 years ago
Need to resample inputs to MediaStreamGraph
Categories
(Core :: WebRTC: Audio/Video, defect)
Tracking
()
RESOLVED
FIXED
mozilla31
People
(Reporter: jesup, Assigned: padenot)
References
Details
(Whiteboard: [s=fx32][p=13, 1.5:p1, ft:webrtc])
Attachments
(4 files, 2 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
padenot
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
drno
:
review+
|
Details | Diff | Splinter Review |
We need to resample some/most inputs to the MediaStreamGraph in order to avoid drift issues with audio being sampled on clocks other than the CPU clock (or sampled on an entirely different machine).
There have been considerable discussion (see below and bug 691234 for art of it), and roc, mreavy, derf and I talked extensively about it at IETF and other points.
roc: you may want to summarize your thoughts/plan here. My understanding was that we would try clocking off the playback driver, and resample inputs.
Handling mic inputs (which may have a different clock than the audio out!) is tricky, especially with echo cancellation, which really needs to see the actual samples that went out to the hardware.
I *believe* the right way to handle that if the mic and speakers don't share a clock is to resample the "far-end" sample (going to the speakers) to match the input clock rate, since that's effectively what the differing hardware sample rates are doing to you and them anyways.
From bug 691234 comment 41:
Randell Jesup [:jesup] 2012-06-21 11:03:38 PDT
Right - jitter buffers move a lot more than any reasonable hardware clock, so they need to do more than resample. The Time Scale Modification (TSM) stuff in NetEQ is a common and quite effective trick. (For anyone who didn't know.)
I certainly would be tempted to use the hardware capture clock for incoming mic data, and the playout clock (not guaranteed to the be same as the capture clock!) for data from the PeerConnection, and if the two are connected directly (audio.src = getUserMedia_stream) then force the resample; but I realize that's not how MediaStreams work.
An example for a videophone (DSP details omitted). Buffer nodes rebuffer into 10ms frames.
Recording taps, VAD, control, etc all are omitted. A lot of this is inside the webrtc code, but it helps to understand the complexity of stream management.
The wonder of Emacs artist-mode!
-------------------------------------------------------------------------
RTP
| |
+-------------+ | Send audio thread
| Jitter Buff | RTP reception thread |
| Insert | | RTP
+------+------+ | |
| | +-----------------+
-------------+------------------------------------+ | Speech Encode |
+------+------+ Receive audio | | |
| Jitter Buff | thread | +--------+--------+
| Dequeue | | +--------+--------+
+------+------+ | | Mixer +- Other
+------+------+ | +--------+--------+ sources
| Decode | | +-------+--------+
| | | | Buffer |
+------+------+ | +-------+--------+
+------+------+ | |
|Time Scale | | +------+----------+
|Modification | | | Conference |
+------+------+ | /--+Bridge (optional)|
+------+------+ +-------------+ /---+- +-------+---------+
| Buffer +-------+ audio +--- | |
| | | conf. bridge| | +------+-------+
+------+------+ +-------------+ | |AGC and output|
+------+------+ | | level adjust |
| Mixer +-- other sources | |& DTMF insert |
| | | +------+-------+
+------+------+ | |
+------+------+ | +------+-------+
|AGC, DC Filt | +---------------+| | Main |
| & volume +------------+ fsb queue ++-----+ AEC |
+------+------+ | || +-------+------+
+------+------+ +---------------+| +------+------+
|Playout audio| | |Capture audio|
| driver | | | driver |
+------+------+ | +------+------+
--------------+------------------------------------+-------------+---------
+------+------+ Linux Kernel |
| Buffers | +------+-------+
+-------------+ | Buffers |
| | +--------------+
+-------------+ | |
| | +--------------+
+----------+--+ | |
| +-------+------+
+--+-----------------------------------------------+-------+
| |
| Hardware Audio Codec |
| Note: Use HW Biquad filters to flatten response if avail |
+--+-----------------------------------------+-------------+
| |
..|. |
.. .. --+--
.. .. --+--
.. .. ( )
.......... -----
This let the output (mic -> RTP) thread run with microphone timestamps, and the input thread (after the TSM and associated buffer node) run with audio playout timestamps. RTP input to TSM runs with RTP source timestamps; the TSM takes the place of a resampler. The only complexity is at the cross input/output thread links, which would need resampling if the mic and playout clocks aren't synced (mine were), and mixing of other sources (ditto).
---
It's probably helpful to label your clock sources; merging two streams with the same clock source doesn't need a resample; merging them with different sources does. The complication here is that MediaStreams run on internal (CPU/NTP) clock time; the only sources that do so are probably pre-recorded sources.
So, for all sources that aren't internal (sourced at a video or audio decoder element), the inputs will need associated resamplers or timestamp transcoders. For audio, the sources will need to monitor input samples versus nominal frequency (at CPU/NTP clock rate), perhaps with a PLL, and use that to drive the settings for the resampler. For video, I would simply use the PLL to convert input timestamps to internal CPU timestamps. This might need some tweakery, as many RTP sinks assume video sources are at consistent cadences and timestamp deltas in order to provide smooth playback, generally some integer frame rate, or some integer number of frames each on a 30 (or 25!) fps boundary (i.e. timestamp deltas of 3000 at 90KHz timestamp rate.)
Note that many audio sources may require a resampler anyways to feed into the core, so this would just affect settings for it. The PLL aspect of it does mean it may take a short time to converge after starting.
We probably need this to be a core part of MediaStream data inputs, with an option to disable it if we know the source is synced to the CPU.
Sorry for the long message...
Comment 1•12 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #0)
> I *believe* the right way to handle that if the mic and speakers don't share
> a clock is to resample the "far-end" sample (going to the speakers) to match
> the input clock rate, since that's effectively what the differing hardware
> sample rates are doing to you and them anyways.
I think we should do things almost exactly the other way around (and I think I've said this before): the audio output should be the master clock, and we should resample at the microphones to compensate for drift. Part of the reason for this is that while we will basically always have a single audio output device, we won't always have exactly one input device. In fact we will often have zero. In a system with multiple microphones, none of which are in use, which clock do you sync to?
In any case, from a practical point of view, the Speex resampler was designed to handle exactly this case (small, continuously changing sample rate differences), and is already in the tree, so we should probably just use that. After bug 818327 it should even work.
Comment 2•12 years ago
|
||
I tend to agree with Tim that syncing with the playback is better than syncing with capture, for exactly the same reasons. I don't think it makes any difference to the AEC whether it's the capture or the playback that's resampled. The other possibility is to go with the "global clock" (RTC/NTP-based I assume) that's used for the RTP timestamps. The advantage of this is that you never have to deal with any clock drift and/or RTP timestamp adjustments. I think the RTC is also much more accurate than what a soudcard gives you. Of course, the downside is that it means twice the resampling.
Reporter | ||
Comment 3•12 years ago
|
||
My apologies - it was late when I entered that bug. My discussions with roc involved using the output clock as the master, not the input clock - I got it backwards when writing it down. And as mentioned, that would mean resampling the input, not the output.
I think locking to the playout clock also may help us control playout latency. Note that the output frequency may jump if the output destination changes.
Comment 4•12 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #3)
> latency. Note that the output frequency may jump if the output destination
> changes.
This is something Jean-Marc pointed out to me, as well, but I think it's something we'd want to know about (and since this can happen somewhere beyond our control, may be our only way of finding out).
Updated•12 years ago
|
Summary: Need to resample inputs to MediaStramGraph → Need to resample inputs to MediaStreamGraph
Comment 6•11 years ago
|
||
When there are two different output cards connected to the MSG, I wonder whether there is anything ensuring that they are in sync. If nothing guarantees this, then something needs to be done on every output except one, to avoid drift.
Reporter | ||
Comment 7•11 years ago
|
||
Correct: nothing guarantees two outputs are synced. In theory the "right" way to do it is to have the graph self-divide (graph nodes are driven by the output), and when there's a cross-connect, force a resample. Most graphs will end in one output, and not cross-connect. In some rare instances a source might get played on both outputs, in which case at the join point (the source) you'd need to resample.
It may be simpler to choose one output as "main", and resample anything sent to the other one to that (if both are in use). It gets trickier when there's a switch from two to one outputs active, or one to another, but probably not too problematic modulo that you might need to leave a small buffer in place after updating the resample rate which you wouldn't need if it had been originally run without a resampler. (ie. if A is direct (no resample), B is added (and resampled to match A), then A stops, B will still need a buffer even if the resampler is turned off (though maybe you can drift it slowly until the buffer underflows and you can remove the buffer...)
Reporter | ||
Updated•11 years ago
|
Reporter | ||
Comment 9•11 years ago
|
||
If we're going to provide a single mixed stream from the MediaStreamGraph back to the AEC (and to support driving the MSG from the outputs), we need to resample to 48000Hz at all inputs that aren't 48000Hz, even if they're don't need it for clock domain reasons (like gUM audio does).
gUM (or anything else with a different clockbase) will need to be resampled/buffered.
Assignee | ||
Comment 10•11 years ago
|
||
This resamples all SourceMediaStream to have a rate of IdealAudioRate(), so that
all the audio that flows through the graph has the same sample rate, so that we
can easily mix the tracks.
The resampling happens in `SourceMediaStream::AppendToTrack`.
Attachment #8389635 -
Flags: review?(roc)
Attachment #8389635 -
Flags: review?(roc) → review+
Assignee | ||
Comment 11•11 years ago
|
||
This is necessary for the rest to work.
Attachment #8397196 -
Flags: review?(rjesup)
Reporter | ||
Comment 12•11 years ago
|
||
Comment on attachment 8397196 [details] [diff] [review]
Update AudioConduit so it can work at 44.1kHz. r=
Review of attachment 8397196 [details] [diff] [review]:
-----------------------------------------------------------------
r+ but this change is also included in my patchset
Attachment #8397196 -
Flags: review?(rjesup) → review+
Assignee | ||
Comment 13•11 years ago
|
||
Rebased and ready to land.
Assignee | ||
Updated•11 years ago
|
Attachment #8389635 -
Attachment is obsolete: true
Assignee | ||
Comment 14•11 years ago
|
||
Rebased and ready to land.
Assignee | ||
Updated•11 years ago
|
Attachment #8397196 -
Attachment is obsolete: true
Reporter | ||
Comment 15•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/965c62289427
https://hg.mozilla.org/integration/mozilla-inbound/rev/b0ba65997aba
Target Milestone: --- → mozilla31
Reporter | ||
Updated•11 years ago
|
Blocks: MediaSegmentBase
Reporter | ||
Comment 16•11 years ago
|
||
Backed out for perma-orange on b2g emulator M10:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=cb894b5d342f
http://hg.mozilla.org/integration/mozilla-inbound/rev/c0eaa844f2f0
Reporter | ||
Comment 17•11 years ago
|
||
Also https://tbpl.mozilla.org/php/getParsedLog.php?id=37161532&tree=Mozilla-Inbound may be new
13:16:47 INFO - [2358] ###!!! ASSERTION: Buffer underran: 'bufferEnd >= mCurrentTime', file /builds/slave/m-in-lx-d-00000000000000000000/build/content/media/MediaStreamGraph.cpp, line 473
https://hg.mozilla.org/mozilla-central/rev/965c62289427
https://hg.mozilla.org/mozilla-central/rev/b0ba65997aba
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 19•11 years ago
|
||
Backed out of central:
https://hg.mozilla.org/mozilla-central/rev/ac6cbaa47f34
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 20•11 years ago
|
||
Got some logs via testing/marionette/client/marionette/emulator.py changes:
env = {'MOZ_CRASHREPORTER': '1',
'MOZ_CRASHREPORTER_NO_REPORT': '1',
- 'MOZ_CRASHREPORTER_SHUTDOWN': '1'}
+ 'MOZ_CRASHREPORTER_SHUTDOWN': '1',
+ 'NSPR_LOG_MODULES': 'mediastreamgraph:5,signaling:5,getusermedia:5'}
https://pastebin.mozilla.org/4755653
13:24:39 INFO - 138 INFO TEST-PASS | /tests/dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | iceConnectionState should not be undefined
13:24:39 INFO - 139 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | PeerConnectionWrapper (pcRemote): oniceconnectionstatechange fired, new state is: closed
13:24:39 INFO - 140 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | PeerConnectionWrapper (pcRemote): Closed connection.
13:24:39 INFO - 141 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | Test finished
13:24:39 INFO - 142 INFO TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | Test timed out.
Probably some type of shutdown/cleanup of the MSG or stream...
Reporter | ||
Comment 21•11 years ago
|
||
See bug 991866 for code needed to run full M10 mochitest-remote locally (and no, it didn't fail for me)
Reporter | ||
Comment 22•11 years ago
|
||
Reporter | ||
Comment 23•11 years ago
|
||
Hmmm, it didnt' seem to have any extra logging in the above. :-(
Reporter | ||
Comment 24•11 years ago
|
||
https://pastebin.mozilla.org/4756101 may also be relevant. The test destroyed the peerconnection at timestamp 489.xx - which may be near the mochitest timeout time
Comment 25•11 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #23)
> Hmmm, it didnt' seem to have any extra logging in the above. :-(
I wonder whether this might be to blame:
http://hg.mozilla.org/mozilla-central/annotate/6c924a018540/testing/mozbase/mozrunner/mozrunner/remote.py#l122
Reporter | ||
Comment 26•11 years ago
|
||
Reporter | ||
Updated•11 years ago
|
Attachment #8402226 -
Flags: review?(paul)
Reporter | ||
Comment 27•11 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=303b107d286f
The problem is the debug emulator running out of virtual cpu power to keep up with realtime inputs.
The patch reduces fake video and audio input rates to reduce the CPU load from the media streams.
Reporter | ||
Comment 28•11 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=ccf4b3d625d8
M10 is 11 for 11 green :-)
Once this is r+'d we can reland
Reporter | ||
Comment 29•11 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=bd2edc046e1f
(without the wakelock stuff) also green
Assignee | ||
Updated•11 years ago
|
Attachment #8402226 -
Flags: review?(paul) → review+
Comment 30•11 years ago
|
||
Comment on attachment 8402226 [details] [diff] [review]
Reduce fake audio/video rates on b2g debug only to avoid overloading mochitest emulator VMs
>+#if defined(MOZ_WIDGET_GONK) && defined(DEBUG)
>+// B2G emulator debug is very, very slow and has problems dealing with realtime audio inputs
>+ mTimer->InitWithCallback(this, (1000 / mOpts.mFPS)/10, nsITimer::TYPE_REPEATING_SLACK);
>+#else
> mTimer->InitWithCallback(this, 1000 / mOpts.mFPS, nsITimer::TYPE_REPEATING_SLACK);
>+#endif
I would have naively guessed that reducing the interval to 1/10th would mean timer is called 10x as often, *increasing* the video rate.
Reporter | ||
Comment 31•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/974c4db3003e
with nit fix for karlt - thanks. The fact that the timer is SLACK makes this far less critical.
https://hg.mozilla.org/integration/mozilla-inbound/rev/5349ecd9c313
https://hg.mozilla.org/integration/mozilla-inbound/rev/33072f5b4c66
Reporter | ||
Comment 32•11 years ago
|
||
Reporter | ||
Comment 33•11 years ago
|
||
This test a) has issues with missing preload/loops which makes it flaky, b) even with that fixed, on b2g emulator opt (both local and on tbpl) randomly fails with timeouts with this patch. Likely this is a timing issue simply provoked by the changes in this bug.
See bug 994351
Reporter | ||
Updated•11 years ago
|
Attachment #8404286 -
Flags: review?(drno)
Reporter | ||
Comment 34•11 years ago
|
||
Comment on attachment 8404286 [details] [diff] [review]
Disable problematic test for B2G
We'll add the annotation to bug 994351 as well
Comment 35•11 years ago
|
||
Comment on attachment 8404286 [details] [diff] [review]
Disable problematic test for B2G
Review of attachment 8404286 [details] [diff] [review]:
-----------------------------------------------------------------
One nit.
Otherwise looks good to me.
::: content/media/test/crashtests/crashtests.list
@@ +65,5 @@
> load 952756.html
> load 986901.html
> load buffer-source-ended-1.html
> load offline-buffer-source-ended-1.html
> +skip-if(B2G) HTTP load media-element-source-seek-1.html
Can we a comment referring to bug 994351 please?
Otherwise looks good to me.
Attachment #8404286 -
Flags: review?(drno) → review+
Reporter | ||
Comment 36•11 years ago
|
||
Reporter | ||
Comment 37•11 years ago
|
||
Forgot to re-push the "slow down inputs for B2G emulator debug" patch this time:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7ebc60820032
Comment 38•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/5c57a363fe25
https://hg.mozilla.org/mozilla-central/rev/c5b652aacb37
https://hg.mozilla.org/mozilla-central/rev/d8cc47051879
https://hg.mozilla.org/mozilla-central/rev/7ebc60820032
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
No longer blocks: MediaSegmentBase
Depends on: MediaSegmentBase
Updated•11 years ago
|
Whiteboard: [s=fx32]
Comment 39•11 years ago
|
||
Updating this to WebRTC since that's the work driving this.
Component: Video/Audio → WebRTC: Audio/Video
Whiteboard: [s=fx32] → [s=fx32][p=2, 1.5:p1, ft:webrtc]
Updated•11 years ago
|
Whiteboard: [s=fx32][p=2, 1.5:p1, ft:webrtc] → [s=fx32][p=13, 1.5:p1, ft:webrtc]
You need to log in
before you can comment on or make changes to this bug.
Description
•