Closed
Bug 1172997
Opened 9 years ago
Closed 8 years ago
Optimize ObtainInputBlock when many suspended nodes are connected to inputs
Categories
(Core :: Web Audio, defect, P1)
Core
Web Audio
Tracking
()
RESOLVED
FIXED
mozilla53
Tracking | Status | |
---|---|---|
firefox53 | --- | fixed |
People
(Reporter: padenot, Assigned: dminor)
References
(Blocks 2 open bugs)
Details
Attachments
(2 files, 2 obsolete files)
We spend way to much time preparing the input for nodes: ObtainInputBlock and AccumulateInputChunk. Maybe it's just the re-scaling (in which case we can just land the SSE optimizations in 877662 and 877662), but I think we do wasteful allocation and de-allocation.
Reporter | ||
Updated•9 years ago
|
Blocks: webaudioperf
Updated•9 years ago
|
Rank: 10
Priority: -- → P1
Updated•9 years ago
|
Blocks: webaudioperf_parity
Updated•9 years ago
|
No longer blocks: webaudioperf
Comment 1•9 years ago
|
||
Hi Karl, Can I ask you to take this bug? This is needed for us to achieve parity with our competitors, which we've promised ourselves we'd achieve by end of Q3 (roughly 2 months time). If you take this bug, will you be able to start work on it soon? Thanks.
Flags: needinfo?(karlt)
Comment 2•9 years ago
|
||
Paul, can you link to the testcase where you observed this problem, please?
I'd like to see whether bug 1189562 is involved, because these are the routines that I would expect to be busy due to that bug.
Flags: needinfo?(padenot)
Comment 3•9 years ago
|
||
(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #1)
> Hi Karl, Can I ask you to take this bug? This is needed for us to achieve
> parity with our competitors, which we've promised ourselves we'd achieve by
> end of Q3 (roughly 2 months time). If you take this bug, will you be able
> to start work on it soon? Thanks.
If the CPU usage is caused by bug 1189562, then I may be kind of already working on this.
If not, then my current priorities are bug 1189562 and bug 962719, which is causing the glitches from missed scheduling in bug 956574 et al. in a way that is much worse than our competitors. I may be close to a solution for bug 962719 but I still have plenty to do for bug 1189562.
I suspect those are the more important bugs to address first, but I can reconsider based on the nature of the testcase and profiles.
Flags: needinfo?(karlt)
Reporter | ||
Comment 4•9 years ago
|
||
I think you're right. I'll try to dig out some profiles, though.
Flags: needinfo?(padenot)
Updated•9 years ago
|
Assignee: nobody → karlt
Comment 6•9 years ago
|
||
The cpu sampling in https://bugzilla.mozilla.org/show_bug.cgi?id=1217625#c0
indicates that ObtainInputBlock() is now the routine most affected by large
numbers of Web Audio nodes waiting for garbage collection.
This is because the suspended nodes waiting to be collected are typically
still connected to downstream active nodes. When processing the downstream
nodes, looping over input nodes includes suspended nodes.
I think the best way to address this is to split up
ProcessedMediaStream:mInputs so that pointers to MediaInputPorts from
suspended streams are stored separately. ObtainInputBlock() then need only
loop over active input nodes.
I'm not seeing AccumulateInputChunk() in profiles, but I suspect that would be covered by bug 877662.
I think the repeated memory allocations have been addressed in bug 1196109 and bug 1197028.
Blocks: 1189562
Summary: Optimize ObtainInputBlock and callees → Optimize ObtainInputBlock when many suspended nodes are connected to inputs
Assignee | ||
Comment 7•8 years ago
|
||
Just to make sure I understand, what we would want here would be something like:
- MediaStream::Suspend notifies each of the MediaInputPort in its mConsumers that it has been suspended. - Each MediaInputPort would then notify it's ProcessedMediaStream that it should move the input to a mSuspendedInputs array
- mSuspendedInputs can be ignored by ObtainInputBlock
- Do the reverse if MediaStream::Resume() is called
Thanks!
Flags: needinfo?(karlt)
Comment 8•8 years ago
|
||
Yes, that's correct, thanks.
I haven't looked how many other uses of ProcessedMediaStream:mInputs exist, but perhaps some of them can also ignore mSuspendedInputs.
Flags: needinfo?(karlt)
Assignee | ||
Updated•8 years ago
|
Assignee: karlt → dminor
Status: NEW → ASSIGNED
Comment hidden (mozreview-request) |
Assignee | ||
Comment 10•8 years ago
|
||
Results of running https://github.com/padenot/webaudio-benchmark before and after the patch. It appears to make substantial improvements on some of the benchmarks.
Updated•8 years ago
|
Blocks: webaudioperf
Comment 11•8 years ago
|
||
(In reply to Dan Minor [:dminor] from comment #10)
> Results of running https://github.com/padenot/webaudio-benchmark before and
> after the patch. It appears to make substantial improvements on some of the
> benchmarks.
I don't know where improvements on "Simple gain" and "Simple mixing"
benchmarks would come from because there are no suspended audio node streams
in those benchmarks. A change in behavior may be a hint of a problem.
With this change, we're looking for an improvement in the long-time performance of
http://webaudiodemos.appspot.com/MIDIDrums/index.html for example.
Comment 12•8 years ago
|
||
mozreview-review |
Comment on attachment 8786865 [details]
Bug 1172997 - Optimize ObtainInputBlock when many suspended nodes are connected to inputs;
https://reviewboard.mozilla.org/r/75744/#review87958
> Optimize ObtainInputBlock when many suspended nodes are connected to inputs
Please describe how this patch does that in the first line of the commit message.
Something like "track suspended MediaStream inputs separately so that they can be skipped in input processing".
I think the specifics of ObtainInputBlock() are incidental, but that can be mentioned in a secondary comment if you like.
::: dom/media/MediaStreamGraph.h:991
(Diff revision 1)
> // Control API
> /**
> + * Notify the port that the MediaStream has been suspended.
> + */
> + void Suspended();
> +
> + /**
> + * Notify the port that the MediaStream has been resumed.
Please specify *source* MediaStream here.
The "Control API" is defined elsewhere in this header file and these methods are not part of that, and so please declare these with the graph thread methods.
::: dom/media/MediaStreamGraph.h:1235
(Diff revision 1)
> }
>
> protected:
> // This state is all accessed only on the media graph thread.
>
> // The list of all inputs that are currently enabled or waiting to be enabled.
Please add something like "and are not currently suspended" to this comment.
I don't know what is meant by "enabled" or "waiting" in the existing comment here. Given these are graph thread arrays, "waiting" is not waiting for the connection to be added to the graph. "enabled" is usually a track property rather than a stream property. The comment was added with the array in https://hg.mozilla.org/mozilla-central/rev/74e761adfc42f308406a51ce82c5fef4ccd2d046 but I don't see a clear reason for these words there. Perhaps it is related to blocking.
It's probably fine to remove "currently enabled or waiting to be enabled", if you like, because I don't think that is helpful.
::: dom/media/MediaStreamGraph.cpp:2171
(Diff revision 1)
> + for (uint32_t i = 0; i < mConsumers.Length(); ++i) {
> + mConsumers[i]->Suspended();
> + }
mConsumers is modified on the graph thread elsewhere, and so it is not safe to iterate through on the main thread here in MediaStream::Suspend().
This is not the right place for this anyway as inputs are not resumed until the suspend count reaches zero, and IncrementSuspendCount() is called on the graph thread from inactive AudioNodeStreams, bypassing Suspend().
::: dom/media/MediaStreamGraph.cpp:3855
(Diff revision 1)
> mInputs.AppendElement(aPort);
> GraphImpl()->SetStreamOrderDirty();
> }
>
> void
> +ProcessedMediaStream::InputSuspended(MediaInputPort* aPort)
Often enough people have called methods on the wrong threads, and so I think it is worth having AssertOnGraphThreadOrNotRunning() here and in InputResumed().
Attachment #8786865 -
Flags: review?(karlt) → review-
Assignee | ||
Comment 13•8 years ago
|
||
After making the suggested revisions I spent some time evaluating the effect of this patch on http://webaudiodemos.appspot.com/MIDIDrums/index.html by using `perf top` to watch the msg thread, hopefully in a way comparable to what was done in Bug 1217625.
Surprisingly, I saw no performance benefits with this patch applied. Either way, ObtainInputBlock hovered around 23% usage after being allowed to run for ten minutes or so prior to garbage collection running. I added some logging statements and verified that the suspended inputs were being properly segregated and not being iterated on in ObtainInputBlock.
I then tried just skipping over suspended inputs in ObtainInputBlock:
MediaStream* s = mInputs[i]->GetSource();
+ if (s->IsSuspended()) {
+ continue;
+ }
+
With this patch applied, the cpu utilization tops out around 16% prior to garbage collection running.
In all three cases, the second and third highest hitters are:
mozilla::MediaStreamGraphImpl::UpdateCurrentTimeForStreams
mozilla::MediaStream::GraphTimeToStreamTime
They run around 6 or 7 percent utilization each in the unmodified case and maybe 8 or 9 in the patched case. I'm not sure why it would be different, but I thought it was worth mentioning.
Comment hidden (mozreview-request) |
Assignee | ||
Updated•8 years ago
|
Attachment #8786865 -
Attachment is obsolete: true
Comment 15•8 years ago
|
||
(In reply to Dan Minor [:dminor] from comment #13)
> After making the suggested revisions I spent some time evaluating the effect
> of this patch on http://webaudiodemos.appspot.com/MIDIDrums/index.html by
> using `perf top` to watch the msg thread, hopefully in a way comparable to
> what was done in Bug 1217625.
>
> Surprisingly, I saw no performance benefits with this patch applied. Either
> way, ObtainInputBlock hovered around 23% usage after being allowed to run
> for ten minutes or so prior to garbage collection running. I added some
> logging statements and verified that the suspended inputs were being
> properly segregated and not being iterated on in ObtainInputBlock.
Can you post the patch with the suggested revisions please?
> I then tried just skipping over suspended inputs in ObtainInputBlock:
> MediaStream* s = mInputs[i]->GetSource();
> + if (s->IsSuspended()) {
> + continue;
> + }
> +
>
> With this patch applied, the cpu utilization tops out around 16% prior to
> garbage collection running.
It's hard to imagine how calling IsSuspended() on the suspended inputs is
faster than not looking at them at all.
When the results don't match what is expected from our understanding, then it
is worth finding out what is wrong in our understanding or in the results.
> In all three cases, the second and third highest hitters are:
> mozilla::MediaStreamGraphImpl::UpdateCurrentTimeForStreams
> mozilla::MediaStream::GraphTimeToStreamTime
>
> They run around 6 or 7 percent utilization each in the unmodified case and
> maybe 8 or 9 in the patched case. I'm not sure why it would be different,
> but I thought it was worth mentioning.
Useful to keep in mind what else is consuming CPU, thanks.
If the CPU usage in one function reduces, then the relative usage increases in
other functions.
Flags: needinfo?(dminor)
Assignee | ||
Comment 16•8 years ago
|
||
(In reply to Karl Tomlinson (:karlt) from comment #15)
> (In reply to Dan Minor [:dminor] from comment #13)
> > With this patch applied, the cpu utilization tops out around 16% prior to
> > garbage collection running.
>
> It's hard to imagine how calling IsSuspended() on the suspended inputs is
> faster than not looking at them at all.
>
My hand-wavy hypothesis is that by constantly adding and removing inputs from the arrays we're invalidating cache and so the memory reads are taking longer in this ObtainInputBlock, but it's definitely possible I made a mistake somewhere along the way. I'll dig up the patches and attach them later today.
Comment 17•8 years ago
|
||
(In reply to Dan Minor [:dminor] from comment #16)
> > It's hard to imagine how calling IsSuspended() on the suspended inputs is
> > faster than not looking at them at all.
> >
> My hand-wavy hypothesis is that by constantly adding and removing inputs
> from the arrays we're invalidating cache and so the memory reads are taking
> longer in this ObtainInputBlock,
I have seen cache influences on MSG performance due to needing to iterate over many stream, and it is plausible that manipulating the input arrays causes problems, but I would have guessed that writing to the array would make it available in the cache. Perhaps not though, if the cache is write-around somehow.
The array manipulation can also be optimized by moving only the last element, as order is not important. I would have guessed that would be necessary only if the array manipulation methods show up in the profile. It would however touch less memory.
Assignee | ||
Comment 18•8 years ago
|
||
Well, I made a mistake somewhere... despite having a bunch of inputs in mSuspendedInputs, there are still suspended inputs in mInputs. I'll dig in further.
Flags: needinfo?(dminor)
Assignee | ||
Comment 19•8 years ago
|
||
I failed to handle the case where a stream was already suspended when it was added as an input. As it turns out, that happens a lot in the drum demo. Once fixed, ObtainInputBlock drops down to ~0.25% cpu utilization.
Comment 20•8 years ago
|
||
mozreview-review |
Comment on attachment 8808330 [details]
Bug 1172997 - Optimize ObtainInputBlock by skipping suspended input nodes;
https://reviewboard.mozilla.org/r/91156/#review94852
Attachment #8808330 -
Flags: review?(karlt)
Comment 21•8 years ago
|
||
(In reply to Dan Minor [:dminor] from comment #19)
> a stream was already suspended when it was
> added as an input. As it turns out, that happens a lot in the drum demo.
> Once fixed, ObtainInputBlock drops down to ~0.25% cpu utilization.
Nice find. I guess that somewhat explains why there are so many nodes, if the demo is creating nodes that do nothing.
Comment hidden (mozreview-request) |
Assignee | ||
Updated•8 years ago
|
Attachment #8808330 -
Attachment is obsolete: true
Comment 23•8 years ago
|
||
mozreview-review |
Comment on attachment 8813194 [details]
Bug 1172997 - Track suspended MediaStream inputs separately to optimize input processing;
https://reviewboard.mozilla.org/r/94704/#review95134
This is all good except that I'm concerned about TrackUnionStream's use of
mInputs.
If you can be sure that TrackUnionStream has no suspended inputs, or at least
no suspended inputs when mInputs is used, then please assert that where it
uses mInputs.
I suspect that it may be possible for TrackUnionStream to have suspended
inputs, but I don't know about mAutoFinish. If there may be suspended inputs
with mAutoFinish, then TrackUnionStream needs to consider these for
|allFinished|.
I doubt a stream can finish while suspended, but it may finish before it is
suspended.
I don't know why |allFinished| is false when there are no inputs.
Perhaps that is detecting a situation where it hasn't started.
The assertion in TrackUnionStream::AddTrack() should consider mSuspendedInputs
also, I assume.
::: dom/media/MediaStreamGraph.cpp:3998
(Diff revision 1)
> +ProcessedMediaStream::InputSuspended(MediaInputPort* aPort)
> +{
> + GraphImpl()->AssertOnGraphThreadOrNotRunning();
> + mInputs.RemoveElement(aPort);
> + mSuspendedInputs.AppendElement(aPort);
> + GraphImpl()->SetStreamOrderDirty();
I had to think a bit about whether SetStreamOrderDirty() was necessary on
suspend and whether UpdateStreamOrder() need consider only active inputs in
finding cycles.
The behavior of an active DelayNode depends on whether it is in a cycle, and so cycle detection is required by the WebAudio spec if an active DelayNode may be involved. Any
active AudioNode in a cycle would usually cause downstream nodes, and so the
whole cycle of nodes, to become active. i.e. cycles are either entirely
active, or entirely inactive. This would mean that it is fine to consider
only active nodes when finding cycles. An exception is two distinct
AudioContexts connected via a DOM MediaStream, where one AudioContext is
explicitly suspended. In that situation the active nodes of the running
AudioContext don't make the nodes in the suspended context active. However, I
don't think we need to consider this a cycle when a link in the cycle would be
a suspended AudioContext. Having the definition of a cycle depend on whether
the AudioContext is suspended or not means that it is necessary to
SetStreamOrderDirty() when a suspended input may have been suspended by its AudioContext.
So I think what you have here is good.
Attachment #8813194 -
Flags: review?(karlt)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 25•8 years ago
|
||
Unless the performance implications are unacceptable, I think the safest thing is to roll up all of the inputs in TrackUnionStream so that the behaviour is unchanged.
Try run with the updates: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4642003acae40fb3424d3099d436c9b8b277c720
Comment 26•8 years ago
|
||
mozreview-review |
Comment on attachment 8813194 [details]
Bug 1172997 - Track suspended MediaStream inputs separately to optimize input processing;
https://reviewboard.mozilla.org/r/94704/#review96966
(In reply to Dan Minor [:dminor] from comment #25)
> Unless the performance implications are unacceptable, I think the safest
> thing is to roll up all of the inputs in TrackUnionStream so that the
> behaviour is unchanged.
Yes, I agree.
Attachment #8813194 -
Flags: review?(karlt) → review+
Comment 27•8 years ago
|
||
Pushed by dminor@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e341fa032c29
Track suspended MediaStream inputs separately to optimize input processing; r=karlt
Comment 28•8 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox53:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
You need to log in
before you can comment on or make changes to this bug.
Description
•