Closed Bug 1172997 Opened 9 years ago Closed 8 years ago

Optimize ObtainInputBlock when many suspended nodes are connected to inputs

Categories

(Core :: Web Audio, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox53 --- fixed

People

(Reporter: padenot, Assigned: dminor)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files, 2 obsolete files)

We spend way to much time preparing the input for nodes: ObtainInputBlock and AccumulateInputChunk. Maybe it's just the re-scaling (in which case we can just land the SSE optimizations in 877662 and 877662), but I think we do wasteful allocation and de-allocation.
Blocks: webaudioperf
Rank: 10
Priority: -- → P1
No longer blocks: webaudioperf
Hi Karl, Can I ask you to take this bug? This is needed for us to achieve parity with our competitors, which we've promised ourselves we'd achieve by end of Q3 (roughly 2 months time). If you take this bug, will you be able to start work on it soon? Thanks.
Flags: needinfo?(karlt)
Paul, can you link to the testcase where you observed this problem, please? I'd like to see whether bug 1189562 is involved, because these are the routines that I would expect to be busy due to that bug.
Flags: needinfo?(padenot)
(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #1) > Hi Karl, Can I ask you to take this bug? This is needed for us to achieve > parity with our competitors, which we've promised ourselves we'd achieve by > end of Q3 (roughly 2 months time). If you take this bug, will you be able > to start work on it soon? Thanks. If the CPU usage is caused by bug 1189562, then I may be kind of already working on this. If not, then my current priorities are bug 1189562 and bug 962719, which is causing the glitches from missed scheduling in bug 956574 et al. in a way that is much worse than our competitors. I may be close to a solution for bug 962719 but I still have plenty to do for bug 1189562. I suspect those are the more important bugs to address first, but I can reconsider based on the nature of the testcase and profiles.
Flags: needinfo?(karlt)
I think you're right. I'll try to dig out some profiles, though.
Flags: needinfo?(padenot)
Depends on: 1196109
Bug 1196109 is likely a good part of possible gains here.
Depends on: 877662
Depends on: 1197028
Assignee: nobody → karlt
Depends on: 1205558
No longer depends on: 1205558
Depends on: 1210280
The cpu sampling in https://bugzilla.mozilla.org/show_bug.cgi?id=1217625#c0 indicates that ObtainInputBlock() is now the routine most affected by large numbers of Web Audio nodes waiting for garbage collection. This is because the suspended nodes waiting to be collected are typically still connected to downstream active nodes. When processing the downstream nodes, looping over input nodes includes suspended nodes. I think the best way to address this is to split up ProcessedMediaStream:mInputs so that pointers to MediaInputPorts from suspended streams are stored separately. ObtainInputBlock() then need only loop over active input nodes. I'm not seeing AccumulateInputChunk() in profiles, but I suspect that would be covered by bug 877662. I think the repeated memory allocations have been addressed in bug 1196109 and bug 1197028.
Blocks: 1189562
Summary: Optimize ObtainInputBlock and callees → Optimize ObtainInputBlock when many suspended nodes are connected to inputs
Just to make sure I understand, what we would want here would be something like: - MediaStream::Suspend notifies each of the MediaInputPort in its mConsumers that it has been suspended. - Each MediaInputPort would then notify it's ProcessedMediaStream that it should move the input to a mSuspendedInputs array - mSuspendedInputs can be ignored by ObtainInputBlock - Do the reverse if MediaStream::Resume() is called Thanks!
Flags: needinfo?(karlt)
Yes, that's correct, thanks. I haven't looked how many other uses of ProcessedMediaStream:mInputs exist, but perhaps some of them can also ignore mSuspendedInputs.
Flags: needinfo?(karlt)
Assignee: karlt → dminor
Status: NEW → ASSIGNED
Attached file bug-1172997-benchmarks.csv (deleted) —
Results of running https://github.com/padenot/webaudio-benchmark before and after the patch. It appears to make substantial improvements on some of the benchmarks.
(In reply to Dan Minor [:dminor] from comment #10) > Results of running https://github.com/padenot/webaudio-benchmark before and > after the patch. It appears to make substantial improvements on some of the > benchmarks. I don't know where improvements on "Simple gain" and "Simple mixing" benchmarks would come from because there are no suspended audio node streams in those benchmarks. A change in behavior may be a hint of a problem. With this change, we're looking for an improvement in the long-time performance of http://webaudiodemos.appspot.com/MIDIDrums/index.html for example.
Comment on attachment 8786865 [details] Bug 1172997 - Optimize ObtainInputBlock when many suspended nodes are connected to inputs; https://reviewboard.mozilla.org/r/75744/#review87958 > Optimize ObtainInputBlock when many suspended nodes are connected to inputs Please describe how this patch does that in the first line of the commit message. Something like "track suspended MediaStream inputs separately so that they can be skipped in input processing". I think the specifics of ObtainInputBlock() are incidental, but that can be mentioned in a secondary comment if you like. ::: dom/media/MediaStreamGraph.h:991 (Diff revision 1) > // Control API > /** > + * Notify the port that the MediaStream has been suspended. > + */ > + void Suspended(); > + > + /** > + * Notify the port that the MediaStream has been resumed. Please specify *source* MediaStream here. The "Control API" is defined elsewhere in this header file and these methods are not part of that, and so please declare these with the graph thread methods. ::: dom/media/MediaStreamGraph.h:1235 (Diff revision 1) > } > > protected: > // This state is all accessed only on the media graph thread. > > // The list of all inputs that are currently enabled or waiting to be enabled. Please add something like "and are not currently suspended" to this comment. I don't know what is meant by "enabled" or "waiting" in the existing comment here. Given these are graph thread arrays, "waiting" is not waiting for the connection to be added to the graph. "enabled" is usually a track property rather than a stream property. The comment was added with the array in https://hg.mozilla.org/mozilla-central/rev/74e761adfc42f308406a51ce82c5fef4ccd2d046 but I don't see a clear reason for these words there. Perhaps it is related to blocking. It's probably fine to remove "currently enabled or waiting to be enabled", if you like, because I don't think that is helpful. ::: dom/media/MediaStreamGraph.cpp:2171 (Diff revision 1) > + for (uint32_t i = 0; i < mConsumers.Length(); ++i) { > + mConsumers[i]->Suspended(); > + } mConsumers is modified on the graph thread elsewhere, and so it is not safe to iterate through on the main thread here in MediaStream::Suspend(). This is not the right place for this anyway as inputs are not resumed until the suspend count reaches zero, and IncrementSuspendCount() is called on the graph thread from inactive AudioNodeStreams, bypassing Suspend(). ::: dom/media/MediaStreamGraph.cpp:3855 (Diff revision 1) > mInputs.AppendElement(aPort); > GraphImpl()->SetStreamOrderDirty(); > } > > void > +ProcessedMediaStream::InputSuspended(MediaInputPort* aPort) Often enough people have called methods on the wrong threads, and so I think it is worth having AssertOnGraphThreadOrNotRunning() here and in InputResumed().
Attachment #8786865 - Flags: review?(karlt) → review-
After making the suggested revisions I spent some time evaluating the effect of this patch on http://webaudiodemos.appspot.com/MIDIDrums/index.html by using `perf top` to watch the msg thread, hopefully in a way comparable to what was done in Bug 1217625. Surprisingly, I saw no performance benefits with this patch applied. Either way, ObtainInputBlock hovered around 23% usage after being allowed to run for ten minutes or so prior to garbage collection running. I added some logging statements and verified that the suspended inputs were being properly segregated and not being iterated on in ObtainInputBlock. I then tried just skipping over suspended inputs in ObtainInputBlock: MediaStream* s = mInputs[i]->GetSource(); + if (s->IsSuspended()) { + continue; + } + With this patch applied, the cpu utilization tops out around 16% prior to garbage collection running. In all three cases, the second and third highest hitters are: mozilla::MediaStreamGraphImpl::UpdateCurrentTimeForStreams mozilla::MediaStream::GraphTimeToStreamTime They run around 6 or 7 percent utilization each in the unmodified case and maybe 8 or 9 in the patched case. I'm not sure why it would be different, but I thought it was worth mentioning.
Attachment #8786865 - Attachment is obsolete: true
(In reply to Dan Minor [:dminor] from comment #13) > After making the suggested revisions I spent some time evaluating the effect > of this patch on http://webaudiodemos.appspot.com/MIDIDrums/index.html by > using `perf top` to watch the msg thread, hopefully in a way comparable to > what was done in Bug 1217625. > > Surprisingly, I saw no performance benefits with this patch applied. Either > way, ObtainInputBlock hovered around 23% usage after being allowed to run > for ten minutes or so prior to garbage collection running. I added some > logging statements and verified that the suspended inputs were being > properly segregated and not being iterated on in ObtainInputBlock. Can you post the patch with the suggested revisions please? > I then tried just skipping over suspended inputs in ObtainInputBlock: > MediaStream* s = mInputs[i]->GetSource(); > + if (s->IsSuspended()) { > + continue; > + } > + > > With this patch applied, the cpu utilization tops out around 16% prior to > garbage collection running. It's hard to imagine how calling IsSuspended() on the suspended inputs is faster than not looking at them at all. When the results don't match what is expected from our understanding, then it is worth finding out what is wrong in our understanding or in the results. > In all three cases, the second and third highest hitters are: > mozilla::MediaStreamGraphImpl::UpdateCurrentTimeForStreams > mozilla::MediaStream::GraphTimeToStreamTime > > They run around 6 or 7 percent utilization each in the unmodified case and > maybe 8 or 9 in the patched case. I'm not sure why it would be different, > but I thought it was worth mentioning. Useful to keep in mind what else is consuming CPU, thanks. If the CPU usage in one function reduces, then the relative usage increases in other functions.
Flags: needinfo?(dminor)
(In reply to Karl Tomlinson (:karlt) from comment #15) > (In reply to Dan Minor [:dminor] from comment #13) > > With this patch applied, the cpu utilization tops out around 16% prior to > > garbage collection running. > > It's hard to imagine how calling IsSuspended() on the suspended inputs is > faster than not looking at them at all. > My hand-wavy hypothesis is that by constantly adding and removing inputs from the arrays we're invalidating cache and so the memory reads are taking longer in this ObtainInputBlock, but it's definitely possible I made a mistake somewhere along the way. I'll dig up the patches and attach them later today.
(In reply to Dan Minor [:dminor] from comment #16) > > It's hard to imagine how calling IsSuspended() on the suspended inputs is > > faster than not looking at them at all. > > > My hand-wavy hypothesis is that by constantly adding and removing inputs > from the arrays we're invalidating cache and so the memory reads are taking > longer in this ObtainInputBlock, I have seen cache influences on MSG performance due to needing to iterate over many stream, and it is plausible that manipulating the input arrays causes problems, but I would have guessed that writing to the array would make it available in the cache. Perhaps not though, if the cache is write-around somehow. The array manipulation can also be optimized by moving only the last element, as order is not important. I would have guessed that would be necessary only if the array manipulation methods show up in the profile. It would however touch less memory.
Well, I made a mistake somewhere... despite having a bunch of inputs in mSuspendedInputs, there are still suspended inputs in mInputs. I'll dig in further.
Flags: needinfo?(dminor)
I failed to handle the case where a stream was already suspended when it was added as an input. As it turns out, that happens a lot in the drum demo. Once fixed, ObtainInputBlock drops down to ~0.25% cpu utilization.
Comment on attachment 8808330 [details] Bug 1172997 - Optimize ObtainInputBlock by skipping suspended input nodes; https://reviewboard.mozilla.org/r/91156/#review94852
Attachment #8808330 - Flags: review?(karlt)
(In reply to Dan Minor [:dminor] from comment #19) > a stream was already suspended when it was > added as an input. As it turns out, that happens a lot in the drum demo. > Once fixed, ObtainInputBlock drops down to ~0.25% cpu utilization. Nice find. I guess that somewhat explains why there are so many nodes, if the demo is creating nodes that do nothing.
Attachment #8808330 - Attachment is obsolete: true
Comment on attachment 8813194 [details] Bug 1172997 - Track suspended MediaStream inputs separately to optimize input processing; https://reviewboard.mozilla.org/r/94704/#review95134 This is all good except that I'm concerned about TrackUnionStream's use of mInputs. If you can be sure that TrackUnionStream has no suspended inputs, or at least no suspended inputs when mInputs is used, then please assert that where it uses mInputs. I suspect that it may be possible for TrackUnionStream to have suspended inputs, but I don't know about mAutoFinish. If there may be suspended inputs with mAutoFinish, then TrackUnionStream needs to consider these for |allFinished|. I doubt a stream can finish while suspended, but it may finish before it is suspended. I don't know why |allFinished| is false when there are no inputs. Perhaps that is detecting a situation where it hasn't started. The assertion in TrackUnionStream::AddTrack() should consider mSuspendedInputs also, I assume. ::: dom/media/MediaStreamGraph.cpp:3998 (Diff revision 1) > +ProcessedMediaStream::InputSuspended(MediaInputPort* aPort) > +{ > + GraphImpl()->AssertOnGraphThreadOrNotRunning(); > + mInputs.RemoveElement(aPort); > + mSuspendedInputs.AppendElement(aPort); > + GraphImpl()->SetStreamOrderDirty(); I had to think a bit about whether SetStreamOrderDirty() was necessary on suspend and whether UpdateStreamOrder() need consider only active inputs in finding cycles. The behavior of an active DelayNode depends on whether it is in a cycle, and so cycle detection is required by the WebAudio spec if an active DelayNode may be involved. Any active AudioNode in a cycle would usually cause downstream nodes, and so the whole cycle of nodes, to become active. i.e. cycles are either entirely active, or entirely inactive. This would mean that it is fine to consider only active nodes when finding cycles. An exception is two distinct AudioContexts connected via a DOM MediaStream, where one AudioContext is explicitly suspended. In that situation the active nodes of the running AudioContext don't make the nodes in the suspended context active. However, I don't think we need to consider this a cycle when a link in the cycle would be a suspended AudioContext. Having the definition of a cycle depend on whether the AudioContext is suspended or not means that it is necessary to SetStreamOrderDirty() when a suspended input may have been suspended by its AudioContext. So I think what you have here is good.
Attachment #8813194 - Flags: review?(karlt)
Unless the performance implications are unacceptable, I think the safest thing is to roll up all of the inputs in TrackUnionStream so that the behaviour is unchanged. Try run with the updates: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4642003acae40fb3424d3099d436c9b8b277c720
Comment on attachment 8813194 [details] Bug 1172997 - Track suspended MediaStream inputs separately to optimize input processing; https://reviewboard.mozilla.org/r/94704/#review96966 (In reply to Dan Minor [:dminor] from comment #25) > Unless the performance implications are unacceptable, I think the safest > thing is to roll up all of the inputs in TrackUnionStream so that the > behaviour is unchanged. Yes, I agree.
Attachment #8813194 - Flags: review?(karlt) → review+
Pushed by dminor@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e341fa032c29 Track suspended MediaStream inputs separately to optimize input processing; r=karlt
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: