Closed
Bug 1172979
Opened 9 years ago
Closed 7 years ago
Optimize allocation and de-allocation patterns of StreamBuffers
Categories
(Core :: Audio/Video: MediaStreamGraph, defect, P2)
Core
Audio/Video: MediaStreamGraph
Tracking
()
RESOLVED
FIXED
mozilla57
Tracking | Status | |
---|---|---|
firefox57 | --- | fixed |
People
(Reporter: padenot, Assigned: karlt)
References
(Blocks 1 open bug)
Details
(Keywords: perf)
Attachments
(5 files)
ForgetUpTo is always high in the profiles, because it does a lot of de-allocation: it prunes the StreamBuffers storage.
Attached is a picture of a profile, dominated by free(). Maybe we could devise some sort of ring buffer mechanism for stream to reduce allocations ? We should not keep a lot of audio in the graph, these days, anyways, maximum 40ms per stream or something in line with the latency of the audio output (or a fixed amount if we're running an OfflineAudioContext, off course).
This picture is a profile of "Simple gain test without resampling", with the rendering length bumped to 1000 seconds so I have some time to profile.
Reporter | ||
Updated•9 years ago
|
Blocks: webaudioperf
Updated•9 years ago
|
Assignee: nobody → amarchesini
Updated•9 years ago
|
Rank: 10
Priority: -- → P1
Updated•9 years ago
|
Blocks: webaudioperf_parity
Updated•9 years ago
|
No longer blocks: webaudioperf
Assignee | ||
Comment 1•9 years ago
|
||
IIUC most AudioNodes shouldn't use StreamBuffers, but AudioDestinationNode does.
Assignee | ||
Comment 2•9 years ago
|
||
One way to get improvement here might be to keep the AudioChunks in the AudioSegment during ForgetUpTo() but null them in a way that doesn't release the memory for the channel pointer arrays. Subsequently appended AudioChunks will likely have the same channel count and so can reuse the channel pointer arrays.
Comment 3•9 years ago
|
||
Baku -- I know you're busy for the next 2 weeks. If you can get back to this bug then (around the end of Sept), that'd be great. I'd really like to land this in the Fx44 cycle. If landing this in Fx44 seems unlikely, then I'll look to find a new owner. Thanks!
Rank: 10 → 15
Updated•8 years ago
|
Blocks: webaudioperf
Updated•8 years ago
|
Status: NEW → RESOLVED
Closed: 8 years ago
Rank: 15 → 25
Priority: P1 → P2
Resolution: --- → INCOMPLETE
Assignee | ||
Comment 4•7 years ago
|
||
I'm guessing the free() in attachment 8617397 [details] is mostly destruction of the AudioChunk::nsTArray<const void*>, which has been inlined.
Now that we have the static analysis for bug 1159433, the compiler tells us what adjustments are necessary to use AutoTArray here.
Assignee: amarchesini → karlt
Status: RESOLVED → REOPENED
Component: Web Audio → Audio/Video: MediaStreamGraph
Resolution: INCOMPLETE → ---
Assignee | ||
Comment 5•7 years ago
|
||
Since changes for bug 1197028, most Web Audio nodes don't create and destroy AudioChunks frequently, but this does still happen when sending output to external streams.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 9•7 years ago
|
||
The win from fewer allocations doesn't come for free.
CPU usage seen with attachment 8897140 [details] decreased by only just over 1%.
Comparing perf top output for the graph thread without AutoTArray,
8.06% libxul.so [.] mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::AppendFrom
6.26% libxul.so [.] mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::AppendSliceInternal
6.02% libxul.so [.] mozilla::TrackUnionStream::ProcessInput
5.82% libpthread-2.23.so [.] __pthread_mutex_unlock_usercnt
5.02% libxul.so [.] mozilla::MediaStreamGraphImpl::ProcessChunkMetadataForInterval<mozilla::Au
4.96% libpthread-2.23.so [.] pthread_mutex_lock
3.42% libxul.so [.] nsTArray_Impl<mozilla::AudioChunk, nsTArrayInfallibleAllocator>::RemoveEle
3.35% libxul.so [.] mozilla::TrackUnionStream::CopyTrackData
3.28% libxul.so [.] mozilla::AudioBlock::AllocateChannels
3.08% firefox [.] je_free
3.04% [kernel] [k] clear_page_c_e
2.49% libxul.so [.] mozilla::AudioSegment::AppendAndConsumeChunk
2.46% [kernel] [k] page_fault
2.40% libxul.so [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Ens
1.94% libxul.so [.] mozilla::AudioNodeStream::ProcessInput
1.90% firefox [.] je_malloc
1.55% libxul.so [.] mozilla::AudioNodeStream::ObtainInputBlock
1.44% libxul.so [.] mozilla::MediaStreamGraphImpl::UpdateCurrentTimeForStreams
1.29% libxul.so [.] mozilla::StreamTracks::FindTrack
1.13% libxul.so [.] mozilla::AudioBlockCopyChannelWithScale
1.12% libxul.so [.] mozilla::MediaStreamGraphImpl::ProcessChunkMetadata
1.10% [kernel] [k] __rmqueue
1.08% libxul.so [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Shi
1.03% libc-2.23.so [.] __memset_avx2
0.98% libxul.so [.] nsTArray_Impl<mozilla::AudioChunk, nsTArrayInfallibleAllocator>::AppendEle
0.91% libxul.so [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Shr
and with AutoTAray,
7.97% libxul.so [.] mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::App(endSliceInternal)
7.77% libxul.so [.] mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::App(endFrom)
5.57% libxul.so [.] mozilla::MediaStreamGraphImpl::ProcessChunkMetadataForInterval<mozilla::Au
4.67% libxul.so [.] mozilla::TrackUnionStream::ProcessInput
4.25% libxul.so [.] mozilla::AudioSegment::AppendAndConsumeChunk
4.18% libpthread-2.23.so [.] __pthread_mutex_unlock_usercnt
3.77% libxul.so [.] mozilla::AudioBlock::AllocateChannels
3.46% libxul.so [.] mozilla::TrackUnionStream::CopyTrackData
3.33% libxul.so [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Ens
3.16% [kernel] [k] clear_page_c_e
2.46% [kernel] [k] page_fault
2.28% libpthread-2.23.so [.] pthread_mutex_lock
2.22% libxul.so [.] nsTArray_Impl<mozilla::AudioChunk, nsTArrayInfallibleAllocator>::RemoveEle
1.86% firefox [.] je_free
1.77% libxul.so [.] mozilla::AudioNodeStream::ProcessInput
1.66% libxul.so [.] mozilla::AudioNodeStream::ObtainInputBlock
1.55% libxul.so [.] nsTArray_CopyWithConstructors<mozilla::AudioChunk>::MoveNonOverlappingRegi
1.35% libxul.so [.] mozilla::MediaStreamGraphImpl::UpdateCurrentTimeForStreams
1.28% libxul.so [.] mozilla::AudioBlockCopyChannelWithScale
1.20% libxul.so [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Shi
1.11% libxul.so [.] nsTArray_Impl<mozilla::AudioChunk, nsTArrayInfallibleAllocator>::AppendEle
1.09% libxul.so [.] mozilla::StreamTracks::FindTrack
1.08% [kernel] [k] __rmqueue
1.03% firefox [.] je_malloc
0.96% libxul.so [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Swa
0.95% libc-2.23.so [.] __memset_avx2
time spent in malloc/free and mutex lock/unlock has definitely improved, as
has time in TrackUnionStream::ProcessInput().
However, time spent in mozilla::MediaSegmentBase<mozilla::AudioSegment,
mozilla::AudioChunk>::AppendSliceInternal() and
AudioSegment::AppendAndConsumeChunk() has increased.
This is in a fresh browser instance without anything else running.
I'm expecting the benefits of avoiding the less predictable behavior of
malloc/free and locking may be more significant when there are more
allocations involved.
Assignee | ||
Comment 10•7 years ago
|
||
Reporter | ||
Comment 11•7 years ago
|
||
mozreview-review |
Comment on attachment 8897161 [details]
bug 1172979 correct mBufferIsDownstreamRef documentation
https://reviewboard.mozilla.org/r/168432/#review174442
Attachment #8897161 -
Flags: review?(padenot) → review+
Reporter | ||
Comment 12•7 years ago
|
||
mozreview-review |
Comment on attachment 8897162 [details]
bug 1172979 permit retrieving const channel data from const AudioChunk
https://reviewboard.mozilla.org/r/168434/#review174444
Attachment #8897162 -
Flags: review?(padenot) → review+
Reporter | ||
Comment 13•7 years ago
|
||
mozreview-review |
Comment on attachment 8897163 [details]
bug 1172979 use AutoTArray for AudioChunk::mChannelData to reduce allocations
https://reviewboard.mozilla.org/r/168436/#review174450
Attachment #8897163 -
Flags: review?(padenot) → review+
Reporter | ||
Comment 14•7 years ago
|
||
Regardless of the CPU usage win (which is always welcome), reducing locking is really important here, thanks a lot.
Comment 15•7 years ago
|
||
Pushed by ktomlinson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/73ed1b91c4fe
correct mBufferIsDownstreamRef documentation r=padenot
https://hg.mozilla.org/integration/autoland/rev/367f82686e4d
permit retrieving const channel data from const AudioChunk r=padenot
https://hg.mozilla.org/integration/autoland/rev/6a8b37d47733
use AutoTArray for AudioChunk::mChannelData to reduce allocations r=padenot
Comment 16•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/73ed1b91c4fe
https://hg.mozilla.org/mozilla-central/rev/367f82686e4d
https://hg.mozilla.org/mozilla-central/rev/6a8b37d47733
Status: REOPENED → RESOLVED
Closed: 8 years ago → 7 years ago
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in
before you can comment on or make changes to this bug.
Description
•