Closed
Bug 877662
Opened 11 years ago
Closed 9 years ago
Optimize the AudioNodeEngine.cpp routines
Categories
(Core :: Web Audio, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla48
Tracking | Status | |
---|---|---|
firefox48 | --- | fixed |
People
(Reporter: ehsan.akhgari, Assigned: dminor)
References
(Blocks 1 open bug)
Details
(Keywords: perf, Whiteboard: [blocking-webaudio-])
Attachments
(9 files, 2 obsolete files)
(deleted),
text/x-csrc
|
Details | |
(deleted),
image/png
|
Details | |
(deleted),
patch
|
ehsan.akhgari
:
review+
roc
:
review-
|
Details | Diff | Splinter Review |
(deleted),
patch
|
ehsan.akhgari
:
review+
|
Details | Diff | Splinter Review |
(deleted),
text/x-review-board-request
|
padenot
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
padenot
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
padenot
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
padenot
:
review+
|
Details |
MozReview Request: Bug 877662 - Add an SSE2 implementation of AudioNodeEngine.cpp functions. r=ehsan
(deleted),
text/x-review-board-request
|
padenot
:
review+
|
Details |
We need SSE2 versions of those routines. Also it would be nice to ensure that AllocateAudioBlock always allocates 16-byte aligned buffers where we have SSE2 available.
Comment 1•11 years ago
|
||
Here is a reimplementation in sse2 of all the functions we have in AudioNodeEngine.cpp (but not WriteZeroesToAudioBlock, which is essentially a memset), in the form of a independent microbenchmark with various flavor of C++, hand-unrolled C++, SSE2 intrinsics and hand unrolled SSE2 intrinsics, with some basic tests of the output.
Comment 2•11 years ago
|
||
And some encouraging results.
Reporter | ||
Comment 3•11 years ago
|
||
Nice! BTW, I forgot to ask, do you also have NEON experience? We should definitely optimize these routines for mobile as well...
Comment 4•11 years ago
|
||
No, but this is something I've been considering for some time. Now is the time look into it, I think.
Reporter | ||
Comment 5•11 years ago
|
||
Mass moving Web Audio bugs to the Web Audio component. Filter on duckityduck.
Component: Video/Audio → Web Audio
Comment 6•11 years ago
|
||
Attachment #758671 -
Flags: review?(ehsan)
Comment 7•11 years ago
|
||
derf, if you have a minute, could you have a look at the functions in
AudioNodeEngineSSE.cpp?
Attachment #758675 -
Flags: review?(tterribe)
Attachment #758675 -
Flags: review?(ehsan)
Comment 8•11 years ago
|
||
I'm also planning on adding some form of compiled test for those functions.
Reporter | ||
Comment 9•11 years ago
|
||
Comment on attachment 758671 [details] [diff] [review]
Align AudioBlock on 16 bytes when SSE2 is available. r=
Review of attachment 758671 [details] [diff] [review]:
-----------------------------------------------------------------
::: content/media/AudioNodeEngine.cpp
@@ +14,2 @@
> nsRefPtr<SharedBuffer> buffer =
> SharedBuffer::Create(WEBAUDIO_BLOCK_SIZE*aChannelCount*sizeof(float));
Please MOZ_ASSERT the alignment here.
::: content/media/SharedBuffer.h
@@ +28,5 @@
> * Typically you would allocate one of these, fill it in, and then treat it as
> * immutable while it's shared.
> * This only guarantees 4-byte alignment of the data. For alignment we
> * simply assume that the refcount is at least 4-byte aligned and its size
> * is divisible by 4.
Please update this comment.
Attachment #758671 -
Flags: review?(ehsan) → review+
Reporter | ||
Comment 10•11 years ago
|
||
Comment on attachment 758675 [details] [diff] [review]
Add an SSE2 implementation of AudioNodeEngine.cpp functions. r=
Review of attachment 758675 [details] [diff] [review]:
-----------------------------------------------------------------
Ted, can you please do the honors for the build system stuff here?
Attachment #758675 -
Flags: review?(ted)
Reporter | ||
Comment 11•11 years ago
|
||
Comment on attachment 758675 [details] [diff] [review]
Add an SSE2 implementation of AudioNodeEngine.cpp functions. r=
Review of attachment 758675 [details] [diff] [review]:
-----------------------------------------------------------------
::: content/media/AudioNodeEngineSSE2.cpp
@@ +38,5 @@
> +
> + vscaled0 = _mm_mul_ps(vin0, vgain);
> + vscaled1 = _mm_mul_ps(vin1, vgain);
> + vscaled2 = _mm_mul_ps(vin2, vgain);
> + vscaled3 = _mm_mul_ps(vin3, vgain);
It might make sense to have a separate path for the aScale==1.0 case, but it's fine if you want to do that in a follow-up.
::: content/media/AudioNodeStream.cpp
@@ +331,5 @@
> AllocateAudioBlock(outputChannelCount, &aTmpChunk);
> float silenceChannel[WEBAUDIO_BLOCK_SIZE] = {0.f};
> // The static storage here should be 1KB, so it's fine
> + nsAutoTArray<float, GUESS_AUDIO_CHANNELS*WEBAUDIO_BLOCK_SIZE + 15> downmixBuffer;
> + float* alignedDownmixBuffer = ALIGN16(downmixBuffer.Elements());
Please declare this variable below, as this value here will be invalid if we end up doing a dynamic allocation. Also, we don't need this value here anyway.
@@ +358,2 @@
> for (uint32_t j = 0; j < outputChannelCount; ++j) {
> + outputChannels[j] = &alignedDownmixBuffer[j * WEBAUDIO_BLOCK_SIZE];
So here we're assuming that WEBAUDIO_BLOCK_SIZE is a multiple of 16, which is fine, but please static assert that here to be explicit about it.
Attachment #758675 -
Flags: review?(ehsan) → review+
Comment 12•11 years ago
|
||
Comment on attachment 758675 [details] [diff] [review]
Add an SSE2 implementation of AudioNodeEngine.cpp functions. r=
Review of attachment 758675 [details] [diff] [review]:
-----------------------------------------------------------------
r=me with nits fixed.
::: content/media/AudioNodeEngineSSE2.cpp
@@ +8,5 @@
> +
> +#ifdef DEBUG
> + #define ASSERT_ALIGNED(ptr) \
> + MOZ_ASSERT((((uintptr_t)ptr + 15) & ~0x0F) == (uintptr_t)ptr, \
> + #ptr " has to be aligned on 16 bytes.");
"16-byte aligned"
@@ +54,5 @@
> + _mm_store_ps(&aOutput[i], vout0);
> + _mm_store_ps(&aOutput[i + 4], vout1);
> + _mm_store_ps(&aOutput[i + 8], vout2);
> + _mm_store_ps(&aOutput[i + 12], vout3);
> + }
Did you look at the compiler's output on x86-32? I worry that it will spill lots with this unrolled 4 times (since it can't prove the loads don't alias with the stores, it needs at least 9 registers, plus one because register allocation is NP-hard).
@@ +183,5 @@
> + _mm_store_ps(in, vout0);
> + _mm_store_ps(in+4, vout1);
> + _mm_store_ps(in+8, vout2);
> + _mm_store_ps(in+12, vout3);
> + in += 16;
It's generally better for the compiler if you do array indexing (like your other functions) than if you do pointer arithmetic.
::: content/media/AudioNodeStream.cpp
@@ +10,5 @@
> #include "ThreeDPoint.h"
>
> using namespace mozilla::dom;
>
> +#define ALIGN16(ptr) (float*)(((uintptr_t)ptr + 15) & ~0x0F);
Parentheses around ptr, as well as the whole macro body, and remove the trailing ;
@@ +330,5 @@
>
> AllocateAudioBlock(outputChannelCount, &aTmpChunk);
> float silenceChannel[WEBAUDIO_BLOCK_SIZE] = {0.f};
> // The static storage here should be 1KB, so it's fine
> + nsAutoTArray<float, GUESS_AUDIO_CHANNELS*WEBAUDIO_BLOCK_SIZE + 15> downmixBuffer;
You're allocating 15 extra floats when I think you want 15 extra bytes.
@@ +352,5 @@
> } else if (channels.Length() > outputChannelCount) {
> if (mChannelInterpretation == ChannelInterpretation::Speakers) {
> nsAutoTArray<float*,GUESS_AUDIO_CHANNELS> outputChannels;
> outputChannels.SetLength(outputChannelCount);
> + downmixBuffer.SetLength(outputChannelCount * WEBAUDIO_BLOCK_SIZE + 15);
See above.
::: content/media/Makefile.in
@@ +26,5 @@
> CFLAGS += $(GSTREAMER_CFLAGS)
> CXXFLAGS += $(GSTREAMER_CFLAGS)
> +
> +ifneq (,$(INTEL_ARCHITECTURE))
> +# gcc requires -msse2 on AudioNodeEngine.cpp since it uses SSE intrinsics.
The comment says -msse2 but the flag you're actually adding is -msse. Since you're using XMM registers, you want the former.
Also, shouldn't this be on AudioNodeEngineSSE2.$(OBJ_SUFFIX) instead?
@@ +28,5 @@
> +
> +ifneq (,$(INTEL_ARCHITECTURE))
> +# gcc requires -msse2 on AudioNodeEngine.cpp since it uses SSE intrinsics.
> +ifdef GNU_CC
> +AudioNodeEngine.$(OBJ_SUFFIX): CXXFLAGS+=-msse
The comment says -msse2 but the flag you're actually adding is -msse. Since you're using XMM registers, you want the former.
@@ +32,5 @@
> +AudioNodeEngine.$(OBJ_SUFFIX): CXXFLAGS+=-msse
> +endif
> +
> +ifdef SOLARIS_SUNPRO_CXX
> +AudioNodeEngine.$(OBJ_SUFFIX): CXXFLAGS+=-xarch=sse -xO4
Again, sse vs. sse2.
::: content/media/moz.build
@@ +43,5 @@
> MODULE = 'content'
>
> +# Are we targeting x86 or x64? If so, build SSE2 files.
> +if CONFIG['INTEL_ARCHITECTURE']:
> + DEFINES += ['-DUSE_SSE']
Unless you're planning to use gfx2d's Factory::HasSSE2() for some bizarre reason, I think it makes more sense to use xpcom's SSE.h and MOZILLA_MAY_SUPPORT_SSE2/MOZILLA_PRESUME_SSE2 (which you don't have to define yourself here).
Comment 13•11 years ago
|
||
Comment on attachment 758675 [details] [diff] [review]
Add an SSE2 implementation of AudioNodeEngine.cpp functions. r=
Forgot to set the flag, too.
Attachment #758675 -
Flags: review?(tterribe) → review+
Comment 14•11 years ago
|
||
Comment on attachment 758675 [details] [diff] [review]
Add an SSE2 implementation of AudioNodeEngine.cpp functions. r=
Review of attachment 758675 [details] [diff] [review]:
-----------------------------------------------------------------
Makefile/moz.build bits look fine modulo derf's concerns.
Attachment #758675 -
Flags: review?(ted) → review+
Comment 15•11 years ago
|
||
The previous patch did not include the SSE2 / scalar dispatch code, for some
reason. Here it is, in AudioNodeEngine.cpp. I basically followed the
recommendations at the top of SSE.h.
The unity gain fastpath is handled outside of SSE2/scalar specific code.
Attachment #759249 -
Flags: review?(ehsan)
Updated•11 years ago
|
Attachment #758675 -
Attachment is obsolete: true
Comment 16•11 years ago
|
||
(In reply to Timothy B. Terriberry (:derf) from comment #12)
> @@ +54,5 @@
> > + _mm_store_ps(&aOutput[i], vout0);
> > + _mm_store_ps(&aOutput[i + 4], vout1);
> > + _mm_store_ps(&aOutput[i + 8], vout2);
> > + _mm_store_ps(&aOutput[i + 12], vout3);
> > + }
>
> Did you look at the compiler's output on x86-32? I worry that it will spill
> lots with this unrolled 4 times (since it can't prove the loads don't alias
> with the stores, it needs at least 9 registers, plus one because register
> allocation is NP-hard).
http://www.pastebin.mozilla.org/2492550, so it looks like it's OK.
Reporter | ||
Updated•11 years ago
|
Attachment #759249 -
Flags: review?(ehsan) → review+
Comment 17•11 years ago
|
||
This makes sure that we get aligned pointers even when we could borrow data from
the AudioBufferSourceNode buffer.
Attachment #760660 -
Flags: review?(ehsan)
Updated•11 years ago
|
Attachment #759249 -
Attachment is obsolete: true
Reporter | ||
Comment 18•11 years ago
|
||
Comment on attachment 760660 [details] [diff] [review]
Add an SSE2 implementation of AudioNodeEngine.cpp functions. r=
Review of attachment 760660 [details] [diff] [review]:
-----------------------------------------------------------------
Nice catch!
Note that I added a bunch of functions to AudioNodeEngine.cpp in bug 815643, sorry if it bitrots this patch. I filed bug 881587 to optimize those.
Attachment #760660 -
Flags: review?(ehsan) → review+
Comment on attachment 758671 [details] [diff] [review]
Align AudioBlock on 16 bytes when SSE2 is available. r=
Review of attachment 758671 [details] [diff] [review]:
-----------------------------------------------------------------
::: content/media/SharedBuffer.h
@@ +35,5 @@
> public:
> + void* Data() {
> + if (mozilla::supports_sse()) {
> + // return a pointer aligned on 16 bytes.
> + void * storage_start = (uint8_t*)(this + 1);
I think this should be unconditional. It's simpler that way, and there's pretty much no benefit to being less-aligned on non-SSE platforms.
Attachment #758671 -
Flags: review-
Blocks: 882171
Reporter | ||
Comment 20•11 years ago
|
||
Paul, can you please address comment 19 and land these patches?
Flags: needinfo?(paul)
Comment 21•11 years ago
|
||
Some code has landed that allocate new buffers (like in the convolver node), so I need to make all that aligned, and make sure it works. But yes, I'll get to it next week.
Flags: needinfo?(paul)
Reporter | ||
Updated•11 years ago
|
Whiteboard: [blocking-webaudio-]
Updated•11 years ago
|
Whiteboard: [blocking-webaudio-] → [blocking-webaudio-][webaudio-perf]
Updated•11 years ago
|
Keywords: perf
Whiteboard: [blocking-webaudio-][webaudio-perf] → [blocking-webaudio-]
Comment 22•11 years ago
|
||
Making a P1 since early results showed it would help with perf. If it's not going to have a big enough impact, we can drop it to P2.
Priority: -- → P1
Comment 23•11 years ago
|
||
Karl -- Paul is clearing his plate to focus on bug 848954. Can you pick this up? Also, see Comment 22. If you think this should be lowered to a P2, I'm fine with that. Thanks.
Assignee: paul → karlt
Comment 25•11 years ago
|
||
If we want to refocus on this (and its sister bug), we really should spend some time profiling. Last time I checked, it was not clear all the processing functions were the bottlenecks, and the code has moved a lot since then.
Comment 26•11 years ago
|
||
Yes, the only method I've seen over 1% of a busy profile was BufferComplexMultiply()
Blocks: 881587
Updated•9 years ago
|
Blocks: webaudioperf
Updated•9 years ago
|
Priority: P2 → P1
Updated•9 years ago
|
Rank: 15
Updated•9 years ago
|
Blocks: webaudioperf_parity
Updated•9 years ago
|
No longer blocks: webaudioperf
Comment 27•9 years ago
|
||
I talked with Karl today, and he and I agree that it'd be more efficient to find another person to work on this one. Same with Bug 881587.
Assignee: karlt → nobody
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → dminor
Assignee | ||
Comment 28•9 years ago
|
||
To be able to use SSE2 routines, we need to audio buffers to be allocated
on 16 byte boundaries.
Review commit: https://reviewboard.mozilla.org/r/44347/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/44347/
Attachment #8738465 -
Flags: review?(padenot)
Attachment #8738466 -
Flags: review?(padenot)
Attachment #8738467 -
Flags: review?(padenot)
Assignee | ||
Comment 29•9 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/44349/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/44349/
Assignee | ||
Comment 30•9 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/44351/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/44351/
Assignee | ||
Comment 31•9 years ago
|
||
Recent try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=685abc5801f3edfe63b9b0e85ffa203e1fab62d5&selectedJob=19059201
Status: NEW → ASSIGNED
Comment 32•9 years ago
|
||
Comment on attachment 8738467 [details]
MozReview Request: Bug 877662 - Use SSE2 versions of AudioNodeEngine functions r?padenot
https://reviewboard.mozilla.org/r/44351/#review41969
::: dom/media/webaudio/blink/ReverbAccumulationBuffer.cpp:100
(Diff revision 1)
> MOZ_ASSERT(isSafe);
> if (!isSafe)
> return 0;
>
> - AudioBufferAddWithScale(source, 1.0f, destination + writeIndex, numberOfFrames1);
> +#ifdef USE_SSE2
> + // It is unlikely either the source is aligned or the number of values
This is not ideal, we should either:
- fall back to scalar when we have a number of items that is not a multiple of 16
- fall back to scalar if the input aligned, or process the first few elements in scalar and switch to vector and back for the last few items.
Can you file a followup for this?
Attachment #8738467 -
Flags: review?(padenot) → review+
Comment 33•9 years ago
|
||
Comment on attachment 8738465 [details]
MozReview Request: Bug 877662 - Align audio buffer allocations to 16 byte boundaries r?padenot
https://reviewboard.mozilla.org/r/44347/#review41965
::: dom/media/webaudio/AudioBlock.cpp
(Diff revision 1)
> * buffer can reuse and modify its contents next iteration if other references
> * are all downstream temporary references held by AudioBlock.
> - *
> - * This only guarantees 4-byte alignment of the data. For alignment we simply
> - * assume that the memory from malloc is at least 4-byte aligned and that
> - * AudioBlockBuffer's size is divisible by 4.
Can you leave a note about alignment here ?
::: dom/media/webaudio/AudioNodeExternalInputStream.cpp:91
(Diff revision 1)
> {
> NS_ASSERTION(aSegment->GetDuration() == WEBAUDIO_BLOCK_SIZE, "Bad segment duration");
>
> +// To be safe, let's copy data if we're using SSE2 to ensure everything is
> +// aligned properly. Simply checking for channel alignment does not seem
> +// to be sufficient here.
Do we know why ?
::: dom/media/webaudio/BiquadFilterNode.cpp:141
(Diff revision 1)
> const AudioBlock& aInput,
> AudioBlock* aOutput,
> bool* aFinished) override
> {
> - float inputBuffer[WEBAUDIO_BLOCK_SIZE];
> + float inputBuffer[WEBAUDIO_BLOCK_SIZE + 4];
> + float* alignedInputBuffer = (float* )(((uintptr_t)inputBuffer + 15) & ~0x0F);
Maybe assert the alignment here ? Or maybe I'm just paranoid, do we have the guarantee that the stack is aligned in all the platforms we support ?
Attachment #8738465 -
Flags: review?(padenot) → review+
Comment 34•9 years ago
|
||
Comment on attachment 8738466 [details]
MozReview Request: Bug 877662 - Update SSE2 versions of AudioNodeEngine functions r?padenot
https://reviewboard.mozilla.org/r/44349/#review41967
::: dom/media/webaudio/AudioNodeEngineSSE2.cpp:31
(Diff revision 1)
> +void
> +AudioBufferAddWithScale_SSE(const float* aInput,
> + float aScale,
> + float* aOutput,
> + uint32_t aSize)
> +{
How are all those different from my initial patch ? It's a bit hard to review, can you make this sit on top of the old patches?
Attachment #8738466 -
Flags: review?(padenot)
Assignee | ||
Comment 35•9 years ago
|
||
https://reviewboard.mozilla.org/r/44347/#review41965
> Do we know why ?
I think I was accidentally only checking the alignment of the first channel. Works fine if I iterate over each channel.
Assignee | ||
Comment 36•9 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/46183/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/46183/
Attachment #8738466 -
Attachment description: MozReview Request: Bug 877662 - Add SSE2 versions of AudioNodeEngine functions r?padenot → MozReview Request: Bug 877662 - Update SSE2 versions of AudioNodeEngine functions r?padenot
Attachment #8741091 -
Flags: review?(padenot)
Attachment #8738466 -
Flags: review?(padenot)
Assignee | ||
Comment 37•9 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/46185/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/46185/
Assignee | ||
Comment 38•9 years ago
|
||
Comment on attachment 8738465 [details]
MozReview Request: Bug 877662 - Align audio buffer allocations to 16 byte boundaries r?padenot
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44347/diff/1-2/
Assignee | ||
Comment 39•9 years ago
|
||
Comment on attachment 8738466 [details]
MozReview Request: Bug 877662 - Update SSE2 versions of AudioNodeEngine functions r?padenot
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44349/diff/1-2/
Assignee | ||
Comment 40•9 years ago
|
||
Comment on attachment 8738467 [details]
MozReview Request: Bug 877662 - Use SSE2 versions of AudioNodeEngine functions r?padenot
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44351/diff/1-2/
Updated•9 years ago
|
Attachment #8741091 -
Flags: review?(padenot) → review+
Comment 41•9 years ago
|
||
Comment on attachment 8741091 [details]
MozReview Request: Bug 877662 - Add AlignmentUtils.h r?padenot
https://reviewboard.mozilla.org/r/46183/#review42969
Comment 42•9 years ago
|
||
Comment on attachment 8741092 [details]
MozReview Request: Bug 877662 - Add an SSE2 implementation of AudioNodeEngine.cpp functions. r=ehsan
https://reviewboard.mozilla.org/r/46185/#review42971
I've written this, but I remember it being reviewed by someone, so I'm setting it as r+.
Attachment #8741092 -
Flags: review+
Comment 43•9 years ago
|
||
Comment on attachment 8738466 [details]
MozReview Request: Bug 877662 - Update SSE2 versions of AudioNodeEngine functions r?padenot
https://reviewboard.mozilla.org/r/44349/#review42973
Attachment #8738466 -
Flags: review?(padenot) → review+
Assignee | ||
Comment 44•9 years ago
|
||
https://reviewboard.mozilla.org/r/46185/#review42971
ehsan did the review of your original patch.
Comment 45•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/826d16396107
https://hg.mozilla.org/integration/mozilla-inbound/rev/5ae01cbc5549
https://hg.mozilla.org/integration/mozilla-inbound/rev/6e96c35c78bd
https://hg.mozilla.org/integration/mozilla-inbound/rev/5f1d898a440e
https://hg.mozilla.org/integration/mozilla-inbound/rev/58f6d3815cac
Comment 46•9 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/826d16396107
https://hg.mozilla.org/mozilla-central/rev/5ae01cbc5549
https://hg.mozilla.org/mozilla-central/rev/6e96c35c78bd
https://hg.mozilla.org/mozilla-central/rev/5f1d898a440e
https://hg.mozilla.org/mozilla-central/rev/58f6d3815cac
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox48:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
You need to log in
before you can comment on or make changes to this bug.
Description
•