Closed
Bug 988950
Opened 11 years ago
Closed 11 years ago
Improve performance of generational barriers
Categories
(Core :: JavaScript: GC, enhancement)
Core
JavaScript: GC
Tracking
()
RESOLVED
FIXED
mozilla31
People
(Reporter: jonco, Assigned: jonco)
References
(Blocks 1 open bug)
Details
Attachments
(9 files, 3 obsolete files)
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
terrence
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
mjrosenb
:
review+
|
Details | Diff | Splinter Review |
The generational post-barriers need to work out whether a pointer points from the tenured heap into the nursery. Currently this is done by range testing on both pointers.
Now that we have a ChunkTrailer that is common between heap and nursery chunks we can put a flag there to indicate whether a chunk is part of the nursery. This would make the check much simpler.
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → jcoppeard
Assignee | ||
Comment 1•11 years ago
|
||
Here's a first cut of this.
I benchmarked on x64 only which showed a promising 2.8% improvement on Richards over 20 runs.
Now looking into testing on ARM.
Comment 2•11 years ago
|
||
Comment on attachment 8400049 [details] [diff] [review]
bug988950-barriers
Review of attachment 8400049 [details] [diff] [review]:
-----------------------------------------------------------------
Looks great! Pre-emptive r=me. Please flag Jan for a second review of the jit bits.
::: js/public/HeapAPI.h
@@ +25,5 @@
> CurrentThreadCanAccessZone(JS::Zone *zone);
>
> namespace gc {
>
> +struct Cell;
\o/
We'll want to use this to kill void* use in the browser, but that's a different patch.
@@ +174,5 @@
> #endif
> }
>
> +MOZ_ALWAYS_INLINE bool
> +IsInsideNursery(const JS::shadow::Runtime *runtime, const js::gc::Cell *cell)
Please remove the runtime parameter of this instance to make it harder to accidentally get the wrong variant: calling this on a malloc pointer would be disastrous and the browser is uncomfortably eager to cast gc things to void*.
::: js/src/jit/CodeGenerator.cpp
@@ +1791,5 @@
> + * To get the address of the location field in the chunk trailer, we OR the
> + * object address with the chunk mask to give us the address of the last
> + * byte in the chunk, and then subtract off an offset.
> + */
> + const int addressOffset = 1 + gc::ChunkLocationOffset - gc::ChunkSize;
Nice!
@@ +1805,5 @@
> Register objreg = ToRegister(lir->object());
> + masm.movePtr(objreg, temp);
> + masm.orPtr(Imm32(gc::ChunkMask), temp);
> + masm.branch32(Assembler::NotEqual, Address(temp, addressOffset),
> + Imm32(gc::ChunkLocationTenuredHeap), ool->rejoin());
It would be interesting to see if we useRegisterAtStart and clobber instead of taking a temp would be faster and where. Probably not worth fussing over until we break down and make this all platform specific.
Attachment #8400049 -
Flags: review+
Assignee | ||
Comment 4•11 years ago
|
||
Here's another approach based on the patch in bug 922354 (I made some changes to get this to compile for ARM).
Assignee | ||
Comment 5•11 years ago
|
||
I did some performance testing of this, and it seems that both patches improve performance of Richards by about 5% on X64. However the first patch is slightly slower on ARM whereas the second is slightly faster. Overall score improvement seems to be masked by the usual variations in splay result.
I think on balance we should probably go with the second patch as it's simpler too.
It should be possible to improve this too by not making it use a temporary register where possible.
Assignee | ||
Updated•11 years ago
|
Attachment #8400049 -
Attachment is obsolete: true
Assignee | ||
Comment 6•11 years ago
|
||
Comment on attachment 8401823 [details] [diff] [review]
bug988950-approach2
Requesting review for this as it is, we can take the improvement now and refine further at our leisure.
Attachment #8401823 -
Flags: review?(terrence)
Assignee | ||
Comment 7•11 years ago
|
||
Comment on attachment 8401823 [details] [diff] [review]
bug988950-approach2
Requesting review for JIT changes.
Attachment #8401823 -
Flags: review?(jdemooij)
Comment 8•11 years ago
|
||
Comment on attachment 8401823 [details] [diff] [review]
bug988950-approach2
Review of attachment 8401823 [details] [diff] [review]:
-----------------------------------------------------------------
r=me
Attachment #8401823 -
Flags: review?(terrence) → review+
Comment 9•11 years ago
|
||
Comment on attachment 8401823 [details] [diff] [review]
bug988950-approach2
Review of attachment 8401823 [details] [diff] [review]:
-----------------------------------------------------------------
Nice.
::: js/src/jit/CodeGenerator.cpp
@@ +1793,4 @@
> Register objreg = ToRegister(lir->object());
> + masm.movePtr(ImmWord(-ptrdiff_t(nursery.start())), temp);
> + masm.addPtr(objreg, temp);
> + masm.branchPtr(Assembler::Below, temp, ImmWord(nursery.heapEnd() - nursery.start()),
nursery.heapEnd() - nursery.start() should always fit in 32 bits right? In that case you can use Imm32 instead of ImmWord (branchPtr instead of branch32 will ensure we still compare the full 64-bit value).
This avoids an extra mov instruction on x64, because cmpq can't have a 64-bit immediate. In some cases we optimize this in the macro-assembler, but not this one, looks like.
Same for the 4 cases below.
Attachment #8401823 -
Flags: review?(jdemooij) → review+
Comment 10•11 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #9)
> Same for the 4 cases below.
3 cases..
Comment 11•11 years ago
|
||
Just realized you may have to add cmpPtr(Register, Imm32) to the macro assemblers to get this to compile. It should be very similar to cmpPtr(Operand, Imm32) (and on 32-bit platforms exactly the same as cmp32).
Assignee | ||
Comment 12•11 years ago
|
||
Thanks for the comments! I'm asking for review again to check I got the macro assembler changes right, and as I also fixed a problem in the second half CodeGenerator::visitPostWriteBarrierV() when valuereg ended up being the same register as the temp.
Attachment #8401823 -
Attachment is obsolete: true
Attachment #8405436 -
Flags: review?(jdemooij)
Comment 13•11 years ago
|
||
Comment on attachment 8405436 [details] [diff] [review]
bug988950-approach2 v2
Review of attachment 8405436 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/CodeGenerator.cpp
@@ +1874,5 @@
> + * This is a little complicated because sometimes the value object address
> + * ends up in a temp register, and sometimes it's available in a register
> + * already. We can trash the temp, but not the original register. And if
> + * it's in the temp, we mustn't trash this before we use its value!
> + */
Nit: //-style-comments in jit/*
@@ +1878,5 @@
> + */
> + Register temp = ToRegister(lir->temp());
> + Register valuereg = masm.extractObject(value, temp);
> + if (temp != valuereg)
> + masm.movePtr(valuereg, temp);
You can use unboxObject so that you don't need valuereg and the if:
Register temp = ToRegister(lir->temp());
masm.unboxObject(value, temp);
@@ +1880,5 @@
> + Register valuereg = masm.extractObject(value, temp);
> + if (temp != valuereg)
> + masm.movePtr(valuereg, temp);
> + masm.rshiftPtr(Imm32(Nursery::ChunkShift), temp);
> + masm.subPtr(Imm32(nursery.start() >> Nursery::ChunkShift), temp);
Ah, that's clever! Would be good to add a short comment explaining why Imm32 is fine on x64 (nursery.start() always fits in 47 bits for Value format) and the following assert:
MOZ_ASSERT((nursery.start() >> Nursery::ChunkShift) < INT32_MAX);
::: js/src/jit/arm/MacroAssembler-arm.h
@@ +989,5 @@
> movePtr(imm, ScratchRegister);
> branchPtr(cond, lhs, ScratchRegister, label);
> }
> + void branchPtr(Condition cond, Register lhs, Imm32 imm, Label *label) {
> + branchPtr(cond, lhs, imm, label);
branch32
Attachment #8405436 -
Flags: review?(jdemooij) → review+
Comment 14•11 years ago
|
||
Thinking about this more, I don't think you need the separate rshiftPtr + subPtr methods if you do:
masm.addPtr(ImmWord(-ptrdiff_t(nursery.start())), temp);
On x64 we have a dedicated scratch register, and addPtr will use it internally if nursery.start() doesn't fit in 32-bits.
Assignee | ||
Comment 15•11 years ago
|
||
Whiteboard: [leave open]
Comment 16•11 years ago
|
||
Assignee | ||
Comment 17•11 years ago
|
||
To guide further optimisation, I gathered some stats for postbarrier instructions for the v8 benchmarks. o is the object whose property is being written and v is the value being written to it:
instrs vObj oCnst oNrs vNurs taken
Richards: 1623943 48.8 0.0 100.0 100.0 0.0
DeltaBlue: 1048647 98.4 0.0 99.2 75.8 0.5
Crypto: 20346 100.0 0.0 99.9 100.0 0.1
RayTrace: 1585805 100.0 0.0 100.0 73.1 0.0
EarleyBoyer: 14206107 47.0 8.5 91.5 90.5 8.5
RegExp: 0
Splay: 2035582 87.4 0.0 35.1 34.5 8.2
NavierStokes: 0
Total: 20520430 57.9 5.9 87.6 83.6 6.7
instrs: postbarrier instructions executed
vObj: LPostWriteBarrierO instruction rather than LPostWriteBarrierV
oCnst: o is known to be a constant
oNurs: o was in the nursery
vNurs: v was in the nursery
taken: the postbarrier was called
The last five columns are percentages.
Comment 18•11 years ago
|
||
Great work! That is beautify data. It may be time for us to start thinking about what sort of static or dynamic analyses could let us elide barriers in the common cases. Maybe we could look at the top couple of instances in E-B to see what the JS looks like?
Assignee | ||
Comment 19•11 years ago
|
||
Here's the code we currently generate for these barriers on different architectures.
Assignee | ||
Comment 20•11 years ago
|
||
In the value postbarrier, we currently check the whether the value is an object before we check whether the object whose property is being set is tenured.
For the v8 benchmark, the first check results in us skipping the barrier 43% of the time, whereas the second check lets us skip the barrier 87% of the time.
This suggests we should do these checks the other way around, to allow us to skip the barrier check earlier more frequently.
Attachment #8410374 -
Flags: review?(terrence)
Assignee | ||
Comment 21•11 years ago
|
||
Factor out checks nursery checks into two new methods in the macro assembler.
Attachment #8410376 -
Flags: review?(jdemooij)
Assignee | ||
Comment 22•11 years ago
|
||
Move these nursery checks to architecture specific classes to allow optimization per-architecture (no code changes).
Attachment #8410378 -
Flags: review?(jdemooij)
Assignee | ||
Comment 23•11 years ago
|
||
Use the scratch register rather than requiring a temp on x64 and ARM.
Attachment #8410380 -
Flags: review?(jdemooij)
Assignee | ||
Comment 24•11 years ago
|
||
Improve x86 value test by combining tag and payload comparisons.
Attachment #8410386 -
Flags: review?(jdemooij)
Comment 25•11 years ago
|
||
Comment on attachment 8410376 [details] [diff] [review]
1-add-nursery-check-instruction
Review of attachment 8410376 [details] [diff] [review]:
-----------------------------------------------------------------
Nice.
::: js/src/jit/IonMacroAssembler.cpp
@@ +1790,5 @@
> +#ifdef JSGC_GENERATIONAL
> +
> +void
> +MacroAssembler::branchPtrInNurseryRange(Register ptr, Register temp, Label *label)
> +{
You can also call this method in ICStubCompiler::emitPostWriteBarrierSlot in BaselineIC.cpp. The value check there branches if *not* in the nursery; that one doesn't need to be changed in this patch.
(Also, I just noticed that the barriers in BaselineCompiler.cpp check that the object is tenured and one of them checks that the value is an object, but we don't ensure it's a nursery object, maybe worth fixing..)
Attachment #8410376 -
Flags: review?(jdemooij) → review+
Comment 26•11 years ago
|
||
Comment on attachment 8410374 [details] [diff] [review]
0-check-value-type-after-object
Review of attachment 8410374 [details] [diff] [review]:
-----------------------------------------------------------------
A zero line patch that provides a provable performance boost? Brilliant! r=me
Attachment #8410374 -
Flags: review?(terrence) → review+
Comment 27•11 years ago
|
||
Comment on attachment 8410378 [details] [diff] [review]
2-split-macro-assembler-methods
Review of attachment 8410378 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +4342,5 @@
> + JS_ASSERT(ptr != temp);
> + JS_ASSERT(temp != InvalidReg);
> +
> + const Nursery &nursery = GetIonContext()->runtime->gcNursery();
> + movePtr(ImmWord(-ptrdiff_t(nursery.start())), temp);
Pre-existing, but will this do the right thing if nursery.start() has the upper bit set?
Attachment #8410378 -
Flags: review?(jdemooij) → review+
Comment 28•11 years ago
|
||
Comment on attachment 8410380 [details] [diff] [review]
3-use-scratch-reg-where-possible
Review of attachment 8410380 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/x86/MacroAssembler-x86.cpp
@@ +433,5 @@
> void
> MacroAssemblerX86::branchPtrInNurseryRange(Register ptr, Register temp, Label *label)
> {
> JS_ASSERT(ptr != temp);
> + JS_ASSERT(temp != InvalidReg); // A temp register is requied for x86.
Nit: required
Attachment #8410380 -
Flags: review?(jdemooij) → review+
Comment 29•11 years ago
|
||
Comment on attachment 8410380 [details] [diff] [review]
3-use-scratch-reg-where-possible
Review of attachment 8410380 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/x86/MacroAssembler-x86.cpp
@@ +449,5 @@
>
> branchTestObject(Assembler::NotEqual, value, &done);
>
> Register obj = extractObject(value, temp);
> + branchPtrInNurseryRange(obj, temp, label);
Sorry I didn't notice this until I got to the last patch, but instead of extractObject you can pass value.payloadReg() to branchPtrInNurseryRange. Same for ARM/MIPS.
Comment 30•11 years ago
|
||
Comment on attachment 8410386 [details] [diff] [review]
5-improve-x64
Review of attachment 8410386 [details] [diff] [review]:
-----------------------------------------------------------------
This is cool!
::: js/src/jit/x64/Lowering-x64.h
@@ +35,5 @@
>
> LDefinition tempToUnbox();
>
> bool needTempForObjectInNurseryRange() { return false; }
> + bool needTempForValueIsNurseryObject() { return false; }
With this change, these two return the same value on x86/x64/ARM. Maybe we can have a single needTempForPostBarrier instead?
Attachment #8410386 -
Flags: review?(jdemooij) → review+
Assignee | ||
Comment 31•11 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #25)
Thanks for all the reviews!
I filed bug 1000100 for the baseline improvements.
>> + movePtr(ImmWord(-ptrdiff_t(nursery.start())), temp);
> Pre-existing, but will this do the right thing if nursery.start() has the upper bit set?
On ARM the result of -ptrdiff_t(nursery.start()) ends up for example as 0x4a500000, so I think this works.
Other comments addressed.
Assignee | ||
Comment 32•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/116fabfb50b5
https://hg.mozilla.org/integration/mozilla-inbound/rev/b72e6cc23574
https://hg.mozilla.org/integration/mozilla-inbound/rev/13d60930dab2
https://hg.mozilla.org/integration/mozilla-inbound/rev/174895f05c9f
https://hg.mozilla.org/integration/mozilla-inbound/rev/691d410b7f59
Comment 33•11 years ago
|
||
Comment 34•11 years ago
|
||
This added two build warnings, as Nursery is now unused in BaselineCompiler.cpp:
In file included from /home/ben/code/moz/builds/d64/js/src/Unified_cpp_js_src2.cpp:67:0:
/home/ben/code/moz/inbound/js/src/jit/BaselineCompiler.cpp: In member function ‘bool
js::jit::BaselineCompiler::emit_JSOP_SETALIASEDVAR()’:
/home/ben/code/moz/inbound/js/src/jit/BaselineCompiler.cpp:2105:14: warning: unused variable ‘nursery’
[-Wunused-variable]
Nursery &nursery = cx->runtime()->gcNursery;
^
In file included from /home/ben/code/moz/builds/d64/js/src/Unified_cpp_js_src2.cpp:67:0:
/home/ben/code/moz/inbound/js/src/jit/BaselineCompiler.cpp: In member function ‘bool
js::jit::BaselineCompiler::emitFormalArgAccess(uint32_t, bool)’:
/home/ben/code/moz/inbound/js/src/jit/BaselineCompiler.cpp:2425:18: warning: unused variable ‘nursery’
[-Wunused-variable]
Nursery &nursery = cx->runtime()->gcNursery;
^
Attachment #8411612 -
Flags: review?(jdemooij)
Updated•11 years ago
|
Assignee: jcoppeard → benj
Status: NEW → ASSIGNED
Updated•11 years ago
|
Assignee: benj → jcoppeard
Comment 35•11 years ago
|
||
oops, sorry didn't mean to assign the bug to myself, bzexport did that in my behalf :/
Updated•11 years ago
|
Attachment #8411612 -
Flags: review?(jdemooij) → review+
Comment 36•11 years ago
|
||
Comment 37•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Attachment #8405436 -
Attachment is obsolete: true
Assignee | ||
Comment 38•11 years ago
|
||
This is only a small win, but by doing the comparison in terms of chunks rather than raw addresses, we can load the nursery start in one MOV rather than two, saving in total two instructions for each barrier.
Attachment #8412535 -
Flags: review?(mrosenberg)
Comment 39•11 years ago
|
||
Comment on attachment 8412535 [details] [diff] [review]
4-improve-arm
Review of attachment 8412535 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +4361,5 @@
> +
> + ma_mov(Imm32(startChunk), ScratchRegister);
> + as_rsb(ScratchRegister, ScratchRegister, lsr(ptr, Nursery::ChunkShift));
> + branch32(cond == Assembler::Equal ? Assembler::Below : Assembler::AboveOrEqual,
> + ScratchRegister, Imm32(Nursery::NumNurseryChunks), label);
There should be an assert that Nursery::NumNurseryChunks < 256, because things will go very wrong (ScratchRegister will get re-used to load the constant)
Alternately, you can use SecondScratchRegister, and not worry about it.
Attachment #8412535 -
Flags: review?(mrosenberg) → review+
Assignee | ||
Comment 40•11 years ago
|
||
Comment 41•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [leave open]
Updated•11 years ago
|
Target Milestone: --- → mozilla31
You need to log in
before you can comment on or make changes to this bug.
Description
•