Closed
Bug 647624
Opened 14 years ago
Closed 13 years ago
Remove property access inc/dec ops
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: bhackett1024, Assigned: dvander)
References
(Blocks 1 open bug)
Details
(Whiteboard: [not a good introductory bug])
Attachments
(4 files, 2 obsolete files)
(deleted),
patch
|
dmandelin
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
dvander
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
dvander
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
dvander
:
review+
|
Details | Diff | Splinter Review |
The pre- and post- inc/dec ops are very complex opcodes to implement in the interpreter and JITs. Handling overflow in these is very tricky for type inference and has caused a few bugs. JM implements these by decomposing them into their basic operations (i.e. propinc => getprop, convert to number, add 1, setprop), which is pretty unsatisfying both because of the ugliness and because of a vague desire going forward to make one stub call per opcode. TM implements INCELEM using imacros, which in a microbenchmark are more than eight times slower than if the op is expanded. TM implements the other inc/dec ops directly; times are shown below. INCPROP is slower than the expanded version, other incops are 20% to 100% faster.
INCELEM
x[i] = x[i] + 1 205
x[i]++ 1700
++x[i] 1700
INCPROP
x.p = x.p + 1 440
x.p++ 503
++x.p 503
INCGNAME
x = x + 1 330
x++ 217
++x 217
INCNAME
x = x + 1 619
x++ 500
++x 500
INCLOCAL/INCARG
a = a + 1 760
a++ 340
++a 340
INCLOCAL and INCARG probably shouldn't be removed (at least for now): they are significantly faster in the tracer, and they simplify upcoming analysis for bounds check hoisting (determining linear relationships holding in a loop). INCELEM/INCPROP could be removed immediately and improve perf (to my knowledge, these are not used at all on the three main benchmarks we track). Provided that the perf issues with INCGNAME and INCNAME could be sorted, those would also be nice to remove; the main issue is in combining the multiple property accesses with an arithmetic operation in the same bytecode.
Can the tracer eliminate redundant shape guards when there are no intervening calls that can change object shapes? If so, it doesn't seem that 'x = x + 1' should be any slower than 'x++' for a GNAME or NAME.
Comment 1•14 years ago
|
||
Some redundant guard elimination occurs in the tracer, but it all depends on what occurs in between.
Seeing the LIR for all of these alternative versions would be very instructive!
Assignee | ||
Updated•14 years ago
|
Whiteboard: [good first bug]
Assignee | ||
Comment 2•14 years ago
|
||
To avoid too much mucking with the decompiler and emitter, the goal of this patch is to perform the decomposition in the emitter and transform the existing ops into nops. There's a PCDELTA srcnote so the decompiler skips the decomposition. The method JIT, tracer, and interpreter implementations are removed.
The method JIT needed some hackery to implement JSOP_PICK.
Assignee: general → dvander
Status: NEW → ASSIGNED
Assignee | ||
Comment 3•14 years ago
|
||
Assignee | ||
Comment 4•14 years ago
|
||
These cases were slower in the tracer than their decomposed counterparts. I'll leave the NAME/GNAME ops for a follow-up bug since those are currently faster in the tracer.
11 files changed, 265 insertions(+), 477 deletions(-)
No perf change on ss/v8. Some microbenchmarks that use the old/new ops in a loop:
ELEMINC benchmark:
pre -m: 71ms
pre -j: 59ms
post -m: 55ms
post -j: 59ms
PROPINC benchmark:
pre -m: 48ms
pre -j: 26ms
post -m: 56ms
post -j: 35ms
Attachment #524466 -
Attachment is obsolete: true
Attachment #524489 -
Attachment is obsolete: true
Attachment #524494 -
Flags: review?(dmandelin)
Assignee | ||
Comment 5•14 years ago
|
||
The POSTINC benchmark is:
for (var i = 0; i < 10000000; i++) {
a.x++;
}
So it got a fraction slower per iteration, but I think this is well worth it for the simplicity it will bring to TI and ionmonkey.
Reporter | ||
Comment 6•14 years ago
|
||
Oops, I suspect in comment 0 I was adding undefined properties of the object/array for INCELEM and INCPROP.
Reporter | ||
Comment 7•14 years ago
|
||
Also, for GNAME here is the tracer-generated code for infinite loops incrementing values:
x++
1: x/i $pc 0x69ffbc: mov 0x10(%ecx),%esi
1: x/i $pc 0x69ffbf: mov 0x827440,%ebx
1: x/i $pc 0x69ffc5: test %ebx,%ebx
1: x/i $pc 0x69ffc7: jne 0x6affa8
1: x/i $pc 0x69ffcd: mov 0x638(%esi),%ebx
1: x/i $pc 0x69ffd3: add $0x1,%ebx
1: x/i $pc 0x69ffd6: jo 0x6affbd
1: x/i $pc 0x69ffdc: mov %ebx,0x638(%esi)
1: x/i $pc 0x69ffe2: jmp 0x69ffbc
x = x + 1
1: x/i $pc 0x69ffa9: mov 0xc(%ecx),%edi
1: x/i $pc 0x69ffac: mov 0x10(%ecx),%esi
1: x/i $pc 0x69ffaf: mov 0x827440,%ebx
1: x/i $pc 0x69ffb5: test %ebx,%ebx
1: x/i $pc 0x69ffb7: jne 0x6affa8
1: x/i $pc 0x69ffbd: movl $0x1402028,(%edi)
1: x/i $pc 0x69ffc3: mov 0x638(%esi),%ebx
1: x/i $pc 0x69ffc9: mov %ebx,0x8(%edi)
1: x/i $pc 0x69ffcc: movl $0x1,0x10(%edi)
1: x/i $pc 0x69ffd3: add $0x1,%ebx
1: x/i $pc 0x69ffd6: jo 0x6affbd
1: x/i $pc 0x69ffdc: mov %ebx,0x638(%esi)
1: x/i $pc 0x69ffe2: jmp 0x69ffa9
It looks like the shape guard on the setprop *is* eliminated, but there are three extra stores issued before the overflow jump (syncing the stack?) and an extra load at the head of the loop (loading stack pointer?).
Comment 8•14 years ago
|
||
Are those something we think we'll be able to eliminate later? Making x++ pretty fast is actually important in some "benchmarks"...
Assignee | ||
Comment 9•14 years ago
|
||
I'm not removing (x++) yet where x is an arg/local. Eliminating the stack stores is not easy in the tracer. JM should be able to match the tracer, eventually, and then we can remove these ops entirely.
Comment 10•14 years ago
|
||
Comment on attachment 524494 [details] [diff] [review]
remove prop/elem inc ops
Patch looks good.
My only concern is about whether we won't like the effect on interpreter performance. Of course, in Firefox we run in the jits. David Anderson also points out that these ops are rare. So it seems OK. If we end up needing to get that interp perf back, it seems we can just put the interp cases back in, and let them skip the following ops--and then we still get to keep the jit simplifications.
Attachment #524494 -
Flags: review?(dmandelin) → review+
Updated•13 years ago
|
Reporter | ||
Comment 11•13 years ago
|
||
(In reply to comment #10)
> Comment on attachment 524494 [details] [diff] [review] [review]
> remove prop/elem inc ops
>
> Patch looks good.
>
> My only concern is about whether we won't like the effect on interpreter
> performance. Of course, in Firefox we run in the jits. David Anderson also
> points out that these ops are rare. So it seems OK. If we end up needing to
> get that interp perf back, it seems we can just put the interp cases back
> in, and let them skip the following ops--and then we still get to keep the
> jit simplifications.
Would be good to get this in pretty soon. It is also I think worth revisiting and extending the idea in comment 10 to remove all property access ops (including NAME/GNAME/etc.) from JM only: keep the ops and their semantics, but follow them up with their broken down version. The interpreter and tracer can run the fat ops, and JM and analysis/inference can run the broken down ops. If we need to rejoin in the middle of an INCPROP, we can point the regs at the appropriate place to resume and the interpreter will know what to do.
Assignee | ||
Comment 12•13 years ago
|
||
Yeah, this patch absolutely has to get in for IonMonkey to handle these opcodes. What do NAME/GNAME break into?
Reporter | ||
Comment 13•13 years ago
|
||
Mind if I take this patch/bug and rework per comment 11? I've been wanting to not think about these opcodes for soooooo long. NAME and GNAME are the same as PROP, except there is a BINDNAME or BINDGNAME up front. The consistent stack layouts were necessary to simplify (or, at least, make not-impossibly-complex) the handling of these ops in the interpoline.
Assignee | ||
Comment 14•13 years ago
|
||
(In reply to comment #13)
> Mind if I take this patch/bug and rework per comment 11? I've been wanting
> to not think about these opcodes for soooooo long. NAME and GNAME are the
> same as PROP, except there is a BINDNAME or BINDGNAME up front. The
> consistent stack layouts were necessary to simplify (or, at least, make
> not-impossibly-complex) the handling of these ops in the interpoline.
Please do take this patch :D but I think the NAME/GNAME thing should be a separate bug. This one is basically done, and it seems like NAME and GNAME might not necessarily be decomposed with the same trick, they're a lot simpler.
Comment 15•13 years ago
|
||
These ops date from '97 or so when I was benchmarking jsinterp.c code (profiled on my SGI Indy) against Perl 4!
/be
Assignee | ||
Comment 16•13 years ago
|
||
FWIW, v8, Nitro seem to have them too, so they must be convenient. Crankshaft just aborts if it sees an Inc/Dec node with a property access, so I'm guessing they ran into the same bailout problems that decomposition or imacros solve.
Comment 17•13 years ago
|
||
Comment on attachment 524494 [details] [diff] [review]
remove prop/elem inc ops
Why the JOF_TMPSLOT2 addition for JSOP_SWAP and JSOP_PICK?
/be
Reporter | ||
Comment 18•13 years ago
|
||
(In reply to comment #14)
> it seems like NAME and GNAME
> might not necessarily be decomposed with the same trick, they're a lot
> simpler.
Not sure about this, NAME and GNAME are both property incops (one on a scope chain object, the other on a global object) so with the extra BIND they should be more complicated, rather than less.
Assignee | ||
Comment 19•13 years ago
|
||
(In reply to comment #18)
Sorry, I misunderstood - totally, INCNAME/GNAME should go, the same tricks in this patch will work.
(In reply to comment #17)
> Why the JOF_TMPSLOT2 addition for JSOP_SWAP and JSOP_PICK?
JM implements these by simulating a bunch of stack movement, so it needs a little more temporary space than the interpreter.
Reporter | ||
Comment 20•13 years ago
|
||
With all the analysis changes pulled in on the JM merge this needs a new review. That being the case, I made some other changes:
- INCNAME, INCGNAME and friends also have decomposed versions. This allows removing all the code related to property incops in inference, JM, stub calls and interpoline.
- Going with the comment 10 strategy where the interpreter and tracer are basically unchanged (the interpreter runs the decomposed version if inference is on to accumulate types on the property accesses).
- Decomposed opcode lengths are generated into a table, rather than added as source notes (should be faster).
Double checked that tracer performance was not affected on any of the opcodes. JM+TI is faster than TM or d8 on all the opcodes except INCNAME, where TM is fastest (JM+TI should get faster here with the reentrant closures patch).
Attachment #544648 -
Flags: review?(dmandelin)
Updated•13 years ago
|
Attachment #544648 -
Flags: review?(dmandelin) → review?(dvander)
Reporter | ||
Comment 21•13 years ago
|
||
Reporter | ||
Comment 22•13 years ago
|
||
Some followup fixes to cover for inadequate testing before pushing:
- The method of setting the decomposed length of opcodes doesn't work with XDR, and is also totally broken in the presence of INDEXBASE opcodes. This table is removed, in favor of adding an extra byte at the end of DECOMPOSE ops storing the decomposed length.
- Some fixes so imacros work.
A test added for an obscure language feature (how many times the index is converted to an id in incelem ops) also breaks on trunk -j so is disabled until that is fixed (see bug 672076).
Attachment #546350 -
Flags: review?(dvander)
Reporter | ||
Updated•13 years ago
|
Reporter | ||
Comment 23•13 years ago
|
||
Patches from bugs 673710, 672153, and 673731 fixing regressions from this patch. Two of these are for special cases in the emitter which previously had no test coverage.
Attachment #548477 -
Flags: review?(dvander)
Reporter | ||
Updated•13 years ago
|
Whiteboard: [good first bug] → [not a good first bug]
Updated•13 years ago
|
Whiteboard: [not a good first bug] → [not a good introductory bug]
Assignee | ||
Comment 24•13 years ago
|
||
Comment on attachment 544648 [details] [diff] [review]
updated
Review of attachment 544648 [details] [diff] [review]:
-----------------------------------------------------------------
Nice, very glad to see even more code go.
::: js/src/jsopcode.h
@@ +526,5 @@
> +static inline uintN
> +GetDecomposeLength(JSOp op)
> +{
> + /* Table of lengths, updated on demand. See SetDecomposeLength. */
> + extern uint8 js_decomposeLengthTable[];
Nit: js_DecomposeLengthTable?
Attachment #544648 -
Flags: review?(dvander) → review+
Assignee | ||
Comment 25•13 years ago
|
||
Comment on attachment 546350 [details] [diff] [review]
followup
Review of attachment 546350 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jsopcode.h
@@ +528,5 @@
> {
> + /*
> + * The last byte of a DECOMPOSE op stores the decomposed length. This can
> + * vary across different instances of an opcode due to INDEXBASE ops.
> + */
Could this bit of info be copied into either jsopcode.tbl or to the definition of JOF_DECOMPOSE?
Attachment #546350 -
Flags: review?(dvander) → review+
Assignee | ||
Comment 26•13 years ago
|
||
Comment on attachment 548477 [details] [diff] [review]
followup 2
Review of attachment 548477 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #548477 -
Flags: review?(dvander) → review+
Updated•13 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•