Closed
Bug 910782
Opened 11 years ago
Closed 11 years ago
indirect-goto-based interpreter
Categories
(Core :: JavaScript Engine, enhancement)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla28
People
(Reporter: sunfish, Assigned: sunfish)
References
Details
(Whiteboard: [qa-])
Attachments
(19 files, 5 obsolete files)
For a tangentially-related experiment, I (re-)implemented an indirect-goto-based interpreter loop. I'm aware of some reasons why this may not be desirable upstream, so I don't mind if this isn't accepted, but I'm sending it upstream in case someone finds it interesting. It speeds up the interpreter by about 10-15%, but this is typically only noticeable when the JITs are disabled.
Along the way, I also made several cleanup patches, which I split out in case they are desirable even if the main indirect-goto patch is not.
Some differences between this new interp-indirect-goto.patch and the code removed in bug 799777 include:
- It gets rid of the global (to the interpreter loop) len and op variables, and significantly reduces the scope of the switchOp variable.
- The table contains offsets instead of absolute addresses, which is more PIC/PIE friendly (though it is slightly slower).
- There's only one table; the interrupt mechanism is simpler.
- The indirect-goto code shares more with the switch code; there's only one #ifdef.
Assignee | ||
Comment 1•11 years ago
|
||
Assignee: general → sunfish
Assignee | ||
Comment 2•11 years ago
|
||
Assignee | ||
Comment 3•11 years ago
|
||
Assignee | ||
Comment 4•11 years ago
|
||
Assignee | ||
Comment 5•11 years ago
|
||
Assignee | ||
Comment 6•11 years ago
|
||
Assignee | ||
Comment 7•11 years ago
|
||
Assignee | ||
Comment 8•11 years ago
|
||
Assignee | ||
Comment 9•11 years ago
|
||
Assignee | ||
Comment 10•11 years ago
|
||
Assignee | ||
Comment 11•11 years ago
|
||
Assignee | ||
Comment 12•11 years ago
|
||
This patch is the one the actually implements the indirect goto feature.
Assignee | ||
Comment 13•11 years ago
|
||
Comment on attachment 797337 [details] [diff] [review]
interp-cleanup-comments.patch
Looks like the patch in bug 877599 accidentally re-introduced an old comment. This patch removes it again, plus one other unneeded comment.
Attachment #797337 -
Flags: review?(jorendorff)
Assignee | ||
Comment 14•11 years ago
|
||
Comment on attachment 797338 [details] [diff] [review]
interp-cleanup-end-case.patch
Some fairly straight-forward cleanups.
Attachment #797338 -
Flags: review?(terrence)
Assignee | ||
Updated•11 years ago
|
Whiteboard: [leave open]
Comment 15•11 years ago
|
||
Comment on attachment 797338 [details] [diff] [review]
interp-cleanup-end-case.patch
Review of attachment 797338 [details] [diff] [review]:
-----------------------------------------------------------------
Nice! \o/
r=me
Attachment #797338 -
Flags: review?(terrence) → review+
Comment 16•11 years ago
|
||
We removed the indirect-goto code last time because it didn't seem to help performance and it was nice to avoid the extra macro goop. Do you see any interesting cases where this improves performance? (All the preparatory interp cleanup patches of course would be nice to land.)
Assignee | ||
Comment 17•11 years ago
|
||
Assignee | ||
Comment 18•11 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #16)
> We removed the indirect-goto code last time because it didn't seem to help
> performance and it was nice to avoid the extra macro goop. Do you see any
> interesting cases where this improves performance? (All the preparatory
> interp cleanup patches of course would be nice to land.)
I don't have any general-interest cases where this improves performance, but I haven't really looked. If I decide to propose the indirect-goto patch for review, I'll definitely collect some more data to support it. For now I'll just work on landing the cleanups.
Comment 19•11 years ago
|
||
(In reply to Dan Gohman [:sunfish] from comment #18)
Sounds great!
Assignee | ||
Updated•11 years ago
|
Attachment #797339 -
Flags: review?(luke)
Assignee | ||
Updated•11 years ago
|
Attachment #797340 -
Flags: review?(luke)
Updated•11 years ago
|
Attachment #797339 -
Flags: review?(luke) → review+
Updated•11 years ago
|
Attachment #797340 -
Flags: review?(luke) → review+
Assignee | ||
Updated•11 years ago
|
Attachment #797342 -
Flags: review?(luke)
Assignee | ||
Comment 20•11 years ago
|
||
Attachment #797341 -
Attachment is obsolete: true
Attachment #798641 -
Flags: review?(luke)
Assignee | ||
Comment 21•11 years ago
|
||
Comment on attachment 797343 [details] [diff] [review]
interp-cleanup-outer-for.patch
This diff looks pretty crazy, it's mostly indentation changes. The regular cases in the interpreter loop body are left-justified, so this patch left-justifies the special cases for consistency, but feel free to disagree with that choice :-).
Attachment #797343 -
Flags: review?(luke)
Assignee | ||
Comment 22•11 years ago
|
||
Attachment #797344 -
Attachment is obsolete: true
Attachment #798647 -
Flags: review?(luke)
Assignee | ||
Comment 23•11 years ago
|
||
Comment on attachment 797347 [details] [diff] [review]
interp-prep-scoping.patch
I named this patch with "prep", but it's really a cleanup. InvokeState contains a RootedScript, so it's nice to give it a more explicit lifetime.
Attachment #797347 -
Flags: review?(luke)
Comment 24•11 years ago
|
||
Comment 25•11 years ago
|
||
Comment on attachment 797337 [details] [diff] [review]
interp-cleanup-comments.patch
Review of attachment 797337 [details] [diff] [review]:
-----------------------------------------------------------------
embarrassing
Attachment #797337 -
Flags: review?(jorendorff) → review+
Updated•11 years ago
|
Attachment #797342 -
Flags: review?(luke) → review+
Updated•11 years ago
|
Attachment #797343 -
Flags: review?(luke) → review+
Updated•11 years ago
|
Attachment #797347 -
Flags: review?(luke) → review+
Comment 26•11 years ago
|
||
Comment on attachment 798641 [details] [diff] [review]
interp-cleanup-branch.patch
Review of attachment 798641 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/vm/Interpreter.cpp
@@ +1303,5 @@
>
> #define BRANCH(n) \
> JS_BEGIN_MACRO \
> + int32_t nlen = (n); \
> + regs.pc += (nlen); \
does nlen need the parens?
Attachment #798641 -
Flags: review?(luke) → review+
Updated•11 years ago
|
Attachment #798647 -
Flags: review?(luke) → review+
Assignee | ||
Comment 27•11 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #26)
> Comment on attachment 798641 [details] [diff] [review]
> interp-cleanup-branch.patch
>
> Review of attachment 798641 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: js/src/vm/Interpreter.cpp
> @@ +1303,5 @@
> >
> > #define BRANCH(n) \
> > JS_BEGIN_MACRO \
> > + int32_t nlen = (n); \
> > + regs.pc += (nlen); \
>
> does nlen need the parens?
Nope.
https://hg.mozilla.org/integration/mozilla-inbound/rev/ebb5b0609fe6
https://hg.mozilla.org/integration/mozilla-inbound/rev/6df80a4fdb06
https://hg.mozilla.org/integration/mozilla-inbound/rev/d09951d9e0c0
https://hg.mozilla.org/integration/mozilla-inbound/rev/7792dc26b3e1
https://hg.mozilla.org/integration/mozilla-inbound/rev/0e876da27431
https://hg.mozilla.org/integration/mozilla-inbound/rev/bf286f1d5489
Comment 28•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/ebb5b0609fe6
https://hg.mozilla.org/mozilla-central/rev/6df80a4fdb06
https://hg.mozilla.org/mozilla-central/rev/d09951d9e0c0
https://hg.mozilla.org/mozilla-central/rev/7792dc26b3e1
https://hg.mozilla.org/mozilla-central/rev/0e876da27431
https://hg.mozilla.org/mozilla-central/rev/bf286f1d5489
Assignee | ||
Comment 29•11 years ago
|
||
Attachment #797345 -
Attachment is obsolete: true
Assignee | ||
Comment 30•11 years ago
|
||
Attachment #797346 -
Attachment is obsolete: true
Assignee | ||
Comment 31•11 years ago
|
||
Assignee | ||
Comment 32•11 years ago
|
||
Assignee | ||
Comment 33•11 years ago
|
||
Assignee | ||
Comment 34•11 years ago
|
||
Attachment #797348 -
Attachment is obsolete: true
Assignee | ||
Comment 35•11 years ago
|
||
This patch moves the interpreter's FrameGuard, FrameRegs, and opMask variables into the InterpreterActivation, allowing InterpreterActivation's methods and other code to access them directly rather than through an extra pointer indirection.
Assignee | ||
Comment 36•11 years ago
|
||
Assignee | ||
Comment 37•11 years ago
|
||
Assignee | ||
Comment 38•11 years ago
|
||
For straight-line code, parser speed dwarfs interpreter speed. For loops above the baseline threshold, baseline/Ion take over for the win. But there is a small window around and below the baseline threshold where code runs in the interpreter where these patches often provide 10-15% speedups.
I've added several additional cleanup patches, and have further simplified the switch/goto macros, such that the maintenance burden of the threaded interpreter code should be quite low. There's only one #if, and it's pretty small and simple.
Also worth noting is bug 922440, where there is discussion of increasing the thresholds for the JITs on b2g, if not outright disabling them, either of which would put significantly more pressure on the interpreter. I haven't measured the performance of this patch on b2g, but a speedup is not unlikely.
Assignee | ||
Comment 39•11 years ago
|
||
Comment on attachment 821818 [details] [diff] [review]
interp-cleanup-case-macro-names.patch
This is just a simple rename; CASE instead of BEGIN_CASE because it really is just a case (and it's just a label in the indirect-goto version).
Attachment #821818 -
Flags: review?(luke)
Assignee | ||
Comment 40•11 years ago
|
||
Comment on attachment 821819 [details] [diff] [review]
interp-cleanup-cases.patch
This adds some missing opcodes, and makes brace usage consistent.
Attachment #821819 -
Flags: review?(luke)
Assignee | ||
Comment 41•11 years ago
|
||
Comment on attachment 821821 [details] [diff] [review]
interp-cleanup-var.patch
This cleans up what appears to be an obsolete optimization.
Attachment #821821 -
Flags: review?(luke)
Assignee | ||
Comment 42•11 years ago
|
||
Comment on attachment 821748 [details] [diff] [review]
interp-prep-do-macros.patch
Ok. This patch is not strictly a cleanup; it's the first step towards re-enabling the threaded interpreter.
For SunSpider on a 2012 Nexus 7 (quad-core Cortex-A9) with --no-baseline, I see a pretty clear 10% speedup overall with this patch series. Some benchmarks see more, some less, but I don't see any slowdowns. With baseline enabled, I don't see a speedup, though I don't see any slowdowns either. I see similar results on an Intel laptop and an Intel desktop machine.
I am proposing this now because:
- With all the cleanups, the complexity of this code is significantly lower than what was removed in bug 799777, and I claim the maintenance burden is significantly reduced.
- In situations where the interpreter is used, either because the JIT is expensive, possibly including bug 922440, or not available for some reason, it's a pretty clear 10%-ish win on every platform I'm able to test on, excluding Windows where the optimization is currently unavailable.
Attachment #821748 -
Flags: review?(luke)
Comment 43•11 years ago
|
||
I think somebody was using SpiderMonkey on iOS, which forces them to just use the interpreter. This would probably make them quite happy.
Updated•11 years ago
|
Attachment #821818 -
Flags: review?(luke) → review+
Updated•11 years ago
|
Attachment #821748 -
Flags: review?(luke) → review+
Updated•11 years ago
|
Attachment #821819 -
Flags: review?(luke) → review+
Comment 44•11 years ago
|
||
Comment on attachment 821821 [details] [diff] [review]
interp-cleanup-var.patch
Hehe, real hot.
Attachment #821821 -
Flags: review?(luke) → review+
Comment 45•11 years ago
|
||
(In reply to Dan Gohman [:sunfish] from comment #42)
I'm down.
Assignee | ||
Comment 46•11 years ago
|
||
Comment on attachment 821817 [details] [diff] [review]
interp-prep-advance-and-dispatch.patch
This is the next preparatory patch. Instead of setting len and then doing ADVANCE_AND_OP_OP(), this passes the length to the macro. It also renames the macro to ADVANCE_AND_DISPATCH.
Attachment #821817 -
Flags: review?(luke)
Assignee | ||
Comment 47•11 years ago
|
||
Comment on attachment 821822 [details] [diff] [review]
interp-indirect-goto.patch
This is the main indirect goto patch. A few notes:
It eliminates the "op" variable, which was just a copy of *regs.pc. In theory this was an optimization, but it doesn't appear to be very important, and it does make the code more complex, since op and *regs.pc have to be manually kept in sync.
It changes the increment of regs.pc from being in a single location at advanceAndDoOp at the top of the interpreter loop to being duplicated at every ADVANCE_AND_DISPATCH. This is theoretically a minor pessimization for the case where a switch is used instead of indirect gotos, but this doesn't appear to be very significant.
Attachment #821822 -
Flags: review?(luke)
Updated•11 years ago
|
Attachment #821817 -
Flags: review?(luke) → review+
Assignee | ||
Comment 48•11 years ago
|
||
Comment on attachment 821827 [details] [diff] [review]
interp-indirection.patch
This patch folds the FrameGuard into InterpreterActivation and moves the interpreter's FrameRegs into InterpreterActivation too. And it fold in the opMask variable as well. The main effect of this is to eliminate InterpreterActivation's pointers to the FrameRegs, FrameGuard, and opMask, and FrameGuard's pointers. More code can access the data it needs without that indirection, and this results in about a 1% speedup of interpreter speed on Intel.
The main downside is the replacement of regs by the REGS macro in most of the interpreter body.
Attachment #821827 -
Flags: review?(luke)
Assignee | ||
Comment 49•11 years ago
|
||
Comment on attachment 821829 [details] [diff] [review]
interp-cleanup-interpret.patch
This just reorganized a few things at the top of the Interpreter function.
Attachment #821829 -
Flags: review?(luke)
Assignee | ||
Comment 50•11 years ago
|
||
Comment on attachment 821830 [details] [diff] [review]
interp-micro-mask.patch
This is a little optimization to finish things off. jsbytecode is a uint8_t, and this makes it easier for compilers to eliminate the zero-extend to size_t after or-ing the mask in. Since this happens after each opcode before the dispatch to the next opcode, this is actually noticeable.
Attachment #821830 -
Flags: review?(luke)
Comment 51•11 years ago
|
||
Comment on attachment 821822 [details] [diff] [review]
interp-indirect-goto.patch
This is much nicer than it was before.
Attachment #821822 -
Flags: review?(luke) → review+
Comment 52•11 years ago
|
||
Comment on attachment 821827 [details] [diff] [review]
interp-indirection.patch
Nice!
Attachment #821827 -
Flags: review?(luke) → review+
Updated•11 years ago
|
Attachment #821829 -
Flags: review?(luke) → review+
Updated•11 years ago
|
Attachment #821830 -
Flags: review?(luke) → review+
Comment 53•11 years ago
|
||
I was looking at this all a bit more and I had two few questions:
- Why did you decide to store offsets instead of absolute addresses of labels? Even if x86 can (I assume) fold in the displacement of &LABEL(JSOP_NOP), it still seems like an extra 4-byte immediate per jump instruction. Is there a downside to storing the addresses of the labels directly? It seems like it requires sizeof(void*) instead of sizeof(uint32_t) which could help on 64-bit. Another possibility is if the compiler was able to do a better job baking in the constant offsets than constant addresses of labels, but I wouldn't think so.
- By having 'offsets' be a static const, is the compiler able to bake in a constant address? It seems like, in the best case, DISPATCH_TO would be a single
jmp *(ADDRESS_OF_OFFSETS + op<<2)
instruction. Is this what you see in the generated code?
Assignee | ||
Comment 54•11 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #53)
> I was looking at this all a bit more and I had two few questions:
>
> - Why did you decide to store offsets instead of absolute addresses of
> labels? Even if x86 can (I assume) fold in the displacement of
> &LABEL(JSOP_NOP), it still seems like an extra 4-byte immediate per jump
> instruction. Is there a downside to storing the addresses of the labels
> directly? It seems like it requires sizeof(void*) instead of
> sizeof(uint32_t) which could help on 64-bit. Another possibility is if the
> compiler was able to do a better job baking in the constant offsets than
> constant addresses of labels, but I wouldn't think so.
Absolute addresses in static initializers require load-time relocations in PIC mode. See the remark about shared libraries in GCC's manual:
[0] http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
I'll add this info as a comment.
> - By having 'offsets' be a static const, is the compiler able to bake in a
> constant address? It seems like, in the best case, DISPATCH_TO would be a
> single
> jmp *(ADDRESS_OF_OFFSETS + op<<2)
> instruction. Is this what you see in the generated code?
Offsets actually make it slightly slower at runtime. On x64, the sequence with offsets is:
movslq (%r12,%rax,4),%rax
addq %r14,%rax
jmpq *%rax
It's easy enough to switch to absolute addresses if you prefer.
Comment 55•11 years ago
|
||
(In reply to Dan Gohman [:sunfish] from comment #54)
> It's easy enough to switch to absolute addresses if you prefer.
I don't have any a priori preferences, I was just wanting to understand the choices. Thanks, I fogot about PIC code. Assuming we just eat the load-time cost, though, is there any noticeable speed difference using absolute addresses?
> movslq (%r12,%rax,4),%rax
> addq %r14,%rax
> jmpq *%rax
I assume %r12 is the address of 'offsets' and %r14 is the LABEL(JSOP_NOP)? If so, are these kept in registers or do most cases have to fill these registers from the stack or somewhere else?
Assignee | ||
Comment 56•11 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #55)
> (In reply to Dan Gohman [:sunfish] from comment #54)
> > It's easy enough to switch to absolute addresses if you prefer.
>
> I don't have any a priori preferences, I was just wanting to understand the
> choices. Thanks, I fogot about PIC code. Assuming we just eat the
> load-time cost, though, is there any noticeable speed difference using
> absolute addresses?
In some ad-hoc testing, absolute addresses appear to be about a 1% speedup at runtime.
> > movslq (%r12,%rax,4),%rax
> > addq %r14,%rax
> > jmpq *%rax
>
> I assume %r12 is the address of 'offsets' and %r14 is the LABEL(JSOP_NOP)?
Yep.
> If so, are these kept in registers or do most cases have to fill these
> registers from the stack or somewhere else?
It looks like they're in registers most of the time. They get rematerialized in a few places, but most of the time they're preserved.
Assignee | ||
Comment 57•11 years ago
|
||
Here's a patch which changes the interpreter to use absolute addresses.
On x86-64 Linux for example, it increases the number of R_X86_64_RELATIVE relocations in libmozjs-27.0a.so from 25709 to 25965. I hadn't realized it was already that high. Given that, the added cost is possibly not very significant, so the 1% runtime speedup may be worth it.
Attachment #822913 -
Flags: review?(luke)
Comment 58•11 years ago
|
||
(In reply to Dan Gohman [:sunfish] from comment #56)
Thanks for looking into this!
> It looks like they're in registers most of the time. They get rematerialized
> in a few places, but most of the time they're preserved.
Wow, I'm impressed. I mean, to know that 'offsets' and 'LABEL(JSOP_NOP)' are in registers at the beginning of a case, the compiler has to know that they are in registers, and the same registers at end of every case. (Of course they are, since each indirect goto is using 'offsets' and 'LABEL(JSOP_NOP)', but still :)
Comment 59•11 years ago
|
||
Comment on attachment 822913 [details] [diff] [review]
interp-absolute.patch
Review of attachment 822913 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/vm/Interpreter.cpp
@@ +1220,4 @@
> # define OPPAD(v) \
> + (v) == EnableInterruptsPseudoOpcode ? \
> + LABEL(EnableInterruptsPseudoOpcode) : \
> + LABEL(default),
SM style is ? and : at the beginning of the line, aligned with the condition.
Alternatively, what if you added a third OPINT (in keeping with the 5-char macros :) that was only used for EnableInterruptsPseudoOpcode to avoid the ternary operator (which I realize should be const-evaluated; it might just read a bit better).
Attachment #822913 -
Flags: review?(luke) → review+
Assignee | ||
Comment 60•11 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #59)
> Comment on attachment 822913 [details] [diff] [review]
> interp-absolute.patch
>
> Review of attachment 822913 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: js/src/vm/Interpreter.cpp
> @@ +1220,4 @@
> > # define OPPAD(v) \
> > + (v) == EnableInterruptsPseudoOpcode ? \
> > + LABEL(EnableInterruptsPseudoOpcode) : \
> > + LABEL(default),
>
> SM style is ? and : at the beginning of the line, aligned with the condition.
Fixed. I also put parens around the whole thing.
> Alternatively, what if you added a third OPINT (in keeping with the 5-char
> macros :) that was only used for EnableInterruptsPseudoOpcode to avoid the
> ternary operator (which I realize should be const-evaluated; it might just
> read a bit better).
I considered this, but I also like keeping the Interpreter's implementation details out of jsopcode.tbl as much as possible. OPPAD is intrusion enough :-}.
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/0f80724faca5
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/88e167fa6f4b
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/d4322df6c15e
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/c5186c7a40ac
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/af637e56ce12
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/fb2bf717ab24
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/953dc75f2bb9
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/a2074887deb6
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/9a5e42743781
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/9540960e67a2
~
~
~
~
~
~
~
~
~
Whiteboard: [leave open]
Comment 61•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/0f80724faca5
https://hg.mozilla.org/mozilla-central/rev/88e167fa6f4b
https://hg.mozilla.org/mozilla-central/rev/d4322df6c15e
https://hg.mozilla.org/mozilla-central/rev/c5186c7a40ac
https://hg.mozilla.org/mozilla-central/rev/af637e56ce12
https://hg.mozilla.org/mozilla-central/rev/fb2bf717ab24
https://hg.mozilla.org/mozilla-central/rev/953dc75f2bb9
https://hg.mozilla.org/mozilla-central/rev/a2074887deb6
https://hg.mozilla.org/mozilla-central/rev/9a5e42743781
https://hg.mozilla.org/mozilla-central/rev/9540960e67a2
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Updated•11 years ago
|
Whiteboard: [qa-]
You need to log in
before you can comment on or make changes to this bug.
Description
•