Closed Bug 1576567 Opened 5 years ago Closed 5 years ago

Some interpreter loop optimizations

Categories

(Core :: JavaScript Engine: JIT, task, P1)

RESOLVED FIXED
mozilla70
Tracking Status
firefox70 --- fixed

People

(Reporter: jandem, Assigned: jandem)

Attachments

(3 files)

Some minor optimizations:

  1. The table base address is currently loaded with a 10-byte MOV instruction for each opcode. We should use a RIP-relative LEA here: it is 7 bytes, the difference adds up to a few hundred bytes, and it's what C++ compilers do. More importantly, on ARM64 we currently use LDR where we can similarly use ADR (because we know the interpreter code is much smaller than 1 MB, which is within ADR's range).

  2. We could order the list of bytecode ops in BaselineCodeGen.h by measured frequency in the browser. This should be a bit more cache friendly.

  3. The toggled call for the debugger is unfortunately implemented pretty inefficiently on ARM64 (sync stack pointer, LDR, NOP). We should get this down to one instruction, probably with toggledJump.
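The arithmetic behind item 1 can be sketched as follows. This is only an illustration, not code from the patches: the function names are hypothetical, and the handler count used in the note after the block is an assumed round number, not a measured one. The instruction sizes come from the comment above, and the ADR range is the architectural one (a signed 21-bit byte offset from the PC).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// ADR encodes a signed 21-bit byte offset relative to the current PC,
// so the target must lie in [-1 MiB, +1 MiB).
constexpr bool FitsInAdrRange(int64_t byteOffset) {
  constexpr int64_t kRange = int64_t(1) << 20;
  return byteOffset >= -kRange && byteOffset < kRange;
}

// Hypothetical helper: the offset an ADR at `pc` would need to reach `target`.
inline int64_t AdrOffset(uintptr_t pc, uintptr_t target) {
  int64_t offset = int64_t(target) - int64_t(pc);
  assert(FitsInAdrRange(offset));  // Holds while the code is well under 1 MB.
  return offset;
}

// x64 sizes quoted in the bug: MOV r64, imm64 is 10 bytes and
// LEA r64, [rip + disp32] is 7 bytes.
constexpr size_t kMovImm64Bytes = 10;
constexpr size_t kLeaRipRelBytes = 7;

// Bytes saved if every opcode handler loads the table address once.
constexpr size_t BytesSaved(size_t numOpcodeHandlers) {
  return (kMovImm64Bytes - kLeaRipRelBytes) * numOpcodeHandlers;
}
```

With an assumed 200 handlers, BytesSaved(200) is 600 bytes, which is consistent with the "few hundred bytes" estimate.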

This affects the following platforms:

  • x64: use a RIP-relative LEA instead of an immediate MOV. This saves a few
    hundred bytes total and seems to be a little bit faster on interpreter
    micro-benchmarks.

  • arm64: use ADR instead of LDR.

We now use real NOPs on all platforms. On x86/x64 this used to be a CMP
instruction and on ARM64 this involved an unconditional LDR with some
other instructions.
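To illustrate what a "real NOP" that can be toggled into a call looks like on x64, here is a minimal sketch that patches a 5-byte buffer. It is not the MacroAssembler code from these patches: EmitNop, EmitCall and ToggleCall are hypothetical names, and the sketch operates on a plain byte array rather than executable memory. It assumes the usual encodings: the 5-byte NOP 0F 1F 44 00 00 and CALL rel32 (E8 followed by a 32-bit displacement measured from the end of the instruction).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kPatchableSize = 5;
using Patchable = std::array<uint8_t, kPatchableSize>;

// 5-byte NOP (NOP DWORD PTR [RAX + RAX*1 + 0]); same length as CALL rel32.
inline Patchable EmitNop() {
  return {0x0F, 0x1F, 0x44, 0x00, 0x00};
}

// CALL rel32: opcode E8, then a little-endian displacement relative to
// the address of the next instruction (siteAddr + 5).
inline Patchable EmitCall(int64_t siteAddr, int64_t targetAddr) {
  int64_t disp = targetAddr - (siteAddr + int64_t(kPatchableSize));
  assert(disp >= INT32_MIN && disp <= INT32_MAX);
  uint32_t bits = uint32_t(int32_t(disp));
  return {0xE8, uint8_t(bits), uint8_t(bits >> 8), uint8_t(bits >> 16),
          uint8_t(bits >> 24)};
}

// Toggle the site between "do nothing" and "call the trap handler".
inline void ToggleCall(Patchable& site, bool enabled, int64_t siteAddr,
                       int64_t targetAddr) {
  site = enabled ? EmitCall(siteAddr, targetAddr) : EmitNop();
}
```

Because both encodings have the same length, the site can be switched in place; while the debugger is off, the interpreter loop only executes a NOP.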

Depends on D43413

Priority: -- → P1
Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/dae1e9839adc
part 1 - Optimize table address loads in interpreter code. r=lth
https://hg.mozilla.org/integration/autoland/rev/feec09fd96eb
part 2 - Allow using nopPatchableToCall outside Wasm code and fix non-sensical return value. r=lth
https://hg.mozilla.org/integration/autoland/rev/e72770318826
part 3 - Use real NOPs for debug trap handler calls in interpreter loop. r=tcampbell

(In reply to Jan de Mooij [:jandem] from comment #0)

  2. We could order the list of bytecode ops in BaselineCodeGen.h by measured frequency in the browser. This should be a bit more cache friendly.

I'm leaving this one for now; we can always try it later.
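If someone picks this up later, the reordering itself is simple once frequency data exists. The sketch below is only an assumption about how it could be done: OpCount and OrderByFrequency are hypothetical names, and any counts fed to it would have to come from real browser measurements, which this bug does not contain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record: an opcode name and how often it was executed.
struct OpCount {
  std::string name;
  uint64_t count;
};

// Return opcode names ordered from most to least frequently executed.
// stable_sort keeps the existing relative order for ops with equal counts.
inline std::vector<std::string> OrderByFrequency(std::vector<OpCount> ops) {
  std::stable_sort(ops.begin(), ops.end(),
                   [](const OpCount& a, const OpCount& b) {
                     return a.count > b.count;
                   });
  std::vector<std::string> names;
  names.reserve(ops.size());
  for (const OpCount& op : ops) {
    names.push_back(op.name);
  }
  assert(names.size() == ops.size());
  return names;
}
```

The idea is that handlers for hot ops end up adjacent in the generated interpreter code, so they are more likely to share cache lines.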

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70