Closed Bug 631637 Opened 14 years ago Closed 11 years ago

JM: Measure per-opcode codegen size

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
normal

Tracking


RESOLVED WONTFIX

People

(Reporter: dvander, Assigned: dvander)

References

Details

(Whiteboard: [MemShrink:P3])

Attachments

(2 files)

Attached patch instrumentation (deleted) — Splinter Review
This patch, for every script, emits the following statistics about each op it compiled:
(1) How many times that op was encountered
(2) The total number of inline bytes generated for that op
(3) The total number of out-of-line bytes generated for that op
(4) The total number of sync/vmcall bytes for that op, either inline or OOL (this is a subset of 2 and 3)
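Roughly, the instrumentation amounts to keeping four counters per opcode and bumping them from the compiler as each op's code is emitted, using the assembler buffer offsets before and after codegen for that op. A minimal sketch of the idea (OpcodeStats, recordOp and dumpStats are hypothetical names for illustration, not the identifiers used in the patch):

#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical per-opcode counters; the real patch hooks into the JM compiler.
struct OpcodeStats {
    uint32_t count;        // (1) times this op was compiled
    size_t   inlineBytes;  // (2) inline (fast-path) code bytes
    size_t   oolBytes;     // (3) out-of-line (slow-path) code bytes
    size_t   syncBytes;    // (4) sync/vmcall bytes, a subset of (2) and (3)
};

static OpcodeStats stats[256];  // one slot per JSOp value

// Called once per compiled op with byte counts measured from the assembler's
// buffer offsets before and after codegen for that op.
static void recordOp(uint8_t op, size_t inlineBytes, size_t oolBytes, size_t syncBytes) {
    stats[op].count++;
    stats[op].inlineBytes += inlineBytes;
    stats[op].oolBytes    += oolBytes;
    stats[op].syncBytes   += syncBytes;
}

// Dumped per script at the end of compilation.
static void dumpStats() {
    for (unsigned op = 0; op < 256; op++) {
        if (!stats[op].count)
            continue;
        printf("op %3u: count=%u inline=%zu ool=%zu sync=%zu\n",
               op, stats[op].count, stats[op].inlineBytes,
               stats[op].oolBytes, stats[op].syncBytes);
    }
}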
Attached file quora run (deleted) —
This is the result of logging into Quora and opening four questions. We appear to generate 10MB of inline code and 11MB of OOL code. Of that 21MB, about 9MB goes to sync/vmcalls. The top offending opcodes are:
* call (about 130 bytes on the inline and OOL paths)
* callprop, setprop, getprop (120-160 bytes on the OOL path)
* name (90 bytes on the OOL path)
* getelem (100 bytes on the inline path, 150 bytes on the OOL path)
* lambda (65 bytes on the inline path, 55 of that in sync)
* getgname (55 bytes on the inline path, 81 on the OOL path)

Based on this, I think:
(1) We generate way too much sync code, accounting for almost 50% of generated code.
(2) Very common ops that have warmups (like CALL) should probably be purely an IC, with no inline or OOL paths.
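For reference, a minimal sketch of what "purely an IC" could mean for a warm op like CALL: the compiled script holds only a small per-site cache record whose target starts at a generic slow path and is patched to a specialized stub the first time the site actually runs, so cold sites never pay for inline or OOL code. This illustrates the general self-patching idea only and is not JM's actual IC machinery:

#include <cstdio>

struct CallIC;
typedef int (*CallStub)(CallIC*, int);

// Each call site owns one of these; target starts at the generic slow path.
struct CallIC {
    CallStub target;
};

// Specialized path, "generated" lazily on first use.
static int fastPath(CallIC*, int arg) {
    return arg + 1;
}

// Generic path: does the work once, attaches the stub, repatches the site.
static int slowPath(CallIC* ic, int arg) {
    ic->target = fastPath;
    return fastPath(ic, arg);
}

int main() {
    CallIC site = { slowPath };
    printf("%d %d\n", site.target(&site, 1), site.target(&site, 2));
    return 0;
}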
(In reply to comment #1)
> (1) We generate way too much sync code, accounting for almost 50% of
> generated code.

Definitely. Good analysis in bug 631658.

> (2) Very common ops that have warmups (like CALL) should probably be purely
> an IC, with no inline or OOL paths.

I haven't refreshed myself on the details of compiling CALL lately, but I wonder how likely we are to hit a given CALL opcode. If it's likely, then compiling it in the first round seems fine, because we'll compile it later anyway. I'm not sure how likely "likely" is, though. Even if it's only 70%, for CALL that could save us 0.3 * 4.7MB = 1.4MB, or 7% of the total jitcode allocation in this example.
On this Quora run, 2684865 bytes (2.5MB) went to just updating the PC in sync paths!
On techcrunch.com, the opcode breakdown is basically identical to Quora's. Interesting. 33.5MB of inline code, 38.5MB of OOL code. Of that ~72MB, 31MB is sync code, and 8.3MB of that is PC updating.
Another techcrunch.com workload:
* 8MB to sync code (8123504 bytes)
* 2MB to PC syncing (2175795 bytes)
* 7MB to vmcall sequences overall, which includes PC/SP updating (6951788 bytes)

A vm call is:
* 15 bytes for the regs.pc update
* 9 bytes for the regs.sp update
* 5 bytes for the regs.fp update
* 12 bytes for the move+call
* 3 bytes to move VMFrame -> arg0

So, of that 8MB of sync code:
* 29% goes to updating regs.pc
* 23% goes to call instructions
* 17% goes to updating regs.sp
* 15% goes to stack syncing
* 10% goes to updating regs.fp
* 6% goes to moving VMFrame -> arg0
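Adding those component sizes up gives the fixed overhead of a vm call, before any stack syncing. Spelling out the arithmetic (the enum names are just labels for this breakdown, not code from the tree):

// Per-vmcall byte costs from the breakdown above (x86-64, as measured here).
enum {
    PcUpdateBytes = 15,  // regs.pc update
    SpUpdateBytes = 9,   // regs.sp update
    FpUpdateBytes = 5,   // regs.fp update
    MoveCallBytes = 12,  // move + call
    ArgSetupBytes = 3,   // move VMFrame -> arg0
    VMCallBytes   = PcUpdateBytes + SpUpdateBytes + FpUpdateBytes +
                    MoveCallBytes + ArgSetupBytes  // = 44 bytes per vm call
};
// Rough estimate: 6951788 bytes of vmcall sequences / 44 bytes per sequence
// is on the order of 158,000 emitted vm calls for this workload, assuming
// each sequence is exactly the 44-byte fixed cost.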
Can we look at computing pc for those (few) native methods and accessors that do bytecode inspection using some side mapping from mjit-generated eip to bytecode pc? Does our debugger support already have something like this? /be
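One sketch of what such a side mapping could look like (PCMap/PCMapEntry are made-up names for illustration; the debugger's existing tables may be organized differently): record sorted (native offset, bytecode offset) pairs at compile time, then binary-search them with the vmcall's return address instead of paying 15 bytes per call to keep regs.pc in sync.

#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical side table: one entry per mapped point, sorted by nativeOffset.
struct PCMapEntry {
    uint32_t nativeOffset;    // offset into the script's jitcode
    uint32_t bytecodeOffset;  // offset into the script's bytecode
};

struct PCMap {
    std::vector<PCMapEntry> entries;  // kept sorted by nativeOffset

    // Return the bytecode offset of the last mapped point at or before
    // nativeOffset (e.g. derived from the vmcall's return address).
    uint32_t lookup(uint32_t nativeOffset) const {
        auto it = std::upper_bound(entries.begin(), entries.end(), nativeOffset,
                                   [](uint32_t off, const PCMapEntry& e) {
                                       return off < e.nativeOffset;
                                   });
        return it == entries.begin() ? 0 : (it - 1)->bytecodeOffset;
    }
};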
Nice work, dvander!
Whiteboard: [MemShrink:P3]
JM was removed; Baseline shares IC stub code, and Ion generates much smaller code due to type information (and is only used for relatively hot code anyway).
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX