Closed
Bug 564580
Opened 14 years ago
Closed 14 years ago
Extend variable and tag tracking (redundant load elimination) across forward branches
Categories
(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P3)
Tamarin Graveyard
Baseline JIT (CodegenLIR)
Tracking
(Not tracked)
RESOLVED
FIXED
Q3 11 - Serrano
People
(Reporter: wmaddox, Assigned: wmaddox)
References
Details
Attachments
(5 files, 2 obsolete files)
The VarTracker class in CodegenLIR.cpp keeps attempts to keep track of the LIR expression holding the value of a variable (stack slot) at each point in the code, and likewise for runtime tags (tagTracker), allowing the code generator to avoid generating redundant loads. At present, the tracker state is reset at each label, thus subsequent loads may be eliminated only when the original load occurs earlier in the same superblock. Many labels are targets only of forward branches, however, and a simple non-iterative state merging process suffices to extend tracking over more complex acyclic regions of the CFG.
The attached patch implements such an extension.
Assignee | ||
Updated•14 years ago
|
Attachment #444243 -
Attachment is patch: true
Assignee | ||
Comment 1•14 years ago
|
||
A performance suite run shows many small gains and regressions.
The consistency with which the long-running jsbench tests show regressions is
particularly troublesome.
Questions for investigation:
1) Whether the extension results in better treatment of labels resulting from ABC-level control flow or not, will it mitigate the effect of labels introduced by more agressive inlining of code currently relegated to helper functions in the ABC->LIR translation?
2) Since the var tracker extends the lifetime of LIR expressions that are reused, it may increase register pressure. On the other hand, a spill/reload of a register holding the current value of a variable should reduce to a simple reload, as the spill itself may be elided. Regressions due to better var tracking point to potential for improving the register allocator.
Assignee | ||
Updated•14 years ago
|
Assignee: nobody → wmaddox
Flags: flashplayer-qrb+
Priority: -- → P3
Target Milestone: --- → flash10.2
Comment 2•14 years ago
|
||
A technique I use (borrowed from njn) is to diff verbose dumps of the LIR or assembly code using this script:
#/bin/bash
perl -p -e 's/[0-9a-fA-F]{4,}/....../g' | perl -p -e 's/^ [0-9]{1,}:/ ...:/'
(I call it 'scrub'). say you have before.log and after.log, generated using -Dverbose=verify,jit,opt:
diff -u <(scrub before.log) <(scrub after.log)
taking a look at what the LIR differences and ASM differences are is a good way to tell if the benchmarked differences are relevant. If the LIR or ASM changes are "good looking" then it can be a worthwhile change evn if the results are noisy or inconclusive.
Comment 3•14 years ago
|
||
correctness warning: If you skip exception edges in trackForwardEdges(), it is important that the target of any such edge have its tracker state completely nulled out. if a target is reachable both ways (exception and non-exception edge), then the state at the label would only reflect merges of the non-exception edges.
Comment 4•14 years ago
|
||
some more stable results from a linux machine, x86, core 2 duo, clock explicitly locked at 2.0ghz, run with "sudo nice -n -10".
outliers from this run: (todo: inspect LIR & asm for evidence of the patch helping/hurting)
v8.5/js/richards.as -2.6%
v8.5/typed/richards.as +2.1%
Updated•14 years ago
|
Attachment #444427 -
Attachment mime type: application/octet-stream → text/plain
Assignee | ||
Comment 5•14 years ago
|
||
Compare effect of VarTracker extension on baseline, and on inlined arithmetic without the extension.
All runs were 32-bit x86 on MacOS X.
Updated•14 years ago
|
Component: Virtual Machine → JIT Compiler (NanoJIT)
QA Contact: vm → nanojit
Assignee | ||
Comment 7•14 years ago
|
||
Rebased, and handle exception targets correctly.
Attachment #444243 -
Attachment is obsolete: true
Assignee | ||
Comment 8•14 years ago
|
||
Comment on attachment 468889 [details] [diff] [review]
Attachments Implement variable tracking across forward edges (v2)
This is a prerequisite for inlining of numeric ops.
Attachment #468889 -
Flags: review?(edwsmith)
Assignee | ||
Comment 9•14 years ago
|
||
This run shows the effect of the vartracker patch alone.
Assignee | ||
Updated•14 years ago
|
Attachment #471923 -
Attachment filename: vartrack.log → vartrack.log.txt
Attachment #471923 -
Attachment mime type: application/octet-stream → text/plain
Assignee | ||
Comment 10•14 years ago
|
||
Rebased.
Attachment #468889 -
Attachment is obsolete: true
Attachment #473725 -
Flags: review?(edwsmith)
Attachment #468889 -
Flags: review?(edwsmith)
Comment 11•14 years ago
|
||
Comment on attachment 473725 [details] [diff] [review]
Attachments Implement variable tracking across forward edges (v2, rebased)
Is there a real effect happening in this asmicro test?
closedvar-write-1 4996 4978 3919 3859.4 -21.6 -22.5 --
the other - and -- examples look like noise to me, or at least too small
to justify holding this up.
Some version of comment #3 should be inserted into the "meet" logic in
trackForwardEdge().
We should land this to avoid much future merge pain, but we also need
to assess the memory impact. new arrays of VarTracker state will be
arena-allocated. the size of the array is
2 * sizeof(pointer) * nvar * #forward targets
A very large method with many small diamonds would accumulate a lot of
vartracker state. We probably don't care, but we were also bitten by
extreme weird code, as evidenced by bug 565489.
If the size of the alloc1 arena became grossly large only due to
this dead data, we'd want to consider explicitly allocating vartracker
state at forward branches, and releasing it when the label is reached,
or maybe a simple stack of recyclable arrays of state (each allocation
of target.varTracker and target.tagTracker is the same size).
Attachment #473725 -
Flags: review?(edwsmith) → review+
Assignee | ||
Comment 12•14 years ago
|
||
Pushed to TR:
http://hg.mozilla.org/tamarin-redux/rev/a0700f2fb3ea
Assignee | ||
Comment 13•14 years ago
|
||
Backed out of TR:
http://hg.mozilla.org/tamarin-redux/rev/b63a3d7f2c44
The pach apparently tickles an x87 stack issue on non-SSE IA-32 platforms.
Example:
wmaddox-macpro:~ wmaddox$ tr-vartracker/obj/shell/avmshell -Dnosse tr-vartracker/test/acceptance/as3/Vector/indexof.abc Vector.indexOf()
Assertion failure: (_allocator.active[FST0] && _fpuStkDepth == -1) || (!_allocator.active[FST0] && _fpuStkDepth == 0) (/Users/wmaddox/tr-vartracker/nanojit/Assembler.cpp:366)
Trace/BPT trap
Assignee | ||
Comment 14•14 years ago
|
||
(In reply to comment #13)
> Backed out of TR:
>
> http://hg.mozilla.org/tamarin-redux/rev/b63a3d7f2c44
>
> The pach apparently tickles an x87 stack issue on non-SSE IA-32 platforms.
The following patch fixes this issue:
https://bugzilla.mozilla.org/attachment.cgi?id=476923&action=diff
Assignee | ||
Comment 15•14 years ago
|
||
Pushed to TR:
http://hg.mozilla.org/tamarin-redux/rev/9cbfa07f01f5
Leaving bug open for further investigation per comment #11.
Status: NEW → ASSIGNED
Comment 16•14 years ago
|
||
how about we create a new bug so we can account for this being done (enough) and close bug 563944 as a result as well (thats the blocker for pretty much all the inline-a-diamond projects).
Assignee | ||
Comment 17•14 years ago
|
||
Open work items moved to bug 600057. Closing per comment #16.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•14 years ago
|
Flags: flashplayer-bug-
You need to log in
before you can comment on or make changes to this bug.
Description
•