Closed
Bug 568933
Opened 14 years ago
Closed 6 years ago
Add coarse performance counters to VM for benchmarking
Categories
(Tamarin Graveyard :: Virtual Machine, defect)
Tamarin Graveyard
Virtual Machine
Tracking
(Not tracked)
RESOLVED
WONTFIX
Future
People
(Reporter: edwsmith, Unassigned)
References
Details
(Whiteboard: has-patch)
Attachments
(1 file, 1 obsolete file)
(deleted),
patch
|
Details | Diff | Splinter Review |
This elaborates on an idea for benchmark tracking. instead of having the test print a performance value, let the VM do it. (analogous to using a profiler instead of instrumenting code).
We already do this for memory, and it makes sense to do it for performance.
A simple counter that is incremented at coarse control flow boundaries would be independent of execution mechanism and optimization level. Combined with a vm-level timer instead of code-level. Tests would still need to be somewhat deterministic, but no longer need to measure themselves.
Reporter | ||
Updated•14 years ago
|
Priority: -- → P1
Target Milestone: --- → flash10.2
Reporter | ||
Comment 1•14 years ago
|
||
First prototype. keeps two counters:
call_count
loop_count
call count is incremented once per call, in the prolog. this includes native methods.
loop_count is incremented:
* once per branch, if the branch direction is backwards, whether or not it is taken.
* once per OP_lookupswitch, regardless of case or direction
the VM reports "blops" (blocks per second) as a loose measure, computed as:
(call_count + loop_count) / ms / 10.0
It is simpler for the interpreter to increment loop_count at the point of the branch rather than the loop header, since loop headers are not identified in ABC. Its simpler for the JIT to increment loop_count unconditionally just before a conditional back-branch.
For switch, its simpler to just increment loop_count always. Its expensive for the jit to add increment code to the subset of edges that are backwards (code on edge? table lookup?).
in testing, call_count isn't identical for jit vs interp -- i dont know why yet. I suspect inlined cast expressions: T(expr).
the patch also modifies runtest.py to use "metric blops" if found. indeed, we can now run sunspider and v8 side by side and get beleivable higher-is-better scores, while ignoring all test output and only using vm-generated metrics.
issues
- i'm measuring about a 3% counter overhead cost on boids.as. that's higher than i'd like, but in principle we can live with it.
- need to debug call_count.
- looping inside native method is not accounted for
- there is no tie-in with vprof or dtrace, and there probably should be (for perf and gc, arguably), or even a simple universal telemetry mechanism. (channelling Lars here...)
design question
- should we just count loops and calls and do all the analysis after the fact (outside the vm?). on the one hand, i LIKE having the vm generate a single # as input to various scripts. on the other hand, we'll get more and more metrics over time, and we'll be doing post-game analysis anyway.
Comment 2•14 years ago
|
||
Comment on attachment 448087 [details] [diff] [review]
add "blops" metric
I like it, and I believe in what you write. I think it's most useful to have the shell print a single metric; if there's need for something else add a command line switch to select a different format.
All of this needs to be conditionalized, obviously - 3% is too much overhead for production runs in the Flash Player. For the GC, there's MMGC_POLICY_PROFILING which is on by default in the shell and off by default in the player (5% overhead).
Attachment #448087 -
Flags: feedback?(lhansen) → feedback+
Reporter | ||
Comment 3•14 years ago
|
||
I think we dont want to count builtin code, for two reasons:
1. In ad-hoc testing I find I can reduce the counter overhead significantly by not counting in native methods. Probably this is because it is expensive to access AvmCore. Maybe we should use a static counter instead of one on AvmCore, but read on:
2. For native methods, we can only get the call count, not the loop count. For builtin AS3, we can get both, but if we change builtin code from C++ to AS3 or back, we will affect the metric and we probably don't want to. e.g. by making changes to AS3.push().
Reporter | ||
Comment 4•14 years ago
|
||
Changes since last patch
* conditionalized
* merged call_count and loop_count as "step_count"
* count backedges whether taken or not
* count all OP_lookupswitch (too messy to worry about which path taken)
* do not count builtin code (as3 or native)
* rebased to TR tip
these changes reduced metric overhead in boids to within noise.
open issues:
- print metric to stdout?
- need commandline switch to enable metric? (should all vm metrics be under one switch?)
- use static mem isnstead of AvmCore? (one metric for whole process, or one per vm instance?)
Attachment #448087 -
Attachment is obsolete: true
Comment 5•14 years ago
|
||
Nice, and I'm starting to wonder if it makes sense look at a more dynamic/comprehensive means of generating this data.
We'll probably need something similar in future as input for jit policy decisions and having a generic mechansim that handles both use-cases would be nice.
Although you could argue that the effort/engineering required for the jit policy effort is much higher and suitably involved that having this patch available now is more valuable.
Reporter | ||
Comment 6•14 years ago
|
||
(In reply to comment #5)
> Although you could argue that the effort/engineering required for the jit
> policy effort is much higher and suitably involved that having this patch
> available now is more valuable.
exactly :-) If we end up adding more counters and a general mechanism seems worthwhile, i'm all for it. not a blocker tho.
Reporter | ||
Updated•14 years ago
|
Whiteboard: has-patch
Reporter | ||
Updated•14 years ago
|
Assignee: edwsmith → nobody
Updated•14 years ago
|
Flags: flashplayer-bug-
Whiteboard: has-patch → has-patch, must-fix-candidate
Flags: flashplayer-qrb+
Flags: flashplayer-injection-
Target Milestone: Q3 11 - Serrano → Q4 11 - Anza
This still seems like a useful idea and measurement. Perhaps the blops measurement simply turns into a Telemetry measurement.
Retargeting to future as this has lingered fo so long. Ed, retarget if you feel it is needed for HM.
Priority: P1 → --
Target Milestone: Q4 11 - Anza → Future
Comment 10•6 years ago
|
||
No assignee, updating the status.
Comment 11•6 years ago
|
||
No assignee, updating the status.
Comment 12•6 years ago
|
||
No assignee, updating the status.
Comment 13•6 years ago
|
||
Tamarin is a dead project now. Mass WONTFIX.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Comment 14•6 years ago
|
||
Tamarin isn't maintained anymore. WONTFIX remaining bugs.
You need to log in
before you can comment on or make changes to this bug.
Description
•