Closed Bug 568933 Opened 14 years ago Closed 6 years ago

Add coarse performance counters to VM for benchmarking

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

Milestone:

Future

People

(Reporter: edwsmith, Unassigned)

References

Details

(Whiteboard: has-patch)

Attachments

(1 file, 1 obsolete file)

add "blops" metric 14 years ago Edwin Smith (deleted), patch, lhansen: feedback+
refined as "metric steps" 14 years ago Edwin Smith (deleted), patch

Edwin Smith

Reporter

Description

•

14 years ago

This elaborates on an idea for benchmark tracking: instead of having the test print a performance value, let the VM do it (analogous to using a profiler instead of instrumenting code). We already do this for memory, and it makes sense to do it for performance. A simple counter that is incremented at coarse control-flow boundaries would be independent of execution mechanism and optimization level, and it would be combined with a VM-level timer instead of a code-level one. Tests would still need to be somewhat deterministic, but would no longer need to measure themselves.

Edwin Smith

Reporter

Updated

•

14 years ago

Priority: -- → P1

Target Milestone: --- → flash10.2

Edwin Smith

Reporter

Comment 1

•

14 years ago

Attached patch: add "blops" metric (obsolete) (deleted)

First prototype. It keeps two counters: call_count and loop_count.

call_count is incremented once per call, in the prolog. This includes native methods.

loop_count is incremented:
* once per branch, if the branch direction is backwards, whether or not it is taken.
* once per OP_lookupswitch, regardless of case or direction.

The VM reports "blops" (blocks per second) as a loose measure, computed as:

(call_count + loop_count) / ms / 10.0

It is simpler for the interpreter to increment loop_count at the point of the branch rather than at the loop header, since loop headers are not identified in ABC. It's simpler for the JIT to increment loop_count unconditionally just before a conditional back-branch. For switch, it's simpler to just increment loop_count always; it's expensive for the JIT to add increment code only to the subset of edges that are backwards (code on the edge? table lookup?).

In testing, call_count isn't identical for JIT vs. interpreter; I don't know why yet. I suspect inlined cast expressions: T(expr).

The patch also modifies runtest.py to use "metric blops" if found. Indeed, we can now run sunspider and v8 side by side and get believable higher-is-better scores, while ignoring all test output and only using VM-generated metrics.

Issues:
- I'm measuring about a 3% counter overhead cost on boids.as. That's higher than I'd like, but in principle we can live with it.
- Need to debug call_count.
- Looping inside a native method is not accounted for.
- There is no tie-in with vprof or dtrace, and there probably should be (for perf and gc, arguably), or even a simple universal telemetry mechanism. (Channelling Lars here...)

Design question: should we just count loops and calls and do all the analysis after the fact (outside the VM)? On the one hand, I LIKE having the VM generate a single number as input to various scripts. On the other hand, we'll get more and more metrics over time, and we'll be doing post-game analysis anyway.
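The counting rules and the blops formula in this comment can be modeled roughly as follows. This is only an illustrative Python sketch, not code from the patch: the class name `BlopsCounter`, its hook methods, and the use of plain integer offsets for branch positions are all assumptions made for the example.

```python
class BlopsCounter:
    """Toy model of the two counters (all names here are hypothetical)."""

    def __init__(self):
        self.call_count = 0
        self.loop_count = 0

    def on_call(self):
        # Once per call, in the method prolog (native methods included).
        self.call_count += 1

    def on_branch(self, pc, target):
        # Once per backward branch, whether or not the branch is taken.
        # Assumption: "backward" means the target offset is not past the branch.
        if target <= pc:
            self.loop_count += 1

    def on_lookupswitch(self):
        # Once per OP_lookupswitch, regardless of case or direction.
        self.loop_count += 1

    def blops(self, elapsed_ms):
        # (call_count + loop_count) / ms / 10.0, as given in the comment.
        return (self.call_count + self.loop_count) / elapsed_ms / 10.0


counter = BlopsCounter()
counter.on_call()                          # entering one method
for _ in range(1000):
    counter.on_branch(pc=40, target=12)    # loop back-edge, counted each time
counter.on_branch(pc=50, target=80)        # forward branch, not counted
counter.on_lookupswitch()
print(counter.call_count, counter.loop_count, counter.blops(elapsed_ms=5.0))
```

Because the counters depend only on calls and back-edges, the same program should produce the same counts under the interpreter and the JIT, which is the property the metric relies on.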

Assignee: nobody → edwsmith

Status: NEW → ASSIGNED

Attachment #448087 - Flags: feedback?(lhansen)

Lars T Hansen

Comment 2

•

14 years ago

Comment on attachment 448087: add "blops" metric

I like it, and I believe in what you write. I think it's most useful to have the shell print a single metric; if there's a need for something else, add a command-line switch to select a different format. All of this needs to be conditionalized, obviously: 3% is too much overhead for production runs in the Flash Player. For the GC, there's MMGC_POLICY_PROFILING, which is on by default in the shell and off by default in the player (5% overhead).
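On the harness side, consuming a single shell-printed metric could look something like the sketch below. It is not the runtest.py change from the patch. It assumes the shell prints lines of the form `metric <name> <value>`, and the function names `parse_metrics` and `pick_metric`, as well as the `time` fallback, are hypothetical.

```python
import re

# Assumed line format: "metric <name> <value>"; the real shell output may differ.
METRIC_LINE = re.compile(r"^metric\s+(\S+)\s+([-+]?\d+(?:\.\d+)?)\s*$")


def parse_metrics(output):
    """Collect every 'metric <name> <value>' line from captured VM output."""
    metrics = {}
    for line in output.splitlines():
        match = METRIC_LINE.match(line.strip())
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


def pick_metric(output, preferred="blops", fallback="time"):
    """Prefer the VM-generated metric; fall back to a test-printed one."""
    metrics = parse_metrics(output)
    for name in (preferred, fallback):
        if name in metrics:
            return name, metrics[name]
    return None  # no recognizable metric line in the output


sample = "some test chatter\nmetric time 1234\nmetric blops 20.04\n"
print(pick_metric(sample))
```

Everything else the test prints is ignored, so a test no longer has to measure itself for the harness to get a score.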

Attachment #448087 - Flags: feedback?(lhansen) → feedback+

Edwin Smith

Reporter

Comment 3

•

14 years ago

I think we don't want to count builtin code, for two reasons:

1. In ad-hoc testing I find I can reduce the counter overhead significantly by not counting in native methods. Probably this is because it is expensive to access AvmCore. Maybe we should use a static counter instead of one on AvmCore, but read on.
2. For native methods, we can only get the call count, not the loop count. For builtin AS3, we can get both, but if we change builtin code from C++ to AS3 or back (e.g. by making changes to AS3.push()), we will affect the metric, and we probably don't want to.

Edwin Smith

Reporter

Comment 4

•

14 years ago

Attached patch: refined as "metric steps" (deleted)

Changes since last patch:
* conditionalized
* merged call_count and loop_count as "step_count"
* count backedges whether taken or not
* count all OP_lookupswitch (too messy to worry about which path taken)
* do not count builtin code (AS3 or native)
* rebased to TR tip

These changes reduced metric overhead in boids to within noise.

Open issues:
- print metric to stdout?
- need command-line switch to enable metric? (should all VM metrics be under one switch?)
- use static memory instead of AvmCore? (one metric for whole process, or one per VM instance?)
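The refined scheme can be modeled in the same illustrative way. Again, this is a Python sketch under stated assumptions rather than the patch itself: `StepCounter` and the `is_builtin` parameter are hypothetical, and the runtime `enabled` flag merely stands in for "conditionalized", which in the VM may well be a build-time switch.

```python
class StepCounter:
    """Toy model of the merged counter (class and parameter names are hypothetical)."""

    def __init__(self, enabled=False):
        self.enabled = enabled   # stand-in for the conditionalized build
        self.step_count = 0

    def _step(self, is_builtin):
        # Builtin code (AS3 or native) is never counted.
        if self.enabled and not is_builtin:
            self.step_count += 1

    def on_call(self, is_builtin=False):
        self._step(is_builtin)

    def on_backedge(self, is_builtin=False):
        # Counted whether or not the branch is taken.
        self._step(is_builtin)

    def on_lookupswitch(self, is_builtin=False):
        # Counted regardless of which path is taken.
        self._step(is_builtin)


steps = StepCounter(enabled=True)
steps.on_call()
steps.on_call(is_builtin=True)    # builtin call, ignored
for _ in range(3):
    steps.on_backedge()
steps.on_lookupswitch()
print(steps.step_count)
```

Keeping one counter per `StepCounter` instance corresponds to the "one per VM instance" option in the open issues; a module-level counter would correspond to the "one metric for whole process" option.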

Attachment #448087 - Attachment is obsolete: true

Rick Reitmaier

Comment 5

•

14 years ago

Nice, and I'm starting to wonder if it makes sense to look at a more dynamic/comprehensive means of generating this data. We'll probably need something similar in the future as input for JIT policy decisions, and having a generic mechanism that handles both use cases would be nice. Although you could argue that the effort/engineering required for the jit policy effort is much higher and suitably involved that having this patch available now is more valuable.

Edwin Smith

Reporter

Comment 6

•

14 years ago

(In reply to comment #5)
> Although you could argue that the effort/engineering required for the jit
> policy effort is much higher and suitably involved that having this patch
> available now is more valuable.

Exactly :-) If we end up adding more counters and a general mechanism seems worthwhile, I'm all for it. Not a blocker, though.

Chris Peyer

Updated

•

14 years ago

Blocks: 572860

Edwin Smith

Reporter

Updated

•

14 years ago

Whiteboard: has-patch

Edwin Smith

Reporter

Updated

•

14 years ago

Assignee: edwsmith → nobody

Trevor Baker

Updated

•

14 years ago

Flags: flashplayer-bug-

Whiteboard: has-patch → has-patch, must-fix-candidate

Andre Kruetzfeldt

Updated

•

14 years ago

Depends on: 645018

Dan Smith

Updated

•

13 years ago

Flags: flashplayer-qrb+

Flags: flashplayer-injection-

Target Milestone: Q3 11 - Serrano → Q4 11 - Anza

Dan Smith

Comment 7

•

13 years ago

This still seems like a useful idea and measurement. Perhaps the blops measurement simply turns into a Telemetry measurement.

Dan Smith

Updated

•

13 years ago

Whiteboard: has-patch, must-fix-candidate → has-patch

Dan Smith

Comment 8

•

13 years ago

Retargeting to future as this has lingered for so long. Ed, retarget if you feel it is needed for HM.

Priority: P1 → --

Target Milestone: Q4 11 - Anza → Future

Sylvestre Ledru [:Sylvestre]

Comment 9

•

6 years ago

No assignee, updating the status.

Status: ASSIGNED → NEW

Sylvestre Ledru [:Sylvestre]

Comment 10

•

6 years ago

No assignee, updating the status.

Sylvestre Ledru [:Sylvestre]

Comment 11

•

6 years ago

No assignee, updating the status.

Sylvestre Ledru [:Sylvestre]

Comment 12

•

6 years ago

No assignee, updating the status.

Sylvestre Ledru [:Sylvestre]

Comment 13

•

6 years ago

Tamarin is a dead project now. Mass WONTFIX.

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → WONTFIX

Sylvestre Ledru [:Sylvestre]

Comment 14

•

6 years ago

Tamarin isn't maintained anymore. WONTFIX remaining bugs.
