Closed Bug 877878 Opened 11 years ago Closed 9 years ago

Instrument Baseline generated code to collect branch profiles.

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 1209515

People

(Reporter: nbp, Assigned: wuwei)

References

Details

Attachments

(1 file, 13 obsolete files)

(deleted), patch
Details | Diff | Splinter Review
Branch Profiler collects the targets of conditional jumps by instrumenting SpiderMonkey engine. There are three potential instrumentation points in SpiderMonkey: the bytecode interpreter, IonMonkey and the baseline compiler. Instrumenting interpreter is the easiest way to get things work but the the data it collects is fragmented and less important, since "hot" JavaScript methods are not executed by interpreter. More importantly, Instrumenting interpreter will slow down scripts which are running only once. IonMonkey is another place that can insert profiling code and in fact it do has the PCCount interface for profiling, but this mechanism is activated in debug build only, since the overhead caused by profiling makes it unpractical. Branch Profiler leverages the Baseline Compiler to generate instrumented machine codes. It replaces the output of baseline compiler when it generates code for JSOP_IFEQ, JSOP_IFNE, and JSOP_TABLESWITCH bytecodes. Basic block counter and circular address buffer are two possible ways to store branch profiling data. The basic block counter solution keeps two counters for each branch, as well as the number of the targets the branch might jump. When a block jumps to another the correlate counter will be increased by 1. Then these frequencies will be transformed into probabilities and consumed by some transform passes. On the other hand, circular address buffer maintains a sequence of branch targets. Every time Branch Profiler is triggered it pushes the target address of the branch into buffer. The main benefit of this design is that not only frequencies of each branch can be calculated but also the relationships between jumps are able to be inferred, which may generate new possibilities for more sophisticated optimizations. In this project we choose the circular buffer as our preliminary implementation and allocates one circular buffer for each IonScript. We will switch to basic block counter later if the overhead caused by the circular buffer is high, or if we fail to find any benchmark which can take advantage of a more-clever way of inferring branches ordering. [*] https://github.com/lazyparser/gsoc2013/blob/master/proposal.md
In baseline compiler, js bytecodes are translated into asm directly, without building a control-flow graph. If we want to use block counters as our data structure, we need to know the total number of MIR blocks the jsscript has, which is not available currently.
(In reply to Wei Wu [:wuwei] from comment #1) > In baseline compiler, js bytecodes are translated into asm directly, without > building a control-flow graph. Except for loop-edges all jumps are generated before the jump-target. One way would be to annotate the map the jsbytecode with the block counter (maintain a mapping structure). We do not share the CFG, as we reconstruct it from the bytecode when we compile with IonMonkey. The loop edges will be annotated with source notes, see how we identify loops in IonBuilder.cpp (callers of IonBuilder::pushLoop).
Attached patch branch_counter.patch (obsolete) (deleted) — Splinter Review
The patch instruments the jitcode generated by the Baseline Compiler. Currently only JSOP_IFEQ and and JSOP_IFNE are instruemented. I uploaded it for early review. Here are the problems I did't solve yet: 1. How to directly access the address of counters. In this patch I use jsscript->baselineScript->branchCountersOffset_ + counterIndex to calculate the address of a counter, which is cumbersome and ugly. (I know how to use 'Label' family to 'patch' code, but I was not able to use them to get the address of counters.) 2. This patch keeps two counters for each branch. The other way is to allocate one counter for each block, then calcutate the probabilities of each branch target by solving equations. We might want to implement the 'one counter' solution, if the overhead of 'two counters' solution is high. 3. This patch might causes segmentation fault problem, and I have not found the reason yet.
Attachment #771718 - Attachment is obsolete: true
Comment on attachment 772094 [details] [diff] [review] block_counter.patch (Instruments JSOP_IFEQ and JSOP_IFNE only) Review of attachment 772094 [details] [diff] [review]: ----------------------------------------------------------------- So far, I cannot eye spot the SEGV. ::: js/src/ion/BaselineCompiler.cpp @@ +198,5 @@ > + for (size_t i = 0; i < blockCounterLabels_.length(); i++) { > + CodeOffsetLabel label = blockCounterLabels_[i].label; > + label.fixup(&masm); > + size_t bcEntry = blockCounterLabels_[i].bcEntry; > + BlockCounterEntry *bcEntryAddr = &(baselineScript->blockCounterEntry(bcEntry)); nit: no need for parentheses after '&'. @@ +814,5 @@ > if (!knownBoolean && !emitToBoolean()) > return false; > > + Label counterLabel; > + masm.branchTestBooleanTruthy(!branchIfTrue, R0, &counterLabel); I am not a big fan of duplicating conditions. ::: js/src/ion/BaselineJIT.h @@ +98,5 @@ > +{ > + uint32_t counter; > + jsbytecode *pc; > + BlockCounterEntry(jsbytecode *pc) > + :pc(pc),counter(0) nit: indent and add new lines. ::: js/src/ion/shared/BaselineCompiler-shared.h @@ +106,5 @@ > + BlockCounterEntry *allocateBlockCounterEntry(jsbytecode *pc) { > + if(!blockCounterEntries_.append(BlockCounterEntry(pc))) > + return NULL; > + BlockCounterEntry &bcEntry = blockCounterEntries_[blockCounterEntries_.length() - 1]; > + return &bcEntry; nit: return blockCounterEntries_.back();
Attachment #772094 - Flags: feedback+
(In reply to Nicolas B. Pierron [:nbp] from comment #5) > Comment on attachment 772094 [details] [diff] [review] > block_counter.patch (Instruments JSOP_IFEQ and JSOP_IFNE only) > @@ +814,5 @@ > > if (!knownBoolean && !emitToBoolean()) > > return false; > > > > + Label counterLabel; > > + masm.branchTestBooleanTruthy(!branchIfTrue, R0, &counterLabel); > > I am not a big fan of duplicating conditions. > If we want to avoid this kind of conditional jumps, the instrumentation code should be placed in front of each bytecode witch is the target of any jumps (a.k.a. basic block headers). In this case we need to analyze bytecodes and source notes to figure out this information before emitBody(). It might be reasonable to implement the analysis codes in BytecodeAnalysis class.
There is a 'jumpTarget' boolean property existing in BytecodeInfo struct, so we might able to reuse it directly.
Attached patch Emit one block counter for each jump target (obsolete) (deleted) — Splinter Review
Updates: 1. Map block counters to the offset of bytecodes instead of absoulte memory address. 2. Emit one block counter for each bytecode which is the target of any jumps. 3. Add two IonSpewer output for debugging. TODO && FIXME: 1. The SEGV bug remains in current patch despite that all jittests were passed (make check). Ask Nicolas [:nbp] and Kannan [:djvj] for help. 2. Avoid emitting counters for bytecodes which is target of jumps but not a header of basic block.
Comment on attachment 772094 [details] [diff] [review] block_counter.patch (Instruments JSOP_IFEQ and JSOP_IFNE only) Review of attachment 772094 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/ion/BaselineCompiler.cpp @@ +817,5 @@ > + Label counterLabel; > + masm.branchTestBooleanTruthy(!branchIfTrue, R0, &counterLabel); > + emitBlockCounter(pc + GET_JUMP_OFFSET(pc)); > + masm.bind(&counterLabel); > + The next branch might jump to a wrong location, for the value in register R0 was overwritten in emitBlockCounter(). It then caused JS_ASSERT() failed, which raised the SEGV.
Attached patch Dumping block counters when building MIRGraph. (obsolete) (deleted) — Splinter Review
This patch is for testing only. Just to make sure everything was correct. Dumps can be viewed and downloaded at: https://github.com/lazyparser/gsoc2013/tree/master/block_counter_dump/ You may want to use 'cat dumpFile | grep "block counter"' to filter out unrelated spews. Here is an example: $grep 'block counter' 3d-cube.js.counters.dump ... [BaselineOp] [block counter] Emitting a block counter for op @ 1021: loophead (lineno: 312 column: 42) [BaselineOp] [block counter] Emitting a block counter for op @ 1022: getgname (lineno: 312 column: 42) [BaselineOp] [block counter] Emitting a block counter for op @ 1144: loopentry (lineno: 312 column: 18) [BaselineOp] [block counter] Emitting a block counter for op @ 1170: getgname (lineno: 315 column: 2) [BaselineOp] [block counter] Emitting a block counter for op @ 1276: loophead (lineno: 319 column: 36) [BaselineOp] [block counter] Emitting a block counter for op @ 1277: callgname (lineno: 319 column: 36) [BaselineOp] [block counter] Emitting a block counter for op @ 1306: loopentry (lineno: 319 column: 18) [BaselineOp] [block counter] Emitting a block counter for op @ 1327: bindgname (lineno: 321 column: 2) [BaselineOp] [block counter] Emitting a block counter for op @ 1438: loophead (lineno: 326 column: 0) [BaselineOp] [block counter] Emitting a block counter for op @ 1439: getgname (lineno: 326 column: 4) [BaselineOp] [block counter] Emitting a block counter for op @ 1497: loopentry (lineno: 325 column: 9) [BaselineOp] [block counter] Emitting a block counter for op @ 1510: callgname (lineno: 328 column: 2) [BaselineOp] [block counter] Emitting a block counter for op @ 1559: loophead (lineno: 335 column: 0) [BaselineOp] [block counter] Emitting a block counter for op @ 1560: getgname (lineno: 335 column: 0) [BaselineOp] [block counter] Emitting a block counter for op @ 1588: loophead (lineno: 337 column: 6) [BaselineOp] [block counter] Emitting a block counter for op @ 1589: getlocal (lineno: 337 column: 6) [BaselineOp] [block counter] Emitting a block counter for op @ 1614: loopentry (lineno: 336 column: 20) [BaselineOp] [block counter] Emitting a block counter for op @ 1633: getlocal (lineno: 334 column: 32) [BaselineOp] [block counter] Emitting a block counter for op @ 1643: loopentry (lineno: 334 column: 18) [BaselineOp] [block counter] Emitting a block counter for op @ 1664: getlocal (lineno: 339 column: 0) [BaselineOp] [block counter] Emitting a block counter for op @ 1682: string (lineno: 340 column: 4) [BaselineOp] [block counter] Emitting a block counter for op @ 1718: stop (lineno: 340 column: 120) [BaselineScripts] [block counter] 22 block counters were emitted for script 3d-cube.js:258 (0x1f468d0) (lineno: 340 column: 120) ... [BaselineScripts] [block counter] Dump 22 block counters for 0x1f468d0 [BaselineScripts] [block counter] Op offset: 1021, count: 6 [BaselineScripts] [block counter] Op offset: 1022, count: 6 [BaselineScripts] [block counter] Op offset: 1144, count: 7 [BaselineScripts] [block counter] Op offset: 1170, count: 1 [BaselineScripts] [block counter] Op offset: 1276, count: 1014 [BaselineScripts] [block counter] Op offset: 1277, count: 1014 [BaselineScripts] [block counter] Op offset: 1306, count: 1016 [BaselineScripts] [block counter] Op offset: 1327, count: 1 [BaselineScripts] [block counter] Op offset: 1438, count: 9 [BaselineScripts] [block counter] Op offset: 1439, count: 9 [BaselineScripts] [block counter] Op offset: 1497, count: 10 [BaselineScripts] [block counter] Op offset: 1510, count: 1 [BaselineScripts] [block counter] Op offset: 1559, count: 9 [BaselineScripts] [block counter] Op offset: 1560, count: 9 [BaselineScripts] [block counter] Op offset: 1588, count: 36 [BaselineScripts] [block counter] Op offset: 1589, count: 36 [BaselineScripts] [block counter] Op offset: 1614, count: 45 [BaselineScripts] [block counter] Op offset: 1633, count: 9 [BaselineScripts] [block counter] Op offset: 1643, count: 10 [BaselineScripts] [block counter] Op offset: 1664, count: 1 [BaselineScripts] [block counter] Op offset: 1682, count: 0 [BaselineScripts] [block counter] Op offset: 1718, count: 1 ...
Here is a preliminary performance profiling result. The test platform has 8-cores (Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz), running Ubuntu 12.04 amd64. The profiling data may not be reliable, and I am doing more profiling now. Kraken on Firefox: TEST COMPARISON FROM TO DETAILS ==================================================================================== ** TOTAL **: - 1508.5ms +/- 1.8% 1502.0ms +/- 1.8% ==================================================================================== ai: - 96.2ms +/- 1.0% 95.3ms +/- 0.9% astar: - 96.2ms +/- 1.0% 95.3ms +/- 0.9% audio: ?? 565.2ms +/- 4.6% 567.8ms +/- 4.1% might be *1.005x as slow* beat-detection: ?? 154.6ms +/- 17.0% 159.2ms +/- 14.4% might be *1.030x as slow* dft: - 192.0ms +/- 0.8% 191.0ms +/- 0.4% fft: ?? 74.4ms +/- 0.9% 74.5ms +/- 2.4% might be *1.001x as slow* oscillator: - 144.2ms +/- 2.9% 143.1ms +/- 2.8% imaging: - 321.9ms +/- 0.6% 319.7ms +/- 0.9% gaussian-blur: - 105.8ms +/- 1.1% 105.6ms +/- 1.2% darkroom: 1.010x as fast 113.0ms +/- 0.4% 111.9ms +/- 0.5% significant desaturate: - 103.1ms +/- 2.0% 102.2ms +/- 2.2% json: ?? 78.6ms +/- 6.4% 80.5ms +/- 7.6% might be *1.024x as slow* parse-financial: - 34.4ms +/- 5.6% 33.5ms +/- 1.1% stringify-tinderbox: ?? 44.2ms +/- 12.1% 47.0ms +/- 12.9% might be *1.063x as slow* stanford: - 446.6ms +/- 1.9% 438.7ms +/- 1.4% crypto-aes: 1.062x as fast 198.9ms +/- 1.0% 187.3ms +/- 1.0% significant crypto-ccm: - 80.8ms +/- 7.9% 80.4ms +/- 7.7% crypto-pbkdf2: ?? 119.2ms +/- 6.9% 124.9ms +/- 5.8% might be *1.048x as slow* crypto-sha256-iterative: - 47.7ms +/- 5.7% 46.1ms +/- 4.6% SunSpider on Firefox: TEST COMPARISON FROM TO DETAILS =============================================================================== ** TOTAL **: - 150.8ms +/- 1.5% 150.8ms +/- 0.7% =============================================================================== 3d: ?? 24.2ms +/- 2.7% 24.3ms +/- 3.4% not conclusive: might be *1.004x as slow* cube: ?? 10.1ms +/- 4.0% 10.2ms +/- 3.0% not conclusive: might be *1.010x as slow* morph: ?? 4.0ms +/- 0.0% 4.1ms +/- 5.5% not conclusive: might be *1.025x as slow* raytrace: - 10.1ms +/- 4.0% 10.0ms +/- 4.8% access: - 13.8ms +/- 3.3% 13.4ms +/- 5.2% binary-trees: ?? 1.9ms +/- 11.9% 2.1ms +/- 10.8% not conclusive: might be *1.105x as slow* fannkuch: - 5.9ms +/- 6.9% 5.6ms +/- 6.6% nbody: - 2.7ms +/- 12.8% 2.7ms +/- 12.8% nsieve: - 3.3ms +/- 10.5% 3.0ms +/- 0.0% bitops: - 8.8ms +/- 9.2% 8.8ms +/- 5.1% 3bit-bits-in-byte: - 1.0ms +/- 0.0% 1.0ms +/- 0.0% bits-in-byte: - 2.8ms +/- 16.1% 2.7ms +/- 12.8% bitwise-and: ?? 1.7ms +/- 20.3% 1.8ms +/- 16.7% not conclusive: might be *1.059x as slow* nsieve-bits: - 3.3ms +/- 10.5% 3.3ms +/- 10.5% controlflow: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% recursive: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% crypto: - 13.6ms +/- 2.7% 13.5ms +/- 4.5% aes: - 6.9ms +/- 3.3% 6.7ms +/- 7.2% md5: ?? 3.7ms +/- 9.3% 3.8ms +/- 7.9% not conclusive: might be *1.027x as slow* sha1: - 3.0ms +/- 0.0% 3.0ms +/- 0.0% date: ?? 18.5ms +/- 2.0% 18.9ms +/- 2.1% not conclusive: might be *1.022x as slow* format-tofte: ?? 10.4ms +/- 3.5% 10.5ms +/- 3.6% not conclusive: might be *1.010x as slow* format-xparb: ?? 8.1ms +/- 2.8% 8.4ms +/- 4.4% not conclusive: might be *1.037x as slow* math: - 16.4ms +/- 3.0% 16.1ms +/- 4.4% cordic: - 2.3ms +/- 15.0% 2.2ms +/- 13.7% partial-sums: - 12.5ms +/- 3.0% 12.3ms +/- 2.8% spectral-norm: - 1.6ms +/- 23.1% 1.6ms +/- 23.1% regexp: - 9.2ms +/- 3.3% 9.2ms +/- 3.3% dna: - 9.2ms +/- 3.3% 9.2ms +/- 3.3% string: ?? 44.3ms +/- 1.3% 44.6ms +/- 1.5% not conclusive: might be *1.007x as slow* base64: - 5.2ms +/- 5.8% 5.2ms +/- 5.8% fasta: - 5.5ms +/- 6.8% 5.5ms +/- 6.8% tagcloud: ?? 12.8ms +/- 2.4% 13.1ms +/- 1.7% not conclusive: might be *1.023x as slow* unpack-code: - 16.0ms +/- 2.1% 16.0ms +/- 2.1% validate-input: - 4.8ms +/- 6.3% 4.8ms +/- 6.3% SunSpider on JSShell (runs=5000): TEST COMPARISON FROM TO DETAILS =============================================================================== ** TOTAL **: *1.003x as slow* 156.5ms +/- 0.0% 156.9ms +/- 0.0% significant =============================================================================== 3d: *1.002x as slow* 28.3ms +/- 0.1% 28.3ms +/- 0.1% significant cube: 1.001x as fast 11.9ms +/- 0.1% 11.9ms +/- 0.1% significant morph: *1.002x as slow* 3.8ms +/- 0.1% 3.8ms +/- 0.1% significant raytrace: *1.004x as slow* 12.6ms +/- 0.1% 12.7ms +/- 0.1% significant access: 1.002x as fast 12.9ms +/- 0.1% 12.9ms +/- 0.1% significant binary-trees: 1.002x as fast 1.9ms +/- 0.1% 1.9ms +/- 0.1% significant fannkuch: *1.002x as slow* 5.8ms +/- 0.1% 5.8ms +/- 0.1% significant nbody: *1.003x as slow* 2.6ms +/- 0.1% 2.6ms +/- 0.1% significant nsieve: 1.017x as fast 2.6ms +/- 0.1% 2.5ms +/- 0.1% significant bitops: 1.003x as fast 8.2ms +/- 0.1% 8.2ms +/- 0.1% significant 3bit-bits-in-byte: ?? 0.8ms +/- 0.1% 0.8ms +/- 0.2% not conclusive: might be *1.002x as slow* bits-in-byte: - 2.8ms +/- 0.1% 2.8ms +/- 0.1% bitwise-and: - 1.4ms +/- 0.1% 1.4ms +/- 0.1% nsieve-bits: 1.007x as fast 3.2ms +/- 0.1% 3.2ms +/- 0.1% significant controlflow: - 2.0ms +/- 0.1% 2.0ms +/- 0.1% recursive: - 2.0ms +/- 0.1% 2.0ms +/- 0.1% crypto: *1.003x as slow* 15.2ms +/- 0.1% 15.3ms +/- 0.1% significant aes: *1.005x as slow* 9.0ms +/- 0.1% 9.1ms +/- 0.1% significant md5: - 3.6ms +/- 0.1% 3.6ms +/- 0.1% sha1: *1.005x as slow* 2.6ms +/- 0.1% 2.6ms +/- 0.1% significant date: *1.003x as slow* 20.9ms +/- 0.1% 20.9ms +/- 0.1% significant format-tofte: *1.005x as slow* 10.1ms +/- 0.1% 10.2ms +/- 0.1% significant format-xparb: *1.002x as slow* 10.8ms +/- 0.1% 10.8ms +/- 0.1% significant math: - 12.3ms +/- 0.1% 12.3ms +/- 0.1% cordic: *1.003x as slow* 2.2ms +/- 0.1% 2.3ms +/- 0.1% significant partial-sums: - 8.5ms +/- 0.1% 8.5ms +/- 0.1% spectral-norm: *1.003x as slow* 1.6ms +/- 0.1% 1.6ms +/- 0.1% significant regexp: *1.002x as slow* 10.5ms +/- 0.0% 10.5ms +/- 0.0% significant dna: *1.002x as slow* 10.5ms +/- 0.0% 10.5ms +/- 0.0% significant string: *1.006x as slow* 46.2ms +/- 0.1% 46.5ms +/- 0.1% significant base64: *1.007x as slow* 4.8ms +/- 0.1% 4.8ms +/- 0.1% significant fasta: *1.003x as slow* 5.3ms +/- 0.1% 5.3ms +/- 0.1% significant tagcloud: *1.003x as slow* 13.1ms +/- 0.1% 13.1ms +/- 0.1% significant unpack-code: *1.008x as slow* 18.2ms +/- 0.1% 18.3ms +/- 0.1% significant validate-input: *1.009x as slow* 4.9ms +/- 0.1% 4.9ms +/- 0.1% significant
For early review.
Attachment #775146 - Flags: feedback?(nicolas.b.pierron)
Comment on attachment 775146 [details] [diff] [review] Attaching block profiles to MBasicBlocks, with testing code. Review of attachment 775146 [details] [diff] [review]: ----------------------------------------------------------------- This sounds good modulo the few remarks which are following … ::: js/src/ion/Ion.cpp @@ +959,5 @@ > > if (mir->shouldCancel("Renumber Blocks")) > return false; > > + if (!AttachBlockProfiles(graph)) OptimizeMIR is used both by OdinMoney and by IonMonkey, you might want to only do it in IonMonkey as OdinMonkey is an ahead-of-time compiler, which does not even run into the baseline compiler. ::: js/src/ion/IonAnalysis.cpp @@ +51,5 @@ > + BaselineScript *blscript = jsscript->baselineScript(); > + size_t numCounters = blscript->numBlockCounters(); > + > + IonSpew(IonSpew_Scripts, "[AttachBlockProfiles] jsscript(%p) blscript(%p) code(%p)", > + jsscript, blscript, code); You might want to have a new Spew channel for profile data. @@ +53,5 @@ > + > + IonSpew(IonSpew_Scripts, "[AttachBlockProfiles] jsscript(%p) blscript(%p) code(%p)", > + jsscript, blscript, code); > + > + jsbytecode *pc = block->pc(); Not all blocks have a pc, only blocks which have a resume points. @@ +57,5 @@ > + jsbytecode *pc = block->pc(); > + size_t i; > + for (i = 0; i < numCounters; i++) { > + BlockCounterEntry entry = blscript->blockCounterEntry(i); > + if (entry.pcOffset + code == pc) { As long as iterating linearly for each block is not performance issue, this should be fine. Otherwise, I guess the block counters can be sorted by pc. Also you can break early, and handle this case after the loop as you already do with the error message in the spew. @@ +66,5 @@ > + break; > + } > + } > + if (i >= numCounters) { > + IonSpew(IonSpew_Scripts, "[AttachBlockProfiles] Oops no counter for block(%zu)", block->id()); This case might happen when we inline some code, where we might for example insert an instruction to switch to one basic block or the other in function of the JSFunction pointer. ::: js/src/ion/MIRGraph.h @@ +496,5 @@ > + > + bool isWorthFiltered() { > + uint32_t scriptUseCount = info_.script()->getUseCount(); > + // TODO: Decide whether this block is worth to be filtered out. > + if (isBlockUseCountAvailable() && getBlockUseCount() < scriptUseCount / 10) This should be part of another bug, but ok.
Attachment #775146 - Flags: feedback?(nicolas.b.pierron) → feedback+
Here is a preliminary performance profiling result. The test platform has 8-cores (Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz), running Ubuntu 12.04 amd64. The IonMonkey was disabled (--no-ion). sunspider, 100 times, baseline vs. instrumented TEST COMPARISON FROM TO DETAILS =============================================================================== ** TOTAL **: - 352.9ms +/- 0.6% 352.5ms +/- 0.2% =============================================================================== 3d: - 48.8ms +/- 0.7% 48.5ms +/- 1.0% cube: ?? 19.0ms +/- 0.9% 19.0ms +/- 0.7% not conclusive: might be *1.002x as slow* morph: 1.028x as fast 13.5ms +/- 0.9% 13.1ms +/- 1.4% significant raytrace: ?? 16.4ms +/- 0.8% 16.4ms +/- 1.1% not conclusive: might be *1.002x as slow* access: *1.014x as slow* 57.1ms +/- 0.8% 57.9ms +/- 0.2% significant binary-trees: 1.025x as fast 10.2ms +/- 1.0% 9.9ms +/- 1.0% significant fannkuch: *1.035x as slow* 26.2ms +/- 0.8% 27.2ms +/- 0.2% significant nbody: 1.040x as fast 12.6ms +/- 1.4% 12.2ms +/- 0.2% significant nsieve: *1.074x as slow* 8.0ms +/- 0.9% 8.6ms +/- 0.6% significant bitops: *1.036x as slow* 52.1ms +/- 0.6% 54.0ms +/- 0.4% significant 3bit-bits-in-byte: *1.186x as slow* 7.0ms +/- 0.8% 8.4ms +/- 1.8% significant bits-in-byte: *1.045x as slow* 13.3ms +/- 0.7% 13.9ms +/- 0.6% significant bitwise-and: ?? 17.5ms +/- 0.8% 17.6ms +/- 0.3% not conclusive: might be *1.007x as slow* nsieve-bits: 1.010x as fast 14.4ms +/- 0.8% 14.2ms +/- 0.3% significant controlflow: 1.032x as fast 7.2ms +/- 0.8% 7.0ms +/- 0.2% significant recursive: 1.032x as fast 7.2ms +/- 0.8% 7.0ms +/- 0.2% significant crypto: 1.023x as fast 33.4ms +/- 1.0% 32.7ms +/- 0.4% significant aes: 1.021x as fast 18.1ms +/- 1.3% 17.7ms +/- 0.4% significant md5: 1.016x as fast 8.2ms +/- 1.0% 8.1ms +/- 0.8% significant sha1: 1.036x as fast 7.1ms +/- 1.2% 6.9ms +/- 0.5% significant date: - 33.8ms +/- 1.0% 33.7ms +/- 0.6% format-tofte: - 23.1ms +/- 1.1% 23.0ms +/- 0.6% format-xparb: - 10.7ms +/- 1.2% 10.7ms +/- 1.5% math: 1.016x as fast 40.0ms +/- 0.7% 39.4ms +/- 0.3% significant cordic: 1.041x as fast 19.5ms +/- 1.0% 18.7ms +/- 0.4% significant partial-sums: *1.012x as slow* 13.0ms +/- 0.9% 13.1ms +/- 0.7% significant spectral-norm: - 7.6ms +/- 1.1% 7.6ms +/- 0.2% regexp: 1.020x as fast 10.8ms +/- 0.8% 10.6ms +/- 0.2% significant dna: 1.020x as fast 10.8ms +/- 0.8% 10.6ms +/- 0.2% significant string: 1.013x as fast 69.6ms +/- 0.8% 68.7ms +/- 0.5% significant base64: - 8.0ms +/- 1.0% 7.9ms +/- 0.3% fasta: ?? 17.7ms +/- 1.0% 17.8ms +/- 0.8% not conclusive: might be *1.002x as slow* tagcloud: 1.019x as fast 17.3ms +/- 1.2% 17.0ms +/- 1.0% significant unpack-code: 1.023x as fast 17.5ms +/- 1.1% 17.1ms +/- 0.5% significant validate-input: - 9.0ms +/- 1.1% 8.9ms +/- 0.7%
Drive by comment. Since this information will only be used by IonMonkey, please disable instrumenting baseline scripts that we know will not run in IM. We also do this for usecount, see "ion/BaselineCompiler.cpp:440". Also discussed this with Nicolas and since instrumenting has some overhead, we might also want code that dynamically disables this profiling when IonBuilder is ran and the script is marked as uncompileable. Since baseline doesn't recompile, we currently don't do this for usecount.
Attachment #772094 - Attachment is obsolete: true
(In reply to Hannes Verschore [:h4writer] from comment #15) > Drive by comment. Since this information will only be used by IonMonkey, > please disable instrumenting baseline scripts that we know will not run in > IM. We also do this for usecount, see "ion/BaselineCompiler.cpp:440". Agree. Baseline scripts should not be instrumented if neither ionCompileable_ nor ionOSRCompileable_ is true. > Also discussed this with Nicolas and since instrumenting has some overhead, > we might also want code that dynamically disables this profiling when > IonBuilder is ran and the script is marked as uncompileable. Since baseline > doesn't recompile, we currently don't do this for usecount. Currently I don't know how to disable the counters dynamically. It might be possible to load and check a flag (boolean value) which indicates whether the profiling is necessary or not, but it introduces more overhead, unfortunately. Would you provide me some hints on it?
(In reply to Wei Wu [:wuwei UTC+8] from comment #16) > > Also discussed this with Nicolas and since instrumenting has some overhead, > > we might also want code that dynamically disables this profiling when > > IonBuilder is ran and the script is marked as uncompileable. Since baseline > > doesn't recompile, we currently don't do this for usecount. > > Currently I don't know how to disable the counters dynamically. It might be > possible to load and check a flag (boolean value) which indicates whether > the profiling is necessary or not, but it introduces more overhead, > unfortunately. Would you provide me some hints on it? The idea was to replace the code in-place by an unconditional relative jump code which does the increment of the counters. Discussing with sstangl, who made the instrumentation for doing a similar thing for the Incremental GC write barriers. He suggests that we land the patch without bothering with the performance of the baseline compiler (as it is between 2% - 10% slowdown when only the baseline is enabled)
Attached patch Toggle Branch Profiling (obsolete) (deleted) — Splinter Review
1. Do not emit block counters if neither 'ionCompileable_' nor 'ionOSRCompileable_' is true. 2. Add a jsshell command line option for toggling instrumentation on/off.
Attachment #776869 - Flags: feedback?(nicolas.b.pierron)
Attached patch Dynamic toggle branch profiling (obsolete) (deleted) — Splinter Review
Learning from SPS profiler related implementations.
Attachment #777098 - Flags: feedback?(nicolas.b.pierron)
Comment on attachment 773018 [details] [diff] [review] Emit one block counter for each jump target Review of attachment 773018 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/ion/BaselineCompiler.cpp @@ +356,5 @@ > + CodeOffsetLabel counterOffset = masm.movWithPatch(ImmWord(-1), addressReg); > + Address counterAddr(addressReg, BlockCounterEntry::offsetOfCounter()); > + masm.load32(counterAddr, counterReg); > + masm.add32(Imm32(1), counterReg); > + masm.store32(counterReg, counterAddr); This should just be: masm.add32(Imm32(1), counterAddr); The macroAssembler will make it into 1 instruction on x86 and x64.
Attached patch Simplify counter increment code (obsolete) (deleted) — Splinter Review
Eliminating explicit load/store instructions when incrementing a counter. (Thank you, Kannan :-) )
Attachment #777501 - Flags: feedback?(nicolas.b.pierron)
Attachment #777501 - Flags: feedback?(kvijayan)
Attachment #773018 - Flags: feedback?(nicolas.b.pierron)
Attachment #773018 - Flags: feedback?(nicolas.b.pierron) → feedback+
Comment on attachment 776869 [details] [diff] [review] Toggle Branch Profiling Review of attachment 776869 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/ion/BaselineCompiler.cpp @@ +343,5 @@ > > bool > BaselineCompiler::emitBlockCounter(jsbytecode *pc) > { > + if (!ionCompileable_ && !ionOSRCompileable_) what is the meaning of ionOSRCompileable? Can we still Ion compile without OSR? ::: js/src/ion/Ion.h @@ +181,5 @@ > > + // Whether baseline scripts are instrumented. > + // > + // Default: false > + bool baselineBranchProfiling; I will suggest on by default if the perf are not bad. If the preformances show a slow down, we can disable it in a followup patch.
Attachment #776869 - Flags: feedback?(nicolas.b.pierron) → feedback+
Comment on attachment 777098 [details] [diff] [review] Dynamic toggle branch profiling Review of attachment 777098 [details] [diff] [review]: ----------------------------------------------------------------- Sounds good. What is the overhead of the dummy comparisons for the first iterations (or after a bailout) in the baseline code?
Attachment #777098 - Flags: feedback?(nicolas.b.pierron) → feedback+
Comment on attachment 777501 [details] [diff] [review] Simplify counter increment code Review of attachment 777501 [details] [diff] [review]: ----------------------------------------------------------------- Fold that in the previous patch, there is no need to make a separated patch for this small modification.
Attachment #777501 - Flags: feedback?(nicolas.b.pierron)
Attachment #777501 - Flags: feedback?(kvijayan)
Attached patch Instrumentation and Propagation (obsolete) (deleted) — Splinter Review
All in one patch.
Attachment #773018 - Attachment is obsolete: true
Attachment #774034 - Attachment is obsolete: true
Attachment #775146 - Attachment is obsolete: true
Attachment #776869 - Attachment is obsolete: true
Attachment #777098 - Attachment is obsolete: true
Attachment #777501 - Attachment is obsolete: true
Attachment #780478 - Flags: review?(nicolas.b.pierron)
Comment on attachment 780478 [details] [diff] [review] Instrumentation and Propagation Review of attachment 780478 [details] [diff] [review]: ----------------------------------------------------------------- This looks good. Please address the comments and submit another patch for review & checkin. ::: js/src/ion/BaselineCompiler.cpp @@ +186,5 @@ > ImmWord(uintptr_t(-1))); > } > > + IonSpew(IonSpew_BranchProfiles, > + "[BaselineCompiler] Statistics %d counters emitted for script %s:%d (%p)", nit: Remove the brackets from the spew message. @@ +675,5 @@ > return Method_Error; > > + if (js_IonOptions.baselineBranchProfiling) { > + // Instrument all jump targets and the first opcode. > + if ( (pc == script->code || info->jumpTarget ) && !emitBlockCounter(pc)) nit: remove extra space between the 2 parentheses. ::: js/src/ion/BaselineJIT.h @@ +157,5 @@ > uint32_t pcMappingOffset_; > uint32_t pcMappingSize_; > > + uint32_t blockCounterOffset_; > + uint32_t blockCounters_; nit: the naming scheme here is that the size-argument mention if it is a number of Entries or the Size in bytes. Add "Entries" at the end of blockCounters. @@ +286,5 @@ > static size_t offsetOfFlags() { > return offsetof(BaselineScript, flags_); > } > + > + static size_t offsetOfBlockCounterOffset(){ nit: remove this function, it is not used. ::: js/src/ion/Ion.cpp @@ +976,5 @@ > return false; > > + if (js_IonOptions.baselineBranchProfiling && !AttachBranchProfiles(graph)) > + return false; > + IonSpewPass("Attach Branch Profiles"); You want any of these spew & checks only if the phase is enabled. if (js_IonOptions.baselineBranchProfiling) { if (!AttachBranchProfiles(graph)) … … } ::: js/src/ion/IonAnalysis.cpp @@ +50,5 @@ > +bool > +ion::AttachBranchProfiles(MIRGraph &graph) > +{ > + // Skip AsmJs MIRGraphs. > + if (isAsmJSCompilation(graph)) nit: In the comment, mention why we are skipping, such as "Skip AsmJS optimization as OdinMonkey is an ahead of time compiler and we do not have any profiled information." @@ +64,5 @@ > + > + jsbytecode *pc = block->pc(); > + > + // Not all blocks have a pc, only blocks which have a resume point. > + if (!pc) Hum … the comment is right about the resume point, but the condition check the block->pc() which is not the same as the resume point one. And this one must be set in all IonBuilder constructed block and after, except AsmJS. So I will try to assert that the pc is not NULL here. @@ +72,5 @@ > + // might assign the pc of previous opcode (JSOP_NOP or JSOP_GOTO) > + // to corresponding MBasicBlock. These mismatch can be corrected > + // by adding the length of the previous opcode. > + int loopHeaderOffset = 0; > + JSOp op = JSOp(*pc); move this inside the if brackets. @@ +75,5 @@ > + int loopHeaderOffset = 0; > + JSOp op = JSOp(*pc); > + if (block->isLoopHeader()) { > + if(op == JSOP_NOP || op == JSOP_GOTO) > + loopHeaderOffset = GetBytecodeLength(pc); replace that by: pc += GetBytecodeLength(pc); and remove the loopHeaderOffset variable. @@ +77,5 @@ > + if (block->isLoopHeader()) { > + if(op == JSOP_NOP || op == JSOP_GOTO) > + loopHeaderOffset = GetBytecodeLength(pc); > + else > + JS_ASSERT(op == JSOP_LOOPHEAD); replace the else by the update of op before doing this assertion. @@ +86,5 @@ > + // We can switch to binary search if necessary. > + size_t numCounters = blscript->numBlockCounters(); > + for (size_t i = 0; i < numCounters; i++) { > + BlockCounterEntry entry = blscript->blockCounterEntry(i); > + if (entry.pcOffset + code == pc + loopHeaderOffset) { Compute the block pcOffset before entering the loop instead of comparing with the sum every time. ::: js/src/ion/MIRGraph.h @@ +494,5 @@ > + uint32_t getBlockUseCount() { > + return blockUseCount_; > + } > + > + bool isWorthFiltered() { This function is not used in this patch, remove it. @@ +497,5 @@ > + > + bool isWorthFiltered() { > + uint32_t scriptUseCount = info_.script()->getUseCount(); > + // TODO: Decide whether this block is worth to be filtered out. > + if (isBlockUseCountAvailable() && getBlockUseCount() < scriptUseCount / 10) at first I would not try removing branches. ::: js/src/shell/js.cpp @@ +5366,5 @@ > " lsra: Linear Scan register allocation (default)\n" > " backtracking: Priority based backtracking register allocation\n" > " stupid: Simple block local register allocation") > + || !op.addStringOption('\0', "branch-profiling", "on/off", > + "Profile baseline generated codes (default: on, off to disable)") make it "off" by default until we get any optimization working on top of it.
Attachment #780478 - Flags: review?(nicolas.b.pierron) → feedback+
Blocks: 896783
Attached patch branchProfiling.patch (obsolete) (deleted) — Splinter Review
Fixed mentioned nits + rebased.
Attachment #783135 - Flags: review?(nicolas.b.pierron)
Attachment #780478 - Attachment is obsolete: true
Comment on attachment 783135 [details] [diff] [review] branchProfiling.patch Review of attachment 783135 [details] [diff] [review]: ----------------------------------------------------------------- This sounds good to me. Just a last modification, to avoid calling the annotating function when we can avoid it. ::: js/src/ion/BaselineCompiler.cpp @@ +199,5 @@ > + for (size_t i = 0; i < blockCounterLabels_.length(); i++) { > + CodeOffsetLabel label = blockCounterLabels_[i].label; > + label.fixup(&masm); > + size_t bcEntry = blockCounterLabels_[i].bcEntry; > + BlockCounterEntry *bcEntryAddr = &baselineScript->blockCounterEntry(bcEntry); This sounds weird to me to have 2 vectors with identical indexes just to split between data used only during the code generation and data used after the code generation, but this is the current design of the Baseline IC so I will not block this patch on it. I would have preferred to have a BaselineCompiler::BlockCounterEntry and a BaselineScript::BlockCounterEntry, and let the copyBlockCounterEntries do the conversion between the Compiler's structure to the Script's structure. (stripping the label) ::: js/src/ion/Ion.cpp @@ +986,5 @@ > > if (mir->shouldCancel("Renumber Blocks")) > return false; > > + if (js_IonOptions.baselineBranchProfiling) { Add "&& mir->compilingAsmJS()" in the condition, to avoid doing the spew and the check for asm js compilations. ::: js/src/ion/IonAnalysis.cpp @@ +55,5 @@ > +{ > + // Skip AsmJS optimization as OdinMonkey is an ahead of time compiler > + // and we do not have any profiled information. > + if (isAsmJSCompilation(graph)) > + return true; Use an assert instead of a if. Replace "Skip" by "Forbid".
Attachment #783135 - Flags: review?(nicolas.b.pierron)
Attachment #783135 - Flags: review?(kvijayan)
Attachment #783135 - Flags: review+
Comment on attachment 783135 [details] [diff] [review] branchProfiling.patch Review of attachment 783135 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/ion/BaselineCompiler.cpp @@ +215,5 @@ > if (cx->runtime()->spsProfiler.enabled()) > baselineScript->toggleSPS(true); > > + if (js_IonOptions.baselineBranchProfiling) > + baselineScript->toggleBlockCounters(true); This should also be done in case of failures of Ion compilations, to prevent the profiling overhead.
Blocks: 901221
Attached patch Disable profiling after ionmonkey failed. (obsolete) (deleted) — Splinter Review
Blocks: 906418
Scripts in bug757428.js entered ForbidCompilation() more than once with --baseline-eager option, which crashed toggleBlockCounters(). To void this we test the state of block counters before we toggle.
Attachment #785770 - Attachment is obsolete: true
Attached patch Update: Normalize counters (obsolete) (deleted) — Splinter Review
Update AttachBranchProfiles(), trying to normalize counter values.
Attachment #793852 - Attachment is obsolete: true
Attachment #798635 - Flags: feedback?(nicolas.b.pierron)
Attachment #798635 - Attachment description: branchProfiling.patch → Update: Normalize counters
Comment on attachment 798635 [details] [diff] [review] Update: Normalize counters Review of attachment 798635 [details] [diff] [review]: ----------------------------------------------------------------- I cannot see anything related to the normalization, did you upload the right patch, or can you split the new stuff from the old one.
Comment on attachment 798635 [details] [diff] [review] Update: Normalize counters Review of attachment 798635 [details] [diff] [review]: ----------------------------------------------------------------- Cancel the review (appears in my review list), see last answer.
Attachment #798635 - Flags: feedback?(nicolas.b.pierron)
Comment on attachment 783135 [details] [diff] [review] branchProfiling.patch Canceling review because I'm just not able to get to this ATM.. especially considering that it's already been reviewed.
Attachment #783135 - Flags: review?(kvijayan)
Attached patch BranchProfiling.patch (deleted) — Splinter Review
Fix several issues. 1. Fixed a bug related with loopheader: when a basic block is loop header, and its pc points to JSOP_GOTO, we should use the counter of JSOP_LOOPENTRY that the JSOP_GOTO jumps to. 2. A basic block might be divided into two blocks because of function inlining. The first block (bb1) inherits the pc of the original basic block, and hence has a counter. The second block (bb2) is created with a new pc which is not a jump target, so it doesn't have any counters available. In this case we find bb1 for bb2 by iterating the dominators of bb2, and assign bb1's profile to bb2. 3. 'FunctionDispatch' and 'TypeObjectDispatch' generates multiple basic blocks before and after inlined functions, and a return block as their post dominator. This return block has no counter available. In this case, it should use its dominator's counter. 4. Two blocks are generated for JSOP_TRY during MIRGraph building with different bytecode addresses (pc). The pc of the first block is the address of JSOP_TRY, which has a counter available. The pc of second block is the address of the bytecode right after the JSOP_TRY, which is not a jump target. In this case we copy the counter of JSOP_TRY to the second block. 5. Things get complicated when the lastIns of a basic block is 'TableSwitch': 1) There might be more than one split blocks, and 2) one pc might be shared between multiple successors, which makes the result incorrect. This issue haven't been fixed yet, and I will update the patch once I fix it.
Attachment #783135 - Attachment is obsolete: true
Attachment #798635 - Attachment is obsolete: true
Attachment #808575 - Flags: feedback?(nicolas.b.pierron)
Comment on attachment 808575 [details] [diff] [review] BranchProfiling.patch Review of attachment 808575 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/Ion.h @@ +200,5 @@ > + // Whether baseline scripts are instrumented. > + // > + // Default: false > + bool baselineBranchProfiling; > + nit: trailing whitespace. ::: js/src/jit/IonAnalysis.cpp @@ +181,5 @@ > + // we should not copy the counter from its dominator. > + for (; i < block->numPredecessors(); i++) > + if (resumePoint == block->getPredecessor(i)->callerResumePoint()) > + break; > + if (i == block->numPredecessors()) { These Test should be moved to the previous condition. It is weird to read something like: if (…) { … some predicate code … if (predicate is true) { … some specialized code … } } it would be easier to read: … some predicate code … if (… && predicate is true) { … some specialized code … continue; } The same remarks apply for the other conditions. @@ +200,5 @@ > + block->id(), pcOffset); > + } > + } > + } > + // 'FunctionDispatch' and 'TypeObjectDispatch' generates multiple nit: Add a new line before this comment.
Attachment #808575 - Flags: feedback?(nicolas.b.pierron) → feedback+
(In reply to Wei Wu [:wuwei UTC+8] from comment #36) > Fix several issues. This is really nice that if we can distinguish them precisely, and have this in the code. Good work! How do you spot such cases, do you add an assertion to spot missing blocks?
(In reply to Nicolas B. Pierron [:nbp] from comment #38) > (In reply to Wei Wu [:wuwei UTC+8] from comment #36) > > Fix several issues. > > This is really nice that if we can distinguish them precisely, and have this > in the code. > Good work! > > How do you spot such cases, do you add an assertion to spot missing blocks? I added some ionspews and assertions to catch these missing or wrong matched blocks, and analyzed them case by case. These scaffolds were removed before submitting.
The function branch profiler was finished that time, but never landed since I failed to gain significant performance progress in subsequent optimization passes. I'd like to find out whether this profiler is still useful, and retrofit it if necessary.
Flags: needinfo?(lazyparser)
This work should no longer be necessary, as when --ion-pgo=on is given to the JS shell, the basic blocks are annotated with the number of time they got visited out-side IonMonkey. (see Bug 1209515) I think we can mark this bug as a duplicate of Bug 1209515.
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(lazyparser)
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: