Bug 683804 (Closed): Opened 13 years ago, Closed 13 years ago

2x regression with JM+TI compared to JM+TM on the darkroom subtest of kraken

Categories

Product/Component: Core :: JavaScript Engine
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla10
Tracking Status: firefox10 + ---

People

(Reporter: bzbarsky, Assigned: bhackett1024)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

http://www.extremetech.com/computing/94532-firefox-9-javascript-performance-improved-by-20-30-with-type-inference has TI+JM at 2.32x as slow as JM+TM on darkroom. The difference between the two is about 250ms, or about 10% of total Kraken time.
From my reading of this article, it sounded like they may have used Nightly and just disabled Type Inference, which I presume has some different performance characteristics compared to TM+JM prior to the TI landing. I'm not really sure.
Unless the argument is that nightly with TI disabled is a lot faster than pre-landing TM+JM was on this test _and_ is faster than TI+JM on this test, comment 1 doesn't really make sense to me.... For what it's worth, http://arewefastyet.com/?a=b&view=breakdown shows the same thing: TI+JM is at 800ms while TM+JM+profiling is at about 400ms.
Oops! Yes, I was confused.
My machine, darkroom: Firefox 6: 234.9ms, Nightly: 551.1ms.
Whiteboard: js-triage-needed
I can confirm this as well with the Nightly of Sep. 8. Darkroom on Kraken 1.1: Firefox 6: 363.1, Nightly: 789.0. (No complaints though: overall score went from 10465 to 5479; a 90% improvement!) Besides this, Gaussian Blur is also slower: Firefox 6: 783.9, Nightly: 994.3.
> (No complaints though: overall score went from 10465 to 5479; a 90% improvement!)

I _have_ to comment on this. When computing change percentages, they're always percentages of the _old_ number. So this is a 48% improvement. A 90% improvement would have meant the score going to about 1047 (i.e. a 10x speedup).
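For concreteness, a minimal sketch of that arithmetic, using only the numbers already quoted above:

  var oldScore = 10465, newScore = 5479;
  var improvement = (oldScore - newScore) / oldScore;   // ~0.476, i.e. about 48%
  var ninetyPercentTarget = oldScore * (1 - 0.90);      // ~1047, which would be a 10x speedup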
According to Instruments we spend 10% of the time in TypeMonitorResult. I'll try to find out why. Gaussian Blur needs CSE/GVN, which is hard to do in JM, but IonMonkey will have this optimization.
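As an aside, a hedged illustration of the kind of redundancy CSE/GVN removes; this is not the actual Kraken gaussian-blur code, just a made-up pixel loop (processRow is hypothetical) showing the repeated index subexpression that value numbering could compute once:

  // hypothetical helper, for illustration only
  function processRow(data, width, y) {
    for (var x = 0; x < width; x++) {
      // without CSE/GVN, (y * width + x) * 4 is recomputed for every channel access;
      // a value-numbering pass could compute it once per iteration
      var r = data[(y * width + x) * 4];
      var g = data[(y * width + x) * 4 + 1];
      var b = data[(y * width + x) * 4 + 2];
      data[(y * width + x) * 4 + 3] = (r + g + b) / 3;
    }
  }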
OK, so the problem is that we call Math.log and Math.pow from inlined functions like this:

  function FastLog2(x) { return Math.log(x) / Math.LN2; }

The IC is disabled for the Math.log call because "we can't recompile them yet". If I comment out the f.regs.inlined() check (unsafe, of course) we are about 300 ms faster... Brian, how hard is it to fix this?
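To spell out the shape of the problem, a minimal sketch; darkroomLoop is a hypothetical caller (only FastLog2 comes from the comment above). Once FastLog2 is inlined into the caller, its Math.log call runs from an inline frame, where the native-call IC is currently disabled:

  function FastLog2(x) { return Math.log(x) / Math.LN2; }

  // hypothetical hot caller, for illustration only
  function darkroomLoop(pixels) {
    var sum = 0;
    for (var i = 0; i < pixels.length; i++)
      sum += FastLog2(pixels[i] + 1);   // FastLog2 is inlined here; the Math.log call
                                        // inside it then executes from an inline frame
    return sum;
  }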
I think the main issue is that when expanding inline frames we can kick a frame into the interpreter while retaining jitcode for the (outer) script making the call. This doesn't work right when the recompiler steals native stubs, as it will patch up jumps in the stub without fixing the jumps entering the code. It would need to make sure to reset the IC when stealing the stub, but this should not be hard. The same would also apply to getter ICs, for which recompilation works the same way as native ICs and which are also disabled for inline frames.
Fix allowing native and getter ICs to be generated for inline frames. Takes my darkroom time from 640 to 400 (-mjp is about 300). http://hg.mozilla.org/projects/jaegermonkey/rev/323595f354b1
Whiteboard: js-triage-needed → fixed-in-jaegermonkey
Whiteboard: fixed-in-jaegermonkey
This was causing crashes and didn't make it into Firefox 9. It also hurts some of the Peacekeeper tests, such as stringDetect.
Blocks: 691314
(In reply to Brian Hackett from comment #11)
> This was causing crashes and didn't make it into Firefox 9. It also hurts
> some of the Peacekeeper tests, such as stringDetect.

Just to make things clear: did the patch hurt stringDetect, or does not having the patch hurt stringDetect?
This patch helps stringDetect, which repeatedly calls RegExp.test outside of a loop. We inline the run() method call, and don't generate ICs at the test() calls. There is an easy fix here, which is to just mark as uninlineable any script we try to generate a native IC for: the time saved by inlining the scripted call is small compared to the cost incurred by not generating the IC. I don't know yet whether to do the easy fix or the 'real' fix, but given the botch when this initially landed, the easy fix is probably better. I'd like it if I never needed to touch the recompiler again.
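A hedged sketch of the shape described above; the patterns and surrounding code are made up, only the run()/test() structure comes from the comment. run() makes repeated straight-line RegExp.test calls and is itself inlined at its call site, so those test() calls execute from an inline frame and, without this fix, get no native IC:

  var detector = {
    run: function (s) {
      // repeated straight-line test() calls, per the description above; patterns are made up
      return /gecko/i.test(s) && /firefox/i.test(s) && !/seamonkey/i.test(s);
    }
  };

  function stringDetect(ua) {
    return detector.run(ua);   // run() is inlined here, so its test() calls
                               // execute from an inline frame
  }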
Attached patch easy fix (deleted) — Splinter Review
Easy fix described above: just mark functions as uninlineable when we try to generate native ICs for them (native calls handled via FastBuiltins don't affect inlining). Gives the same speedup on darkroom, and is a lot simpler.
Assignee: general → bhackett1024
Attachment #567592 - Flags: review?(dvander)
x86_64-pc-linux-gnu gcc-4.7.0 profiledbuild, before/after patch:

TEST                        COMPARISON          FROM                  TO                   DETAILS
====================================================================================
** TOTAL **:                1.074x as fast      4366.0ms +/- 0.6%     4064.5ms +/- 0.2%    significant
====================================================================================
ai:                         1.141x as fast      277.5ms +/- 4.1%      243.2ms +/- 1.4%     significant
   astar:                   1.141x as fast      277.5ms +/- 4.1%      243.2ms +/- 1.4%     significant
audio:                      1.095x as fast      1469.6ms +/- 1.3%     1341.9ms +/- 0.6%    significant
   beat-detection:          -                   339.9ms +/- 0.6%      340.0ms +/- 0.9%
   dft:                     1.29x as fast       584.1ms +/- 3.0%      453.7ms +/- 0.7%     significant
   fft:                     ??                  235.4ms +/- 1.5%      235.8ms +/- 1.2%
   oscillator:              ??                  310.2ms +/- 0.7%      312.4ms +/- 1.6%
imaging:                    1.067x as fast      1763.1ms +/- 0.7%     1652.3ms +/- 0.4%    significant
   gaussian-blur:           ??                  959.3ms +/- 0.2%      960.9ms +/- 0.4%
   darkroom:                1.29x as fast       500.6ms +/- 0.9%      386.6ms +/- 1.1%     significant
   desaturate:              ??                  303.2ms +/- 3.1%      304.8ms +/- 0.1%
json:                       1.050x as fast      133.5ms +/- 1.9%      127.2ms +/- 0.4%     significant
   parse-financial:         1.046x as fast      70.7ms +/- 1.7%       67.6ms +/- 0.5%      significant
   stringify-tinderbox:     1.054x as fast      62.8ms +/- 2.5%       59.6ms +/- 0.8%      significant
stanford:                   1.032x as fast      722.3ms +/- 0.8%      699.9ms +/- 0.3%     significant
   crypto-aes:              1.035x as fast      178.1ms +/- 1.5%      172.1ms +/- 0.7%     significant
   crypto-ccm:              1.032x as fast      124.1ms +/- 0.6%      120.2ms +/- 0.7%     significant
   crypto-pbkdf2:           -                   296.1ms +/- 3.1%      287.9ms +/- 0.5%
   crypto-sha256-iterative: -                   124.0ms +/- 4.7%      119.7ms +/- 0.6%

Firefox with patch / Chromium 16.0.904.0:

TEST                        COMPARISON          FROM                  TO                   DETAILS
====================================================================================
** TOTAL **:                1.163x as fast      4064.5ms +/- 0.2%     3495.5ms +/- 0.8%    significant
====================================================================================
ai:                         *1.81x as slow*     243.2ms +/- 1.4%      441.0ms +/- 0.7%     significant
   astar:                   *1.81x as slow*     243.2ms +/- 1.4%      441.0ms +/- 0.7%     significant
audio:                      1.151x as fast      1341.9ms +/- 0.6%     1165.5ms +/- 1.7%    significant
   beat-detection:          *1.081x as slow*    340.0ms +/- 0.9%      367.5ms +/- 1.7%     significant
   dft:                     1.28x as fast       453.7ms +/- 0.7%      355.1ms +/- 4.6%     significant
   fft:                     ??                  235.8ms +/- 1.2%      236.5ms +/- 1.5%
   oscillator:              1.51x as fast       312.4ms +/- 1.6%      206.4ms +/- 2.4%     significant
imaging:                    1.26x as fast       1652.3ms +/- 0.4%     1314.4ms +/- 1.2%    significant
   gaussian-blur:           3.32x as fast       960.9ms +/- 0.4%      289.0ms +/- 0.4%     significant
   darkroom:                *1.28x as slow*     386.6ms +/- 1.1%      494.0ms +/- 3.5%     significant
   desaturate:              *1.74x as slow*     304.8ms +/- 0.1%      531.4ms +/- 0.7%     significant
json:                       *1.23x as slow*     127.2ms +/- 0.4%      156.2ms +/- 1.1%     significant
   parse-financial:         *1.179x as slow*    67.6ms +/- 0.5%       79.7ms +/- 1.8%      significant
   stringify-tinderbox:     *1.28x as slow*     59.6ms +/- 0.8%       76.5ms +/- 1.8%      significant
stanford:                   1.67x as fast       699.9ms +/- 0.3%      418.4ms +/- 1.2%     significant
   crypto-aes:              1.70x as fast       172.1ms +/- 0.7%      101.5ms +/- 1.7%     significant
   crypto-ccm:              1.21x as fast       120.2ms +/- 0.7%      99.1ms +/- 3.3%      significant
   crypto-pbkdf2:           2.11x as fast       287.9ms +/- 0.5%      136.2ms +/- 1.3%     significant
   crypto-sha256-iterative: 1.47x as fast       119.7ms +/- 0.6%      81.6ms +/- 3.0%      significant
Attachment #567592 - Flags: review?(dvander) → review+
Status: NEW → RESOLVED
Closed: 13 years ago
OS: Mac OS X → All
Hardware: x86 → All
Resolution: --- → FIXED
Target Milestone: --- → mozilla10
Version: unspecified → Trunk