Bug 683804 (Closed): Opened 13 years ago, Closed 13 years ago

2x regression with JM+TI compared to JM+TM on the darkroom subtest of kraken

Categories

Product/Component: Core :: JavaScript Engine
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla10
Tracking Status: firefox10 + ---

People

(Reporter: bzbarsky, Assigned: bhackett1024)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

http://www.extremetech.com/computing/94532-firefox-9-javascript-performance-improved-by-20-30-with-type-inference has TI+JM at 2.32x as slow as JM+TM on darkroom. The difference between the two is about 250ms, or about 10% of total Kraken time.
From my reading of this article, it sounded like they may have used Nightly and just disabled Type Inference, which I presume has some different performance characteristics compared to TM+JM prior to the TI landing. I'm not really sure.
Unless the argument is that nightly with TI disabled is a lot faster than pre-landing TM+JM was on this test _and_ is faster than TI+JM on this test, comment 1 doesn't really make sense to me.... For what it's worth, http://arewefastyet.com/?a=b&view=breakdown shows the same thing: TI+JM is at 800ms while TM+JM+profiling is at about 400ms.
Oops! Yes, I was confused.
My machine, darkroom: Firefox 6: 234.9ms, Nightly: 551.1ms.
Whiteboard: js-triage-needed
I can confirm this as well with the Nightly of Sep. 8. Darkroom on Kraken 1.1: Firefox 6: 363.1, Nightly: 789.0. (No complaints though: overall score went from 10465 to 5479; a 90% improvement!) Besides this, Gaussian Blur is also slower: Firefox 6: 783.9, Nightly: 994.3.
> (No complaints though: overall score went from 10465 to 5479; a 90% improvement!)

I _have_ to comment on this. When computing change percentages, they're always percentages of the _old_ number. So this is a 48% improvement. A 90% improvement would have meant the score going to about 1047 (i.e. a 10x speedup).
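For concreteness, a minimal sketch of that arithmetic, using only the numbers already quoted above:

  var oldScore = 10465, newScore = 5479;
  var improvement = (oldScore - newScore) / oldScore;   // ~0.476, i.e. about 48%
  var ninetyPercentTarget = oldScore * (1 - 0.90);      // ~1047, which would be a 10x speedup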
According to Instruments we spend 10% of the time in TypeMonitorResult. I'll try to find out why. Gaussian Blur needs CSE/GVN, which is hard to do in JM, but IonMonkey will have this optimization.
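As an aside, a hedged illustration of the kind of redundancy CSE/GVN removes; this is not the actual Kraken gaussian-blur code, just a made-up pixel loop (processRow is hypothetical) showing the repeated index subexpression that value numbering could compute once:

  // hypothetical helper, for illustration only
  function processRow(data, width, y) {
    for (var x = 0; x < width; x++) {
      // without CSE/GVN, (y * width + x) * 4 is recomputed for every channel access;
      // a value-numbering pass could compute it once per iteration
      var r = data[(y * width + x) * 4];
      var g = data[(y * width + x) * 4 + 1];
      var b = data[(y * width + x) * 4 + 2];
      data[(y * width + x) * 4 + 3] = (r + g + b) / 3;
    }
  }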
OK, so the problem is that we call Math.log and Math.pow from inlined functions like this:

  function FastLog2(x) { return Math.log(x) / Math.LN2; }

The IC is disabled for the Math.log call because "we can't recompile them yet". If I comment out the f.regs.inlined() check (unsafe, of course) we are about 300 ms faster... Brian, how hard is it to fix this?
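To spell out the shape of the problem, a minimal sketch; darkroomLoop is a hypothetical caller (only FastLog2 comes from the comment above). Once FastLog2 is inlined into the caller, its Math.log call runs from an inline frame, where the native-call IC is currently disabled:

  function FastLog2(x) { return Math.log(x) / Math.LN2; }

  // hypothetical hot caller, for illustration only
  function darkroomLoop(pixels) {
    var sum = 0;
    for (var i = 0; i < pixels.length; i++)
      sum += FastLog2(pixels[i] + 1);   // FastLog2 is inlined here; the Math.log call
                                        // inside it then executes from an inline frame
    return sum;
  }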
I think the main issue is that when expanding inline frames we can kick a frame into the interpreter while retaining jitcode for the (outer) script making the call. This doesn't work right when the recompiler steals native stubs, as it will patch up jumps in the stub without fixing the jumps entering the code. It would need to make sure to reset the IC when stealing the stub, but this should not be hard. The same would also apply to getter ICs, for which recompilation works the same way as native ICs and which are also disabled for inline frames.
Fix allowing native and getter ICs to be generated for inline frames. Takes my darkroom time from 640 to 400 (-mjp is about 300). http://hg.mozilla.org/projects/jaegermonkey/rev/323595f354b1
Whiteboard: js-triage-needed → fixed-in-jaegermonkey
Whiteboard: fixed-in-jaegermonkey
This was causing crashes and didn't make it into Firefox 9. It also hurts some of the Peacekeeper tests, such as stringDetect.
Blocks: 691314
(In reply to Brian Hackett from comment #11)
> This was causing crashes and didn't make it into Firefox 9. It also hurts
> some of the Peacekeeper tests, such as stringDetect.

Just to make things clear: did the patch hurt stringDetect, or does not having the patch hurt stringDetect?
This patch helps stringDetect, which repeatedly calls RegExp.test outside of a loop. We inline the run() method call, and don't generate ICs at the test() calls. There is an easy fix here, which is to just mark as uninlineable any script we try to generate a native IC for: the time saved by inlining the scripted call is small compared to the cost incurred by not generating the IC. I don't know yet whether to do the easy fix or the 'real' fix, but given the botch when this initially landed, the easy fix is probably better. I'd like it if I never needed to touch the recompiler again.
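A hedged sketch of the shape described above; the patterns and surrounding code are made up, only the run()/test() structure comes from the comment. run() makes repeated straight-line RegExp.test calls and is itself inlined at its call site, so those test() calls execute from an inline frame and, without this fix, get no native IC:

  var detector = {
    run: function (s) {
      // repeated straight-line test() calls, per the description above; patterns are made up
      return /gecko/i.test(s) && /firefox/i.test(s) && !/seamonkey/i.test(s);
    }
  };

  function stringDetect(ua) {
    return detector.run(ua);   // run() is inlined here, so its test() calls
                               // execute from an inline frame
  }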
Attached patch easy fix (deleted) — Splinter Review
Easy fix described above: just mark functions as uninlineable when we try to generate native ICs for them (native calls handled via FastBuiltins don't affect inlining). Gives the same speedup on darkroom, and is a lot simpler.
Assignee: general → bhackett1024
Attachment #567592 - Flags: review?(dvander)
x86_64-pc-linux-gnu gcc-4.7.0 profiledbuild, before/after patch:

TEST                        COMPARISON          FROM                  TO                   DETAILS
====================================================================================
** TOTAL **:                1.074x as fast      4366.0ms +/- 0.6%     4064.5ms +/- 0.2%    significant
====================================================================================
ai:                         1.141x as fast      277.5ms +/- 4.1%      243.2ms +/- 1.4%     significant
   astar:                   1.141x as fast      277.5ms +/- 4.1%      243.2ms +/- 1.4%     significant
audio:                      1.095x as fast      1469.6ms +/- 1.3%     1341.9ms +/- 0.6%    significant
   beat-detection:          -                   339.9ms +/- 0.6%      340.0ms +/- 0.9%
   dft:                     1.29x as fast       584.1ms +/- 3.0%      453.7ms +/- 0.7%     significant
   fft:                     ??                  235.4ms +/- 1.5%      235.8ms +/- 1.2%
   oscillator:              ??                  310.2ms +/- 0.7%      312.4ms +/- 1.6%
imaging:                    1.067x as fast      1763.1ms +/- 0.7%     1652.3ms +/- 0.4%    significant
   gaussian-blur:           ??                  959.3ms +/- 0.2%      960.9ms +/- 0.4%
   darkroom:                1.29x as fast       500.6ms +/- 0.9%      386.6ms +/- 1.1%     significant
   desaturate:              ??                  303.2ms +/- 3.1%      304.8ms +/- 0.1%
json:                       1.050x as fast      133.5ms +/- 1.9%      127.2ms +/- 0.4%     significant
   parse-financial:         1.046x as fast      70.7ms +/- 1.7%       67.6ms +/- 0.5%      significant
   stringify-tinderbox:     1.054x as fast      62.8ms +/- 2.5%       59.6ms +/- 0.8%      significant
stanford:                   1.032x as fast      722.3ms +/- 0.8%      699.9ms +/- 0.3%     significant
   crypto-aes:              1.035x as fast      178.1ms +/- 1.5%      172.1ms +/- 0.7%     significant
   crypto-ccm:              1.032x as fast      124.1ms +/- 0.6%      120.2ms +/- 0.7%     significant
   crypto-pbkdf2:           -                   296.1ms +/- 3.1%      287.9ms +/- 0.5%
   crypto-sha256-iterative: -                   124.0ms +/- 4.7%      119.7ms +/- 0.6%

Firefox with patch / Chromium 16.0.904.0:

TEST                        COMPARISON          FROM                  TO                   DETAILS
====================================================================================
** TOTAL **:                1.163x as fast      4064.5ms +/- 0.2%     3495.5ms +/- 0.8%    significant
====================================================================================
ai:                         *1.81x as slow*     243.2ms +/- 1.4%      441.0ms +/- 0.7%     significant
   astar:                   *1.81x as slow*     243.2ms +/- 1.4%      441.0ms +/- 0.7%     significant
audio:                      1.151x as fast      1341.9ms +/- 0.6%     1165.5ms +/- 1.7%    significant
   beat-detection:          *1.081x as slow*    340.0ms +/- 0.9%      367.5ms +/- 1.7%     significant
   dft:                     1.28x as fast       453.7ms +/- 0.7%      355.1ms +/- 4.6%     significant
   fft:                     ??                  235.8ms +/- 1.2%      236.5ms +/- 1.5%
   oscillator:              1.51x as fast       312.4ms +/- 1.6%      206.4ms +/- 2.4%     significant
imaging:                    1.26x as fast       1652.3ms +/- 0.4%     1314.4ms +/- 1.2%    significant
   gaussian-blur:           3.32x as fast       960.9ms +/- 0.4%      289.0ms +/- 0.4%     significant
   darkroom:                *1.28x as slow*     386.6ms +/- 1.1%      494.0ms +/- 3.5%     significant
   desaturate:              *1.74x as slow*     304.8ms +/- 0.1%      531.4ms +/- 0.7%     significant
json:                       *1.23x as slow*     127.2ms +/- 0.4%      156.2ms +/- 1.1%     significant
   parse-financial:         *1.179x as slow*    67.6ms +/- 0.5%       79.7ms +/- 1.8%      significant
   stringify-tinderbox:     *1.28x as slow*     59.6ms +/- 0.8%       76.5ms +/- 1.8%      significant
stanford:                   1.67x as fast       699.9ms +/- 0.3%      418.4ms +/- 1.2%     significant
   crypto-aes:              1.70x as fast       172.1ms +/- 0.7%      101.5ms +/- 1.7%     significant
   crypto-ccm:              1.21x as fast       120.2ms +/- 0.7%      99.1ms +/- 3.3%      significant
   crypto-pbkdf2:           2.11x as fast       287.9ms +/- 0.5%      136.2ms +/- 1.3%     significant
   crypto-sha256-iterative: 1.47x as fast       119.7ms +/- 0.6%      81.6ms +/- 3.0%      significant
Attachment #567592 - Flags: review?(dvander) → review+
Status: NEW → RESOLVED
Closed: 13 years ago
OS: Mac OS X → All
Hardware: x86 → All
Resolution: --- → FIXED
Target Milestone: --- → mozilla10
Version: unspecified → Trunk