Closed
Bug 683804
Opened 13 years ago
Closed 13 years ago
2x regression with JM+TI compared to JM+TM on the darkroom subtest of kraken
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla10
Tracking | Status | |
---|---|---|
firefox10 | + | --- |
People
(Reporter: bzbarsky, Assigned: bhackett1024)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
(deleted),
patch
|
dvander
:
review+
|
Details | Diff | Splinter Review |
http://www.extremetech.com/computing/94532-firefox-9-javascript-performance-improved-by-20-30-with-type-inference has TI+JM at 2.32 times as slow. The difference between the two is about 250ms, or about 10% of total kraken time.
Comment 1•13 years ago
|
||
From my reading of this article, it sounded like they may have used Nightly and just disabled Type Inference, which I presume has some different performance characteristics compared to TM+JM prior to the TI landing. I'm not really sure.
Reporter | ||
Comment 2•13 years ago
|
||
Unless the argument is that nightly with TI disabled is a lot faster than pre-landing TM+JM was on this test _and_ is faster than TI+JM on this test, comment 1 doesn't really make sense to me....
For what it's worth, http://arewefastyet.com/?a=b&view=breakdown shows the same thing: TI+JM is at 800ms while TM+JM+profiling is at about 400ms.
Comment 3•13 years ago
|
||
Oops! Yes, I was confused.
Comment 4•13 years ago
|
||
My machine, darkroom:
Firefox 6 234.9
Nightly 551.1
Updated•13 years ago
|
Whiteboard: js-triage-needed
I can confirm this as well with Nightly of sep. 8:
Darkroom on Kraken 1.1:
Firefox 6: 363.1
Nightly: 789.0
(No complaints though: overall score went from 10465 to 5479; a 90% improvement!)
Beside this, Gaussian Blur is also slower:
Firefox 6: 783.9
Nightly: 994.3
Reporter | ||
Comment 6•13 years ago
|
||
> (No complaints though: overall score went from 10465 to 5479; a 90% improvement!)
I _have_ to comment on this. When computing change percentages, they're always percentages of the _old_ number. So this is a 48% improvement. A 90% improvement would have been the score going to 1047 (aka a 10x speedup).
Comment 7•13 years ago
|
||
According to Instruments we spend 10% of the time in TypeMonitorResult. I'll try to find out why..
Gaussian Blur needs CSE/GVN. Hard to do in JM but IonMonkey will have this optimization.
Comment 8•13 years ago
|
||
OK so the problem is that we call Math.log and Math.pow from inlined functions like this:
function FastLog2(x) {
return Math.log(x) / Math.LN2;
}
The IC is disabled for the Math.log call because "we can't recompile them yet". If I comment out the f.regs.inlined() check (unsafe of course) we are like 300 ms faster...
Brian, how hard is it to fix this?
Assignee | ||
Comment 9•13 years ago
|
||
I think the main issue is that when expanding inline frames we can kick a frame into the interpreter while retaining jitcode for the (outer) script making the call. This doesn't work right when the recompiler steals native stubs, as it will patch up jumps in the stub without fixing the jumps entering the code. It would need to make sure to reset the IC when stealing the stub, but this should not be hard. The same would also apply to getter ICs, for which recompilation works the same way as native ICs and which are also disabled for inline frames.
Assignee | ||
Comment 10•13 years ago
|
||
Fix allowing native and getter ICs to be generated for inline frames. Takes my darkroom time from 640 to 400 (-mjp is about 300).
http://hg.mozilla.org/projects/jaegermonkey/rev/323595f354b1
Whiteboard: js-triage-needed → fixed-in-jaegermonkey
Assignee | ||
Updated•13 years ago
|
Whiteboard: fixed-in-jaegermonkey
Assignee | ||
Comment 11•13 years ago
|
||
This was causing crashes and didn't make it into Firefox 9. It also hurts some of the Peacekeeper tests, such as stringDetect.
Blocks: 691314
Comment 12•13 years ago
|
||
(In reply to Brian Hackett from comment #11)
> This was causing crashes and didn't make it into Firefox 9. It also hurts
> some of the Peacekeeper tests, such as stringDetect.
Just to make things clear, did the patch hurt stringDetect, or does not having the patch hurt stringDetect?
Updated•13 years ago
|
status-firefox10:
--- → affected
tracking-firefox10:
--- → +
Assignee | ||
Comment 13•13 years ago
|
||
This patch helps stringDetect, which repeatedly calls RegExp.test outside of a loop. We inline the run() method call, and don't generate ICs at the test() calls.
There is an easy fix here which is to just mark as uninlineable scripts we try to generate native ICs for. The time saved by inlining the scripted call is small compared to the cost incurred by not generating the IC. I don't know yet whether to do the easy fix or the 'real' fix, but given the botch when this initially landed the easy fix is probably better. I'd like it if I never need to touch the recompiler again.
Assignee | ||
Comment 14•13 years ago
|
||
Easy fix described above, just mark functions as uninlineable when we try to generate native ICs for them (native calls handled via FastBuiltins don't affect inlining). Gives the same speedup to darkroom, and a lot simpler.
Assignee: general → bhackett1024
Assignee | ||
Updated•13 years ago
|
Attachment #567592 -
Flags: review?(dvander)
Comment 15•13 years ago
|
||
x86_64-pc-linux-gnu gcc-4.7.0 profiledbuild
before/after patch
TEST COMPARISON FROM TO DETAILS
====================================================================================
** TOTAL **: 1.074x as fast 4366.0ms +/- 0.6% 4064.5ms +/- 0.2% significant
====================================================================================
ai: 1.141x as fast 277.5ms +/- 4.1% 243.2ms +/- 1.4% significant
astar: 1.141x as fast 277.5ms +/- 4.1% 243.2ms +/- 1.4% significant
audio: 1.095x as fast 1469.6ms +/- 1.3% 1341.9ms +/- 0.6% significant
beat-detection: - 339.9ms +/- 0.6% 340.0ms +/- 0.9%
dft: 1.29x as fast 584.1ms +/- 3.0% 453.7ms +/- 0.7% significant
fft: ?? 235.4ms +/- 1.5% 235.8ms +/- 1.2%
oscillator: ?? 310.2ms +/- 0.7% 312.4ms +/- 1.6%
imaging: 1.067x as fast 1763.1ms +/- 0.7% 1652.3ms +/- 0.4% significant
gaussian-blur: ?? 959.3ms +/- 0.2% 960.9ms +/- 0.4%
darkroom: 1.29x as fast 500.6ms +/- 0.9% 386.6ms +/- 1.1% significant
desaturate: ?? 303.2ms +/- 3.1% 304.8ms +/- 0.1%
json: 1.050x as fast 133.5ms +/- 1.9% 127.2ms +/- 0.4% significant
parse-financial: 1.046x as fast 70.7ms +/- 1.7% 67.6ms +/- 0.5% significant
stringify-tinderbox: 1.054x as fast 62.8ms +/- 2.5% 59.6ms +/- 0.8% significant
stanford: 1.032x as fast 722.3ms +/- 0.8% 699.9ms +/- 0.3% significant
crypto-aes: 1.035x as fast 178.1ms +/- 1.5% 172.1ms +/- 0.7% significant
crypto-ccm: 1.032x as fast 124.1ms +/- 0.6% 120.2ms +/- 0.7% significant
crypto-pbkdf2: - 296.1ms +/- 3.1% 287.9ms +/- 0.5%
crypto-sha256-iterative: - 124.0ms +/- 4.7% 119.7ms +/- 0.6%
Firefox with patch/ Chromium 16.0.904.0
TEST COMPARISON FROM TO DETAILS
====================================================================================
** TOTAL **: 1.163x as fast 4064.5ms +/- 0.2% 3495.5ms +/- 0.8% significant
====================================================================================
ai: *1.81x as slow* 243.2ms +/- 1.4% 441.0ms +/- 0.7% significant
astar: *1.81x as slow* 243.2ms +/- 1.4% 441.0ms +/- 0.7% significant
audio: 1.151x as fast 1341.9ms +/- 0.6% 1165.5ms +/- 1.7% significant
beat-detection: *1.081x as slow* 340.0ms +/- 0.9% 367.5ms +/- 1.7% significant
dft: 1.28x as fast 453.7ms +/- 0.7% 355.1ms +/- 4.6% significant
fft: ?? 235.8ms +/- 1.2% 236.5ms +/- 1.5%
oscillator: 1.51x as fast 312.4ms +/- 1.6% 206.4ms +/- 2.4% significant
imaging: 1.26x as fast 1652.3ms +/- 0.4% 1314.4ms +/- 1.2% significant
gaussian-blur: 3.32x as fast 960.9ms +/- 0.4% 289.0ms +/- 0.4% significant
darkroom: *1.28x as slow* 386.6ms +/- 1.1% 494.0ms +/- 3.5% significant
desaturate: *1.74x as slow* 304.8ms +/- 0.1% 531.4ms +/- 0.7% significant
json: *1.23x as slow* 127.2ms +/- 0.4% 156.2ms +/- 1.1% significant
parse-financial: *1.179x as slow* 67.6ms +/- 0.5% 79.7ms +/- 1.8% significant
stringify-tinderbox: *1.28x as slow* 59.6ms +/- 0.8% 76.5ms +/- 1.8% significant
stanford: 1.67x as fast 699.9ms +/- 0.3% 418.4ms +/- 1.2% significant
crypto-aes: 1.70x as fast 172.1ms +/- 0.7% 101.5ms +/- 1.7% significant
crypto-ccm: 1.21x as fast 120.2ms +/- 0.7% 99.1ms +/- 3.3% significant
crypto-pbkdf2: 2.11x as fast 287.9ms +/- 0.5% 136.2ms +/- 1.3% significant
crypto-sha256-iterative: 1.47x as fast 119.7ms +/- 0.6% 81.6ms +/- 3.0% significant
Updated•13 years ago
|
Attachment #567592 -
Flags: review?(dvander) → review+
Assignee | ||
Comment 16•13 years ago
|
||
Comment 17•13 years ago
|
||
Status: NEW → RESOLVED
Closed: 13 years ago
OS: Mac OS X → All
Hardware: x86 → All
Resolution: --- → FIXED
Target Milestone: --- → mozilla10
Version: unspecified → Trunk
Updated•13 years ago
|
status-firefox10:
affected → ---
You need to log in
before you can comment on or make changes to this bug.
Description
•