Closed Bug 497788 Opened 15 years ago Closed 3 years ago

[meta]Improve performance on Ben Galbraith's linked bubblemark

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: bzbarsky, Unassigned)

References

()

Details

(Keywords: meta, perf)

Attachments

(6 files, 1 obsolete file)

On the url in the url field, when set to 128 balls, we get about 23fps on my machine. Safari 4 gets closer to 80.

I profiled this on m-c, and the time split on a high level (ignoring the time spent on the URI-classifier thread, which seems to be 7% of the profiler hits) is:

22% in js_Interpret (we mostly fail to trace this testcase, at least in part because of non-stub getters/setters, but there are other issues too)
15% under js_MonitorLoopEdge, breaking down as:
    10% js_ExecuteTree (largely LeaveTree, JS_ArenaAllocate, BuildNativeStackFrame, etc).
    2% js_CheckEntryTypes
    1% Running jitted code
    2% various other small stuff (attempting to extend trees, etc)
10% under js_GetPropertyHelper, breaking down as:
    2% self
    4% js_FillPropertyCache
    4% js_LookupPropertyWithFlags (in that function, and under it in js_SearchScope)
8% under js_SetPropertyHelper, breaking down as:
    3.5% DOM SetTop
    3.5% DOM SetLeft
    1% js_LookupPropertyWithFlags and js_FillPropertyCache
6% under js_ValueToNumber, mostly calling js_strtod.
4% under js_FullTestPropertyCache
2% under js_SetProperty
2% under js_ValueToString (mostly calling js_NumberToString)
2% allocating jsdoubles
2% js_FindPropertyHelper
~5% in other smaller js things (js_NativeGet, js_fun_call, math_abs, js_GetProperty, etc).

Non-JS bits:
15% painting
4% style recomputation
2% reflow
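One plausible reading of the js_strtod and js_NumberToString entries above is that the benchmark keeps ball positions in the DOM as strings such as "12.5px", so every frame converts numbers to strings on write and strings back to numbers on read. The benchmark source is not quoted in this bug, so the sketch below is only an assumption about that pattern; `moveBall`, `readPosition`, and the plain-object `style` stand-in are hypothetical names, not code from the testcase.

```javascript
// Hypothetical stand-in for a DOM element's style object (assumption:
// the real testcase writes to element.style.left / element.style.top).
const style = { left: "0px", top: "0px" };

// Writing a position converts number -> string on every call.
function moveBall(style, x, y) {
  style.left = x + "px";
  style.top = y + "px";
}

// Reading a position back converts string -> number on every call.
function readPosition(style) {
  return { x: parseFloat(style.left), y: parseFloat(style.top) };
}

moveBall(style, 12.5, 40);
console.log(readPosition(style));
```

Keeping the numeric position in a plain JS property and only writing the string form to the style would avoid the string-to-number half of this round trip, though whether that applies to the real testcase is untested here.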
Attached file First js file for benchmark (deleted) —
Attached file Second js file for benchmark (deleted) —
Attached image Image for benchmark (deleted) —
Attached file HTML for benchmark: run me! (obsolete) (deleted) —
Oh, the attached JS differs from the original in one important way. In the first js file, this part:

    // process collisions
    for (i=0; i<_this._N; i++) {
      for (var j=i+1; j<_this._N; j++) {
        _this._ballsO[i].doCollide(_this._ballsO[j]);
      }
    }

the original testcase has just |j=i+1| (so j is a global variable).
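To make the difference concrete, here is a minimal sketch of the two loop forms. It is not the benchmark code: `withVar` and `withoutVar` are hypothetical names, and the bodies are built with the Function constructor only so that they run as non-strict code regardless of where the sketch is pasted.

```javascript
// Attached version: j is declared with var, so it is local to the function.
const withVar = new Function("n", `
  var calls = 0;
  for (var i = 0; i < n; i++) {
    for (var j = i + 1; j < n; j++) { calls++; }
  }
  return calls;
`);

// Original version: no var on j, so the assignment creates (or overwrites)
// a property named j on the global object.
const withoutVar = new Function("n", `
  var calls = 0;
  for (var i = 0; i < n; i++) {
    for (j = i + 1; j < n; j++) { calls++; }
  }
  return calls;
`);

console.log(withVar(4), "j" in globalThis);
console.log(withoutVar(4), "j" in globalThis);
```

Both functions do the same number of iterations; the only difference is where j lives, which is what matters for how the engine can optimize the inner loop.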
Depends on: 497789
Filed bug 497789 on the other thing that makes us fall off trace here, and do so on the O(N^2) part of the benchmark, which seems to dominate for N==128 per above profile data.
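For a sense of scale, the collision loop visits every unordered pair of balls once per frame, which is N(N-1)/2 calls. The sketch below just counts them; `collisionChecks` and `closedForm` are illustrative names, not part of the testcase.

```javascript
// Count the doCollide calls one frame would make for n balls,
// using the same i / j = i + 1 loop shape as the testcase.
function collisionChecks(n) {
  let checks = 0;
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      checks++;
    }
  }
  return checks;
}

// Closed form for the same count: n choose 2.
function closedForm(n) {
  return (n * (n - 1)) / 2;
}

console.log(collisionChecks(128), closedForm(128));
```

At N==128 that is 8128 pair checks per frame, compared with 128 balls to move, which fits the observation that this part dominates.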
Keywords: perf
Attached file CSS for benchmark (deleted) —
Attachment #382905 - Attachment is obsolete: true
So as an experiment, I tried just commenting out the guard that leads to the aborts in bug 497789. That helps a bit: fps goes from 23 to 50.... for the first 10-15 seconds. Then it collapses back to 23. And there are more "inner tree is trying to grow" aborts. Maybe the commenting out is just too hacky to really work...
Depends on: 498559
Depends on: 498562
Depends on: 498565
Depends on: 498579
This is a totally nonminimal JS shell version of the testcase. It still outputs fps, and gcs every so often just like the browser. This testcase gets 60fps or so until the first gc on m-c right now, then drops off to 45fps. With the patch in bug 497789 it goes up to 400fps or so... until the first gc. Then it drops to about 45fps. That's what comment 9 is about. I'll be filing a bug on that.
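The attached shell testcase is not reproduced here. As a rough sketch of the idea (run frames in a loop, report fps, gc every so often), something like the following would do; `measureFps`, its parameters, and the gc interval are all hypothetical. The SpiderMonkey shell provides a global `gc()`; elsewhere the call is skipped unless one is exposed.

```javascript
// Run `step` repeatedly for about `durationMs` milliseconds and return
// the achieved frames per second. Every `gcEveryFrames` frames, force a
// collection if the host exposes a global gc() function.
function measureFps(step, durationMs, gcEveryFrames) {
  const start = Date.now();
  let frames = 0;
  while (Date.now() - start < durationMs) {
    step();
    frames++;
    if (gcEveryFrames > 0 && frames % gcEveryFrames === 0 &&
        typeof gc === "function") {
      gc();
    }
  }
  const elapsedMs = Date.now() - start;
  return (frames * 1000) / elapsedMs;
}

// Usage: measure an empty frame for 50ms, collecting every 1000 frames.
console.log(measureFps(function () {}, 50, 1000));
```

Comparing the reported fps before and after the first forced collection is the kind of measurement that shows the drop described above.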
Depends on: 499866
OK, with the fix for bug 497789 (and all other pending patches for bugs blocking this one applied) the new profile looks like:

js-related stuff:
7% running jitted code, boxing and unboxing doubles, etc.
5% js_ValueToString
4% getting .style off nodes (JS-wrapping, unwrapping, etc). Slimwrapper might help.
2.5% interpreter time from js_fun_call. Might get better once we trace getters/setters.
1% other.
Total js-related: 20% or so. A big step up from the 68% in comment 0.

Non-js-related:
36% painting (see bug 498579, though it's not happening as much in this profile; compositor might help here)
15% setting style.top/left (at least 3/4 under DeclarationChanged)
10% style recomputation (see bug 479655)
4% reflow
3% js_LookupPropertyWithFlags

The remaining 10% or so looks like mostly profiler artifacts (dtrace_get_cpu_int_stack_top, I'm looking at you).
Quick update: We now hit 40fps on trunk here (compared to 28fps back in June). If I hack the JS to avoid bug 497789, we hit 68fps. Safari 4 on the same hardware is at 90fps on the original testcase; 95fps on the hacked one. Chrome is at 130fps, but cheating on the timeouts. With that hack, 15% of the time is spent in jit-generated code or libmozjs. Also some xpconnect-ish time around. So pretty similar to comment 11...
Depends on: 528208
QA Contact: general → brendan
In case it wasn't clear, the 68fps cap from comment 13 was due to bug 528208. With that fixed, and still with the hackaround for bug 497789 in place, m-c is at 100fps on my machine. If I drop the interval clamping in core to below 10ms the fps actually goes down; I think the timer thread is screwing that over somehow. We could try more balls to see whether we can get useful head-to-head with Chrome. I'll probably do that once bug 497789 lands and reprofile.
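The 100fps figure lines up with a 10ms clamp on timer intervals: if each timer callback draws one frame, the ceiling is 1000ms divided by the effective interval. A minimal sketch of that arithmetic, assuming one frame per callback and ignoring the time the callback itself takes (`maxFps` is a hypothetical helper, not a browser API):

```javascript
// Upper bound on fps when every frame is driven by a timer whose
// interval is clamped to at least clampMs milliseconds.
function maxFps(requestedIntervalMs, clampMs) {
  const effectiveMs = Math.max(requestedIntervalMs, clampMs);
  return 1000 / effectiveMs;
}

console.log(maxFps(0, 10));  // requested 0ms, clamped to 10ms
console.log(maxFps(20, 10)); // requested 20ms, clamp does not apply
```

This is only the theoretical ceiling; it does not model the timer-thread effect mentioned above, where lowering the clamp made the measured fps go down.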
OK, I redid a profile with the current patch in bug 497789 and the DOM's rate-limiting on timeouts removed (so we could go over the 100fps we were hitting). General breakdown:

js-related:
1% quickstubs glue setting style.top/left
9% js_NumberToStringWithBase
4.5% js_UnboxDouble
3.5% getting .style (unwrapping this, tearoffs, wrapping the decl)
1% js_ConcatStrings
0.5% js_BoxDouble
12% jit-generated code
1% js_ValueToString
0.4% in js_Interpret (yay!)
Total JS-related: 33%

non-js-related:
23.5% setting style.left/top (
14% processing restyles
3% reflow
21% painting
Total non-JS-related: 62%

The remaining 5% looks to mostly be cocoa widgetry stuff or something.

We'll likely win a few % here if painting moves to refresh driver; about 1/7 of the restyling time was happening off WillPaint.

In general, the time spent under the setTimeout call is about 58% of total, with JS accounting for a bit over half of that. In the style attr setting there's some COM stuff to be killed off; zwol is working on that.
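As a quick cross-check of the rounded totals, the listed percentages can simply be added up. The numbers below are copied from the breakdown above; the object and function names are illustrative only.

```javascript
// Percentages as listed in the profile breakdown.
const jsRelated = {
  quickstubsGlue: 1,
  numberToStringWithBase: 9,
  unboxDouble: 4.5,
  gettingStyle: 3.5,
  concatStrings: 1,
  boxDouble: 0.5,
  jitCode: 12,
  valueToString: 1,
  interpret: 0.4,
};

const nonJsRelated = {
  settingStyleLeftTop: 23.5,
  processingRestyles: 14,
  reflow: 3,
  painting: 21,
};

// Sum the values of one breakdown.
function total(breakdown) {
  return Object.values(breakdown).reduce((sum, pct) => sum + pct, 0);
}

console.log(total(jsRelated).toFixed(1), total(nonJsRelated).toFixed(1));
```

The sums come to 32.9 and 61.5, consistent with the 33% and 62% quoted, leaving roughly 5 to 6% for everything else.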
Oh, and when not running under the profiler, for the comment 15 setup we hit 133fps on my machine. Chrome hits about 148fps.
QA Contact: brendan → general
For what it's worth, current numbers for the comment 15 setup on my current machine are:

m-c: 175fps
Opera: 116fps
Chrome: 180fps
Assignee: general → nobody

Meta bug with all dependent bugs fixed. Shell test case is fast, too.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED