Closed Bug 590379 Opened 14 years ago Closed 10 years ago

IonMonkey: Investigate perf of realtime JS raytracer

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: dmandelin, Unassigned)

References

()

Details

(Keywords: perf)

Attachments

(1 file)

See the URL. Chrome gets about 30fps on my machine. We don't go above 5fps.
I forgot to say that jandem originally reported this, and also observed we use a lot of memory running it, which may be related to the slowness.
Keywords: perf
At least on Mac, a profile looks like this: 10% free() calls on the background GC thread. 20% in JM-generated jit code. 26% js_GC called from stubs::Interrupt; of this 20% is marking, including what looks like a bunch of arrays, and the rest is in js_GC itself. Almost half the marking is JSScopeProperty::trace. 18% calling stubs::SetName. About 2/3 of this is under js_GetMutableScope (mostly the malloc call and js_InitTitle and self time) and the other 1/3 in SetName itself. 8% calling stubs::New. Mostly self time and, js_GetProperty. 3% calling stubs::Call. 2% calling stubs::NewInitObject. 1% canvas drawImage calls. I also did an L2 miss profile, since that seemed like an obvious thing to check: 22% of our L2 misses were on the background gc thread. 32% of misses are in (not under) PL_DHashTableOperate called from GCGraphBuilder::NoteScriptChild under array_trace. 10% of misses are in ChangeTable, called from the above-mentioned PL_DHashTableOperate. 10% of misses are malloc calls under js_GetMutableScope from SetName.
Might be worth retesting, in lieu of bug 558451 comment 154.
err, "in view of". And Gregor already did, and he said the profile looked a lot different.
Would be good to get the testcase attached to this bug so it won't go away...
On TM, the patch linked to from comment 3 made us go from 1.5fps to 2.5fps. ;) Once that merges to JM, should remeasure and reprofile.
I filed bug 592007 because the GC heuristics doesn't work right now in the browser with the new Scope patch. There is also a file included that shows the before and after Scope patch GC times.
I experience the same issue in linux 4.0b7pre build (rev: f5c0015afe0e) Oprofile results: http://pastebin.mozilla.org/795750
OK, I just reprofiled (on m-c tip). 52% of the time is spent in methodjit-generated code. 24% is spent under gc (called from js_NewFinalizableGCThing). 8% is under stubs::NewObject but outside gc (js_GetPropertyHelper, self time for NewObject, etc) The rest is a few things under 2% each (drawImage on the canvas, js_math_floor, stubs::SetElem<0>, RunTracer, stubs::NewInitObject, InitPropOrMethod, etc. Interestingly, with tracer on we're at 14fps for a bit before dropping to 7fps; I profiled the 7fps steady-state.
Attached file Part of the problem (deleted) —
The raytracer creates canvas elements and uses them as image buffers. On the attached test case JM is 2x slower than Chrome. (TM is 5x faster than Chrome though). For 10 frames, filling the buffer (the loop body) is 55 ms faster (80 vs 25) in Chrome. Canvas elements are also used for textures, so fixing bug 594247 will help JM here.
Depends on: 594247
Attachment #481918 - Attachment is patch: false
Attachment #481918 - Attachment mime type: text/plain → text/html
I can reproduce after the fixing of bug 594247: Minefield 4.0b13pre/2011023: 12 fps, hang every seconds Chrome 11.0.672.2 : 27 fps
Rereprofiled on Win 7. Top stuff as a percentage of JS time: GC (mark/sweep/etc) 40% AddPropertyHelper/AddAtomProperty 9 Object::init 7 MathCache::lookup 5 DOUBLE_IS_INT32 3 It looks like nothing short of a new GC will make this run well.
Yes this needs a generational GC. We only perform per-compartment GCs but the set of reachable objects is still more than 3.1 mill. We finalize about 4 mill objects per GC which is about 30% of the total GC time. Moving this to the background would help but the marking is the expensive part here. This is a very good example where we create the fixed set of 3149831 long lived objects and afterwards we only create short lived objects. The set of long lived objects stays exactly at 3149831 for the whole animation time.
No longer blocks: GenerationalGC
Depends on: GenerationalGC
Nothing earth-shatteringly new to report here. Current Chrome Canary gets between 90 and 120 fps (with the rotation looking everything but smooth, though). Current Nightly consistently gets above 50fps, so that's great. However, it's stuttering horribly, so the fps counter might be a bit broken. Looking forward to seeing how ggc's going to fare, here.
Summary: JM: Investigate perf of realtime JS raytracer → IonMonkey: Investigate perf of realtime JS raytracer
Assignee: general → nobody
On Ultra: Chrome 39: 6 fps Nightly: 8 fps On Low: Chrome 39: 160 fps Nightly: 200 fps (but slows down every 3~4s)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
The periodic slowdown might be interesting from a GC point of view. From comment #14, we shouldn't be able to GC anything long-lived (but that doesn't necessarily mean we won't try). Terrence, worth a look?
Flags: needinfo?(terrence)
I looked at the realtime raytracer relatively recently. The test creates 100's of MiB of garbage per second in tenured: we need to find out why the nursery is so lightly used on this test. This in itself isn't too bad as the incremetnal GC's appear to be mostly working okay, at least on my machine. However, this does expose a problem currently with creating this much garbage consistently: we seem to be releasing and immediately reacquiring 100's of MiB of chunk addresses every second because we have a 30MiB cap on empty chunks. We need to develop better limits here, however, this is largely dependent on getting better control of our GC triggers.
I split this off as bug 1301213.
Flags: needinfo?(terrence)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: