590379 - IonMonkey: Investigate perf of realtime JS raytracer

At least on Mac, a profile looks like this:

10% free() calls on the background GC thread.
20% in JM-generated jit code.
26% js_GC called from stubs::Interrupt; of this, 20% is marking, including what looks like a bunch of arrays, and the rest is in js_GC itself. Almost half the marking is JSScopeProperty::trace.
18% calling stubs::SetName. About 2/3 of this is under js_GetMutableScope (mostly the malloc call, js_InitTitle, and self time) and the other 1/3 is in SetName itself.
8% calling stubs::New. Mostly self time and js_GetProperty.
3% calling stubs::Call.
2% calling stubs::NewInitObject.
1% canvas drawImage calls.

I also did an L2 miss profile, since that seemed like an obvious thing to check:

22% of our L2 misses were on the background GC thread.
32% of misses are in (not under) PL_DHashTableOperate called from GCGraphBuilder::NoteScriptChild under array_trace.
10% of misses are in ChangeTable, called from the above-mentioned PL_DHashTableOperate.
10% of misses are malloc calls under js_GetMutableScope from SetName.

Luke Wagner [:luke]

Comment 3

•

14 years ago

Might be worth retesting, in lieu of bug 558451 comment 154.

Luke Wagner [:luke]

Comment 4

•

14 years ago

err, "in view of". And Gregor already did, and he said the profile looked a lot different.

Boris Zbarsky [:bzbarsky]

Comment 5

•

14 years ago

Would be good to get the testcase attached to this bug so it won't go away...

Boris Zbarsky [:bzbarsky]

Comment 6

•

14 years ago

On TM, the patch linked to from comment 3 made us go from 1.5fps to 2.5fps. ;) Once that merges to JM, should remeasure and reprofile.

Gregor Wagner [:gwagner]

Comment 7

•

14 years ago

I filed bug 592007 because the GC heuristics don't work right now in the browser with the new Scope patch. That bug also includes a file showing GC times before and after the Scope patch.

Konstantin Novichikhin

Comment 8

•

14 years ago

I see the same issue on a Linux 4.0b7pre build (rev: f5c0015afe0e). OProfile results: http://pastebin.mozilla.org/795750

Boris Zbarsky [:bzbarsky]

Comment 9

•

14 years ago

OK, I just reprofiled (on m-c tip).

52% of the time is spent in methodjit-generated code.
24% is spent under gc (called from js_NewFinalizableGCThing).
8% is under stubs::NewObject but outside gc (js_GetPropertyHelper, self time for NewObject, etc).
The rest is a few things under 2% each (drawImage on the canvas, js_math_floor, stubs::SetElem<0>, RunTracer, stubs::NewInitObject, InitPropOrMethod, etc).

Interestingly, with tracer on we're at 14fps for a bit before dropping to 7fps; I profiled the 7fps steady state.

Jan de Mooij [:jandem]

Comment 11

•

14 years ago

Attached file Part of the problem (deleted) — Details

The raytracer creates canvas elements and uses them as image buffers. On the attached test case, JM is 2x slower than Chrome (TM is 5x faster than Chrome, though). For 10 frames, filling the buffer (the loop body) is 55 ms faster in Chrome (80 ms in JM vs 25 ms in Chrome). Canvas elements are also used for textures, so fixing bug 594247 will help JM here.

Jan de Mooij [:jandem]

Updated

•

14 years ago

Depends on: 594247

Jan de Mooij [:jandem]

Updated

•

14 years ago

Attachment #481918 - Attachment is patch: false

Jan de Mooij [:jandem]

Updated

•

14 years ago

Attachment #481918 - Attachment mime type: text/plain → text/html

Scoobidiver (away)

Comment 12

•

14 years ago

I can still reproduce this after the fix for bug 594247:
Minefield 4.0b13pre/2011023: 12 fps, hangs every second
Chrome 11.0.672.2: 27 fps

David Mandelin [:dmandelin]

Reporter

Comment 13

•

14 years ago

Rereprofiled on Win 7. Top stuff as a percentage of JS time:

GC (mark/sweep/etc): 40%
AddPropertyHelper/AddAtomProperty: 9%
Object::init: 7%
MathCache::lookup: 5%
DOUBLE_IS_INT32: 3%

It looks like nothing short of a new GC will make this run well.

Gregor Wagner [:gwagner]

Comment 14

•

14 years ago

Yes, this needs a generational GC. We only perform per-compartment GCs, but the set of reachable objects is still more than 3.1 million. We finalize about 4 million objects per GC, which accounts for about 30% of the total GC time. Moving the finalization to the background would help, but the marking is the expensive part here. This is a very good example of a workload where we create a fixed set of 3149831 long-lived objects and afterwards only create short-lived objects. The set of long-lived objects stays at exactly 3149831 for the whole animation time.
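
A rough sketch of the arithmetic behind this comment, in Python. The cost model (marking work proportional to reachable objects, sweeping work proportional to dead objects, a nursery collection touching only its survivors) is an illustrative assumption, not SpiderMonkey's actual implementation. The function names are hypothetical, and the sketch ignores write-barrier and remembered-set costs.

```python
from dataclasses import dataclass


@dataclass
class GcWork:
    """Objects touched by one collection, split by phase."""
    marked: int
    swept: int

    @property
    def total(self) -> int:
        return self.marked + self.swept


def full_collection(long_lived: int, dead: int) -> GcWork:
    # Assumed model of a non-generational mark/sweep: every reachable
    # object is marked and every dead object is finalized.
    return GcWork(marked=long_lived, swept=dead)


def minor_collection(nursery_survivors: int) -> GcWork:
    # Assumed model of a copying nursery collection: only survivors are
    # visited; dead nursery objects are reclaimed without per-object work.
    return GcWork(marked=nursery_survivors, swept=0)


LONG_LIVED = 3_149_831   # fixed long-lived set reported in the comment
DEAD_PER_GC = 4_000_000  # "about 4 million" objects finalized per GC

full = full_collection(LONG_LIVED, DEAD_PER_GC)
# Steady state per the comment: new objects are all short-lived, so
# (by assumption) nothing survives a nursery collection.
minor = minor_collection(nursery_survivors=0)

print(f"full GC touches  {full.total:,} objects")
print(f"minor GC touches {minor.total:,} objects")
```

Under these assumptions the long-lived set is re-marked on every full collection even though it never changes, which is the cost a generational collector would avoid.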

Ryan VanderMeulen [:RyanVM]

Updated

•

13 years ago

Blocks: GenerationalGC

Marco Castelluccio [:marco]

Updated

•

13 years ago

No longer blocks: GenerationalGC

Depends on: GenerationalGC

Till Schneidereit [:till]

Comment 15

•

11 years ago

Nothing earth-shatteringly new to report here. Current Chrome Canary gets between 90 and 120 fps (with the rotation looking anything but smooth, though). Current Nightly consistently gets above 50fps, so that's great. However, it's stuttering horribly, so the fps counter might be a bit broken. Looking forward to seeing how ggc is going to fare here.

Summary: JM: Investigate perf of realtime JS raytracer → IonMonkey: Investigate perf of realtime JS raytracer

Nobody; OK to take it and work on it

Assignee

Updated

•

10 years ago

Assignee: general → nobody

Guilherme Lima

Comment 16

•

10 years ago

On Ultra:
Chrome 39: 6 fps
Nightly: 8 fps

On Low:
Chrome 39: 160 fps
Nightly: 200 fps (but slows down every 3~4s)

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → WORKSFORME

Emanuel Hoogeveen [:ehoogeveen]

Comment 17

•

10 years ago

The periodic slowdown might be interesting from a GC point of view. From comment #14, we shouldn't be able to GC anything long-lived (but that doesn't necessarily mean we won't try). Terrence, worth a look?

Flags: needinfo?(terrence)

Terrence Cole [:terrence]

Comment 18

•

10 years ago

I looked at the realtime raytracer relatively recently. The test creates hundreds of MiB of garbage per second in the tenured heap: we need to find out why the nursery is so lightly used on this test. This in itself isn't too bad, as the incremental GCs appear to be mostly working okay, at least on my machine. However, it does expose a current problem with creating this much garbage consistently: we seem to be releasing and immediately reacquiring hundreds of MiB of chunk addresses every second because we have a 30MiB cap on empty chunks. We need to develop better limits here; however, that largely depends on getting better control of our GC triggers.
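
The release/reacquire pattern described above can be illustrated with a toy model in Python. Everything here is an assumption made for illustration: the `simulate` function is hypothetical, the 300 MiB churn per cycle stands in for the unspecified "hundreds of MiB", and the model treats each cycle as "GC frees the chunks, then the program needs the same amount again". It is not how the real chunk pool is implemented.

```python
def simulate(cap_mib: int, churn_mib: int, cycles: int) -> tuple[int, int]:
    """Return (MiB released to the OS, MiB reacquired from the OS).

    cap_mib:   maximum MiB of empty chunks kept in the pool
    churn_mib: MiB of chunks emptied by each GC and then needed again
    """
    pool = 0      # empty chunks currently retained
    released = 0  # total handed back to the OS
    acquired = 0  # total requested from the OS again
    for _ in range(cycles):
        # GC empties churn_mib worth of chunks; keep at most cap_mib.
        pool += churn_mib
        if pool > cap_mib:
            released += pool - cap_mib
            pool = cap_mib
        # The program immediately needs churn_mib of chunks again:
        # reuse what the pool holds, ask the OS for the rest.
        reused = min(pool, churn_mib)
        pool -= reused
        acquired += churn_mib - reused
    return released, acquired


small_cap = simulate(cap_mib=30, churn_mib=300, cycles=10)
large_cap = simulate(cap_mib=300, churn_mib=300, cycles=10)
print("30 MiB cap: ", small_cap)
print("300 MiB cap:", large_cap)
```

In this model, a cap well below the per-cycle churn means nearly all of the freed memory is released and then requested again each cycle, while a cap at or above the churn keeps it in the pool. A larger cap also means holding more idle memory, which is presumably why the right limit depends on the GC triggers.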

Terrence Cole [:terrence]

Updated

•

8 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1301213

Terrence Cole [:terrence]

Comment 19

•

8 years ago

I split this off as bug 1301213.

Flags: needinfo?(terrence)