Closed
Bug 590379
Opened 14 years ago
Closed 10 years ago
IonMonkey: Investigate perf of realtime JS raytracer
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: dmandelin, Unassigned)
References
()
Details
(Keywords: perf)
Attachments
(1 file)
(deleted),
text/html
|
Details |
See the URL. Chrome gets about 30fps on my machine. We don't go above 5fps.
Reporter | ||
Comment 1•14 years ago
|
||
I forgot to say that jandem originally reported this, and also observed we use a lot of memory running it, which may be related to the slowness.
Comment 2•14 years ago
|
||
At least on Mac, a profile looks like this:
10% free() calls on the background GC thread.
20% in JM-generated jit code.
26% js_GC called from stubs::Interrupt; of this 20% is marking, including what
looks like a bunch of arrays, and the rest is in js_GC itself. Almost half
the marking is JSScopeProperty::trace.
18% calling stubs::SetName. About 2/3 of this is under js_GetMutableScope
(mostly the malloc call and js_InitTitle and self time) and the other 1/3 in
SetName itself.
8% calling stubs::New. Mostly self time and, js_GetProperty.
3% calling stubs::Call.
2% calling stubs::NewInitObject.
1% canvas drawImage calls.
I also did an L2 miss profile, since that seemed like an obvious thing to check:
22% of our L2 misses were on the background gc thread.
32% of misses are in (not under) PL_DHashTableOperate called from
GCGraphBuilder::NoteScriptChild under array_trace.
10% of misses are in ChangeTable, called from the above-mentioned
PL_DHashTableOperate.
10% of misses are malloc calls under js_GetMutableScope from SetName.
Comment 3•14 years ago
|
||
Might be worth retesting, in lieu of bug 558451 comment 154.
Comment 4•14 years ago
|
||
err, "in view of". And Gregor already did, and he said the profile looked a lot different.
Comment 5•14 years ago
|
||
Would be good to get the testcase attached to this bug so it won't go away...
Comment 6•14 years ago
|
||
On TM, the patch linked to from comment 3 made us go from 1.5fps to 2.5fps. ;)
Once that merges to JM, should remeasure and reprofile.
Comment 7•14 years ago
|
||
I filed bug 592007 because the GC heuristics doesn't work right now in the browser with the new Scope patch. There is also a file included that shows the before and after Scope patch GC times.
Comment 8•14 years ago
|
||
I experience the same issue in linux 4.0b7pre build (rev: f5c0015afe0e)
Oprofile results:
http://pastebin.mozilla.org/795750
Comment 9•14 years ago
|
||
OK, I just reprofiled (on m-c tip).
52% of the time is spent in methodjit-generated code.
24% is spent under gc (called from js_NewFinalizableGCThing).
8% is under stubs::NewObject but outside gc (js_GetPropertyHelper, self time
for NewObject, etc)
The rest is a few things under 2% each (drawImage on the canvas, js_math_floor, stubs::SetElem<0>, RunTracer, stubs::NewInitObject, InitPropOrMethod, etc.
Interestingly, with tracer on we're at 14fps for a bit before dropping to 7fps; I profiled the 7fps steady-state.
Comment 11•14 years ago
|
||
The raytracer creates canvas elements and uses them as image buffers. On the attached test case JM is 2x slower than Chrome. (TM is 5x faster than Chrome though). For 10 frames, filling the buffer (the loop body) is 55 ms faster (80 vs 25) in Chrome.
Canvas elements are also used for textures, so fixing bug 594247 will help JM here.
Updated•14 years ago
|
Attachment #481918 -
Attachment is patch: false
Updated•14 years ago
|
Attachment #481918 -
Attachment mime type: text/plain → text/html
Comment 12•14 years ago
|
||
I can reproduce after the fixing of bug 594247:
Minefield 4.0b13pre/2011023: 12 fps, hang every seconds
Chrome 11.0.672.2 : 27 fps
Reporter | ||
Comment 13•14 years ago
|
||
Rereprofiled on Win 7. Top stuff as a percentage of JS time:
GC (mark/sweep/etc) 40%
AddPropertyHelper/AddAtomProperty 9
Object::init 7
MathCache::lookup 5
DOUBLE_IS_INT32 3
It looks like nothing short of a new GC will make this run well.
Comment 14•14 years ago
|
||
Yes this needs a generational GC.
We only perform per-compartment GCs but the set of reachable objects is still more than 3.1 mill.
We finalize about 4 mill objects per GC which is about 30% of the total GC time.
Moving this to the background would help but the marking is the expensive part here.
This is a very good example where we create the fixed set of 3149831 long lived objects and afterwards we only create short lived objects. The set of long lived objects stays exactly at 3149831 for the whole animation time.
Updated•13 years ago
|
Blocks: GenerationalGC
Updated•13 years ago
|
No longer blocks: GenerationalGC
Depends on: GenerationalGC
Comment 15•11 years ago
|
||
Nothing earth-shatteringly new to report here.
Current Chrome Canary gets between 90 and 120 fps (with the rotation looking everything but smooth, though).
Current Nightly consistently gets above 50fps, so that's great. However, it's stuttering horribly, so the fps counter might be a bit broken. Looking forward to seeing how ggc's going to fare, here.
Summary: JM: Investigate perf of realtime JS raytracer → IonMonkey: Investigate perf of realtime JS raytracer
Assignee | ||
Updated•10 years ago
|
Assignee: general → nobody
Comment 16•10 years ago
|
||
On Ultra:
Chrome 39: 6 fps
Nightly: 8 fps
On Low:
Chrome 39: 160 fps
Nightly: 200 fps (but slows down every 3~4s)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Comment 17•10 years ago
|
||
The periodic slowdown might be interesting from a GC point of view. From comment #14, we shouldn't be able to GC anything long-lived (but that doesn't necessarily mean we won't try). Terrence, worth a look?
Flags: needinfo?(terrence)
Comment 18•10 years ago
|
||
I looked at the realtime raytracer relatively recently. The test creates 100's of MiB of garbage per second in tenured: we need to find out why the nursery is so lightly used on this test. This in itself isn't too bad as the incremetnal GC's appear to be mostly working okay, at least on my machine. However, this does expose a problem currently with creating this much garbage consistently: we seem to be releasing and immediately reacquiring 100's of MiB of chunk addresses every second because we have a 30MiB cap on empty chunks. We need to develop better limits here, however, this is largely dependent on getting better control of our GC triggers.
Updated•8 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•