Closed
Bug 531185
Opened 15 years ago
Closed 12 years ago
JM+TI 1.5x slower than v8 in JS-based JPEG encoder
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
People
(Reporter: blizzard, Assigned: dvander)
References
(Blocks 1 open bug, )
Details
Attachments
(1 file)
(deleted),
application/zip
|
Details |
This guy ported over a JPEG encoder into JS. TraceMonkey (3.6b1) is about 10x slower than Safari.
Reporter | ||
Updated•15 years ago
|
Comment 1•15 years ago
|
||
We should turn this into a shell test case and then do some analysis.
Bug 531225 will help the worker version, probably.
Depends on: 531225
Comment 3•15 years ago
|
||
jitstats for the non-worker version:
recorder: started(85), aborted(28), completed(197), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(85), breaks(0), returns(0), merged loop exits(116), unstableInnerCalls(17), blacklisted(8)
monitor: exits(6085), timeouts(1), type mismatch(0), triggered(6086), global mismatch(0), flushed(0)
Aborts (piped through sort | uniq -c | sort -n -r):
5 Abort recording of tree jpegencoder.js:558@263 at jpegencoder.js:560@275: No compatible inner tree.
3 Abort recording of tree jpegencoder.js:558@263 at jpegencoder.js:560@289: Inner tree is trying to grow, abort outer recording.
2 Abort recording of tree jpegencoder.js:629@270 at jpegencoder.js:635@312: No compatible inner tree.
2 Abort recording of tree jpegencoder.js:629@270 at jpegencoder.js:560@289: Inner tree is trying to grow, abort outer recording.
2 Abort recording of tree jpegencoder.js:629@270 at jpegencoder.js:550@213: Inner tree is trying to grow, abort outer recording.
2 Abort recording of tree jpegencoder.js:629@270 at jpegencoder.js:550@203: No compatible inner tree.
2 Abort recording of tree jpegencoder.js:194@30 at jpegencoder.js:195@39: No compatible inner tree.
1 Abort recording of tree jpegencoder.js:629@270 at jpegencoder.js:558@263: No compatible inner tree.
1 Abort recording of tree jpegencoder.js:629@270 at jpegencoder.js:558@263: Inner tree is trying to stabilize, abort outer recording.
1 Abort recording of tree jpegencoder.js:629@270 at jpegencoder.js:536@68: No compatible inner tree.
1 Abort recording of tree jpegencoder.js:629@270 at jpegencoder.js:263@66: Inner tree is trying to grow, abort outer recording.
1 Abort recording of tree jpegencoder.js:627@261 at jpegencoder.js:629@270: No compatible inner tree.
1 Abort recording of tree jpegencoder.js:558@263 at jpegencoder.js:564@341: No compatible inner tree.
1 Abort recording of tree jpegencoder.js:558@263 at jpegencoder.js:263@66: Inner tree is trying to grow, abort outer recording.
1 Abort recording of tree jpegencoder.js:558@263 at jpegencoder.js:258@34: Inner tree is trying to grow, abort outer recording.
1 Abort recording of tree jpegencoder.js:558@263 at jpegencoder.js:257@24: No compatible inner tree.
1 Abort recording of tree jpegencoder.js:219@20 at jpegencoder.js:221@31: No compatible inner tree.
Comment 4•15 years ago
|
||
If I make Blacklist(jsbytecode* pc) a no-op, that doesn't change the wall-clock times much as far as I can tell. Just gets rid of that blacklisted(8) and adds a few hundred triggered/exits.
Looking at the profile:
35% under js_GetPropertyHelper, called from GetClosureVar.
5% in GetClosureVar itself.
18% under TraceRecorder::monitorRecording (about half of this under closeLoop,
17% under JSOP_GETELEM, 6% under js_GetScrNoteCached, etc).
4% under TraceRecorder::recordLoopEdge, mostly under attemptTreeCall doing the
emit, prepare, snapshot dance.
32% under ExecuteTree. The time under here breaks down as (percentages of
ExecuteTree time; all times in not under):
10% js_Array_dense_setelem_double
6% js_NewDoubleInRootedValue
5% js_UnboxDouble
4% VisitFrameSlots<FlusNativeStackFrameVisitor>
4% js_DoubleToInt32
3% VisitFrameSlots<BuildNativeFrameVisitor>
3% LeaveTree
2% js_SetReservedSlot
2% js_Array_dense_setelem_int
31% actual jit-generated instructions
2.2% spent in js_Interpret
General diagnosis, without having examined the typemaps for those compatible inner tree aborts is that we have int/double mismatch issues in the typemaps, lots of cost in GetClosureVar (maybe bug 530255 will help?). We're also storing doubles in arrays, which is making alarms ring in my head; that ought not to be happening in this testcase, I'd think.
Comment 5•15 years ago
|
||
I guess all those doubles might be coming from us reading values out of arrays.... I thought we had a bug on that.
Comment 6•15 years ago
|
||
Applying the patch from bug 486213 doesn't seem to make the jitstats or wall-clock time any better. And the profile still shows doubles all over.
Updated•15 years ago
|
Assignee: general → dvander
Assignee | ||
Comment 7•15 years ago
|
||
I made a shell test case out of this (attachment forthcoming). Roughly, where we are with performance here:
TM Interp: 424ms (x64)
TM JIT: 180ms (x64)
v8: 70ms (x86)
Nitro: 30ms (x64)
We seem to have a few problems.
1. Hoisted undefineds poison trace type stability.
2. We always read numbers from arrays as doubles.
3. Because of #2, we get more type instability.
4. We spend time in the interpreter.
So to test the first two theories I edited the JPEG decoder to hoist definitions to the top of functions. This got us down to around 160ms. Then I disabled the Oracle and we got down to very few inner compatible tree problems, and that brought us to 140ms.
Clearly those are both big problems we need to tackle, but there's still something else dominating this test case. TraceVis says:
interpret 257.881230
monitor 5.255656
record 12.698644
compile 12.710549
execute 1.352516
native 213.188171
-------------------------
Subtotal 503.086766
Non-JS 0.000000
=========================
Total 503.086766
recorder: started(40), aborted(7), completed(81), different header(0), trees trashed(34), slot promoted(0), unstable loop variable(2), breaks(0), returns(0), merged loop exits(43), unstableInnerCalls(4), blacklisted(0)
monitor: exits(880), timeouts(0), type mismatch(0), triggered(880), global mismatch(0), flushed(0)
So tree connectivity is still not perfect... will investigate further.
Comment 8•15 years ago
|
||
We should work on the array unbox path:
https://bugzilla.mozilla.org/show_bug.cgi?id=486213
We have a patch, but if I remember correctly it regressed 3d-cube a lot for some unknown reason.
Assignee | ||
Comment 9•15 years ago
|
||
Comment 10•13 years ago
|
||
Current js shell results for jpeg_encoder_basic:
Interp: 558ms
JM: 77ms
JM+TI: 71ms
d8: 46ms
Current js shell results for altered_jpeg_encoder:
Interp: 560ms
JM: 78ms
JM+TI: 72ms
d8: 48ms
Looks like v8 is still about 1.5x faster than JM+TI.
Blocks: 467263
OS: Mac OS X → All
Hardware: x86 → All
Summary: TraceMonkey 10x slower in JS-based JPEG encoder → JM+TI 1.5x slower than v8 in JS-based JPEG encoder
Version: 1.9.2 Branch → Trunk
Comment 11•12 years ago
|
||
We are now as fast as d8 with --ion-parallel-compile=on (20 ms). With baseline we are a few ms faster (17 ms).
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•