Closed
Bug 465773
Opened 16 years ago
Closed 16 years ago
TM: jsMSX demo regressions. Slower by 15% vs non-JIT. 40% slower than past JIT tests
Categories
(Core :: JavaScript Engine, defect, P2)
Tracking
RESOLVED
FIXED
People
(Reporter: bugs, Assigned: dmandelin)
References
(Blocks 1 open bug)
Details
(Keywords: fixed1.9.1)
Attachments
(1 file)
patch (deleted)
... however under different circumstances from bug #458227
This time, after updating, if I run jsMSX for a little while (just hit start and let it do its thing), up to around interrupt 1840, it crashes.
This is immediately after the main screen clears.
Quite repeatable. I hadn't tried jsMSX in a while, but it probably appeared sometime in the last couple of weeks.
http://crash-stats.mozilla.com/report/index/283d0b15-c391-42cf-ab2d-529020081119?p=1
0 js3250.dll nanojit::LIns::deref js/src/nanojit/LIR.cpp:233
1 js3250.dll nanojit::LInsHashSet::hashcode js/src/nanojit/LIR.cpp:1233
2 js3250.dll nanojit::LInsHashSet::grow js/src/nanojit/LIR.cpp:1295
3 js3250.dll nanojit::CseFilter::insGuard js/src/nanojit/LIR.cpp:1968
4 js3250.dll nanojit::ExprFilter::insGuard js/src/nanojit/LIR.cpp:948
5 js3250.dll js3250.dll@0xac773
Updated • 16 years ago
Flags: blocking1.9.1+
This might be relevant (or not).
I am only able to reproduce this one with Firebug turned off.
Perhaps it is related to the slowdown Firebug adds.
Turning off the addon allowed it to crash at exactly the same place in the latest nightly.
I don't see a crash on Windows until much later (was past interrupt 2500), but I did see garbage on the "screen" well before that. The crash was right when I clicked the Help menu to find out what build I was running.
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1b2pre) Gecko/20081125 Minefield/3.1b2pre
Based upon Mike's comment I retested on a completely different machine.
Also XP, although this one is running as an unprivileged user and has a completely different architecture.
Crashed at almost *exactly* the same place. Somewhere around 1830.
This is the latest nightly.
http://crash-stats.mozilla.com/report/index/c5274364-17e4-4150-9502-fb6fb2081126
0 @0x0
1 @0x82903b
2 js3250.dll js_MonitorLoopEdge js/src/jstracer.cpp:3714
3 js3250.dll js_Interpret js/src/jsopcode.tbl:135
4 js3250.dll js_Interpret js/src/jsinterp.cpp:3720
This was from the prior build. 11/20:
http://crash-stats.mozilla.com/report/index/f43b090a-13f2-4955-b41c-8ceca2081126
0 js3250.dll nanojit::LirWriter::ins0 js/src/nanojit/LIR.h:442
1 js3250.dll TraceRecorder::TraceRecorder js/src/jstracer.cpp:1006
2 js3250.dll js_StartRecorder js/src/jstracer.cpp:2691
3 js3250.dll js_AttemptToExtendTree js/src/jstracer.cpp:3064
4 js3250.dll js_MonitorLoopEdge js/src/jstracer.cpp:3725
5 js3250.dll js_Interpret js/src/jsinterp.cpp:3720
6 js3250.dll js_Execute js/src/jsinterp.cpp:1559
7 js3250.dll JS_EvaluateUCScriptForPrincipals js/src/jsapi.cpp:5187
8 xul.dll nsJSContext::EvaluateString dom/src/base/nsJSEnvironment.cpp:1586
9 xul.dll xul.dll@0x305c8d
10 xul.dll nsGlobalWindow::TimerCallback dom/src/base/nsGlobalWindow.cpp:7993
11 xul.dll nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:420
12 xul.dll nsTimerEvent::Run xpcom/threads/nsTimerImpl.cpp:512
13 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:510
14 xul.dll nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:170
15 nspr4.dll PR_GetEnv
16 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:87
17 firefox.exe firefox.exe@0x2197
18 kernel32.dll kernel32.dll@0x17066
Did you have Firebug running?
Also, regarding "garbage" - if you're referring to the green line, I haven't RTFSed but that was always there, I assumed it was an artifact of some debug code.
This is interesting because the stack traces were not the same.
Odd. The trace changed again. Of course, who knows what you guys are doing in some seeeekret bug. :)
0 kernel32.dll RaiseException
1 mozcrt19.dll _CxxThrowException throw.cpp:159
2 mozcrt19.dll operator new obj-firefox/memory/jemalloc/src/new.cpp:57
3 xul.dll nsSupportsArray::Create xpcom/ds/nsSupportsArray.cpp:212
4 xul.dll EditAggregateTxn::EditAggregateTxn editor/libeditor/base/EditAggregateTxn.cpp:44
5 xul.dll DeleteRangeTxn::DeleteRangeTxn editor/libeditor/base/DeleteRangeTxn.cpp:66
6 xul.dll xul.dll@0x8b609f
7 xul.dll nsEditor::CreateTxnForDeleteSelection editor/libeditor/base/nsEditor.cpp:4814
8 xul.dll nsEditor::DeleteSelectionImpl editor/libeditor/base/nsEditor.cpp:4390
9 xul.dll xul.dll@0x2f52be
10 xul.dll nsTextEditRules::WillInsertText editor/libeditor/text/nsTextEditRules.cpp:550
11 xul.dll nsQueryReferent::operator obj-firefox/xpcom/build/nsWeakReference.cpp:88
12 xul.dll nsPlaintextEditor::InsertText editor/libeditor/text/nsPlaintextEditor.cpp:781
Oh. Um. Sorry. That last one was:
http://crash-stats.mozilla.com/report/index/17a877a9-77f1-4a31-a153-7bd962081204
No longer able to reproduce in latest nightly.
Going to go with "fixed"
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Darn. Spoke too soon.
I left it running for a few minutes since I was doing a timing test on this one to see if it was getting any faster.
Crashed after about 1½m.
http://crash-stats.mozilla.com/report/index/a48178b9-8bff-4558-be73-b0b462081211
0 kernel32.dll RaiseException
1 mozcrt19.dll _CxxThrowException throw.cpp:159
2 mozcrt19.dll operator new obj-firefox/memory/jemalloc/src/new.cpp:57
3 xul.dll nsThebesDeviceContext::CreateRenderingContextInstance gfx/src/thebes/nsThebesDeviceContext.cpp:365
4 xul.dll nsThebesDeviceContext::CreateRenderingContext gfx/src/thebes/nsThebesDeviceContext.cpp:321
5 xul.dll PresShell::CreateRenderingContext layout/base/nsPresShell.cpp:3478
6 xul.dll PresShell::DoReflow layout/base/nsPresShell.cpp:6314
7 xul.dll PresShell::ProcessReflowCommands layout/base/nsPresShell.cpp:6466
8 xul.dll PresShell::DoFlushPendingNotifications layout/base/nsPresShell.cpp:4590
9 xul.dll PresShell::FlushPendingNotifications layout/base/nsPresShell.cpp:4521
10 xul.dll nsEditor::EndUpdateViewBatch editor/libeditor/base/nsEditor.cpp:4346
11 xul.dll nsEditor::EndPlaceHolderTransaction editor/libeditor/base/nsEditor.cpp:956
12 xul.dll nsAutoPlaceHolderBatch::~nsAutoPlaceHolderBatch editor/libeditor/base/nsEditorUtils.h:66
13 xul.dll nsPlaintextEditor::InsertText editor/libeditor/text/nsPlaintextEditor.cpp:769
14 xul.dll nsTextControlFrame::SetValue layout/forms/nsTextControlFrame.cpp:2759
15 xul.dll nsCOMPtr_base::~nsCOMPtr_base obj-firefox/dist/include/xpcom/nsAutoPtr.h:956
16 xul.dll nsTextControlFrame::SetFormProperty layout/forms/nsTextControlFrame.cpp:1960
17 xul.dll nsHTMLInputElement::SetValueInternal content/html/content/src/nsHTMLInputElement.cpp:945
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
2 more retests.
No longer at a predictable 1830 interrupts.
One crashed almost instantaneously at 70 (after a complete restart of firefox)
The 2nd crashed after 4 minutes.
At Littlemutt's urging, I will retest in hourly TM build.
http://crash-stats.mozilla.com/report/index/678bd6cf-15bf-4c14-88b4-68ed72081211
http://crash-stats.mozilla.com/report/index/b8422b24-45b1-46d1-b886-33e352081211
The last two crashes.
In latest hourly:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32/1228998184/
Ran for 5 minutes without crashing. 8020 interrupts.
I guess that bodes well for the next trunk nightly :)
Comment 10 • 16 years ago
So, crashes because of new() are expected: that means we ran out of memory, and we don't really handle that; at the moment the best we can do is die. Besides, a crash outside JS isn't relevant for a bug against the JavaScript engine.
Reporter | Comment 11 • 16 years ago
Spoke too soon.
On 2nd run, hourly crashed too.
http://crash-stats.mozilla.com/report/index/6e9cb4bd-2164-4218-8fff-9733a2081211
Not a very useful trace...
BTW, it seems slower than it used to be. I was rerunning it a 2nd time because I used to get almost 33 interrupts per second over 5 minutes, and on the first run I was far below that at 27 interrupts per second.
As for your comment... dunno, I'm just trying to run this app. :)
I assure you I have a ton of memory on the desktop.
Reporter | Comment 12 • 16 years ago
Oh, and re: in the JS engine vs. out of it - this crash only happens with JIT enabled.
Anyway, since you mentioned memory limits, I ran a new test, still against the hourly build.
Sure enough, while running, Firefox memory usage did climb a bit, but not dramatically.
From around 90 megabytes prior to launching, it climbed to a high of 115 with plenty of fluctuation - dropping to 85 or lower.
I still had well over a gigabyte of system memory free when it crashed, 3½ minutes into playback.
http://crash-stats.mozilla.com/report/index/1745939e-5638-4a77-ac3e-68a532081211
This is a P4 dual-core system running Windows XP SP3. As noted previously, the demo seems overall slower than in times past, although with periods of dramatic speedup.
Reporter | Comment 13 • 16 years ago
Retest with latest nightly. As with tinderbox hourly, it no longer crashes consistently at 1830 or so.
FF mem started at ~58 megs (had a couple of unrelated pages open).
After starting, immediately jumped to 92 in first few seconds, then up and down a bit with spikes at 117.
Interrupts per second avg stayed much lower than in times past. Again around 26, with some sequences seemingly much faster, but others much slower, so more deviation.
On first run, it completed 5 minutes of play without crashing. 7920 interrupts over 300 seconds for average of 26.4 - significantly lower than in past nightlies.
I then hit pause, reset and tried again.
2nd run similar to first, no crashing, 8055 over 300 seconds. Still slower than in past, but still no crashing.
So, this was a little boring. I decided to not watch the 3rd run, but just let it spin cycles.
pause/reset.
Somewhere a couple of minutes into 3rd run it crashed.
If past crash history is any pattern, the memory usage was probably not above 120.
http://crash-stats.mozilla.com/report/index/0c4428e1-8505-4415-aa50-b8a022081212
0 @0x0
1 @0x7cc03b
2 js3250.dll js_MonitorLoopEdge js/src/jstracer.cpp:3784
3 js3250.dll js_Interpret js/src/jsinterp.cpp:6676
Same appearance as the trace in comment #3.
Reporter | Comment 14 • 16 years ago
FWIW, the game ran with no crashes at all if I disabled JIT (ran it 5 times for 5 minutes or more each time).
With the JIT disabled, interestingly, it ran significantly *faster*. Nowhere near the 37½ interrupts per second average that FF nightlies used to do with JIT, but still, was smooth in average speed with no significant memory fluctuations (peak memory usage over half an hour of ~86 megs).
With JIT disabled, it achieved 9290 interrupts over 300 seconds, for almost 31 per second. That is about a 15% speed increase over using the JIT.
Reporter | Comment 15 • 16 years ago
Title changed based on Littlemutt's suggestion.
Summary: JIT crashes jsMSX demo (again) → TM: Crashes jsMSX demo. Slower by 15% vs non-JIT. 40% slower than past JIT tests that didn't crash
Reporter | Comment 16 • 16 years ago
Also at Littlemutt's suggestion.
The following have all been run on this machine:
http://dromaeo.com/?id=55466,55465 - latest nightly. JIT on left, non-JIT on right.
Interesting to see how even they are.
http://dromaeo.com/?id=55300,55299 - 3.0 vs yesterday's nightly w/ JIT enabled.
http://dromaeo.com/?id=55299,55466 - comparing those tests, you can see that whatever last night's patches were, Firefox did get a lot faster.
However, as 55299 vs 55465 seems to suggest, a lot of it is not JIT related.
'course, the error margins are kind of large.
Reporter | Comment 17 • 16 years ago
By way of comparison on the speed front, I tested the latest WebKit nightly for Windows:
http://builds.nightly.webkit.org/files/trunk/win/WebKit-SVN-r39293.zip
WebKit nightlies around the time TM was doing 37.5 interrupts per second were at about 32 interrupts.
SFX can now do 16,820 interrupts over 5 minutes of that demo on my machine with an average speed of 56.1 interrupts per second.
Furthermore, the core that is running the thread maintained a steady 50% load.
Amazing.
Updated • 16 years ago
Priority: -- → P2
Reporter | Comment 18 • 16 years ago
Still crashes before I can get through my 5-minute run, but I did notice something interesting that could be related to stuff you guys are doing behind the scenes.
CPU usage under TM is also around 50-60% of the core it is using (25-30% in Windows perf mon).
This is interesting to me 'cause it isn't running at full speed.
Reporter | Comment 19 • 16 years ago
I suppose you guys did something new since the trace has a bit more in it this time.
http://crash-stats.mozilla.com/report/index/5ca5208b-e67b-454c-b0cb-bdd642090112
0 js3250.dll nanojit::LIns::isTramp js/src/nanojit/LIR.h:334
1 js3250.dll nanojit::LIns::deref js/src/nanojit/LIR.cpp:260
2 js3250.dll nanojit::LIns::oprnd1 js/src/nanojit/LIR.cpp:640
3 js3250.dll nanojit::StackFilter::read js/src/nanojit/LIR.cpp:1184
4 js3250.dll nanojit::StackFilter::read js/src/nanojit/LIR.cpp:1171
5 js3250.dll nanojit::DeadCodeFilter::read js/src/nanojit/Assembler.cpp:77
6 js3250.dll nanojit::Assembler::gen js/src/nanojit/Assembler.cpp:1025
7 @0x12f51f
Reporter | Comment 20 • 16 years ago
Took a lot longer to crash (2nd or 3rd run, several minutes in), and the trace was a lot longer too.
http://crash-stats.mozilla.com/report/index/b3676266-fbe2-4944-9e32-c20a02090121
0 @0x0
1 @0x12f62b
2 js3250.dll js_Interpret js/src/jsinterp.cpp:3700
3 js3250.dll js_Invoke js/src/jsinterp.cpp:1336
4 js3250.dll js_fun_apply js/src/jsfun.cpp:1732
5 js3250.dll js_Interpret js/src/jsinterp.cpp:4994
6 js3250.dll js_Execute js/src/jsinterp.cpp:1564
7 js3250.dll JS_EvaluateUCScriptForPrincipals js/src/jsapi.cpp:5192
8 xul.dll nsJSContext::EvaluateString dom/src/base/nsJSEnvironment.cpp:1599
9 xul.dll nsGlobalWindow::RunTimeout dom/src/base/nsGlobalWindow.cpp:7629
10 xul.dll nsGlobalWindow::TimerCallback dom/src/base/nsGlobalWindow.cpp:7980
11 xul.dll nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:420
12 nspr4.dll _PR_MD_UNLOCK nsprpub/pr/src/md/windows/w95cv.c:344
13 xul.dll nsTimerEvent::Run xpcom/threads/nsTimerImpl.cpp:512
14 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:510
15 xul.dll nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:170
16 xul.dll nsAppStartup::Run toolkit/components/startup/src/nsAppStartup.cpp:192
17 nspr4.dll PR_GetEnv
18 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:87
19 firefox.exe firefox.exe@0x2197
20 kernel32.dll BaseProcessStart
Reporter | Comment 21 • 16 years ago
Hm. After making sure I was updated to:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090120 Minefield/3.2a1pre
(latest nightly)
It is now back to crashing instantaneously again.
Same as with bug #465784 actually.
Assignee | Comment 22 • 16 years ago
I've been testing with the OOM patch for bug 465784 applied and I have not seen the crash but I have noticed a 50% slowdown vs. non-JIT. I don't know what the problem is yet. The basic stats I'm recording are that for 2000 interrupts, with non-JIT we have 40s spent in the interpreter, while for JIT, we have:
interpreter: 48
recording traces: 5
compiling traces: 2
running traces: 7
JIT flushes: 130
traces started: 20k
traces finished: 8k
traces called: 700k
jsops run under interpreter (including recording): 400M
jsops run natively: 1000M
This is weird. It indicates that we run 2/3 of the ops natively, yet we are slower, which I have never observed before. I would expect to see maybe a 50% speedup or so given that jsop execution mode distribution.
Andreas guessed that we are running too many short traces and that is my best guess at this point. I think this would be a problem if the overhead of calling a trace is too much. In this case, we are seeing about 2000 cycles per call to a trace in pure native execution. I don't know how much trace call overhead there is added to that but it could be substantial relative to 2000 cycles. I will measure it.
"Something to do with DOM calls" is my second guess but I don't know much about that.
I tried running with various values of MAX_BRANCHES in the range 4-256. The total interpreter time didn't vary much, and in fact the differences could just be noise. The time spent on trace goes from 4s at 4 through 7s at 16 to 10s at 256. The time spent recording and compiling traces goes up by a lot with bigger values. At 256, the stats are:
interpreter: 22
recording traces: 36
compiling traces: 17
running traces: 9
flushes: 256
traces started: 150k
traces finished: 132k
traces called: 260k
interp ops: 16M
trace ops: 1400M
Assignee | Comment 23 • 16 years ago
Oops, the recording traces number should be "16 seconds" in the MAX_BRANCHES=256 run.
Reporter | Comment 24 • 16 years ago
Nifty. How do you generate stats from the tracer?
Do I need to recreate a build environment, or can I just use the tm tinderbox builds?
Oh, and still crashed for me in Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090122 Minefield/3.2a1pre (nightly) but that could be simply 'cause your patch hadn't percolated up yet.
Assignee | Comment 25 • 16 years ago
(In reply to comment #24)
> Nifty. How do you generate stats from tracer?
I have some personal patches for this that apply to TM, so you would need to set up a build of TM and apply my patches. I just added them to bug 474829 with the eventual goal of landing them in TM.
Assignee | Comment 26 • 16 years ago
The timings I gave above are off, partly because of deficiencies in my timer, and partly because I forgot what the output meant. Here are fixed timing stats from an improved version of the internals timer:
MAX_BRANCHES=16             clocks       time (s)
interpret = 53082229915 24.128286
record = 10939430887 4.972469
native = 17604643144 8.002111
compile = 3910572050 1.777533
overhead = 92910378187 42.231990
total     = 178447254183 81.112388
In the above case we seem to be taking a huge charge for "overhead", which is basically all the trace monitoring, stitching, compilation, etc., stuff, including js_ExecuteTree (but not the native code itself).
MAX_BRANCHES=256            clocks       time (s)
interpret = 21321387109 9.691540
record = 71657487660 32.571585
native = 21041540399 9.564337
compile = 40826810145 18.557641
overhead = 7598975406 3.454080
total     = 162446200719 73.839182
I have no idea what's going on here. There may still be defects in the timers. But it kind of looks like a lot of traces are recorded, and then run few or no times, which is consistent with the counter output I collected earlier.
Assignee | Comment 27 • 16 years ago
Things look better with the most recent tip.
MAX_BRANCHES   jsMSX perf hit (vs. non-tracing)   SS perf hit (vs. tracing at MAX_BRANCHES=16)
16             1.36x                              1x
32             1.23x                              1.004x
64             1.02x                              1.011x
256            1.02x                              1.091x
Reporter | Comment 28 • 16 years ago
Looks like the fix that closed out bug #465784 has resolved the crashing in this one. Yay.
The speed regressions, though, appear to be pretty much the same.
Reporter | Comment 29 • 16 years ago
Renamed title to remove crashing refs.
Summary: TM: Crashes jsMSX demo. Slower by 15% vs non-JIT. 40% slower than past JIT tests that didn't crash → TM: jsMSX demo regressions. Slower by 15% vs non-JIT. 40% slower than past JIT tests
Reporter | Comment 30 • 16 years ago
Sorry for bug spam, but do the performance issues still warrant the "blocking" flag?
I'm guessing crashing was the more serious part.
Assignee | Updated • 16 years ago
Assignee: general → dmandelin
Reporter | Comment 31 • 16 years ago
Back to crashing again, with a new, longer trace.
http://crash-stats.mozilla.com/report/index/05f317a6-7863-46f1-8f64-a8e0b2090202
0 js3250.dll nanojit::Assembler::releaseRegisters js/src/nanojit/Assembler.cpp:960
1 js3250.dll nanojit::Assembler::gen js/src/nanojit/Assembler.cpp:1363
2 js3250.dll nanojit::Assembler::assemble js/src/nanojit/Assembler.cpp:828
3 js3250.dll nanojit::compile js/src/nanojit/LIR.cpp:2140
4 js3250.dll TraceRecorder::compile js/src/jstracer.cpp:2499
5 js3250.dll TraceRecorder::closeLoop js/src/jstracer.cpp:2625
6 js3250.dll js_CloseLoop js/src/jstracer.cpp:3541
7 js3250.dll js_RecordLoopEdge js/src/jstracer.cpp:3576
8 js3250.dll js_MonitorLoopEdge js/src/jstracer.cpp:4121
9 js3250.dll js_Interpret js/src/jsinterp.cpp:3698
10 js3250.dll js_Invoke js/src/jsinterp.cpp:1334
11 js3250.dll js_fun_apply js/src/jsfun.cpp:1732
12 js3250.dll js_Interpret js/src/jsinterp.cpp:4992
13 js3250.dll js_Execute js/src/jsinterp.cpp:1564
14 js3250.dll JS_EvaluateUCScriptForPrincipals js/src/jsapi.cpp:5244
15 xul.dll nsJSContext::EvaluateString dom/src/base/nsJSEnvironment.cpp:1598
16 xul.dll nsGlobalWindow::RunTimeout dom/src/base/nsGlobalWindow.cpp:7642
17 xul.dll nsGlobalWindow::TimerCallback dom/src/base/nsGlobalWindow.cpp:7993
18 xul.dll nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:420
19 nspr4.dll _PR_MD_UNLOCK nsprpub/pr/src/md/windows/w95cv.c:344
20 xul.dll nsTimerEvent::Run xpcom/threads/nsTimerImpl.cpp:512
21 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:510
22 xul.dll nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:170
23 xul.dll nsAppStartup::Run toolkit/components/startup/src/nsAppStartup.cpp:192
24 nspr4.dll PR_GetEnv
25 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:87
26 firefox.exe firefox.exe@0x21a7
27 kernel32.dll BaseProcessStart
Summary: TM: jsMSX demo regressions. Slower by 15% vs non-JIT. 40% slower than past JIT tests → TM: Crashes jsMSX demo. Slower by 15% vs non-JIT. 40% slower than past JIT tests
Reporter | Comment 32 • 16 years ago
That fix you committed a few days ago appears to have helped with crashing.
Still averaging 28 interrupts per second, so a lot slower than jsMSX used to go.
Reading that xi bug with interest and vague comprehension. :)
Summary: TM: Crashes jsMSX demo. Slower by 15% vs non-JIT. 40% slower than past JIT tests → TM: jsMSX demo regressions. Slower by 15% vs non-JIT. 40% slower than past JIT tests
Assignee | Comment 33 • 16 years ago
The xi/xtbl patch has landed on TM, and once it gets to nightlies tracing perf should be about equal to non-tracing overall. I do find that tracing runs sometimes faster, sometimes slower--it's weirdly uneven.
Initial info on why tracing isn't faster: time breakdown for tracing:
interpret = 12590490319 5.722950
record = 2601140256 1.182336
native = 2159817803 0.981735
compile = 1363429980 0.619741
overhead = 615674840 0.279852
execute = 220603471 0.100274
total     = 19551156669 8.886889
Compare to about 10s execution time without tracing. (Workload comes from my clicking, so don't read anything into the 1s difference.) The first problem is that we're only tracing 40% of the work or so. Second, trace execution time is low relative to the overheads--we're probably not executing each compiled trace enough times to make it really pay off. I'll have to look deeper to see why.
Reporter | Comment 34 • 16 years ago
I assume that new patch is the reason speed has dropped from 28 interrupts per second to 24 or 25 in the 200901213 nightly. :)
Timed it several times. As usual: 5 minutes, no interaction with the game, not reloading the page in case that helped. The all-time TM high on this machine was 37 or 38 in late September/early November - no crashes then either ;).
Reporter | Comment 35 • 16 years ago
... 20090213 nightly
Sorry.
Actually, I just realised I had Firebug enabled. That has mucked up timing in the past (shocking).
Ignore the prior comment 'till I rerun :)
Reporter | Comment 36 • 16 years ago
Yeah. That was it.
Reran it a couple of times and got 29 and 28.
So. Basically unchanged.
Oh well. Still fun to watch you guys poke at this.
Assignee | Comment 37 • 16 years ago
Higher quality numbers from a longer run: JIT:
interpret = 116919754050 53.145343
record = 173724828519 78.965831
native = 85461236707 38.846017
compile = 102967063001 46.803210
overhead = 37141156833 16.882344
execute = 22992612874 10.451188
total     = 539206651984 245.093933
Non-JIT:
interpret = 553970204403 251.804638
Thus, only 20% of the work is in pure interpreted mode. This is not bad and in most programs is enough to get a big speedup. My earlier run was too short and so it didn't have enough time for traces to get compiled.
Combined with the record time, we are running 50% of the work in some kind of interpreted mode. While we are actually running as native code, we get about 3x throughput compared to interpreted. This is great--it's on the high side of the usual effect of tracing. But the overheads (compiling, trace entry, and other overheads) take time equal to 2x native code execution time, nullifying the speedup.
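Rough arithmetic with the numbers above, just to make the break-even explicit. This is a sketch using the measured seconds (rounded), nothing more:

// Back-of-the-envelope check using the JIT breakdown above (values in seconds).
const native   = 38.8;   // time actually spent running compiled trace code
const compile  = 46.8;
const overhead = 16.9;   // monitoring, stitching, and other tracing overhead
const execute  = 10.5;   // the "execute" row above (trace entry cost)

const interpreterWorkReplaced = native * 3;             // ~116 s of work at ~3x throughput
const tracingOnlyCosts = compile + overhead + execute;  // ~74 s, about 2x the native time

// native + tracingOnlyCosts is ~113 s spent to do ~116 s worth of interpreter
// work, i.e. roughly break-even -- consistent with the ~245 s (JIT) vs ~252 s
// (non-JIT) totals above.
console.log(native + tracingOnlyCosts, interpreterWorkReplaced);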
It's fairly clear that the problem is that we are not executing traces enough times on average to win back the time it takes to compile them. It also looks like we're probably making too many transitions from interpreted mode to native mode. (And that in turn is probably from making too many transitions from native mode to interpreted.)
The high record time is also troubling. It could mean something easier to fix, like too many flushes, or something harder to fix, like too many unique trace/typemap combinations.
Assignee | Comment 38 • 16 years ago
MAX_BRANCHES=256 (default is 16):
interpret = 107205565423 48.729802
record = 189790131135 86.268241
native = 87946432376 39.975651
compile = 111621314871 50.736961
overhead = 18723080277 8.510491
execute = 5355031506 2.434105
total     = 520641555588 236.655253
Trace cache size is 256MB (default is 16 MB):
interpret = 123768224932 56.258284
record = 55956919293 25.434963
native = 86807426761 39.457921
compile = 30131391895 13.696087
overhead = 75757551991 34.435251
execute = 32044378553 14.565627
total     = 404465893425 183.848133
MAX_BRANCHES=256, trace cache size=256MB:
interpret = 108486404807 49.312002
record = 29858574207 13.572079
native = 90011340287 40.914246
compile = 16918097548 7.690044
overhead = 11157749295 5.071704
execute = 4055881027 1.843582
total     = 260488047171 118.403658
This is what we want to see.
Assignee | Comment 39 • 16 years ago
This patch brings JS execution time for my standard test (~10187 interrupts) down to 141s from the previous base of about 256s. The CASE_EXIT test is something that should have been there all along; it means we'll always try to extend the tree when the exit is a tableswitch case. The code cache size increase to 32MB is apparently needed to avoid excess OOM-related flushes. Numbers for a few configurations:
MAX_BRANCHES=16, cache=16M, no CASE_EXIT patch: 256s
MAX_BRANCHES=16, cache=16M, CASE_EXIT patch: 256s
MAX_BRANCHES=16, cache=32M, no CASE_EXIT patch: 147s
MAX_BRANCHES=16, cache=32M, CASE_EXIT patch: 141s (the attachment)
MAX_BRANCHES=256, cache=32M, no CASE_EXIT patch: 137s
MAX_BRANCHES=256, cache=32M, CASE_EXIT patch: 137s
The final config is the best for this test but causes a big regression on unpack-code. (It has no effect on the rest of SS.)
The attached patch has no effect on SS but of course will increase memory usage.
If we want to speed up this application this patch is probably the quickest way to do it. Long-term, we should figure out how to reduce memory consumption.
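To illustrate why this kind of workload stresses tableswitch handling, MAX_BRANCHES, and the code cache, here is a hedged sketch of an emulator-style dispatch loop. This is hypothetical code, not the actual jsMSX source; step, runInterrupt, and the cpu/memory shapes are made up. The point is that every opcode case that gets hot at runtime is another path out of the hot loop, so the tracer keeps taking side exits and recording new branch traces.

// Hypothetical emulator core: a hot loop whose body is a big dense switch over
// opcodes (the kind of switch that ends up as a tableswitch).
function step(cpu, memory) {
  const opcode = memory[cpu.pc];
  cpu.pc = (cpu.pc + 1) & 0xffff;
  switch (opcode) {
    case 0x00: /* NOP */ break;
    case 0x3e:                 // LD A,n
      cpu.a = memory[cpu.pc];
      cpu.pc = (cpu.pc + 1) & 0xffff;
      break;
    case 0xc3:                 // JP nn
      cpu.pc = memory[cpu.pc] | (memory[cpu.pc + 1] << 8);
      break;
    // ... a couple hundred more cases in a real Z80 core ...
    default: break;
  }
}

function runInterrupt(cpu, memory, stepsPerInterrupt) {
  // The loop edge here is what the trace monitor sees; every switch case that
  // gets hot is another side exit, hence the benefit of extending on CASE_EXIT
  // and of a larger MAX_BRANCHES / code cache for this kind of program.
  for (let i = 0; i < stepsPerInterrupt; i++) {
    step(cpu, memory);
  }
}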
Assignee | Comment 40 • 16 years ago
Besides the large number of traces, our other main perf problem here is how much time we spend in the interpreter. I'm trying to understand this better. Here is a basic measurement from a 2000-interrupt run:
time (s) count t/c (ms) cause
6.049485 23130 0.261543 loop2
5.279892 4237 1.246139 none
3.345187 385908 0.008668 branch
0.802810 77984 0.010295 badexit
0.310630 304 1.021808 oracle1
0.112716 16 7.044748 record
0.100979 9465 0.010669 cold
0.067557 6660 0.010144 innerPC
0.004553 347 0.013122 unstable
0.000174 7 0.024826 callback
This shows all the transitions from a non-interpreting state to an interpreting state. There is one row for each "cause" that makes us transition to an interpreting state. The columns are: (1) time spent in interpreter due to this cause, (2) number of transitions to interpreter due to this cause, (3) [column1]/[column2] = average time spent in interpreter for each transition due to this cause.
The causes break down into two major categories: "none" and everything else. A transition from "none" means a call into js_Interpret from outside of JS. All the others represent different returns from js_MonitorLoopEdge.
The 3 main causes for this run of jsMSX are "loop2", "none", and "branch":
- "none" as above represents entry to js_Interpret. I note that there seems to be a fair amount of reentrance in those cases: it looks like there are typically 0-8 recursive calls to js_Interpret on each of these. I'm not sure how it manages to take so long before reaching a loop edge (which would cause a transition to a "tracing overhead" state at least briefly). It could be simply all the functions called, or maybe some native call. Or do we disable tracing in reentered js_Interpret?
- "loop2" is a 'return false' exit from js_MonitorLoopEdge that occurs when the exit type is LOOP_EXIT and innermostNestedGuard is NULL. I don't know what innermostNestedGuard means, so I don't know what's going on here.
- "branch" is an exit from js_MonitorLoopEdge that occurs when the exit type is BRANCH_EXIT and js_AttemptToExtendTree returns false. I believe the latter currently happens only if the current tree has MAX_BRANCHES branches. Thus, "branch" means we exited on a branch and we've already hit MAX_BRANCHES. My runs in comment 39 show a speedup from setting MAX_BRANCHES=256 consistent with eliminating this category.
Comment 41 • 16 years ago
Does jsMSX call from setTimeout or other events very much?
Assignee | Comment 42 • 16 years ago
(In reply to comment #41)
> Does jsMSX call from setTimeout or other events very much?
It appears to set 1 or 2 timers with setInterval, both at a frequency of 60/second (17 ms argument to setInterval).
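For reference, a minimal sketch of that kind of setInterval-driven loop and of the "interrupts per second" figure quoted throughout this bug. This is hypothetical code, not the actual jsMSX source; runOneFrame and interruptCount are made-up names. If the demo does one interrupt per timer tick, a 17 ms interval bounds the rate at roughly 60 per second, and the reported average drops whenever a tick's work takes longer than the interval.

// Hypothetical sketch of a setInterval-driven emulator loop (not jsMSX code).
let interruptCount = 0;
const startTime = Date.now();

function runOneFrame() {
  // ... emulate one video interrupt's worth of CPU work and repaint ...
  interruptCount++;
}

const timer = setInterval(runOneFrame, 17);   // ~60 Hz request

// The "interrupts per second" metric used in this bug: count over a 5-minute run.
setTimeout(function () {
  clearInterval(timer);
  const seconds = (Date.now() - startTime) / 1000;
  console.log("interrupts/sec: " + (interruptCount / seconds).toFixed(1));
}, 300 * 1000);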
Reporter | Comment 43 • 16 years ago
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090220 Minefield/3.2a1pre
David has actual in-depth stats, but I figured that since I'd been trying it on my machine for months now, noting how things were going might be useful for comparison.
Disabling JIT immediately resulted in a very smooth experience. I was wondering: is there anything at all the JIT does that can safely be done in a separate thread?
With JIT enabled, there were very rhythmic stutters. For example, in the opening sequence of the default Goonies game, each character would stop for almost exactly the same amount of time.
34 interrupts per second without JIT, almost 36 with.
However, and this isn't so good, all the graphics past the start screen suddenly appear corrupted.
Each individual sprite appears colour-shifted and scrambled.
Reporter | Comment 44 • 16 years ago
I have no idea if this is helpful either, but:
http://dromaeo.com/?id=60617,60619
These results are from the machine I've been referencing when talking about jsMSX speed over the past few months - you know, just so there's some context.
Left is the Gecko nightly, right is the WebKit nightly.
For some of these I wonder if there's some kind of data error.
Like:
DeltaBlue Constraint Solving 3.33runs/s ±1.88% 248.55runs/s ±4.66%
Assignee | Comment 45 • 16 years ago
DeltaBlue is definitely hard for the JIT but it shouldn't be *that* bad. I get about 100 in the JIT now for the shell version of the v8 tests.
I too have noticed the uneven speed of the VM when using the JIT, especially as the VM is starting up. Diagnosing that will require more powerful performance analysis tools, which I am developing, but I haven't reached the point yet where I'm ready to attack this problem.
Regarding threading, that is part of our long-term plan, but I don't know any planning details. If the stuttering is caused by recording and compilation, then threading may solve it. But if it's caused by certain sections not tracing well, then general improvements to the tracer will be the solution.
Reporter | Comment 46 • 16 years ago
Well, since you seemed interested, I retested in
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090224 Minefield/3.2a1pre
The first time, I had the Firebug addon accidentally enabled (but not with debugging/net turned on for the Dromaeo website, and it was my only addon) and got:
"unresponsive script - jquery.js:50"
I actually got that triggered twice in the latest nightly when rerunning Dromaeo one more time in the Strings test.
http://dromaeo.com/?id=60694
DeltaBlue was just as bad as before.
So I tried rerunning with Firebug disabled, to have a clean setup. It hung for like half an hour at the 2-minute mark, with peak memory usage of 1.5GiB, then finally gave me a:
"Script: http://dromaeo.com/tests/dromaeo-3d-cube.html:115"
http://dromaeo.com/?id=60735
Final run, with TM disabled. It churned along more consistently, but still sucked up 1.4GiB.
http://dromaeo.com/?id=60761
http://dromaeo.com/?id=60735,60761 - interestingly, TM is slightly slower overall on Dromaeo. And indeed, that DeltaBlue thing seems to be death for it.
Oh, and there's still the pixel corruption when running TraceMonkey on jsMSX.
Reporter | Comment 47 • 16 years ago
... er, sorry. TM is slightly faster overall. TM is on the left.
Reporter | Comment 48 • 16 years ago
No more graphics glitches with the JIT, and the JIT does 32 interrupts per second again.
Non-JIT manages 25.
I'm finding this a little odd; non-JIT used to manage 32.
Of course, JIT used to do 37.5 on this same setup.
Reporter | Comment 49 • 16 years ago
36.5! You guys are basically back to your old selves! :)
This is even with the still-open bug #478949.
I wonder what you guys did...
And a bit over 34 for non-JIT? What the heck.
So basically no JIT benefit.
Oh well. The initial complaint in this bug has basically been addressed, so I suppose it can be closed :)
Comment 50 • 16 years ago
dmandelin, can we close this?
Assignee | Comment 51 • 16 years ago
(In reply to comment #50)
> dmandelin, can we close this?
Well, I'd like it to run faster, instead of just on par. But sure, we can close this bug. We can open a follow-on bug for further improvement if needed.
Updated • 16 years ago
Status: REOPENED → RESOLVED
Closed: 16 years ago → 16 years ago
Resolution: --- → FIXED
Updated • 16 years ago
Keywords: fixed1.9.1