Closed Bug 465773 Opened 16 years ago Closed 16 years ago

TM: jsMSX demo regressions. Slower by 15% vs non-JIT. 40% slower than past JIT tests

Categories: Core :: JavaScript Engine, defect, P2
Hardware: x86, OS: All
Status: RESOLVED FIXED
People: Reporter: bugs; Assigned: dmandelin
References: Blocks 1 open bug
Keywords: fixed1.9.1
Attachments: 1 file

... however under different circumstances from bug #458227. This time, after updating, if I run jsMSX for a little while (just hit start and let it do its thing), it crashes at around interrupt 1840. This is immediately after the main screen clears. Quite repeatable. I hadn't tried jsMSX in a while, but the crash probably appeared sometime in the last couple of weeks.

http://crash-stats.mozilla.com/report/index/283d0b15-c391-42cf-ab2d-529020081119?p=1

0 js3250.dll nanojit::LIns::deref js/src/nanojit/LIR.cpp:233
1 js3250.dll nanojit::LInsHashSet::hashcode js/src/nanojit/LIR.cpp:1233
2 js3250.dll nanojit::LInsHashSet::grow js/src/nanojit/LIR.cpp:1295
3 js3250.dll nanojit::CseFilter::insGuard js/src/nanojit/LIR.cpp:1968
4 js3250.dll nanojit::ExprFilter::insGuard js/src/nanojit/LIR.cpp:948
5 js3250.dll js3250.dll@0xac773
Flags: blocking1.9.1+
This might be relevant (or not): I'm only able to reproduce this one with Firebug turned off. Perhaps it's related to the slowdown Firebug adds. Turning off the addon allowed it to crash at exactly the same place, on the latest nightly.
I don't see a crash on Windows until much later (was past interrupt 2500), but I did see garbage on the "screen" well before that. The crash was right when I clicked the Help menu to find out what build I was running. Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1b2pre) Gecko/20081125 Minefield/3.1b2pre
Based upon Mike's comment I retested on a completely different machine. Also XP, although this one is running as an unprivileged user, and on completely different hardware. Crashed at almost *exactly* the same place, somewhere around 1830. This is the latest nightly.

http://crash-stats.mozilla.com/report/index/c5274364-17e4-4150-9502-fb6fb2081126

0 @0x0
1 @0x82903b
2 js3250.dll js_MonitorLoopEdge js/src/jstracer.cpp:3714
3 js3250.dll js_Interpret js/src/jsopcode.tbl:135
4 js3250.dll js_Interpret js/src/jsinterp.cpp:3720

This was from the prior build, 11/20:

http://crash-stats.mozilla.com/report/index/f43b090a-13f2-4955-b41c-8ceca2081126

0 js3250.dll nanojit::LirWriter::ins0 js/src/nanojit/LIR.h:442
1 js3250.dll TraceRecorder::TraceRecorder js/src/jstracer.cpp:1006
2 js3250.dll js_StartRecorder js/src/jstracer.cpp:2691
3 js3250.dll js_AttemptToExtendTree js/src/jstracer.cpp:3064
4 js3250.dll js_MonitorLoopEdge js/src/jstracer.cpp:3725
5 js3250.dll js_Interpret js/src/jsinterp.cpp:3720
6 js3250.dll js_Execute js/src/jsinterp.cpp:1559
7 js3250.dll JS_EvaluateUCScriptForPrincipals js/src/jsapi.cpp:5187
8 xul.dll nsJSContext::EvaluateString dom/src/base/nsJSEnvironment.cpp:1586
9 xul.dll xul.dll@0x305c8d
10 xul.dll nsGlobalWindow::TimerCallback dom/src/base/nsGlobalWindow.cpp:7993
11 xul.dll nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:420
12 xul.dll nsTimerEvent::Run xpcom/threads/nsTimerImpl.cpp:512
13 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:510
14 xul.dll nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:170
15 nspr4.dll PR_GetEnv
16 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:87
17 firefox.exe firefox.exe@0x2197
18 kernel32.dll kernel32.dll@0x17066

Did you have Firebug running? Also, regarding "garbage" - if you're referring to the green line, I haven't RTFSed but that was always there; I assumed it was an artifact of some debug code. This is interesting because the stack traces were not the same.
Odd. The trace changed again. Of course, who knows what you guys are doing in some seeeekret bug. :)

0 kernel32.dll RaiseException
1 mozcrt19.dll _CxxThrowException throw.cpp:159
2 mozcrt19.dll operator new obj-firefox/memory/jemalloc/src/new.cpp:57
3 xul.dll nsSupportsArray::Create xpcom/ds/nsSupportsArray.cpp:212
4 xul.dll EditAggregateTxn::EditAggregateTxn editor/libeditor/base/EditAggregateTxn.cpp:44
5 xul.dll DeleteRangeTxn::DeleteRangeTxn editor/libeditor/base/DeleteRangeTxn.cpp:66
6 xul.dll xul.dll@0x8b609f
7 xul.dll nsEditor::CreateTxnForDeleteSelection editor/libeditor/base/nsEditor.cpp:4814
8 xul.dll nsEditor::DeleteSelectionImpl editor/libeditor/base/nsEditor.cpp:4390
9 xul.dll xul.dll@0x2f52be
10 xul.dll nsTextEditRules::WillInsertText editor/libeditor/text/nsTextEditRules.cpp:550
11 xul.dll nsQueryReferent::operator obj-firefox/xpcom/build/nsWeakReference.cpp:88
12 xul.dll nsPlaintextEditor::InsertText editor/libeditor/text/nsPlaintextEditor.cpp:781
No longer able to reproduce in latest nightly. Going to go with "fixed"
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Darn. Spoke too soon. I left it running for a few minutes since I was doing a timing test on this one to see if it was getting any faster. Crashed after about 1½ minutes.

http://crash-stats.mozilla.com/report/index/a48178b9-8bff-4558-be73-b0b462081211

0 kernel32.dll RaiseException
1 mozcrt19.dll _CxxThrowException throw.cpp:159
2 mozcrt19.dll operator new obj-firefox/memory/jemalloc/src/new.cpp:57
3 xul.dll nsThebesDeviceContext::CreateRenderingContextInstance gfx/src/thebes/nsThebesDeviceContext.cpp:365
4 xul.dll nsThebesDeviceContext::CreateRenderingContext gfx/src/thebes/nsThebesDeviceContext.cpp:321
5 xul.dll PresShell::CreateRenderingContext layout/base/nsPresShell.cpp:3478
6 xul.dll PresShell::DoReflow layout/base/nsPresShell.cpp:6314
7 xul.dll PresShell::ProcessReflowCommands layout/base/nsPresShell.cpp:6466
8 xul.dll PresShell::DoFlushPendingNotifications layout/base/nsPresShell.cpp:4590
9 xul.dll PresShell::FlushPendingNotifications layout/base/nsPresShell.cpp:4521
10 xul.dll nsEditor::EndUpdateViewBatch editor/libeditor/base/nsEditor.cpp:4346
11 xul.dll nsEditor::EndPlaceHolderTransaction editor/libeditor/base/nsEditor.cpp:956
12 xul.dll nsAutoPlaceHolderBatch::~nsAutoPlaceHolderBatch editor/libeditor/base/nsEditorUtils.h:66
13 xul.dll nsPlaintextEditor::InsertText editor/libeditor/text/nsPlaintextEditor.cpp:769
14 xul.dll nsTextControlFrame::SetValue layout/forms/nsTextControlFrame.cpp:2759
15 xul.dll nsCOMPtr_base::~nsCOMPtr_base obj-firefox/dist/include/xpcom/nsAutoPtr.h:956
16 xul.dll nsTextControlFrame::SetFormProperty layout/forms/nsTextControlFrame.cpp:1960
17 xul.dll nsHTMLInputElement::SetValueInternal content/html/content/src/nsHTMLInputElement.cpp:945
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Two more retests. No longer at a predictable 1830 interrupts: one crashed almost instantaneously at 70 (after a complete restart of Firefox), the second after 4 minutes. At Littlemutt's urging, I will retest with an hourly TM build. The last two crashes:

http://crash-stats.mozilla.com/report/index/678bd6cf-15bf-4c14-88b4-68ed72081211
http://crash-stats.mozilla.com/report/index/b8422b24-45b1-46d1-b886-33e352081211
In the latest hourly (http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32/1228998184/) it ran for 5 minutes without crashing: 8020 interrupts. I guess that bodes well for the next trunk nightly :)
Crashes inside new() are expected: they mean we ran out of memory, and we don't really handle that; at the moment the best we can do is die. Besides, a crash outside JS isn't relevant for a bug filed against the JavaScript engine.
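For anyone following the stack traces: the RaiseException / _CxxThrowException / operator new frames in the comments above are what a failed allocation looks like on Windows. Here is a minimal standalone illustration of the general mechanism (not Mozilla code):

```cpp
// Sketch of the OOM mechanism, not Mozilla code: operator new throws
// std::bad_alloc when allocation fails, and if nothing on the stack catches
// it, the C++ runtime terminates the process (on Windows this unwinds
// through _CxxThrowException and RaiseException, as in the reports above).
#include <new>
#include <vector>

int main() {
    std::vector<char*> blocks;
    for (;;) {
        // When this eventually fails, 'new' throws std::bad_alloc; with no
        // try/catch anywhere on the stack, the process dies.
        blocks.push_back(new char[64 * 1024 * 1024]);
    }
}
```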
Spoke too soon. On the 2nd run, the hourly crashed too: http://crash-stats.mozilla.com/report/index/6e9cb4bd-2164-4218-8fff-9733a2081211 (not a very useful trace). BTW, it seems slower than it used to be. I was rerunning it a 2nd time because I used to get almost 33 interrupts per second over 5 minutes, and on the first run I was far below that at 27 interrupts per second. As for your comment... dunno, I'm just trying to run this app. :) I assure you I have a ton of memory on this desktop.
Oh, and re: in the JS engine vs. out: this crash only happens with the JIT enabled. Anyway, since you mentioned memory limits, I ran a new test, still against the hourly build. Sure enough, while running, Firefox memory usage did climb a bit, but not dramatically: from around 90 megabytes prior to launching, it climbed to a high of 115 with plenty of fluctuation, dropping to 85 or lower. I still had well over a gigabyte of system memory free when it crashed, 3½ minutes into playback. http://crash-stats.mozilla.com/report/index/1745939e-5638-4a77-ac3e-68a532081211 This is a P4 dual-core system running Windows XP SP3. As noted previously, the demo seems overall slower than in times past, although with periods of dramatic speedup.
Retest with the latest nightly. As with the tinderbox hourly, it no longer crashes consistently at 1830 or so. FF memory started at ~58 megs (I had a couple of unrelated pages open). After starting, it immediately jumped to 92 in the first few seconds, then went up and down a bit with spikes at 117. The interrupts-per-second average stayed much lower than in times past: again around 26, with some sequences seemingly much faster but others much slower, so more deviation. On the first run, it completed 5 minutes of play without crashing: 7920 interrupts over 300 seconds for an average of 26.4, significantly lower than in past nightlies. I then hit pause, reset, and tried again. The 2nd run was similar to the first, no crashing, 8055 over 300 seconds. Still slower than in the past, but still no crashing. So this was a little boring. I decided not to watch the 3rd run, but just let it spin cycles after pause/reset. Somewhere a couple of minutes into the 3rd run it crashed. If past crash history is any pattern, the memory usage was probably not above 120.

http://crash-stats.mozilla.com/report/index/0c4428e1-8505-4415-aa50-b8a022081212

0 @0x0
1 @0x7cc03b
2 js3250.dll js_MonitorLoopEdge js/src/jstracer.cpp:3784
3 js3250.dll js_Interpret js/src/jsinterp.cpp:6676

Same appearance as that in comment #3.
FWIW, the game ran with no crashes at all if I disabled the JIT (I ran it 5 times for 5 minutes or more each time). With the JIT disabled, interestingly, it ran significantly *faster*. Nowhere near the 37½ interrupts per second average that FF nightlies used to do with the JIT, but still, it was smooth in average speed with no significant memory fluctuations (peak memory usage over half an hour of ~86 megs). With the JIT disabled, it achieved 9290 interrupts over 300 seconds, almost 31 per second. That's about a 15% speed increase over using the JIT.
Title changed based on Littlemutt's suggestion.
Summary: JIT crashes jsMSX demo (again) → TM: Crashes jsMSX demo. Slower by 15% vs non-JIT. 40% slower than past JIT tests that didn't crash
Also at Littlemutt's suggestion. The following have all been run on this machine:

http://dromaeo.com/?id=55466,55465 - latest nightly, JIT on left, non-JIT on right. Interesting to see how even they are.
http://dromaeo.com/?id=55300,55299 - 3.0 vs yesterday's nightly with JIT enabled.
http://dromaeo.com/?id=55299,55466 - comparing those tests, you can see that whatever last night's patches were, Firefox did get a lot faster. However, as 55299 vs 55465 seems to suggest, a lot of it is not JIT related. 'Course, the error margins are kind of large.
By way of comparison on the speed front, I tested the latest WebKit nightly for Windows: http://builds.nightly.webkit.org/files/trunk/win/WebKit-SVN-r39293.zip WebKit nightlies around the time TM was doing 37.5 interrupts per second were at about 32 interrupts. SFX can now do 16,820 interrupts over 5 minutes of that demo on my machine, for an average speed of 56.1 interrupts per second. Furthermore, the core running the thread maintained a steady 50% load. Amazing.
Priority: -- → P2
Still crashes before I can get through my 5-minute run, but I did notice something interesting that could be related to stuff you guys are doing in the background. CPU usage under TM is also around 50-60% of the core it is using (25-30% in Windows perf mon). This is interesting to me 'cause it isn't running at full speed.
I suppose you guys did something new since the trace has a bit more in it this time.

http://crash-stats.mozilla.com/report/index/5ca5208b-e67b-454c-b0cb-bdd642090112

0 js3250.dll nanojit::LIns::isTramp js/src/nanojit/LIR.h:334
1 js3250.dll nanojit::LIns::deref js/src/nanojit/LIR.cpp:260
2 js3250.dll nanojit::LIns::oprnd1 js/src/nanojit/LIR.cpp:640
3 js3250.dll nanojit::StackFilter::read js/src/nanojit/LIR.cpp:1184
4 js3250.dll nanojit::StackFilter::read js/src/nanojit/LIR.cpp:1171
5 js3250.dll nanojit::DeadCodeFilter::read js/src/nanojit/Assembler.cpp:77
6 js3250.dll nanojit::Assembler::gen js/src/nanojit/Assembler.cpp:1025
7 @0x12f51f
Took a lot longer to crash (2nd or 3rd run, several minutes in), and the trace was a lot longer too.

http://crash-stats.mozilla.com/report/index/b3676266-fbe2-4944-9e32-c20a02090121

0 @0x0
1 @0x12f62b
2 js3250.dll js_Interpret js/src/jsinterp.cpp:3700
3 js3250.dll js_Invoke js/src/jsinterp.cpp:1336
4 js3250.dll js_fun_apply js/src/jsfun.cpp:1732
5 js3250.dll js_Interpret js/src/jsinterp.cpp:4994
6 js3250.dll js_Execute js/src/jsinterp.cpp:1564
7 js3250.dll JS_EvaluateUCScriptForPrincipals js/src/jsapi.cpp:5192
8 xul.dll nsJSContext::EvaluateString dom/src/base/nsJSEnvironment.cpp:1599
9 xul.dll nsGlobalWindow::RunTimeout dom/src/base/nsGlobalWindow.cpp:7629
10 xul.dll nsGlobalWindow::TimerCallback dom/src/base/nsGlobalWindow.cpp:7980
11 xul.dll nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:420
12 nspr4.dll _PR_MD_UNLOCK nsprpub/pr/src/md/windows/w95cv.c:344
13 xul.dll nsTimerEvent::Run xpcom/threads/nsTimerImpl.cpp:512
14 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:510
15 xul.dll nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:170
16 xul.dll nsAppStartup::Run toolkit/components/startup/src/nsAppStartup.cpp:192
17 nspr4.dll PR_GetEnv
18 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:87
19 firefox.exe firefox.exe@0x2197
20 kernel32.dll BaseProcessStart
Hm. After making sure I was updated to: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090120 Minefield/3.2a1pre (latest nightly) It is now back to crashing instantaneously again. Same as with bug #465784 actually.
I've been testing with the OOM patch for bug 465784 applied and I have not seen the crash, but I have noticed a 50% slowdown vs. non-JIT. I don't know what the problem is yet.

The basic stats I'm recording are that for 2000 interrupts, with non-JIT we have 40s spent in the interpreter, while for JIT, we have:

  interpreter:       48
  recording traces:   5
  compiling traces:   2
  running traces:     7
  JIT flushes:      130
  traces started:   20k
  traces finished:   8k
  traces called:    700k
  jsops run under interpreter (including recording): 400M
  jsops run natively: 1000M

This is weird. It indicates that we run 2/3 of the ops natively, yet we are slower, which I have never observed before. I would expect to see maybe a 50% speedup or so given that jsop execution mode distribution.

Andreas guessed that we are running too many short traces and that is my best guess at this point. I think this would be a problem if the overhead of calling a trace is too much. In this case, we are seeing about 2000 cycles per call to a trace in pure native execution. I don't know how much trace call overhead there is added to that but it could be substantial relative to 2000 cycles. I will measure it. "Something to do with DOM calls" is my second guess but I don't know much about that.

I tried running with various values of MAX_BRANCHES in the range 4-256. The total interpreter time didn't vary much, and in fact the differences could just be noise. The time spent on trace goes from 4s at 4 through 7s at 16 to 10s at 256. The time spent recording and compiling traces goes up by a lot with bigger values. At 256, the stats are:

  interpreter:       22
  recording traces:  36
  compiling traces:  17
  running traces:     9
  flushes:          256
  traces started:   150k
  traces finished:  132k
  traces called:    260k
  interp ops:        16M
  trace ops:       1400M
Oops, the recording traces number should be "16 seconds" in the MAX_BRANCHES=256 run.
Nifty. How do you generate stats from the tracer? Do I need to recreate a build environment, or can I just use the TM tinderbox builds? Oh, and it still crashed for me in Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090122 Minefield/3.2a1pre (nightly), but that could be simply 'cause your patch hadn't percolated up yet.
(In reply to comment #24)
> Nifty. How do you generate stats from tracer?

I have some personal patches for this that apply to TM, so you would need to set up a build of TM and apply my patches. I just added them to bug 474829 with the eventual goal of landing them in TM.
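For what it's worth, here is a rough standalone sketch of how per-phase cycle counts like the ones in this bug could be gathered. This is only a guess at the general approach, not dmandelin's actual patches from bug 474829; the phase names and the clock rate are assumptions.

```cpp
// Standalone sketch of a per-phase cycle timer (assumed approach, not the
// actual TM instrumentation). Wrap each phase of interest in a PhaseTimer.
#include <cstdint>
#include <cstdio>
#if defined(_MSC_VER)
#  include <intrin.h>      // __rdtsc on MSVC
#else
#  include <x86intrin.h>   // __rdtsc on GCC/Clang
#endif

enum Phase { INTERPRET, RECORD, COMPILE, NATIVE, OVERHEAD, NUM_PHASES };

static uint64_t phaseClocks[NUM_PHASES];

struct PhaseTimer {
    Phase phase;
    uint64_t start;
    explicit PhaseTimer(Phase p) : phase(p), start(__rdtsc()) {}
    ~PhaseTimer() { phaseClocks[phase] += __rdtsc() - start; }
};

static void dumpStats(double clocksPerSecond) {
    static const char* const names[NUM_PHASES] =
        { "interpret", "record", "compile", "native", "overhead" };
    for (int i = 0; i < NUM_PHASES; ++i)
        std::printf("%-9s = %14llu %12.6f\n", names[i],
                    (unsigned long long) phaseClocks[i],
                    phaseClocks[i] / clocksPerSecond);
}

int main() {
    {
        PhaseTimer t(INTERPRET);       // time a stand-in "interpret" phase
        volatile long sink = 0;
        for (long i = 0; i < 1000000; ++i)
            sink = sink + i;
    }
    dumpStats(2.2e9);  // assumed 2.2 GHz clock; adjust for the test machine
}
```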
The timings I gave above are off, partly because of deficiencies in my timer, and partly because I forgot what the output meant. Here are fixed timing stats from an improved version of the internals timer:

MAX_BRANCHES 16
                   clocks          time
  interpret =  53082229915     24.128286
  record    =  10939430887      4.972469
  native    =  17604643144      8.002111
  compile   =   3910572050      1.777533
  overhead  =  92910378187     42.231990
            = 178447254183     81.112388

In the above case we seem to be taking a huge charge for "overhead", which is basically all the trace monitoring, stitching, compilation, etc., stuff, including js_ExecuteTree (but not the native code itself).

MAX_BRANCHES 256
                   clocks          time
  interpret =  21321387109      9.691540
  record    =  71657487660     32.571585
  native    =  21041540399      9.564337
  compile   =  40826810145     18.557641
  overhead  =   7598975406      3.454080
            = 162446200719     73.839182

I have no idea what's going on here. There may still be defects in the timers. But it kind of looks like a lot of traces are recorded, and then run few or no times, which is consistent with the counter output I collected earlier.
Things look better with the most recent tip.

  MAX_BRANCHES   jsMSX perf hit (vs. non-tracing)   SS perf hit (vs. tracing 16)
  16             1.36x                              1x
  32             1.23x                              1.004x
  64             1.02x                              1.011x
  256            1.02x                              1.091x
Depends on: 475115
Looks like the fix that closed out bug #465784 has resolved the crashing in this one. Yay. The speed regressions, though, appear to be pretty much the same.
Renamed title to remove crashing references.
Summary: TM: Crashes jsMSX demo. Slower by 15% vs non-JIT. 40% slower than past JIT tests that didn't crash → TM: jsMSX demo regressions. Slower by 15% vs non-JIT. 40% slower than past JIT tests
Sorry for bug spam, but do the performance issues still warrant the "blocking" flag? I'm guessing crashing was the more serious part.
Assignee: general → dmandelin
Back to crashing again, with a new, longer trace.

http://crash-stats.mozilla.com/report/index/05f317a6-7863-46f1-8f64-a8e0b2090202

0 js3250.dll nanojit::Assembler::releaseRegisters js/src/nanojit/Assembler.cpp:960
1 js3250.dll nanojit::Assembler::gen js/src/nanojit/Assembler.cpp:1363
2 js3250.dll nanojit::Assembler::assemble js/src/nanojit/Assembler.cpp:828
3 js3250.dll nanojit::compile js/src/nanojit/LIR.cpp:2140
4 js3250.dll TraceRecorder::compile js/src/jstracer.cpp:2499
5 js3250.dll TraceRecorder::closeLoop js/src/jstracer.cpp:2625
6 js3250.dll js_CloseLoop js/src/jstracer.cpp:3541
7 js3250.dll js_RecordLoopEdge js/src/jstracer.cpp:3576
8 js3250.dll js_MonitorLoopEdge js/src/jstracer.cpp:4121
9 js3250.dll js_Interpret js/src/jsinterp.cpp:3698
10 js3250.dll js_Invoke js/src/jsinterp.cpp:1334
11 js3250.dll js_fun_apply js/src/jsfun.cpp:1732
12 js3250.dll js_Interpret js/src/jsinterp.cpp:4992
13 js3250.dll js_Execute js/src/jsinterp.cpp:1564
14 js3250.dll JS_EvaluateUCScriptForPrincipals js/src/jsapi.cpp:5244
15 xul.dll nsJSContext::EvaluateString dom/src/base/nsJSEnvironment.cpp:1598
16 xul.dll nsGlobalWindow::RunTimeout dom/src/base/nsGlobalWindow.cpp:7642
17 xul.dll nsGlobalWindow::TimerCallback dom/src/base/nsGlobalWindow.cpp:7993
18 xul.dll nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:420
19 nspr4.dll _PR_MD_UNLOCK nsprpub/pr/src/md/windows/w95cv.c:344
20 xul.dll nsTimerEvent::Run xpcom/threads/nsTimerImpl.cpp:512
21 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:510
22 xul.dll nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:170
23 xul.dll nsAppStartup::Run toolkit/components/startup/src/nsAppStartup.cpp:192
24 nspr4.dll PR_GetEnv
25 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:87
26 firefox.exe firefox.exe@0x21a7
27 kernel32.dll BaseProcessStart
Summary: TM: jsMSX demo regressions. Slower by 15% vs non-JIT. 40% slower than past JIT tests → TM: Crashes jsMSX demo. Slower by 15% vs non-JIT. 40% slower than past JIT tests
Depends on: 476560
That fix you committed a few days ago appears to have helped with crashing. Still averaging 28 interrupts per second, so a lot slower than jsMSX used to go. Reading that xi bug with interest and vague comprehension. :)
Summary: TM: Crashes jsMSX demo. Slower by 15% vs non-JIT. 40% slower than past JIT tests → TM: jsMSX demo regressions. Slower by 15% vs non-JIT. 40% slower than past JIT tests
The xi/xtbl patch has landed on TM, and once it gets to nightlies tracing perf should be about equal to non-tracing overall. I do find that tracing runs sometimes faster, sometimes slower--it's weirdly uneven.

Initial info into why tracing isn't faster; time breakdown for tracing:

  interpret = 12590490319    5.722950
  record    =  2601140256    1.182336
  native    =  2159817803    0.981735
  compile   =  1363429980    0.619741
  overhead  =   615674840    0.279852
  execute   =   220603471    0.100274
            = 19551156669    8.886889

Compare to about 10s execution time without tracing. (Workload comes from my clicking, so don't read anything into the 1s difference.)

The first problem is that we're only tracing 40% of the work or so. Second, trace execution time is low relative to the overheads--we're probably not executing each compiled trace enough times to make it really pay off. I'll have to look deeper to see why.
I assume that new patch is the reason speed has dropped from 28 interrupts per second to 24 or 25 in the 200901213 nightly. :) Timed it several times. As usual, 5 minutes, no interaction with the game, not reloading the page in case that helped. The all-time TM high on this machine was 37 or 38 in late September/early November - no crashes then either ;).
... 20090213 nightly sorry. Actually, I just realised I had firebug enabled. That has mucked up time in past (shocking). Ignore prior comment 'till I rerun :)
Yeah. That was it. Reran a couple of times and got 29 and 28. So, basically unchanged. Oh well. Still fun to watch you guys poke at this.
Higher quality numbers from a longer run:

JIT:
  interpret = 116919754050    53.145343
  record    = 173724828519    78.965831
  native    =  85461236707    38.846017
  compile   = 102967063001    46.803210
  overhead  =  37141156833    16.882344
  execute   =  22992612874    10.451188
            = 539206651984   245.093933

Non-JIT:
  interpret = 553970204403   251.804638

Thus, only 20% of the work is in pure interpreted mode. This is not bad and in most programs is enough to get a big speedup. My earlier run was too short and so it didn't have enough time for traces to get compiled. Combined with the record time, we are running 50% of the work in some kind of interpreted mode.

While we are actually running as native code, we get about 3x throughput compared to interpreted. This is great--it's on the high side of the usual effect of tracing. But the overheads (compiling, trace entry, and other overheads) take time equal to 2x native code execution time, nullifying the speedup.

It's fairly clear that the problem is that we are not executing traces enough times on average to win back the time it takes to compile them. It also looks like we're probably making too many transitions from interpreted mode to native mode. (And that in turn is probably from making too many transitions from native mode to interpreted.) The high record time is also troubling. It could mean something easier to fix, like too many flushes, or something harder to fix, like too many unique trace/typemap combinations.
MAX_BRANCHES=256 (default is 16):
  interpret = 107205565423    48.729802
  record    = 189790131135    86.268241
  native    =  87946432376    39.975651
  compile   = 111621314871    50.736961
  overhead  =  18723080277     8.510491
  execute   =   5355031506     2.434105
            = 520641555588   236.655253

Trace cache size 256MB (default is 16MB):
  interpret = 123768224932    56.258284
  record    =  55956919293    25.434963
  native    =  86807426761    39.457921
  compile   =  30131391895    13.696087
  overhead  =  75757551991    34.435251
  execute   =  32044378553    14.565627
            = 404465893425   183.848133

MAX_BRANCHES=256, trace cache size=256MB:
  interpret = 108486404807    49.312002
  record    =  29858574207    13.572079
  native    =  90011340287    40.914246
  compile   =  16918097548     7.690044
  overhead  =  11157749295     5.071704
  execute   =   4055881027     1.843582
            = 260488047171   118.403658

This is what we want to see.
Attached patch Tune for jsMSX (deleted) — Splinter Review
This patch brings JS execution time for my standard test (~10187 interrupts) down to 141s from the previous base of about 256s. The CASE_EXIT test is something that should have been there all along; it means we'll always try to extend the tree when the exit is a tableswitch case. The code cache size increase to 32MB is apparently needed to avoid excess OOM-related flushes.

Numbers for a few configurations:

  MAX_BRANCHES=16,  cache=16M, no CASE_EXIT patch: 256s
  MAX_BRANCHES=16,  cache=16M, CASE_EXIT patch:    256s
  MAX_BRANCHES=16,  cache=32M, no CASE_EXIT patch: 147s
  MAX_BRANCHES=16,  cache=32M, CASE_EXIT patch:    141s (the attachment)
  MAX_BRANCHES=256, cache=32M, no CASE_EXIT patch: 137s
  MAX_BRANCHES=256, cache=32M, CASE_EXIT patch:    137s

The final config is the best for this test but causes a big regression on unpack-code. (It has no effect on the rest of SS.) The attached patch has no effect on SS but of course will increase memory usage. If we want to speed up this application this patch is probably the quickest way to do it. Long-term, we should figure out how to reduce memory consumption.
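To make the memory-related half of the tuning concrete, here is an illustrative sketch of why the code cache size matters: once the cache fills, everything is flushed and every hot loop has to be re-recorded and re-compiled, which is where the large record/compile times above come from. The names and structure below are assumptions for illustration, not the attached patch or the actual TraceMonkey code.

```cpp
// Illustrative only: a code cache that flushes everything when it fills.
// Raising the limit (16MB -> 32MB in the attached patch) means fewer
// flushes, so fewer traces have to be re-recorded and re-compiled.
#include <cstddef>

const size_t TRACE_CODE_CACHE_BYTES = 32 * 1024 * 1024;  // hypothetical name

struct CodeCache {
    size_t used = 0;

    // Reserve room for a newly compiled trace; flush the whole cache first
    // if the trace would not fit.
    void reserve(size_t traceBytes) {
        if (used + traceBytes > TRACE_CODE_CACHE_BYTES) {
            flushAllTraces();
            used = 0;
        }
        used += traceBytes;
    }

    void flushAllTraces() {
        // In the real engine this discards all compiled traces, forcing
        // them to be recorded and compiled again the next time they get hot.
    }
};
```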
Depends on: 478949
Besides the large number of traces, our other main perf problem here is how much time we spend in the interpreter. I'm trying to understand this better. Here is a basic measurement from a 2000-interrupt run:

  time (s)    count    t/c (ms)   cause
  6.049485    23130    0.261543   loop2
  5.279892     4237    1.246139   none
  3.345187   385908    0.008668   branch
  0.802810    77984    0.010295   badexit
  0.310630      304    1.021808   oracle1
  0.112716       16    7.044748   record
  0.100979     9465    0.010669   cold
  0.067557     6660    0.010144   innerPC
  0.004553      347    0.013122   unstable
  0.000174        7    0.024826   callback

This shows all the transitions from a non-interpreting state to an interpreting state. There is one row for each "cause" that makes us transition to an interpreting state. The columns are: (1) time spent in the interpreter due to this cause, (2) number of transitions to the interpreter due to this cause, (3) [column1]/[column2] = average time spent in the interpreter for each transition due to this cause.

The causes break down into two major categories: "none" and everything else. A transition from "none" means a call into js_Interpret from outside of JS. All the others represent different returns from js_MonitorLoopEdge. The 3 main causes for this run of jsMSX are "loop2", "none", and "branch":

- "none" as above represents entry to js_Interpret. I note that there seems to be a fair amount of reentrance in those cases: it looks like there are typically 0-8 recursive calls to js_Interpret on each of these. I'm not sure how it manages to take so long before reaching a loop edge (which would cause a transition to a "tracing overhead" state at least briefly). It could be simply all the functions called, or maybe some native call. Or do we disable tracing in reentered js_Interpret?

- "loop2" is a 'return false' exit from js_MonitorLoopEdge that occurs when the exit type is LOOP_EXIT and innermostNestedGuard is NULL. I don't know what innermostNestedGuard means, so I don't know what's going on here.

- "branch" is an exit from js_MonitorLoopEdge that occurs when the exit type is BRANCH_EXIT and js_AttemptToExtendTree returns false. I believe the latter currently happens only if the current tree has MAX_BRANCHES branches. Thus, "branch" means we exited on a branch and we've already hit MAX_BRANCHES. My runs in comment 39 show a speedup from setting MAX_BRANCHES=256 consistent with eliminating this category.
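A minimal sketch of the kind of bookkeeping that can produce a table like the one above: attribute each transition back to the interpreter to a cause, and accumulate time and counts per cause. The cause names mirror the table, but the code itself is an assumption, not the real TM instrumentation.

```cpp
// Standalone sketch of per-cause "time spent back in the interpreter"
// accounting (assumed approach, not the actual TM instrumentation).
#include <cstdint>
#include <cstdio>

enum InterpCause { CAUSE_NONE, CAUSE_LOOP2, CAUSE_BRANCH, CAUSE_BADEXIT,
                   CAUSE_ORACLE1, CAUSE_RECORD, CAUSE_COLD, CAUSE_INNERPC,
                   CAUSE_UNSTABLE, CAUSE_CALLBACK, NUM_CAUSES };

struct CauseStats {
    uint64_t clocks[NUM_CAUSES];
    uint64_t count[NUM_CAUSES];

    // Called each time control falls back into the interpreter; 'elapsed'
    // is the cycle count spent interpreting before the next transition out.
    void note(InterpCause cause, uint64_t elapsed) {
        clocks[cause] += elapsed;
        count[cause] += 1;
    }

    void dump(double clocksPerMs) const {
        static const char* const names[NUM_CAUSES] = {
            "none", "loop2", "branch", "badexit", "oracle1",
            "record", "cold", "innerPC", "unstable", "callback"
        };
        std::printf("%-10s %12s %10s %12s\n", "cause", "time(ms)", "count", "t/c(ms)");
        for (int i = 0; i < NUM_CAUSES; ++i) {
            double ms = clocks[i] / clocksPerMs;
            std::printf("%-10s %12.3f %10llu %12.6f\n", names[i], ms,
                        (unsigned long long) count[i],
                        count[i] ? ms / count[i] : 0.0);
        }
    }
};

int main() {
    CauseStats stats = {};
    stats.note(CAUSE_BRANCH, 25000);  // example: one branch exit costing 25k cycles
    stats.dump(2.2e6);                // assumed 2.2 GHz => 2.2e6 clocks per ms
}
```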
Does jsMSX call from setTimeout or other events very much?
(In reply to comment #41)
> Does jsMSX call from setTimeout or other events very much?

It appears to set 1 or 2 timers with setInterval, both at a frequency of 60/second (17 ms argument to setInterval).
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090220 Minefield/3.2a1pre

David has actual in-depth stats, but I figured that, since I've been trying this on my machine for months now, noting how things are going might be useful for comparison. Disabling the JIT immediately resulted in a very smooth experience. I was wondering: is there anything at all the JIT does that can safely be done in a separate thread? With the JIT enabled, there were very rhythmic stutters. For example, on the default Goonies, in the opening sequence, each character would stop for almost exactly the same length of time.

34 interrupts per second without JIT, almost 36 with. However, and this isn't so good, suddenly all the graphics past the start screen appear corrupted. Each individual sprite appears colour-shifted and scrambled.
I have no idea if this is helpful either, but: http://dromaeo.com/?id=60617,60619 This is from the machine I've been referencing when talking about jsMSX speed over the past few months - you know, just so there's some context. Left is the Gecko nightly, right is the WebKit nightly. For some of these I wonder if there's some kind of data error. For example: DeltaBlue Constraint Solving: 3.33 runs/s ±1.88% vs 248.55 runs/s ±4.66%.
DeltaBlue is definitely hard for the JIT but it shouldn't be *that* bad. I get about 100 in the JIT now for the shell version of the v8 tests. I too have noticed the uneven speed of the VM when using the JIT, especially as the VM is starting up. Diagnosing that will require more powerful performance analysis tools, which I am developing, but I haven't reached the point yet where I'm ready to attack this problem. Regarding threading, that is part of our long-term plan, but I don't know any planning details. If the stuttering is caused by recording and compilation, then threading may solve it. But if it's caused by certain sections not tracing well, then general improvements to the tracer will be the solution.
Well, since you seemed interested, I retested in Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090224 Minefield/3.2a1pre.

The first time, I had the Firebug addon accidentally enabled (but without debugging/net turned on for the Dromaeo website, and it was my only addon), and got an "unresponsive script - jquery.js:50" warning. That actually triggered twice in the latest nightly when rerunning Dromaeo one more time on the Strings test. http://dromaeo.com/?id=60694 DeltaBlue was just as bad as before.

So I tried rerunning with Firebug disabled, to have a clean setup. It hung for like a half hour at the 2-minute mark, with peak memory usage of 1.5GiB, then finally gave me a "Script: http://dromaeo.com/tests/dromaeo-3d-cube.html:115" warning. http://dromaeo.com/?id=60735

Final run, with TM disabled. It churned along more consistently, but still sucked up 1.4GiB. http://dromaeo.com/?id=60761

http://dromaeo.com/?id=60735,60761 - interestingly, TM is slightly slower overall on Dromaeo. And indeed, that DeltaBlue thing seems to be death for it.

Oh, and there's still the pixel corruption when running TraceMonkey on jsMSX.
... er. sorry. TM is slightly faster overall. TM is on the left.
No more graphics glitches in JIT, JIT does 32 interrupts per second again. non-JIT manages 25. I'm finding this a little odd. non-JIT used to manage 32. Of course, JIT used to do 37.5 on this same setup.
36.5! You guys are basically back to your old selves! :) This is even with the still-open bug #478949. I wonder what you guys did... And a bit over 34 for non-JIT? What the heck. So basically no JIT benefit. Oh well. The initial complaint in this bug has been addressed, basically, so I suppose it can be closed :)
dmandelin, can we close this?
(In reply to comment #50) > dmandelin, can we close this? Well, I'd like it to run faster, instead of just on par. But sure, we can close this bug. We can open a follow-on bug for further improvement if needed.
Status: REOPENED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Blocks: 467263
Keywords: fixed1.9.1