Closed Bug 922440 (opened 11 years ago, closed 8 years ago)

Baseline slows down B2G app startup by ~10%

Categories

(Core :: JavaScript Engine, defect, P2)

Platform: ARM, Gonk (Firefox OS)

Tracking


RESOLVED WONTFIX
blocking-b2g -

People

(Reporter: zbraniecki, Assigned: nbp)

References

(Depends on 1 open bug)

Details

(Keywords: perf, Whiteboard: [c=progress p= s= u=])

Attachments

(1 obsolete file)

Environment:
Device: Keon
Gecko: mozilla-central (Sept 30th)
Gaia: master (Sept 30th)

Variable: adding --disable-ion to gonk-misc/default-gecko-config

Result:

           master  no-ion   Δ ms    Δ %
========================================
Browser       457     449     -8    -2%
Calendar      652     618    -34    -6%
Camera        448     395    -53   -13%
Clock         792     729    -63    -9%
Email         344     310    -34   -11%
Gallery       449     398    -51   -13%
Messages      595     568    -27    -5%
Music         541     515    -26    -5%
Phone         592     564    -28    -5%
Settings      768     700    -68   -10%
Video         499     440    -59   -13%
========================================
AVG:                         -41  -8.27%
Wow, that's a lot. IIRC, --disable-ion also disables the baseline jit, so this isn't just Ion compilation. It'd be useful to measure with --enable-ion but with javascript.options.ion.content = false. It would also be useful to measure with javascript.options.baselinejit.{chrome,content} = false to see if perhaps the startup overhead is something more basic like the extra code size or some initialization code.
My original blind guess was that this was related to the initialization of the IonRuntime, especially the part which generates the code of all the VMWrappers, but then I cannot explain why the Browser application does not seem to be affected, unless nothing in the Browser app uses Baseline. I did some profiling as part of Bug 849103.

So I guess this might be related to the Baseline landing. AFAIK, there was a big regression (~150ms) on the switch from 18 to central, but (most of?) it was related to the pre-allocated app not being ready in time.

(In reply to Luke Wagner [:luke] from comment #1)
> Wow, that's a lot. IIRC, --disable-ion also disables the baseline jit, so
> this isn't just Ion compilation.

None of the default applications are using Ion during their start-up.
Keywords: perf
(In reply to Nicolas B. Pierron [:nbp] from comment #2)
> My original blind guess was that this was related to the initialization of
> the IonRuntime, especially the part which generates the code of all the
> VMWrappers, but then I cannot explain why the Browser application does not
> seem to be affected, unless nothing in the Browser app uses Baseline.

I don't know if that's still the case, but the Browser app used to be the only in-process app.
How are the start/stop events determined for the numbers in comment 0? Is this the b2gperf cold_load_time test?
(In reply to Luke Wagner [:luke] from comment #1)
> It'd be useful to measure with --enable-ion but with
> javascript.options.ion.content = false. It would also be useful to measure
> with javascript.options.baselinejit.{chrome,content} = false to see if
> perhaps the startup overhead is something more basic like the extra code
> size or some initialization code.

So, the scenarios you'd like to get are:

1) --enable-ion + javascript.options.ion.content = false
2) javascript.options.baselinejit.chrome = false + javascript.options.baselinejit.content = false

Anything else?

--enable-ion is on by default, so just removing --disable-ion from my build config (so basically building vanilla Gecko) should be enough. Is that right?
(In reply to Ben Kelly [:bkelly] from comment #4)
> How are the start/stop events determined for the numbers in comment 0? Is
> this the b2gperf cold_load_time test?

Yes, this is b2gperf cold_load_time's median over 30 replications.
blocking-b2g: --- → 1.3?
Whiteboard: [c=progress p= s= u=]
Flags: needinfo?(luke)
(In reply to Zbigniew Braniecki [:gandalf] from comment #5)
> So, the scenarios you'd like to get are:
>
> 1) --enable-ion + javascript.options.ion.content = false
> 2) javascript.options.baselinejit.chrome = false +
>    javascript.options.baselinejit.content = false
>
> Anything else?

I wasn't clear originally, but for (2), I meant:

--enable-ion + javascript.options.ion.content = false + javascript.options.baselinejit.{chrome,content} = false

The goal is to isolate the costs of the extra code and initialization. (Enabling Ion but not Baseline wouldn't be very useful, since the Baseline JIT provides profiling information to Ion.) Thanks!
Flags: needinfo?(luke)
Scenario 1: javascript.options.ion.content = false

           master  change   Δ ms    Δ %
========================================
Browser       462     468      6     1%
Calendar      656     658      2     0%
Camera        453     448     -5    -1%
Clock         798     793     -5    -1%
Email         333     330     -3    -1%
Gallery       452     426    -26    -6%
Messages      599     594     -5    -1%
Music         543     543      0     0%
Phone         596     593     -3    -1%
Settings      773     770     -3     0%
Video         500     497     -3    -1%
========================================
avg:                          -4  -0.86%

Scenario 2: javascript.options.ion.content = false
            javascript.options.baselinejit.chrome = false
            javascript.options.baselinejit.content = false

           master  change   Δ ms    Δ %
========================================
Browser       462     447    -15    -3%
Calendar      656     632    -24    -4%
Camera        453     406    -47   -12%
Clock         798     740    -58    -8%
Email         333     312    -21    -7%
Gallery       452     407    -45   -11%
Messages      599     574    -25    -4%
Music         543     520    -23    -4%
Phone         596     572    -24    -4%
Settings      773     724    -49    -7%
Video         500     479    -21    -4%
========================================
avg:                         -32  -6.23%

It seems that the Baseline JIT explains most of the regression (around 80% of it).
A next experiment would be to increase the warmup count on the Baseline JIT to various values (the default is 10; I'd try 20, 50, 100, 200). This isn't a pref; it requires changing this constant:

http://mxr.mozilla.org/mozilla-central/source/js/src/jit/Ion.h#221

and rebuilding (make -C js/src && make libs -C toolkit/library should be sufficient).

If this goes well, an option we could consider would be changing js_ionOptions.baselineUsesBeforeCompile (which can be set through JS_SetGlobalJitCompilerOption) to a higher value during the startup of a b2g app.
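For illustration, a minimal sketch of the follow-up option described above: raising the Baseline trigger through the public API during app startup. Only JS_SetGlobalJitCompilerOption is named in the comment; the enum spelling JSJITCOMPILER_BASELINE_USECOUNT_TRIGGER and the JSContext-taking signature are assumptions about the jsapi.h of that era.

// Sketch, not landed code: raise the Baseline use-count trigger while a
// b2g app starts up, then restore the default afterwards. The enum name
// JSJITCOMPILER_BASELINE_USECOUNT_TRIGGER is an assumption; check the
// jsapi.h of that era for the exact spelling and signature.
#include <stdint.h>
#include "jsapi.h"

static void SetBaselineTrigger(JSContext *cx, uint32_t uses)
{
    // Scripts now need this many calls/loop iterations before Baseline
    // compiles them (feeds js_ionOptions.baselineUsesBeforeCompile).
    JS_SetGlobalJitCompilerOption(cx, JSJITCOMPILER_BASELINE_USECOUNT_TRIGGER, uses);
}

// During app startup:  SetBaselineTrigger(cx, 200);
// Once the app loaded: SetBaselineTrigger(cx, 10);  // 10 is the default per comment 9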
Do we create a new JSRuntime when starting a b2g app? If so it's probably the VM wrapper initialization.
(In reply to Jan de Mooij [:jandem] from comment #10)
> Do we create a new JSRuntime when starting a b2g app?

No, we do not: the JSRuntime is created as part of the pre-allocated app, so it does not appear in the previous results.

> If so it's probably the VM wrapper initialization.

On the other hand, the IonRuntime is created the first time we enter the Baseline compiler, so it sounds like a likely target for this overhead, as I mentioned in comment 2. A few existing bugs are looking at getting rid of static initialization (Bug 905210, Bug 894092), as this is costly on desktop, but we should also remove the ahead-of-time generation of VMWrappers (revert Bug 786146) and generate them when we create a path that calls them.

But first we should confirm that most of this overhead comes from the VMWrapper generation. I am planning to use the TraceLogger on B2G to profile the IonRuntime initialization.
Flags: needinfo?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #11)
> But first we should confirm that most of this overhead comes from the
> VMWrapper generation. I am planning to use the TraceLogger on B2G to
> profile the IonRuntime initialization.

Does it make sense for me to spend time on what Luke asked for: "A next experiment would be to increase the warmup count on the baseline jit to various values (default is 10, I'd try 20, 50, 100, 200)"?
Depends on: 925207
OK, I ran some tests on an Unagi, and I was able to reproduce this by launching the Settings application ~10 times per configuration, waiting long enough for the pre-allocated app to appear.

Settings app
ion & baseline on     858.6
ion & baseline off    833.7
ion & baseline high   864.9

on: {default}

off:
user_pref("javascript.options.baselinejit.content", false);
user_pref("javascript.options.baselinejit.chrome", false);
user_pref("javascript.options.ion.content", false);

high:
user_pref("javascript.options.baselinejit.unsafe_eager_compilation", 1000000);
user_pref("javascript.options.ion.unsafe_eager_compilation", 2000000);

Having the JITs on with a high trigger surely induces a cost, but I would not expect it to be that expensive compared to having the JITs disabled.
Can you post your std? Also, do you use the mean or the median? My numbers are even higher than yours for the same app (~50-60ms diff between on/off), and I'm wondering whether something changed in Gecko or whether our experimental methodologies differ.
(In reply to Zbigniew Braniecki [:gandalf] from comment #14)
> Can you post your std? Also, do you use the mean or the median?

I took the mean over 9-11 tests that I ran manually. Since then I have installed b2gperf, which reports higher but more stable results. I guess the Marionette instrumentation might be partly to blame too. I can copy the raw numbers if you are really interested, but I don't think they bring any value to this conversation.

The point is that I was able to reproduce the issue, and that the problem persists even when we do not enter any jitted code (which is what "high" shows).
Flags: needinfo?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #15)
> I took the mean over 9-11 tests that I ran manually. Since then I have
> installed b2gperf, which reports higher but more stable results.

First of all, we switched b2gperf to the median because it mitigates high variance, so I'd suggest relying on the median for these tests, for consistency. Also, I usually get a better mean over 10 tries than over 30, so once you use b2gperf, use 30 replications to keep the numbers comparable to what Datazilla reports.

> The point is that I was able to reproduce the issue, and that the problem
> persists even when we do not enter any jitted code (which is what "high"
> shows).

Yeah. I guess it makes sense to just verify the solution we'll get here on both systems.
(In reply to Zbigniew Braniecki [:gandalf] from comment #0)
> Environment:
> Device: Keon
> Gecko: mozilla-central (Sept 30th)
> Gaia: master (Sept 30th)
>
> Variable: adding --disable-ion to gonk-misc/default-gecko-config
>
> Result:
>
>            master  no-ion   Δ ms    Δ %
> ========================================
> Browser       457     449     -8    -2%
> Calendar      652     618    -34    -6%
> Camera        448     395    -53   -13%
> Clock         792     729    -63    -9%
> Email         344     310    -34   -11%
> Gallery       449     398    -51   -13%
> Messages      595     568    -27    -5%
> Music         541     515    -26    -5%
> Phone         592     564    -28    -5%
> Settings      768     700    -68   -10%
> Video         499     440    -59   -13%
> ========================================
> AVG:                         -41  -8.27%

Is it possible to try this with Ion parallel compilation turned off? When it's turned on, Ion can't generate the trampoline code for VM-function entries on the fly, and must pre-generate them at runtime creation. If it's turned off, the pre-generation can be avoided and the trampolines can be generated on demand as needed.

This setting may help startup times quite a bit.
Assignee: nobody → nicolas.b.pierron
Status: NEW → ASSIGNED
(In reply to Kannan Vijayan [:djvj] from comment #17)
> Is it possible to try this with Ion parallel compilation turned off?

Parallel compilation is disabled if there is only one core.

> When it's turned on, Ion can't generate the trampoline code for
> VM-function entries on the fly, and must pre-generate them at runtime
> creation. If it's turned off, the pre-generation can be avoided and the
> trampolines can be generated on demand as needed.
>
> This setting may help startup times quite a bit.

My idea was to move the IonCompartment initialization to the pre-allocated app, which would hide this cost. But first things first, let's make sure this is the actual problem. Currently my bet is on the initialization of the IonCompartment, which is done before checking the use count for Baseline. I will first try to move it below the use-count check and see if I can still reproduce this slowdown with a high trigger for Baseline.
Summary: ION slows down B2G app startup by ~10% → Baseline slows down B2G app startup by ~10%
Disregard my earlier results: I had not noticed that the prefs.js file was reset every time we restart b2g.

$ b2gperf --delay=10 --iterations=30 --no-restart Settings

on:   Results for Settings, cold_load_time: median:997, mean:1014, std:52, max:1203, min:969
off:  Results for Settings, cold_load_time: median:968, mean:982,  std:34, max:1117, min:947
high: Results for Settings, cold_load_time: median:990, mean:1007, std:41, max:1154, min:961

It seems that the overall cost of having the JITs enabled on the System app, on top of an inbound build, is around 20ms. These results are what I got after moving the initialization of the IonCompartment below the use-count check of Baseline.

I will continue to instrument the code to see if I can blame some initialization function which runs prematurely and which is not yet done as part of the pre-allocated app.
(In reply to Nicolas B. Pierron [:nbp] from comment #19)
> Disregard my earlier results: I had not noticed that the prefs.js file was
> reset every time we restart b2g.

Can you elaborate? I don't think prefs.js on my Keon changes between restarts. I'd like to try to reproduce your results.
So here is the result of a logcat, and indeed it seems that the creation of the IonRuntime is the beast. Strangely, it seems to be already initialized in the pre-allocated app.

I/Gecko   ( 2270): ######################## BrowserElementChildPreload.js loaded
I/IonBench( 2270): Time inside createIonRuntime: 24.719000 ms
I/IonBench( 2270): Time inside ensureIonCompartmentExists: 24.781000 ms
I/Gecko   ( 1683): 1382670818754 Marionette [… b2gperf starts the Settings app …]
I/IonBench( 2270): Time inside createIonRuntime: 25.849000 ms
I/IonBench( 2270): Time inside ensureIonCompartmentExists: 26.031000 ms
I/Gecko   ( 1683): 1382670821273 Marionette […]
I/Gecko   ( 2270):
I/Gecko   ( 2270): ###!!! [Child][OnMaybeDequeueOne] Error: Channel closing: too late to send/recv, messages will be lost
I/Gecko   ( 2270):

Does the Settings application rely on a worker? Does it wait on the worker during initialization?
Flags: needinfo?(21)
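The bug never shows the instrumentation behind the IonBench lines; the following is a self-contained C++ sketch of the kind of scoped timer that would produce such output. The class name and the logging path are assumptions; only the resulting logcat lines appear above.

// Hypothetical reconstruction of the "IonBench" instrumentation: an RAII
// timer that logs wall-clock milliseconds spent inside a scope.
#include <chrono>
#include <cstdio>

class AutoIonBench
{
    const char *label_;
    std::chrono::steady_clock::time_point start_;

  public:
    explicit AutoIonBench(const char *label)
      : label_(label), start_(std::chrono::steady_clock::now()) {}

    ~AutoIonBench() {
        double ms = std::chrono::duration<double, std::milli>(
                        std::chrono::steady_clock::now() - start_).count();
        // On device this would go through __android_log_print with the
        // "IonBench" tag instead of stderr.
        fprintf(stderr, "Time inside %s: %f ms\n", label_, ms);
    }
};

static bool createIonRuntime()
{
    AutoIonBench bench("createIonRuntime");
    // ... exception tail, bailout tables, arguments rectifier, VM wrappers ...
    return true;
}

int main()
{
    AutoIonBench bench("ensureIonCompartmentExists");
    return createIonRuntime() ? 0 : 1;
}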
(In reply to Zbigniew Braniecki [:gandalf] from comment #20)
> Can you elaborate? I don't think prefs.js on my Keon changes between
> restarts.

I did:

adb push prefs.js $(adb shell echo -n "/data/b2g/mozilla/*.default")/prefs.js
adb shell 'stop b2g; sleep 5; start b2g'
<b2gperf …>
adb shell cat $(adb shell echo -n "/data/b2g/mozilla/*.default")/prefs.js

and I did not always get the same file back, as if it had been reset. So now I am changing /system/b2g/defaults/pref/user.js instead.
(In reply to Nicolas B. Pierron [:nbp] from comment #21)
> I/Gecko   ( 2270): ######################## BrowserElementChildPreload.js loaded
> I/IonBench( 2270): Time inside createIonRuntime: 24.719000 ms
> I/IonBench( 2270): Time inside ensureIonCompartmentExists: 24.781000 ms

Note that when the JITs are turned off, *only* these 2 lines are removed, which makes me wonder what b2gperf is measuring, since the next 2 lines are always there.

> I/Gecko   ( 1683): 1382670818754 Marionette [… b2gperf starts the Settings app …]
> I/IonBench( 2270): Time inside createIonRuntime: 25.849000 ms
> I/IonBench( 2270): Time inside ensureIonCompartmentExists: 26.031000 ms
> I/Gecko   ( 1683): 1382670821273 Marionette […]
> I/Gecko   ( 2270):
> I/Gecko   ( 2270): ###!!! [Child][OnMaybeDequeueOne] Error: Channel closing: too late to send/recv, messages will be lost
> I/Gecko   ( 2270):

Jonathan: any idea what Marionette is looking at? The results mention cold_load_time; does that mean the pre-allocated app's start-up time is included, or am I not looking at the right thing?
Flags: needinfo?(jgriffin)
(In reply to Nicolas B. Pierron [:nbp] from comment #21)
> So here is the result of a logcat, and indeed it seems that the creation of
> the IonRuntime is the beast. Strangely, it seems to be already initialized
> in the pre-allocated app.
>
> I/Gecko   ( 2270): ######################## BrowserElementChildPreload.js loaded
> I/IonBench( 2270): Time inside createIonRuntime: 24.719000 ms
> I/IonBench( 2270): Time inside ensureIonCompartmentExists: 24.781000 ms
> I/Gecko   ( 1683): 1382670818754 Marionette [… b2gperf starts the Settings app …]
> I/IonBench( 2270): Time inside createIonRuntime: 25.849000 ms
> I/IonBench( 2270): Time inside ensureIonCompartmentExists: 26.031000 ms
> I/Gecko   ( 1683): 1382670821273 Marionette […]
> I/Gecko   ( 2270):
> I/Gecko   ( 2270): ###!!! [Child][OnMaybeDequeueOne] Error: Channel closing: too late to send/recv, messages will be lost
> I/Gecko   ( 2270):
>
> Does the Settings application rely on a worker? Does it wait on the worker
> during initialization?

It does not by itself, but I guess the Settings application uses the RIL / WiFi APIs, which rely on workers underneath.

http://mxr.mozilla.org/mozilla-central/source/dom/wifi/WifiWorker.js
http://mxr.mozilla.org/mozilla-central/source/dom/system/gonk/ril_worker.js

Not sure if this is related, though.
Flags: needinfo?(21)
nbp, b2gperf measures app launch times by handling the apploadtime event and reporting the entries whose type field is 'cold'. See: https://github.com/mozilla/b2gperf/blob/master/b2gperf/launchapp.js#L15
Flags: needinfo?(jgriffin)
Depends on: 932039
I looked into where the IonRuntime gets created, and it appears that we create it eagerly in 3 places:

- while checking whether we can enter Baseline;
- while checking whether we can enter Ion;
- while creating a type for the |this| of a new function (Bug 932039).

I will attach a patch for the first 2, and I opened a bug for the third case, as it might be more complex. Then, once we have finished this, we can look at moving the cost away from the creation of the IonRuntime itself.

Moving the IonRuntime creation to the pre-allocated app might not be the right solution for workers, as B2G applications are moving towards using more workers in the near future.
This patch checks the use count before calling ensureIonCompartmentExists / createThisForFunction (which eagerly compiles with Baseline). It tackles the Baseline & Ion parts of the problem.
Attachment #823764 - Flags: review?(kvijayan)
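The shape of that change, reduced to a standalone C++ sketch of the lazy-initialization pattern. The names mirror the ones used in this bug; this is not the actual js/src code.

// Standalone sketch: consult the cheap use-count check before paying for
// IonRuntime creation. Before the patch, merely checking eligibility
// created the IonRuntime.
#include <cstdint>
#include <cstdio>

struct Script { uint32_t useCount; };

static bool ionRuntimeCreated = false;

static bool ensureIonCompartmentExists()
{
    if (!ionRuntimeCreated) {
        // The expensive part: ~25ms on a Keon per comment 21, dominated by
        // trampoline and VM-wrapper generation.
        ionRuntimeCreated = true;
    }
    return true;
}

enum MethodStatus { Method_Skipped, Method_Error, Method_Compiled };

static MethodStatus CanEnterBaselineJIT(Script *script, uint32_t trigger)
{
    if (script->useCount < trigger)
        return Method_Skipped;              // cheap exit, no IonRuntime yet

    if (!ensureIonCompartmentExists())      // pay the cost only when compiling
        return Method_Error;
    return Method_Compiled;
}

int main()
{
    Script s = { 3 };
    printf("status: %d, runtime created: %d\n",
           CanEnterBaselineJIT(&s, 10), ionRuntimeCreated);
    return 0;
}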
Is fixing this actually going to improve startup times? The first time any function gets called more than 10 times, or any loop iterates more than 10 times, we'll invoke a Baseline compile and have to create the IonRuntime anyway. I'd expect this to happen during start-up anyway. This might delay IonRuntime creation, but probably not enough to push it out of app startup time. Have you tested this to verify whether it shows a gain?
(In reply to Kannan Vijayan [:djvj] from comment #28)
> I'd expect this to happen during start-up anyway.

That depends on the application. If the application only runs simple functions, there is no reason to add extra cost at start-up. And this puts the IonRuntime initialization where it is actually needed, instead of it being eagerly triggered.

> This might delay IonRuntime creation, but probably not enough to push it
> out of app startup time. Have you tested this to verify whether it shows a
> gain?

I checked that Ion and Baseline were no longer eagerly initializing the IonRuntime. Now Bug 932039 is doing it. One of the reasons this matters is that currently changing the Baseline use-count trigger is a no-op, and I think we should reduce the impact of this eager initialization.
(In reply to Nicolas B. Pierron [:nbp] from comment #29)
> That depends on the application. If the application only runs simple
> functions, there is no reason to add extra cost at start-up.

All the apps measured in comment 0 look non-trivial. Do you have any evidence that any of them make it through app load without baseline-compiling anything?

The last time I looked at JitRuntime::initialize, it seemed like we were creating tons of little stubs. Assuming this is still the majority of the time (I haven't looked since bug 851627 was fixed), that suggests we should lazily generate these stubs. A simple bit of instrumentation could show what % of stubs have been accessed at least once by the end of startup; I'd expect this % to be small.
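A minimal C++ sketch of that instrumentation. The stub count of 198 comes from comment 31 below; the hook names are hypothetical.

// Sketch: record which of the pre-generated VM wrapper stubs are ever
// entered before the end of app startup, then report the coverage.
#include <bitset>
#include <cstdio>

static const size_t NumVMWrappers = 198;  // figure from comment 31
static std::bitset<NumVMWrappers> stubTouched;

// Would be called from each wrapper's prologue (or its C++ thunk).
void NoteVMWrapperEntered(size_t index)
{
    stubTouched.set(index);
}

// Would be called when the app fires its load event.
void ReportStubCoverage()
{
    size_t used = stubTouched.count();
    fprintf(stderr, "VM wrappers entered during startup: %zu/%zu (%.1f%%)\n",
            used, NumVMWrappers, 100.0 * used / NumVMWrappers);
}

int main()
{
    NoteVMWrapperEntered(0);
    NoteVMWrapperEntered(42);
    ReportStubCoverage();
    return 0;
}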
(In reply to Luke Wagner [:luke] from comment #30)
> The last time I looked at JitRuntime::initialize, it seemed like we were
> creating tons of little stubs.

Yes, this is what is reported under createIonRuntime. The loop generating the VM function wrappers seems to take a large share of the time, but even the trampolines measure at ~1ms each, which is an awful amount of time. I am trying to get profiling info on Linker::newCode, and will probably do the same for the macro assembler. I bet there is something wrong with the way our ARM macro assembler works which costs us extra time for nothing; we should not take ~1ms to generate these trampolines.

> Assuming this is still the majority of the time (I haven't looked since
> bug 851627 was fixed), that suggests we should lazily generate these stubs.

No VMFunctions are generated lazily. That was the case a long time ago, but it got disabled last year, before we switched to parallel compilation. With the arrival of PJS & Baseline, the number of VMFunctions increased like crazy, and we now generate 198 stubs when we initialize the IonRuntime.

> A simple bit of instrumentation could show what % of stubs have been
> accessed at least once by the end of startup; I'd expect this % to be
> small.

I bet so too. Still, I want to attack the right things first. We have a chance to highlight something wrong in our macro assembler / linker, and I want to shave this one while it is still under the spotlight, as it also matters for compilation performance. After that, we can probably look into lazily generating VMFunctions on first call, but that would involve way more "magic".
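A C++ sketch of that lazy alternative, generating a wrapper on first request and caching it. The types are stand-ins, not SpiderMonkey's; the "magic" alluded to above is that real jitcode embeds the wrapper address at compile time, so genuine laziness needs indirection or patching.

// Sketch: emit a VM wrapper the first time something asks for it, instead
// of emitting all ~198 at IonRuntime creation.
#include <cstdio>
#include <unordered_map>

struct VMFunction { const char *name; };
struct IonCode { /* executable code would live here */ };

class IonRuntime
{
    std::unordered_map<const VMFunction *, IonCode *> functionWrappers_;

    IonCode *generateVMWrapper(const VMFunction &f)
    {
        // MacroAssembler + Linker work happens here, only on first use.
        fprintf(stderr, "generating wrapper for %s\n", f.name);
        return new IonCode();  // lives as long as the runtime, never freed
    }

  public:
    IonCode *getVMWrapper(const VMFunction &f)
    {
        auto it = functionWrappers_.find(&f);
        if (it != functionWrappers_.end())
            return it->second;                 // cached: no codegen cost
        IonCode *code = generateVMWrapper(f);
        functionWrappers_.emplace(&f, code);
        return code;
    }
};

int main()
{
    IonRuntime rt;
    VMFunction f = { "SomeVMFunction" };  // hypothetical wrapper name
    rt.getVMWrapper(f);   // first call: generates
    rt.getVMWrapper(f);   // second call: cache hit
    return 0;
}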
(In reply to Nicolas B. Pierron [:nbp] from comment #21)
> I/Gecko   ( 2270): ######################## BrowserElementChildPreload.js loaded
> I/IonBench( 2270): Time inside createIonRuntime: 24.719000 ms
> I/IonBench( 2270): Time inside ensureIonCompartmentExists: 24.781000 ms

After running more verbose timing, I noticed that this IonRuntime was initialized on another thread than the main Gecko thread. I recently discussed this with fabrice, and it seems that the pre-allocated app recently got a worker thread initialized as part of it, which might explain this timing.

> I/Gecko   ( 1683): 1382670818754 Marionette [… b2gperf starts the Settings app …]
> I/IonBench( 2270): Time inside createIonRuntime: 25.849000 ms
> I/IonBench( 2270): Time inside ensureIonCompartmentExists: 26.031000 ms

And this IonRuntime initialization is done on the main thread.

> I/Gecko   ( 1683): 1382670821273 Marionette […]
> I/Gecko   ( 2270):
> I/Gecko   ( 2270): ###!!! [Child][OnMaybeDequeueOne] Error: Channel closing: too late to send/recv, messages will be lost
> I/Gecko   ( 2270):
Depends on: 932627
Comment on attachment 823764 [details] [diff] [review]
Remove eager IonRuntime creation from canCompile functions.

Review of attachment 823764 [details] [diff] [review]:
-----------------------------------------------------------------

Code seems reasonable enough to me.
Attachment #823764 - Flags: review?(kvijayan) → review+
Is this patch ready to land? Is there more work that needs to be done?
Flags: needinfo?(nicolas.b.pierron)
(In reply to Dave Huseby [:huseby] from comment #34)
> Is this patch ready to land? Is there more work that needs to be done?

It should be ready to land, but it is just a sanity patch which does not solve this issue. I tried to fix the issue as part of Bug 932627, but it appears that what I fixed is not what b2gperf reports. So, for the moment, I have no clear understanding of what b2gperf measures and reports. I guessed at something, which is now fixed in Bug 932627, but it does not seem to be the offender in this issue.
Flags: needinfo?(nicolas.b.pierron)
Depends on: 944690
Comment on attachment 823764 [details] [diff] [review]
Remove eager IonRuntime creation from canCompile functions.

Obsolete => attachment moved to Bug 944690 to be landed.
Attachment #823764 - Attachment is obsolete: true
blocking-b2g: 1.3? → -
Hi Nicolas,

This bug came up in our triage. We just had a few questions:

1) Is this a regression in startup time since 1.1, because Baseline hadn't landed there?
2) Is IonMonkey compiling large amounts of code on startup, and is the cost of compilation negating any performance gains?
3) Any idea how much bug 932627 should help with startup time?

Thanks!
Flags: needinfo?(nicolas.b.pierron)
(In reply to Mason Chang [:mchang] from comment #37)
> 1) Is this a regression in startup time since 1.1, because Baseline hadn't
> landed there?

It seems there is a regression which appears when the JITs are enabled, and which shows up in applications such as the Settings application (which does not rely on IonMonkey).

> 2) Is IonMonkey compiling large amounts of code on startup, and is the cost
> of compilation negating any performance gains?

As of today, yes. This costs ~25ms for each JSRuntime, as soon as we execute the first "new", or as soon as we start compiling the first for-loop.

> 3) Any idea how much bug 932627 should help with startup time?

Based on b2gperf, it does not change the start-up time :( So I have no clue what b2gperf is measuring. I have not checked the memory impact of Bug 932627. As the generated code is not supposed to be mutable, we could in fact do the opposite and run this before we fork, keeping the generated code in shared memory for all applications.
Flags: needinfo?(nicolas.b.pierron)
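The pre-fork idea in that last paragraph, as a runnable POSIX C++ sketch; a byte buffer stands in for the generated jitcode, and the fork-from-a-common-parent structure mirrors how b2g app processes were spawned.

// Sketch of "generate once before fork, share with every app": pages the
// parent fills and then freezes read-only are inherited copy-on-write by
// each forked app, so the generation would cost time and memory only once.
#include <cstdio>
#include <cstring>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    const size_t codeSize = 4096;   // stand-in for the ~198 wrappers
    void *code = mmap(nullptr, codeSize, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (code == MAP_FAILED)
        return 1;

    memset(code, 0xC3, codeSize);             // "generate" the code once
    mprotect(code, codeSize, PROT_READ);      // freeze it; never written again

    pid_t pid = fork();                       // each b2g app forks from here
    if (pid == 0) {
        // Child ("app"): reads the inherited pages; since they are
        // read-only, the kernel never copies them.
        printf("app sees generated byte: 0x%02x\n",
               static_cast<unsigned char *>(code)[0]);
        _exit(0);
    }
    waitpid(pid, nullptr, 0);
    return 0;
}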
Priority: -- → P2
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
