Closed Bug 834622 Opened 12 years ago Closed 7 years ago

[meta] More Gecko startup win potpourri

Categories

(Core :: General, defect, P2)

x86_64
Linux
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: cjones, Unassigned)

References

(Depends on 2 open bugs)

Details

(Keywords: meta, perf, Whiteboard: [c=progress p= s= u=])

Attachments

(7 files, 10 obsolete files)

(deleted), image/jpeg
Details
(deleted), patch
Details | Diff | Splinter Review
(deleted), image/jpeg
Details
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
I applied the patch in bug 797189 that I (half forgot I) made a while ago to drop nice sample labels onto IPC work, and the scales immediately fell from my eyes! The following profile of Template app startup is captured with 1ms sample interval in a 10s buffer http://people.mozilla.com/~bgirard/cleopatra/#report=65a7832015a3c3e3335f17e322da8f3a5c150b6b There's quite a bit we can do here. Roughly speaking, in units of samples that are 2ms+ each, the time is going to these significant things (in chronological order) 21: something to do with setting app ID 24: blocked on parent in NotifyIMEFocus() (???) 51: running BrowserElementChild.js 35: compiling two chrome JS scripts 5: running forms.js 24: running UserAgentOverrides.jsm 21: something to do with AppProtocolHandler.js Now we've actually started loading content. We spent about 181 samples to get there, which is around 350ms. 6: more BrowserElementChild.js stuff; setVisible/nextPaintListener 5: UpdateDimensions; first layout flush 7: TabChild::RecvActivate (focus), mostly in forms.js 26: nsInputStreamPump::OnStateStart, broken down like so 6: BrowserElementChild.js and AppsService.js stuff (maybe different) 5: blocked on parent getting DPI (???) 15: potpourri 16: flushing results of HTML parsing, including selector matching 20: first refresh driver tick, second reflow and first paint 7: style computation 5: BrowserElementChild.js titleChangedHandler() (???) 28: second refresh driver tick, broken down like so 20: painting 7: BrowserElementChild.js onMozAfterPaint() handler At this point the page should be fully loaded and painted. We spend a bit of time sitting in the event loop after that, then the screenshotting code runs. We have a lot of winning we can do here.
Mostly a checkpoint. This gets almost all of the PBrowser::RecvLoadRemoteScript samples out of the profile, 66 -> 5. This is theoretically a 100ms+ win, and I maybe measure 40ms-100ms win on stopwatch but it's tricky. http://people.mozilla.com/~bgirard/cleopatra/#report=311208ba64bfc846c5e1f558e757faaea95aca19
One more thing that "ain't it" before I forget: I modified the Template application.zip to include a giant mp3 (9.5MB), and it didn't impact startup time at all. So looks like our JAR reading code is doing a good job.
Attached patch Changes to profiler only (obsolete) (deleted) — Splinter Review
Attachment #706299 - Attachment is obsolete: true
Attached patch Precompile some more scripts (obsolete) (deleted) — Splinter Review
Initial wins.
The rebased profile was rebuilt without --disable-elf-hack, and now the resolved libxul.so symbols look less believable. Known issue?
> Known issue? Yes.
Attached patch Patch from fabrice to speed up RecvPBrowserCtor (obsolete) (deleted) — Splinter Review
This seems to shave 5-8 samples off the processing there, but I don't have enough data yet to know if that's statistically significant.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #6) > The rebased profile was rebuilt without --disable-elf-hack, and now the > resolved libxul.so symbols look less believable. Known issue? Seems to be. See bug 827846 There is a bionic patch which apparently fixes things.
We have three or four synchronous IPC's during PBrowser::RecvShow() that are causing us to hang on the b2g process for a nontrivial number of cycles (25-50). Two of those calls are necessary and can't really be avoided, but this one is totally pointless in b2g. For some reason, this call seemed to be sticking more than the others, so a couple profiles showed it knocking off ~20 samples. Take with a grain of salt though.
Stepping back a bit, the biggest remaining offenders are 50-60 samples: RecvShow -> BrowserElementChild.js init 20-30 samples: sometimes we see 1 refresh driver tick, and we spend ~20 samples in reflow+painting stuff. Sometimes we see 3 ticks and spend ~50 samples there. We shouldn't do unnecessary work here. ~25 samples: nsInputStreamPump::OnInputStreamReady. We do a variety of things here and I don't see any low-hanging fruit. ~20 samples: flushing HTML parse state. This seems high for a trivial test like the Template app, but there's probably not a lot we can feasibly do here. But, there's not much left here that's OOP specific. Will compare against in-process load of Template.
It's very hard to measure any difference by stopwatch with Template in- vs. out-of-process. This may be due to regressions in the gaia window manager.
Quick notes - average sample interval is a bit over 2ms for OOP, about 1.3ms for in-process. So each OOP sample is about 50% more clock time. - BEC startup seems to take about the same amount of *samples* in/out-of-process - gaia window_manager is triggering a stupid-expensive sync reflow in setDisplayedApp(), apparently caused by var cssHeight = window.innerHeight - StatusBar.height; ^^^ this I'm not 100% sure we've already started the child load when that happens, but we should have. - we're processing *12* expensive refresh driver ticks, each pretty expensive. The rest is a bit cloudier because the samples aren't organized as neatly, but - InputStream, html5 parsing, and CSS style computation "stuff" seems to consume about the same number of samples in/OOP - loading the app screenshot shows up in the profile
A bit earlier back in the tap-icon phase of launch, - system app is chewing ~30 samples processing touch events that seem to be eventually forwarded to the Homescreen process. That's a bit odd and probably adds ~50ms latency to Homescreen processing the tap. - after the touchend is sent to Homescreen, b2g process sits idle for ~50 samples before it gets the WebApps:launchApp message. Seems to be ~75ms that Homescreen is doing ... something. - homescreen profile shows ~60 samples between touchstart and WebApps::launch. (Homescreen samples come in at ~1.3ms apart.) - homescreen hits a pretty exensive refresh driver tick (~40 samples) after touchstart but before touchend. Most likely the app icon being highlighted. - homescreen spends another ~40 samples reflowing and restyling after sending launch(). Probably resetting the icon tap state. This is happening while the b2g process starts chewing CPU so probably appears more expensive than it really is. - homescreen sits at SendGetInputContext() for *174* samples after the post-launch reflow/restyle. I have no idea what's happening here, and I don't think it's necessary. - homescreen spends ~20 samples in something CSP-related around an nsInputStreamPump::OnInputStreamReady. Maybe this is the code actually fetching the highlighted icon? - then homescreen finishes off with another expensive refresh driver tick
blocking-b2g: --- → tef?
Whiteboard: [FFOS_perf]
After some more gecko fixes, the parts of the Homescreen app on the critical startup path are looking /OK/, but some room for improvement. - touchstart is handled quickly by Homescreen. - ... but then BrowserElementScrolling does 20-30ms unnecessary work. Bug 834996. - refresh driver tick. Reflow and repaint. 40-50ms. Not sure why this is happening yet. - touchmove dispatched. mouse_event_shim seems to be doing ~10ms unnecessary work, bug 834997. - homescreen handles synthetic mousemove quickly. - touchend dispatched. Homescreen takes about 7ms to call launch(). Based on the profile, no idle event-loop time, it appears that the touchmove and touchend are sitting in the event queue. So processing those more quickly would shorten the critical path here, but by how much I can't say without better instrumentation.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #16) > Based on the profile, no idle event-loop time, it appears that the touchmove > and touchend are sitting in the event queue. So processing those more > quickly would shorten the critical path here, but by how much I can't say > without better instrumentation. The main b2g process sits idle for a little over 40ms between sending touchend and receiving WebApps:launchApp(). So we may have ~30ms to win here.
The b2g process is not behaving as well as the homescreen. Rough breakdown is - small amount of useless work on touchend, ~5ms. Then sends to homescreen. - receives launchApp(). Then lots of fun stuff happens, total of about 200 samples ~250ms - eventually dispatch to gaia window_manager.js - 60 samples ~75ms in manifest manipulation (bug 834999) - 75 samples ~85ms in BrowserElementParent and other iframe-related C++ init (bug 835001) - 40 samples ~50ms mostly doing a sync reflow forced by |StatusBar.height| in gaia setDisplayedApp() We SendLoadURL() to the app process just before the sync reflow, but everything else is on the critical path. - after that a smattering of refresh driver ticks and JS callbacks, mostly cheap though one tick is 20 samples ~25ms - ~30 samples ~35ms synchronously decoding an image on the main thread (app screenshot?), interrupted by refresh driver tick - another relatively expensive refresh driver tick, ~15 samples - relatively expensive repaint, 30 samples - expensive transitionend handler, 45 samples The last four aren't on the critical startup path but will compete for and win CPU over the loading app process.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #8) > Created attachment 706633 [details] [diff] [review] > Patch from fabrice to speed up RecvPBrowserCtor > > This seems to shave 5-8 samples off the processing there, but I don't have > enough data yet to know if that's statistically significant. From what I've seen so far this is on the borderline of significant so this patch is probably worth taking.
Summary: More Gecko startup win potpourri → [meta] More Gecko startup win potpourri
If this is indeed a meta bug we won't block on it but if not please remove [meta] and re-nom. I don't know if we have a clear bar for perf work to be blockers but nomming perf fixes is very much appreciated.
blocking-b2g: tef? → ---
Keywords: meta
Maybe BenWa will be able to help figure out what's wrong here.
Comment on attachment 706605 [details] [diff] [review] Precompile some more scripts Tracked in bug 835584 now.
Attachment #706605 - Attachment is obsolete: true
Attached patch Backport bug 773428 part 1 (obsolete) (deleted) — Splinter Review
Still not getting markers in my profile ... but this is more for other folks's benefit, so I'm going to bail for now.
Comment on attachment 706649 [details] [diff] [review] Don't initialize cross-process IME when we don't need it Tracked in bug 835591 now.
Attachment #706649 - Attachment is obsolete: true
Comment on attachment 707288 [details] [diff] [review] Backport bug 773428 part 2 to b2g18 Tracked in bug 835592 .
Attachment #707288 - Attachment is obsolete: true
Comment on attachment 707298 [details] [diff] [review] Apparently nonfunctional patch to add sample markers to critical startup path Tracked in bug 835592 .
Attachment #707298 - Attachment is obsolete: true
Comment on attachment 707329 [details] [diff] [review] Backport bug 773428 part 1 Tracked in bug 835592 .
Attachment #707329 - Attachment is obsolete: true
Depends on: 835596
before: 0.94, 0.97, 0.91 after: 0.85, 0.75, 0.88 My stopwatch timings are getting extremely difficult to do, but we should be sitting on ~100ms win so far.
Comment on attachment 706633 [details] [diff] [review] Patch from fabrice to speed up RecvPBrowserCtor Tracked in bug 835596 .
Attachment #706633 - Attachment is obsolete: true
With some manual annotations on relevant startup methods, I see 01-29 01:38:54.015 I/Gecko ( 1584): STARTUP: TabParent::SendLoadURL 01-29 01:38:54.235 I/Gecko ( 1666): STARTUP: TabChild::RecvLoadURL 01-29 01:39:25.375 I/Gecko ( 1584): STARTUP: TabParent::SendLoadURL 01-29 01:39:25.556 I/Gecko ( 1684): STARTUP: TabChild::RecvLoadURL so the TabParent is sending LoadURL about 200ms before TabChild starts processing it. The startup "stuff" before that mostly runs in parallel, so we should look for ways to rearrange things so that we can start processing LoadURL earlier. Unfortunately, the child process isn't idle during any of this time. But let me check when we reset the new app to foreground priority ...
This timeline starts at PContent:SendBrowserCtor. The "stuff" on the critical path in the b2g process before this diagram totals for about 90 samples, or ~110ms. (Will be a bit lower outside the profiler.) This diagram shows the b2g process finishing its work up to TabParent::SendLoadURL() relatively quickly, ~30ms. It continues to chew a lot of CPU after its last entry on the diagram here though. The app process is twiddling its thumbs when it finally gets TabChild::RecvBrowserCtor(); it starts processing that in the same ms it's sent. But then TabChild gets backed up for a long time on TabChild::RecvShow() (BrowserElementChild initialization mainly). The LoadURL message is sitting in the content process's IPC queue for almost 200ms before the content process dequeues it. So by my count, we have about 320ms of garbage work on the critical path to TabChild::RecvLoadURL(). At a high level, the wins will come from - make BrowserElementParent/nsFrameLoader init faster - make TabChild::RecvShow()/BrowserElementChild init faster Those get more CPU cycles off the current critical path. We can try to reshape the critical path by starting the SendBrowserCtor/SendLoadURL code much earlier in the b2g process. Preferably, before we do the BrowserElementParent init. Maybe we can rejigger gaia to help with this. If we could reorder the pipeline to look like b2g | SendPBrowserCtor/SendLoadURL | | (expensive post-setup) App | | .../RecvLoadURL | (load content) Then we can kick off the load of app:// resources as early in startup as possible, and hopefully hide the IO latency there behind the CPU bound b2g process "(expensive post-setup)", which includes BEP init and starting the app-open animation.
Attached patch Startup logging (deleted) — Splinter Review
Patch used to generate above diagram, along with |adb logcat -v time|.
I don't think it matters based on the timeline above, but FTR the preallocated process is running with background priority during its early startup work through and beyond RecvLoadURL(). I think jlebar fixed that, so I should probably rebase everything now ...
Indeed, we now load the app with fg priority, but as expected the stopwatch timings don't change.
The b2g process profile during startup didn't change at all http://people.mozilla.com/~bgirard/cleopatra/#report=dfb87035451fbaf36d064d2957903b6de2b6ad07 but the Template app profile looks a lot different. There are now 370 samples in the profile vs. 260 samples before, however the stopwatch results are identical. http://people.mozilla.com/~bgirard/cleopatra/#report=427594015c91a5fedfa918262af7c4767efa68c5 Now that the Template process runs with higher priority during startup, it appears that it's doing a "hurry up and wait" (as they say in the Army) to locations it blocks on the b2g process. So the end result is that the Template process gets more time slices to capture a profiler sample, but the critical path is still about the same.
Update: the patch from bug 835548 (along with numerous others applied) gets us pretty consistently under 800ms startup time. So we're closing in on 200ms win now.
It looks like we might not be drawing app-frame pixels to screen until 200-300ms after they're ready. However, I need to measure more carefully. That would be a nice win!
Attached patch Rollup of wins so far (obsolete) (deleted) — Splinter Review
I went to clean up a branch and replicate results and have better news. I had left in a patch that did some logging on critical paths and measuring without improves results measurably. We're now around 750ms for Template startup on otoro. Rollup patch is included, which includes these commits $ git log -11 commit b6d61fdecfb1fc39fe1907f05a1246a093c2f40c Author: Chris Jones <jones.chris.g@gmail.com> Date: Tue Jan 29 20:43:10 2013 -0800 bug 835641: only send updatedims when they change commit 0ffc6c3ac15c7aafbb79e579e8f8549de9577a0e Author: Chris Jones <jones.chris.g@gmail.com> Date: Tue Jan 29 20:38:17 2013 -0800 bug 835591: remove cross-process ime commit c9366b9cb1c021792856f0d2b763201da29906cc Author: Chris Jones <jones.chris.g@gmail.com> Date: Tue Jan 29 20:36:49 2013 -0800 EXPERIMENTAL bug 835681: remove sync from bec init commit 815cf7a009bc6f83d91e9028a0c79dce72ab20c4 Author: Chris Jones <jones.chris.g@gmail.com> Date: Tue Jan 29 20:35:34 2013 -0800 bug 835679: improve getting dpi and renderframe ctor commit 5ec8ed109ee5a198d168ec21ce94b7b8987d377e Author: Chris Jones <jones.chris.g@gmail.com> Date: Tue Jan 29 20:25:54 2013 -0800 bug 835596: speed up getappbylocalid commit 0438acdfdf9d704c5f999dbdf1abb75d2ea99568 Author: Chris Jones <jones.chris.g@gmail.com> Date: Tue Jan 29 20:24:50 2013 -0800 bug 835584: precompile moar things commit 1758a5c110e8008d75c728841b563bbb46ffebfc Author: Chris Jones <jones.chris.g@gmail.com> Date: Tue Jan 29 20:23:04 2013 -0800 bug 835548: make browserelement* init cheaper commit ed1ab4af4e8899ec7a253e86e4b758394166b0b8 Author: Chris Jones <jones.chris.g@gmail.com> Date: Tue Jan 29 20:21:16 2013 -0800 bug 835013: make appprotocolhandler not block on parent commit 50daada643a5f3684d7aa968f4f17ba4aca3acb6 Author: Chris Jones <jones.chris.g@gmail.com> Date: Tue Jan 29 20:01:51 2013 -0800 bug 834999: make manifest wrapping cheaper commit 1869741e6d04dcab0c680f34d0eef4b44fddf877 Author: Chris Jones <jones.chris.g@gmail.com> Date: Tue Jan 29 19:58:13 2013 -0800 bug 834622: better startup info from profiler Based on commit cedbc43dfa7602f37ba451d2bdb6c3536fea823b Author: Doug Turner <dougt@dougt.org> Date: Tue Jan 22 21:26:00 2013 -0800 Bug 833178 - Increase Stream Transport thread pool max threads. r=bsmith, a=overholt
(In reply to Chris Jones [:cjones] [:warhammer] from comment #38) > It looks like we might not be drawing app-frame pixels to screen until > 200-300ms after they're ready. However, I need to measure more carefully. > That would be a nice win! Better data is pointing this way, but to a slightly lesser extent. 01-30 08:34:14.407 I/Gecko (20198): STARTUP: child process blocking txn posted for 0 01-30 08:34:14.407 I/Gecko (20198): STARTUP: resolving child process tree 2 [ touchstart? icon -> blue highlight? ] -- snip -- 01-30 08:34:14.757 I/Gecko (20198): STARTUP: child process blocking txn posted for 4 [ first paint of new process ~350ms later; not sure what was painted though ] 01-30 08:34:14.757 I/Gecko (20198): STARTUP: resolving child process tree 2 01-30 08:34:14.777 I/Gecko (20198): STARTUP: resolving child process tree 2 01-30 08:34:14.787 I/Gecko (20198): STARTUP: child process blocking txn posted for 0 01-30 08:34:14.787 I/Gecko (20198): STARTUP: resolving child process tree 2 -- snip -- 01-30 08:34:14.787 I/Gecko (20198): STARTUP: resolving child process tree 4 [ new process pixels in layer tree, ~480ms later; not sure if actually composited ] -- snip -- 01-30 08:34:14.968 I/Gecko (20198): STARTUP: child process blocking txn posted for 4 [ second paint of new process ~560ms later; also not sure what was painted ] -- snip -- 01-30 08:34:14.968 I/Gecko (20198): STARTUP: resolving child process tree 4 [ new process pixels in layer tree, ~560ms later; not sure if actually composited ] -- snip -- 01-30 08:34:15.008 I/Gecko (20198): STARTUP: child process blocking txn posted for 0 01-30 08:34:15.008 I/Gecko (20198): STARTUP: resolving child process tree 4 [ maybe first sample seen onscreen, allowing for reaction lag; ~600ms later ] -- snip -- 01-30 08:34:15.118 I/Gecko (20198): STARTUP: child process blocking txn posted for 0 01-30 08:34:15.118 I/Gecko (20198): STARTUP: resolving child process tree 4 [ first sample unequivocably seen onscreen; ~700ms later ] The "unequivocably seen onscreen" sample was me eyeballing the phone while watching logcat. What this data shows is that between the first composite after launching Template, and the /last/ repaint of Template before I'm sure it was onscreen, ~560ms elapsed. There was one paint of Template before that, so judging by bug 836198 the second paint was the first "real paint". So that means it takes us ~560ms from ~touchstart to compositor seeing first real Template pixels painted. Not great but getting there. Then we have another 150ms before I'm 100% sure those Template pixels made it to screen. If I guess that's the right tick and add that to the almost real delay to paint, 560ms, and then add in a little fudge for the human reaction time to press "Stop" on a stopwatch, then we're around the same times I'm seeing in comment 39. So my guess is that gaia + gecko is adding about 150ms overhead to putting the pixels we've already painted on startup to screen.
Attached image Diagram of data in comment 40 (deleted) —
The highlighted blue area (if you can distinguish that in the horrific white balance) is the contested delay.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #39) > Created attachment 707993 [details] [diff] [review] > Rollup of wins so far > > I went to clean up a branch and replicate results and have better news. I > had left in a patch that did some logging on critical paths and measuring > without improves results measurably. With the patch applied the homescreen never ends loading with: E/profiler( 352): Registering start signal E/GeckoConsole( 352): [JavaScript Error: "NS_ERROR_FILE_NOT_FOUND: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIXPCComponents_Utils.import]" {file: "jar:file:///system/b2g/omni.ja!/components/BrowserElementParent.js" line: 14}] E/GeckoConsole( 352): [JavaScript Warning: "Unknown property '-moz-align-self'. Declaration dropped." {file: "resource://gre-resources/ua.css" line: 44}]
Do you have a BrowserElementParent.jsm in your tree?
With bug 836605 knocked out, the only significant time in one spot on the critical path to PBrowser::SendLoadURL() in the parent process will be 10 samples or so in BrowserElementParentFactory.observe(). There'll of course still be a smattering of samples distributed about.
Keywords: perf
No longer depends on: 835001
I've been wondering my stopwatch timings have been getting increasingly unreliable, and I just accidentally figured out why: if I load Template, put it into the background, and then tap its icon again (just putting it into foreground, not relaunching) I measured a 0.63s startup time. Right now we're just under 0.75s for cold start, so we're going to need better tools to measure firstpaint wins much below that.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #45) > we're going to need better tools > to measure firstpaint wins much below that. We also should soon stop using Template app as a reference "minimum overhead" app since it's startup characteristics are becoming less of a predictor of what happens with most apps. For example, I was able to get Template startup below 300ms (another 200ms win over reference build) by just disabling setFrameBackground (https://github.com/mozilla-b2g/gaia/blob/master/apps/system/js/window_manager.js#L493), but didn't manage to reproduce any win on any app with actual scripts/links and content.
That definition of startup isn't what's being measured here. That said, I don't necessarily disagree, but launching a Hello World app in 750ms is still not acceptable.
NBB!: for this patch to be meaningful, you need to have the fix for bug 836198 in your tree. This resets a stopwatch every time gecko sees touchstart (as close to driver as we can see it), and then when a new layer subtree is seen, the elapsed time is reported. This corroborates the analysis of comment 40 (but much more simply with bug 836198 gone!): Template's layer tree appears at the compositor quite noticeably before it comes onscreen. This is lag due to gaia. The first-layer-tree-appears-in-compositor time for Template is right around 500ms. With appropriate gaia fixes, we can make that the real firstpaint time.
Would it make sense to use this number for apploadtime event (https://github.com/mozilla-b2g/gaia/blob/master/apps/system/js/window_manager.js#L1030) ? Also, is firstPaint always with DOM painted? When I was testing using mozbrowserfirstpaint event reporter, I think I got the event fired with blank screen appearing and the DOM has been painted later. (same for any imgs within the DOM)
firstpaint is not always with DOM painted. Because of the asynchronicity of painting, it's hard to (read, I don't know how to) fire a "first paint" event that guarantees that DOM was painted.
Oh, sorry, when I said "firstpaint" I meant, "first display of app pixels on screen", not mozbrowsefirstpaint. Sorry for confusing things. (In reply to Zbigniew Braniecki [:gandalf] from comment #49) > Would it make sense to use this number for apploadtime event > (https://github.com/mozilla-b2g/gaia/blob/master/apps/system/js/ > window_manager.js#L1030) ? The number reported by the above patch relies on measuring after a touchstart, so it's a little more specific than what code wants, I think. The patch is geared towards measuring launching off of user input. > Also, is firstPaint always with DOM painted? When I was testing using > mozbrowserfirstpaint event reporter, I think I got the event fired with > blank screen appearing and the DOM has been painted later. (same for any > imgs within the DOM) The definition of "DOM painted" is not very precise, unfortunately :(. Which is why writing this kind of measuring code is hard.
(In reply to Zbigniew Braniecki [:gandalf] from comment #42) > (In reply to Chris Jones [:cjones] [:warhammer] from comment #39) > > Created attachment 707993 [details] [diff] [review] > > Rollup of wins so far > > > > I went to clean up a branch and replicate results and have better news. I > > had left in a patch that did some logging on critical paths and measuring > > without improves results measurably. > > With the patch applied the homescreen never ends loading with: Building on a different machine, I reproduce this now.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #52) > Building on a different machine, I reproduce this now. https://bugzilla.mozilla.org/show_bug.cgi?id=835809#c10 - it may be this
It wasn't filling the full frame, so many (most?) of my stopwatch timings were actually counting when gaia replaced the screenshot with the white background! (The "Implement me" text got a layer sized to it, ~100x20 pixels.) With this change, I measure startup at ~660-690ms on vanilla gecko-18, and ~540ms with all dependent patches applied. Getting there!
This measurement had a bug: it was counting the first time the subtree was *resolved*, but not the first time it had a leaf layer that was composited. I found out in bug 837881 that we sometimes get a few frames with just a container layer, no leaf layers. The gecko profiler wasn't catching these txns. With this patch, we're counting startup more accurately. The reported times are within 100ms or so of what my stopwatch is measuring, and the appears-in-logcat to seen-with-eye lag is hard to notice.
Attachment #708941 - Attachment is obsolete: true
(In reply to Zbigniew Braniecki [:gandalf] from comment #46) > We also should soon stop using Template app as a reference "minimum > overhead" app since it's startup characteristics are becoming less of a > predictor of what happens with most apps. > For example, I was able to get Template startup below 300ms (another 200ms > win over reference build) by just disabling setFrameBackground > (https://github.com/mozilla-b2g/gaia/blob/master/apps/system/js/ > window_manager.js#L493), but didn't manage to reproduce any win on any app > with actual scripts/links and content. FWIW I ran a (quick) experiment similar to bug 837881 comment 15 with the Dialer and saw very similar results to what I saw with Template. So it'd be good to understand what's different in our testing setups.
No longer depends on: 841760
Attached patch Changes to profiler, rebased (deleted) — Splinter Review
Attachment #706602 - Attachment is obsolete: true
Whiteboard: [FFOS_perf] → [FFOS_perf] c=
Whiteboard: [FFOS_perf] c= → [c=progress p= s= u=]
Blocks: 923949
Priority: -- → P2
Closing this old B2G bug.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: