Closed Bug 834622 Opened 12 years ago Closed 7 years ago

[meta] More Gecko startup win potpourri

Status:

RESOLVED WORKSFORME

People

(Reporter: cjones, Unassigned)

References

(Depends on 2 open bugs)

Details

(Keywords: meta, perf, Whiteboard: [c=progress p= s= u=])

Attachments

(7 files, 10 obsolete files)

Profiler changes, bug 819000, and some initial wins 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me (deleted), patch
Changes to profiler only 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me (deleted), patch
Precompile some more scripts 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me (deleted), patch
Patch from fabrice to speed up RecvPBrowserCtor 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me (deleted), patch
Don't initialize cross-process IME when we don't need it 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me (deleted), patch
Backport bug 773428 part 2 to b2g18 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me (deleted), patch
Apparently nonfunctional patch to add sample markers to critical startup path 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me (deleted), patch
Backport bug 773428 part 1 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me (deleted), patch
Timeline of critical path to TabChild:LoadURL 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me (deleted), image/jpeg
Startup logging 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me (deleted), patch
Rollup of wins so far 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me (deleted), patch
Diagram of data in comment 40 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me (deleted), image/jpeg
Report firstpaint time as accurately as gecko can 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me (deleted), patch
Rollup of dependent bugs (based on cedbc43dfa7602f37ba451d2bdb6c3536fea823b) 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me (deleted), patch
Patch to Template to make it easier to measure 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me (deleted), patch
Report time from touchstart to first-composite as accurately as gecko can 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me (deleted), patch
Changes to profiler, rebased 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me (deleted), patch

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Description

12 years ago

I applied the patch in bug 797189 that I (half forgot I) made a while ago to drop nice sample labels onto IPC work, and the scales immediately fell from my eyes! The following profile of Template app startup is captured with 1ms sample interval in a 10s buffer

http://people.mozilla.com/~bgirard/cleopatra/#report=65a7832015a3c3e3335f17e322da8f3a5c150b6b

There's quite a bit we can do here. Roughly speaking, in units of samples that are 2ms+ each, the time is going to these significant things (in chronological order)

21: something to do with setting app ID
24: blocked on parent in NotifyIMEFocus() (???)
51: running BrowserElementChild.js
35: compiling two chrome JS scripts
5: running forms.js
24: running UserAgentOverrides.jsm
21: something to do with AppProtocolHandler.js

Now we've actually started loading content. We spent about 181 samples to get there, which is around 350ms.

6: more BrowserElementChild.js stuff; setVisible/nextPaintListener
5: UpdateDimensions; first layout flush
7: TabChild::RecvActivate (focus), mostly in forms.js
26: nsInputStreamPump::OnStateStart, broken down like so
  6: BrowserElementChild.js and AppsService.js stuff (maybe different)
  5: blocked on parent getting DPI (???)
  15: potpourri
16: flushing results of HTML parsing, including selector matching
20: first refresh driver tick, second reflow and first paint
  7: style computation
  5: BrowserElementChild.js titleChangedHandler() (???)
28: second refresh driver tick, broken down like so
  20: painting
  7: BrowserElementChild.js onMozAfterPaint() handler

At this point the page should be fully loaded and painted. We spend a bit of time sitting in the event loop after that, then the screenshotting code runs.

We have a lot of winning we can do here.
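For anyone re-checking the arithmetic in the description: the pre-content sample counts can be tallied and turned into a rough wall-clock figure. This is only a sketch, assuming a flat 2 ms per sample (the description says "2ms+", so treat the result as a lower bound). The dictionary keys are labels copied from the description, not profiler identifiers, and the helper names are made up for illustration.

```python
# Sample counts for the work done before content starts loading,
# copied from the breakdown in the bug description.
PRE_CONTENT_SAMPLES = {
    "setting app ID": 21,
    "blocked on parent in NotifyIMEFocus()": 24,
    "running BrowserElementChild.js": 51,
    "compiling two chrome JS scripts": 35,
    "running forms.js": 5,
    "running UserAgentOverrides.jsm": 24,
    "AppProtocolHandler.js": 21,
}

# Assumption: each sample covers at least 2 ms of wall-clock time.
MS_PER_SAMPLE = 2.0


def total_samples(buckets):
    """Add up the sample counts of all buckets."""
    return sum(buckets.values())


def lower_bound_ms(samples, ms_per_sample=MS_PER_SAMPLE):
    """Convert a sample count to a lower-bound duration in milliseconds."""
    return samples * ms_per_sample


if __name__ == "__main__":
    total = total_samples(PRE_CONTENT_SAMPLES)
    print(f"{total} samples, at least {lower_bound_ms(total):.0f} ms")
```

The total matches the 181 samples quoted in the description, and the 2 ms assumption lands close to the "around 350ms" estimate.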

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 1

12 years ago

Attached patch Profiler changes, bug 819000, and some initial wins (obsolete) (deleted)

Mostly a checkpoint. This gets almost all of the PBrowser::RecvLoadRemoteScript samples out of the profile, 66 -> 5. At 2ms+ per sample that is theoretically a 100ms+ win; by stopwatch I measure maybe a 40ms-100ms win, but it's tricky to measure.

http://people.mozilla.com/~bgirard/cleopatra/#report=311208ba64bfc846c5e1f558e757faaea95aca19

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 2

12 years ago

One more thing that "ain't it" before I forget: I modified the Template application.zip to include a giant mp3 (9.5MB), and it didn't impact startup time at all. So looks like our JAR reading code is doing a good job.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 3

12 years ago

Attached patch Changes to profiler only (obsolete) (deleted)

Attachment #706299 - Attachment is obsolete: true

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 4

12 years ago

Attached patch Precompile some more scripts (obsolete) (deleted)

Initial wins.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 5

12 years ago

Rebased profile with above patches applied

http://people.mozilla.com/~bgirard/cleopatra/#report=d8156a72df7c213ed9821f8f29b31d4e29dd9cc3

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 6

12 years ago

The rebased profile was rebuilt without --disable-elf-hack, and now the resolved libxul.so symbols look less believable. Known issue?

Justin Lebar (not reading bugmail)

Comment 7

12 years ago

> Known issue?

Yes.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 8

12 years ago

Attached patch Patch from fabrice to speed up RecvPBrowserCtor (obsolete) (deleted)

This seems to shave 5-8 samples off the processing there, but I don't have enough data yet to know if that's statistically significant.
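One way to check whether a 5-8 sample difference is more than noise is an exact permutation test over repeated profile runs. The sketch below is not what was done in this bug. The sample counts in it are HYPOTHETICAL placeholders, not measurements, and the function name is made up for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(before, after):
    """One-sided exact permutation test for mean(before) > mean(after).

    Returns the fraction of all relabelings of the pooled data whose
    difference in means is at least as large as the observed one.
    """
    pooled = list(before) + list(after)
    group_size = len(before)
    observed = mean(before) - mean(after)
    indices = range(len(pooled))
    hits = 0
    total = 0
    for chosen in combinations(indices, group_size):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if mean(group_a) - mean(group_b) >= observed:
            hits += 1
    return hits / total


# HYPOTHETICAL sample counts for the same code path across five runs each.
before_patch = [31, 29, 33, 30, 32]
after_patch = [25, 24, 27, 23, 26]

if __name__ == "__main__":
    p = permutation_p_value(before_patch, after_patch)
    print(f"one-sided p-value: {p:.4f}")
```

With only a handful of runs the exact test stays cheap (252 relabelings for five runs per side), and it makes no assumption about how the sample counts are distributed.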

Dave Hylands [:dhylands]

Comment 9

12 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #6)
> The rebased profile was rebuilt without --disable-elf-hack, and now the
> resolved libxul.so symbols look less believable. Known issue?

Seems to be. See bug 827846. There is a bionic patch which apparently fixes things.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 10

12 years ago

Attached patch Don't initialize cross-process IME when we don't need it (obsolete) (deleted)

We have three or four synchronous IPC calls during PBrowser::RecvShow() that are causing us to hang on the b2g process for a nontrivial number of cycles (25-50). Two of those calls are necessary and can't really be avoided, but this one is totally pointless in b2g. For some reason, this call seemed to be sticking more than the others, so a couple of profiles showed the patch knocking off ~20 samples. Take that with a grain of salt though.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 11

12 years ago

Stepping back a bit, the biggest remaining offenders are

50-60 samples: RecvShow -> BrowserElementChild.js init
20-30 samples: sometimes we see 1 refresh driver tick, and we spend ~20 samples in reflow+painting stuff. Sometimes we see 3 ticks and spend ~50 samples there. We shouldn't do unnecessary work here.
~25 samples: nsInputStreamPump::OnInputStreamReady. We do a variety of things here and I don't see any low-hanging fruit.
~20 samples: flushing HTML parse state. This seems high for a trivial test like the Template app, but there's probably not a lot we can feasibly do here.

But, there's not much left here that's OOP specific. Will compare against in-process load of Template.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 12

12 years ago

It's very hard to measure any difference by stopwatch with Template in- vs. out-of-process. This may be due to regressions in the gaia window manager.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 13

12 years ago

Hmmmm lots of good stuff here ...

out-of-process (with 3 refresh driver ticks behavior)
http://people.mozilla.com/~bgirard/cleopatra/#report=3cb37bcffa8e2b4cf4e8cf332ce5f8acec2f9a08

in-process
http://people.mozilla.com/~bgirard/cleopatra/#report=6e2d6a5b5523b6e5748f324716a32e06c185688b

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 14

12 years ago

Quick notes
- average sample interval is a bit over 2ms for OOP, about 1.3ms for in-process. So each OOP sample is about 50% more clock time.
- BEC startup seems to take about the same amount of *samples* in/out-of-process
- gaia window_manager is triggering a stupid-expensive sync reflow in setDisplayedApp(), apparently caused by
    var cssHeight = window.innerHeight - StatusBar.height;
                    ^^^ this
  I'm not 100% sure we've already started the child load when that happens, but we should have.
- we're processing *12* refresh driver ticks, each pretty expensive.

The rest is a bit cloudier because the samples aren't organized as neatly, but
- InputStream, html5 parsing, and CSS style computation "stuff" seems to consume about the same number of samples in/OOP
- loading the app screenshot shows up in the profile
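Since the two profiles tick at different rates, equal sample counts do not mean equal wall-clock time. A rough sketch of the normalization, assuming exactly 2.0 ms per OOP sample (the comment says "a bit over 2ms") and 1.3 ms per in-process sample; the 50-sample figure is only an illustrative count, and the function names are invented for this example.

```python
# Approximate average sample intervals quoted in the comment.
OOP_INTERVAL_MS = 2.0         # "a bit over 2ms", so treat as a lower bound
IN_PROCESS_INTERVAL_MS = 1.3  # "about 1.3ms"


def samples_to_ms(samples, interval_ms):
    """Convert a sample count into approximate wall-clock milliseconds."""
    return samples * interval_ms


def clock_time_ratio(interval_a_ms, interval_b_ms):
    """How much more clock time one sample of profile A covers than one of B."""
    return interval_a_ms / interval_b_ms


if __name__ == "__main__":
    ratio = clock_time_ratio(OOP_INTERVAL_MS, IN_PROCESS_INTERVAL_MS)
    print(f"each OOP sample covers about {ratio:.2f}x the clock time")

    # Illustrative only: the same sample count measured in both modes.
    equal_samples = 50
    oop_ms = samples_to_ms(equal_samples, OOP_INTERVAL_MS)
    in_process_ms = samples_to_ms(equal_samples, IN_PROCESS_INTERVAL_MS)
    print(f"{equal_samples} samples: ~{oop_ms:.0f} ms OOP vs ~{in_process_ms:.0f} ms in-process")
```

So "about the same amount of samples" for BEC startup would still mean noticeably more clock time in the OOP case.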

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 15

12 years ago

A bit earlier back in the tap-icon phase of launch,
- system app is chewing ~30 samples processing touch events that seem to be eventually forwarded to the Homescreen process. That's a bit odd and probably adds ~50ms latency to Homescreen processing the tap.
- after the touchend is sent to Homescreen, b2g process sits idle for ~50 samples before it gets the WebApps:launchApp message. Seems to be ~75ms that Homescreen is doing ... something.
- homescreen profile shows ~60 samples between touchstart and WebApps::launch. (Homescreen samples come in at ~1.3ms apart.)
- homescreen hits a pretty expensive refresh driver tick (~40 samples) after touchstart but before touchend. Most likely the app icon being highlighted.
- homescreen spends another ~40 samples reflowing and restyling after sending launch(). Probably resetting the icon tap state. This is happening while the b2g process starts chewing CPU so probably appears more expensive than it really is.
- homescreen sits at SendGetInputContext() for *174* samples after the post-launch reflow/restyle. I have no idea what's happening here, and I don't think it's necessary.
- homescreen spends ~20 samples in something CSP-related around an nsInputStreamPump::OnInputStreamReady. Maybe this is the code actually fetching the highlighted icon?
- then homescreen finishes off with another expensive refresh driver tick
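To see which Homescreen phases dominate, the sample counts above can be ranked by approximate cost. A sketch only, assuming the ~1.3 ms Homescreen sample interval holds for every phase; the phase labels are paraphrased from the list and the function name is invented. Phases may overlap in time, so the rows are ranked rather than summed.

```python
# Approximate Homescreen sample interval quoted in the comment.
HOMESCREEN_MS_PER_SAMPLE = 1.3

# (phase label, approximate sample count) taken from the list above.
PHASES = [
    ("touchstart to WebApps launch", 60),
    ("refresh driver tick between touchstart and touchend", 40),
    ("reflow/restyle after launch()", 40),
    ("waiting in SendGetInputContext()", 174),
    ("CSP-related work near OnInputStreamReady", 20),
]


def ranked_by_cost(phases, ms_per_sample):
    """Return (label, samples, approx_ms) rows, most expensive first."""
    rows = [(name, samples, samples * ms_per_sample) for name, samples in phases]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for name, samples, ms in ranked_by_cost(PHASES, HOMESCREEN_MS_PER_SAMPLE):
        print(f"{ms:7.1f} ms  ({samples:3d} samples)  {name}")
```

Under that assumption the SendGetInputContext() wait is by far the largest single item, at over 200 ms.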