1259770 - Session restore regressed 400%-500% under e10s with changes in bug 1245891

Reporter

Description

•

9 years ago

https://treeherder.mozilla.org/perf.html#/e10s?filter=sessionrestore David, any ideas on what caused this? FWIW this will likely block e10s rollout so we'll need to figure this out.

Flags: needinfo?(dteller)

Jim Mathies [:jimm]

Reporter

Updated

•

9 years ago

Flags: needinfo?(dteller)

Brad Lassey [:blassey] (use needinfo?)

Updated

•

9 years ago

Blocks: e10s-perf

tracking-e10s: ? → +

Priority: -- → P3

Jim Mathies [:jimm]

Reporter

Comment 1

•

9 years ago

e10s team triaged this. This test appears to measure non-default behavior, session restore without restore on demand. Therefor it doesn't block e10s rollout.

David Teller [:Yoric] - still alive but not very active

Comment 2

•

9 years ago

The first test, `sessionrestore`, measures the behavior in which we automatically restore the session. The second test, `sessionrestore_no_auto_restore`, measures the behavior in which we load the session but do not restore it. **Neither test changes the preference that determines whether tabs are restored on demand.** The measurement methodology for this test has changed on 2016-03-16, so changes on that day are 100% expected. I have checked again: the numbers I observe on my machine for Session Restore both with and without e10s seem to match what I am seeing on screen. I do not witness noticeable slowdowns in real usage with e10s wrt without, but the variance is so huge that it's hard to be sure.

Flags: needinfo?(jmathies)

Mike Conley (:mconley) (:⚙️)

Comment 3

•

9 years ago

(In reply to David Rajchenbach-Teller [:Yoric] (please use "needinfo") from comment #2) > The first test, `sessionrestore`, measures the behavior in which we > automatically restore the session. The second test, > `sessionrestore_no_auto_restore`, measures the behavior in which we load the > session but do not restore it. > > **Neither test changes the preference that determines whether tabs are > restored on demand.** > Ah, that was my mistake. I assumed both sessionrestore and sessionrestore_no_auto_restore were both doing tab/window restoration, but that sessionrestore_no_auto_restore did restore on demand. I understand now - sessionrestore will restore the windows / tabs using the default pref (so it's restoring on demand). The sessionrestore_no_auto_restore is just until we read the session info off the disk and fire sessionRestored. So I actually do think our users will get impacted by this, and this should be investigated.

Jim Mathies [:jimm]

Reporter

Comment 4

•

9 years ago

(In reply to Mike Conley (:mconley) - Needinfo me! from comment #3) > (In reply to David Rajchenbach-Teller [:Yoric] (please use "needinfo") from > comment #2) > > The first test, `sessionrestore`, measures the behavior in which we > > automatically restore the session. The second test, > > `sessionrestore_no_auto_restore`, measures the behavior in which we load the > > session but do not restore it. > > > > **Neither test changes the preference that determines whether tabs are > > restored on demand.** > > > > Ah, that was my mistake. I assumed both sessionrestore and > sessionrestore_no_auto_restore were both doing tab/window restoration, but > that sessionrestore_no_auto_restore did restore on demand. > > I understand now - sessionrestore will restore the windows / tabs using the > default pref (so it's restoring on demand). The > sessionrestore_no_auto_restore is just until we read the session info off > the disk and fire sessionRestored. It sounds like sessionrestore and sessionrestore_no_auto_restore should be testing the same thing. What's the difference between these tests if the default pref specifies no auto-restore?

Flags: needinfo?(jmathies) → needinfo?(dteller)

Mike Conley (:mconley) (:⚙️)

Comment 5

•

9 years ago

(In reply to Jim Mathies [:jimm] from comment #4) > (In reply to Mike Conley (:mconley) - Needinfo me! from comment #3) > > (In reply to David Rajchenbach-Teller [:Yoric] (please use "needinfo") from > > comment #2) > > > The first test, `sessionrestore`, measures the behavior in which we > > > automatically restore the session. The second test, > > > `sessionrestore_no_auto_restore`, measures the behavior in which we load the > > > session but do not restore it. > > > > > > **Neither test changes the preference that determines whether tabs are > > > restored on demand.** > > > > > > > Ah, that was my mistake. I assumed both sessionrestore and > > sessionrestore_no_auto_restore were both doing tab/window restoration, but > > that sessionrestore_no_auto_restore did restore on demand. > > > > I understand now - sessionrestore will restore the windows / tabs using the > > default pref (so it's restoring on demand). The > > sessionrestore_no_auto_restore is just until we read the session info off > > the disk and fire sessionRestored. > > It sounds like sessionrestore and sessionrestore_no_auto_restore should be > testing the same thing. What's the difference between these tests if the > default pref specifies no auto-restore? I think the confusion lies in what we mean by "auto-restore". So, there is restoring tabs on demand, which is what we do by default, and what we originally though these two tests were comparing. What these two tests are actually comparing are these two states: 1) Don't automatically restore the user's windows and tabs from last time. Show about:home instead, with a button that allows them to restore their windows and tabs from last time. 2) Restore windows and tabs from last time.

David Teller [:Yoric] - still alive but not very active

Comment 6

•

9 years ago

Yep, as said by mconley.

Flags: needinfo?(dteller)

Jim Mathies [:jimm]

Reporter

Comment 7

•

9 years ago

> What these two tests are actually comparing are these two states: > > 1) Don't automatically restore the user's windows and tabs from last time. > Show about:home instead, with a button that allows them to restore their > windows and tabs from last time. > > 2) Restore windows and tabs from last time. Requesting triage again, #2 above does not involve loading content in restored tabs, we open empty tabs (based on the default behavior set in prefs) and somehow regress that. We should confirm this is actually what's happening because it seems highly dubious that e10s regresses this.

tracking-e10s: + → ?

Priority: P3 → --

Mike Conley (:mconley) (:⚙️)

Comment 8

•

9 years ago

Performance profile from my OS X machine: https://cleopatra.io/#report=089d01ac42d549897c94a5203e6a6847281e77ca

Mike Conley (:mconley) (:⚙️)

Comment 9

•

9 years ago

Okay, I think I know what's going on. The work in bug 1245891 takes advantage of a module that got added called StartupPerformance.jsm. Briefly, StartupPerformance.jsm works by noticing when session restore init has begun, and then records that time. It also starts a 10 second timer. It then starts listening for SSTabRestored events to be fired. Every time one of the SSTabRestored events comes in, we record the current time, and the 10 second timer is reset. Once the timer finally goes off (so we haven't seen an SSTabRestored event in 10 seconds), the SSTabRestored event listener detaches itself, and the observer that the sessionrestore talos test is listening for is fired off. The talos test then takes the delta between sessionrestore init and the last SSTabRestored event time. So that's how the sessionrestore test is supposed to work. There's a small problem though, and I think it's causing us to report different results in the e10s case (even though e10s is not performing worse). The problem is that the talos test loads an index.html file along with the session it's restoring from. That index.html file is not part of the sessionrestore set, and is passed in the cmdline args to Firefox in order to do the reporting of the test results back to talos. It's also the selected tab. The final puzzle piece is that the initial browser in a new browser window is non-remote by default. In order for it to go anywhere, some DocShell / SessionStore stuff occurs where any pre-existing state is scooped out of the browser, the remoteness is flipped, and then the scooped out state is sent back down to the new browser / DocShell. That's important, because that scoop/remoteness-flip/sending of state results in an SSTabRestored event to be fired, since we're using the same tab restoration mechanism to do the sending of state. So how this ties all together is that the browser starts up, it loads up the session, and in the non-e10s case, because a non-session tab is selected and loaded (the index.html file), an SSTabRestored event never fires (since index.html wasn’t in the old session, remember - it was loaded as part of the cmdline). When StartupPerformance never gets any SSTabRestored events, the delta that gets computed is the delta between session restore init and when restoration starts. For e10s, it’s the same up until the index.html load. In order to load that into the initial tab, we do the remoteness flip, which fires an SSTabRestored event, which StartupPerformance notices. That’s where the discrepancy is. TL;DR: non-e10s is measuring the delta between session restore init and when tab restoration starts. e10s is measuring the delta between session restore init, and when the index.html of the talos test content page has finished being restored. This explains the big gap in the reading: we're not measuring the same thing in both cases. I'll see if I can come up with some ways of making these measurements more comparable.

Mike Conley (:mconley) (:⚙️)

Comment 10

•

9 years ago

To prove my case, here's a try push where I forced the initial browser in the window to be remote (this might also contribute to wins to tpaint, so this was a thing I was already pursuing): https://treeherder.mozilla.org/#/jobs?repo=try&revision=ba7b791b1ac8095ffd090b295909c2663c6c2cbe Try to ignore the high scores from the Linux64 spot instances - I'm not sure how to hide those, and unfortunately, Perfherder mixes the AWS-instance results with the normal Linux64 talos machine results, which kinda throws off the numbers (see bug 1260926). In this case, we appear to at least match non-e10s. On some platforms e10s beats non-e10s significantly. So, to sum, I think e10s is not regressing sessionstore performance when we're restoring tabs on demand. In fact, I think it performs better. We just need to make sure the test starts comparing things properly.

Mike Conley (:mconley) (:⚙️)

Updated

•

9 years ago

Depends on: 1261657

Mike Conley (:mconley) (:⚙️)

Comment 11

•

9 years ago

Here's a comparison for that last try push: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=2c8a3acfb48c&newProject=try&newRevision=ba7b791b1ac8&framework=1&showOnlyImportant=0 Looks good. I've opened bug 1261657 with a patch that I think should settle this.

Jim Mathies [:jimm]

Reporter

Updated

•

9 years ago

Status: NEW → RESOLVED

Closed: 9 years ago

tracking-e10s: ? → +

Priority: -- → P1

Resolution: --- → FIXED

Jim Mathies [:jimm]

Reporter

Comment 12

•

9 years ago

Thanks Mike!

Mike Conley (:mconley) (:⚙️)

Comment 13

•

8 years ago

bugnotes

Attached file Bugnotes (deleted) — Details

http://people.mozilla.org/~mconley2/bugnotes/bug-1259770.html

Bugzilla

Session restore regressed 400%-500% under e10s with changes in bug 1245891

Categories

(Testing :: Talos, defect, P1)

Tracking

(e10s+)

People

(Reporter: jimm, Unassigned)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Updated

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Updated

Comment 11

Updated

Comment 12

Comment 13

Attachment

General

Description

File Name

Content Type