Fix sessionrestore_many_windows Talos test to not ignore SSTabRestored events (was showing ~80% improvement on Windows shippable with Fission enabled)
Categories
(Firefox :: Session Restore, defect, P3)
Tracking
()
Fission Milestone | Future |
People
(Reporter: mconley, Unassigned)
References
(Blocks 2 open bugs)
Details
(Whiteboard: fission-perf)
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Comment 1•5 years ago
|
||
The good news is that I can reproduce this result locally. Starting to investigate now...
Reporter | ||
Comment 2•5 years ago
|
||
Okay, I've figured out the problem.
The way that the SessionRestore Talos tests work is by restoring a session, and then listening for SSTabRestored events firing in each window that gets restored. 10 seconds after the last SSTabRestored event is observed, the session is considered restored, and we record the delta between process start and the last SSTabRestored event time.
The problem is here:
Wayyyyy back in bug 1261657, I made it so that StartupPerformance ignores SSTabRestored events caused by remoteness flips. See that bug for the reason why.
With Fission enabled, for each tab that's getting restored automatically, we're doing a remoteness flip because the initial remoteType of the initial tab ("web") does not match what is being loaded in the test ("webIsolated=http://localhost"). So each of the restored tabs does a remoteness flip.
And that means that we don't count any of the SSTabRestored events. Because we see no SSTabRestored events, the delta that we end up recording is between process start and the sessionstore-restoring-on-startup
, so it's no wonder that Fission is reporting a better score here - none of the tab restorations are being counted!
Reporter | ||
Comment 3•5 years ago
|
||
Hey pbone, from what I can tell from the Fission meeting notes, you're working on moving us to the new process flipping mechanism, and (blessedly!) removing that responsibility from front-end / SessionStore. Do you have any WIP patches that I can look at, or planning documents or notes or something I can look at to see how that factors in with this test?
Comment 4•5 years ago
|
||
I have some patches that are still in the "embarrassing" range of WIPness. It's possible I can tidy them up and show you tomorrow. The plan is to remove the onMayChangeProcess stuff from SessionStore.jsm and move it into nsDocumentChannelParent.cpp. Moving the majority of this code into C++ where we can debug it a bit easier. It was originally in SessionStore because of some historical reasons that I was probably told but don't remember. I'm not sure how this could affect the SSTabRestored notification (it'll probably still be broken when my patches land).
If you think you've got a quick-ish fix to the session store code then land it now, if it conflicts with mine I'll figure it out. My change won't land until at least the week after next since I'm going on PTO next week.
Comment 5•5 years ago
|
||
We'll want to fix this for Fission milestone M6 (enable Fission in Nightly).
Reporter | ||
Comment 7•5 years ago
|
||
Having spoken with pbone, mattwoodrow and kmag, there's still quite a bit of work remaining to get process switching out of the wheelhouse of SessionStore.
I don't think it makes sense to try to fix this until that work is done.
Updated•5 years ago
|
Updated•4 years ago
|
Reporter | ||
Comment 9•4 years ago
|
||
Started looking at this again. It looks like with Fission enabled, we can probably stop ignoring the isRemotenessUpdate SSTabRestored events, which puts us back in the ballpark again. There does, however, still appear to be a slight regression - we're updating remoteness with Fission enabled from a "web" type process to a "webIsolated" process, which is eating up some time and causing a regression.
Reporter | ||
Comment 10•4 years ago
|
||
Spoke with nika about this. We need to wait until SessionStore no longer needs to manually do the remoteness flipping of browsers, which involves making SHIP working outside of Fission.
Comment 11•4 years ago
|
||
That won't be complete in time for M6c so pushing this out to M7. It can be reviewed and re-triaged again, if needed.
Comment 12•4 years ago
|
||
Process switching is largely out of the wheelhouse of SessionStore now (at least when SHIP is enabled, which it always is with Fission) - could we perhaps keep the old logic, but only for the non-ship case, and start getting performance results here?
Updated•4 years ago
|
Updated•4 years ago
|
Reporter | ||
Comment 13•4 years ago
|
||
could we perhaps keep the old logic, but only for the non-ship case, and start getting performance results here?
Yes, this sounds like a reasonable approach - we should special-case the logic in the test for the Fission-on case to start getting measurements here.
Comment 14•4 years ago
|
||
I took a quick look at the profile which is being restored (from https://searchfox.org/mozilla-central/source/testing/talos/talos/startup_test/sessionrestore/profile/sessionstore.js, though I imagine the actual restore is from sessionstore.jsonlz4
in the same directory), and noticed that the URLs appear to all be of the form http://localhost:8080/?httpsurl=www.whatever.com/whatever
. This means that the test won't use multiple content processes under Fission (as they'll all be served under the same site http://localhost
), and that we'll probably be measuring inaccurate data there.
If we can get away with it, we should probably also fix the profiles which we're loading to actually use the correct domain names through a proxy so that we get a more accurate process allocation structure.
Comment 15•4 years ago
|
||
Mike said he won't have time to work on this for Fission M7, because of Proton work assigned to him.
Updated•4 years ago
|
Comment 16•4 years ago
|
||
Not a M7 beta blocker but we'd like this test re-enabled and working correctly as soon as possible so pushing this out to M7a.
Updated•4 years ago
|
Updated•4 years ago
|
Comment 17•4 years ago
|
||
Randell will start looking into fixing this.
Updated•4 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Comment 18•3 years ago
|
||
This bug is a soft blocker for Fission MVP. We'd like to fix it before our Release channel rollout, but we won't delay the rollout waiting for it.
Comment 19•3 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #18)
This bug is a soft blocker for Fission MVP. We'd like to fix it before our Release channel rollout, but we won't delay the rollout waiting for it.
Since this work probably won't make Firefox 94 and is a much lower priority than the other Fission perf work Randell is working on, I am deferring this bug from Fission MVP to Future, maybe targeting Firefox 95 ([fission:m95]
).
Comment 20•3 years ago
|
||
Unassigning Randell because he will be too busy on some non-Fission work to look at this bug.
Comment 21•3 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #20)
Unassigning Randell because he will be too busy on some non-Fission work to look at this bug.
Do we still think this is S2/P2? It's been 2 years and keeps being deprio'd, is this a realistic severity/priority?
Comment 22•3 years ago
|
||
(In reply to :Gijs (he/him) from comment #21)
Do we still think this is S2/P2? It's been 2 years and keeps being deprio'd, is this a realistic severity/priority?
Good point. I will drop this bug to Severity N/A and Priority P3.
Updated•3 years ago
|
Description
•