1456270 - Improve talos startup tests to collect more relevant numbers

Reporter

Description

•

6 years ago

I may be missing something, or there may be a duplicate bug, but I can't find one. When bug 1447719 landed, our ts_paint numbers went way down, because the test now measures the time to the first paint of the blank window. That's an important number, but it's different from what ts_paint used to measure. If ts_paint is going to measure blank window timing now, we need another test to measure the paint time of the actual browser UI, or people are going to be free to regress that without noticing. As it stands, from my local talos runs, it looks like we pay a pretty hefty price for the layout and styling of the initial blank window, so we probably took a hit to the timing for the actual UI being visible despite talos showing an improvement.

Florian Quèze [:florian]

Comment 1

•

6 years ago

(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #0) > we probably > took a hit to the timing for the actual UI being visible despite talos > showing an improvement. We have telemetry numbers that look pretty flat: https://mzl.la/2J46N6y (if you ignore the pic due to bug 1454908).

Florian Quèze [:florian]

Comment 2

•

6 years ago

(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #0) > I may be missing something, or there may be a duplicate bug, but I can't > find one. Bug 1369417 may be what you were looking for. Although there was a lot of discussion but little action there.

Kris Maglione [:kmag]

Reporter

Comment 3

•

6 years ago

(In reply to Florian Quèze [:florian] from comment #1) > (In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're > blocked) from comment #0) > > we probably > > took a hit to the timing for the actual UI being visible despite talos > > showing an improvement. > > We have telemetry numbers that look pretty flat: https://mzl.la/2J46N6y (if > you ignore the pic due to bug 1454908). Telemetry is good, but we really need talos tests for this. We know how easy it is to regress startup performance, how often it happens, and how important it is not to. It's bad enough that people already have a tendency to fix startup perf regressions in ways that add jank during startup. Fixing them in ways that prevent the UI from being visible, or not even noticing that they've regressed that in the first place, is a much bigger problem. This feels like a blocker to me.

Joel Maher ( :jmaher ) (UTC -8)

Comment 4

•

6 years ago

Thanks for filing this :kmag, we could always add the old test back- in this case I need someone to own the documentation here and directing what we want to measure. In addition, what makes sense to measure with the "webext" case as well. Next week rwood or myself can help out to ensure this is running everywhere as desired.

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

6 years ago

Whiteboard: [PI:May]

Kris Maglione [:kmag]

Reporter

Comment 5

•

6 years ago

(In reply to Joel Maher ( :jmaher - limited bugzilla access until May 1st) (UTC+2) from comment #4) > In addition, what makes sense to measure with the "webext" case as well. I think the webext case should probably just measure the full UI visibility, and ignore blank window case. That's what I tend to be most worried about us regressing.

Florian Quèze [:florian]

Comment 6

•

6 years ago

(In reply to Kris Maglione [:kmag] from comment #0) > When bug 1447719 landed, our ts_paint numbers went way down, because the > test now measures the time to the first paint of the blank window. That's an > important number, but it's different from what ts_paint used to measure. The fact that bug 1447719 landing made ts_paint drop by more than 50% on Linux but didn't affect it at all on Windows makes me think it was already pretty broken, and couldn't be relied on. I don't know what it actually measured, nor what it measures now. > If ts_paint is going to measure blank window timing now, we need another > test to measure the paint time of the actual browser UI, or people are going > to be free to regress that without noticing. Thankfully, in the current situation people are not "free to regress that without noticing". I did regress it with bug 1450293, and Talos caught it in the sessionrestore_many_windows test (see bug 1454908). > As it stands, from my local talos runs, it looks like we pay a pretty hefty > price for the layout and styling of the initial blank window, so we probably > took a hit to the timing for the actual UI being visible despite talos > showing an improvement. I think you were seeing bug 1454908. The more I think about this, the more I think I would like a Talos startup test that reports the startup timings that telemetry collects. This would give us a way to compare what Talos reports (which is more stable, but also more artificial, due to only having warm startups) with what real Nightly users experience. A side benefit would be that Talos would no longer need custom code in its startup test to collect numbers. From about:telemetry#simple-measurements-tab, I think (at least) the following numbers would be relevant: - afterProfileLocked or selectProfile - blankWindowShown - firstPaint (<-- fwiw, on Windows this isn't the blank window anymore) - delayedStartupStarted (<-- this is when the browser UI is guaranteed to be on screen) - delayedStartupFinished - sessionRestored

Summary: We no longer have an equivalent to the old ts_paint test → Improve talos startup tests to collect more relevant numbers

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 7

•

6 years ago

I would like to see us start using standardized terminology around those topics. It's not "first paint" and "another first paint". And it shouldn't use named based on our internal functions either. In my proposal for the bootstrap architecture and measuring from last year [0] I suggested we follow W3C paint-timing spec proposal [1]. At the time, some confusion came from the distinction between `firstPaint` stage and `firstContentfulPaint`. I believe that now it may be more easy to understand because it perfectly first this scenario, both as a milestone and measuring timestamp, but also as a stage of bootstrapping that all bootstrap code can be attached to. [0] https://groups.google.com/d/msg/mozilla.dev.platform/z7MQVwrJayk/SvJt-dnkAAAJ [1] https://github.com/w3c/paint-timing

Joel Maher ( :jmaher ) (UTC -8)

Comment 8

•

6 years ago

:rwood- do you think the ts_paint and sessionrestore tests could be ported to the new raptor framework? That is where we support by default multiple data points to collect. :kmag, is the proposal that :florian proposed sound like a solution to your concerns?

Flags: needinfo?(rwood)

Flags: needinfo?(kmaglione+bmo)

Robert Wood [:rwood]

Comment 9

•

6 years ago

(In reply to Joel Maher ( :jmaher - limited bugzilla access until May 1st) (UTC+2) from comment #8) > :rwood- do you think the ts_paint and sessionrestore tests could be ported > to the new raptor framework? That is where we support by default multiple > data points to collect. No, as much as I'd like raptor to be able to do all types of tests, due to its architecture I don't think it's a good idea to try to adapt it to handle startup tests. That would be a significant design shift. The raptor runner core that drives the pageload tests (and takes the measurements) is a web extension which in itself makes it limited. We use a control server to be the go-between for the raptor web extension and python harness, which works fine for pageloads but wouldn't work for startup tests. Also, one goal of raptor is to work cross-browser on both Firefox and Google Chrome - we haven't attempted any kind of startup tests on Chrome (as far as I know) so I think there's a bunch of unknowns there too. I believe we should keep raptor strictly for pageload tests and standard benchmark tests like speedometer. That way we reduce it's complexity as much as possible so it is easier to maintain, and add new types of pageload measurements, etc. In the case of talos ts_paint and sessionrestore, I think we'll have to modify the existing tests to support the new measurement types. If it's too much overhead in production to have separate tests / subtests for each single measurement type, then yes we'd need to add multiple measurement support per startup to talos. Being startup tests makes it more complex as they don't just use the standard pageloader addon content code like talos pageloader tests do, so it would take a fair bit of work IMO but probably the best and fastest route.

Flags: needinfo?(rwood)

Mike Conley (:mconley) (:⚙️)

Updated

•

6 years ago

Whiteboard: [PI:May] → [PI:May][fxperf]

Mike Conley (:mconley) (:⚙️)

Updated

•

6 years ago

Whiteboard: [PI:May][fxperf] → [PI:May][fxperf:p2]

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

6 years ago

Whiteboard: [PI:May][fxperf:p2] → [fxperf:p2]

Dão Gottwald [:dao]

Updated

•

6 years ago

Blocks: 1336227
No longer blocks: 1447719

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

6 years ago

Flags: needinfo?(kmaglione+bmo)

Dão Gottwald [:dao]

Updated

•

6 years ago

Type: enhancement → task

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

5 years ago

Priority: -- → P3

Mathew Hodson

Updated

•

3 years ago

No longer blocks: 1336227

Severity: normal → --

Depends on: 1336227

Mike Conley (:mconley) (:⚙️)

Updated

•

3 years ago

Keywords: perf:startup

Whiteboard: [fxperf:p2]

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

1 year ago

See Also: → https://mozilla-hub.atlassian.net/browse/FXP-402

Whiteboard: [fxp]

Bugzilla

Quick Search

Improve talos startup tests to collect more relevant numbers

Categories

(Testing :: Talos, task, P3)

Tracking

(Not tracked)

People

(Reporter: kmag, Unassigned)

References

Details

(Keywords: perf:startup, Whiteboard: [fxp])

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Updated

Updated

Updated

Updated

Updated

Updated

Updated

Updated

Updated

Updated