0.28 - 4.65% tp5n main_startup_fileio / tp5n nonmain_startup_fileio / tp5n time_to_session_store_window_restored_ms (windows7-32) regression on push 06e9cd204aacf5bb735a8f7370a409f64ec6e975
Categories
(Testing :: General, defect, P3)
Tracking
(firefox69 wontfix, firefox70 fix-optional, firefox71 ?)
Tracking | Status | |
---|---|---|
firefox69 | --- | wontfix |
firefox70 | --- | fix-optional |
firefox71 | --- | ? |
People
(Reporter: igoldan, Unassigned)
References
(Depends on 1 open bug)
Details
(5 keywords)
Talos has detected a Firefox performance regression from push:
As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
5% tp5n time_to_session_store_window_restored_ms windows7-32 pgo e10s stylo 746.43 -> 781.13
4% tp5n nonmain_startup_fileio windows7-32 pgo e10s stylo 2,596,034.46 -> 2,693,698.25
0% tp5n main_startup_fileio windows7-32 pgo e10s stylo 97,318,858.00 -> 97,592,742.33
You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=20060
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.
To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests
For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Running
*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***
Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling
Reporter | ||
Comment 1•6 years ago
|
||
Valentin, sorry for filing this bug so late. I barely noticed it.
I know you moved some business logic from one thread type to another, so these results should be expected.
If that's the case, you can simply close this as WONTFIX.
Comment 2•6 years ago
|
||
So what I did in bug 1435141 was go from this:
+-------------+------------------------------+--------------------------------------------+
| | Startup | User action |
+-----------------------------------------------------------------------------------------+
| Main Thread | | net_EnsurePSMInit HasUserCertsInstalled |
+-------------+------------------------------+--------------------------------------------+
to this:
+-------------+------------------------------+--------------------------------------------+
| | Startup | User action |
+-----------------------------------------------------------------------------------------+
| Main Thread | net_EnsurePSMInit + | |
+---------------------------------|----------+--------------------------------------------+
| Bg thread | +---> HasUserCertsInstalled |
+-------------+---------------------------------------------------------------------------+
So while the NSS initialization is delaying startup a bit more, it's not delaying the main thread in response to user interaction, which IMO is a lot better.
But indeed, it is delaying startup, which isn't that great.
There's bug 1370516, which would improve this.
There's also the option of not calling net_EnsurePSMInit at startup, but waiting for it to happen in response to a network load (session restore or user interaction), but that would be perceived as slowness during navigation, which I think is worse.
I'm inclined to close this as WONTFIX.
:keeler, what do you think?
Comment 3•6 years ago
|
||
At which point of startup is net_EnsurePSMInit called, and is it triggering main thread I/O?
I thought the previous behavior was to trigger NSS init during delayed startup (ie. right after browser first paint, but before any user action) when captive portal starts its HTTPS request to detect if we are behind a captive portal.
Comment 4•6 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #3)
At which point of startup is net_EnsurePSMInit called, and is it triggering main thread I/O?
From looking at the patch in bug 1435141, it seems it's triggered by browser-delayed-startup-finished, which is after the captive portal detection request is started: https://searchfox.org/mozilla-central/rev/56705678f5fc363be5e0237e1686f619b0d23009/browser/base/content/browser.js#1711,1722
So I don't see how this patch changes anything to startup. I'm just confused I guess.
Comment 5•6 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #3)
At which point of startup is net_EnsurePSMInit called, and is it triggering main thread I/O?
Not sure about the I/O. Question for :keeler
I thought the previous behavior was to trigger NSS init during delayed startup (ie. right after browser first paint, but before any user action)
I now call it in response to browser-delayed-startup-finished.
Before it was called in SpeculativeConnectInternal, at which point it may have been a no-op if PSM had already been initialized by some other component, or it may have blocked the main thread otherwise.
when captive portal starts its HTTPS request to detect if we are behind a captive portal.
Captive portal detection is done over http, not https, so it wouldn't trigger it.
Updated•6 years ago
|
Comment 6•6 years ago
|
||
(In reply to Valentin Gosu [:valentin] from comment #5)
(In reply to Florian Quèze [:florian] from comment #3)
when captive portal starts its HTTPS request to detect if we are behind a captive portal.
Captive portal detection is done over http, not https, so it wouldn't trigger it.
Oh indeed, it's http, I always thought it was HTTPS because it triggers the load of NSS libraries: https://perfht.ml/2JFKARB shows the network request for http://detectportal.firefox.com/success.txt triggers the load of softokn3.dll and freebl3.dll.
Initializing NSS can cause a number of files to be read. There's the dll files themselves, and then there's the the certificate/key/module DBs (cert9.db, key4.db, and pkcs11.txt in most cases, although cert8.db, key3.db, and secmod.db can also be read if they're present).
I don't see why bug 1435141 would cause startup to be slower. In any case, though, the real solution would be something like bug 1370516.
Updated•6 years ago
|
Reporter | ||
Updated•6 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Comment 8•5 years ago
|
||
:keeler Do you think we will get any progress on this soon?
This is a non-trivial amount of work and our team's engineering capacity is allocated elsewhere, so unfortunately no, I don't think we'll get any progress on this soon.
Comment 10•4 years ago
|
||
Marking the performance regression as WONTFIX as it's been over a year since it was reported and long-since shipped to user. Once bug 1370516 is resolved we should see a performance improvement but I don't see the value in keeping this open until then.
Description
•