Closed
Bug 1392280
Opened 7 years ago
Closed 7 years ago
Regression in FX_NEW_WINDOW_MS
Categories
(Firefox :: General, defect, P1)
Firefox
General
Tracking
()
Tracking | Status | |
---|---|---|
firefox55 | --- | unaffected |
firefox56 | --- | unaffected |
firefox57 | --- | fixed |
People
(Reporter: past, Assigned: eoger)
References
(Blocks 1 open bug)
Details
(Whiteboard: [photon-performance])
Attachments
(2 files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
text/x-review-board-request
|
mconley
:
review+
|
Details |
Telemetry shows a regression in the time it takes to open a new tab since Friday:
https://mzl.la/2vXkYFv
The regression is significant (takes us back to early-55 days) so we should figure out what caused it.
Updated•7 years ago
|
Blocks: photon-perf-windows
Whiteboard: [photon-performance]
Comment 1•7 years ago
|
||
(In reply to Panos Astithas [:past] (56 Regression Engineering Owner) (please ni?) from comment #0)
> Telemetry shows a regression in the time it takes to open a new tab since
> Friday:
>
To be clear, this probe measures the time to open new windows, and not new tabs.
Comment 2•7 years ago
|
||
Here are the pushes to mozilla-central between the August 17th and August 18th Nightly builds:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=932388b8c22c9775264e543697ce918415db9e23&tochange=a6a1f5c1d971dbee67ba6eec7ead7902351ddca2
Updated•7 years ago
|
Whiteboard: [photon-performance] → [photon-performance] [triage]
Comment 3•7 years ago
|
||
Normally, this kind of bug would be hellish to try to diagnose, especially because tpaint (the window open talos test) didn't notice any regression.
I was, however, able to reproduce this regression locally. Here's what I did:
1) Get a baseline measurement of FX_NEW_WINDOW_MS on the 2017-08-17 Nightly. I used:
./mach mozregression --launch=2017-08-17 --pref "toolkit.telemetry.enabled:true"
Then opened a bunch of windows using Ctrl-N, and then opened up about:telemetry and looked at the FX_NEW_WINDOW_MS histogram. Here's what I got:
FX_NEW_WINDOW_MS
11 samples, average = 60.4, sum = 664
22 | 0 0%
37 |######################### 6 55%
62 |##################### 5 45%
103 | 0 0%
Then I did the same thing for the 2017-08-19 build:
./mach mozregression --launch=2017-08-19 --pref "toolkit.telemetry.enabled:true"
FX_NEW_WINDOW_MS
18 samples, average = 99.6, sum = 1793
37 | 0 0%
62 |######################### 12 67%
103 |############# 6 33%
171 | 0 0%
The text-based histograms are probably making this a little difficult to notice, but there's been a definite shift towards the low 100's in the 2017-08-19th build as compared to the 2017-08-17th build.
I was able to reproduce this regression with multiple repetitions of the experiment.
2) Then, I ran the following mozregression command:
./mach mozregression --good=2017-08-17 --bad=2017-08-19 --pref "toolkit.telemetry.enabled:true"
and repeated the above experiment, using a fixed number of windows to manually open with Ctrl-N (I chose 7 arbitrarily).
Here's what I got:
3:57.38 INFO: Narrowed inbound regression window from [6a8b37d4, ccffcdc9] (3 builds) to [6a8b37d4, b0f76b21] (2 builds) (~1 steps left)
3:57.38 INFO: No more inbound revisions, bisection finished.
3:57.38 INFO: Last good revision: 6a8b37d47733df46e639c885eda6616e66379711
3:57.38 INFO: First bad revision: b0f76b216a196dc786c2853e3077746d8b772b2a
3:57.38 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=6a8b37d47733df46e639c885eda6616e66379711&tochange=b0f76b216a196dc786c2853e3077746d8b772b2a
That's bug 1380377. A manual backout of that patch locally confirms that this improves the FX_NEW_WINDOW_MS probe by a significant degree for me locally.
So I think that's our best candidate for the regressor.
Blocks: 1380377
Comment 4•7 years ago
|
||
Bug 1380377 is simple enough that a backout for a few days would probably confirm or refute that it's the regressor.
Assignee | ||
Comment 5•7 years ago
|
||
I can reproduce this reliably following mconley's comment 3 instructions.
Funny thing is, dropping a <html:img> with a correct src attribute anywhere in the file """"fixes"""" the problem. Not really sure what is happening here. (see attached file).
What course of action can we take?
Comment 6•7 years ago
|
||
(In reply to Edouard Oger [:eoger] from comment #5)
> What course of action can we take?
I recommend the following:
1) Back out bug 1380377, and ensure that we see our FX_NEW_WINDOW_MS numbers bounce back
2) Put window.performance.mark markers around the start and end points for the FX_NEW_WINDOW_MS probe, and then use window.performance.measure to construct a measurement between those two markers. This will cause the time range being measured to show up in the Gecko Profiler timeline.
3) Capture a profile of opening a few windows without the patch, and then with the patch, and then manually compare the profiles looking for an explanation.
I can coach somebody through (2) and (3), but I think the highest priority is probably (1).
Updated•7 years ago
|
Flags: qe-verify-
Priority: -- → P2
Whiteboard: [photon-performance] [triage] → [photon-performance]
Comment hidden (mozreview-request) |
Updated•7 years ago
|
Assignee: nobody → eoger
Status: NEW → ASSIGNED
Priority: P2 → P1
Comment 8•7 years ago
|
||
mozreview-review |
Comment on attachment 8899879 [details]
Bug 1392280 - Backout bug 1380377.
https://reviewboard.mozilla.org/r/171212/#review176354
Thanks!
Attachment #8899879 -
Flags: review?(mconley) → review+
Pushed by eoger@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5f5549876ea6
Backout bug 1380377. r=mconley
Assignee | ||
Comment 10•7 years ago
|
||
Normal profile:
https://perfht.ml/2xpHfeB
Regressed profile:
https://perfht.ml/2xpGdz1
We can see that we are calling an expensive nsDocument::DispatchContentLoadedEvents method in the regressed profile, which might explain the slowdown. Can we ni? someone who might know what is happening here?
Merged backout:
https://hg.mozilla.org/mozilla-central/rev/5f5549876ea6
Does this resolve this bug and reopen 1380377?
Flags: needinfo?(eoger)
Comment 12•7 years ago
|
||
Yep.
Updated•7 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox55:
--- → unaffected
status-firefox56:
--- → unaffected
Resolution: --- → FIXED
Updated•7 years ago
|
Iteration: --- → 57.2 - Aug 29
Flags: needinfo?(eoger)
Target Milestone: --- → Firefox 57
Comment 13•7 years ago
|
||
(In reply to Edouard Oger [:eoger] from comment #10)
> Normal profile:
> https://perfht.ml/2xpHfeB
>
> Regressed profile:
> https://perfht.ml/2xpGdz1
>
> We can see that we are calling an expensive
> nsDocument::DispatchContentLoadedEvents method in the regressed profile,
> which might explain the slowdown. Can we ni? someone who might know what is
> happening here?
Comparing these 2 profiles, I don't see a big difference in the code that's running in them. The major difference I see is that in the "normal" profile the "window-open" measurement stops around when we start executing the 'load' handler of the new window, whereas in the 'regressed' profile the "window-open" measurement stops just before the 'MozAfterPaint' event.
This makes me wonder if it's possible that the FX_NEW_WINDOW_MS probe has been measuring incorrect data because the "document-shown" observer notification may not be what we want (after paint is more relevant).
You need to log in
before you can comment on or make changes to this bug.
Description
•