Closed Bug 1319845 Opened 8 years ago Closed 8 years ago

Investigate FX_TAB_SWITCH_SPINNER_VISIBLE_LONG_MS regression on Windows that started on Nov 6

Categories

(Firefox :: General, defect)

RESOLVED DUPLICATE of bug 1316632

People

(Reporter: mconley, Unassigned)

Attachments

(4 files, 6 obsolete files)

This regression showed up on Telemetry and appears to be persisting: https://mzl.la/2fRJmlU The 95th percentile of the probe went from 3.05s to 8.21s, which is pretty bad. Investigating.
Attached file Preregression stacks by time (Nov 2-5) (obsolete) (deleted)
Attached file Postregression stacks by time (Nov 9-12) (obsolete) (deleted)
FWIW I've been having periodic issues for the last month related to massive disk usage on my Windows machine. The entire OS freezes, but it definitely seems to be triggered by Firefox somehow. I've had difficulty tracking it down, since any tool I would use to investigate pretty much waits to run until the disk storm ends. The site most likely to trigger this is foxnews.com. It only seems to happen once a day or so, though. I don't know if that is related here, but I thought I would mention it.
Comment on attachment 8813744: Preregression stacks by time (Nov 2-5). I computed the pre-regression stacks incorrectly. Going to regenerate these.
Attachment #8813744 - Attachment is obsolete: true
Comment on attachment 8813746: Postregression stacks by time (Nov 9-12). See above.
Attachment #8813746 - Attachment is obsolete: true
Attached file Preregression stacks by time (Nov 2-5) (obsolete) (deleted)
Let's try this again.
Attached file Postregression stacks by time (Nov 9-12) (obsolete) (deleted)
Attached file Preregression stacks by frequency (Nov 2-5) (obsolete) (deleted)
Attached file Postregression stacks by frequency (Nov 9-12) (obsolete) (deleted)
> This regression showed up on Telemetry and appears to be persisting:
>
> https://mzl.la/2fRJmlU
>
> The 95th percentile of the probe went from 3.05s to 8.21s, which is pretty bad.
>
> Investigating.
That Telemetry link orders the data by submission date, not by build ID. This one orders by build ID, which is far more illustrative: https://mzl.la/2fGwPyG
It's clear from the graph in comment 11 that the regression was introduced on the Nightly build of November 5th. I will now produce stacks for the 4th and the 5th _only_ to see if anything unusual stands out.
Attached file Pre-regression stacks by freq (Nov 4) (deleted)
Going to just take a single day's worth of samples now.
Attachment #8813761 - Attachment is obsolete: true
Attachment #8813762 - Attachment is obsolete: true
Attachment #8813791 - Attachment is obsolete: true
Attachment #8813793 - Attachment is obsolete: true
Interestingly, there's a drop-off in the 95th percentile of the probe from September 25th to September 26th: https://mzl.la/2fME2x2
September 25th, 2016 was a Sunday, and I guess no patches landed that day, because when I look at the changesets that were used for those two Nightlies[1] (25th and 26th), the commit they were built from is the same (29beaebdfaccbdaeb4c1ee5a43a9795ab015ef49).
So, assuming the above is true, there was no difference between the builds from the 25th and the 26th, and yet we see this drop-off on the probe. I wonder if something changed on the web that day. Perhaps Facebook shipped something, or maybe an add-on shipped an update.
[1]: https://ftp.mozilla.org/pub/firefox/nightly/2016/09/2016-09-25-03-02-26-mozilla-central/firefox-52.0a1.en-US.win32.txt and https://ftp.mozilla.org/pub/firefox/nightly/2016/09/2016-09-26-03-02-03-mozilla-central/firefox-52.0a1.en-US.win32.txt respectively
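The same-changeset check described above can be sketched in a few lines. This is only an illustration: it assumes each Nightly .txt file holds a build ID on its first line and an hg revision URL on its second, and the sample contents below are reconstructed from the directory names and the changeset quoted in the comment, not fetched from the archive. The helper names are hypothetical.

```python
def parse_nightly_txt(text):
    """Return (build_id, changeset) from the text of a Nightly .txt file.

    Assumed layout (not verified here): line 1 is the build ID,
    line 2 is a URL ending in the changeset hash.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    build_id = lines[0]
    changeset = lines[1].rstrip("/").rsplit("/", 1)[-1]
    return build_id, changeset


def same_changeset(text_a, text_b):
    """True when two Nightly .txt files point at the same changeset."""
    return parse_nightly_txt(text_a)[1] == parse_nightly_txt(text_b)[1]


# Illustrative contents only; build IDs are inferred from the archive paths.
REV = "29beaebdfaccbdaeb4c1ee5a43a9795ab015ef49"
sept_25 = "20160925030226\nhttps://hg.mozilla.org/mozilla-central/rev/" + REV + "\n"
sept_26 = "20160926030203\nhttps://hg.mozilla.org/mozilla-central/rev/" + REV + "\n"

print(same_changeset(sept_25, sept_26))
```

If the two files really do share a changeset, any difference in the probe between those builds has to come from outside the build itself.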
I'm beginning to have a trust issue with telemetry.mozilla.org. The graph I linked to in comment 11 is organized by build ID and shows a clear regression. It reports that the regression did not exist in the build with ID starting with 20161104, but does exist in the build with ID 20161105. Using the revisions stored in our Nightly "ftp" server thinger, we were able to get a changeset range - see comment 10.
The GPU process was enabled by default on Nightly in bug 1314133. It landed on Nov 5th, but is not in the changeset range for the build of 20161105. It is, however, in the range between 20161105 and 20161106: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a7c654513f2ffd9d9ef38fa2bf512b9e8dae3cdd&tochange=c44c01dfd264370c1558b747525d220a9a89b51c
Now, here's the kicker: the GPU process, when enabled, accumulates timing information on a GPU_PROCESS_LAUNCH_TIME_MS Telemetry probe. Here's the graph of that probe: https://mzl.la/2fMWENn The first time that data shows up is for a build with the apparent build ID of 20161105, which is _one day early_. The GPU process shouldn't have been enabled on the 20161105 build!
I'm more inclined to believe the data in our Nightly build archives about what changesets were used. If that information is accurate, it means that telemetry.mozilla.org might be showing data offset by one day, which means I just blew a few days' worth of investigation on a range of time that was irrelevant.
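For reference, a pushlog range URL like the one above can be assembled from the changesets of two consecutive Nightlies. A minimal sketch, assuming only the `fromchange`/`tochange` query parameters visible in that URL; the function name is hypothetical.

```python
from urllib.parse import urlencode

PUSHLOG_BASE = "https://hg.mozilla.org/mozilla-central/pushloghtml"


def pushlog_url(from_changeset, to_changeset):
    """Build a mozilla-central pushlog URL covering two changesets."""
    query = urlencode({"fromchange": from_changeset, "tochange": to_changeset})
    return PUSHLOG_BASE + "?" + query


# Changesets taken from the range quoted in the comment above.
print(pushlog_url(
    "a7c654513f2ffd9d9ef38fa2bf512b9e8dae3cdd",
    "c44c01dfd264370c1558b747525d220a9a89b51c",
))
```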
I've been talking with rvitillo and chutten in #telemetry. This is likely a Dashboard bug - something in the front-end is sensitive to the client timezone. If I convince my computer that I'm in, for example, Spain, then the dashboard shows the regression appearing in the Nov 6th (20161106) build.
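One plausible way a front-end can end up off by a day like this is sketched below. This is an assumption for illustration, not the confirmed cause of the Dashboard bug: it supposes the date prefix of the build ID is parsed as midnight UTC and then rendered in the viewer's local timezone. The function name and the fixed UTC offsets are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def displayed_build_date(build_id, viewer_utc_offset_hours):
    """Return the YYYYMMDD date a viewer would see for a build ID.

    Assumed (unconfirmed) behaviour: the date prefix is treated as
    midnight UTC, then converted to the viewer's timezone for display.
    """
    utc_midnight = datetime.strptime(build_id[:8], "%Y%m%d").replace(
        tzinfo=timezone.utc
    )
    viewer_tz = timezone(timedelta(hours=viewer_utc_offset_hours))
    return utc_midnight.astimezone(viewer_tz).strftime("%Y%m%d")


# A viewer five hours behind UTC sees the previous calendar day.
print(displayed_build_date("20161106030203", -5))  # 20161105
# A viewer one hour ahead of UTC (e.g. Spain in winter) sees the real date.
print(displayed_build_date("20161106030203", 1))   # 20161106
```

Under that assumption, any viewer west of UTC would see every build labelled one day early, which matches the one-day offset described in the earlier comments.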
Summary: Investigate FX_TAB_SWITCH_SPINNER_VISIBLE_LONG_MS regression on Windows that started on Nov 7 → Investigate FX_TAB_SWITCH_SPINNER_VISIBLE_LONG_MS regression on Windows that started on Nov 6
Mike, we see a lot of timeouts in Linux debug tests. Could those be related to this bug too?
Flags: needinfo?(mconley)
(In reply to Carsten Book [:Tomcat] from comment #21)
> Mike, we see a lot of timeouts in linux debug tests, could this be related
> to this bug too ?
If so, I would have expected those timeouts to have gone away around late November, as the regression highlighted by this bug was fixed in bug 1316632. That fix was first available in the Nov 29th build. Which is to say that this bug should be closed. I guess I'll just dupe it over to the one that actually fixed it.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(mconley)
Resolution: --- → DUPLICATE