Closed Bug 1377226 Opened 7 years ago Closed 7 years ago

Find out why the tabpaint test appears to regress in early June

Categories

(Testing :: Talos, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: mconley, Assigned: mconley)

Details

Here's the graph of the tabpaint test from the Quantum Health dashboard:

https://screenshots.firefoxusercontent.com/images/b4d3b2bc-841a-4eb1-b9da-5865103bba6e.png

There's a creeping regression on Win7-32 from May 14th onwards, and then all platforms have a huge jump in early June. There's quite a bit of noise in the test results as well, though this tightens up around June 10th - June 13th.

We should explain this.
I found this Talos alert:

https://treeherder.mozilla.org/perf.html#/alerts?id=6999

Which led me to bug 1369662.

Boy, my memory was failing me. I remember this now:

Up until bug 1369662 landed (June 9th), tabpaint should be considered untrustworthy. See bug 1369662 comment 21 - specifically:

"So what I think I'm getting at is that this test has been likely flawed from the start, and has not properly measured meaningful paint of content. I've pushed patches to try that wait for the first MozAfterPaint event that has a rect with some dimensions to it, both with and without ehsan's patch:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=b80e1f83576e&newProject=try&newRevision=3534252d4864&framework=1&showOnlyImportant=0

The comparison shows _no difference_ taking that into account."
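
The check described in the quote (ignore paint events until one arrives with a rect that has real dimensions) can be modelled roughly as below. This is a minimal Python sketch of the logic only, not the Talos test code or the patch from bug 1369662; `Rect`, `PaintEvent`, and `first_meaningful_paint` are hypothetical names invented for illustration, and the sample timestamps are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Rect:
    # Hypothetical stand-in for one painted rectangle.
    width: float
    height: float


@dataclass
class PaintEvent:
    # Hypothetical stand-in for a paint notification and its rects.
    timestamp_ms: float
    rects: List[Rect]


def first_meaningful_paint(events: Iterable[PaintEvent]) -> Optional[PaintEvent]:
    """Return the first event that has at least one rect of non-zero area."""
    for event in events:
        if any(r.width > 0 and r.height > 0 for r in event.rects):
            return event
    return None


# Made-up sample data: two empty paints, then one that covers real pixels.
events = [
    PaintEvent(10.0, []),
    PaintEvent(12.5, [Rect(0, 0)]),
    PaintEvent(48.0, [Rect(1280, 720)]),
]
hit = first_meaningful_paint(events)
print(hit.timestamp_ms if hit else None)
```

A test that stops at the first paint event regardless of its rects would report the 10.0 ms event here, which is the kind of under-measurement the quote describes.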


So, TL;DR: The tabpaint test was flawed at the time it was "baselined", and results prior to June 9th should be treated with extreme skepticism. On June 9th, a patch landed which (I believe) causes the test to measure the "right thing". This appears as a regression, but is really just a test correction.
If it's really important to ensure that we haven't regressed since early May on tabpaint, we should take the patch from bug 1369662, manually apply it on every Nightly leading up to the June 9th build, and graph the results - essentially, backfilling the measurement with the corrected test.
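
As a rough sketch of the first step of such a backfill, the snippet below only enumerates the Nightly build dates that would need a patched run. The year (2017) and the May 1st start date are assumptions not stated in this bug, and `nightly_dates` is a hypothetical helper; applying the patch, running the test, and graphing the results are not shown.

```python
from datetime import date, timedelta
from typing import List


def nightly_dates(start: date, end: date) -> List[date]:
    """Return one date per day from start up to, but not including, end."""
    days = (end - start).days
    return [start + timedelta(days=i) for i in range(max(days, 0))]


# Assumed range: the year and the start date are guesses for illustration.
# The end date is excluded because the June 9th build already has the fix.
dates = nightly_dates(date(2017, 5, 1), date(2017, 6, 9))
print(len(dates), dates[0].isoformat(), dates[-1].isoformat())
```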
Hey digitarald, given comment 1 and comment 2, is there a way we can "re-baseline" the tabpaint measurement?
Flags: needinfo?(hkirschner)
Or is comment 3 more of a question for you, jmaher? Are there pre-existing workflows / support for backfilling like this?
Flags: needinfo?(jmaher)
There is no easy way to do this other than many try pushes. Since we are talking about Nightly, that would be a smaller number of pushes, but then the data needs to be manually extracted.

I would use try pushes over something local so you can compare the numbers to what we are seeing in our normal CI. Keep in mind that Nightly builds are PGO builds.
Flags: needinfo?(jmaher)
The graph got re-baselined.
Flags: needinfo?(hkirschner)
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX