Open Bug 1645275 Opened 4 years ago Updated 4 years ago

tscrollx can very easily toggle from measuring one thing, to measuring a completely different thing

Categories

(Testing :: Talos, defect, P3)

defect

Tracking

(Not tracked)

People

(Reporter: botond, Unassigned)

Details

While investigating a tscrollx regression (in bug 1643604), I discovered what I believe is a deeper issue with tscrollx (specifically, the sync scroll part of it) that makes it susceptible to unrelated changes causing the quantity it measures to fluctuate between two different things.

The sync scroll part of tscrollx has a tick function which performs a scrollBy(), and then schedules the tick() function to run again after the next MozAfterPaint event.

MozAfterPaint is fired when content receives a notification from the compositor that a composite is finished. So, if the only thing triggering paints and composites is the scrolling done by the test, then a given scrollBy() will trigger a paint, which will trigger a composite, and only once that composite is finished will we get a MozAfterPaint and perform the next scrollBy().

So, if A is the amount of time it takes for the main thread part of one iteration of this (dispatching the MozAfterPaint event, performing the scrollBy(), painting, and sending a transaction the compositor), and B is the amount of time it takes for the compositor part of the iteration (processing the transaction, compositing, and sending the did-composite notification back to the main thread), then in this situation the test is measuring A + B. Note that, in this case, there is no parallelism between the main thread and the compositor. Let's call this the test's "serial mode".

However, if at any point during the test, anything else other than the test's scrolling causes a single extra paint (and therefore composite), the test's behaviour will change for the entire remainder of the test. If a paint or composite is already in flight when the test performs a scrollBy(), then it will be that in-flight composite that will trigger the next MozAfterPaint, rather than the subsequent composite that will eventually result from the scrollBy(). In this situation, the main thread is woken up by that MozAfterPaint and performs the next scrollBy() while the compositor is compositing the previous scrollBy(), and this continues for the remainder of the test -- the main thread and the compositor do work in parallel (assuming a CPU core is available for each). In this situation, the test is measuring max(A, B). Let's call this the tests's "parallel mode".

The fact that a single extra paint scheduled for any reason over the course of the test, can cause the test to switch from serial mode to parallel mode, and start measuring a completely different quantity, for the remainder of the test, is concerning, and means that tscrollx can easily be "regressed" or "improved" by largely unrelated changes.

(For example, in the regression I investigated in bug 1643604, the "regressing" patch changed the timing of a paint near the test's startup, and caused the test to remain in serial mode for much longer / much more often. This showed up as a large regression.)

We should decide which of the above quantities -- A + B or max(A, B) -- we want tscrollx to measure, and then harden tscrollx to ensure it measures that quantity even in the presence of things like one-off extra paints.

Interesting! It does sound like the test intended to measure serial mode. In order to do that properly, it should be checking the MozAfterPaint event's transactionId, to make sure it gets the event for the paint that it just triggered, and not for an earlier paint. See this newsgroup post for details on MozAfterPaint and transaction IDs.

After reading comment 0 I remembered that I ran into a similar problem with tresize (bug 1471961) which was also addressed by checking transaction IDs.

(In reply to Markus Stange [:mstange] from comment #1)

Interesting! It does sound like the test intended to measure serial mode. In order to do that properly, it should be checking the MozAfterPaint event's transactionId, to make sure it gets the event for the paint that it just triggered, and not for an earlier paint. See this newsgroup post for details on MozAfterPaint and transaction IDs.

So, to have something to compare the MozAfterPaint event's transactionId to, the post suggests querying nsIDOMWindowUtils.lastTransactionId. That's what the tresize change that :kats references in comment 2 does as well.

However, in tscrollx, it doesn't look like we have access to nsIDOMWindowUtils? (This comment says it runs as unprivileged code, and trying to access win.windowUtils on this object fails.)

We'd need a SpecialPowers like thing, there is something called TalosPowers in that test that might be a starting point.

Whiteboard: [perftest:triage]
Severity: -- → S3
Priority: -- → P3

:jmaher would TalosPowers be a suitable approach here?

Flags: needinfo?(jmaher)
Whiteboard: [perftest:triage]

I believe TalosPowers would work here. It has been a few years since I played with it, the intention was a very lightweight specialPowers

Flags: needinfo?(jmaher)
You need to log in before you can comment on or make changes to this bug.