1841305 - Firefox (intermittently) failing WPTs in css/cssom-view/scroll-behavior-main-frame-root.html and scroll-behavior-main-frame-window.html

Reporter

Description

•

1 year ago

The tests being failed in both of these are Main frame with auto scroll-behavior ; scroll() with smooth behavior, so I presume they may have the same underlying cause:

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 1

•

1 year ago

The failure happens at the first async scrolling which should have been processed by APZ, but in fact it doesn't since the test expands the root scroll container and at the time the failure test runs, APZ seems to not yet know the expanded size.

Severity: -- → S3

Component: Layout → Panning and Zooming

Priority: -- → P3

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 2

•

1 year ago

This is tough. The comment 1 isn't true. There's a race condition in between a paint in the browser parent process and a paint in the content process.

The race comes from this SetReferentId call in the paint in the browser parent process. When we call the function, it's too late for processing the async scroll update in APZ, the scroll update had been already reached to APZ and without the proper ReferentId it was ignored unfortunately.

I don't have any clear idea to solve the race. Queuing the scroll updates until the proper ReferentId comes if we know for sure it's coming (but how?)? Or we do skip painting in the content process until we got the proper ReferentId? The latter will probably regress first-contentful paint?

CCing Botond.

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 3

•

1 year ago

Note that just making the test pass reliably is to give a chance to the parent process to do a paint, waiting two rAF callbacks just before starting the async scroll test case here would be sufficient.

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 4

•

1 year ago

Kats had already noticed the race in bug 1437295.

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1437295

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 5

•

1 year ago

There's something which horribly takes some amount of times on the parent process. The time difference between when we call a paint when we got the LayersId of the content and when we call the SetReferentId in a paint is over 150ms, it's truly slow given that my machine is not so slow (16 cores machine)

Hiroyuki Ikezoe (:hiro)

Assignee

Updated

•

1 year ago

Summary: Firefox failing WPTs in css/cssom-view/scroll-behavior-main-frame-root.html and scroll-behavior-main-frame-window.html → Firefox (intermittently) failing WPTs in css/cssom-view/scroll-behavior-main-frame-root.html and scroll-behavior-main-frame-window.html

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 6

•

1 year ago

Okay, I've tracked down what's going on in between the 150ms time span.

When we got the proper ReferentId (i.e. LayersId) in BrowserParent::InitRendering(), we actually call InvalidateFrame, so a paint will be triggered in the next refresh driver's tick. Indeed it happens.

But unfortunately when we query the LayersId in the paint, we use the outdated mRetainedRemoteFrame. It was introduced in bug 1677324 to avoid flashing on page navigation.

Just calling RecvDidUnsuppressPainting instead of InvalidateFrame in BrowserParent::InitRendering() would solve this race, but I am almost 100% sure it will regress bug 1677324.

Emilio, what's the right course to fix this test failure and fix potential failure risks caused by the race? I can add awaiting two rAF callbacks into this test, but I have no idea how we can eliminate the potential risks.

Flags: needinfo?(emilio)

Emilio Cobos Álvarez (:emilio)

Comment 7

•

1 year ago

Do you know why we haven't unsuppressed by the time the test runs? The test runs on load, so we should have unsuppressed then.

Is the issue we don't hit this code on time? Or something else?

Do you have a pernosco trace we could poke at or something?

Flags: needinfo?(emilio) → needinfo?(hikezoe.birchill)

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 8

•

1 year ago

(In reply to Emilio Cobos Álvarez (:emilio) from comment #7)

Do you know why we haven't unsuppressed by the time the test runs? The test runs on load, so we should have unsuppressed then.

Is the issue we don't hit this code on time? Or something else?

Yeah the code actually hit, but it's too late. In the content process after calling UnsuppressAndInvalidate we build the scroll data in question and it's sent to APZ. In the meantime, the UnsuppressAndInvalidate reaches to the parent process via an IPC call, then in the parent process we do display list building with the proper LayersId and it's sent to APZ. But unfortunately when we got the proper LayersId in APZ the scroll data from the content process has been already processed.

Flags: needinfo?(hikezoe.birchill)

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 9

•

1 year ago

I'd like to propose to do a workaround in the tests themselves, and file a followup bug to fix the race because the purpose of the tests, I believe, is testing each scroll-behavior for each scroll operations (e.g. scrollTo) works correctly. Adding the workaround (awaiting two rAF) would not spoil the purpose.

We could also add a new wpt for the race itself. Actually I've written it. A new interesting found during I was writing it is the race happens only if there's a paint in the content after the paint where we sent the scroll update data. If there's no such paint we don't clobber the scroll update data here in APZUpdater::UpdateScrollDataAndTreeState even if the parent process's paint is delayed. And the paint in question in the test is triggered if there are two instant scroll operations before the async one. I don't know why. Though the new wpt I've written doesn't use instant scroll operations, it rather just changes the scroll container's color to make a paint happen.

Hiroyuki Ikezoe (:hiro)

Assignee

Updated

•

1 year ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1845646

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 10

•

1 year ago

I sent a PR to the upstream; https://github.com/web-platform-tests/wpt/pull/41200

See Also: → https://github.com/web-platform-tests/wpt/pull/41200

Hiroyuki Ikezoe (:hiro)

Assignee

Updated

•

1 year ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1786332

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 11

•

1 year ago

The PR has been merged.

Assignee: nobody → hikezoe.birchill

Status: NEW → RESOLVED

Closed: 1 year ago

Resolution: --- → FIXED

Bugzilla

Firefox (intermittently) failing WPTs in css/cssom-view/scroll-behavior-main-frame-root.html and scroll-behavior-main-frame-window.html

Categories

(Core :: Panning and Zooming, defect, P3)

Tracking

()

People

(Reporter: twisniewski, Assigned: hiro)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Updated

Comment 6

Comment 7

Comment 8

Comment 9

Updated

Comment 10

Updated

Comment 11