Closed Bug 1730998 Opened 3 years ago Closed 3 years ago

Use `control` priority for PAPZ.RequestContentRepaint and process all pending RequestRequest in a single refresh driver tick

Tracking

()

Status:

RESOLVED FIXED

Milestone:

95 Branch

Tracking Flags:

Tracking

Status

firefox95

---

fixed

People

(Reporter: hiro, Assigned: hiro)

References

(Regressed 1 open bug)

Details

Attachments

(4 files, 1 obsolete file)

Bug 1730998 - Bail out if the pres context was destroyed in an early runner. r?smaug 3 years ago Hiroyuki Ikezoe (:hiro) (deleted), text/x-phabricator-request		Details
Bug 1730998 - Call advanceTimeAndRefresh inside a setTimeout callback repeatedly under test-controlled refresh driver mode. r?botond 3 years ago Hiroyuki Ikezoe (:hiro) (deleted), text/x-phabricator-request		Details
Bug 1730998 - Use `control` priority for RequestContentRepaint and NotifyFlushComplete, and process them in an early refresh driver runner. r?botond!,smaug! 3 years ago Hiroyuki Ikezoe (:hiro) (deleted), text/x-phabricator-request		Details
Bug 1730998 - Add a browser mochitest to make sure a position:sticky element gets painted properly during fast async scrolling. r?botond 3 years ago Hiroyuki Ikezoe (:hiro) (deleted), text/x-phabricator-request		Details
Bug 1730998 - Remove apz peek messages stuff. r?botond 3 years ago Hiroyuki Ikezoe (:hiro) (deleted), text/x-phabricator-request		Details

Hiroyuki Ikezoe (:hiro)

Assignee

Description

•

3 years ago

In APZ, we do call RequestContentRepaint a bunch of times (but basically we call it once per vsync tick) during fast scrolling

Unfortunately [Compress=all] is not suitable for the RequestContentRepaint since RequestContentRepaint calls should be differentiated by scrollIds. So for example, if there are two scroll frames, and both cause jank at the same time, we have two RequestContentRepaint calls with different scroll ids, and we need to process both requests in the content process respectively.

So a pseudo code will look like this;

void MessageChannel::StealMessages(msgid_t aMsgId, std::function<bool(const Message& aMsg)>& aMatcher) {
  for (MessageTask* task = mPending.getFirst(); task;) {
    Message& msg = task->Msg();
    if (msg.type() == aMsgId) {
      if (aMatcher(msg)) {
        task = task->removeAndGetnext();
      } else {
        task = task->getNext();
      }
    }
  }
}

Botond Ballo [:botond]

Comment 1

•

3 years ago

Here is a past discussion, about a different mechanism with the same goal: https://groups.google.com/g/mozilla.dev.platform/c/vfPTH8DcCf8/m/HCdaY16pjt4J

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 2

•

3 years ago

Touch events had possibly a bit similar issue and those use two separate ipc messages which can both be compressed (but not compress=all).
Sender side knows if compression is ok, and if not, it changes which message to send. That way compression doesn't kick in.

But if the messages were using Priority=control and receiver stored the data somewhere and handled that when RefreshDriver is about to tick, wouldn't that basically handle this case?

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 3

•

3 years ago

Thank you, Olli. I did try control locally, and it looks promising! We can filter out unnecessary repaints requests. I will figure out where we should store the data.

(And now I am pretty sure with the control priority we can drop the apz.peek_messages machinery entirely)

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 4

•

3 years ago

I am going to reuse this bug for what Olli suggested in bug 1730993 comment 2.

Assignee: nobody → hikezoe.birchill

Status: NEW → ASSIGNED

Component: IPC → Panning and Zooming

Summary: Add a machinery to steal the latest message(s) which matches what the call site wants from the MessageChannel's pending queue and discard the rest of the matched messages → Use `control` priority for PAPZ.RequestContentRepaint and process all pending RequestRequest in a single refresh driver tick

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 6

•

3 years ago

I've mostly finished writing the change. There are a couple of caveats on the change we need to care about;

PAPZ.NotifyFlushComplete also needs to be control and be delivered in a refresh driver's tick since some tests expect when the notification arrives all pending RepaintRequests have been finished (i.e. scroll position is surely updated as expected)
Under the test-controlled refresh driver mode, we need to call nsIDOMWindowUtils.advanceTimeAndRefresh in a couple of tests (in test utility functions) since without it any refresh driver tick doesn't happen
In my initial implementation there's a race condition where don't add as an early runner of the proper refresh driver when we've already added an early runner to a refresh driver but the previous refresh driver was destroyed due to a page transition

I thought I've addressed all the points, but there's still a similar race to 3), a reftest failed on a state waiting for "apz-repaints-flushed" notification.

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 7

•

3 years ago

(In reply to Hiroyuki Ikezoe (:hiro) from comment #6)

I thought I've addressed all the points, but there's still a similar race to 3), a reftest failed on a state waiting for "apz-repaints-flushed" notification.

Oops, I did compare raw pointers of the refresh drivers to tell whether it's been changed or not, in fact when the reftest failed, the pointer were same but different refresh drivers. I start thinking "apz-repaints-flushed" notification should have a unique id, and I also wonder it's a pre-exisiting issue?

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 8

•

3 years ago

I did compare the pres shell id there. I'd think it should work.

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 9

•

3 years ago

Looks like browser_test_scroll_thumb_dragging.js is the last failure test with the change, it fails only on Windows (opt?), I can't reproduce the failure locally on my Linux box, and I am not able to reproduce the failure on my Windows laptop either so far.

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 10

•

3 years ago

(In reply to Hiroyuki Ikezoe (:hiro) from comment #9)

Looks like browser_test_scroll_thumb_dragging.js is the last failure test with the change, it fails only on Windows (opt?), I can't reproduce the failure locally on my Linux box, and I am not able to reproduce the failure on my Windows laptop either so far.

The failure culprit was a browser mochitest I added. In the test I did use promisenativeMouseDrag without awaiting the returned function so the last "mouseup" event seems to be fired in the next test, browser_test_scroll_thumb_dragging.js, or promiseVerticalScrollbarDrag in browser_test_scroll_thumb_dragging.js didn't work as expected.

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 11

•

3 years ago

Attached file Bug 1730998 - Bail out if the pres context was destroyed in an early runner. r?smaug (deleted) — Details

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 12

•

3 years ago

Attached file Bug 1730998 - Call advanceTimeAndRefresh inside a setTimeout callback repeatedly under test-controlled refresh driver mode. r?botond (obsolete) (deleted) — Details

In the next change, we will move "apz-repaint-flushed" notification process and
updating RepaintRequests process into refesh driver's tick, thus those processes
will not be invoked under the test-controlled refresh driver mode without
explicit advanceTimeAndRefresh calls.

Depends on D127030

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 13

•

3 years ago

Attached file Bug 1730998 - Use `control` priority for RequestContentRepaint and NotifyFlushComplete, and process them in an early refresh driver runner. r?botond!,smaug! (deleted) — Details

Depends on D127031

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 14

•

3 years ago

Attached file Bug 1730998 - Add a browser mochitest to make sure a position:sticky element gets painted properly during fast async scrolling. r?botond (deleted) — Details

This test is based on a test case attached in bug 1585378.

This test has to be a browser mochitest since we need to drag the root content
vertical scroll thumb during the content process is under heavy load to build
a tons of display items.

Depends on D127032

Botond Ballo [:botond]

Updated

•

3 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1675899

Phabricator Automation

Updated

•

3 years ago

Attachment #9243614 - Attachment is obsolete: true

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 15

•

3 years ago

Attached file Bug 1730998 - Remove apz peek messages stuff. r?botond (deleted) — Details

Pulsebot

Comment 16

•

3 years ago

Pushed by hikezoe.birchill@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/64a2b879aa79 Bail out if the pres context was destroyed in an early runner. r=smaug https://hg.mozilla.org/integration/autoland/rev/60d769e12932 Use `control` priority for RequestContentRepaint and NotifyFlushComplete, and process them in an early refresh driver runner. r=botond,smaug https://hg.mozilla.org/integration/autoland/rev/7e496a4d1b55 Add a browser mochitest to make sure a position:sticky element gets painted properly during fast async scrolling. r=botond https://hg.mozilla.org/integration/autoland/rev/92641110e5c9 Remove apz peek messages stuff. r=botond

Natalia Csoregi [:nataliaCs]

Comment 17

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/64a2b879aa79
https://hg.mozilla.org/mozilla-central/rev/60d769e12932
https://hg.mozilla.org/mozilla-central/rev/7e496a4d1b55
https://hg.mozilla.org/mozilla-central/rev/92641110e5c9

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

status-firefox95: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 95 Branch

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

3 years ago

Regressions: 1696785

Hiroyuki Ikezoe (:hiro)

Assignee

Updated

•

3 years ago

Regressions: 1734275

Sandor Molnar

Comment 18

•

3 years ago

Backed out for frequent Android gv-junit failures

Backout link

Push with failures

Failure log

Status: RESOLVED → REOPENED

status-firefox95: fixed → ---

Flags: needinfo?(hikezoe.birchill)

Resolution: FIXED → ---

Target Milestone: 95 Branch → ---

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 19

•

3 years ago

Yeah, Aryx noticed me last night about the failure. I think the failure reason is one of the HTML file used in the junit test has no meta viewport tag. I pushed a try run last night with a sane meta viewport tag. Though it seems to work, now I did retrigger it.

I will debug what's going on while the junit test is running later, what I am now suspecting is; without the meta tag the test HTML content gets zoomed in/out repeatedly (since the test modifies content's styles) so that the test doesn't work as expected. Anyway the failure is originally flaky (bug 1696785).

Here is the try; https://treeherder.mozilla.org/jobs?repo=try&revision=17ca730d36dccc6c145a6353848f58df86b0fb13

Looks like there's another failure, it looks more serious. :/

Flags: needinfo?(hikezoe.birchill)

Hiroyuki Ikezoe (:hiro)

Assignee

Comment 20

•

3 years ago

(In reply to Hiroyuki Ikezoe (:hiro) from comment #19)

Looks like there's another failure, it looks more serious. :/

Phew, it seems to be a real high frequent known failure (bug 1733737).

Hiroyuki Ikezoe (:hiro)

Assignee

Updated

•

3 years ago

Depends on: 1696785

No longer regressions: 1696785

Pulsebot

Comment 21

•

3 years ago

Pushed by hikezoe.birchill@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0d0fcd1370d0 Bail out if the pres context was destroyed in an early runner. r=smaug https://hg.mozilla.org/integration/autoland/rev/31bba792417d Use `control` priority for RequestContentRepaint and NotifyFlushComplete, and process them in an early refresh driver runner. r=botond,smaug https://hg.mozilla.org/integration/autoland/rev/fca207e78b12 Add a browser mochitest to make sure a position:sticky element gets painted properly during fast async scrolling. r=botond https://hg.mozilla.org/integration/autoland/rev/788cf4242ab0 Remove apz peek messages stuff. r=botond

Andreea Pavel [:apavel]

Comment 22

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/0d0fcd1370d0
https://hg.mozilla.org/mozilla-central/rev/31bba792417d
https://hg.mozilla.org/mozilla-central/rev/fca207e78b12
https://hg.mozilla.org/mozilla-central/rev/788cf4242ab0

Status: REOPENED → RESOLVED

Closed: 3 years ago → 3 years ago

status-firefox95: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 95 Branch

Hiroyuki Ikezoe (:hiro)

Assignee

Updated

•

3 years ago

Blocks: 1714569

Hiroyuki Ikezoe (:hiro)

Assignee

Updated

•

3 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1609593

Timothy Nikkel (:tnikkel)

Updated

•

3 years ago

Regressions: 1758352

Botond Ballo [:botond]

Updated

•

3 years ago

Regressions: 1764878

Botond Ballo [:botond]

Updated

•

3 years ago

No longer regressions: 1758352

Botond Ballo [:botond]

Updated

•

2 years ago

Regressions: 1772732

Botond Ballo [:botond]

Updated

•

2 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1795784

You need to log in before you can comment on or make changes to this bug.