Closed Bug 1730998 Opened 3 years ago Closed 3 years ago

Use `control` priority for PAPZ.RequestContentRepaint and process all pending RequestRequest in a single refresh driver tick

Categories

(Core :: Panning and Zooming, enhancement)

enhancement

Tracking

()

RESOLVED FIXED
95 Branch
Tracking Status
firefox95 --- fixed

People

(Reporter: hiro, Assigned: hiro)

References

(Regressed 1 open bug)

Details

Attachments

(4 files, 1 obsolete file)

In APZ, we do call RequestContentRepaint a bunch of times (but basically we call it once per vsync tick) during fast scrolling

Unfortunately [Compress=all] is not suitable for the RequestContentRepaint since RequestContentRepaint calls should be differentiated by scrollIds. So for example, if there are two scroll frames, and both cause jank at the same time, we have two RequestContentRepaint calls with different scroll ids, and we need to process both requests in the content process respectively.

So a pseudo code will look like this;

void MessageChannel::StealMessages(msgid_t aMsgId, std::function<bool(const Message& aMsg)>& aMatcher) {
  for (MessageTask* task = mPending.getFirst(); task;) {
    Message& msg = task->Msg();
    if (msg.type() == aMsgId) {
      if (aMatcher(msg)) {
        task = task->removeAndGetnext();
      } else {
        task = task->getNext();
      }
    }
  }
}

Here is a past discussion, about a different mechanism with the same goal: https://groups.google.com/g/mozilla.dev.platform/c/vfPTH8DcCf8/m/HCdaY16pjt4J

Touch events had possibly a bit similar issue and those use two separate ipc messages which can both be compressed (but not compress=all).
Sender side knows if compression is ok, and if not, it changes which message to send. That way compression doesn't kick in.

But if the messages were using Priority=control and receiver stored the data somewhere and handled that when RefreshDriver is about to tick, wouldn't that basically handle this case?

Thank you, Olli. I did try control locally, and it looks promising! We can filter out unnecessary repaints requests. I will figure out where we should store the data.

(And now I am pretty sure with the control priority we can drop the apz.peek_messages machinery entirely)

I am going to reuse this bug for what Olli suggested in bug 1730993 comment 2.

Assignee: nobody → hikezoe.birchill
Status: NEW → ASSIGNED
Component: IPC → Panning and Zooming
Summary: Add a machinery to steal the latest message(s) which matches what the call site wants from the MessageChannel's pending queue and discard the rest of the matched messages → Use `control` priority for PAPZ.RequestContentRepaint and process all pending RequestRequest in a single refresh driver tick

I've mostly finished writing the change. There are a couple of caveats on the change we need to care about;

  1. PAPZ.NotifyFlushComplete also needs to be control and be delivered in a refresh driver's tick since some tests expect when the notification arrives all pending RepaintRequests have been finished (i.e. scroll position is surely updated as expected)
  2. Under the test-controlled refresh driver mode, we need to call nsIDOMWindowUtils.advanceTimeAndRefresh in a couple of tests (in test utility functions) since without it any refresh driver tick doesn't happen
  3. In my initial implementation there's a race condition where don't add as an early runner of the proper refresh driver when we've already added an early runner to a refresh driver but the previous refresh driver was destroyed due to a page transition

I thought I've addressed all the points, but there's still a similar race to 3), a reftest failed on a state waiting for "apz-repaints-flushed" notification.

(In reply to Hiroyuki Ikezoe (:hiro) from comment #6)

I thought I've addressed all the points, but there's still a similar race to 3), a reftest failed on a state waiting for "apz-repaints-flushed" notification.

Oops, I did compare raw pointers of the refresh drivers to tell whether it's been changed or not, in fact when the reftest failed, the pointer were same but different refresh drivers. I start thinking "apz-repaints-flushed" notification should have a unique id, and I also wonder it's a pre-exisiting issue?

I did compare the pres shell id there. I'd think it should work.

Looks like browser_test_scroll_thumb_dragging.js is the last failure test with the change, it fails only on Windows (opt?), I can't reproduce the failure locally on my Linux box, and I am not able to reproduce the failure on my Windows laptop either so far.

(In reply to Hiroyuki Ikezoe (:hiro) from comment #9)

Looks like browser_test_scroll_thumb_dragging.js is the last failure test with the change, it fails only on Windows (opt?), I can't reproduce the failure locally on my Linux box, and I am not able to reproduce the failure on my Windows laptop either so far.

The failure culprit was a browser mochitest I added. In the test I did use promisenativeMouseDrag without awaiting the returned function so the last "mouseup" event seems to be fired in the next test, browser_test_scroll_thumb_dragging.js, or promiseVerticalScrollbarDrag in browser_test_scroll_thumb_dragging.js didn't work as expected.

In the next change, we will move "apz-repaint-flushed" notification process and
updating RepaintRequests process into refesh driver's tick, thus those processes
will not be invoked under the test-controlled refresh driver mode without
explicit advanceTimeAndRefresh calls.

Depends on D127030

This test is based on a test case attached in bug 1585378.

This test has to be a browser mochitest since we need to drag the root content
vertical scroll thumb during the content process is under heavy load to build
a tons of display items.

Depends on D127032

Attachment #9243614 - Attachment is obsolete: true
Pushed by hikezoe.birchill@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/64a2b879aa79 Bail out if the pres context was destroyed in an early runner. r=smaug https://hg.mozilla.org/integration/autoland/rev/60d769e12932 Use `control` priority for RequestContentRepaint and NotifyFlushComplete, and process them in an early refresh driver runner. r=botond,smaug https://hg.mozilla.org/integration/autoland/rev/7e496a4d1b55 Add a browser mochitest to make sure a position:sticky element gets painted properly during fast async scrolling. r=botond https://hg.mozilla.org/integration/autoland/rev/92641110e5c9 Remove apz peek messages stuff. r=botond
Regressions: 1734275

Backed out for frequent Android gv-junit failures

Backout link

Push with failures

Failure log

Status: RESOLVED → REOPENED
Flags: needinfo?(hikezoe.birchill)
Resolution: FIXED → ---
Target Milestone: 95 Branch → ---

Yeah, Aryx noticed me last night about the failure. I think the failure reason is one of the HTML file used in the junit test has no meta viewport tag. I pushed a try run last night with a sane meta viewport tag. Though it seems to work, now I did retrigger it.

I will debug what's going on while the junit test is running later, what I am now suspecting is; without the meta tag the test HTML content gets zoomed in/out repeatedly (since the test modifies content's styles) so that the test doesn't work as expected. Anyway the failure is originally flaky (bug 1696785).

Here is the try; https://treeherder.mozilla.org/jobs?repo=try&revision=17ca730d36dccc6c145a6353848f58df86b0fb13

Looks like there's another failure, it looks more serious. :/

Flags: needinfo?(hikezoe.birchill)

(In reply to Hiroyuki Ikezoe (:hiro) from comment #19)

Looks like there's another failure, it looks more serious. :/

Phew, it seems to be a real high frequent known failure (bug 1733737).

Depends on: 1696785
No longer regressions: 1696785
Pushed by hikezoe.birchill@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0d0fcd1370d0 Bail out if the pres context was destroyed in an early runner. r=smaug https://hg.mozilla.org/integration/autoland/rev/31bba792417d Use `control` priority for RequestContentRepaint and NotifyFlushComplete, and process them in an early refresh driver runner. r=botond,smaug https://hg.mozilla.org/integration/autoland/rev/fca207e78b12 Add a browser mochitest to make sure a position:sticky element gets painted properly during fast async scrolling. r=botond https://hg.mozilla.org/integration/autoland/rev/788cf4242ab0 Remove apz peek messages stuff. r=botond
Blocks: 1714569
Regressions: 1758352
Regressions: 1764878
No longer regressions: 1758352
Regressions: 1772732
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: