Closed Bug 1257491 Opened 4 years ago Closed 4 years ago

Perma-checkerboard on bugzilla

Categories

(Core :: Panning and Zooming, defect)

48 Branch
x86
Windows
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox47 --- unaffected
firefox48 --- fixed

People

(Reporter: jimm, Assigned: kats)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [gfx-noted])

Attachments

(7 files, 2 obsolete files)

Attached image checkerboard.jpg (deleted) —
I see checkerboarding pretty regularly on bugzilla bug pages, see attached. I'm not sure what triggered this for this particular bug, I don't think it has anything to do with it being sec related. The only things I can think of is that this particular bug happens to have a very log cc list. It's bug 322423.

48.0a1 (2016-03-16)
Attached file aboutsupport.txt (deleted) —
STR:

1) load the bug
2) wait a few seconds
3) scroll down a bit with the mouse wheel
Can you repro, then go to about: checkerboard, find the log for the checkerboard event, copy the raw log from the textarea at the bottom, and attach it to this bug? Thanks!
Keywords: perf
OS: Unspecified → Windows
Hardware: Unspecified → x86
Whiteboard: [gfx-noted]
Version: Trunk → 48 Branch
Attached file log.txt (deleted) —
For this event, I scrolled down until the checkerboard area filled the entire visible view, then clicked in content which triggered a paint.
Summary: Checkerboard on bugzilla → Perma-checkerboard on bugzilla
Attached file consoleout.txt (deleted) —
It's not clear to me how we get into this state, but the reason we don't get out of it is that when we run the paint-skipping checks, we compare the "old" displayport to the "new" displayport and allow paint-skipping if they are the same. However the "old" displayport isn't actually what we sent over to the APZ, it's what we *think* we sent over to the APZ. That is, instead of recording the displayport that we actually sent over, we recompute it by calling nsLayoutUtils::GetDisplayPort. And that depends on a bunch of things (including stuff like displayport suppression) which may have changed. So what *think* the old displayport was might not be what it actually was, and so we don't repaint and APZ gets stuck with a short displayport, leading to checkerboarding.
I'm still stumped on why we enter this state. From the log, we see that the APZC (0F67A000) gets the first-paint notification with the displayport h=1408. Note that since the composition bounds height is 1213, this means the displayport was calculated with a zero (or small) margin, and aligned to 128 pixels. In particular it means displayport suppression was NOT active at the time.

We later see the APZC request a content repaint with margins dpm=(l=-0.000000, t=-0.000000, r=0.000000, b=3032.500000) which is supposed to enlarge the displayport and kick off a paint with the new larger displayport. The SchedulePaint() call in nsLayoutUtils::SetDisplayPortMargins is supposed to do that. However for some reason that doesn't happen.

Beyond that point, whenever we hit the paint-skipping code in nsGfxScrollFrame it compares the "old" and "new" displayport, both of which are computed using the larger margins, and so we skip the paint. This continues happening until we scroll down far enough that the "new" displayport rolls down to the next virtual tile, at which point the paint is triggered.
Attached patch Logging patch (deleted) — Splinter Review
Jim, can you do the same with this logging patch? It has some additional logging around the part that is eluding me. Hopefully it'll shed more light on how we end up in this state. Thanks!
Flags: needinfo?(jmathies)
Assignee: nobody → bugmail.mozilla
Attached file log2.txt (obsolete) (deleted) —
Flags: needinfo?(jmathies)
Flags: needinfo?(bugmail.mozilla)
This log looks like it is missing the child process output - in particular I wanted the output from APZCCallbackHelper and nsLayoutUtils which isn't on this log :(
Flags: needinfo?(bugmail.mozilla) → needinfo?(jmathies)
Attached file log3.txt (obsolete) (deleted) —
here you go
Attachment #8732337 - Attachment is obsolete: true
Flags: needinfo?(jmathies)
Comment on attachment 8732640 [details]
log3.txt

That doesn't have it either. Having trouble reproducing while attached to the content process.
Attachment #8732640 - Attachment is obsolete: true
Attached file BEAST.LOG (deleted) —
here we go
Flags: needinfo?(bugmail.mozilla)
Attachment #8732643 - Attachment mime type: text/x-log → text/plain
Thanks! From the log I see that initially we have this:

00009054    9.83381653  [3272] APZCCH: Init root dp for scrollId 4
00009055    9.83383751  [3272] staktrace: SetDPM (x=0, y=0, w=0, h=0) (x=0, y=0, w=7680, h=7680) 1 0 0B4B3DF8

which initializes the displayport margins to 0 for that frame. Then the next displayport margins change recorded is this:

00009209    9.96075821  [3272] APZCCH: Setting margins (l=-0.000000, t=-0.000000, r=0.000000, b=3347.500000) (hadDisplayPort: 1) for id 4
00009210    9.96081734  [3272] staktrace: SetDPM (x=0, y=0, w=106200, h=291840) (x=0, y=0, w=106200, h=291840) 0 1 0B4B3DF8

which increases the margins to b=3347. However the SetDPM call sees that the "old margins" have already been enlarged, which means some other piece of code is enlarging the margins without having triggered a repaint. The only other code that touches the DisplayPortMargins content property directly is the newly-added UpdateDisplayPortMarginsForPendingMetrics optimization code, which is almost certainly the cause of this regression - it updates the displayport margins without scheduling a paint.

I'll put up a patch shortly.
Flags: needinfo?(bugmail.mozilla)
You should also be able to verify this diagnosis by setting apz.peek_messages.enabled to false and verifying that the issue is no longer reproducible.
Attached patch Patch (deleted) — Splinter Review
This causes the PeekMessages update of the displayport to go through the existing nsLayoutUtils function to update the displpayport, and in particular, should call SchedulePaint() if the displayport has changed via PeekMessages.

Jim, can you check this patch fixes the issue for you?
Attachment #8732902 - Flags: review?(bgirard)
Attachment #8732902 - Flags: feedback?(jmathies)
Duplicate of this bug: 1258441
Comment on attachment 8732902 [details] [diff] [review]
Patch

Review of attachment 8732902 [details] [diff] [review]:
-----------------------------------------------------------------

Good catch, thanks kats!
Attachment #8732902 - Flags: review?(bgirard) → review+
Comment on attachment 8732902 [details] [diff] [review]
Patch

After a few tries to reproduce I'd say this seems to have fixed it.
Attachment #8732902 - Flags: feedback?(jmathies) → feedback+
Great, thanks for checking!
https://hg.mozilla.org/mozilla-central/rev/92e44f28f7ac
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
No longer blocks: 1258426
Duplicate of this bug: 1258426
You need to log in before you can comment on or make changes to this bug.