Closed Bug 1664638 Opened 4 years ago Closed 4 years ago

Intermittent text-overflow/xulscroll.html == text-overflow/xulscroll-ref.html | image comparison, max difference: 255, number of differing pixels: 5051

Categories

(Core :: Layout: Text and Fonts, defect, P5)

defect

Tracking

()

RESOLVED FIXED
82 Branch
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox80 --- unaffected
firefox81 --- unaffected
firefox82 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: kats)

References

(Regression)

Details

(Keywords: intermittent-failure, regression)

Attachments

(2 files)

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #2)

Appeared on this Try push before hitting autoland.

That try push does not have bug 1655130. The only patches it has that are landed came from kats' stuff pushed a day or two ago.

Flags: needinfo?(tnikkel)
Has Regression Range: --- → yes
Flags: needinfo?(kats)

Thanks. On PTO today but I'll look tomorrow.

Assignee: nobody → kats
Flags: needinfo?(kats)

I started looking into this. Can reproduce it easily enough but am having a hard time making sense of what's going on. The scrollframe in question that isn't scrolled properly is an RTL one, so I would expect the x-coord of the scrollable rect to be negative. This is true when I load the page standalone but for some reason doesn't seem to happen in the reftest. But aside from that there seems to be some issue with the visual scroll offset clobbering the scroll update, or something. So it does seem like a regression from my patches, but it's not immediately clear to me what the problem might be.

I think the RTL vs non-RTL numbers are some sort of bug, but orthogonal to the problem here. Here what we have happening is that initially the page loads, and APZ gets the initial "scrollframe constructed" ScrollPositionUpdate, with the scroll position at (0,0). And along with that is a visual scroll offset update which scrolls the scrollframe over to the rightmost extent, which is where it's supposed to be. And then the page triggers a scrollframe reconstruction, and APZ gets another ScrollPositionUpdate with the scroll position at (0,0) which scrolls the scrollframe over to the leftmost extent.

The reason it's intermittent is that sometimes the scrollframe reconstruction happens before the first paint. In that scenario we just get the one paint with two scroll position updates to (0,0) followed by the visual scroll offset update, so things up in the right spot. I can reliably reproduce it by delaying the scrollframe reconstruction to run in a rAF callback.

I also tried implementing the changes suggested in Botond's review comment at https://phabricator.services.mozilla.com/D88744#inline-510374 but that didn't help here. I'm going to keep digging to figure out where the visual scroll offset update is coming from and why it doesn't re-appear on the reconstructed scrollframe.

The visual scroll offset update I referred to in my previous comment is not actually an update; the updatetype is eNone. It's just that the APZ is in the isDefault state and so it applies the visual scroll offset in the metrics.

The fundamental problem here though is that when a scrollframe reconstruction happens, it can get reconstructed to a nonzero scroll position without going going through RestoreState or running any meaningful code in ScrollToRestoredPosition. Instead it seems like the scrolled frame's position is already at the nonzero scroll position. In this situation right now the only ScrollPositionUpdate we send to APZ is the one from the ScrollFrameHelper's constructor, which bakes in a zero scroll position. If we instead populate that with the correct scroll position that should solve this problem. Unfortunately I don't know if that's doable right in the constructor as the mScrolledFrame is only populated later. I'll see if I can figure out a good way to accomplish this.

Least bad way I could think to do this was to replace the origin-None ScrollPositionUpdate from the constructor with a better one later that has the actual scroll position. Try push to see if this breaks anything: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=39129467b6125ca5a1bd29766aca409f09a8db10

I'd like to understand this scroll that comes from nowhere, if not for this bug, then for my understanding for the future. When does the scroll position get to a non-zero value and how? Immediately when all of the frames involved have been constructed but not reflowed yet? Or (as the try push suggests) after the first reflow? Can you override nsIFrame::Init in nsHTML/XULScrollFrame and do it there?

I can look in more detail tomorrow but I verified that the scroll position is still zero after the call to ReloadChildFrames which populates mScrolledFrame. At some point between that and the end of the reflow, the scroll position changes, I assume because something in the reflow code sets a position on the scrolled frame. I can use rr tomorrow to find out exactly where.

So maybe this is some confusion between physical and logical coordinates, I'm not really sure. Here's the backtrace at which point the scrollframe's scroll position "changes", because the mScrolledFrame's rect is modified.

At the point of this call the childRect has a 0,0 top-left because both mHelper.mScrollPort.TopLeft() and aScrollPosition are 0. However inside ClampAndSetBounds we take the non-LTR codepath which sets the x-coord of the rect to a negative number, and that produces a positive scroll position.

This might also explain why it's only affecting a XUL test, since the relevant code seems specific to XULScrollFrame. I haven't checked if there's an equivalent codepath in HTMLScrollFrame.

I poked around some more but I don't understand the LTR/RTL semantics of things inside nsGfxScrollFrame (as in, which quantities are supposed to be in logical vs physical coordinates) well enough to tell if there's an underlying bug here. But given that this is happening, and my fix corrects the problem, I'll put it up for review. If we find an underlying RTL bug then we can always not land it or back it out.

In some cases (in this scenario, with a RTL XUL scrollframe), the scrollframe
can have a nonzero scroll position upon construction or reconstruction, without
having executed the scrolling code in ScrollFrameHelper::RestoreState or
ScrollFrameHelper::ScrollToRestoredPosition. In such cases APZ is not properly
notified of the scroll position, and it can get into a state where it is out of
sync with the main thread scroll position (until a subsequent scroll).

To correct this, we replace the ScrollPositionUpdate that is created during
scrollframe construction with a new ScrollPositionUpdate that holds the correct
scroll position, so that APZ is properly notified of the scroll position.

Thanks. It sort of seems like our assumption that scroll frames start at 0,0 is wrong. Would it make sense to use the frame bit NS_FRAME_FIRST_REFLOW to detect this?

I can certainly add an assertion that the bit is set. I think we still want to check mScrollUpdates has the expected size and contains a ScrollOrigin::None, to ensure that we're only doing the replacement in the exact condition we care about. This is mostly for more robust handling in release builds where the assertion failure won't crash - we still want to have the most sane behaviour that we can.

Hm, actually that bit doesn't even seem to be set in this case.

For posterity, we discussed this further in #apz. The bit isn't set because ReflowFinished() runs after the bit is cleared. I added a new flag in ScrollFrameHelper to catch the first ReflowFinished() and made the code conditional on that.

New try push: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=b46f6d8360933a40e9422484055fd64f15ee23d8

Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bf270919afa6
Replace the ScrollPositionUpdate from scrollframe construction with one that has the scroll position. r=tnikkel
Regressions: 1666060
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 82 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: