Closed
Bug 1208780
Opened 9 years ago
Closed 9 years ago
More Linux e10s test failures with APZ enabled (handoff chain for nested scrollable frames doesn't get layerized properly)
Categories
(Core :: Panning and Zooming, defect)
Tracking
()
RESOLVED
FIXED
mozilla45
People
(Reporter: kats, Assigned: tnikkel)
References
Details
Attachments
(5 files, 3 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
text/html
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
botond
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
botond
:
review+
|
Details | Diff | Splinter Review |
Bug 1205087 landed on inbound yesterday and caused a bunch of tests to fail if APZ is enabled. In [1] at the very least the M-4 failures (and most probably the reftest failures as well) are caused by the patches in bug 1205087; I verified the mochitest failure with a local bisection. I did another try push on tip of inbound [2] with those patches backed out and APZ enabled to see if that resolves all the failures.
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=344c1090f7de
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=2ff080c48cb3
Assignee | ||
Comment 1•9 years ago
|
||
We sometimes change the AGR via RecomputeCurrentAnimatedGeometryRoot, so if a display item was init'd with an AGR, and then the AGR changed via RecomputeCurrentAnimatedGeometryRoot that might cause problems.
Reporter | ||
Comment 2•9 years ago
|
||
As far as I can tell the current call sites to RecomputeCurrentAnimatedGeometryRoot aren't the problem. I haven't fully confirmed this but I think the problem is that at [1] we set mShouldBuildScrollableLayer to true (and hence make the frame an animated geometry root) after we build the items for the children. This means all the children end up with a stale AGR and it never gets updated. Without matt's patches the AGR is computed more "dynamically" and so this isn't a problem. Arguably turning a scrollframe into an AGR after building its descendant display items isn't really that great but that's what we do here.
[1] http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsGfxScrollFrame.cpp?rev=8270ee9be504#3120
Reporter | ||
Comment 3•9 years ago
|
||
This seems to do the job for the mochitest for me locally. Apparently just recomputing the mAnimatedGeometryRoot recursively on all the display items in |scrolledContent| is not sufficient, because (a) the display items that get built are actually different and (b) properties on the display items such as the layerBounds and visibleRegion are also different.
Reporter | ||
Comment 4•9 years ago
|
||
Btw here is a page that you can load in the browser (with APZ enabled) that replicates the failure seen in the test. It is a regression that affects real-world cases so unless we can get a good fix today I'm inclined to back out matt's patches since he's on PTO until next week.
Reporter | ||
Updated•9 years ago
|
Summary: More Linux e10s test failures with APZ enabled → More Linux e10s test failures with APZ enabled (handoff chain for nested scrollable frames doesn't get layerized properly)
Reporter | ||
Comment 5•9 years ago
|
||
Try push with the AGR cache backed out based on latest m-c:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b0b4f3859cac
Reporter | ||
Comment 6•9 years ago
|
||
Try push is quite green (except for W-e10s which is hidden on m-c). I think I'll back out Matt's patches once the tree reopens, turn on apz on Linux (once bug 1208622 is landed), and work to get Matt's patches relanded with appropriate adjustments so this bug doesn't happen.
My justification for the backout is that even as-is the patches cause a legitimate correctness issue (comment 4) on OSX and Windows; it's just not caught by the tests because we don't have APZ tests running on those platforms. If I could fix it in-place that would be ideal but I think it will take a couple of days to get to that point and I don't want to leave the regression in that long. I also don't want to wait too long before doing the backout because that increases the risk of other stuff creeping in between.
I'll reopen bug 1205087 with the backout, and use this bug to land any patches that address the stale AGR problem. My current plan to tackle it is to somehow walk up the scroll handoff chain around [1] and set a flag on the scrollframes there that force them to layerize. That way we will know which scrollframes are getting layerized before we do any displaylist building, and won't have to change the decision after building descendant display items.
[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/apz/util/APZCCallbackHelper.cpp?rev=25e572cce005#653
Reporter | ||
Comment 7•9 years ago
|
||
I backed out the AGR cache bits of bug 1205087 in this cset:
https://hg.mozilla.org/integration/mozilla-inbound/rev/15ed42bf5d4c
This bug no longer blocks turning on APZ on Linux, but we still need to fix it to reland that work from bug 1205087.
No longer blocks: apz-linux
Reporter | ||
Comment 8•9 years ago
|
||
Reporter | ||
Comment 9•9 years ago
|
||
Based on the reftest failures I suspect we do need to do something about the "first encountered scrollframe" case as well :(
Assignee | ||
Comment 10•9 years ago
|
||
After looking at the code again I think that the "first encountered scrollframe" code shouldn't activate an ancestor scroll frame (because it would have had a display port created for it by the same code). I made a try push with that code disable (otherwise the same as your push) to see if that is correct.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1bc16d6ce608
Reporter | ||
Comment 11•9 years ago
|
||
The green reftests (opt builds) on your push seem to indicate that the reftest failures are related to the "first encountered scrollframe" scenario, if I'm interpreting this correctly. I tried reproducing the failures locally and I can intermittently reproduce them, but haven't been able to capture it with rr. I'll try debugging it the old-fashioned way.
Reporter | ||
Comment 12•9 years ago
|
||
This is for deferred-anchor2.xhtml; in the case where the test fails it seems to be because the AGR for the SubDocument display item is the root-level Canvas frame rather than the subdocument's Viewport frame. Same goes for the LayerEventRegions item immediately inside the SubDocument displayitem. This causes that eventregions item to end up on a different layer than when the test passes.
Reporter | ||
Comment 13•9 years ago
|
||
So the SubDocument display item's bad AGR can be fixed by calling mAnimatedGeometryRootCache.Clear() at the top of RecomputeCurrentAnimatedGeometryRoot(). That way when the displayport is added to the root scroll frame, the AGR cache is cleared and later when the SubDocument display item is created it ends up with the right AGR. However the event regions for the subdocument is always created before calling BuildDisplayList on the children, so it ends up with a bad AGR and updating that one is a bit more tricky.
Reporter | ||
Comment 14•9 years ago
|
||
To get the right AGR on the event regions item I had to not only recompute them after building the children's display items but also get rid of the mCurrentAnimatedGeometryRoot field on the builder, because that is saved/restored via AutoBuildingDisplayList and gets restored to a stale value.
Try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=d887e0766d33
Even if it's green though I think this approach introduces huge footguns and I don't want to land it.
Reporter | ||
Comment 15•9 years ago
|
||
I discovered a problem with the patch in the above try push (mPrevAnimatedGeometryRoot is left uninitialized). Here's an updated try push which removes less code: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9ff4510bb807
I think this one will still trigger assertion failures on the debug build but hopefully the opt builds will give us some useful results.
Updated•9 years ago
|
tracking-e10s:
--- → +
Updated•9 years ago
|
Blocks: apz-desktop
Reporter | ||
Comment 16•9 years ago
|
||
I also started working on a patchset that splits out this code into a separate pass like we sort of discussed on IRC. I got it to the point where test_layerization.html was passing but there's probably other bugs. The patches are the top two at https://github.com/staktrace/gecko-dev/commits/0ca1cd06ec9a99e1cf214cdee874f2f41dbb33e4.
I would like to move all the GetOrMaybeCreateDisplayPort calls into the new pass but it's trickier than I expected because doing it right requires knowing things like the dirty rect (which the call in nsSubDocumentFrame needs) or things like "aBuilder->GetIgnoreScrollFrame() == mOuter || IsIgnoringViewportClipping()". To do that I might need to pull those variables into the new pass as well, and anything it depends on, and so on.
Reporter | ||
Comment 17•9 years ago
|
||
I discussed this with tn on Friday and we figured the best approach was probably the one I tried first (comment 14). I'll pursue that a bit more and see if I can figure out the remaining failures.
Reporter | ||
Comment 18•9 years ago
|
||
Reporter | ||
Comment 19•9 years ago
|
||
^ That push was pretty green except for a few reftest failures which might be addressed by tn's patches on bug 1210578. Try push to see if that works:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=03210bccd8f2
Reporter | ||
Comment 20•9 years ago
|
||
(Anybody should feel free to steal my patches from the try push above and do whatever needs to be done to get bug 1205087 relanded)
Comment 21•9 years ago
|
||
Thanks for looking into this Kats :)
Looks like that fixed the reftests!
Comment 22•9 years ago
|
||
Attachment #8670601 -
Flags: review?(tnikkel)
Updated•9 years ago
|
Assignee: nobody → bugmail.mozilla
Comment 23•9 years ago
|
||
Attachment #8670602 -
Flags: review?(tnikkel)
Assignee | ||
Comment 24•9 years ago
|
||
This patch replaces kats' patch which added a "force layerization" field to scroll frames. Instead we just set a 0 margin display port on ancestor scroll frames. Instead of adding more confusing state we just use the existing state, and everything should work how how we want.
There were two callsites that I don't _think_ need to call SetZeroMarginDisplayPortOnAsyncScrollableAncestors because (correct me if I'm wrong) they never create a new displayport, only update an existing one: APZCCallbackHelper::UpdateRootFrame and APZCCallbackHelper::UpdateSubFrame.
It would be nice if there was one API that combined SetDisplayPortMargins/CalculateAndSetDisplayPortMargins and SetZeroMarginDisplayPortOnAsyncScrollableAncestors but a solution didn't immediately come to mind.
Attachment #8670601 -
Attachment is obsolete: true
Attachment #8670601 -
Flags: review?(tnikkel)
Attachment #8672965 -
Flags: review?(botond)
Comment 25•9 years ago
|
||
Comment on attachment 8672965 [details] [diff] [review]
Set a zero-margin display port on all async scrollable ancestors when we set a display port.
Review of attachment 8672965 [details] [diff] [review]:
-----------------------------------------------------------------
(In reply to Timothy Nikkel (:tn) from comment #24)
> There were two callsites that I don't _think_ need to call
> SetZeroMarginDisplayPortOnAsyncScrollableAncestors because (correct me if
> I'm wrong) they never create a new displayport, only update an existing one:
> APZCCallbackHelper::UpdateRootFrame and APZCCallbackHelper::UpdateSubFrame.
UpdateSubFrame can create a new displayport if the layer was a scroll info layer; do we still use those in some edge cases?
::: layout/base/nsLayoutUtils.cpp
@@ +3107,5 @@
> + if (nsLayoutUtils::AsyncPanZoomEnabled(frame) &&
> + !nsLayoutUtils::GetDisplayPort(frame->GetContent())) {
> + nsLayoutUtils::SetDisplayPortMargins(
> + frame->GetContent(), frame->PresContext()->PresShell(), ScreenMargin(), 0,
> + aRepaintMode);
I assume the multiple calls to SchedulePaint() that could happen if this function is called with aRepaintMode == Repaint are not a problem?
Assignee | ||
Comment 26•9 years ago
|
||
(In reply to Botond Ballo [:botond] from comment #25)
> UpdateSubFrame can create a new displayport if the layer was a scroll info
> layer; do we still use those in some edge cases?
I think so. I'll update the patch for UpdateSubFrame, and assert in UpdateRootFrame.
> I assume the multiple calls to SchedulePaint() that could happen if this
> function is called with aRepaintMode == Repaint are not a problem?
Not a problem in that it will only schedule one more paint no matter how many times it's called. The GetDisplayRootFrame call in SchedulePaint could theoretically lead to quadratic behavior (probably not a problem in practice), but I can change the patch to just do one SchedulePaint call.
Comment 27•9 years ago
|
||
Comment on attachment 8672965 [details] [diff] [review]
Set a zero-margin display port on all async scrollable ancestors when we set a display port.
Review of attachment 8672965 [details] [diff] [review]:
-----------------------------------------------------------------
Sounds good, r+ with the change to UpdateSubFrame. I'll leave the SchedulePaint quadratic performance issue up to your judgment.
Attachment #8672965 -
Flags: review?(botond) → review+
Assignee | ||
Comment 28•9 years ago
|
||
(In reply to Timothy Nikkel (:tn) from comment #26)
> > I assume the multiple calls to SchedulePaint() that could happen if this
> > function is called with aRepaintMode == Repaint are not a problem?
>
> Not a problem in that it will only schedule one more paint no matter how
> many times it's called. The GetDisplayRootFrame call in SchedulePaint could
> theoretically lead to quadratic behavior (probably not a problem in
> practice), but I can change the patch to just do one SchedulePaint call.
Actually GetDisplayRootFrame is fast: it just walks the root frames of documents (and not every frame up the frame tree), unless there is a frame state bit indicating we are inside of a xul popup (which generally contain content we control, not random content from in the wild). Since the number of nested scrollable scroll frames would have to be pretty large before we even notice this, and then the number of layers we create would be huge, and we have to deal with those every paint, not just the first time we set a displayport. I think it's best to not worry about this.
Assignee | ||
Comment 29•9 years ago
|
||
Just the changes I made to my patch.
Attachment #8674629 -
Flags: review?(botond)
Updated•9 years ago
|
Attachment #8674629 -
Flags: review?(botond) → review+
Assignee | ||
Comment 30•9 years ago
|
||
Comment on attachment 8670602 [details] [diff] [review]
Ensure that when we turn something into an AGR we update things properly
>@@ -2480,18 +2487,18 @@ nsIFrame::BuildDisplayListForChild(nsDis
>- MOZ_ASSERT(buildingForChild.GetPrevAnimatedGeometryRoot() ==
>- aBuilder->FindAnimatedGeometryRootFor(child->GetParent()));
>+ //MOZ_ASSERT(buildingForChild.GetPrevAnimatedGeometryRoot() ==
>+ // aBuilder->FindAnimatedGeometryRootFor(child->GetParent()));
Note sure why kats commented this assert out. I pushed updated patches to try with this assert active (ie not commented out) to see if the assert is firing on try to try to figure it out.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=88b66d81ce70
Assignee | ||
Comment 31•9 years ago
|
||
Had to change the assert that the return from GetNearestScrollable has WantAsyncScroll because we specifically allow it to return the root scroll frame whether or not it has WantAsyncScroll.
New try push
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4209636a01d7
Assignee | ||
Comment 32•9 years ago
|
||
GetNearestScrollable with always match root works in a wierd way, so I wrote patches to fix that and pushed again
https://treeherder.mozilla.org/#/jobs?repo=try&revision=001ff1ed298c
Assignee | ||
Comment 33•9 years ago
|
||
(In reply to (away until Oct23) Kartikaya Gupta (email:kats@mozilla.com) from comment #13)
> So the SubDocument display item's bad AGR can be fixed by calling
> mAnimatedGeometryRootCache.Clear() at the top of
> RecomputeCurrentAnimatedGeometryRoot(). That way when the displayport is
> added to the root scroll frame, the AGR cache is cleared and later when the
> SubDocument display item is created it ends up with the right AGR. However
> the event regions for the subdocument is always created before calling
> BuildDisplayList on the children, so it ends up with a bad AGR and updating
> that one is a bit more tricky.
With my patches for bug 1210578 we should never turn something into an AGR after we've already created display items for it. That includes layer event regions and subdocument items. There are two types of frames that can be turned into an AGR by the creation of a displayport: the scrolled frame of the scroll frame with a displayport, and the root frame of a document who's root scroll frame has a displayport. In both cases the patches of bug 1210578 make it so that the displayport is already created (if one is going to get created at all) before we build any display items for the frame. For the scrolled frame case we just have to make sure the displayport is created in ScrollFrameHelper::BuildDisplayList before we build the display list for the scrolled frame. For the root frame case we need to make sure the displayport is created in nsLayoutUtils::PaintFrame and nsSubdocumentFrame::BuildDisplayList before we build any items. Bug 1210578 does this by calling DecideScrollableLayer in those places.
The mAnimatedGeometryRootCache might still be needed to be cleared, I'm not sure I can prove we don't at least ask for the AGR of some frame deeper in the frame tree (but it's easy to prove that we don't build display items for them).
Assignee | ||
Comment 34•9 years ago
|
||
Comment on attachment 8670601 [details] [diff] [review]
Force layerization on handoff chain of nested scrollables.
>+ while (true) {
>+ nsIFrame* frame = do_QueryFrame(scrollAncestor);
>+ nsIFrame* parent = nsLayoutUtils::GetCrossDocParentFrame(frame);
>+ parent = nsLayoutUtils::GetCrossDocParentFrame(parent);
I wondered why this patch got the parent twice in a row, seems it was not a mistake. It was to get around the problem that bite me with GetNearestScrollableFrame. I filed bug 1215977 to fix this.
Assignee | ||
Comment 35•9 years ago
|
||
(In reply to Timothy Nikkel (:tn) from comment #32)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=001ff1ed298c
I only included the mAnimatedGeometryRootCache.Clear() change in nsDisplayListBuilder::RecomputeCurrentAnimatedGeometryRoot() for this push because I think that is all that is needed.
The try run seems green, so I'm going to take that as validation that I haven't missed anything.
Assignee | ||
Comment 36•9 years ago
|
||
This is kats' patch, with my review comments address (ie removed the hunk that I think at unecessary).
Attachment #8675926 -
Flags: review+
Assignee | ||
Updated•9 years ago
|
Attachment #8670602 -
Attachment is obsolete: true
Attachment #8670602 -
Flags: review?(tnikkel)
Assignee | ||
Updated•9 years ago
|
Assignee: bugmail.mozilla → tnikkel
Assignee | ||
Comment 37•9 years ago
|
||
Comment on attachment 8675926 [details] [diff] [review]
Ensure that when we turn something into an AGR we update things properly v2
Moved this patch to bug 1220020.
Attachment #8675926 -
Attachment is obsolete: true
Assignee | ||
Comment 38•9 years ago
|
||
Comment 39•9 years ago
|
||
Comment 40•9 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox45:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
You need to log in
before you can comment on or make changes to this bug.
Description
•