Closed Bug 1443027 Opened 6 years ago Closed 6 years ago

Merging gets incorrect results when the ordering changes without an invalidation

Categories

(Core :: Web Painting, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 --- unaffected
firefox59 --- unaffected
firefox60 blocking disabled
firefox61 + fixed

People

(Reporter: mattwoodrow, Assigned: mattwoodrow)

References

(Blocks 1 open bug)

Details

(Keywords: regression)

Attachments

(2 files)

I've attached Markus' testcase, which shows the incorrect merging.

This is partially a regression from bug 1439809, which changes the testcase required to hit this bug, but doesn't actually cause the underlying issue.

I'll back out bug 1439809, and then fix the problem here.
Priority: -- → P1
Markus created another testcase which breaks even with the above changes to the merge algorithm.

I think I understand the fundamental issue here now, so will try explain it here:

In the first testcase we start with the display list A, B, D. We then make C visible, build the display list for C (and things that intersect C): C, D.

Merging can match D in both lists, and know that it must be last, but the relative ordering of A,B and C is undefined (which is expected, since there's no intersections).

The merging picks C to go first (changing the algorithm to pick A,B doesn't help, since we can create a similar testcase where the reverse ordering breaks things) and we end up with C, A, B, D. This is valid, and correct, since the ordering of all items that intersect is correct.

The next paint removes D, and builds a display list for the intersections of D: B, C.

The merge algorithm know knows B must come before C, and we should use the old orderings for everything else. In the old list A is after C, and before B. If we move B to come before C, we can no longer have both of those be true for A.

We don't have enough information to know how to resolve this, so we're effectively resolving it arbitrarily. Unsurprisingly, there are cases where we get it wrong, and it affects rendering.

This seems unfixable, without getting more information to make the right decisions.

The important information that we lost is that A,B must be ordered, but [A,B], C does not.

We computed the display list C,A,B,D, but this might have been better represented as a graph:

A
|
B   C
 \ /
  D

From there, the knowledge that D has been removed, and B must be before C can be sanely resolved to A,B,C.
I think if we create a group around a set of consecutive items from the new list (possibly using a subclass of nsDisplayWrapList), then we can represent strong orderings more easily.

For the above example, the initial display list merge would then result in C, [A,B], D.

If we treat the group as atomic in terms of ordering, but looking for matches would match the group if any contained item matched, then I think this solves our issue.

We'd still have to remove an invalidated items from the group, as well as entries from the group that match items later in the new list.

For this example:

Merging C, [A,B], D with B,C (and D invalidated), B would match the [A,B] group, skip C in the old list (since it's in the new list), add the group [A,B], and then C.
You can always reconstruct the overlap graph of the stored list by looking at the items' bounds, can't you? And merging a retained complete overlap graph with a partial update overlap graph should be doable.

Not sure how to translate this into an (efficient) algorithm, though...
(In reply to Markus Stange [:mstange] from comment #5)
> You can always reconstruct the overlap graph of the stored list by looking
> at the items' bounds, can't you? And merging a retained complete overlap
> graph with a partial update overlap graph should be doable.
> 
> Not sure how to translate this into an (efficient) algorithm, though...

Indeed, though reconstructing the information seems worse (in principle) than retaining it in the first place.

I also haven't yet figured out what the algorithm would be to use this information, but I do agree that it should be possible.
I don't think the grouping I proposed earlier is sufficient, seems like there's case where it still gets confused.

New idea:

Any time we don't know the canonical ordering between two sets of display items, insert a marker item that has pointers to the start/end of the second set (GetAbove() returns the start of the first set, and the start of the second is the end of the first). We can then choose to iterate through the resulting list in the order that we picked, or the reverse.

When merging into a list with these markers, and a matched item exists in one of the two sets, then decide to iterate such that that set is first.

If we later get a match into the other half of the set, then we now know the right ordering between the two sets, so we can omit the markers in the merged list.

So basically anytime we potentially diverge from the canonical ordering, we record where, and what the choices were. If we ever get more information in the future to make a better decision, then we do so then.

Figuring out how to implement that in a sane way will be interesting. I think we can compute indices for items when we build the hashtable, so when we're walking through the old list looking for an item and reach a marker, we can check the indices of the item and the extents of the marker to determine if it's contained, and in which branch. Will need lots of help lazily processing items into the merged list, since we need to wait to find out if we need the markers.

Jeff, does your planned area based merging have similar problems? I'll think more about how we can check for intersections between items to resolve conflicts too.
I have an implementation of bogo_merge here: https://github.com/jrmuizel/play2d

It uses the previous merge algorithm and then permutes the result until it matches the ordering constraints from the old list and the new list. This is obviously very inefficient but it does seem to pass all of the test cases we've come up with so far. I'm going to make the tests more exhaustive and hopefully implement a saner version tomorrow.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #8)
> I have an implementation of bogo_merge here:
> https://github.com/jrmuizel/play2d
> 
> It uses the previous merge algorithm and then permutes the result until it
> matches the ordering constraints from the old list and the new list. This is
> obviously very inefficient but it does seem to pass all of the test cases
> we've come up with so far. I'm going to make the tests more exhaustive and
> hopefully implement a saner version tomorrow.

That's fun! I like it :)

I had a think about this too. 

My current conclusion is that when we're walking through the old list looking for the matched item and copying items into the merged list, we only want to copy items that are not invalidated, do not have a later entry in the new list and geometrically need to be before the matched item. Any items that don't intersect with the matched item can stay in the old list, and we'll process them later on.

Unfortunately that means we need to build a full DAG for all items between the bottom of the old list, and the matched item. It seems like that might be prohibitively expensive, since it won't be uncommon for the old list to be big, and the new list to contain only an item near the very top. Computing an edge list for the entire display list seems painful.

I guess we can take some shortcuts, like we only need a single path to the matched item to confirm a dependency, so we don't need to look for other edges once we find the first. Still unsure if that's going to be acceptable.
Gankro and I have a not quite working version of a merge algorithm in https://github.com/jrmuizel/play2d. It uses a defer list and is everything is reasonably cheap to compute. However, there are still some broken test cases I'm working through. (It does pass all of the 4 rectangle ABCD test cases) It fails on a 5 rectangle case.
Alexis found a test case that seems to require O(n^2) intersection checks: https://usercontent.irccloud-cdn.com/file/ukpp2MQh/IMG_0009.PNG
Trying to convert comment 7 into a pseudo-code algorithm:

For-each item i in the new list:
     If the item is not modified and has a matching item in the old list:
         If there is a current marker, insert a middle marker
         Remove items from the start of the old list up until we reach the matching item:
             If the old-list item is a start marker, and the matched item occurs before the end marker, but after the middle marker, then swap the ordering of the halves such that it is now in the first half. Record the marker in a stack of currently processing markers, and mark that we're processing the first half of it.
             If the old-list item is a middle marker, note that we're now processing the second half of it on the currently processing markers stack.
             If the old-list item is an end marker, and the currently processing markers stack entry shows that we recorded a match for both halves, then backtracking into the merged list and strip out the begin/middle markers. Otherwise continue with letting the end marker be added to the merged list. Pop the top entry from the currently processing markers stack.
             If the old-list item is modified, then destroy it.
             If the old-list item has a match in the new list, merge child lists into the new version and then destroy the old.
               This item will be added when we reach the new item in the outer loop.
             Otherwise, add the old-list item to the merged list.
         Record that we found a match for an item in the current half of all entries in the currently processing markers stack.
         If there is a current marker, insert an end marker, and clear the current marker.
         Add i into the merged list, merging child lists if we found a match in the old list.
     Else
         If there is no current marker, then insert a start marker into the old list and set the current marker
         Add i into the merged list, merging child lists if we found a match in the old list.
If there is a current marker, insert a middle marker
Add all remaining valid items from the old list into the merged list, skipping over (and destroying) any that are modified
If there is a current marker, insert an end marker, and clear the current marker.


A worked example from the second testcase in the patches:

Initial display list is A,B,D,  new display list is C,D and invalidations is C.

The merged display list would be: Start,C,Middle,A,B,End,D

On the next paint the new list is B,C and invalidations D.

We'd match B, find the Start marker and figure out that B exists within the marker, but after the Middle. We'd then swap the ordering so that it's in the first half, giving us:

Start,A,B,Middle,C,End,D.

We insert Start,A,B (recording that we found a match in the first half of the marker), then Middle, C (recording that we found a match in the second half of the marker), find the End, strip all the markers since we matched both halves and now have a defined ordering between them, and finally drop D since it's invalid.

Result: A,B,C.
IIUC, Fx60 was already fixed via backout here. Matt, feel free to correct that if I'm misunderstanding.
Some variations of this still exist, even with the backout.

I really want to get this fixed and uplifted to 60, if it all possible.
Comment on attachment 8955960 [details]
Bug 1443027 - Fix the merging algorithm to pass the new tests correctly.

https://reviewboard.mozilla.org/r/224908/#review235960


Code analysis found 3 defects in this patch:
 - 3 defects found by clang-tidy

You can run this analysis locally with:
 - `./mach static-analysis check path/to/file.cpp` (C/C++)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: layout/painting/RetainedDisplayListBuilder.cpp:422
(Diff revision 2)
> -        nsDisplayList empty;
> -        Maybe<const ActiveScrolledRoot*> containerASRForChildren;
> +  DirectedAcyclicGraph<MergedListUnits> mMergedDAG;
> +  nsDataHashtable<DisplayItemHashEntry, size_t> mMergedKeyLookup;
> +  bool mResultIsModified;
> +};
>  
> -        if (MergeDisplayLists(&empty, old->GetChildren(),
> +RetainedDisplayList::RetainedDisplayList(nsDisplayList* aList)

Error: Bad implicit conversion constructor for 'retaineddisplaylist' [clang-tidy: mozilla-implicit-constructor]

::: layout/painting/nsDisplayList.h:3353
(Diff revision 2)
> +  {
> +    AppendToTop(&aOther);
> +    mDAG = mozilla::Move(aOther.mDAG);
> +    mKeyLookup.SwapElements(aOther.mKeyLookup);
> +  }
> +  RetainedDisplayList(nsDisplayList* aList);

Error: Bad implicit conversion constructor for 'retaineddisplaylist' [clang-tidy: mozilla-implicit-constructor]

::: layout/painting/nsDisplayList.h:3353
(Diff revision 2)
> +  {
> +    AppendToTop(&aOther);
> +    mDAG = mozilla::Move(aOther.mDAG);
> +    mKeyLookup.SwapElements(aOther.mKeyLookup);
> +  }
> +  RetainedDisplayList(nsDisplayList* aList);

Error: Bad implicit conversion constructor for 'retaineddisplaylist' [clang-tidy: mozilla-implicit-constructor]
Comment on attachment 8955960 [details]
Bug 1443027 - Fix the merging algorithm to pass the new tests correctly.

https://reviewboard.mozilla.org/r/224908/#review235958

Looks great!

::: layout/painting/RetainedDisplayListBuilder.cpp:114
(Diff revision 2)
> -  bool modified = false;
> +  static const uint32_t kMaxEdgeRatio = 5;
> +  bool initializeDAG = !aList->mDAG.Length();
> +  if (!initializeDAG &&
> +      aList->mDAG.mDirectPredecessorList.Length() >
> +      (aList->mDAG.mNodesInfo.Length() * kMaxEdgeRatio)) {
> +    return false;

Very pragmatic! I like it. Maybe add a comment:

The DAG merging algorithm does not have strong mechanisms in place to keep the complexity of the resulting DAG under control. In some cases we can build up edges very quickly. Detect those cases and force a full display list build if we hit them.

::: layout/painting/RetainedDisplayListBuilder.cpp
(Diff revision 2)
> -static bool
> -MergeFrameRects(nsDisplayLayerEventRegions* aOldItem,
> -                nsDisplayLayerEventRegions* aNewItem,
> -                nsDisplayLayerEventRegions::FrameRects nsDisplayLayerEventRegions::*aRectList,
> -                nsTArray<nsIFrame*>& aAddedFrames)

Is this removal something we could have done before, and now is just a convenient time to do it?

::: layout/painting/RetainedDisplayListBuilder.cpp:281
(Diff revision 2)
> +    }
> +    mResultIsModified = true;
> +    return AddNewNode(aNewItem, Nothing(), Span<MergedListIndex>(), aPreviousItem);
> +  }
> +
> +  bool HasSamePredecessors(OldListIndex aNode,

Maybe rename to HasSameSingleDirectPredecessor? Since you're returning false if there's more than one, even if they could still be the same.

::: layout/painting/RetainedDisplayListBuilder.cpp:371
(Diff revision 2)
> -            }
> +      }
> -            UpdateASR(newItem, containerASRForChildren);
> -            newItem->UpdateBounds(&mBuilder);
> +      if (item->GetType() == DisplayItemType::TYPE_SUBDOCUMENT) {
> +        mBuilder->IncrementSubDocPresShellPaintCount(item);
> +      }
> +      item->SetReused(true);
> +      mOldItems[aNode.val].AddedToMergedList(AddNewNode(item, Some(aNode), aDirectPredecessors, Nothing()));

This line could use a line break after AddedToMergedList(

::: layout/painting/RetainedDisplayListBuilder.cpp:378
(Diff revision 2)
> +      if (mOldItems[directPredecessor.val].IsUsed()) {
> +        continue;

There should be a comment here, I think.

::: layout/painting/RetainedDisplayListBuilder.cpp:436
(Diff revision 2)
> + * Display lists are interpreted as DAGs (with a maximum of one ancestor and one
> + * descendant per node). We add the two DAGs together, and then output the topological

Could we use DAG language here? I think "direct predecessor" and "direct successor" would be more accurate. And please clarify that this is only the case for aNewList.

This comment should probably also mention that aOldList and aOutList have stored DAGs inside them, and we make use of them during the merging.

::: layout/painting/RetainedDisplayListHelpers.h:55
(Diff revision 2)
> +
> +  DisplayItemKey mKey;
> +};
> +
> +template <typename T>
> +bool SpanContains(mozilla::Span<const T>& aSpan, T aItem) {

I think the mozilla style guide calls for a line break before this opening brace, and before almost all the other opening braces in this file.

::: layout/painting/nsDisplayList.h:3353
(Diff revision 2)
> +  {
> +    AppendToTop(&aOther);
> +    mDAG = mozilla::Move(aOther.mDAG);
> +    mKeyLookup.SwapElements(aOther.mKeyLookup);
> +  }
> +  RetainedDisplayList(nsDisplayList* aList);

When is this constructor used? And it should be marked explicit.
Attachment #8955960 - Flags: review?(mstange) → review+
Comment on attachment 8955959 [details]
Bug 1443027 - Add two new tests for merging behaviour.

https://reviewboard.mozilla.org/r/224906/#review235964

::: layout/reftests/display-list/1443027-1.html:4
(Diff revision 2)
> +<!DOCTYPE html>
> +<html lang="en" class="reftest-wait"><head>
> +<meta http-equiv="content-type" content="text/html; charset=UTF-8"><meta charset="utf-8">
> +<title>The new algorithm gets the wrong result here</title>

Should probably update the title here to be different and mention the bug number.

::: layout/reftests/display-list/1443027-1.html:70
(Diff revision 2)
> +
> +window.addEventListener("MozReftestInvalidate", step1);
> +
> +function step1() {
> +  C.style.visibility = "visible";
> +  C.style.transform = "translatez(1px)";

Could you just set this transform with a central CSS rule on all divs unconditionally? Or does it need to be in sync with the visibility property for some reason?
Attachment #8955959 - Flags: review?(mstange) → review+
Comment on attachment 8955960 [details]
Bug 1443027 - Fix the merging algorithm to pass the new tests correctly.

https://reviewboard.mozilla.org/r/224908/#review235968

::: layout/painting/RetainedDisplayListBuilder.cpp:383
(Diff revision 2)
> +      if (mOldItems[directPredecessor.val].IsUsed()) {
> +        continue;
> -          }
> +      }
>  
> -          if (destroy) {
> -            oldItem->Destroy(&mBuilder);
> +      AutoTArray<MergedListIndex, 2> thatNodesDirectPredecessors =
> +        ProcessPredecessorsOfOldNode(directPredecessor);

I'm still concerned that this recursion is going to cause stack overflows on long display lists. I think we need to make this a non-recursive loop.
(In reply to Markus Stange [:mstange] from comment #19)

> ::: layout/reftests/display-list/1443027-1.html:70
> (Diff revision 2)
> > +
> > +window.addEventListener("MozReftestInvalidate", step1);
> > +
> > +function step1() {
> > +  C.style.visibility = "visible";
> > +  C.style.transform = "translatez(1px)";
> 
> Could you just set this transform with a central CSS rule on all divs
> unconditionally? Or does it need to be in sync with the visibility property
> for some reason?

It does unfortunately! We create an empty transform display item even when visibility is hidden, which then makes the merging easy.

<canvas> doesn't work either, since we toggle the layer activity state on a timer, and we get random failures.
Fixed review comments, including switching ProcessPredecessorsOfOldNode to use a loop.

There were three failing reftests with this, all Windows only.

* layout/reftests/position-dynamic-changes/relative/move-right-bottom.html
* layout/reftests/position-dynamic-changes/relative/move-top-left.html

In these two, the text 'surrounding' is supposed to be behind the animated 'relative position' text. This would normally mean it gets drawn into the background layer, and gets subpixel-AA. The animated text, and the word 'text' are in two separate layers, getting component alpha or no subpixel-AA.

Since there are no intersections, when we do a partial update, we lose the relative ordering, and we end up with a slightly different order, and 'surrounding' ends up in the same layer as 'text'.

I don't think there's much we can do about this, and if anything it looks better in this case, since now all the text looks the same.


* layout/reftests/w3c-css/submitted/ruby/ruby-inlinize-blocks-002.html

This looks to be the same underlying bug as 748228, where the overflow area for the text frame doesn't include a pixel that we drew to with subpixel-AA. Retained-dl thinks there are no intersections between the text and table cell border (since that's what the overflow areas say), and decides that it's ok to put the letter 'a' above the border in the resulting display list. Drawing the 'a' touches a single pixel of the border, and darkens it slightly.
I tried text-rendering: optimizeLegibility here and it didn't help, but padding the overflow area of the text frame did (and broke other tests).

I've fuzzed all of these for now.
Blocks: 748228
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/90e564873aa1
Add two new tests for merging behaviour. r=mstange
https://hg.mozilla.org/integration/autoland/rev/d9f154931d6d
Fix the merging algorithm to pass the new tests correctly. r=mstange
https://hg.mozilla.org/mozilla-central/rev/90e564873aa1
https://hg.mozilla.org/mozilla-central/rev/d9f154931d6d
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Depends on: 1448826
I just filed a crash that points to this bug just now: bug 1448829.
Depends on: 1448841
Backed out for causing topcrash bug 1448841.
https://hg.mozilla.org/mozilla-central/rev/7b9da7139d94951431a148dcaf8a388640c91b27
Status: RESOLVED → REOPENED
Flags: needinfo?(matt.woodrow)
Resolution: FIXED → ---
Target Milestone: mozilla61 → ---
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e04979dd66be
Add two new tests for merging behaviour. r=mstange
https://hg.mozilla.org/integration/autoland/rev/5deb310542a9
Fix the merging algorithm to pass the new tests correctly. r=mstange
https://hg.mozilla.org/mozilla-central/rev/e04979dd66be
https://hg.mozilla.org/mozilla-central/rev/5deb310542a9
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Backed out 2 changesets (bug 1443027) for breacking Gmail on OSX r=pascalc a=backout

Requested by pascalc bue to "broke Gmail on OSX"
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Backout by nerli@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/0405f6006f3a
Backed out 2 changesets for breacking Gmail on OSX r=pascalc a=backout
Status: REOPENED → ASSIGNED
Target Milestone: mozilla61 → ---
The HasSameSinglePredecessor optimization ended up having a bug that caused cycles, and crashes.

I've removed it, and added a crashtest (to the first patch) that demonstrates the broken case.
Flags: needinfo?(matt.woodrow)
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d2734042605a
Add two new tests for merging behaviour. r=mstange
https://hg.mozilla.org/integration/autoland/rev/1e3dc6112e76
Fix the merging algorithm to pass the new tests correctly. r=mstange
https://hg.mozilla.org/mozilla-central/rev/d2734042605a
https://hg.mozilla.org/mozilla-central/rev/1e3dc6112e76
Status: ASSIGNED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Backed out for frequent crashes on OS X:

https://hg.mozilla.org/mozilla-central/rev/a1fb8ffae378963b128deaaf3a76eff9dbb6be21

See bug 1448841 comment 22 etc. for details.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla61 → ---
The crash here is the same as the old crashes, related to overlay scrollbars. Bug 1450360 moved some code around related to that, and the merge to mozilla-central was busted and lost a critical chunk.

The broken code results in undefined behaviour without this patches, and crashes with, so backing this out made things somewhat better.

I've relanded the missing bit of code, and will re-land this now.
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/373a7a3e5c8f
Add two new tests for merging behaviour. r=mstange
https://hg.mozilla.org/integration/autoland/rev/99de9f5450d8
Fix the merging algorithm to pass the new tests correctly. r=mstange
https://hg.mozilla.org/mozilla-central/rev/373a7a3e5c8f
https://hg.mozilla.org/mozilla-central/rev/99de9f5450d8
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Depends on: 1451971
Depends on: 1452225
Depends on: 1452805
Flags: needinfo?(matt.woodrow)
Depends on: 1453185
No longer depends on: 1453185
Depends on: 1454355
Per bug 1454509, this isn't being targeted for uplift to 60 anymore.
No longer depends on: 1454355
Depends on: 1461231
Depends on: 1464288
Depends on: 1509581
Depends on: 1522455
No longer depends on: 1522455
Regressions: 1522455
Regressions: 1551061
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: