Closed Bug 965593 Opened 11 years ago Closed 11 years ago

[B2G][Wikipedia] Vertical scrolling gets stuck on field areas that include vertical and horizontal scrolling

Categories

(Core :: Panning and Zooming, defect)

28 Branch
ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla30
blocking-b2g 1.3+
Tracking Status
firefox28 --- wontfix
firefox29 --- wontfix
firefox30 --- fixed
b2g-v1.2 --- unaffected
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: nkhristoforov, Assigned: tnikkel)

References

()

Details

(Keywords: regression, Whiteboard: dogfood1.3)

Attachments

(3 files, 1 obsolete file)

Attached file Logcat (deleted) —
When the user attempts to scroll vertically on a field that has vertical and horizontal scrolling, the vertical scrolling gets stuck for a limited time. The user can still click on links or the buttons on the overlay and horizontal scrolling still works.

Repro Steps:
1) Updated Buri to BuildID: 20140127004002
2) Install the Wikipedia app through Marketplace.
3) Open the Wikipedia app.
4) Type "Arsenal F.C." in the search bar and select the suggested item "Arsenal F.C."
5) Scroll down by swiping up across the page.

Actual:
Page gets stuck and does not move vertically for about 7 seconds.

Expected:
Page would continue to scroll down.

Environmental Variables:
Device: Buri 1.3 MOZ
BuildID: 20140127004002
Gaia: 25a45a836a4a21a30f63fa7b544b42e8b781180a
Gecko: c40099a42c1f
Version: 28.0a2
Firmware Version: V1.2-device.cfg

Repro frequency: 100%
See attached: Logcat and Youtube video (http://youtu.be/fxYqAYdH58A)
This bug does not occur on 1.2. The user is able to scroll vertically without it getting stuck.

Device: Buri 1.2 COM
BuildID: 20140127004002
Gaia: 539a25e1887b902b8b25038c547048e691bd97f6
Gecko: d10e1f965d0c
Version: 26.0
RIL Version: 01.02.00.019.102
Firmware Version: v1.2-device.cfg
blocking-b2g: --- → 1.3?
Component: General → Panning and Zooming
Product: Firefox OS → Core
Version: unspecified → 28 Branch
It'd be good to have a regression window, but the "turning APZC off" doesn't really help - we want to see if this ever worked with APZC on, and it's a recent regression, or if it didn't work with APZC on from day one that option was available.
1.3+ for wikipedia
blocking-b2g: 1.3? → 1.3+
Wonder if this is related to axis locking?
Flags: needinfo?(bugmail.mozilla)
Flags: needinfo?(botond)
I don't believe it is related to axis locking. There's a subframe that contains the crest and that other information, and interaction with that is a bit wonky. Not sure why exactly, but I can look into it.
Flags: needinfo?(bugmail.mozilla)
Flags: needinfo?(botond)
I'm looking into this. Interestingly this behaviour doesn't show up in the browser on the wikipedia page for Arsenal F.C.. It also doesn't look like touch events in the content are interfering. Digging further...
Assignee: nobody → bugmail.mozilla
Status: NEW → ASSIGNED
The structure of the page is that there are three nested scrollframes - one for the entire app, one for the wikipedia page area, and one for that infobox on the arsenal FC page. The innermost scrollframe is not vertically scrollable, so all vertical scrolling there gets handed up a level to the page scrollable. APZC for the page seems to be updating its scroll offset and scheduling composites, but for some reason the composites don't seem to visually move the page. As far as I can tell the async transform being calculated is correct, but the visual position of the page only updates on repaints (i.e. every gPanRepaintInterval milliseconds).

There are two other problems that complicate this bug. One is that we don't support fling handoff (bug 965871) so if you do a fling from the innermost scrollframe it doesn't behave as you would expect. The other problem is that when the touch ends, we usually request a content repaint with the latest metrics [1], but only on the scrollframe actually receiving the events (which in this case is the innermost one). What we really want to do there is request repaints for the entire handoff chain (or whatever subset of the handoff chain was doing scrolls).

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/AsyncPanZoomController.cpp?rev=dd8a9796aa31#730
The regression window is 1-9-2014 to 1-10-2014, the 1-10 build is the first nightly buri v1.3 to have APZ enabled by default.

The first build the issue reproduces on (APZ = ON):
v1.3 Environmental Variables:
Device: Buri v1.3 MOZ
BuildID: 20140110004009
Gaia: c606b129a2d1647c0fc7bfb197555026e9b27f9e
Gecko: c5109884ae3a
Version: 28.0a2
Firmware Version: v1.2-device.cfg

The last build the issue does not reproduce on (APZ = Off):
v1.3 Environmental Variables:
Device: Buri v1.3 MOZ
BuildID: 20140109004002
Gaia: 22bc6be5b76cdc6d4e9667ff070979041a20ce2f
Gecko: 2c8f8683bd0d
Version: 28.0a2
Firmware Version: v1.2-device.cfg
QA Contact: pbylenga
Working on a new window by enabling APZ manually on earlier builds.
I don't think getting a regression window here will be very useful. I'm already chasing down the problem.
Ah, I had started one per offline discussion with botond, will hold off for now on getting one then.  Let me know if the situation changes.  Thanks!
So it looks like the ContainerLayer on which the transform gets applied has no children and an empty visible region (i.e. it's a ScrollInfo layer rather than a full blow scroll layer). This explains why even though we're setting a transform correctly on the ContainerLayer, the content doesn't visibly move.

When I pan on the part of the page outside the infobox however, the layer tree structure changes to be what I expect, where the ThebesLayer is a child of the ContainerLayer with the async transform.

I suspect this has to do with which frame is considered "actively scrolled" in layout - when we are doing scroll handoff the "actively scrolled frame" in layout is the inner one and so we don't layerize the handoff layer as we would otherwise.
Actually the above might have been because of bug 962791. I will retest now that it has been backed out.
It is unrelated to bug 962791; turns out this was happening even before bug 962791 landed.
Attached patch Flush repaints along the handoff chain (obsolete) (deleted) — Splinter Review
This seems to take care of one of the issues I described in comment 7. Visually it doesn't help the main problem which is that the async transform doesn't get applied to the right content. Still figuring that out.
Also, something I tried but that didn't work was to make nsGfxScrollFrame::IsScrollingActive always return true but that didn't seem to do the trick.
I haven't had much time to continue looking into this, but manual backporting of tn's nsLayoutDebugger patches and such I was able to get display list and layer dumps of the page on 1.3.

This is attached, but the output is all interleaved together and hard to read. It helps if you grep-filter the output for "515" which is the child PID - that gives you the displaylist dumps from the child process and you can see the layer tree under PID 139. The layer 0x46cc1800 is the one that I expect to be fully scrollable but is actually a "[not visible]" ContainerLayerComposite which has no children; it's content must be elsewhere.

The display list dumps seem to show a bajillion different ScrollLayer instances; I don't know if that's normal or not.
Assignee: bugmail.mozilla → tnikkel
Can we get ETA to fix this bug? Thank you.
Whiteboard: dogfood1.3 → dogfood1.3, [ETA: 2/14]
We have three nested scroll frames (inside only one document). The outside one doesn't scroll, but contains some browser like chrome (search box, back forward), then there is a scroll frame for the content of the wikipedia article, it lies below the pseudo-chrome. Then there is a scroll frame for the infobox inside the article. The scrollbar on the infobox scroll frame has z-index: int32max like all scrollbars on b2g, so it ends up above everything, including the pseudo chrome. Which means that when we try to flatten the scroll layer items for the scroll frame containing the article we have the scrolled article contents, the non-scrolling browser chrome, and the scrolled scrollbar (from the third (infobox) scrollframe).

Besides this problem this also seems undesirable: we don't want scrollbars for a scrollframe to be on top of content that covers that scrollframe. On the root scrll frame that still makes sense, and won't cause this problem.
Attached patch patch (deleted) — Splinter Review
Attachment #8371383 - Flags: review?(roc)
(In reply to Timothy Nikkel (:tn) from comment #19)
> Besides this problem this also seems undesirable: we don't want scrollbars
> for a scrollframe to be on top of content that covers that scrollframe.

Actually, I think on B2G, we do - they aren't scrollbars remember, but scroll indicators, and they get displayed inside the element they're backing. They only appear when the scroll position changes and it doesn't really make sense for them to be obscured. (for example, a fixed position footer shouldn't obscure the scroll indicator of the page its in)
(In reply to Chris Lord [:cwiiis] from comment #21)
> They only appear when the scroll position changes and it doesn't
> really make sense for them to be obscured.

I disagree with this - if you have a page with an iframe, and the page has other stuff on top of the iframe, then you don't want the scrollbars from the iframe to show up on top of the other stuff.

> (for example, a fixed position
> footer shouldn't obscure the scroll indicator of the page its in)

This I agree with because the fixed-position footer is part of the page that the scroll indicator belongs to. If the fixed-position footer were part of the *containing* document then I would say it should obscure the scroll indicator.
(Incidentally this is exactly the problem in bug 966538)
Comment on attachment 8369498 [details] [diff] [review]
Flush repaints along the handoff chain

We'll probably want this one too.
Attachment #8369498 - Flags: review?(botond)
Comment on attachment 8369498 [details] [diff] [review]
Flush repaints along the handoff chain

Review of attachment 8369498 [details] [diff] [review]:
-----------------------------------------------------------------

General question: do we have a mechanism for making sure that two RequestContentRepaints, one on an inner frame and one on an outer, don't actually cause two paints by Gecko?

::: gfx/layers/ipc/AsyncPanZoomController.cpp
@@ +646,5 @@
> +      // null before calling HandleOverscroll(). This is necessary because
> +      // Destroy(), which nulls out mTreeManager, could be called concurrently.
> +      APZCTreeManager* treeManagerLocal = mTreeManager;
> +      if (treeManagerLocal) {
> +        treeManagerLocal->FlushRepaintsForOverscrollHandoffChain();

This change introduces a reliance here on the overscroll handoff chain being populated whenever we are panning. This _should_ be the case, but if subtle bugs in event handling (perhaps ones we introduce in the future) break this property, we will fail worse by relying on it here.

Perhaps we can add a boolean return value to FlushRepaintsForOverscrollHandoffChain() which is false if the chain was empty, and explicitly RequestContentRepaint here in that case?
Attachment #8369498 - Flags: review?(botond) → review+
Comment on attachment 8369498 [details] [diff] [review]
Flush repaints along the handoff chain

Review of attachment 8369498 [details] [diff] [review]:
-----------------------------------------------------------------

Er, sorry, not plussing yet.
Attachment #8369498 - Flags: review+
(In reply to Botond Ballo [:botond] from comment #25)
> Perhaps we can add a boolean return value to
> FlushRepaintsForOverscrollHandoffChain() which is false if the chain was
> empty, and explicitly RequestContentRepaint here in that case?

Might want to log a warning in such a case, too.
Comment on attachment 8371383 [details] [diff] [review]
patch

Review of attachment 8371383 [details] [diff] [review]:
-----------------------------------------------------------------

Doesn't this mean that if we have an element like this:
  <div style="overflow:auto; position:relative; height:400px">
    <div style="position:absolute; top:200px; width:600px; background:yellow; height:600px"></div>
  </div>
the scrollbars will be covered by the inner DIV?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #28)
> Comment on attachment 8371383 [details] [diff] [review]
> patch
> 
> Review of attachment 8371383 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Doesn't this mean that if we have an element like this:
>   <div style="overflow:auto; position:relative; height:400px">
>     <div style="position:absolute; top:200px; width:600px;
> background:yellow; height:600px"></div>
>   </div>
> the scrollbars will be covered by the inner DIV?

No, because we append the scroll bars to the top of the positioned descendants list after we have descended into the scrollframe content.

We actually have the opposite problem in that if we remove the position: relative from the outer div (making it not the abs containing block) then the scroll bars will be drawn overtop the inner yellow div. Which seems wrong to me, but it's an existing problem.
If you give the inner div positive z-index then yes, it will obscure the scrollbars. But then there is the problem if there is z-index auto position item outside this scroll frame on the page. Then we simultaneously want the scrollbars to be above the yellow z-index div, but below the z-index auto (0) item on the page.
Comment on attachment 8371383 [details] [diff] [review]
patch

Review of attachment 8371383 [details] [diff] [review]:
-----------------------------------------------------------------

This matches what we do for Mac in nativescrollbars.css.
Attachment #8371383 - Flags: review?(roc) → review+
(In reply to Chris Lord [:cwiiis] from comment #21)
> (In reply to Timothy Nikkel (:tn) from comment #19)
> > Besides this problem this also seems undesirable: we don't want scrollbars
> > for a scrollframe to be on top of content that covers that scrollframe.
> 
> Actually, I think on B2G, we do - they aren't scrollbars remember, but
> scroll indicators, and they get displayed inside the element they're
> backing. They only appear when the scroll position changes and it doesn't
> really make sense for them to be obscured. (for example, a fixed position
> footer shouldn't obscure the scroll indicator of the page its in)

If the scroll frame is a stacking context then we put the scrollbars on the top of the stacking context and everybody is happy (this is what we do). (Root scroll frames are stacking contexts.) Non-stacking contexts don't have a clear answer however.

We want the scrollbars to be one top of the content they scroll. But if the scrolled content is completely obscured then I think we can agree that we don't want to see scrollbars. If the scrolled content is partially covered then I think this forces us to go behind the content that partially covers the scrolled content, otherwise we'd get inconsistent results. The problem is that scrolled content can interleave with content outside of the scroll frame and there is no ideal place to put the scrollbars. If we put them on top of the topmost scrolled item then they will also be above non-scrolled content from outside the scrollframe, if we put them below the non-scrolled content from outside the scrollframe then they will be below some of the scrolled content.
https://hg.mozilla.org/mozilla-central/rev/477e7d2eb1d7
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
This patch makes the behaviour good enough to consider this bug fixed. I'll move my other patch to a different bug since it is technically a separate issue.
Comment on attachment 8369498 [details] [diff] [review]
Flush repaints along the handoff chain

Moved this to bug 969378.
Attachment #8369498 - Attachment is obsolete: true
For anybody who comes looking, the above changeset is really from bug 969378 but landed with the wrong bug number.
Depends on: 970422
Depends on: 974081
Due to scrolling regression in the SMS app, bug 974081, this bug has to be backed out. There is no low risk solution to this, so asking for 1.4+ blocker instead of 1.3.
Status: RESOLVED → REOPENED
blocking-b2g: 1.3+ → 1.4?
Resolution: FIXED → ---
(In reply to Mason Chang [:mchang] from comment #41)
> Due to scrolling regression in the SMS app, bug 974081, this bug has to be
> backed out. There is no low risk solution to this, so asking for 1.4+
> blocker instead of 1.3.

This is not up for negotiation. Wikipedia is a partner app that must work on 1.3, as it's preinstalled on all devices. This is a blocker for all target markets.
blocking-b2g: 1.4? → 1.3+
(In reply to Mason Chang [:mchang] from comment #41)
> Due to scrolling regression in the SMS app, bug 974081, this bug has to be
> backed out. There is no low risk solution to this, so asking for 1.4+
> blocker instead of 1.3.

Please file a followup also. I don't think a backout is an option here.
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Bug 965593 was filed as the inability to scroll on certain parts of certain pages in the wikipedia app. This was fixed by changing the z-index of the scrollbars so that the layer tree would be different and allow scrolling to occur as expected. This also had the side-effect of fixing bug 966538 and causing bug 970422 (which was later fixed safely). As :jsmith pointed out at [1] the fix for bug 965593 did violate landing policy by introducing a UX change and not having ur+.

The patch works by changing the structure of the layer tree, but fixes the wikipedia case at the expense of breaking other (more commonly encountered) cases. As a result, we have regressions in scrolling performance, most notably bug 974081. It also prevents backing out of bug 942854 in order to fix bug 972666, although we might have an alternate solution for that. It is likely that other apps that make heavy use of z-index will also experience scrolling performance regressions. It is to fix these regressions that we are considering backing out the original fix to bug 965593, as it is not clear if there is any other straightforward way to fix these regressions.

Since the fix landed, we have made other changes to the APZ code - bug 969378 (not uplifted, but can be) and bug 968495 - which improve the user-visible symptoms of bug 965593. However the core issue still persists in that sometimes it is possible to get stuck while scrolling through wikipedia pages.

To summarize: if we back this bug out we improve scrolling performance in a number of places but end up with this bug (in a marginally less severe form) again. We are also unlikely to be able to fix it properly for either 1.3 o 1.4. Alternatively, if we keep this patch in the tree, we would have to find other ways to deal with the scrolling perf regressions, and it's not clear that's feasible in the 1.3 timeframe either.

We're not sure what the right course of action is here, so calling on the release drivers to make a decision one way or another.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=970422#c11
Flags: needinfo?(jsmith)
Flags: needinfo?(fabrice)
The major problem here is I don't see not fixing this bug for 1.3 as an option. We have business content requirements that require that core preinstalled partner apps (e.g. Wikipedia) have to work on all shipping devices, so any Gaia/Gecko issues that impact these requirements have to be fixed in order to allow us to ship the FxOS device to each target market. The scrolling performance regressions really suck, but they have been deemed as non-blockers from QC's perspective. We've also been moving in a direction in triage to block only on checkerboarding fixes that we can feasibly take in 1.3 without introducing regression risk, so not all checkerboarding fixes can necessarily be taken into 1.3 if other factors warrant the decision. However, checkerboarding fixes must be resolved in 1.4 timeframe, as that's been deemed as a CS blocker for 1.4.

Based on the the above analysis, I think our best course of action for 1.3 is:

1. Keep this patch in for 1.3
2. Go with alternative solution to solve bug 972666
3. Track the resolution of scrolling performance regressions as 1.4 blockers, as checkerboarding has been deemed a 1.4 CS blocker

Fabrice - Let me know what you think.
Flags: needinfo?(jsmith)
I'm not a driver, merely a user, but checkerboarding looks less important to me than the main scrolling performance.
This is suspected to cause bug 975221. If a backout confirms this, then we'll need to back this out on all branches.
Blocks: 975221
Looks like we've got no choice here now - this caused a CS blocking regression.

checkin-needed to back this out on all branches.
Keywords: checkin-needed
Kats,

Next steps here?
Flags: needinfo?(bugmail.mozilla)
Flags: needinfo?(bugmail.mozilla)
No longer depends on: 972675
Depends on: 976370
Whiteboard: dogfood1.3, [ETA: 2/14] → dogfood1.3, [ETA: 2/28]
tn, we can reland the patch on this bug now that bug 976370 is in, right? I'd like to let it bake on central for a day or before requesting approval for uplift to 1.3, just in case.
Yes, let's land it. Sounds like a good plan.
Note - let's test this on trunk & verify that we don't regress SMS FPS measurements before we try to uplift this to 1.3 again.

I'll ping Mason when this lands on m-c to see if he can check this.
https://hg.mozilla.org/mozilla-central/rev/9700cc023e47
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
(In reply to Jason Smith [:jsmith] from comment #55)
> Note - let's test this on trunk & verify that we don't regress SMS FPS
> measurements before we try to uplift this to 1.3 again.
> 
> I'll ping Mason when this lands on m-c to see if he can check this.

Leaving needinfo on Mason to verify this on trunk to ensure that the SMS regression doesn't appear again.
Flags: needinfo?(mchang)
Just tested on SMS with and without this patch, looks the same.
Flags: needinfo?(mchang)
Err, looks good to go. I don't see an SMS regression.
Please request b2g28 approval when ready for uplift :)
Flags: needinfo?(tnikkel)
Comment on attachment 8371383 [details] [diff] [review]
patch

NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): turning on apz for apps
User impact if declined: quite hard to scroll wikipedia articles in the wikipedia app that have an infobox (which is a lot of them)
Testing completed: it's been on 1.3 before and on central, we found some issues, fixed, mitigated, or understood them all.
Risk to taking this patch (and alternatives if risky): moderately low, the fix for bug 976370 means this should have much less impact
String or UUID changes made by this patch: none
Attachment #8371383 - Flags: approval-mozilla-b2g28?
Flags: needinfo?(tnikkel)
Attachment #8371383 - Flags: approval-mozilla-b2g28? → approval-mozilla-b2g28+

Wrong bug number was on the commit msg, so you might have come here looking for bug 969378.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: