Closed Bug 1179735 Opened 9 years ago Closed 8 years ago

[e10s] tp5o_scroll regression on windows compared to non-e10s

Categories

(Core :: Panning and Zooming, defect, P1)

46 Branch
Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
Tracking Status
e10s + ---
firefox47 --- wontfix
firefox48 --- fixed

People

(Reporter: jmaher, Assigned: kats)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [gfx-noted][diagnosis in comment 32])

Attachments

(1 file)

using data from:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=aad95360a002&exclusion_profile=false&filter-searchStr=talos

we can see that we have wins for most platforms on tp5o_scroll, but a huge regression on winxp:
linux32:
:) tp5o_scroll            8.2 +/-  0% (7#)  [  -4.4%]       7.9 +/-  1% (7#)
linux64:
:) tp5o_scroll            8.0 +/-  1% (7#)  [ -11.7%]       7.1 +/-  1% (7#)
win7:
:) tp5o_scroll            3.4 +/-  1% (7#)  [ -15.4%]       2.8 +/-  1% (7#)
winxp:
:( tp5o_scroll            2.6 +/-  1% (7#)  [+494.5%]      15.7 +/-  1% (7#)
win8:
:) tp5o_scroll            2.9 +/-  1% (7#)  [ -13.8%]       2.5 +/-  2% (7#)

osx 10.10 doesn't have any data.

you can see by the graphs that we have just a big difference, not real regression point:
http://graphs.mozilla.org/graph.html#tests=[[323,1,45],[323,1,37]]&sel=none&displayrange=90&datatype=geo
keep in mind that winxp has a few really bizarre points:
WinXP:
:( tresize               11.8 +/-  2% (7#)  [+147.0%]      29.2 +/-  3% (7#)
:( tsvgx                541.6 +/-  1% (7#)  [ +25.5%]     679.6 +/-  1% (7#)
:( tscrollx               2.6 +/- 12% (7#)  [+591.6%]      18.2 +/-  2% (7#)
:( tp5o_scroll            2.6 +/-  1% (7#)  [+494.5%]      15.7 +/-  1% (7#)

all of these are out of the normal range of the other platforms.  Maybe there is something specific to windows xp on e10s.
Mason, could it be that ASAP mode doesn't work well in e10s on XP?

15.7ms on average is closer to 16.7 than to 2.6ms it shows on non e10s...
Flags: needinfo?(mchang)
Summary: 495% winxp regression when comparing e10s to non-e10s → 495% e10s tp5o_scroll regression on winXP compared to non-e10s
I'm fairly confused just from reading the bug. Does this come from some other bug or it's just a snapshot from Release to nightly today?

As for ASAP mode not working well, it shouldn't matter. It should use the same code paths as Linux, so I would expect Linux to show the same behavior. Is the second column in comment 1 the variation? In general, silk shouldn't even be enabled if ASAP mode is working. You can try disabling silk to see if that fixes this case.
Flags: needinfo?(mchang)
this bug is comparing e10s to non-e10s on the same build.  We haven't been running windows xp e10s tests very long, so this is a newer thing.
Assignee: nobody → jmathies
Priority: -- → P1
This isn't XP or even Windows specific, the remaining regression is on Windows and OSX.
Summary: 495% e10s tp5o_scroll regression on winXP compared to non-e10s → [e10s] tp5o_scroll regression on windows compared to non-e10s
Component: Talos → General
Product: Testing → Firefox
Assignee: jmathies → gkrizsanits
I've looked into this, and reproduced the regression locally on OSX. However when tried out scrolling manually on the tp5o pages I did not notice any janks at all or difference compared to non-e10s version. Simulating the test with scratchpad hacking resulted in some noticeable janks however. As it turn out we measure requestAnimationFrame times after each scrolls. And we have a regression in requestAnimationFrame times, which I think it can mean many things, but since the frequency for this callback depends on various things like vsync or time takes to handle the callback function and who knows what else I don't think this is a reliable way to measure scrolling. By the way there are some content process janks I've noticed while running the test mostly from: mozilla::layers::ClientMultiTiledLayerBuffer::Update(mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&)

I think we should find a more reliable way to test scrolling but could not find any yet.
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #8)
> I've looked into this, and reproduced the regression locally on OSX. However
> when tried out scrolling manually on the tp5o pages I did not notice any
> janks at all or difference compared to non-e10s version. Simulating the test
> with scratchpad hacking resulted in some noticeable janks however. As it
> turn out we measure requestAnimationFrame times after each scrolls. And we
> have a regression in requestAnimationFrame times, which I think it can mean
> many things, but since the frequency for this callback depends on various
> things like vsync or time takes to handle the callback function and who
> knows what else I don't think this is a reliable way to measure scrolling.

It's typically impossible to notice regressions of this test (and other animation throughput tests) by observation.

The reason is these test iterate intentionally with unlimited refresh rate (aka ASAP mode), like when you benchmark a game engine with fps above 60 (or whatever the monitor rate it), but you're limited to observing differences only if the rate becomes below 60.

If we limited the animation tests to vsync rate, then most regressions would not be detected since the test systems are typically able to animate well beyond 60/sec, so all your results would have been 16.667ms/iteration on average.

See https://wiki.mozilla.org/Buildbot/Talos/Tests#ASAP_Tests
(In reply to Avi Halachmi (:avih) from comment #9)
> See https://wiki.mozilla.org/Buildbot/Talos/Tests#ASAP_Tests

Right, so these tests disable vsync, how do I do that in a custom build? I want to do some profiling. Is there anything else that can affect the frequency of these callbacks? Like if the execution of the callback took longer in e10s mode let's say, would that affect the results?
Set the pref layout.frame_rate = 0 and restart the browser.

These tests sets also two other prefs for ASAP mode, but they're unlikely to affect your profiling:

    preferences = {'layout.frame_rate': 0,
                   'docshell.event_starvation_delay_hint': 1,
                   'dom.send_after_paint_to_content': False}

From test.py at hg.mozilla.org/mozilla-central/file/tip/testing/talos/talos/test.py
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #10)
> ... Is there anything else that can affect the
> frequency of these callbacks? Like if the execution of the callback took
> longer in e10s mode let's say, would that affect the results?

Sorry, I didn't reply this one earlier.

Whatever affects the iteration throughput would get represented at the result since this is what they essentially test: throughput for animating specific content in a specific way.

So I'd think that yes, longer callback execution should get represented at the numbers as well.
Uploading some profiling I've done locally. The difference on this page is quite big (around 40% regression with e10s) but so far I could not figure out a way to make it better.

e10s: https://cleopatra.io/#report=32320203271036e4006c3bdc54118eb997fe2dc4
non-e10s: https://cleopatra.io/#report=050a71b9a62f060902a910a9de6780f9a7d06488

One thing is weird that with e10s, despite that the main process is not too busy, while scrolling we still don't get ticks too often in the nsRefreshDriver. In ASAP mode I would expect ticks way more often but I might not understand its concept well enough.
OK I think I've spotted it. Is seems like we run the tscroll.js code in the e10s case using the baseline JIT compiler and because of that it runs way slower than in the non-e10s case. This is the code that ticks so it should be taking about the same time to run in both e10s and non-e10s case for a fair comparison. In the non-e10s case it take about 0.5% of a tick that updates in the e10s case it's about 6%. This difference seems to add up.

Bill, I don't know much about JS compilers, do you know why do we use different JIT compilers for frame scripts?
Flags: needinfo?(wmccloskey)
Eh, nevermind, I think I've got lost in the profiler for a second, the difference is more like 3% vs 6% and both uses the baseline jit compiler, so that's not it. But even this lesser difference can be interesting. Anyway, I keep looking.
Flags: needinfo?(wmccloskey)
As it turns out it was APZ. If I set layers.async-pan-zoom.enabled to false, the regression is gone. In fact there seem to be a slight gain.

Note: tscroll.js ensureScroll does a win.screenY call which sends a sync message, that should be eliminated in the e10s case. Also I don't see why we use content.wrappedJSObject for target but content for win, that seem to be some unnecessary xray work in myNow().
APZ seems to be slowing down display lists. Is there a bug for this / would you be interested in taking a look at this?
Flags: needinfo?(bugmail.mozilla)
APZ wasn't enabled on windows nightly at the time this bug was filed. That being said, yes, APZ does slow down things, and bug 1189817 was filed for that. It may be that this regression was caused by something else that has since been fixed, and the only regression remaining is the hit from APZ.

If disabling APZ but leaving e10s enabled gets rid of the regression I would either close this bug or dupe it to bug 1189817.
Flags: needinfo?(bugmail.mozilla)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #18)
> If disabling APZ but leaving e10s enabled gets rid of the regression I would
> either close this bug or dupe it to bug 1189817.

It does, let's dupe it.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
(In reply to Jim Mathies [:jimm] from comment #20)
> To confirm, this regression was caused by turning apz on and we're accepting
> it? (This is a ~73% regression afaict so I'm a little surprised.)

I really don't think we are. But it seems like a known issue for apz, and I don't see a reason for having multiple bugs open for it. 

That being said I wanted to do a follow up talk on this one since I don't know about the status of apz and e10s. In my mind for e10s apz is supposed to be turned off initially, but I realized this Friday that I was wrong. Anyway, the most important question here regards of this regression is if someone is actively working on this, and what do we know about it. If the display list creation is slower with apz I imagine that causes quite a lot of trouble and not only for scrolling.
Flags: needinfo?(bugmail.mozilla)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #21)
> If the
> display list creation is slower with apz I imagine that causes quite a lot
> of trouble and not only for scrolling.

What kind of trouble do you think it will cause?
Flags: needinfo?(bugmail.mozilla)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #22)
> (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #21)
> > If the
> > display list creation is slower with apz I imagine that causes quite a lot
> > of trouble and not only for scrolling.
> 
> What kind of trouble do you think it will cause?

I would expect perf regression in other tests too, but I'm not a gfx expert.
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #23)
> I would expect perf regression in other tests too, but I'm not a gfx expert.

There were some regressions in other tests too, sure, but these regressions are generally not user-visible. We need new tests that actually test user-visible behaviour with APZ enabled.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #24)
> (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #23)
> > I would expect perf regression in other tests too, but I'm not a gfx expert.
> 
> There were some regressions in other tests too, sure, but these regressions
> are generally not user-visible. We need new tests that actually test
> user-visible behaviour with APZ enabled.

I'm afraid on a stressed system or an older machine that 70% regression can easily be a user-visible difference. Especially all these regressions combined... If a regression at this rate is acceptable for a test, that test should be thrown away probably.

This specific test is quite artificial in that it turns off vsync and switches in ASAP mode. It's really not what a user would experience normally. But after a thorough investigation I think that it does the right thing and it's a fair measurement for the painting performance during scrolling. And probably in a stressed situation or on a slow platform where we don't artificially throttle stuff, we will notice this regression in real usage too.
Well the point of APZ is to hide, or at least reduce, the effect of main-thread slowdowns to the user while scrolling, because the scrolling is done on the compositor thread. Longer display list build times on the main thread should not block the compositor thread at all, and so should cause zero extra jank while the user is scrolling - and that's what the test is testing for.

The longer display list build times can certainly have other effects like more checkerboarding, or possibly higher latency with script execution, or something like that, but this test isn't measuring any of those - which is why I think we need new tests to check for those specifically.

That being said we did look into the display list build time regression and while we can make some improvements, there will almost certainly be some level of regression that we'll have to accept. The extra time is being spent in the event regions code which is something we need for APZ to work properly. The region operations themselves can be sped up a bit (Jeff did a bit in bug 1247979 recently, and more will happen in bug 1248044) but I'm not sure how much of that will apply specifically to this test.
OK, let's summarize:

1. In e10s tp5o_scroll is ~70% slower (from 3ms on average to 5ms).
2. At least some of it (unknown how much) seems related to screenY queries (test specific).
3. This is likely both required "cost" with APZ, and possibly not noticeable with APZ.
4. But it might still affect scrolls which are not APZ (currently space, page up/down, arrows).

So, the main question which pops to my mind is: can we reduce the penalty meaningfully when APZ scrolls do not kick in?

If we can, then I think we should accept it.

If we cannot limit the penalty to only while scrolling APZ, then what's at stake here is that pages which scroll the page "manually" (via scrollTo, etc which are possibly also using feedback with screenY) are going to regress.

Is that a penalty we're willing to accept?
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #26)
> done on the compositor thread. Longer display list build times on the main
> thread should not block the compositor thread at all, and so should cause
> zero extra jank while the user is scrolling - and that's what the test is
> testing for.

The compositor thread is waiting in most of the time during the test, the content
process is the bottle neck that is doing the paint. It does not cause jank in the
parent process so it does not hang the browser, but the content process can get more
janky, with can also harm user experience.

> 
> The longer display list build times can certainly have other effects like
> more checkerboarding, or possibly higher latency with script execution, or
> something like that, but this test isn't measuring any of those - which is
> why I think we need new tests to check for those specifically.
> 

I might be wrong but it kind of does. It measures how often the refresh driver ticks
which is responsible for a lot of things and not only for animation.

> That being said we did look into the display list build time regression and
> while we can make some improvements, there will almost certainly be some
> level of regression that we'll have to accept.

I'm not the person to make that call, I don't know what is acceptable for apz, I'm just concern that 70% here is a lot. I think this can affect e10s release criteria.

(In reply to Avi Halachmi (:avih) from comment #27)
> 2. At least some of it (unknown how much) seems related to screenY queries
> (test specific).

That part can be ignored for now, if I eliminate the screenY part and some more xrays as well I can win a couple of percentage top most.

> Is that a penalty we're willing to accept?

Who can sign this off? And what data is required to make that decision?
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #28)
> > 
> > The longer display list build times can certainly have other effects like
> > more checkerboarding, or possibly higher latency with script execution, or
> > something like that, but this test isn't measuring any of those - which is
> > why I think we need new tests to check for those specifically.
> > 
> 
> I might be wrong but it kind of does. It measures how often the refresh
> driver ticks
> which is responsible for a lot of things and not only for animation.

Do you have more specific details?

> > That being said we did look into the display list build time regression and
> > while we can make some improvements, there will almost certainly be some
> > level of regression that we'll have to accept.
> 
> I'm not the person to make that call, I don't know what is acceptable for
> apz, I'm just concern that 70% here is a lot. I think this can affect e10s
> release criteria.

Fair enough. It does seem like a large regression on paper, it's not clear to me that it is representative of real-world usecases, but I think it's fair to require more investigation on this.
Had a discussion with Avi on IRC. I'm flagging myself to investigate the regression further.
Flags: needinfo?(bugmail.mozilla)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #29)
> (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #28)
> > I might be wrong but it kind of does. It measures how often the refresh
> > driver ticks
> > which is responsible for a lot of things and not only for animation.
> 
> Do you have more specific details?

Don't take my word for it I don't know this code well, but based on this comment from Boris, quite a lot:

"The refresh driver drives restyling, reflow, CSS animations, CSS transitions, SMIL animations, and rAF callbacks.  Plus a few minor things like scrolling iirc.
Oh, and invalidation.  Not sure about painting yet, but eventually also painting."

From Bug 854421 Comment 3 (and 4). It's an old comment, probably not up to date, it's probably worth
asking Boris if he thinks that this is serious or not in a real life usage.

> Fair enough. It does seem like a large regression on paper, it's not clear
> to me that it is representative of real-world usecases, but I think it's
> fair to require more investigation on this.

Thanks for looking into this, I will probably spend some more time on it too later on this week
just currently I have to take a few days off because of some flu...
So based on my investigation the extra display list building time is entirely attributable to the larger displayport over which we are doing the display list build. So in that respect it's not abnormal in any way. There's already a few bugs where we came up with the same result.

That being said - yes, this will impact real-world scenarios that don't involve APZ scrolling. For example keyboard scrolling, as Avi mentioned, will be impacted by this. I'm trying to think of some way to mitigate this problem. We might be able to suppress the displayport margins in cases where we know we're driving scrolling from the main thread but then we're going to incur churn from frequently resizing the displayport as well. I can give it a try locally and see how it does in talos. Leaving ni on me for now.

In terms of administrivia I'm not sure if you'd prefer to reopen this bug or if you'd rather I did my investigation on bug 1189817 or some other bug - let me know if you have a strong preference or I'll just pick something.
Ok, I'll just reopen this bug for this issue.
Assignee: gkrizsanits → bugmail.mozilla
Status: RESOLVED → REOPENED
Component: General → Panning and Zooming
Depends on: apz-talos
Flags: needinfo?(bugmail.mozilla)
OS: Unspecified → Windows
Product: Firefox → Core
Resolution: DUPLICATE → ---
Version: unspecified → 46 Branch
from Gabor in the parent meta - 

(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #12)
> After some measurements and discussions in bug 1179735 I think this bug
> should be re-triaged. It seems that with APZ Display List creation is
> significantly slower when e10s is on, and that causes serious performance
> regressions. There is a debate about weather the regressing tests are
> measuring the right thing or not, but for example tp5o_scroll is regressing
> with 70% percentage right now (e10s vs not-e10s with APZ on) so at very
> least I want to gather more data to better understand the risk this comes
> with.
apz specific, minus tracking for e10s.
actually we've been plus'ing these, and blocking them on the apz desktop bug instead.
Attached patch Hackity hack (deleted) — Splinter Review
So far this sort of thing is the only possible-viable approach I've found. I considered other things like skipping paints entirely if it's just the scroll position that changed but in practice there's a lot of other things that trigger paints as well (such as animations) so that doesn't work. I tried just reducing the displayport margins and while that helps, the tile-alignment code still plumps it out quite a bit (on OS X anyway). Another idea I had was to somehow cache the display list but that seems immensely complicated.
Keywords: perf
Whiteboard: [gfx-noted]
I'm not sure the above patch is such a great idea either, for a couple of reasons. One is that as mentioned before it'll cause a lot of churn. The second is that after non-APZ scroll methods are done scrolling, there is a final APZ repaint, which makes the displayport "big" again. This is desirable, because subsequent APZ scrolling will want to use the bigger displayport. However it also means that any painting on the main thread driven for non-scrolling reasons (e.g. CSS changes or animated images) will still incur the cost with the big displayport. So the patch is really just fixing the specific scenario exercised in the test and wouldn't help with the more general problem.

I'm going to put a patch up for bug 1240133 which should help a little, and maybe also shrink the default displayport size in the y direction, which will help more. It's a bit of a balancing act because it might result in more checkerboarding, but we have telemetry on that now so we can reduce the displayport incrementally and see how far we get before we hit an increase in checkerboarding.
I filed bug 1251052 to reduce the displayport size using a telemetry experiment, so we can more efficiently gauge the impact on checkerboarding.
Whiteboard: [gfx-noted] → [gfx-noted][diagnosis in comment 32]
Bug 1253860 should fix this by routing this sort of scrolling through APZ as well.
Depends on: 1253860
Blocks: 1255129
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: