Terrible scrolling performance on Canvas LMS homepage
Categories
(Core :: Graphics: WebRender, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr91 | --- | unaffected |
firefox97 | --- | unaffected |
firefox98 | --- | unaffected |
firefox99 | --- | verified |
People
(Reporter: noszalyaron4, Assigned: gw)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: regression)
Attachments
(2 files)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0
Steps to reproduce:
- Using latest nightly on Linux with proprietary NVidia drivers
- Visiting a Canvas LMS instance's homepage
- Scroll
Actual results:
Scrolling was very laggy.
Here's a profiling: https://share.firefox.dev/34M402a
Expected results:
No laggy scrolling on such a "simple" site.
Comment 1•3 years ago
|
||
The Bugbug bot thinks this bug should belong to the 'Core::Canvas: 2D' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.
Reporter | ||
Comment 2•3 years ago
|
||
I've reduced the issue to the little boxes of lectures and the transparent background of the "Dashboard" text ("Vezérlőpult" in hungarian), it seems their blending makes the rendering long. Though its most definitely not a minimal working example.
Reporter | ||
Comment 3•3 years ago
|
||
I have bisected it with mozregression-gui and found that this commit caused the regression:
2022-02-10T10:19:34.507000: INFO : Narrowed integration regression window from [8e9c1ffd, 41535e3d] (3 builds) to [8e9c1ffd, b7f88e5c] (2 builds) (~1 steps left)
2022-02-10T10:19:34.514000: DEBUG : Starting merge handling...
2022-02-10T10:19:34.514000: DEBUG : Using url: https://hg.mozilla.org/integration/autoland/json-pushes?changeset=b7f88e5c537bb7d64ba9d4cb58b052650d510638&full=1
2022-02-10T10:19:34.515000: DEBUG : redo: attempt 1/3
2022-02-10T10:19:34.515000: DEBUG : redo: retry: calling _default_get with args: ('https://hg.mozilla.org/integration/autoland/json-pushes?changeset=b7f88e5c537bb7d64ba9d4cb58b052650d510638&full=1',), kwargs: {}, attempt #1
2022-02-10T10:19:34.516000: DEBUG : urllib3.connectionpool: Resetting dropped connection: hg.mozilla.org
2022-02-10T10:19:35.787000: DEBUG : urllib3.connectionpool: https://hg.mozilla.org:443 "GET /integration/autoland/json-pushes?changeset=b7f88e5c537bb7d64ba9d4cb58b052650d510638&full=1 HTTP/1.1" 200 None
2022-02-10T10:19:35.822000: DEBUG : Found commit message:
Bug 1749380 - Improve how WR handles bounding rects for off-screen surfaces r=gfx-reviewers,kvark
This patch introduces a number of subtle but important changes
to how we deal with off-screen surfaces. The overall goals are:
- Improve rendering correctness in a number of edge cases.
- Begin reducing complexity related to surfaces, scaling
factors, surface size adjustments and clipping.
- Improve CPU performance by removing some per-primitive work.
- Simplify implementation of future SVG and CSS filters by
having explicit support for picture rects + inflation regions.
- Lay the groundwork for caching child picture surfaces,
reduction of per-primitive work during visibility pass,
simplifying picture code.
Unfortunately, the nature of the changes make it impossible to
split up in to small isolated patches. Details below:
* Introduce `LocalRectKind` concept. This allows us to separate
out the bounding rect of the surface (a group of primitives
backed by a texture) from the bounding rect of the picture
compositing that surface (e.g. a drop-shadow which draws the
surface once at the local origin and once at a specific offset
+ blur-radius). This fixes a number of correctness bugs we have
related to culling, clipping, invalidation regions of complex
primitives such as drop-shadows and blur filters. Importantly,
it makes it simpler to implement (or fix) SVG filter chains,
backdrop-filter implementations.
* Establish raster roots for all off-screen surfaces. Every off-screen
surface uses the spatial node of the enclosing stacking context as
a coordinate system root, ensuring that each off-screen surface is
drawn in a 2D coordinate system, with appropriate scaling factors
applied to ensure high quality rendering. The primary goal is to make
it possible to correctly inflate and clip off-screen surfaces, removing
some correctness issues we currently have with complex filters interacting
with transforms. The initial work here doesn't reduce complexity a huge
amount, but will allow us to simplify large parts of the picture/surface
handling code in future, as well as simplify a number of shaders that
currently must handle arbitrarily complex transform matrices. This will
also allow us to simplify the implementation of features such as
mix-blend-mode and backdrop-filter, which rely on readback and UV mapping
from the parent surface.
* Remove concepts of `estimated` and `precise` local rects for pictures. This
is both a performance optimization and a code simplification. Instead, we
only determine the estimated local rect during bounding rect propagation,
and rely on the clipping regions from the tile dirty regions to reduce which
parts of the picture we allocate if drawing to an off-screen surface. This
removes some per-primitive work during the visibility pass, and also means
we can rely on the final picture bounding rect from the start of the visibility
pass. This also removes much of the complexity in `take_context` where we
previously determined surface scale factors and device pixel ratio - instead
these can be determined earlier during `propagate_bounding_rects`.
* Remove some complexity in `update_prim_visibility`. This is still recursive,
but follow up patches will aim to remove this recursion and integrate this
pass with the picture graph (similar to how `propagate_bounding_rects` works).
* Remove `PictureOptions` struct. Instead, store `inflate_if_required` with
the Blur filter enum, which is the only place that uses it.
* Remove `root_scaling_factor` from text runs - this is handled implicitly
by the surface device-pixel scale.
* Skip calling `update_clip_task` for pass-through pictures (since they have
no defined local rect).
* Improve scaling factors used for determining the render task cache size for
complex line decorations.
Differential Revision: https://phabricator.services.mozilla.com/D137569
Reporter | ||
Updated•3 years ago
|
Assignee | ||
Comment 4•3 years ago
|
||
There's definitely a bug related to that patch, which I'm working on this week. However, the profile shows the time spent is in software webrender, which seems unexpected. Could you post the contents of your about:support
?
Reporter | ||
Comment 5•3 years ago
|
||
Reporter | ||
Comment 6•3 years ago
|
||
OMG, enabling it in about:config fixes the scrolling on canvas entirely and brought back 144hz scrolling which I didn't remember losing but now I can't believe I could again :D
Here's the profiling now, it seems to coincide with my perception that everything seems normal: https://share.firefox.dev/3Jq86w1
Comment 7•3 years ago
|
||
Seems like this is mostly a swgl perf regression from bug 1749380 in that case, which hopefully Glenn's follow up work will address.
But there's still the question of why you were using software webrender rather than regular webrender.
@noszalyaron4 could you please try running mozregression again, and this time checking whether whether it says Webrender
(= good) or Webrender (software)
(= bad).
Thanks!
Comment 8•3 years ago
|
||
Set release status flags based on info from the regressing bug 1749380
Updated•3 years ago
|
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 9•3 years ago
|
||
Minimal test case:
<!DOCTYPE html>
<html>
<head>
<style type="text/css">
.ic-DashboardCard {
border-radius: 4px;
overflow: hidden;
width: 262px;
}
</style>
</head>
<body>
<div class="ic-DashboardCard">
<svg viewBox="0 0 1920 1920" style="width: 1em; height: 1em;" width="1em" height="1em">
<g>
<path d="M677.647 16v338.936h112.941V129.054h1016.47V919.53h-225.994v259.765L1321.412 919.53h-79.172V467.878H0v1016.47h338.71v418.9l417.996-418.9h485.534v-451.877h32.753l419.125 419.124v-419.124H1920V16H677.647zM338.79 919.563h564.706v-112.94H338.79v112.94zm0 225.883h338.936v-113.054H338.79v113.054zM112.94 580.706h1016.47v790.701H710.4L451.652 1631.06v-259.652h-338.71V580.706z">
</path>
</g>
</svg>
</div>
</body>
</html>
In this case, performance isn't terrible since the test case is small. However, WR is still creating a 3072 x 3072 render target here, which seems clearly incorrect.
When running the complete test case above, there are > 10 of those large targets, which will explain the performance issue there.
Assignee | ||
Comment 10•3 years ago
|
||
It seems like the scale factor determination is incorrect - we have a stacking context that is 1920 x 1920 in local coords due to the SVG box, with a 16x16 image. The scale transform of 0.00833 isn't being applied in a way to reduce the surface size.
Assignee | ||
Comment 11•3 years ago
|
||
The problem is due to ensuring that we always draw a surface with a minimum scale of 1.0 [1] which can cause very large surfaces in cases like this, where it's not necessary. I'll do some try runs without that min_scale and we'll find a better way to handle the cases that break without this.
[1] https://searchfox.org/mozilla-central/source/gfx/wr/webrender/src/picture.rs#5550
Reporter | ||
Comment 12•3 years ago
|
||
@jnicol:
Yeah, I've found this commit which reasonably explains why sw webrender was used :D Guess I will update my driver.
2022-02-15T10:20:27.171000: INFO : Narrowed integration regression window from [61fba772, e947de18] (3 builds) to [a85a8dbb, e947de18] (2 builds) (~1 steps left)
2022-02-15T10:20:27.175000: DEBUG : Starting merge handling...
2022-02-15T10:20:27.175000: DEBUG : Using url: https://hg.mozilla.org/integration/autoland/json-pushes?changeset=e947de18d5805af4d75a43ff1fc4a561a2fed3b5&full=1
2022-02-15T10:20:27.175000: DEBUG : redo: attempt 1/3
2022-02-15T10:20:27.175000: DEBUG : redo: retry: calling _default_get with args: ('https://hg.mozilla.org/integration/autoland/json-pushes?changeset=e947de18d5805af4d75a43ff1fc4a561a2fed3b5&full=1',), kwargs: {}, attempt #1
2022-02-15T10:20:27.176000: DEBUG : urllib3.connectionpool: Resetting dropped connection: hg.mozilla.org
2022-02-15T10:20:28.380000: DEBUG : urllib3.connectionpool: https://hg.mozilla.org:443 "GET /integration/autoland/json-pushes?changeset=e947de18d5805af4d75a43ff1fc4a561a2fed3b5&full=1 HTTP/1.1" 200 None
2022-02-15T10:20:28.399000: DEBUG : Found commit message:
Bug 1742994 - Bump required nvidia driver version for HW-WR to 470.82 (i.e. always use EGL), r=aosmond
We currently require `460.32.03`, however the `460` driver series got
succeeded by `470` LTS one, which supports the same hardware.
Thus all users can (and should) already have updated to `470.82`.
Distros like Ubuntu appear to ship the latest drivers as security
updates to all supported versions.
Bumping the requirement to `470.82` has the great advantage of being
in sync with the required version for EGL, i.e. all Nvidia users
with HW-WR will use EGL by default (minus bugs like bug 1739611).
This solves a couple of Nvidia+GLX related bugs at the price of
downgrading some users with outdated/unsupported drivers to SW-WR.
For these users, SW-WR likely gives a more stable experience anyway.
If they want the performance benefits of HW-WR, all they need to do
is to update their drivers (again, something they should do anyway
for security reasons).
Differential Revision: https://phabricator.services.mozilla.com/D132162
Reporter | ||
Comment 13•3 years ago
|
||
Oh now that I look at it isn't 470.103.01>470.82.0.0? Though I'm not an expert at versions.
Comment 14•3 years ago
|
||
Thanks @noszalyaron4! I agree, I would have thought that 470.103 is more recent than 470.82. Perhaps a bug in our blocklisting logic. I'll file a new issue for that, to keep it separate from this bug which Glenn is using for the poor performance when using SWGL.
In the meantime if you keep gfx.webrender.all
as true in about:config you should continue to get hardware acceleration.
Updated•3 years ago
|
Assignee | ||
Comment 15•3 years ago
|
||
Fixed by backout. I will be re-landing this patch series next week, with changes that fix this issue. Please re-open if it occurs again.
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Comment 16•3 years ago
|
||
I managed to reproduce this issue on Linux x86_64(Ubuntu 20.04) on Nightly 99.0a1(20220209095640). Confirmed as fixed on Firefox 99.0b6(20220320185956) and Nightly 100.0a1(20220321214243) on the Linux x86_64(Ubuntu 20.04), macOS 11 and Win10 64-bit.
Description
•