[Bug]: Weird rendering on android 5 (Adreno 4xx)
Categories
(Core :: Graphics, defect)
Tracking
()
People
(Reporter: kbrosnan, Assigned: jnicol)
References
(Blocks 1 open bug)
Details
Crash Data
Attachments
(1 file)
(deleted),
text/x-phabricator-request
|
Details |
From github: https://github.com/mozilla-mobile/fenix/issues/20323.
Steps to reproduce
Open a page in fexin on android 5
Expected behaviour
All things should render smoothly
Actual behaviour
There are some weird render result such as square and triangles at the edge of round elements
Device name
Sony Xperia
Android version
Android 5
Firefox release type
Firefox
Firefox version
89.1.1
Device logs
No response
Additional information
No response
Change performed by the Move to Bugzilla add-on.
Reporter | ||
Updated•3 years ago
|
Assignee | ||
Comment 1•3 years ago
|
||
Pretty sure this is the same bug - looks the same, and same GPU and android version
Assignee | ||
Comment 2•3 years ago
|
||
In bug 1714227 I fixed one of (at least) 2 issues causing the rendering artefacts. I'll re-open this bug to use it for the 2nd, which is can be seen in the first screenshot on the github issue linked to this bug. The blue and black diagonal lines across the screen shown instead of some glyphs.
It appears to be due to corrupted vertex data. I am unsure why, but the driver appears not to like our current method of orphaning (calling glBufferData) the instance data buffer prior to every draw call. This means we are repeatedly asking the driver to allocate new buffers of varying sizes, which for some reason leads to this corruption. I have noticed that by always using a fixed-size buffer the issue goes away. However, that would mean either always allocating very large buffers despite most draw calls requiring small amounts of data, or having to break batches in to much smaller calls so that the data can fit in the fixed size, neither of which are ideal.
Instead we should always allocate a large fixed size buffer, and where possible re-use the same buffer for as many draw calls as can fit in it. We have investigated doing this before for performance reasons, as many drivers struggle with the overheads of allocating and managing many small buffers (rather than actually transfering data in to said buffers). We thought of using a "base instance", or SSBOs, but due to lack of platform support we did not do this. However, what we missed at the time was that we can simply rebind the the buffer at a specified offset prior to each draw call, by using glVertexAttribPointer
's offset parameter.
(As a follow up, we can even use the newer set of functions glVertexAttribFormat
, glBindVertexBuffer
, etc where supported, potentially saving the driver some work.)
The simple solution would be to always use the same VBO ID, and re-allocate the buffer when full by orphaning it. However this still results in glitches albeit different looking ones, presumably due to the same driver bug when recycling orphaned buffers. Instead we must handle the recycling ourselves by generating new VBO IDs when required, and only reusing an existing allocated buffer when it is no longer in use (eg by waiting 3 frames).
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 3•3 years ago
|
||
Currently each VAO owns its own VBO for per-instance array data. Prior
to submitting each draw call that instance buffer is orphaned and a
new one allocated at exactly the right size to contain the batch's
data. This frequent reallocation appears to cause a rendering glitch
on some Adreno 4xx devices, likely due to a bug in the driver's
internal allocation/recycling code. It can also be expensive on some
devices, as the drivers struggle with the overheads of allocating many
small buffers much more than the cost of actually transferring the
data.
This patch makes it so that the VAOs no longer own an instance buffer
each. Instead, they share a pool of buffers. This pool always
allocates buffers large enough to hold the largest batch we
allow. update_vao_instances() now writes data to the pool's current
buffer until that buffer is full, and then either an old one will be
recycled or a new one allocated. After writing to the buffer it must
be bound to the VAO at the correct offset, and then the draw call is
submitted.
By manually managing the buffer lifecycles ourselves we avoid the
glitches on Adreno 4xx, and also improve performance on some devices.
Updated•3 years ago
|
Comment 5•3 years ago
|
||
bugherder |
Comment 6•3 years ago
|
||
Backed out changeset 9c24c2012186 (bug 1720169) for causing webrender crashes.
Backout link: https://hg.mozilla.org/integration/autoland/rev/a6fe7c532b7230ec8cb09bdfbfb4ca6fd6862463
Comment 7•3 years ago
|
||
The change got backed out for crashes with [@ webrender::renderer::Renderer::draw_instanced_batch<T> ]
(so far only reported for Windows 10), e.g. bp-3b93b929-ee4a-4a39-bd4d-af0b10210730. The backout will be part of the next merge to mozilla-central and the next Nightly.
Assignee | ||
Comment 9•3 years ago
|
||
The stacktrace doesn't point to the exact location of the crash, but it's probably because I didn't add error handling to glMapBufferRange
, so it sometimes returns null on failure and we attempt to write data to null.. Lots of the crash reports have graphicsCriticalErrors which indicate errors allocating textures, which makes this plausible.
We just need to avoid writing data to the ptr if the map fails. (The subsequent draw call will be garbage, but that would have been the case anyway before this patch if the corresponding glBufferData
had failed.)
Comment 10•3 years ago
|
||
Comment 11•3 years ago
|
||
bugherder |
Comment 12•3 years ago
|
||
Backed out changeset b97f1ccfbb1f (Bug 1720169) as requested on irc by jnicol for causing performance regressions (Bug 1724678, Bug 1724216).
https://hg.mozilla.org/integration/autoland/rev/8301e0b76d27e3bab2d0026c15a50594e27f31b1
Comment 13•3 years ago
|
||
uplift |
Backed out from for causing performance regressions (bug 1724678, bug 1724216):
beta https://hg.mozilla.org/releases/mozilla-beta/rev/4da80584e859
Comment 14•3 years ago
|
||
Backout merged to central: https://hg.mozilla.org/mozilla-central/rev/8301e0b76d27
Comment 15•3 years ago
|
||
There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:jnicol, could you have a look please?
For more information, please visit auto_nag documentation.
Updated•3 years ago
|
Assignee | ||
Comment 16•3 years ago
|
||
The patch caused regressions and I haven't yet figured out why. We do want to fix this bug but it only affects a limited number of devices so hasn't been my highest priority.
Updated•3 years ago
|
Description
•