Open Bug 1720169 Opened 3 years ago Updated 3 years ago

[Bug]: Weird rendering on android 5 (Adreno 4xx)

Categories

(Core :: Graphics, defect)

Unspecified
Android
defect

Tracking

()

REOPENED

People

(Reporter: kbrosnan, Assigned: jnicol)

References

(Blocks 1 open bug)

Details

Crash Data

Attachments

(1 file)

From github: https://github.com/mozilla-mobile/fenix/issues/20323.

Steps to reproduce

Open a page in fexin on android 5

Expected behaviour

All things should render smoothly

Actual behaviour

There are some weird render result such as square and triangles at the edge of round elements

Device name

Sony Xperia

Android version

Android 5

Firefox release type

Firefox

Firefox version

89.1.1

Device logs

No response

Additional information

No response

Change performed by the Move to Bugzilla add-on.

Pretty sure this is the same bug - looks the same, and same GPU and android version

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE

In bug 1714227 I fixed one of (at least) 2 issues causing the rendering artefacts. I'll re-open this bug to use it for the 2nd, which is can be seen in the first screenshot on the github issue linked to this bug. The blue and black diagonal lines across the screen shown instead of some glyphs.

It appears to be due to corrupted vertex data. I am unsure why, but the driver appears not to like our current method of orphaning (calling glBufferData) the instance data buffer prior to every draw call. This means we are repeatedly asking the driver to allocate new buffers of varying sizes, which for some reason leads to this corruption. I have noticed that by always using a fixed-size buffer the issue goes away. However, that would mean either always allocating very large buffers despite most draw calls requiring small amounts of data, or having to break batches in to much smaller calls so that the data can fit in the fixed size, neither of which are ideal.

Instead we should always allocate a large fixed size buffer, and where possible re-use the same buffer for as many draw calls as can fit in it. We have investigated doing this before for performance reasons, as many drivers struggle with the overheads of allocating and managing many small buffers (rather than actually transfering data in to said buffers). We thought of using a "base instance", or SSBOs, but due to lack of platform support we did not do this. However, what we missed at the time was that we can simply rebind the the buffer at a specified offset prior to each draw call, by using glVertexAttribPointer's offset parameter.

(As a follow up, we can even use the newer set of functions glVertexAttribFormat, glBindVertexBuffer, etc where supported, potentially saving the driver some work.)

The simple solution would be to always use the same VBO ID, and re-allocate the buffer when full by orphaning it. However this still results in glitches albeit different looking ones, presumably due to the same driver bug when recycling orphaned buffers. Instead we must handle the recycling ourselves by generating new VBO IDs when required, and only reusing an existing allocated buffer when it is no longer in use (eg by waiting 3 frames).

Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Blocks: wr-adreno4xx
Summary: [Bug]: Weird rendering on android 5 → [Bug]: Weird rendering on android 5 (Adreno 4xx)

Currently each VAO owns its own VBO for per-instance array data. Prior
to submitting each draw call that instance buffer is orphaned and a
new one allocated at exactly the right size to contain the batch's
data. This frequent reallocation appears to cause a rendering glitch
on some Adreno 4xx devices, likely due to a bug in the driver's
internal allocation/recycling code. It can also be expensive on some
devices, as the drivers struggle with the overheads of allocating many
small buffers much more than the cost of actually transferring the
data.

This patch makes it so that the VAOs no longer own an instance buffer
each. Instead, they share a pool of buffers. This pool always
allocates buffers large enough to hold the largest batch we
allow. update_vao_instances() now writes data to the pool's current
buffer until that buffer is full, and then either an old one will be
recycled or a new one allocated. After writing to the buffer it must
be bound to the VAO at the correct offset, and then the draw call is
submitted.

By manually managing the buffer lifecycles ourselves we avoid the
glitches on Adreno 4xx, and also improve performance on some devices.

Assignee: nobody → jnicol
Pushed by jnicol@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9c24c2012186 Pack instance data for multiple draw calls in to large fixed-size VBOs. r=gfx-reviewers,nical
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 92 Branch

Backed out changeset 9c24c2012186 (bug 1720169) for causing webrender crashes.
Backout link: https://hg.mozilla.org/integration/autoland/rev/a6fe7c532b7230ec8cb09bdfbfb4ca6fd6862463

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 92 Branch → ---

The change got backed out for crashes with [@ webrender::renderer::Renderer::draw_instanced_batch<T> ] (so far only reported for Windows 10), e.g. bp-3b93b929-ee4a-4a39-bd4d-af0b10210730. The backout will be part of the next merge to mozilla-central and the next Nightly.

Crash Signature: [@ webrender::renderer::Renderer::draw_instanced_batch<T> ]
Backout by cbrindusan@mozilla.com: https://hg.mozilla.org/mozilla-central/rev/b0f81186f7da Backed out changeset 9c24c2012186 for causing webrender crashes.

The stacktrace doesn't point to the exact location of the crash, but it's probably because I didn't add error handling to glMapBufferRange, so it sometimes returns null on failure and we attempt to write data to null.. Lots of the crash reports have graphicsCriticalErrors which indicate errors allocating textures, which makes this plausible.

We just need to avoid writing data to the ptr if the map fails. (The subsequent draw call will be garbage, but that would have been the case anyway before this patch if the corresponding glBufferData had failed.)

Pushed by jnicol@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b97f1ccfbb1f Pack instance data for multiple draw calls in to large fixed-size VBOs. r=gfx-reviewers,nical
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 92 Branch
Regressions: 1724216
Regressions: 1724678

Backed out changeset b97f1ccfbb1f (Bug 1720169) as requested on irc by jnicol for causing performance regressions (Bug 1724678, Bug 1724216).
https://hg.mozilla.org/integration/autoland/rev/8301e0b76d27e3bab2d0026c15a50594e27f31b1

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 92 Branch → ---
Regressions: 1724728

There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:jnicol, could you have a look please?
For more information, please visit auto_nag documentation.

Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(jnicol)
Flags: needinfo?(nical.bugzilla)

The patch caused regressions and I haven't yet figured out why. We do want to fix this bug but it only affects a limited number of devices so hasn't been my highest priority.

Flags: needinfo?(jnicol)
Severity: -- → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: