Closed Bug 1640564 Opened 5 years ago Closed 4 years ago

Crash in [@ OOM | large | mozalloc_abort | hashbrown::raw::RawTable<T>::new_uninitialized<T> | webrender_bindings::moz2d_renderer::{{impl}}::create_blob_rasterizer ]

Categories

(Core :: Graphics: WebRender, defect, P3)

Unspecified
Windows
defect

Tracking


RESOLVED FIXED
81 Branch
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox77 --- unaffected
firefox78 --- wontfix
firefox79 --- wontfix
firefox80 --- wontfix
firefox81 --- fixed

People

(Reporter: gsvelto, Assigned: aosmond)

References

(Blocks 2 open bugs)

Details

(Keywords: crash, perf-alert)

Crash Data

Attachments

(1 file)

This bug is for crash report bp-e4802059-f714-46b0-84c6-e76460200523.

Top 9 frames of crashing thread:

0 mozglue.dll mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:33
1 mozglue.dll mozalloc_handle_oom memory/mozalloc/mozalloc_oom.cpp:51
2 xul.dll gkrust_shared::oom_hook::hook toolkit/library/rust/shared/lib.rs:221
3 xul.dll std::alloc::rust_oom ../4fb7144ed159f94491249e86d5bbd033b5d60550//src/libstd/alloc.rs:240
4 xul.dll alloc::alloc::handle_alloc_error ../4fb7144ed159f94491249e86d5bbd033b5d60550//src/liballoc/alloc.rs:268
5 xul.dll static hashbrown::raw::RawTable< /builds/worker/workspace/obj-build/toolkit/library/build/C:/Users/VssAdministrator/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.6.2/src/raw/mod.rs:393
6 xul.dll webrender_bindings::moz2d_renderer::{{impl}}::create_blob_rasterizer gfx/webrender_bindings/src/moz2d_renderer.rs:729
7 xul.dll webrender_api::resources::ApiResources::update gfx/wr/webrender_api/src/resources.rs:159
8  @0x49b86bea57 

These are new OOM crashes that started with the introduction of hashbrown. The crashes themselves aren't strange - these machines were running out of memory on their own - but they might be worth keeping an eye on. Some of the OOM allocation sizes are several megabytes; in the case of this crash, over 50 MiB. I understand hashbrown uses open addressing and stores elements inline, so it's normal for it to allocate large arrays for tables with many entries, but several megabytes still suggests that the keys and values are either very large or that we're storing a huge number of them.
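For a rough sense of scale (hypothetical stand-in types, not the actual WebRender structures): hashbrown, which also backs std::collections::HashMap, keeps the (key, value) pairs inline in one contiguous table, so a single resize requests on the order of capacity * size_of::<(K, V)>() bytes in one allocation.

    use std::mem::size_of;
    use std::sync::Arc;

    // Hypothetical stand-ins for the real key/command types; the exact layouts
    // in moz2d_renderer differ, this only illustrates how the allocation scales.
    type BlobKey = u64;
    struct BlobCommand {
        rect: [i32; 4],
        tile_size: u16,
        data: Arc<Vec<u8>>,
    }

    fn main() {
        // hashbrown stores entries inline, so one table resize requests roughly
        // capacity * size_of::<(K, V)>() bytes in a single allocation.
        let per_entry = size_of::<(BlobKey, BlobCommand)>();
        let entries = 1_000_000usize; // purely illustrative entry count
        println!(
            "~{} MiB for a single table allocation at {} entries",
            entries * per_entry / (1024 * 1024),
            entries
        );
    }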

Blocks: wr-stability
Severity: -- → S4
Priority: -- → P5
Crash Signature: [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | std::alloc::rust_oom | hashbrown::raw::RawTable<T>::new_uninitialized<T>] → [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | std::alloc::rust_oom | hashbrown::raw::RawTable<T>::new_uninitialized<T>] [@ OOM | large | mozalloc_abort | hashbrown::raw::RawTable<T>::new_uninitialized<T> | webren…
Summary: Crash in [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | std::alloc::rust_oom | hashbrown::raw::RawTable<T>::new_uninitialized<T>] → Crash in [@ OOM | large | mozalloc_abort | hashbrown::raw::RawTable<T>::new_uninitialized<T> | webrender_bindings::moz2d_renderer::{{impl}}::create_blob_rasterizer ]

Jeff - do you think there is anything we should look at here?

Flags: needinfo?(jmuizelaar)

I don't think there's much we can do.

Flags: needinfo?(jmuizelaar)
Blocks: wr-oom
Crash Signature: webrender_bindings::moz2d_renderer::{{impl}}::create_blob_rasterizer] → webrender_bindings::moz2d_renderer::{{impl}}::create_blob_rasterizer] [@ OOM | large | mozalloc_abort | webrender_api::resources::ApiResources::update]

I think we can do better. We are sometimes asking for huge allocations, >10MB, for what is likely a blob update touching only a fraction of the entries. We don't need to clone the entire table, just the entries we need.
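A minimal sketch of that idea, with hypothetical types standing in for the real blob-command table (not the actual moz2d_renderer code): instead of cloning the whole map for every rasterization request, collect only the commands for the keys in the current update, so the new table is sized for the handful of entries that changed.

    use std::collections::HashMap;
    use std::sync::Arc;

    // Hypothetical stand-ins for the real types.
    type BlobImageKey = u64;
    #[derive(Clone)]
    struct BlobCommand {
        data: Arc<Vec<u8>>,
    }

    // What create_blob_rasterizer effectively does today: clone the whole
    // table, allocating space for all entries at once.
    fn clone_everything(
        all: &HashMap<BlobImageKey, BlobCommand>,
    ) -> HashMap<BlobImageKey, BlobCommand> {
        all.clone()
    }

    // What the update actually needs: only the commands for the keys that are
    // part of the current request.
    fn clone_requested(
        all: &HashMap<BlobImageKey, BlobCommand>,
        requested: &[BlobImageKey],
    ) -> HashMap<BlobImageKey, BlobCommand> {
        requested
            .iter()
            .filter_map(|key| all.get(key).map(|cmd| (*key, cmd.clone())))
            .collect()
    }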

However, the entries themselves are relatively small: a rect, a size, and an Arc. The table can only have a large footprint if there is a huge number of entries.

From my own testing, it seems easy to bloat the number of entries:

E.g. go to https://www.cbc.ca/news and scroll down to the bottom of the page and back up. If you add a print to watch the number of entries, you can see that only a handful are ever updated, but the table can grow into the thousands of entries quite quickly after only a few scrolling repetitions.

That seems more like a bug than something we should paper over without understanding it better.

Assignee: nobody → aosmond
Blocks: wr-81
Severity: S4 → S3
Priority: P5 → P3

https://searchfox.org/mozilla-central/rev/d54210d490ef335b13fc1fcac817525120c8c46b/gfx/wr/webrender_api/src/resources.rs#108

We should probably be checking self.blob_image_handler and deleting the cached blob commands here. It appears nothing ever removes them unless we clear the entire map, e.g. by closing the tab.

The patch to fix the leak is nominally trivial (notwithstanding any issues we may run into once we start cleaning up data that might have complicated life expectancies).
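In miniature, the shape of the fix (hypothetical names, not the actual webrender_api code): when the resource update deletes the blob image template, drop its cached commands as well.

    use std::collections::HashMap;

    // Hypothetical miniature of the structures involved; the real code lives
    // in webrender_api::resources and webrender_bindings::moz2d_renderer and
    // looks different, this only sketches the shape of the fix.
    type BlobImageKey = u64;
    struct BlobCommand;

    struct ApiResources {
        blob_commands: HashMap<BlobImageKey, BlobCommand>,
    }

    enum ResourceUpdate {
        DeleteBlobImage(BlobImageKey),
        // (other update kinds elided)
    }

    impl ApiResources {
        fn update(&mut self, updates: &[ResourceUpdate]) {
            for update in updates {
                match update {
                    // The fix: when the blob image template goes away, drop
                    // its cached commands too instead of leaving them in the
                    // map until the whole map is torn down (e.g. tab close).
                    ResourceUpdate::DeleteBlobImage(key) => {
                        self.blob_commands.remove(key);
                    }
                }
            }
        }
    }

    fn main() {
        let mut resources = ApiResources { blob_commands: HashMap::new() };
        resources.blob_commands.insert(1, BlobCommand);
        resources.update(&[ResourceUpdate::DeleteBlobImage(1)]);
        assert!(resources.blob_commands.is_empty());
    }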

I've noticed here:

https://searchfox.org/mozilla-central/rev/d54210d490ef335b13fc1fcac817525120c8c46b/gfx/wr/webrender_api/src/resources.rs#230

... that we often have duplicate keys in the array. I'm investigating this as well now.

Never mind, we get bailed out in the tile calculations.

Pushed by aosmond@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/54d73028c01f Delete the blob commands when we delete the blob image template. r=nical
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 81 Branch

Since the statuses are different for nightly and release, what's the status for beta?
For more information, please visit auto_nag documentation.

== Change summary for alert #26799 (as of Wed, 26 Aug 2020 04:09:55 GMT) ==

Improvements:

35% Heap Unclassified windows10-64-shippable-qr opt 94,848,234.77 -> 61,557,475.02
34% Heap Unclassified windows10-64-shippable-qr opt 94,131,926.94 -> 61,727,578.04
10% Explicit Memory windows10-64-shippable-qr opt 401,881,782.15 -> 360,629,770.98
6% Resident Memory windows10-64-shippable-qr opt 606,783,131.93 -> 567,448,454.36
6% Heap Unclassified linux1804-64-shippable-qr opt 312,788,771.65 -> 293,997,667.17
5% Heap Unclassified windows10-64-shippable-qr opt tp6 76,177,792.89 -> 72,087,993.19
4% Explicit Memory linux1804-64-shippable-qr opt 620,928,900.77 -> 598,200,568.41

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=26799

Keywords: perf-alert

Wowza, those are some big improvements. Also, are we missing some memory reporters here given the massive reduction in heap unclassified?

Flags: needinfo?(aosmond)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #14)

Wowza, those are some big improvements. Also, are we missing some memory reporters here given the massive reduction in heap unclassified?

It wasn't a leak in the sense that we would never free the memory, but if you stay on a page, it accumulates over time as you interact with it. So I'm not too surprised to see changes.

Flags: needinfo?(aosmond)

That didn't answer my question about missing memory reporters. Changes of that magnitude in heap unclassified would suggest to me that we're missing something here.

Flags: needinfo?(aosmond)

Sorry, bug 1655039 is supposed to address blob images in general in the memory report. It is probably true that, had we completed that work, it would have identified this issue. In this case, it was a structure growing that should never have gotten that big. I will make a note on that bug to make sure the blob commands hash map is included in that effort.
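As a rough illustration of what such a reporter would have to account for (std-only arithmetic with hypothetical types; Gecko's real memory reporters go through the MallocSizeOf machinery rather than anything like this):

    use std::collections::HashMap;
    use std::mem::size_of;

    // Hypothetical stand-in for a blob command entry.
    struct BlobCommand {
        data: Vec<u8>,
    }

    // A crude estimate of the blob-command map's footprint: the table's inline
    // storage plus the heap payload behind each entry. Both pieces would need
    // to be attributed in the memory report to shrink heap-unclassified.
    fn approx_blob_commands_size(map: &HashMap<u64, BlobCommand>) -> usize {
        let inline = map.capacity() * size_of::<(u64, BlobCommand)>();
        let payload: usize = map.values().map(|cmd| cmd.data.capacity()).sum();
        inline + payload
    }

    fn main() {
        let mut map = HashMap::new();
        map.insert(1u64, BlobCommand { data: vec![0u8; 4096] });
        println!("~{} bytes", approx_blob_commands_size(&map));
    }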

Flags: needinfo?(aosmond)
Blocks: 1654013