Closed Bug 1749298 Opened 3 years ago Closed 3 years ago

.NET 6 Blazor wasm page leaked after reloading

Categories

(Core :: JavaScript: GC, defect, P2)

Firefox 95
defect

Tracking


RESOLVED FIXED
99 Branch
Tracking Status
firefox99 --- fixed

People

(Reporter: anders.frantz, Assigned: jonco)

References

(Blocks 2 open bugs)

Details

Attachments

(4 files)

Steps to reproduce:

Reload a published .NET 6 compiled Blazor Wasm application page multiple times.
Different test setups:

  1. Windows 10. Our own application with 2 iframes, each hosting 2 Blazor WASM apps within a Blazor application, published with AOT compilation. Crash after 3-4 reloads.
  2. Windows 10. Our own Blazor application published without AOT compilation. Crash after 10-12 reloads.
  3. Windows 10. To eliminate possible developer errors I ran the Visual Studio 2022 out-of-the-box self-hosted Blazor client template project, untouched and published with AOT. Crash after 32 reloads.
  4. Ubuntu LTS, Firefox 95.0.1, 3 GB memory. "Your tab just crashed" after 2 reloads.
  5. Same scenario as 1 but on Chrome. No problems and no noticeable increase in memory consumption after 70 reloads.

Also reported this issue on the dotnet/aspnetcore GitHub: https://github.com/dotnet/aspnetcore/issues/39305

Actual results:

Crash with some different log messages:
Scenario 1 above:
The following was logged:
info: Streaming compilation failed. Falling back to ArrayBuffer instantiation. out of memory blazor.webassembly.js:1:36697

err: TypeError: Response.arrayBuffer: Body has already been consumed. blazor.webassembly.js:1:35461
printErr https://localhost/pg2/_framework/blazor.webassembly.js:1
Module https://localhost/pg2/_framework/blazor.webassembly.js:1

err: Uncaught (in promise) TypeError: Response.arrayBuffer: Body has already been consumed.
n https://localhost/pg2/_framework/blazor.webassembly.js:1

Scenario 3:
warn: WebAssembly module validated with warning: failed to allocate executable memory for module

err: Uncaught (in promise) TypeError: Response.arrayBuffer: Body has already been consumed.
n https://localhost:5001/_framework/blazor.webassembly.js:1

Firefox memory in task manager has at this point grown to about 4 GB.
Below is an excerpt from about:memory (forcing GC by pressing the 'Minimize memory usage' button makes no noticeable difference):

Explicit Allocations

20,118.30 MB (100.0%) -- explicit
β”œβ”€β”€20,068.68 MB (99.75%) -- window-objects/top(https://localhost:5001/, id=37)
β”‚ β”œβ”€β”€20,053.02 MB (99.68%) -- active/window(https://localhost:5001/)
β”‚ β”‚ β”œβ”€β”€20,049.23 MB (99.66%) -- js-realm(https://localhost:5001/)
β”‚ β”‚ β”‚ β”œβ”€β”€20,042.39 MB (99.62%) -- classes
β”‚ β”‚ β”‚ β”‚ β”œβ”€β”€16,385.64 MB (81.45%) -- class(ArrayBuffer)/objects
β”‚ β”‚ β”‚ β”‚ β”‚ β”œβ”€β”€16,384.00 MB (81.44%) ── non-heap/elements/wasm [32]
β”‚ β”‚ β”‚ β”‚ β”‚ └───────1.64 MB (00.01%) ++ (2 tiny)
β”‚ β”‚ β”‚ β”‚ β”œβ”€β”€β”€3,638.22 MB (18.08%) -- class(WebAssembly.Instance)/objects
β”‚ β”‚ β”‚ β”‚ β”‚ β”œβ”€β”€1,966.21 MB (09.77%) ── malloc-heap/misc [32]
β”‚ β”‚ β”‚ β”‚ β”‚ β”œβ”€β”€1,672.00 MB (08.31%) ── non-heap/code/wasm [32]
β”‚ β”‚ β”‚ β”‚ β”‚ └──────0.00 MB (00.00%) ── gc-heap [32]
β”‚ β”‚ β”‚ β”‚ └──────18.53 MB (00.09%) ++ (6 tiny)
β”‚ β”‚ β”‚ └───────6.84 MB (00.03%) ++ (2 tiny)
β”‚ β”‚ └───────3.79 MB (00.02%) ++ (3 tiny)
β”‚ └──────15.66 MB (00.08%) ++ (32 tiny)
└──────49.61 MB (00.25%) ++ (24 tiny)

Other Measurements

134,217,727.94 MB (100.0%) -- address-space
β”œβ”€β”€131,915,122.37 MB (98.28%) ── free(segments=202)
β”œβ”€β”€β”€β”€2,282,128.48 MB (01.70%) -- reserved
β”‚ β”œβ”€β”€2,097,096.57 MB (01.56%) ── mapped(segments=13)
β”‚ └────185,031.91 MB (00.14%) ── private(segments=1331)
└───────20,477.09 MB (00.02%) ++ commit

5.87 MB (100.0%) -- decommitted
└──5.87 MB (100.0%) ── js-non-window/gc-heap/decommitted-pages

81 (100.0%) -- event-counts
└──81 (100.0%) -- window-objects/top(https://localhost:5001/, id=37)/active/window(https://localhost:5001/)/dom
β”œβ”€β”€77 (95.06%) ── event-listeners [32]
└───4 (04.94%) ── event-targets

Expected results:

Reloading the page should not cause memory consumption/leakage that eventually crashes the application.

The Bugbug bot thinks this bug should belong to the 'Core::JavaScript: GC' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → JavaScript: GC
Product: Firefox → Core

Thanks for opening this bug.

Would you have a static copy of the AOT compilation case which reproduces this issue, such that we can easily reproduce it without setting up .NET / Blazor locally?

Lars, would you have any idea what might be causing this memory retention?

Severity: -- → S3
Flags: needinfo?(lhansen)
Flags: needinfo?(anders.frantz)
Priority: -- → P2

This is just speculation, but I seem to remember from past experience that Blazor apps tend to be fairly large / have a lot of code. On our side, we compile each wasm module into one object code blob, while as a security measure SpiderMonkey has a single per-process code memory area that is restricted to 2GB on 64-bit systems. Even if unloading works as it should and there's no actual leak, the code area can become fragmented, and that might prevent new, large, contiguous code memory blobs from being allocated; that could in turn account for the "failed to allocate executable memory" error, which I expect is the key here. But there could also be a plain old leak causing that error.
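For illustration only, a sketch of the reload pattern being described (hypothetical page code, not the actual Blazor loader); every reload compiles the large module again, and each compilation needs a fresh allocation from that shared per-process executable code area:

// Hypothetical loader code; the module path and empty import object are
// placeholders. On every page load this compiles the module anew, and the
// generated machine code lands in SpiderMonkey's single per-process
// executable code area. If old instances stay reachable across reloads,
// their code is never released and the area fills up or fragments.
async function loadApp() {
  const response = await fetch("_framework/app.wasm");
  const { instance } = await WebAssembly.instantiateStreaming(response, {});
  return instance;
}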

I agree that having access to the test case would be extremely helpful here. We could arrange for it not to be attached to a public bug, if that's desirable.

(In reply to Nicolas B. Pierron [:nbp] from comment #2)

Thanks for opening this bug.

Would you have a static copy of the AOT compilation case which reproduces this issue, such that we can easily reproduce it without setting up .NET / Blazor locally?

Lars, would you have any idea what might be causing this memory retention?

The repro zip file was too large to attach. I've shared it here https://www.dropbox.com/s/v31101q3w7gj4b8/Bug_1749298_repro.zip?dl=0. Let me know if it's not accessible.

Flags: needinfo?(anders.frantz)

Thanks, I grabbed a copy.

Flags: needinfo?(lhansen)

Firefox release-debug, mozilla-central from Jan 13. 8-core/16-thread Xeon workstation, 32GB RAM, Fedora 35.
Numbers from htop sorted by size, filtered by "contentproc", numbers in GB. Developer console closed.

Load empty browser with default start tab, wait for stabilization. Virt=2.6, Res=0.18
Load blazor app once, wait for stabilization. Virt=9.0, Res=0.33
Load again, wait. Virt=15.2, Res=0.49
Again, wait. Virt=21.4, Res=0.66
Again, wait. Virt=27.7, Res=0.85
Again, wait. Virt=33.9, Res=1.0
Again, wait. Virt=40.2, Res=1.2
Again, wait. Virt=46.4, Res=1.3
Again, wait. Virt=52.6, Res=1.5
Again, wait. Virt=58.9, Res=1.7
Again, wait. Virt=65.1, Res=1.8
Again, wait. Virt=71.3, Res=2.0
Again, wait. Virt=77.6, Res=2.2
Five fairly rapid reloads (waiting ca 1.5s between each). Virt=106, Res=3.2

At this point, about:memory says:
11GB explicit allocations
8.7GB in ArrayBuffers (this is committed but most of it will be zero and so not resident)
2.0GB in wasm instance-related (code+data)
We have 17 memories (assuming one memory per reload), @ 6GB per memory, 17*6=102
So far it looks like we leak the wasm instance with its compiled code and memory when we reload. Eventually available virtual memory, or the 2GB code memory area, will be exhausted and we'll OOM.

Minimize memory usage. Htop numbers drop a little to 106G/3.0G. about:memory says:
(no change from the above)

Close the blazor tab. Htop shows that we drop back down to the baseline within about 2s.

Disable wasm caching (Nightly-only feature), repeat experiment, no difference.

No other wasm features should matter for this. (For grins I disabled tiered compilation but other than resident memory growing more slowly because we only have one code blob, there was no difference.)

Taking this for now, for further experiments.

Assignee: nobody → lhansen
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

A couple more data points:

  • on my system, Chromium does indeed not exhibit the leak
  • I've yet to find other wasm properties that exhibit similar leaks in Firefox, though mostly these are emscripten-based and not .NET-based

At this point we probably have to identify the leaked instances in a running process and try to find out who keeps them alive.

Problem goes away completely when running with --disable-e10s.

Jon, any ideas? There's a shutdown crash in release-debug too, "Extra shutdown CC".

Assignee: lhansen → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(jcoppeard)
No longer blocks: wasm-lang

I tested locally and can reproduce. I can't immediately see why this is happening though.

Hi, is there any progress on this issue? Is there anything we can do on our end to help resolve it?

(In reply to anders.frantz from comment #12)
I've spent a while looking into this but not come to any definite conclusions so far.

One thing that is suspicious is that there are about 100 BodyStreamHolder objects in the CC log when this happens. These don't seem to have any incoming edges and are holding alive the old windows that should have been collected. However, there are a few other things (LoadedScripts and orphaned FragmentOrElements) that are also holding the windows alive, which might also be to blame. I'll continue looking into this tomorrow.

Andrew, have you seen anything like that before? I don't know anything about how these holders work.

Flags: needinfo?(continuation)

While there may be some kind of leak here, the cycle collector is not well-equipped for this kind of "hit reload a bunch of times on a tab containing a big web page" scenario, which is what the original report seems to describe.

At least at some point, there was some kind of mechanism Luke added that would cause these large buffers to get dumped on memory pressure, though I'm not finding it exactly right now. Maybe the memory pressure stuff in the large allocation failure callback is supposed to do that?

Flags: needinfo?(continuation)

I can try looking at why we are leaking the page here from the DOM side, but it would also be nice to know if we can get whatever other mechanism there is for dumping the huge buffers of pages that aren't currently being displayed to work.

(In reply to Andrew McCreight [:mccr8] from comment #15)
Was it this perhaps?

https://searchfox.org/mozilla-central/source/js/src/vm/ArrayBufferObject.cpp#83-136

This only triggers GC though, and the problem here is that these buffers are still live from the GC point of view.

How do I load this test case? I downloaded the file in comment 4, and tried loading wwwroot/index.html but it just says "Loading... An unhandled error has occurred."

(In reply to Andrew McCreight [:mccr8] from comment #17)
I ran 'python3 -m http.server' from the wwwroot directory.

(In reply to Andrew McCreight [:mccr8] from comment #14)

the cycle collector is not well-equipped for this kind of "hit reload a bunch of times on a tab containing a big web page" scenario

We can do this outside the context of this bug, but I'd like to understand why this is.

(In reply to Jon Coppeard (:jonco) from comment #16)

(In reply to Andrew McCreight [:mccr8] from comment #15)
Was it this perhaps?

Looking at some ancient bugs, it seems like I was thinking of bug 936236, which sounds like the same thing. I thought we did something more magical, but that got WONTFIXed (bug 936128).

Minimize memory usage doesn't free the leaked page (which runs sync GC/CC/GC/CC). However, getting a verbose CC log does free the page. (For Jon, it doesn't, which is odd, but I'm going to ignore that for now.) That suggests that the problem is with a cycle collector CanSkip optimization, and not something due to incremental CC.

In the concise log, the document's JS reflector is marked black, and doesn't seem to be held alive via any non-browser root, so maybe the CC optimizations are causing the reflector to get marked black. In the verbose log, the document's JS reflector is marked grey.

It appears to be the "forceGC" part of the AllTraces behavior that avoids the leak. If you change the bool forceGC = line in nsCycleCollector.cpp to be bool forceGC = true; then we don't leak the page.

Blocks: GhostWindows
Summary: Repeatedly reloading of .NET 6 Blazor wasm page in Firefox crash of memory → .NET 6 Blazor wasm page leaked after reloading

I ripped out most or all of the ForgetSkippable stuff, and the document reflector is still getting marked black somehow. Maybe it is getting marked black via exposeToActiveJS() somehow.

I'm still confused about what the difference is between doing a GC via "memory-pressure" "heap-minimize" followed by a CC, and doing a GC within the CC itself. I noticed that the "heap-minimize" GC is a shrinking one, while the forceGC one is not, but sadly making the forceGC into a shrinking GC doesn't somehow make it stop collecting the leaked page.

(In reply to Andrew McCreight [:mccr8] from comment #21)

However, getting a verbose CC log does free the page. (For Jon, it doesn't, which is odd, but I'm going to ignore that for now.)

I rechecked this. I don't see a memory reduction immediately when I save verbose logs, but if I do minimize memory after that then everything is released.

I'm still confused about what the difference is between doing a GC via "memory-pressure" "heap-minimize" followed by a CC, and doing a GC within the CC itself. I noticed that the "heap-minimize" GC is a shrinking one, while the forceGC one is not, but sadly making the forceGC into a shrinking GC doesn't somehow make it stop collecting the leaked page.

I suspect the difference is that if you set forceGC it uses JS::GCReason::SHUTDOWN_CC. The GC treats shutdown reasons specially and does extra cleanup and some things that only make sense during shutdown:

https://searchfox.org/mozilla-central/search?q=symbol:_ZN2js2gc16IsShutdownReasonEN2JS8GCReasonE&redirect=false

Not all of this is valid to do outside shutdown. In particular it will affect use of FinalizationRegistry and WeakRef, which suggests these could be involved in the leak. We should probably make this behaviour explicit rather than depending on the reason in this way.

Flags: needinfo?(jcoppeard)

This seems to be caused by how FinalizationRegistry is implemented. When registering a target and held value with a finalization registry we create a record object containing the held value (and various other things). These records are marked as roots and are used to queue the finalization callback when the target is found to be dead. One design constraint is that we must not entrain the target, since that would make it uncollectable. Unfortunately record objects (and the other objects involved) are normal JS objects and reference the global, so if the target is reachable from the global then we trigger this problem and make the target, and the global, live forever.
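A minimal sketch of that pattern, using hypothetical page code (only the standard FinalizationRegistry API is real; the names and values are made up):

// The registry, the held value and the internal record created by register()
// all belong to the page's global. As described above, the record is traced
// as a root and references the global, so a target that is itself reachable
// from the global keeps the global, and therefore the whole window, alive.
const registry = new FinalizationRegistry(held => {
  console.log("finalized:", held);
});

// Stand-in for something large, e.g. the wasm instance's exports, that the
// page also keeps reachable through a global variable.
globalThis.appExports = { exports: "large wasm instance" };

// Registering a target that is reachable from the global: the rooted record
// keeps the global alive, the global keeps the target alive, and neither is
// ever collected, even after the page is reloaded.
registry.register(globalThis.appExports, "appExports collected");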

Assignee: nobody → jcoppeard

(In reply to Jon Coppeard (:jonco) from comment #25)

I suspect the difference is that if you set forceGC it uses JS::GCReason::SHUTDOWN_CC. The GC treats shutdown reasons specially and does extra cleanup and some things that only make sense during shutdown:

Huh. Good point. The piece of code that decides the GC reason is aForceGC ? JS::GCReason::SHUTDOWN_CC : JS::GCReason::CC_FORCED, so you are right that when forceGC is true the reason ends up as SHUTDOWN_CC. For some reason I wasn't actually reading the code and was assuming it would only do a SHUTDOWN_CC on shutdown, so when I was manually inserting GarbageCollect() calls I was using CC_FORCED instead, which explains why it wasn't doing the cleanup. The current behavior is incorrect in the case of a non-shutdown CC, so I should file a bug on that. Though in this case it did help reveal the issue, oddly enough.

There's quite a lot of data that's stored on the zone and this bug will add
more. We can encapsulate this inside a per-zone object that is created on
demand.

To mark finalization records from the registry's global we need the registry
and its targets to be swept in the same sweep group, otherwise the global may
not be marked in time to mark the CCWs to the records.

The patch adds a count of cross-zone wrappers in the record map and uses this
to add sweep group edges in both directions.

We have to be careful when nuking wrappers that we update this correctly. The
patch adds additional state to a finalization record to indicate whether it is
currently in the Zone's record map. This is different to the previous 'active'
state, here renamed to 'registered' as it reflects the caller's use of
register() and unregister() methods.

Depends on D138436

This adds a set of record object wrappers to the global data and traces them as
part of global tracing.

These may be cross-compartment edges. They are not in the CCW map but are
handled by the extra sweep group edges added by the previous patch.

Depends on D138437

Depends on D138438

Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bdc9d39e0886
Part 1: Factor out FinalizationRegistry data from Zone into its own class r=sfink
https://hg.mozilla.org/integration/autoland/rev/c6e0088728db
Part 2: Sweep FinalizationRegistries in the same sweep group as their targets r=sfink
https://hg.mozilla.org/integration/autoland/rev/ca41fc23ea6a
Part 3: Only trace finalization records if the registry's global is alive r=sfink
https://hg.mozilla.org/integration/autoland/rev/e9a80289cc3b
Part 4: Test that global not kept alive when a finalization registry target is reachable from it r=sfink

Hi Anders, can you check whether the memory leak is still happening in the latest nightly?

Flags: needinfo?(anders.frantz)
Regressions: 1755257

(In reply to Jon Coppeard (:jonco) from comment #35)

Hi Anders, can you check whether the memory leak is still happening in the latest nightly?

Hi,
Thanks for the effort!
I've made some quick tests with a nightly build and it seems to be working better now.

I've only looked in the task manager, no rigorous memory analysis.
Memory seems to double up temporarily on reload, but then it's reduced to about the initial size, which is quite large by the way, but that's another question.

Flags: needinfo?(anders.frantz)

(In reply to anders.frantz from comment #36)
Great, good to hear it's working better now. I'll mark this as fixed.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Keywords: leave-open
Target Milestone: --- → 99 Branch
Regressions: 1756567
Regressions: 1813556