[dt-leak] NetworkEvent actors leak on reload
Categories
(DevTools :: General, defect, P2)
Tracking
(firefox94 fixed)
Tracking | Status | |
---|---|---|
firefox94 | --- | fixed |
People
(Reporter: guirec.corbel, Assigned: mozilla-bugzilla)
References
(Blocks 3 open bugs)
Details
(Whiteboard: dt-perf-stability-triage)
Attachments
(5 files)
Reporter | ||
Comment 1•7 years ago
|
||
Comment 2•7 years ago
|
||
Comment 3•7 years ago
|
||
Reporter | ||
Comment 4•7 years ago
|
||
Reporter | ||
Comment 5•7 years ago
|
||
Reporter | ||
Comment 6•7 years ago
|
||
Comment 7•7 years ago
|
||
Reporter | ||
Comment 8•7 years ago
|
||
Reporter | ||
Comment 9•7 years ago
|
||
Comment 10•7 years ago
|
||
Reporter | ||
Comment 11•7 years ago
|
||
Reporter | ||
Comment 12•7 years ago
|
||
Reporter | ||
Comment 13•7 years ago
|
||
Updated•7 years ago
|
Updated•7 years ago
|
Updated•6 years ago
|
Reporter | ||
Comment 14•6 years ago
|
||
Updated•6 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 15•5 years ago
|
||
Don't keep old network events around after a tab navigation occurs.
Updated•5 years ago
|
Comment 16•5 years ago
|
||
Harald, Could you think about any existing telemetry which may help assert the impact of this patch?
I'm pretty confident it will address serious memory leaks from DevTools, and it would be great to take an overview of existing metrics we have to see if one highlights this fix. If one do pick this up, we may take it as good candidate for the performance initiative.
I was wondering for example if we have a metrics about OOM crashes? If yet, could we categorize the one related to DevTools?
Or telemetry about memory usage, also categorized by DevTools usages?
Or the number of restart of Firefox per day/week? Or Firefox uptime for DevTools users?
Comment 17•5 years ago
|
||
Not sure about OOM crashes as we only get very little information. Bug 1297525 was filed for DevTools because of OOM, but we don't have the means to track those reliably over time (afaik).
The best we can do without additional instrumentation is relying on existing probes. For leaks especially we can track our memory telemetry; but also GC related metrics (leak = long GC). Leaks slow specific parts of GC but also cause hangs with mostly GC.
Preliminary chat with sfink about this yields:
For DevTools users:
- MEMORY_JS_GC_HEAP should improve.
- GC_MS should improve.
- GC_MARK_MS should improve.
- GC_SWEEP_MS should not change much as leaks prevent sweeping (maybe)
As you said (and Sorin brought it up as well) open/close count for DevTools should improve if this has big impact, as we trained developers that DevTools leaks are fixed by closing the tools (also comes up in feedback).
Most of these we should be able to plot in stmo (with some help from data science).
Comment 18•5 years ago
|
||
It is a bit surprising, but DAMP doesn't highlight any improvement:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&newProject=try&newRevision=0f7e1a3c96ad5512e356006e3a19af287d75527c&originalSignature=1759151&newSignature=1759151&framework=12&originalRevision=f77921e426e801272780129a8cdff3121c3d6620
I would expect us to free some more memory between page reload while running the netmonitor test, leading to less GC pauses in the following tests. But nothing visible.
Note that netmonitor isn't having a stress test document. We only test it against bild.de, which may not trigger enough requests?
Assignee | ||
Comment 19•5 years ago
|
||
Last time I checked the DAMP version of bild.de it seemed pretty lightweight by today's standards. That's why I chose Mastodon to test (it's pretty heavyweight with React, Redux, Immutable and tons of other libraries).
Comment 20•5 years ago
|
||
Agreed. An SPA would probably best represent a modern heavy JS payload. Mastodon seems good – Riot, Firefox Profiler would also be open source alternatives. If we want to go towards content, we could have Discord and Reddit.
We can also continue that discussion in bug 1539195.
Updated•5 years ago
|
Comment 21•5 years ago
|
||
Note that bug 1615283 might help improve test helpers. At least, when we navigate to a new URL, there is a shared helper that we can tweak.
But here, we may want to also tweak initNetmonitor
all all netmonitor tests, using various wait methods for no clear reason.
Honza, could you please have a look at the comments on phabricator?
https://phabricator.services.mozilla.com/D54800#1815934
https://phabricator.services.mozilla.com/D54800#1906061
https://phabricator.services.mozilla.com/D54800#inline-377350
I believe that, in order to fix this issue and have tests passing, we would need to review all existing helpers around netmonitor.
We have way too many different wait mechanisms:
waitForAllNetworkUpdateEvents
waitForNetworkEvents
waitForAllRequestsFinished
waitForRequestsFinished
- and probably some others...
This patch highlights that some are weak andwaitForAllRequestsFinished
looks slightly better.
Should we use this one everywhere? Would it work? Or is it weak against another edge case?
Why are we having so many different ways to wait for updates?
Then, once we can pick the best way to wait for updates, we can go over netmonitor test and use the right one.
This may involve tweaking initNetmonitor
and/or navigateTo
.
It looks like we are explicitely avoiding to check the behavior during toolbox opening and always reload the page.
This creates a lot of duplicated lines in each test, if we have to keep this, we should introduce an helper to do all of that in one function call.
Assignee | ||
Comment 22•5 years ago
|
||
Could we maybe create another issue to track this refactoring of the tests? It seems to be a more systemic problem than what the revision tries to address.
Comment 23•5 years ago
|
||
(In reply to Sorin Davidoi from comment #22)
Could we maybe create another issue to track this refactoring of the tests? It seems to be a more systemic problem than what the revision tries to address.
Yes. Your patch raise a systemic issue which has to be addressed before being able to fix this leak.
I opened bug 1622043 and will start submitting patches to it.
Updated•5 years ago
|
Assignee | ||
Comment 24•5 years ago
|
||
@ochameau Since the test issues are fixed, can you take another look at the patch?
Comment 25•5 years ago
|
||
I was looking at this yesterday and pushed to try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=75a94dd42d446e96e21309e64deb8e7553ed17f2
I am seeing some test failures:
- devtools/client/netmonitor/test/browser_net_block-context.js
- devtools/client/netmonitor/test/browser_net_block-csp.js
- devtools/client/netmonitor/test/browser_net_block.js
- devtools/client/netmonitor/test/browser_net_cause_source_map.js
- devtools/client/netmonitor/test/browser_net_charts-**.js
- devtools/client/netmonitor/src/har/test/browser_net_har_copy_all_as_har.js
- devtools/client/framework/test/browser_ignore_toolbox_network_requests.js
I was doing some analysis of the Network monitor backend and I am hoping to get also more time to work on this.
Honza
Assignee | ||
Comment 26•4 years ago
|
||
I've rebased the patch. The test failure are still there, and I'm not sure how to fix them. I suspect that they are related to the test setup (probably similar issues to the ones @ochameau has fixed).
Comment 27•4 years ago
|
||
Cleaning up dt-leak dependencies
Comment 30•4 years ago
|
||
I'm curious to know if the test failures in this bug highlighted a real issue with the fix or if they are only test artifacts.
I wonder if it would be acceptable to put this new behavior behind a pref so that we can toggle it off for the tests?
Updated•4 years ago
|
Comment 31•3 years ago
|
||
The main challenge here is to still support persist-logs feature of the netmonitor.
When this feature is enabled we have to keep around all the Network Events as before this patch.
Fission refactoring and the migration to server side NETWORK_EVENT watchers made
this fix significantly easier as we can now more easily reach out the parent process
code which is keeping things allocated.
Note that with the current patch, we still leak one request. The navigation request,
which is slightly special as it is flaged with previous document innerWindowId
and is meant to be inspectable from the next document load.
Comment 32•3 years ago
|
||
DAMP reports various small improvements (3% to 6%) on many tests:
https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=be53546902553b0e585b42f253d6ceb0f67876c8&originalSignature=3148563&newSignature=3130913&framework=12&selectedTimeRange=172800&page=1&showOnlyImportant=1
The most important improvements impact tests others than netmonitor one, which probably means that the leaked objects were slowing down the following tests.
Updated•3 years ago
|
Updated•3 years ago
|
Comment 33•3 years ago
|
||
Comment 34•3 years ago
|
||
Surprisingly, I no longer see notable improvement reported by DAMP:
https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=try&newProject=try&newRevision=f08cb89717e5fb10f940ffc0ef6e84d596646fd6&originalSignature=3130913&newSignature=3130913&framework=12&originalRevision=37234b235e647e5814d4eb24fe19d873ad284f9f&page=1
Let's see if it impacts the metrics on mozilla-central.
But it does still fix browser_allocation_reload.js, which sees its number of leaks go down to 0 objects leaked with stack (with all the target fronts and markupview fixes [still in flight]).
Comment 35•3 years ago
|
||
bugherder |
Description
•