Closed Bug 1439509 Opened 7 years ago Closed 3 years ago

[dt-leak] NetworkEvent actors leak on reload

Categories

(DevTools :: General, defect, P2)

60 Branch
defect

Tracking

(firefox94 fixed)

RESOLVED FIXED
94 Branch
Tracking Status
firefox94 --- fixed

People

(Reporter: guirec.corbel, Assigned: mozilla-bugzilla)

References

(Blocks 3 open bugs)

Details

(Whiteboard: dt-perf-stability-triage)

Attachments

(5 files)

Attached file memory-report-before-cc.json.gz (deleted) —
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
Build ID: 20180215141455

Steps to reproduce:
1. Open a page served by a local server.
2. Open the Developer Tools.
3. Make a change in a JavaScript file.
4. Reload the page.

Actual results: The number of stored string copies and the memory usage increase.

Expected results: A garbage collection runs and the memory is released.
I reproduced the bug with this HTML file:

<script src='./bigfile.js' type="text/javascript" charset="utf-8"></script>

And this JS file: https://raw.githubusercontent.com/orangehill/jstree-bootstrap-theme/master/dist/jstree.js . Any big file should behave the same; I didn't see the bug with a small file. As you can see in the attachment, it can take 4 GB of RAM and 18 GB of "vsize". I had to close Firefox, and the CPU usage was very high; I had to kill the process to really stop Firefox. As you can see in "string(length=515416, copies=101, "/* jsTree default theme */"", there are a lot of copies of the same file. I have to close the dev tools and run a garbage collection and cycle collection to release the memory. It hogged the CPU for a while when I ran the garbage collection. The "Disable Cache" checkbox was checked in the Network tab.
Does it happen with every tool, or were you using one in particular (debugger, console, inspector)?
I see from the memory profile you attached that you have the LiveReload add-on. Were you using any particular workflow/tool, or just a plain HTTP server and F5? I imagine live-reload add-ons may make such a memory leak even more pronounced!
I sent a memory report with LiveReload disabled. I was on the console; I will test the other tabs in a few minutes. As you can see, there are 7 copies of "/app/lib/assets/stylesheets/loader.scss", but I didn't touch that file. Other files are loaded multiple times too.
Attached file memory-report.json.gz (deleted) —
It is the same with the other tabs. For reference, I was modifying the file that starts with "Modernizer". I reloaded with F5.
Component: Untriaged → Developer Tools
(In reply to guirec.corbel from comment #1)
> I reproduced the bug with this HTML file :
> <script src='./bigfile.js' type="text/javascript" charset="utf-8"></script>
> And this js file :
> https://raw.githubusercontent.com/orangehill/jstree-bootstrap-theme/master/dist/jstree.js

I tried to reproduce that on today's Nightly, without success. I used this HTML file:

<script src='jquery.js' type="text/javascript" charset="utf-8"></script>
<script src='bigfile.js' type="text/javascript" charset="utf-8"></script>

With the following JS files next to it:

$ wget https://code.jquery.com/jquery-3.3.1.min.js -O jquery.js
$ wget https://raw.githubusercontent.com/orangehill/jstree-bootstrap-theme/master/dist/jstree.js -O bigfile.js

My STR was:
* open DevTools on the console
* press F5 to reload the page tons of times

My result was that memory does grow, which is expected, but nothing really bad. Here is the memory after 30 seconds of reloads: 224 MB for the parent process and 211 MB for the content process (Firefox started at 172 MB for the parent and 53 MB for the content process, with the test document opened and the console showing). I also do not see anything really special in about:memory.

So there must be something special in your use case. Are you sure you opened only the console, and directly the console? Did you reproduce it with just a <script> tag in an empty HTML document?
I wasn't able to reproduce the same bug with the same file I used when I created the issue, but I was able to reproduce it with a brand new Ember app. I just did `ember new my-app && cd my-app && ember s` (once ember-cli is installed; you can install it with `npm install -g ember-cli`). I attached a small video recorded with a fresh profile. As you can see in the video, the memory is clean at first. I open the console with Ctrl+Shift+K and refresh a few times; after that, there are 4 copies of the same files. It is cleaned up when I close the console, and I don't need to click on "GC" and "CC" this time, I suppose because less memory is loaded. It isn't a problem specific to Ember; I see it with other applications too.
Ah ok, I see. This is related to source maps, which is why the simplest example doesn't reproduce it.

108.15 MB (25.29%) -- worker(resource://devtools/client/shared/source-map/worker.js, 0x7f8ca8b53800)
│ │ ├──103.92 MB (24.30%) -- zone(0x7f8ca915f000)
│ │ │ ├───53.03 MB (12.40%) -- strings
│ │ │ │ ├──32.13 MB (07.51%) -- string(length=3368956, copies=5, "{"version":3,"sources":["vendor/ember

So the source-map worker is loading the sources multiple times. If you force a GC, these strings go away, so it isn't a leak strictly speaking; the source-map code just allocates a lot of memory because of the copies of the JS content. There are multiple ways to improve that:
* force a GC on page reload (see the sketch below)
* kill and respawn the source-map worker on reload, which may help release the worker objects sooner
* optimize the source-map code to allocate less memory?
* try to detect when the resources haven't changed and avoid asking the source-map worker to re-parse the files
* ...
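To make the first option concrete, here is a minimal sketch, assuming privileged (chrome) DevTools module code; whether and where a reload hook should call this is exactly what is open for discussion, so treat it as illustrative rather than a proposed patch.

// Minimal sketch, assuming a chrome-privileged DevTools module.
// Forcing a GC and a cycle collection lets the duplicated source strings
// held by the source-map worker be reclaimed after a reload. These calls
// are expensive, so a real patch would need to throttle or scope them.
const { Cu } = require("chrome");

function forceCollectionAfterReload() {
  Cu.forceGC(); // collects JS objects and strings
  Cu.forceCC(); // collects DOM/XPCOM cycles
}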
On the reports I attached, the memory leak doesn't seem to be related to source maps (or it's less obvious), as you can see here:

4,893.12 MB (100.0%) -- explicit
├──3,357.53 MB (68.62%) -- window-objects/top(http://app.local.com:4200/tests/#, id=6442450990)
│ ├──2,301.10 MB (47.03%) ++ active
│ └──1,056.43 MB (21.59%) ++ js-zone(0x7fa71f048000)
├────972.40 MB (19.87%) -- js-non-window
│ ├──888.73 MB (18.16%) -- zones
│ │ ├──871.52 MB (17.81%) -- zone(0x7fa74637d000)
│ │ │ ├──831.51 MB (16.99%) -- strings
│ │ │ │ ├──502.44 MB (10.27%) ++ (276 tiny)
│ │ │ │ ├───99.42 MB (02.03%) -- string(length=515416, copies=101, "/* jsTree" (truncated))
│ │ │ │ │ ├──99.42 MB (02.03%) ── malloc-heap/two-byte
│ │ │ │ │ └───0.00 MB (00.00%) ── gc-heap/two-byte
│ │ │ │ ├───92.18 MB (01.88%) -- string(length=893446, copies=54, ""use strict" (truncated))
│ │ │ │ │ ├──92.18 MB (01.88%) ── malloc-heap/two-byte
│ │ │ │ │ └───0.00 MB (00.00%) ── gc-heap/two-byte
│ │ │ │ ├───81.35 MB (01.66%) -- string(length=1092231, copies=39, "/n/n(function() " (truncated))
│ │ │ │ │ ├──81.35 MB (01.66%) ── malloc-heap/two-byte
│ │ │ │ │ └───0.00 MB (00.00%) ── gc-heap/two-byte
│ │ │ │ └───56.11 MB (01.15%) -- string(length=1049559, copies=28, "window.EmberENV" (truncated))
│ │ │ │ ├──56.11 MB (01.15%) ── malloc-heap/two-byte

It seems to be under the tab where the page is loaded. In the screencast, it looks the same as yours.
I just realized I forgot an important part yesterday (and you did too): we have to make a small change in a JS file before reloading, as I specified in the first comment.
I still have the issue. Multiple copies of JS files are kept when we change a file and reload with the dev tools open.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P2
Product: Firefox → DevTools
I tried again to reproduce the bug and it seems to be a little bit different now. To reproduce it, I have to open the file in the Debugger tab, and it seems to keep copies of old JS files. I think it needs less memory than before, but I suppose some garbage collection has to be performed. The file stays in memory even if I close it in the debugger.
Blocks: 1553178
No longer blocks: devtools-performance
Blocks: dt-leak
No longer blocks: 1553178

Don't keep old network events around after a tab navigation occurs.

Assignee: nobody → sorin.davidoi

Harald, could you think of any existing telemetry which may help assess the impact of this patch?
I'm pretty confident it will address serious memory leaks from DevTools, and it would be great to take an overview of the existing metrics to see if one highlights this fix. If one does pick this up, we may take it as a good candidate for the performance initiative.

I was wondering, for example, whether we have a metric about OOM crashes? If so, could we categorize the ones related to DevTools?
Or telemetry about memory usage, also categorized by DevTools usage?
Or the number of Firefox restarts per day/week? Or Firefox uptime for DevTools users?

Flags: needinfo?(hkirschner)

Not sure about OOM crashes, as we only get very little information from them. Bug 1297525 was filed for DevTools because of OOM, but we don't have the means to track those reliably over time (AFAIK).

The best we can do without additional instrumentation is to rely on existing probes. For leaks especially we can track our memory telemetry, but also GC-related metrics (a leak shows up as a long GC). Leaks slow down specific parts of GC and also cause hangs that are mostly spent in GC.

Preliminary chat with sfink about this yields:

For DevTools users:

  1. MEMORY_JS_GC_HEAP should improve.
  2. GC_MS should improve.
  3. GC_MARK_MS should improve.
  4. GC_SWEEP_MS should not change much as leaks prevent sweeping (maybe)

As you said (and Sorin brought it up as well), the open/close count for DevTools should improve if this has a big impact, since we have trained developers that DevTools leaks are fixed by closing the tools (this also comes up in feedback).

Most of these we should be able to plot in stmo (with some help from data science).

Flags: needinfo?(hkirschner)

It is a bit surprising, but DAMP doesn't highlight any improvement:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&newProject=try&newRevision=0f7e1a3c96ad5512e356006e3a19af287d75527c&originalSignature=1759151&newSignature=1759151&framework=12&originalRevision=f77921e426e801272780129a8cdff3121c3d6620
I would expect us to free some more memory between page reloads while running the netmonitor test, leading to fewer GC pauses in the following tests, but nothing is visible.
Note that the netmonitor doesn't have a stress-test document. We only test it against bild.de, which may not trigger enough requests?

Last time I checked, the DAMP version of bild.de seemed pretty lightweight by today's standards. That's why I chose Mastodon to test with (it's pretty heavyweight, with React, Redux, Immutable and tons of other libraries).

Agreed. An SPA would probably best represent a modern, heavy JS payload. Mastodon seems good; Riot and the Firefox Profiler would also be open-source alternatives. If we want to go towards content sites, we could use Discord and Reddit.

We can also continue that discussion in bug 1539195.

Note that bug 1615283 might help improve the test helpers. At least, when we navigate to a new URL, there is a shared helper that we can tweak.
But here we may also want to tweak initNetmonitor in all netmonitor tests, which use various wait methods for no clear reason.

Honza, could you please have a look at the comments on phabricator?
https://phabricator.services.mozilla.com/D54800#1815934
https://phabricator.services.mozilla.com/D54800#1906061
https://phabricator.services.mozilla.com/D54800#inline-377350

I believe that, in order to fix this issue and have the tests pass, we will need to review all the existing helpers around the netmonitor.
We have way too many different wait mechanisms:

  • waitForAllNetworkUpdateEvents
  • waitForNetworkEvents
  • waitForAllRequestsFinished
  • waitForRequestsFinished
  • and probably some others...

This patch highlights that some of them are weak, and waitForAllRequestsFinished looks slightly better.
Should we use that one everywhere? Would it work? Or is it weak against another edge case?
Why do we have so many different ways to wait for updates?

Then, once we pick the best way to wait for updates, we can go over the netmonitor tests and use the right one.
This may involve tweaking initNetmonitor and/or navigateTo.
It looks like we are explicitly avoiding checking the behavior during toolbox opening and always reloading the page instead.
This creates a lot of duplicated lines in each test; if we have to keep this, we should introduce a helper to do all of that in one function call (see the sketch below).
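For illustration, a minimal sketch of what such a single-call helper could look like. The wrapper name setupNetmonitorAndReload and its options are hypothetical, and it assumes the existing initNetmonitor, navigateTo and waitForAllRequestsFinished helpers keep roughly their current shape.

// Hypothetical helper, sketched only to show the suggested shape;
// this is not actual code from the tree.
async function setupNetmonitorAndReload(url, { requestCount } = {}) {
  // Open the toolbox on the netmonitor for the test document.
  const { tab, monitor } = await initNetmonitor(url, { requestCount });

  // Reload so that requests are recorded with the toolbox already open,
  // which is what most netmonitor tests currently do by hand.
  await navigateTo(url);
  await waitForAllRequestsFinished(monitor);

  return { tab, monitor };
}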

Flags: needinfo?(odvarko)

Could we maybe create another issue to track this refactoring of the tests? It seems to be a more systemic problem than what the revision tries to address.

Flags: needinfo?(poirot.alex)
Depends on: 1622043

(In reply to Sorin Davidoi from comment #22)

Could we maybe create another issue to track this refactoring of the tests? It seems to be a more systemic problem than what the revision tries to address.

Yes. Your patch raises a systemic issue which has to be addressed before we can fix this leak.

I opened bug 1622043 and will start submitting patches to it.

Flags: needinfo?(poirot.alex)

@ochameau Since the test issues are fixed, can you take another look at the patch?

Flags: needinfo?(poirot.alex)

I was looking at this yesterday and pushed to try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=75a94dd42d446e96e21309e64deb8e7553ed17f2

I am seeing some test failures:

  • devtools/client/netmonitor/test/browser_net_block-context.js
  • devtools/client/netmonitor/test/browser_net_block-csp.js
  • devtools/client/netmonitor/test/browser_net_block.js
  • devtools/client/netmonitor/test/browser_net_cause_source_map.js
  • devtools/client/netmonitor/test/browser_net_charts-**.js
  • devtools/client/netmonitor/src/har/test/browser_net_har_copy_all_as_har.js
  • devtools/client/framework/test/browser_ignore_toolbox_network_requests.js

I was doing some analysis of the Network Monitor backend, and I am hoping to also get more time to work on this.

Honza

Flags: needinfo?(odvarko)

I've rebased the patch. The test failures are still there, and I'm not sure how to fix them. I suspect that they are related to the test setup (probably similar issues to the ones @ochameau has fixed).

Cleaning up dt-leak dependencies

Summary: High RAM usage when the developer tool is open → [dt-leak] NetworkEvent actors leak on reload

I'm curious to know if the test failures in this bug highlighted a real issue with the fix or if they are only test artifacts.
I wonder if it would be acceptable to put this new behavior behind a pref so that we can toggle it off for the tests?
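If we go the pref route, a minimal sketch of the gating could look like this; the pref name "devtools.netmonitor.clear-on-navigation" is made up for illustration, and the sketch assumes the usual Services global available in DevTools modules.

// Hypothetical pref gate for the new clearing behavior; the pref name is
// illustrative only. Defaulting to true keeps the fix enabled for users
// while letting tests flip it off until their wait helpers are cleaned up.
const CLEAR_ON_NAV_PREF = "devtools.netmonitor.clear-on-navigation";

function shouldClearNetworkEventsOnNavigation() {
  return Services.prefs.getBoolPref(CLEAR_ON_NAV_PREF, true);
}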

The main challenge here is to still support the persist-logs feature of the netmonitor.
When this feature is enabled, we have to keep all the network events around, as before this patch.

The Fission refactoring and the migration to server-side NETWORK_EVENT watchers made
this fix significantly easier, as we can now more easily reach the parent-process
code which keeps things allocated.

Note that with the current patch we still leak one request: the navigation request,
which is slightly special as it is flagged with the previous document's innerWindowId
and is meant to be inspectable from the next document load.
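To illustrate the approach, here is a rough sketch of the idea only, not the actual patch: network event actors are grouped by the innerWindowId of the document that created them and dropped when that WindowGlobal is destroyed, except when persist-logs is enabled. All names below are illustrative.

// Illustrative sketch only; names and structure do not match the real patch.
const networkEventsByWindowId = new Map();
let persistLogs = false; // would come from the netmonitor's persist-logs setting

function onNetworkEventActorCreated(innerWindowId, actor) {
  // Remember which document (WindowGlobal) each network event belongs to.
  if (!networkEventsByWindowId.has(innerWindowId)) {
    networkEventsByWindowId.set(innerWindowId, new Set());
  }
  networkEventsByWindowId.get(innerWindowId).add(actor);
}

function onWindowGlobalDestroyed(innerWindowId) {
  // With persist-logs enabled, keep every request around as before the patch.
  if (persistLogs) {
    return;
  }
  const actors = networkEventsByWindowId.get(innerWindowId);
  if (!actors) {
    return;
  }
  for (const actor of actors) {
    actor.destroy();
  }
  networkEventsByWindowId.delete(innerWindowId);
}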

DAMP reports various small improvements (3% to 6%) on many tests:
https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=be53546902553b0e585b42f253d6ceb0f67876c8&originalSignature=3148563&newSignature=3130913&framework=12&selectedTimeRange=172800&page=1&showOnlyImportant=1
The most important improvements affect tests other than the netmonitor ones, which probably means that the leaked objects were slowing down the following tests.

Flags: needinfo?(poirot.alex)
Whiteboard: dt-perf-stability-triage
Depends on: 1728737
Pushed by apoirot@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c1ba43b65415 [devtools] Clear network events when their related WindowGlobal is destroyed. r=bomsy,devtools-backward-compat-reviewers,jdescottes

Surprisingly, I no longer see notable improvement reported by DAMP:
https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=try&newProject=try&newRevision=f08cb89717e5fb10f940ffc0ef6e84d596646fd6&originalSignature=3130913&newSignature=3130913&framework=12&originalRevision=37234b235e647e5814d4eb24fe19d873ad284f9f&page=1

Let's see if it impacts the metrics on mozilla-central.

But it does still fix browser_allocation_reload.js, which sees its number of leaks go down to 0 objects leaked with a stack (together with all the target front and markupview fixes [still in flight]).

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 94 Branch
Regressions: 1740116