Closed Bug 1708191 Opened 4 years ago Closed 3 years ago

4.46 - 4.12% JS / JS + 3 more (Linux, OSX, Windows) regression on Mon April 26 2021

Categories

(Remote Protocol :: Marionette, defect, P3)

defect

Tracking

(firefox-esr78 unaffected, firefox88 unaffected, firefox89 unaffected, firefox90 fixed)

RESOLVED FIXED
90 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox88 --- unaffected
firefox89 --- unaffected
firefox90 --- fixed

People

(Reporter: aesanu, Assigned: whimboo)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Attachments

(1 file)

Perfherder has detected a awsy performance regression from push 15253a3168d2f2543a5348d47157f033995e928b. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
4% JS linux1804-64-shippable-qr 86,352,814.41 -> 90,201,978.25
4% JS windows10-64-shippable 88,631,724.23 -> 92,394,280.47
4% JS windows10-64-shippable-qr 88,350,715.80 -> 92,046,253.79
4% JS macosx1015-64-shippable-qr 89,191,093.00 -> 92,884,248.29
4% JS linux1804-64-shippable 87,513,973.68 -> 91,117,985.62

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
0.47% Base Content JS macosx1015-64-shippable-qr 2,445,646.67 -> 2,434,209.33

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(hskupin)

There are 5 changesets that got landed:

https://hg.mozilla.org/integration/autoland/rev/9ffdbe61355a
[marionette] Refactor modal dialog unit tests. r=marionette-reviewers,jdescottes

That one is not related because it only refactors the Python tests.

https://hg.mozilla.org/integration/autoland/rev/6d2c8ff216fc
[marionette] Use GeckoDriver's dialog handler to inform MarionetteCommands parent actor about a new user prompt. r=marionette-reviewers,jdescottes
https://hg.mozilla.org/integration/autoland/rev/e352f905ecf8
[marionette] Use GeckoDriver's dialog observer when awaiting open dialog to be closed. r=marionette-reviewers,jdescottes
https://hg.mozilla.org/integration/autoland/rev/1cddea72d353
[marionette] Switch To Window has to check for open user prompts. r=marionette-reviewers,jdescottes,webdriver-reviewers
https://hg.mozilla.org/integration/autoland/rev/15253a3168d2
[marionette] Only handle user prompts from the currently selected tab. r=marionette-reviewers,webdriver-reviewers,Gijs,jdescottes

For those I did some try builds:

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #1)

For those I did some try builds:

Here the results for JS Tabs closed opt:

That means that the memory increase for JS got triggered by the following changeset and in particular the single line change for WebDriver:SwitchToWindow in driver.js:

+  this.dialog = modal.findModalDialogs(this.curBrowser);

Given that AWSY is opening a lot of tabs and switching through these I assume that we started to leak some memory.

Set release status flags based on info from the regressing bug 1701686

Blocks: 1709321

Running the AWSY tests locally I cannot see any dialog that gets opened, and are also found. As such I pushed a try build that enables the Marionette trace logs:

https://treeherder.mozilla.org/jobs?repo=try&revision=2a46d1ee00c503f7f59c29a821ee5333f899c8e1

Maybe that gives an indication which page actually opened a dialog that WebDriver:SwitchToWindow uses to create a dialog instance for.

As the gecko log shows there was not a single user prompt opened (missing Found open string):
https://firefoxci.taskcluster-artifacts.net/e7xqYUduQGmPeXLbItpf3w/0/public/test_info//gecko.log

I will partially comment out code from findModalDialogs to check which specific check for window, tab, or content modals triggers the leak.

Indeed, disabling the usage of various dialog and prompt managers when checking for open dialogs caused the memory usage to drop to the original level:

https://treeherder.mozilla.org/perfherder/graphs?series=try,2203896,1,4&selected=2203896,1356774189

That means that there are leaks in all of the following implementations:

  • tabDialogBox.getTabDialogManager().dialogs
  • tabDialogBox.getContentDialogManager().dialogs
  • promptBox.listPrompts()

I'll create a small testcase with Marionette that could be used with the gecko profiler to actually create a profile with JS allocation.

Flags: needinfo?(hskupin)

The problematic line here seems to actually be:

const tabDialogBox = context.tabBrowser.getTabDialogBox(contentBrowser);

Not sure if that is just a new baseline that we should accept. On Monday I will check if the memory usage actually grows when switching over and over between windows.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #6)

That means that there are leaks in all of the following implementations:

  • tabDialogBox.getTabDialogManager().dialogs
  • tabDialogBox.getContentDialogManager().dialogs
  • promptBox.listPrompts()

That's a pretty strident assertion. Why do you think so?

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #7)

The problematic line here seems to actually be:

const tabDialogBox = context.tabBrowser.getTabDialogBox(contentBrowser);

Not sure if that is just a new baseline that we should accept. On Monday I will check if the memory usage actually grows when switching over and over between windows.

We should not; this creates a tabdialogbox instance and all the associated markup, will load JSMs where necessary, etc. If marionette is now invoking this all the time, that is not good, and we should either create and use a getter that just returns null if no dialogs have been created, or teach marionette not to invoke whatever code is calling into this willy-nilly.

Why are we running this code, if there are no dialogs?

Flags: needinfo?(hskupin)

Actually by calling getTabDialogBox() and getTabModalPromptBox() we unnecessarily create those two boxes even if no dialog is present:
https://searchfox.org/mozilla-central/rev/a8773ba2a8f015e0525f219a7e55087b04d30258/browser/base/content/tabbrowser.js#863-877

We should avoid that and directly check for contentBrowser.tabDialogBox and contentBrowser.tabModalPromptBox. Locally this brings us back to the memory usage from before. I'll upload the patch and trigger a try build.

Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Priority: -- → P3

Calling "getTabDialogBox()" and "getTabModalPromptBox()"
unnecessarily creates new instances for
"TabModalPromptBox" and "TabDialogBox". To avoid that
directly access "tabDialogBox" and "tabModalPromptBox"
on the content browser.

Sorry, had a mid-air collision but sent my last comment anyway before.

(In reply to :Gijs (he/him) from comment #8)

We should not; this creates a tabdialogbox instance and all the associated markup, will load JSMs where necessary, etc. If marionette is now invoking this all the time, that is not good, and we should either create and use a getter that just returns null if no dialogs have been created, or teach marionette not to invoke whatever code is calling into this willy-nilly.

Why are we running this code, if there are no dialogs?

When switching windows we have to check for possible open dialogs. The patch that I'm going to upload will get rid of exactly these problems.

The causing code has been added by bug 1348872 for Firefox 53, but we actually never hit this code path so far, except for the very first browser where we called that method when creating a new Marionette session.

Flags: needinfo?(hskupin)
Regressed by: 1348872
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a064f5229916 [marionette] Reduce memory footprint when checking for open user prompts. r=marionette-reviewers,Gijs
Pushed by imoraru@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d9a7299c306e fixed eslint failures. a=lint-fix CLOSED TREE
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 90 Branch

== Change summary for alert #30069 (as of Thu, 13 May 2021 10:54:24 GMT) ==

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
8% JS macosx1015-64-shippable-qr 97,096,869.89 -> 89,409,004.18
8% JS linux1804-64-shippable 95,377,853.14 -> 87,869,165.14
8% JS linux1804-64-shippable-qr 94,752,239.59 -> 87,349,702.12
8% JS macosx1015-64-shippable 98,371,932.31 -> 90,964,059.03
7% JS windows10-64-shippable 95,941,470.56 -> 88,910,122.27
... ... ... ... ... ...
2% Resident Memory linux1804-64-shippable 624,308,218.83 -> 610,863,461.13

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=30069

Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: