Bug 1517758 (Closed): Opened 6 years ago, Closed 4 years ago

10.09 - 17.22% tp5o responsiveness / tp5o_webext responsiveness (linux64, windows7-32) regression on push 1b13a38c51e5640aa716d8d367c44e101c3835e3 (Thu Jan 3 2019)

Categories: Core :: DOM: Core & HTML, defect, P3

Tracking: RESOLVED WONTFIX
Performance Impact: low
Tracking Status:
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- wontfix

People: (Reporter: igoldan, Unassigned)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=1b13a38c51e5640aa716d8d367c44e101c3835e3

As the author of one of the patches included in that push, you can help us address this regression.

Regressions:

 17%  tp5o_webext responsiveness linux64 pgo e10s stylo         1.08 -> 1.27
 14%  tp5o_webext responsiveness linux64 opt e10s stylo         1.23 -> 1.40
 13%  tp5o responsiveness linux64 pgo e10s stylo                0.85 -> 0.96
 12%  tp5o responsiveness linux64 opt e10s stylo                0.97 -> 1.08
 10%  tp5o_webext responsiveness windows7-32 pgo e10s stylo     1.10 -> 1.21

Improvements:

 16%  about_preferences_basic osx-10-10 opt e10s stylo           220.13 -> 185.77
 15%  tabpaint linux64 pgo e10s stylo                            54.38 -> 46.08
 14%  about_preferences_basic linux64 opt e10s stylo             157.80 -> 135.86
 14%  about_preferences_basic windows10-64 opt e10s stylo        146.69 -> 126.76
 13%  about_preferences_basic windows10-64-qr opt e10s stylo     144.62 -> 125.14
 13%  about_preferences_basic windows7-32 opt e10s stylo         143.49 -> 124.65
 13%  about_preferences_basic windows7-32 pgo e10s stylo         130.54 -> 113.92
 13%  about_preferences_basic linux64-qr opt e10s stylo          161.41 -> 140.90
 12%  about_preferences_basic windows10-64 pgo e10s stylo        133.13 -> 116.78
 11%  tabpaint osx-10-10 opt e10s stylo                          76.97 -> 68.41
 10%  tps linux64 pgo e10s stylo                                 10.59 -> 9.50
 10%  about_preferences_basic linux64 pgo e10s stylo             142.45 -> 128.06
 10%  tps linux64-qr opt e10s stylo                              9.92 -> 8.94
  9%  tps windows10-64 opt e10s stylo                            11.05 -> 10.03
  9%  tp5o_webext windows10-64-qr opt e10s stylo                 192.73 -> 175.41
  9%  tp5o_webext linux64-qr opt e10s stylo                      211.73 -> 192.86
  9%  tp5o_webext osx-10-10 opt e10s stylo                       359.55 -> 327.77
  9%  tp5o_webext linux64 opt e10s stylo                         209.36 -> 191.03
  9%  tp5o_webext windows10-64 opt e10s stylo                    196.68 -> 179.88
  8%  tps linux64 opt e10s stylo                                 11.74 -> 10.75
  8%  tp5o_webext windows10-64 pgo e10s stylo                    177.19 -> 162.27
  8%  tp5o_webext windows7-32 pgo e10s stylo                     173.83 -> 159.35
  8%  tp5o_webext linux64 pgo e10s stylo                         186.43 -> 170.97
  7%  tp5o_webext windows7-32 opt e10s stylo                     192.61 -> 178.49
  6%  tart windows10-64-qr opt e10s stylo                        2.84 -> 2.67
  6%  tart linux64 opt e10s stylo                                1.79 -> 1.68
  5%  tart windows10-64 opt e10s stylo                           2.37 -> 2.24
  5%  tart linux64 pgo e10s stylo                                1.65 -> 1.57
  5%  sessionrestore linux64 opt e10s stylo                      328.42 -> 312.33
  4%  tart linux64-qr opt e10s stylo                             4.05 -> 3.88
  4%  sessionrestore_no_auto_restore linux64 opt e10s stylo      346.29 -> 332.25
  3%  sessionrestore windows10-64-qr opt e10s stylo              331.75 -> 322.00
  3%  sessionrestore_no_auto_restore linux64 pgo e10s stylo      316.58 -> 308.67
  3%  tp5o windows10-64-qr opt e10s stylo                        129.27 -> 126.04
  2%  sessionrestore windows7-32 pgo e10s stylo                  314.71 -> 307.08
  2%  tp5o linux64-qr opt e10s stylo                             144.12 -> 140.91
  2%  tp5o linux64 opt e10s stylo                                142.60 -> 139.74


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=18562

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling
Flags: needinfo?(jdemooij)
Product: Testing → Core
:jandem, if one of bug 1512029 or bug 1514210 isn't related to this issue, please remove it from the blocking list.
Hm I'm not sure what's going on with the responsiveness tests - it might be because we now have more stuff in the system zone and that could affect GC. I'll see if I can track this down somehow.

It's great to see a ton of improvements elsewhere.

Also some memory wins like https://treeherder.mozilla.org/perf.html#/alerts?id=18564 and devtools https://treeherder.mozilla.org/perf.html#/alerts?id=18569
We now have more delays that exceed the 20 ms threshold:

number of values before: 1089, 1106, 1134, 1073, 1116
number of values after:  1191, 1196, 1204, 1215, 1212

This results in a higher score. Maybe with more things in the system zone we now exceed the malloc threshold more quickly and we end up GC'ing more frequently? I'll try to collect some data on our GCs and GC reasons.
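
To make the scoring effect concrete, here is a minimal sketch of how a sum-of-squares style aggregation turns extra over-threshold samples into a higher score (illustrative Python, not the actual Talos code; the constant and values are made up to match the rough magnitudes above):

```python
# Illustrative sketch only: a responsiveness score that aggregates
# every reported delay sample. A run that reports more over-threshold
# delays scores higher even if the individual delays are unchanged.

def responsiveness_score(delays_ms):
    # Sum-of-squares aggregation, scaled down; the 1e6 divisor is
    # illustrative, not necessarily the real constant.
    return sum(d * d for d in delays_ms) / 1_000_000.0

before = [25.0] * 1100  # roughly the sample counts quoted above
after = [25.0] * 1200

print(responsiveness_score(before))  # ~0.69
print(responsiveness_score(after))   # ~0.75, higher purely from count
```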
Component: General → DOM
The "tp5o responsiveness" benchmark is flawed.

How it works: a background thread sends a trace event to the native event queue (GTK, Windows, etc) and then we measure the time it takes Gecko to handle it. If that's more than 20 ms, we report it and the harness does some calculations on these values to determine the final responsiveness score. (We also ensure we don't trigger more than one event in each 10 ms window.)
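
As a rough illustration of that mechanism, here is a small self-contained simulation (plain Python, not the actual Gecko/Talos code; every name and constant below is made up for the example):

```python
import queue
import threading
import time

REPORT_THRESHOLD_MS = 20.0  # delays above this get reported
TRACE_INTERVAL_S = 0.010    # at most one trace event per 10 ms window

event_queue = queue.Queue()  # stands in for the native (GTK/Windows) queue
reported_delays = []

def tracer(n):
    # Background thread: periodically post a timestamped trace event.
    for _ in range(n):
        event_queue.put(time.monotonic())
        time.sleep(TRACE_INTERVAL_S)

def main_loop(n):
    # "Gecko" side: handle events; the gap between posting and handling
    # is the responsiveness delay we care about.
    for _ in range(n):
        posted = event_queue.get()
        delay_ms = (time.monotonic() - posted) * 1000.0
        if delay_ms > REPORT_THRESHOLD_MS:
            reported_delays.append(delay_ms)
        time.sleep(0.005)  # simulated event-handling work

t = threading.Thread(target=tracer, args=(100,))
t.start()
main_loop(100)
t.join()
print(f"{len(reported_delays)} delays exceeded {REPORT_THRESHOLD_MS} ms")
```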

That's pretty reasonable, but the problem is that the benchmark harness triggers thousands of non-incremental GCs with windowUtils.garbageCollect(). So what we end up measuring is effectively non-incremental GC time: if non-incremental GCs get a bit slower, we end up with a higher score. Incremental GCs are what you want when it comes to responsiveness, and by forcing so many non-incremental GCs we're just making things hard for ourselves and skewing the results toward something non-representative.

If I make nsDOMWindowUtils::GarbageCollect a no-op, I get a score of 0.04 on Linux64 Opt on Try (compare to 0.97 -> 1.08 in comment 0).

---------------

Now why do my patches make non-incremental GCs slower? When we call windowUtils.garbageCollect(), we preserve JIT code and type information in that zone because it's active. That now means the whole system zone instead of just that window's zone. Tracing JIT code and ICs is a bit slower than just discarding them, hence GC becoming a bit slower. If I make GCRuntime::shouldPreserveJITCode always return false, I get a score of 0.69-0.73, better than before.
Flags: needinfo?(jdemooij)
> the benchmark harness triggers thousands of non-incremental GCs with windowUtils.garbageCollect()

Hmm.  Why is it doing that?  Do we know who owns this benchmark?
Flags: needinfo?(igoldan)
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #5)
> > the benchmark harness triggers thousands of non-incremental GCs with windowUtils.garbageCollect()
> 
> Hmm.  Why is it doing that?  Do we know who owns this benchmark?

https://wiki.mozilla.org/Buildbot/Talos/Tests#Responsiveness

jimm/jmaher.
Flags: needinfo?(igoldan) → needinfo?(jmaher)
(In reply to Jan de Mooij [:jandem] from comment #2)
> Hm I'm not sure what's going on with the responsiveness tests - it might be
> because we now have more stuff in the system zone and that could affect GC.
> I'll see if I can track this down somehow.
> 
> It's great to see a ton of improvements elsewhere.
> 
> Also some memory wins like
> https://treeherder.mozilla.org/perf.html#/alerts?id=18564 and devtools
> https://treeherder.mozilla.org/perf.html#/alerts?id=18569

These are the memory improvements:

== Change summary for alert #18564 (as of Thu, 03 Jan 2019 08:04:31 GMT) ==

Improvements:

  4%  JS windows10-64 pgo stylo                            115,798,435.37 -> 111,491,829.98
  3%  JS linux64-stylo-sequential opt stylo-sequential     114,951,746.13 -> 111,202,599.96
  3%  JS osx-10-10 opt stylo                               115,765,833.10 -> 112,130,792.75
  3%  JS windows10-64-qr opt stylo                         116,220,274.86 -> 112,752,668.66
  3%  JS linux64 opt stylo                                 114,829,338.43 -> 111,691,883.55
  2%  JS windows10-64 opt stylo                            115,806,784.04 -> 112,926,332.15
  2%  JS linux64-qr opt stylo                              114,443,632.16 -> 111,610,538.30
  2%  JS windows7-32 opt stylo                             85,857,378.69 -> 83,825,204.10

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=18564
It's probably worth noting that reducing the number of system compartments almost always leads to major performance and responsiveness improvements, so the GC issue is probably pretty serious if it's enough to show a regression.
Depends on: 1518109

I am not too familiar with what this actually measures.

We collect data from MOZ_EVENT_TRACES:
https://searchfox.org/mozilla-central/source/testing/talos/talos/results.py#295

then we filter the data:
https://searchfox.org/mozilla-central/source/testing/talos/talos/filter.py#253

I believe we emit an event trace message when there is a lag in getting an event through the event loop.
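
For illustration, the collection step boils down to pulling delay samples out of the browser log, roughly like this (a sketch; the MOZ_EVENT_TRACE line format assumed here is hypothetical, and the real parser is at the results.py link above):

```python
import re

# Hypothetical log line format: "MOZ_EVENT_TRACE sample <timestamp> <delay_ms>".
# See results.py (linked above) for the actual parsing.
TRACE_RE = re.compile(r"MOZ_EVENT_TRACE\s+sample\s+(\d+)\s+(\d+)")

def collect_delays(log_text):
    """Extract event-loop delay samples (ms) from a browser log."""
    return [int(m.group(2)) for m in TRACE_RE.finditer(log_text)]

log = ("MOZ_EVENT_TRACE sample 1546500000000 25\n"
       "MOZ_EVENT_TRACE sample 1546500000100 42\n")
print(collect_delays(log))  # [25, 42]
```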

Flags: needinfo?(jmaher)

I think the upshot is that you need to either stop doing forced GC in this test, or ignore all events that measure time across a GC you're forcing yourself.

Flags: needinfo?(jmaher)

:rwood, can you triage this and figure out how your team can either stop measuring events while we are doing a GC, or stop doing a GC? If we continue to do a GC, then we need markers before/after the GC, and we need to modify the log parser to skip responsiveness data between these markers.

Flags: needinfo?(jmaher) → needinfo?(rwood)

It's not exactly responsiveness data "between the markers". It's data that corresponds to any interval that started before the end of a GC and ended after the start of the same GC.

So depending on the structure of the setup, the data could be coming in after the "end GC" marker, for example.
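
In code, that condition is the standard interval-overlap test (a sketch with illustrative names; times in ms):

```python
def overlaps_gc(sample_start, sample_end, gc_start, gc_end):
    # Discard a delay sample if its interval overlaps the GC at all:
    # it started before the GC ended AND ended after the GC started.
    return sample_start < gc_end and sample_end > gc_start

# A sample reported after the "end GC" marker can still overlap the GC:
print(overlaps_gc(140, 170, 120, 150))  # True -> discard
# A sample entirely after the GC is unaffected:
print(overlaps_gc(160, 180, 120, 150))  # False -> keep
```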

Keywords: stalled

:jimm, you're listed as a contact for the tp5o responsiveness test. Could you read through the comments above and advise on how we might improve the value of the test?

Flags: needinfo?(rwood) → needinfo?(jmathies)
Component: DOM → DOM: Core & HTML

(In reply to Dave Hunt [:davehunt] [he/him] ⌚️UTC from comment #14)
> :jimm, you're listed as a contact for the [tp5o responsiveness](https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests#Responsiveness) test. Could you read through the comments above and advise on how we might improve the value of the test?

I don't have any feedback on this particular test. I'm fishing around for an owner though; will post back once I figure that out.

(In reply to Jim Mathies [:jimm] from comment #15)
> I don't have any feedback on this particular test. I'm fishing around for an owner though; will post back once I figure that out.

Jim, have you managed to find an owner for this test?

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #16)
> Jim, have you managed to find an owner for this test?

Jean Gong (Performance team) might have some ideas on what to do here; I think we're trying to develop better tests.

Flags: needinfo?(jmathies) → needinfo?(jgong)
Type: defect → enhancement
Type: enhancement → defect
Priority: -- → P3
Whiteboard: [qf]
Whiteboard: [qf] → [qf:p3:responsiveness]

Ionut, this is 2 years old. Could you please look into it and update the status?

Flags: needinfo?(igoldan)

There's not much we can do about a 2-year-old performance regression, but I've opened bug 1665600 for the test deficiency mentioned above.

Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(jgong)
Flags: needinfo?(igoldan)
Resolution: --- → WONTFIX

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit the auto_nag documentation.

Keywords: stalled
Performance Impact: --- → P3
Whiteboard: [qf:p3:responsiveness]