Bug 1517758 (Closed): Opened 6 years ago, Closed 4 years ago

10.09 - 17.22% tp5o responsiveness / tp5o_webext responsiveness (linux64, windows7-32) regression on push 1b13a38c51e5640aa716d8d367c44e101c3835e3 (Thu Jan 3 2019)

Categories: Core :: DOM: Core & HTML, defect, P3

Tracking: RESOLVED WONTFIX
Performance Impact: low
Tracking Status:
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- wontfix

People: (Reporter: igoldan, Unassigned)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=1b13a38c51e5640aa716d8d367c44e101c3835e3

As the author of one of the patches included in that push, you can help us address this regression.

Regressions:

 17%  tp5o_webext responsiveness linux64 pgo e10s stylo         1.08 -> 1.27
 14%  tp5o_webext responsiveness linux64 opt e10s stylo         1.23 -> 1.40
 13%  tp5o responsiveness linux64 pgo e10s stylo                0.85 -> 0.96
 12%  tp5o responsiveness linux64 opt e10s stylo                0.97 -> 1.08
 10%  tp5o_webext responsiveness windows7-32 pgo e10s stylo     1.10 -> 1.21

Improvements:

 16%  about_preferences_basic osx-10-10 opt e10s stylo           220.13 -> 185.77
 15%  tabpaint linux64 pgo e10s stylo                            54.38 -> 46.08
 14%  about_preferences_basic linux64 opt e10s stylo             157.80 -> 135.86
 14%  about_preferences_basic windows10-64 opt e10s stylo        146.69 -> 126.76
 13%  about_preferences_basic windows10-64-qr opt e10s stylo     144.62 -> 125.14
 13%  about_preferences_basic windows7-32 opt e10s stylo         143.49 -> 124.65
 13%  about_preferences_basic windows7-32 pgo e10s stylo         130.54 -> 113.92
 13%  about_preferences_basic linux64-qr opt e10s stylo          161.41 -> 140.90
 12%  about_preferences_basic windows10-64 pgo e10s stylo        133.13 -> 116.78
 11%  tabpaint osx-10-10 opt e10s stylo                          76.97 -> 68.41
 10%  tps linux64 pgo e10s stylo                                 10.59 -> 9.50
 10%  about_preferences_basic linux64 pgo e10s stylo             142.45 -> 128.06
 10%  tps linux64-qr opt e10s stylo                              9.92 -> 8.94
  9%  tps windows10-64 opt e10s stylo                            11.05 -> 10.03
  9%  tp5o_webext windows10-64-qr opt e10s stylo                 192.73 -> 175.41
  9%  tp5o_webext linux64-qr opt e10s stylo                      211.73 -> 192.86
  9%  tp5o_webext osx-10-10 opt e10s stylo                       359.55 -> 327.77
  9%  tp5o_webext linux64 opt e10s stylo                         209.36 -> 191.03
  9%  tp5o_webext windows10-64 opt e10s stylo                    196.68 -> 179.88
  8%  tps linux64 opt e10s stylo                                 11.74 -> 10.75
  8%  tp5o_webext windows10-64 pgo e10s stylo                    177.19 -> 162.27
  8%  tp5o_webext windows7-32 pgo e10s stylo                     173.83 -> 159.35
  8%  tp5o_webext linux64 pgo e10s stylo                         186.43 -> 170.97
  7%  tp5o_webext windows7-32 opt e10s stylo                     192.61 -> 178.49
  6%  tart windows10-64-qr opt e10s stylo                        2.84 -> 2.67
  6%  tart linux64 opt e10s stylo                                1.79 -> 1.68
  5%  tart windows10-64 opt e10s stylo                           2.37 -> 2.24
  5%  tart linux64 pgo e10s stylo                                1.65 -> 1.57
  5%  sessionrestore linux64 opt e10s stylo                      328.42 -> 312.33
  4%  tart linux64-qr opt e10s stylo                             4.05 -> 3.88
  4%  sessionrestore_no_auto_restore linux64 opt e10s stylo      346.29 -> 332.25
  3%  sessionrestore windows10-64-qr opt e10s stylo              331.75 -> 322.00
  3%  sessionrestore_no_auto_restore linux64 pgo e10s stylo      316.58 -> 308.67
  3%  tp5o windows10-64-qr opt e10s stylo                        129.27 -> 126.04
  2%  sessionrestore windows7-32 pgo e10s stylo                  314.71 -> 307.08
  2%  tp5o linux64-qr opt e10s stylo                             144.12 -> 140.91
  2%  tp5o linux64 opt e10s stylo                                142.60 -> 139.74


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=18562

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling
Flags: needinfo?(jdemooij)
Product: Testing → Core
:jandem, if one of bug 1512029 or bug 1514210 isn't related to this issue, please remove it from the blocking list.
Hm I'm not sure what's going on with the responsiveness tests - it might be because we now have more stuff in the system zone and that could affect GC. I'll see if I can track this down somehow.

It's great to see a ton of improvements elsewhere.

Also some memory wins like https://treeherder.mozilla.org/perf.html#/alerts?id=18564 and devtools https://treeherder.mozilla.org/perf.html#/alerts?id=18569
We now have more delays that exceed the 20 ms threshold:

number of values before: 1089, 1106, 1134, 1073, 1116
number of values after:  1191, 1196, 1204, 1215, 1212

This results in a higher score. Maybe with more things in the system zone we now exceed the malloc threshold more quickly and we end up GC'ing more frequently? I'll try to collect some data on our GCs and GC reasons.
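
To make the scoring effect concrete, here is a minimal sketch of how a sum-of-squares style aggregation turns extra over-threshold samples into a higher score (illustrative Python, not the actual Talos code; the constant and values are made up to match the rough magnitudes above):

```python
# Illustrative sketch only: a responsiveness score that aggregates
# every reported delay sample. A run that reports more over-threshold
# delays scores higher even if the individual delays are unchanged.

def responsiveness_score(delays_ms):
    # Sum-of-squares aggregation, scaled down; the 1e6 divisor is
    # illustrative, not necessarily the real constant.
    return sum(d * d for d in delays_ms) / 1_000_000.0

before = [25.0] * 1100  # roughly the sample counts quoted above
after = [25.0] * 1200

print(responsiveness_score(before))  # ~0.69
print(responsiveness_score(after))   # ~0.75, higher purely from count
```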
Component: General → DOM
The "tp5o responsiveness" benchmark is flawed.

How it works: a background thread sends a trace event to the native event queue (GTK, Windows, etc) and then we measure the time it takes Gecko to handle it. If that's more than 20 ms, we report it and the harness does some calculations on these values to determine the final responsiveness score. (We also ensure we don't trigger more than one event in each 10 ms window.)
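
As a rough illustration of that mechanism, here is a small self-contained simulation (plain Python, not the actual Gecko/Talos code; every name and constant below is made up for the example):

```python
import queue
import threading
import time

REPORT_THRESHOLD_MS = 20.0  # delays above this get reported
TRACE_INTERVAL_S = 0.010    # at most one trace event per 10 ms window

event_queue = queue.Queue()  # stands in for the native (GTK/Windows) queue
reported_delays = []

def tracer(n):
    # Background thread: periodically post a timestamped trace event.
    for _ in range(n):
        event_queue.put(time.monotonic())
        time.sleep(TRACE_INTERVAL_S)

def main_loop(n):
    # "Gecko" side: handle events; the gap between posting and handling
    # is the responsiveness delay we care about.
    for _ in range(n):
        posted = event_queue.get()
        delay_ms = (time.monotonic() - posted) * 1000.0
        if delay_ms > REPORT_THRESHOLD_MS:
            reported_delays.append(delay_ms)
        time.sleep(0.005)  # simulated event-handling work

t = threading.Thread(target=tracer, args=(100,))
t.start()
main_loop(100)
t.join()
print(f"{len(reported_delays)} delays exceeded {REPORT_THRESHOLD_MS} ms")
```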

That's pretty reasonable, but the problem is that the benchmark harness triggers thousands of non-incremental GCs with windowUtils.garbageCollect(). So what we end up measuring is effectively non-incremental GC time: if non-incremental GCs get a bit slower, we end up with a higher score. Incremental GCs are what you want when it comes to responsiveness, and by forcing so many non-incremental GCs we're just making things hard for ourselves and skewing the results toward something non-representative.

If I make nsDOMWindowUtils::GarbageCollect a no-op, I get a score of 0.04 on Linux64 Opt on Try (compare to 0.97 -> 1.08 in comment 0).

---------------

Now why do my patches make non-incremental GCs slower? When we call windowUtils.garbageCollect(), we preserve JIT code and type information in that zone because it's active. That now means the whole system zone instead of just that window's zone. Tracing JIT code and ICs is a bit slower than just discarding them, hence GC becoming a bit slower. If I make GCRuntime::shouldPreserveJITCode always return false, I get a score of 0.69-0.73, better than before.
Flags: needinfo?(jdemooij)
> the benchmark harness triggers thousands of non-incremental GCs with windowUtils.garbageCollect()

Hmm.  Why is it doing that?  Do we know who owns this benchmark?
Flags: needinfo?(igoldan)
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #5)
> > the benchmark harness triggers thousands of non-incremental GCs with windowUtils.garbageCollect()
> 
> Hmm.  Why is it doing that?  Do we know who owns this benchmark?

https://wiki.mozilla.org/Buildbot/Talos/Tests#Responsiveness

jimm/jmaher.
Flags: needinfo?(igoldan) → needinfo?(jmaher)
(In reply to Jan de Mooij [:jandem] from comment #2)
> Hm I'm not sure what's going on with the responsiveness tests - it might be
> because we now have more stuff in the system zone and that could affect GC.
> I'll see if I can track this down somehow.
> 
> It's great to see a ton of improvements elsewhere.
> 
> Also some memory wins like
> https://treeherder.mozilla.org/perf.html#/alerts?id=18564 and devtools
> https://treeherder.mozilla.org/perf.html#/alerts?id=18569

These are the memory improvements:

== Change summary for alert #18564 (as of Thu, 03 Jan 2019 08:04:31 GMT) ==

Improvements:

  4%  JS windows10-64 pgo stylo                            115,798,435.37 -> 111,491,829.98
  3%  JS linux64-stylo-sequential opt stylo-sequential     114,951,746.13 -> 111,202,599.96
  3%  JS osx-10-10 opt stylo                               115,765,833.10 -> 112,130,792.75
  3%  JS windows10-64-qr opt stylo                         116,220,274.86 -> 112,752,668.66
  3%  JS linux64 opt stylo                                 114,829,338.43 -> 111,691,883.55
  2%  JS windows10-64 opt stylo                            115,806,784.04 -> 112,926,332.15
  2%  JS linux64-qr opt stylo                              114,443,632.16 -> 111,610,538.30
  2%  JS windows7-32 opt stylo                             85,857,378.69 -> 83,825,204.10

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=18564
It's probably worth noting that reducing the number of system compartments almost always leads to major performance and responsiveness improvements, so the GC issue is probably pretty serious if it's enough to show a regression.
Depends on: 1518109

I am not too familiar with what this actually measures.

We collect data from MOZ_EVENT_TRACES:
https://searchfox.org/mozilla-central/source/testing/talos/talos/results.py#295

then we filter the data:
https://searchfox.org/mozilla-central/source/testing/talos/talos/filter.py#253

I believe we emit an event trace message when there is a lag in getting an event through the event loop.
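
For illustration, the collection step boils down to pulling delay samples out of the browser log, roughly like this (a sketch; the MOZ_EVENT_TRACE line format assumed here is hypothetical, and the real parser is at the results.py link above):

```python
import re

# Hypothetical log line format: "MOZ_EVENT_TRACE sample <timestamp> <delay_ms>".
# See results.py (linked above) for the actual parsing.
TRACE_RE = re.compile(r"MOZ_EVENT_TRACE\s+sample\s+(\d+)\s+(\d+)")

def collect_delays(log_text):
    """Extract event-loop delay samples (ms) from a browser log."""
    return [int(m.group(2)) for m in TRACE_RE.finditer(log_text)]

log = ("MOZ_EVENT_TRACE sample 1546500000000 25\n"
       "MOZ_EVENT_TRACE sample 1546500000100 42\n")
print(collect_delays(log))  # [25, 42]
```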

Flags: needinfo?(jmaher)

I think the upshot is that you need to either stop doing forced GC in this test, or ignore all events that measure time across a GC you're forcing yourself.

Flags: needinfo?(jmaher)

:rwood, can you triage this and figure out how your team can either stop measuring events while we are doing a GC, or stop doing a GC? If we continue to do a GC, then we need markers before/after the GC, and we need to modify the log parser to skip responsiveness data between these markers.

Flags: needinfo?(jmaher) → needinfo?(rwood)

It's not exactly responsiveness data "between the markers". It's data that corresponds to any interval that started before the end of a GC and ended after the start of the same GC.

So depending on the structure of the setup, the data could be coming in after the "end GC" marker, for example.
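
In code, that condition is the standard interval-overlap test (a sketch with illustrative names; times in ms):

```python
def overlaps_gc(sample_start, sample_end, gc_start, gc_end):
    # Discard a delay sample if its interval overlaps the GC at all:
    # it started before the GC ended AND ended after the GC started.
    return sample_start < gc_end and sample_end > gc_start

# A sample reported after the "end GC" marker can still overlap the GC:
print(overlaps_gc(140, 170, 120, 150))  # True -> discard
# A sample entirely after the GC is unaffected:
print(overlaps_gc(160, 180, 120, 150))  # False -> keep
```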

Keywords: stalled

:jimm, you're listed as a contact for the tp5o responsiveness test. Could you read through the comments above and advise on how we might improve the value of the test?

Flags: needinfo?(rwood) → needinfo?(jmathies)
Component: DOM → DOM: Core & HTML

(In reply to Dave Hunt [:davehunt] [he/him] ⌚️UTC from comment #14)
> :jimm, you're listed as a contact for the [tp5o responsiveness](https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests#Responsiveness) test. Could you read through the comments above and advise on how we might improve the value of the test?

I don't have any feedback on this particular test. I'm fishing around for an owner though; will post back once I figure that out.

(In reply to Jim Mathies [:jimm] from comment #15)
> I don't have any feedback on this particular test. I'm fishing around for an owner though; will post back once I figure that out.

Jim, have you managed to find an owner for this test?

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #16)
> Jim, have you managed to find an owner for this test?

Jean Gong (Performance team) might have some ideas on what to do here; I think we're trying to develop better tests.

Flags: needinfo?(jmathies) → needinfo?(jgong)
Type: defect → enhancement
Type: enhancement → defect
Priority: -- → P3
Whiteboard: [qf]
Whiteboard: [qf] → [qf:p3:responsiveness]

Ionut, this is 2 years old. Could you please look into it and update the status?

Flags: needinfo?(igoldan)

There's not much we can do about a 2-year-old performance regression, but I've opened bug 1665600 for the test deficiency mentioned above.

Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(jgong)
Flags: needinfo?(igoldan)
Resolution: --- → WONTFIX

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit the auto_nag documentation.

Keywords: stalled
Performance Impact: --- → P3
Whiteboard: [qf:p3:responsiveness]