Closed Bug 1521080 Opened 6 years ago Closed 6 years ago

1.08% raptor-assorted-dom-firefox (linux64) regression on push 560290f263515bb7c7bb834cde8f7d64d5bdde54 (Thu Jan 17 2019)

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: Bebe, Unassigned)

References

Details

(Keywords: perf, regression)

Raptor has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=560290f263515bb7c7bb834cde8f7d64d5bdde54

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

1% raptor-assorted-dom-firefox linux64 pgo 25.95 -> 26.23

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=18792

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a Treeherder page showing the Raptor jobs in a pushlog format.

To learn more about the regressing test(s) or reproducing them, please see: https://wiki.mozilla.org/Performance_sheriffing/Raptor

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling

Blocks: 1508521, 1513237
Component: General → JavaScript Engine
Product: Testing → Core
Version: Version 3 → unspecified

Here's some data on subtests:

https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=autoland&newRevision=560290f263515bb7c7bb834cde8f7d64d5bdde54&originalSignature=1762680&newSignature=1801039&framework=10&selectedTimeRange=172800

For worker-getname-performance-getter I think we just load this in a worker:

https://github.com/mozilla/perf-automation/blob/master/benchmarks/assorted-dom/assorted/getname-performance-getter-helper.js

The "performance" getter is likely loop-hoisted or eliminated, so we probably have an empty loop where the increment operator is generating worse code somehow. Maybe we can try to repro in the shell with a similar script first.

Can one of you look at this?

Flags: needinfo?(wingo)
Flags: needinfo?(robin)

i'll work on this today and see if i can reproduce the regression in the js shell. thanks for looking at the actual test script, starting with that should make it easier to diagnose the problem

Flags: needinfo?(robin)

I was unable to repro given the getname-performance-getter-helper.js and the shell; not sure why. The assembly generated is the exact same modulo constant values. Still working on this.

Flags: needinfo?(wingo)

This one is a bit odd. As I said the assembly generated is the same, and they tier up after the same number of iterations with --ion-offthread-compile=off. However with off-thread Ion compilation, there is a difference in terms of block visit counts. To wit:

$ cat /tmp/test.js
(function() {
  var count = 1000000;
  var holder;
  var start = performance.now();
  for (var i = 0; i < count; ++i)
      holder = performance;
  var end = performance.now();
  print(end - start);
})();

$ for i in `seq 20`; do before/js -D /tmp/test.js | grep '#4 -> #5'; done
BB #3 [00066,5,18] -> #4 -> #5 :: 872633 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 886288 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 891551 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 890249 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 898373 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 887998 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 888869 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 892613 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 896578 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 892053 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 916841 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 896787 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 890668 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 887440 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 898019 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 898912 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 892518 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 891104 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 898216 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 888739 hits

$ for i in `seq 20`; do after/js -D /tmp/test.js | grep '#4 -> #5'; done
BB #3 [00065,5,18] -> #4 -> #5 :: 869775 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 872655 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 873440 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 878849 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 876378 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 881312 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 872953 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 868985 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 871452 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 872265 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 884143 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 873623 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 875129 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 876356 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 875595 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 875652 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 878249 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 874992 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 871252 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 868012 hits

I am not quite sure why this is. Given that the benchmark is really small, it could be small things like this affect the results. Jan do you have any idea?

Flags: needinfo?(jdemooij)

(Obviously the difference about 65 vs 66 is that the "after" case does a JSOP_INC instead of JSOP_INT8(1) + JSOP_ADD, leading to a bytecode sequence that's shorter by 1)

This test appears to be strongly sensitive to multiprocessing topologies. E.g.:

$ for cores in 4 4-7; do
    echo -e "\n*** taskset -c $cores ***"
    taskset -c $cores bash -c 'for offthread in off on; do
        for which in before after; do
            ~/src/multitime/multitime -s 0 -n 100 -q $which/js --ion-offthread-compile=$offthread /tmp/test.js
        done
    done';
done

*** taskset -c 4 ***
===> multitime results
1: -q before/js --ion-offthread-compile=off /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.023+/-0.0005      0.002       0.022       0.022       0.036       
user        0.018+/-0.0009      0.004       0.008       0.018       0.024       
sys         0.005+/-0.0010      0.004       0.000       0.004       0.014       
===> multitime results
1: -q after/js --ion-offthread-compile=off /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.023+/-0.0002      0.001       0.022       0.023       0.028       
user        0.018+/-0.0010      0.004       0.008       0.018       0.025       
sys         0.005+/-0.0010      0.004       0.000       0.004       0.015       
===> multitime results
1: -q before/js --ion-offthread-compile=on /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.024+/-0.0002      0.001       0.024       0.024       0.028       
user        0.021+/-0.0009      0.003       0.008       0.020       0.026       
sys         0.003+/-0.0009      0.004       0.000       0.004       0.016       
===> multitime results
1: -q after/js --ion-offthread-compile=on /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.025+/-0.0004      0.002       0.024       0.024       0.032       
user        0.023+/-0.0007      0.003       0.014       0.024       0.031       
sys         0.002+/-0.0007      0.003       0.000       0.000       0.010       

*** taskset -c 4-7 ***
===> multitime results
1: -q before/js --ion-offthread-compile=off /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.037+/-0.0006      0.003       0.031       0.038       0.042       
user        0.028+/-0.0013      0.005       0.016       0.029       0.038       
sys         0.011+/-0.0014      0.005       0.000       0.010       0.026       
===> multitime results
1: -q after/js --ion-offthread-compile=off /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.038+/-0.0006      0.002       0.033       0.039       0.042       
user        0.028+/-0.0012      0.004       0.017       0.029       0.040       
sys         0.012+/-0.0012      0.005       0.000       0.012       0.025       
===> multitime results
1: -q before/js --ion-offthread-compile=on /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.038+/-0.0006      0.002       0.033       0.038       0.041       
user        0.030+/-0.0014      0.005       0.014       0.030       0.044       
sys         0.011+/-0.0014      0.005       0.000       0.010       0.030       
===> multitime results
1: -q after/js --ion-offthread-compile=on /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.038+/-0.0005      0.002       0.034       0.039       0.041       
user        0.031+/-0.0013      0.005       0.017       0.031       0.041       
sys         0.011+/-0.0013      0.005       0.000       0.012       0.026       

This is using https://tratt.net/laurie/src/multitime/ to time the runs. This is on a dual Xeon Silver 4114 machine (each socket with 10 cores, 2 threads per core -> 40 logical cores total).

Anyway, adding more cores slows down the test. I guess it's not Ion though (?), as we tested ofthread compile off and on and I got the same results; even --no-threads doesn't help. Weird test.

Oh yay microbenchmarking hell! So by default multitime will sleep for 3 seconds between individual runs. I got impatient and turned that off above. Those results do correspond to my manual tests using time and bash for loops. But if I enable the sleeping, I get very little difference between the multicore and single-core case, and no difference between before and after or offthread compile off and on. Truly perplexing.

$ for cores in 4 4-7; do     echo -e "\n*** taskset -c $cores ***";     taskset -c $cores bash -c 'for offthread in off on; do
        for which in before after; do
            ~/src/multitime/multitime -s 3 -n 10 -q $which/js --ion-offthread-compile=$offthread /tmp/test.js
        done
    done'; done

*** taskset -c 4 ***
===> multitime results
1: -q before/js --ion-offthread-compile=off /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.039+/-0.0007      0.001       0.038       0.039       0.040       
user        0.029+/-0.0049      0.005       0.022       0.030       0.037       
sys         0.009+/-0.0050      0.005       0.001       0.008       0.017       
===> multitime results
1: -q after/js --ion-offthread-compile=off /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.039+/-0.0023      0.002       0.032       0.039       0.041       
user        0.027+/-0.0056      0.006       0.019       0.027       0.037       
sys         0.012+/-0.0046      0.005       0.003       0.013       0.020       
===> multitime results
1: -q before/js --ion-offthread-compile=on /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.038+/-0.0024      0.002       0.032       0.039       0.040       
user        0.026+/-0.0054      0.005       0.016       0.027       0.035       
sys         0.012+/-0.0054      0.005       0.004       0.012       0.024       
===> multitime results
1: -q after/js --ion-offthread-compile=on /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.039+/-0.0022      0.002       0.032       0.039       0.040       
user        0.025+/-0.0047      0.005       0.017       0.025       0.033       
sys         0.013+/-0.0040      0.004       0.007       0.014       0.020       

*** taskset -c 4-7 ***
===> multitime results
1: -q before/js --ion-offthread-compile=off /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.041+/-0.0015      0.002       0.038       0.041       0.044       
user        0.029+/-0.0060      0.006       0.024       0.026       0.038       
sys         0.014+/-0.0066      0.007       0.004       0.016       0.020       
===> multitime results
1: -q after/js --ion-offthread-compile=off /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.042+/-0.0012      0.001       0.040       0.041       0.044       
user        0.030+/-0.0039      0.004       0.024       0.032       0.035       
sys         0.013+/-0.0044      0.004       0.008       0.013       0.020       
===> multitime results
1: -q before/js --ion-offthread-compile=on /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.042+/-0.0014      0.001       0.038       0.042       0.043       
user        0.031+/-0.0049      0.005       0.025       0.030       0.042       
sys         0.014+/-0.0051      0.005       0.005       0.016       0.020       
===> multitime results
1: -q after/js --ion-offthread-compile=on /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.043+/-0.0014      0.001       0.040       0.043       0.046       
user        0.029+/-0.0061      0.006       0.018       0.030       0.039       
sys         0.016+/-0.0061      0.006       0.007       0.016       0.028       

(In reply to Andy Wingo [:wingo] from comment #4)

$ cat /tmp/test.js
(function() {
var count = 1000000;
var holder;
var start = performance.now();
for (var i = 0; i < count; ++i)
holder = performance;
var end = performance.now();
print(end - start);
})();

What happens if you remove the function and put the code in the global scope? Then we access global properties instead of locals so it affects the code we generate.

Flags: needinfo?(jdemooij)

Tx for feedback. Here's the same code in global scope:

$ cat /tmp/test.js
var count = 1000000;
var holder;
var start = performance.now();
for (var i = 0; i < count; ++i)
    holder = performance;
var end = performance.now();
print(end - start);

$ for cores in 4 4-7; do     echo -e "\n*** taskset -c $cores ***";     taskset -c $cores bash -c 'for offthread in off on; do
        for which in before after; do
            ~/src/multitime/multitime -s 3 -n 10 -q $which/js --ion-offthread-compile=$offthread /tmp/test.js
        done
    done'; done

*** taskset -c 4 ***
===> multitime results
1: -q before/js --ion-offthread-compile=off /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.043+/-0.0032      0.003       0.037       0.042       0.051       
user        0.031+/-0.0058      0.006       0.024       0.030       0.041       
sys         0.012+/-0.0055      0.006       0.000       0.013       0.017       
===> multitime results
1: -q after/js --ion-offthread-compile=off /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.042+/-0.0035      0.004       0.035       0.042       0.048       
user        0.030+/-0.0064      0.006       0.023       0.028       0.043       
sys         0.012+/-0.0056      0.006       0.004       0.010       0.020       
===> multitime results
1: -q before/js --ion-offthread-compile=on /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.041+/-0.0016      0.002       0.038       0.041       0.044       
user        0.029+/-0.0057      0.006       0.020       0.032       0.036       
sys         0.011+/-0.0065      0.007       0.004       0.008       0.023       
===> multitime results
1: -q after/js --ion-offthread-compile=on /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.041+/-0.0030      0.003       0.035       0.041       0.046       
user        0.033+/-0.0048      0.005       0.024       0.033       0.043       
sys         0.008+/-0.0039      0.004       0.000       0.008       0.013       

*** taskset -c 4-7 ***
===> multitime results
1: -q before/js --ion-offthread-compile=off /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.042+/-0.0022      0.002       0.037       0.042       0.045       
user        0.033+/-0.0045      0.005       0.025       0.034       0.040       
sys         0.010+/-0.0050      0.005       0.004       0.010       0.019       
===> multitime results
1: -q after/js --ion-offthread-compile=off /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.042+/-0.0023      0.002       0.038       0.042       0.045       
user        0.032+/-0.0041      0.004       0.024       0.033       0.037       
sys         0.012+/-0.0040      0.004       0.006       0.011       0.021       
===> multitime results
1: -q before/js --ion-offthread-compile=on /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.041+/-0.0021      0.002       0.038       0.041       0.044       
user        0.031+/-0.0040      0.004       0.025       0.032       0.037       
sys         0.013+/-0.0032      0.003       0.006       0.013       0.017       
===> multitime results
1: -q after/js --ion-offthread-compile=on /tmp/test.js
            Mean                Std.Dev.    Min         Median      Max
real        0.043+/-0.0031      0.003       0.035       0.044       0.045       
user        0.036+/-0.0062      0.006       0.027       0.038       0.043       
sys         0.010+/-0.0050      0.005       0.000       0.009       0.017       

Now I guess I should really try in a worker...

I have been able to run these tests in "before" and "after" browsers via:

../mach raptor-test -t raptor-assorted-dom --gecko-profile

However these profiles do not appear to include worker processes/threads, so there's not much interesting to see.

I have run the tests manually via:

dist/bin/firefox --new-instance --profile tmp/profile-default testing/raptor/benchmarks/assorted-dom

on the "before" and "after" browsers, but the results do not appear to be much different to me. I mean, I am unable to determine if there is a difference. I have manually captured profiles via the perf-html.io add-on, but I have so far been unable to identify anything significant in them. Frustrating :/

I would be happy to look into the regression if I could reproduce it or understand it a bit better. I can't even reproduce the different block hit counts any more. Jan do you have any suggestions regarding next steps here?

Flags: needinfo?(jdemooij)

(In reply to Andy Wingo [:wingo] from comment #11)

I would be happy to look into the regression if I could reproduce it or understand it a bit better. I can't even reproduce the different block hit counts any more. Jan do you have any suggestions regarding next steps here?

If we can't reproduce any regressions locally it might be a fluke (the kind micro-benchmarks like this are sensitive to) and it's probably fine to close this bug. The main thing the benchmark is testing is name/property lookups on |window| and that's definitely unaffected by the changes here.

Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(jdemooij)
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.