1.08% raptor-assorted-dom-firefox (linux64) regression on push 560290f263515bb7c7bb834cde8f7d64d5bdde54 (Thu Jan 17 2019)
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
People
(Reporter: Bebe, Unassigned)
References
Details
(Keywords: perf, regression)
Raptor has detected a Firefox performance regression from push:
As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
1% raptor-assorted-dom-firefox linux64 pgo 25.95 -> 26.23
You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=18792
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a Treeherder page showing the Raptor jobs in a pushlog format.
To learn more about the regressing test(s) or reproducing them, please see: https://wiki.mozilla.org/Performance_sheriffing/Raptor
*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***
Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling
Reporter | ||
Updated•6 years ago
|
Comment 1•6 years ago
|
||
Here's some data on subtests:
For worker-getname-performance-getter I think we just load this in a worker:
The "performance" getter is likely loop-hoisted or eliminated, so we probably have an empty loop where the increment operator is generating worse code somehow. Maybe we can try to repro in the shell with a similar script first.
Can one of you look at this?
Comment 2•6 years ago
|
||
i'll work on this today and see if i can reproduce the regression in the js shell. thanks for looking at the actual test script, starting with that should make it easier to diagnose the problem
Comment 3•6 years ago
|
||
I was unable to repro given the getname-performance-getter-helper.js and the shell; not sure why. The assembly generated is the exact same modulo constant values. Still working on this.
Comment 4•6 years ago
|
||
This one is a bit odd. As I said the assembly generated is the same, and they tier up after the same number of iterations with --ion-offthread-compile=off. However with off-thread Ion compilation, there is a difference in terms of block visit counts. To wit:
$ cat /tmp/test.js
(function() {
var count = 1000000;
var holder;
var start = performance.now();
for (var i = 0; i < count; ++i)
holder = performance;
var end = performance.now();
print(end - start);
})();
$ for i in `seq 20`; do before/js -D /tmp/test.js | grep '#4 -> #5'; done
BB #3 [00066,5,18] -> #4 -> #5 :: 872633 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 886288 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 891551 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 890249 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 898373 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 887998 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 888869 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 892613 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 896578 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 892053 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 916841 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 896787 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 890668 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 887440 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 898019 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 898912 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 892518 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 891104 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 898216 hits
BB #3 [00066,5,18] -> #4 -> #5 :: 888739 hits
$ for i in `seq 20`; do after/js -D /tmp/test.js | grep '#4 -> #5'; done
BB #3 [00065,5,18] -> #4 -> #5 :: 869775 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 872655 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 873440 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 878849 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 876378 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 881312 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 872953 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 868985 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 871452 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 872265 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 884143 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 873623 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 875129 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 876356 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 875595 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 875652 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 878249 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 874992 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 871252 hits
BB #3 [00065,5,18] -> #4 -> #5 :: 868012 hits
I am not quite sure why this is. Given that the benchmark is really small, it could be small things like this affect the results. Jan do you have any idea?
Comment 5•6 years ago
|
||
(Obviously the difference about 65 vs 66 is that the "after" case does a JSOP_INC instead of JSOP_INT8(1) + JSOP_ADD, leading to a bytecode sequence that's shorter by 1)
Comment 6•6 years ago
|
||
This test appears to be strongly sensitive to multiprocessing topologies. E.g.:
$ for cores in 4 4-7; do
echo -e "\n*** taskset -c $cores ***"
taskset -c $cores bash -c 'for offthread in off on; do
for which in before after; do
~/src/multitime/multitime -s 0 -n 100 -q $which/js --ion-offthread-compile=$offthread /tmp/test.js
done
done';
done
*** taskset -c 4 ***
===> multitime results
1: -q before/js --ion-offthread-compile=off /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.023+/-0.0005 0.002 0.022 0.022 0.036
user 0.018+/-0.0009 0.004 0.008 0.018 0.024
sys 0.005+/-0.0010 0.004 0.000 0.004 0.014
===> multitime results
1: -q after/js --ion-offthread-compile=off /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.023+/-0.0002 0.001 0.022 0.023 0.028
user 0.018+/-0.0010 0.004 0.008 0.018 0.025
sys 0.005+/-0.0010 0.004 0.000 0.004 0.015
===> multitime results
1: -q before/js --ion-offthread-compile=on /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.024+/-0.0002 0.001 0.024 0.024 0.028
user 0.021+/-0.0009 0.003 0.008 0.020 0.026
sys 0.003+/-0.0009 0.004 0.000 0.004 0.016
===> multitime results
1: -q after/js --ion-offthread-compile=on /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.025+/-0.0004 0.002 0.024 0.024 0.032
user 0.023+/-0.0007 0.003 0.014 0.024 0.031
sys 0.002+/-0.0007 0.003 0.000 0.000 0.010
*** taskset -c 4-7 ***
===> multitime results
1: -q before/js --ion-offthread-compile=off /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.037+/-0.0006 0.003 0.031 0.038 0.042
user 0.028+/-0.0013 0.005 0.016 0.029 0.038
sys 0.011+/-0.0014 0.005 0.000 0.010 0.026
===> multitime results
1: -q after/js --ion-offthread-compile=off /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.038+/-0.0006 0.002 0.033 0.039 0.042
user 0.028+/-0.0012 0.004 0.017 0.029 0.040
sys 0.012+/-0.0012 0.005 0.000 0.012 0.025
===> multitime results
1: -q before/js --ion-offthread-compile=on /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.038+/-0.0006 0.002 0.033 0.038 0.041
user 0.030+/-0.0014 0.005 0.014 0.030 0.044
sys 0.011+/-0.0014 0.005 0.000 0.010 0.030
===> multitime results
1: -q after/js --ion-offthread-compile=on /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.038+/-0.0005 0.002 0.034 0.039 0.041
user 0.031+/-0.0013 0.005 0.017 0.031 0.041
sys 0.011+/-0.0013 0.005 0.000 0.012 0.026
This is using https://tratt.net/laurie/src/multitime/ to time the runs. This is on a dual Xeon Silver 4114 machine (each socket with 10 cores, 2 threads per core -> 40 logical cores total).
Anyway, adding more cores slows down the test. I guess it's not Ion though (?), as we tested ofthread compile off and on and I got the same results; even --no-threads doesn't help. Weird test.
Comment 7•6 years ago
|
||
Oh yay microbenchmarking hell! So by default multitime will sleep for 3 seconds between individual runs. I got impatient and turned that off above. Those results do correspond to my manual tests using time
and bash for loops. But if I enable the sleeping, I get very little difference between the multicore and single-core case, and no difference between before and after or offthread compile off and on. Truly perplexing.
$ for cores in 4 4-7; do echo -e "\n*** taskset -c $cores ***"; taskset -c $cores bash -c 'for offthread in off on; do
for which in before after; do
~/src/multitime/multitime -s 3 -n 10 -q $which/js --ion-offthread-compile=$offthread /tmp/test.js
done
done'; done
*** taskset -c 4 ***
===> multitime results
1: -q before/js --ion-offthread-compile=off /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.039+/-0.0007 0.001 0.038 0.039 0.040
user 0.029+/-0.0049 0.005 0.022 0.030 0.037
sys 0.009+/-0.0050 0.005 0.001 0.008 0.017
===> multitime results
1: -q after/js --ion-offthread-compile=off /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.039+/-0.0023 0.002 0.032 0.039 0.041
user 0.027+/-0.0056 0.006 0.019 0.027 0.037
sys 0.012+/-0.0046 0.005 0.003 0.013 0.020
===> multitime results
1: -q before/js --ion-offthread-compile=on /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.038+/-0.0024 0.002 0.032 0.039 0.040
user 0.026+/-0.0054 0.005 0.016 0.027 0.035
sys 0.012+/-0.0054 0.005 0.004 0.012 0.024
===> multitime results
1: -q after/js --ion-offthread-compile=on /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.039+/-0.0022 0.002 0.032 0.039 0.040
user 0.025+/-0.0047 0.005 0.017 0.025 0.033
sys 0.013+/-0.0040 0.004 0.007 0.014 0.020
*** taskset -c 4-7 ***
===> multitime results
1: -q before/js --ion-offthread-compile=off /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.041+/-0.0015 0.002 0.038 0.041 0.044
user 0.029+/-0.0060 0.006 0.024 0.026 0.038
sys 0.014+/-0.0066 0.007 0.004 0.016 0.020
===> multitime results
1: -q after/js --ion-offthread-compile=off /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.042+/-0.0012 0.001 0.040 0.041 0.044
user 0.030+/-0.0039 0.004 0.024 0.032 0.035
sys 0.013+/-0.0044 0.004 0.008 0.013 0.020
===> multitime results
1: -q before/js --ion-offthread-compile=on /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.042+/-0.0014 0.001 0.038 0.042 0.043
user 0.031+/-0.0049 0.005 0.025 0.030 0.042
sys 0.014+/-0.0051 0.005 0.005 0.016 0.020
===> multitime results
1: -q after/js --ion-offthread-compile=on /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.043+/-0.0014 0.001 0.040 0.043 0.046
user 0.029+/-0.0061 0.006 0.018 0.030 0.039
sys 0.016+/-0.0061 0.006 0.007 0.016 0.028
Comment 8•6 years ago
|
||
(In reply to Andy Wingo [:wingo] from comment #4)
$ cat /tmp/test.js
(function() {
var count = 1000000;
var holder;
var start = performance.now();
for (var i = 0; i < count; ++i)
holder = performance;
var end = performance.now();
print(end - start);
})();
What happens if you remove the function and put the code in the global scope? Then we access global properties instead of locals so it affects the code we generate.
Comment 9•6 years ago
|
||
Tx for feedback. Here's the same code in global scope:
$ cat /tmp/test.js
var count = 1000000;
var holder;
var start = performance.now();
for (var i = 0; i < count; ++i)
holder = performance;
var end = performance.now();
print(end - start);
$ for cores in 4 4-7; do echo -e "\n*** taskset -c $cores ***"; taskset -c $cores bash -c 'for offthread in off on; do
for which in before after; do
~/src/multitime/multitime -s 3 -n 10 -q $which/js --ion-offthread-compile=$offthread /tmp/test.js
done
done'; done
*** taskset -c 4 ***
===> multitime results
1: -q before/js --ion-offthread-compile=off /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.043+/-0.0032 0.003 0.037 0.042 0.051
user 0.031+/-0.0058 0.006 0.024 0.030 0.041
sys 0.012+/-0.0055 0.006 0.000 0.013 0.017
===> multitime results
1: -q after/js --ion-offthread-compile=off /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.042+/-0.0035 0.004 0.035 0.042 0.048
user 0.030+/-0.0064 0.006 0.023 0.028 0.043
sys 0.012+/-0.0056 0.006 0.004 0.010 0.020
===> multitime results
1: -q before/js --ion-offthread-compile=on /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.041+/-0.0016 0.002 0.038 0.041 0.044
user 0.029+/-0.0057 0.006 0.020 0.032 0.036
sys 0.011+/-0.0065 0.007 0.004 0.008 0.023
===> multitime results
1: -q after/js --ion-offthread-compile=on /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.041+/-0.0030 0.003 0.035 0.041 0.046
user 0.033+/-0.0048 0.005 0.024 0.033 0.043
sys 0.008+/-0.0039 0.004 0.000 0.008 0.013
*** taskset -c 4-7 ***
===> multitime results
1: -q before/js --ion-offthread-compile=off /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.042+/-0.0022 0.002 0.037 0.042 0.045
user 0.033+/-0.0045 0.005 0.025 0.034 0.040
sys 0.010+/-0.0050 0.005 0.004 0.010 0.019
===> multitime results
1: -q after/js --ion-offthread-compile=off /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.042+/-0.0023 0.002 0.038 0.042 0.045
user 0.032+/-0.0041 0.004 0.024 0.033 0.037
sys 0.012+/-0.0040 0.004 0.006 0.011 0.021
===> multitime results
1: -q before/js --ion-offthread-compile=on /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.041+/-0.0021 0.002 0.038 0.041 0.044
user 0.031+/-0.0040 0.004 0.025 0.032 0.037
sys 0.013+/-0.0032 0.003 0.006 0.013 0.017
===> multitime results
1: -q after/js --ion-offthread-compile=on /tmp/test.js
Mean Std.Dev. Min Median Max
real 0.043+/-0.0031 0.003 0.035 0.044 0.045
user 0.036+/-0.0062 0.006 0.027 0.038 0.043
sys 0.010+/-0.0050 0.005 0.000 0.009 0.017
Now I guess I should really try in a worker...
Comment 10•6 years ago
|
||
I have been able to run these tests in "before" and "after" browsers via:
../mach raptor-test -t raptor-assorted-dom --gecko-profile
However these profiles do not appear to include worker processes/threads, so there's not much interesting to see.
I have run the tests manually via:
dist/bin/firefox --new-instance --profile tmp/profile-default testing/raptor/benchmarks/assorted-dom
on the "before" and "after" browsers, but the results do not appear to be much different to me. I mean, I am unable to determine if there is a difference. I have manually captured profiles via the perf-html.io add-on, but I have so far been unable to identify anything significant in them. Frustrating :/
Comment 11•6 years ago
|
||
I would be happy to look into the regression if I could reproduce it or understand it a bit better. I can't even reproduce the different block hit counts any more. Jan do you have any suggestions regarding next steps here?
Comment 12•6 years ago
|
||
(In reply to Andy Wingo [:wingo] from comment #11)
I would be happy to look into the regression if I could reproduce it or understand it a bit better. I can't even reproduce the different block hit counts any more. Jan do you have any suggestions regarding next steps here?
If we can't reproduce any regressions locally it might be a fluke (the kind micro-benchmarks like this are sensitive to) and it's probably fine to close this bug. The main thing the benchmark is testing is name/property lookups on |window| and that's definitely unaffected by the changes here.
Description
•