Closed
Bug 1451703
Opened 7 years ago
Closed 6 years ago
12.53% Strings PerfIsUTF8Hundred (windows7-32) regression on push 37c026e3a6a58e1110bc19c5d4588460cc83fe25 (Fri Mar 30 2018)
Categories
(Firefox Build System :: General, defect)
Tracking
(firefox-esr52 unaffected, firefox-esr60 unaffected, firefox59 unaffected, firefox60 unaffected, firefox61 fixed, firefox62 fixed, firefox63 fixed)
RESOLVED
FIXED
mozilla63
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox-esr60 | --- | unaffected |
firefox59 | --- | unaffected |
firefox60 | --- | unaffected |
firefox61 | --- | fixed |
firefox62 | --- | fixed |
firefox63 | --- | fixed |
People
(Reporter: igoldan, Unassigned)
References
Details
(Keywords: perf, regression)
Attachments
(3 files)
We have detected a platform microbenchmarks regression from push:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=37c026e3a6a58e1110bc19c5d4588460cc83fe25
As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
13% Strings PerfIsUTF8Hundred windows7-32 opt 2,863.42 -> 3,222.25
Improvements:
4% Strings PerfHasRTLCharsJA windows7-32 opt 356,832.92 -> 344,256.42
3% Stylo Servo_DeclarationBlock_GetPropertyById_Bench windows7-32 opt 263,700.00 -> 256,962.17
You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=12443
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.
To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Automated_Performance_Testing_and_Sheriffing/Platform_Microbenchmarks
Reporter | ||
Updated•7 years ago
|
Component: Untriaged → General
Product: Firefox → Firefox Build System
Reporter | ||
Comment 1•7 years ago
|
||
:chmanchester Either bug 1447116 or bug 1447137 caused the 13% regression. Rust builders updates usually have effects like these.
I'm for marking this as WONTFIX, unless you have other opinions.
Flags: needinfo?(cmanchester)
Comment 2•7 years ago
|
||
This is a rather awful performance regression in the new rust. I'd say the relevant people should weigh in whether we want to take this hit or not, and if not, we shouldn't be afraid to back out.
Henri, Bobby, what do you think about these regressions from upgrading the rust compiler?
Flags: needinfo?(hsivonen)
Flags: needinfo?(cmanchester)
Flags: needinfo?(bobbyholley)
What happened with 32-bit Linux? It seems implausible that 32-bit Windows would regress 13% without 32-bit Linux showing a similar regression.
Reporter | ||
Comment 5•7 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #4)
> What happened with 32-bit Linux? It seems implausible that 32-bit Windows
> would regress 13% without 32-bit Linux showing a similar regression.
I can see a similar 64-bit Linux regression around the same time:
== Change summary for alert #12444 (as of Fri, 30 Mar 2018 15:19:14 GMT) ==
Regressions:
10% Strings PerfIsUTF8Hundred linux64 opt 2,781.25 -> 3,058.42
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12444
Reporter | ||
Comment 6•7 years ago
|
||
I can't really pronounce myself for the 32-bit Linux perf results. Some data points are missing, others seem to regress earlier than the regression noticed in this bug, then they improve...
Updated•7 years ago
|
status-firefox59:
--- → unaffected
status-firefox60:
--- → unaffected
status-firefox61:
--- → affected
status-firefox-esr52:
--- → unaffected
tracking-firefox61:
--- → +
Comment 7•7 years ago
|
||
Sounds like the servo stuff is an improvement, and we were going to turn these tests off for windows anyway (bug 1436018).
I can't reproduce this using cargo bench in isolation, so I don't have actionable ideas. Maybe we just WONTFIX this, since we can't stop updating Rust if we don't have a concrete rustc bug to file.
Flags: needinfo?(hsivonen)
Comment 9•7 years ago
|
||
`perf stat` gives a clear difference in number of instructions when running `MOZ_RUN_GTEST=1 firefox -unittest --gtest_death_test_style=threadsafe --gtest_filter=Strings.PerfIsUTF8Hundred` with the build before and after the landing:
~ 585M instructions vs. ~ 600M.
`perf annotate` /seems/ to point to differences between the code in encoding_rs::utf_8::utf8_valid_up_to in those builds, but being sample based, it's not really helping there.
I wish I could get more detailed information with callgrind, but unfortunately running firefox under valgrind crashes.
Comment 10•7 years ago
|
||
I was able to gather callgrind data, and it confirms that encoding_rs::utf_8::utf8_valid_up_to is indeed compiled differently, and consumes more instructions. I'm not sure how to share this data, though. Henri, maybe you can reproduce locally? You need to download tc builds from before and after the landing, both target.tar.bz2 and target.gtest.tests.zip, extract them all, put the gtest/gtest_bin/gtest/libxul.so under firefox/gtest/, put gtest/dependentlibs.list.gtest in firefox/, and run `MOZ_RUN_GTEST=1 valgrind --tool=callgrind ../firefox/firefox -unittest --gtest_death_test_style=threadsafe --gtest_filter=Strings.PerfIsUTF8Hundred` from the gtest directory.
Flags: needinfo?(hsivonen)
Attaching asm for the Linux target, because trying to get the asm out of rustc on Windows fails with LLVM deeming the IR invalid.
1.24.0 does this in the innermost loops:
Unaligned case:
.LBB12_3:
movdqu (%edx,%eax), %xmm0
pmovmskb %xmm0, %ebp
testl %ebp, %ebp
jne .LBB12_9
addl $16, %eax
cmpl %ebx, %eax
jbe .LBB12_3
jmp .LBB12_5
.p2align 4, 0x90
Aligned case:
.LBB12_7:
movdqa (%edx,%eax), %xmm0
pmovmskb %xmm0, %ebp
testl %ebp, %ebp
jne .LBB12_9
addl $16, %eax
cmpl %ebx, %eax
jbe .LBB12_7
.p2align 4, 0x90
1.25.0 moves an ALU instruction from the exit path to between the SSE2 load and pmovmskb:
Unaligned case:
.LBB16_5:
leal 16(%esi), %eax
cmpl 20(%esp), %eax
ja .LBB16_20
movdqu (%ecx,%esi), %xmm0
movl %eax, %esi
pmovmskb %xmm0, %edx
testl %edx, %edx
je .LBB16_5
.LBB16_7:
testl %edx, %edx
je .LBB16_12
bsfl %edx, %esi
jmp .LBB16_13
Aligned case:
.LBB16_16:
cmpl 20(%esp), %esi
ja .LBB16_22
movdqa (%ebx,%esi), %xmm0
addl $16, %esi
pmovmskb %xmm0, %eax
testl %eax, %eax
je .LBB16_16
addl $-16, %esi
movl %esi, %ecx
testl %eax, %eax
jne .LBB16_11
Additionally, both the aligned and unaligned load instructions appear a second time earlier outside the loop with 1.25.0, which at least superficially looks to me like the first trip through the loop is copied to outside the loop and mixed with a lot of ALU code.
Flags: needinfo?(hsivonen)
See Also: → https://github.com/rust-lang/rust/issues/49873
Comment 15•7 years ago
|
||
Now the question is, do we go back to 1.24, or do we take the regression and hope it's eventually fixed in 1.26 or 1.27? (although, considering iirc 1.25 changed LLVM version, this could "just" be a consequence of that, which might make it harder to fix)
Comment 16•7 years ago
|
||
I'll note that you'll probably be asked for llvm-ir in the upstream bug, you might as well provide it ahead of time. It might also be worth trying to narrow the regression to a nightly date soon enough, because nightlies are only kept for 90 days and the culprit might be close to expire.
What microarchitectures do we run gtest on in CI?
Flags: needinfo?(mh+mozilla)
Comment 18•7 years ago
|
||
c4.2xlarge for windows and c3.xlarge for linux, whatever flavor of Xeons those EC2 instances are actually running on. Googling around suggests the former is haswell and the latter ivy bridge.
Flags: needinfo?(mh+mozilla)
Updated•7 years ago
|
status-firefox61:
affected → ---
tracking-firefox61:
+ → ---
Updated•7 years ago
|
status-firefox61:
--- → fix-optional
Comment hidden (mozreview-request) |
Comment 20•7 years ago
|
||
mozreview-review |
Comment on attachment 8969204 [details]
Bug 1451703 - Compile 32-bit code for the x86-64 generic CPU.
https://reviewboard.mozilla.org/r/237930/#review243648
::: build/moz.configure/toolchain.configure:1444
(Diff revision 1)
> opts.append('debug-assertions=%s' %
> ('yes' if debug_assertions else 'no'))
> if debug_info is not None:
> opts.append('debuginfo=%s' % debug_info)
>
> + opts.append('target-cpu=x86-64')
O_o how can this be valid?
(In reply to Mike Hommey [:glandium] from comment #20)
> Comment on attachment 8969204 [details]
> Bug 1451703 - Compile 32-bit code for the x86-64 generic CPU.
>
> https://reviewboard.mozilla.org/r/237930/#review243648
>
> ::: build/moz.configure/toolchain.configure:1444
> (Diff revision 1)
> > opts.append('debug-assertions=%s' %
> > ('yes' if debug_assertions else 'no'))
> > if debug_info is not None:
> > opts.append('debuginfo=%s' % debug_info)
> >
> > + opts.append('target-cpu=x86-64')
>
> O_o how can this be valid?
It's not for ARM targets, but this is just for a try run.
x86-64 is just some kind of combination of CPU features to represent a generic CPU, so compiling 32-bit code for it doesn't seem conceptually different from compiling 32-bit code for e.g. core2 or haswell.
Comment 22•7 years ago
|
||
We should probably either back out 1.25 or start requiring it before the soft code freeze for 61, which starts on April 26 (next Thursday).
Comment 23•6 years ago
|
||
hsivonen, this seems like a significant regression in this particular test, do you think it's worth backing out the rust update for its sake?
Staying on a version of rustc that uses the older llvm without the regression doesn't seem realistic long term... does it seem like the investigation in https://github.com/rust-lang/rust/issues/49873 is likely to yield a fix?
Flags: needinfo?(hsivonen)
Comment 24•6 years ago
|
||
Rust 1.25 also seems to have broken stack traces (bug 1456150) from Rust code, FWIW.
Comment 25•6 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #24)
> Rust 1.25 also seems to have broken stack traces (bug 1456150) from Rust
> code, FWIW.
Well, just for that we should back out.
(In reply to Chris Manchester (:chmanchester) from comment #23)
> hsivonen, this seems like a significant regression in this particular test,
> do you think it's worth backing out the rust update for its sake?
>
> Staying on a version of rustc that uses the older llvm without the
> regression doesn't seem realistic long term... does it seem like the
> investigation in https://github.com/rust-lang/rust/issues/49873 is likely to
> yield a fix?
Staying on 1.24.0 isn't realistic long-term, but skipping 1.25.0 might be worthwhile in the hope that https://github.com/rust-lang/rust/issues/49873 and bug 1456150 could get fixing in a short-ish timeframe.
I think at this point, to make progress on https://github.com/rust-lang/rust/issues/49873 , it would be useful to get someone who works on LLVM to look at it. It would be really inefficient for me to try to locate a changeset between LLVM 4.0 and 6.0 without familiarity with LLVM source and optimization passes, so I think I shouldn't spend time trying.
Flags: needinfo?(hsivonen)
See Also: → https://bugs.llvm.org/show_bug.cgi?id=37266
Comment 27•6 years ago
|
||
Fx61 is fixed by backout at this point, but we should probably keep this on the radar for whenever 62 gets a newer Rust version again.
status-firefox62:
--- → affected
status-firefox-esr60:
--- → unaffected
Comment 28•6 years ago
|
||
Is is still an issue in 1.26 (released yesterday)? Should this still block upgrading from 1.24?
Comment 29•6 years ago
|
||
(In reply to Simon Sapin (:SimonSapin) from comment #28)
> Is is still an issue in 1.26 (released yesterday)? Should this still block
> upgrading from 1.24?
According to https://github.com/rust-lang/rust/issues/49873#issuecomment-380438934 it would be. I confirmed on try the regression is still present with 1.26, at least on Linux 32 (Windows results are not back yet).
Comment 30•6 years ago
|
||
It seems this issue is being fixed by rust-lang/llvm#115. Are we going to wait until this fix to reach stable (i.e. waiting for 1.28, or 1.27 if backported to beta), or can we use a patched Rust in infra instead in the mean time?
Comment 31•6 years ago
|
||
FWIW, another issue is blocking us on < 1.27 at the moment: the fact that hooking the oom handler doesn't allow to get the allocation size anymore. I'm hoping to get a fix for that in 1.28. So the best we can really do for now is backporting the patch to 1.26 or wait.
Comment 32•6 years ago
|
||
Also note that building a patched rust compiler for windows might not be trivial.
Comment 33•6 years ago
|
||
As far as I can tell 62 stayed on 1.24, and 63 will get 1.28 in bug 1447116.
status-firefox63:
--- → affected
Comment 34•6 years ago
|
||
this was fixed by this commit range on april 24th:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=13863e08584ae95ac1262b517bf8ef06dd29dbdf&tochange=2087cea3bdf81170856fc773c2d9de684510e001
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Updated•6 years ago
|
Target Milestone: --- → mozilla63
You need to log in
before you can comment on or make changes to this bug.
Description
•