Closed Bug 1451703 Opened 7 years ago Closed 6 years ago

12.53% Strings PerfIsUTF8Hundred (windows7-32) regression on push 37c026e3a6a58e1110bc19c5d4588460cc83fe25 (Fri Mar 30 2018)

Categories

(Firefox Build System :: General, defect)

Unspecified
Windows
defect
Not set
normal

Tracking

(firefox-esr52 unaffected, firefox-esr60 unaffected, firefox59 unaffected, firefox60 unaffected, firefox61 fixed, firefox62 fixed, firefox63 fixed)

RESOLVED FIXED
mozilla63
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox59 --- unaffected
firefox60 --- unaffected
firefox61 --- fixed
firefox62 --- fixed
firefox63 --- fixed

People

(Reporter: igoldan, Unassigned)

References

Details

(Keywords: perf, regression)

Attachments

(3 files)

We have detected a platform microbenchmarks regression from push: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=37c026e3a6a58e1110bc19c5d4588460cc83fe25 As author of one of the patches included in that push, we need your help to address this regression. Regressions: 13% Strings PerfIsUTF8Hundred windows7-32 opt 2,863.42 -> 3,222.25 Improvements: 4% Strings PerfHasRTLCharsJA windows7-32 opt 356,832.92 -> 344,256.42 3% Stylo Servo_DeclarationBlock_GetPropertyById_Bench windows7-32 opt 263,700.00 -> 256,962.17 You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=12443 On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format. To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Automated_Performance_Testing_and_Sheriffing/Platform_Microbenchmarks
Component: Untriaged → General
Product: Firefox → Firefox Build System
:chmanchester Either bug 1447116 or bug 1447137 caused the 13% regression. Rust builders updates usually have effects like these. I'm for marking this as WONTFIX, unless you have other opinions.
Flags: needinfo?(cmanchester)
This is a rather awful performance regression in the new rust. I'd say the relevant people should weigh in whether we want to take this hit or not, and if not, we shouldn't be afraid to back out. Henri, Bobby, what do you think about these regressions from upgrading the rust compiler?
Flags: needinfo?(hsivonen)
Flags: needinfo?(cmanchester)
Flags: needinfo?(bobbyholley)
Oh wait, the servo one is an improvement.
Flags: needinfo?(bobbyholley)
What happened with 32-bit Linux? It seems implausible that 32-bit Windows would regress 13% without 32-bit Linux showing a similar regression.
(In reply to Henri Sivonen (:hsivonen) from comment #4) > What happened with 32-bit Linux? It seems implausible that 32-bit Windows > would regress 13% without 32-bit Linux showing a similar regression. I can see a similar 64-bit Linux regression around the same time: == Change summary for alert #12444 (as of Fri, 30 Mar 2018 15:19:14 GMT) == Regressions: 10% Strings PerfIsUTF8Hundred linux64 opt 2,781.25 -> 3,058.42 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12444
I can't really pronounce myself for the 32-bit Linux perf results. Some data points are missing, others seem to regress earlier than the regression noticed in this bug, then they improve...
Sounds like the servo stuff is an improvement, and we were going to turn these tests off for windows anyway (bug 1436018).
Blocks: 1450078
I can't reproduce this using cargo bench in isolation, so I don't have actionable ideas. Maybe we just WONTFIX this, since we can't stop updating Rust if we don't have a concrete rustc bug to file.
Flags: needinfo?(hsivonen)
`perf stat` gives a clear difference in number of instructions when running `MOZ_RUN_GTEST=1 firefox -unittest --gtest_death_test_style=threadsafe --gtest_filter=Strings.PerfIsUTF8Hundred` with the build before and after the landing: ~ 585M instructions vs. ~ 600M. `perf annotate` /seems/ to point to differences between the code in encoding_rs::utf_8::utf8_valid_up_to in those builds, but being sample based, it's not really helping there. I wish I could get more detailed information with callgrind, but unfortunately running firefox under valgrind crashes.
I was able to gather callgrind data, and it confirms that encoding_rs::utf_8::utf8_valid_up_to is indeed compiled differently, and consumes more instructions. I'm not sure how to share this data, though. Henri, maybe you can reproduce locally? You need to download tc builds from before and after the landing, both target.tar.bz2 and target.gtest.tests.zip, extract them all, put the gtest/gtest_bin/gtest/libxul.so under firefox/gtest/, put gtest/dependentlibs.list.gtest in firefox/, and run `MOZ_RUN_GTEST=1 valgrind --tool=callgrind ../firefox/firefox -unittest --gtest_death_test_style=threadsafe --gtest_filter=Strings.PerfIsUTF8Hundred` from the gtest directory.
Flags: needinfo?(hsivonen)
Attached file 1.24.0 i686 Linux asm (deleted) —
Attaching asm for the Linux target, because trying to get the asm out of rustc on Windows fails with LLVM deeming the IR invalid.
1.24.0 does this in the innermost loops: Unaligned case: .LBB12_3: movdqu (%edx,%eax), %xmm0 pmovmskb %xmm0, %ebp testl %ebp, %ebp jne .LBB12_9 addl $16, %eax cmpl %ebx, %eax jbe .LBB12_3 jmp .LBB12_5 .p2align 4, 0x90 Aligned case: .LBB12_7: movdqa (%edx,%eax), %xmm0 pmovmskb %xmm0, %ebp testl %ebp, %ebp jne .LBB12_9 addl $16, %eax cmpl %ebx, %eax jbe .LBB12_7 .p2align 4, 0x90 1.25.0 moves an ALU instruction from the exit path to between the SSE2 load and pmovmskb: Unaligned case: .LBB16_5: leal 16(%esi), %eax cmpl 20(%esp), %eax ja .LBB16_20 movdqu (%ecx,%esi), %xmm0 movl %eax, %esi pmovmskb %xmm0, %edx testl %edx, %edx je .LBB16_5 .LBB16_7: testl %edx, %edx je .LBB16_12 bsfl %edx, %esi jmp .LBB16_13 Aligned case: .LBB16_16: cmpl 20(%esp), %esi ja .LBB16_22 movdqa (%ebx,%esi), %xmm0 addl $16, %esi pmovmskb %xmm0, %eax testl %eax, %eax je .LBB16_16 addl $-16, %esi movl %esi, %ecx testl %eax, %eax jne .LBB16_11 Additionally, both the aligned and unaligned load instructions appear a second time earlier outside the loop with 1.25.0, which at least superficially looks to me like the first trip through the loop is copied to outside the loop and mixed with a lot of ALU code.
Now the question is, do we go back to 1.24, or do we take the regression and hope it's eventually fixed in 1.26 or 1.27? (although, considering iirc 1.25 changed LLVM version, this could "just" be a consequence of that, which might make it harder to fix)
I'll note that you'll probably be asked for llvm-ir in the upstream bug, you might as well provide it ahead of time. It might also be worth trying to narrow the regression to a nightly date soon enough, because nightlies are only kept for 90 days and the culprit might be close to expire.
What microarchitectures do we run gtest on in CI?
Flags: needinfo?(mh+mozilla)
c4.2xlarge for windows and c3.xlarge for linux, whatever flavor of Xeons those EC2 instances are actually running on. Googling around suggests the former is haswell and the latter ivy bridge.
Flags: needinfo?(mh+mozilla)
Comment on attachment 8969204 [details] Bug 1451703 - Compile 32-bit code for the x86-64 generic CPU. https://reviewboard.mozilla.org/r/237930/#review243648 ::: build/moz.configure/toolchain.configure:1444 (Diff revision 1) > opts.append('debug-assertions=%s' % > ('yes' if debug_assertions else 'no')) > if debug_info is not None: > opts.append('debuginfo=%s' % debug_info) > > + opts.append('target-cpu=x86-64') O_o how can this be valid?
(In reply to Mike Hommey [:glandium] from comment #20) > Comment on attachment 8969204 [details] > Bug 1451703 - Compile 32-bit code for the x86-64 generic CPU. > > https://reviewboard.mozilla.org/r/237930/#review243648 > > ::: build/moz.configure/toolchain.configure:1444 > (Diff revision 1) > > opts.append('debug-assertions=%s' % > > ('yes' if debug_assertions else 'no')) > > if debug_info is not None: > > opts.append('debuginfo=%s' % debug_info) > > > > + opts.append('target-cpu=x86-64') > > O_o how can this be valid? It's not for ARM targets, but this is just for a try run. x86-64 is just some kind of combination of CPU features to represent a generic CPU, so compiling 32-bit code for it doesn't seem conceptually different from compiling 32-bit code for e.g. core2 or haswell.
We should probably either back out 1.25 or start requiring it before the soft code freeze for 61, which starts on April 26 (next Thursday).
hsivonen, this seems like a significant regression in this particular test, do you think it's worth backing out the rust update for its sake? Staying on a version of rustc that uses the older llvm without the regression doesn't seem realistic long term... does it seem like the investigation in https://github.com/rust-lang/rust/issues/49873 is likely to yield a fix?
Flags: needinfo?(hsivonen)
Rust 1.25 also seems to have broken stack traces (bug 1456150) from Rust code, FWIW.
(In reply to Nathan Froyd [:froydnj] from comment #24) > Rust 1.25 also seems to have broken stack traces (bug 1456150) from Rust > code, FWIW. Well, just for that we should back out.
(In reply to Chris Manchester (:chmanchester) from comment #23) > hsivonen, this seems like a significant regression in this particular test, > do you think it's worth backing out the rust update for its sake? > > Staying on a version of rustc that uses the older llvm without the > regression doesn't seem realistic long term... does it seem like the > investigation in https://github.com/rust-lang/rust/issues/49873 is likely to > yield a fix? Staying on 1.24.0 isn't realistic long-term, but skipping 1.25.0 might be worthwhile in the hope that https://github.com/rust-lang/rust/issues/49873 and bug 1456150 could get fixing in a short-ish timeframe. I think at this point, to make progress on https://github.com/rust-lang/rust/issues/49873 , it would be useful to get someone who works on LLVM to look at it. It would be really inefficient for me to try to locate a changeset between LLVM 4.0 and 6.0 without familiarity with LLVM source and optimization passes, so I think I shouldn't spend time trying.
Flags: needinfo?(hsivonen)
Fx61 is fixed by backout at this point, but we should probably keep this on the radar for whenever 62 gets a newer Rust version again.
Is is still an issue in 1.26 (released yesterday)? Should this still block upgrading from 1.24?
(In reply to Simon Sapin (:SimonSapin) from comment #28) > Is is still an issue in 1.26 (released yesterday)? Should this still block > upgrading from 1.24? According to https://github.com/rust-lang/rust/issues/49873#issuecomment-380438934 it would be. I confirmed on try the regression is still present with 1.26, at least on Linux 32 (Windows results are not back yet).
It seems this issue is being fixed by rust-lang/llvm#115. Are we going to wait until this fix to reach stable (i.e. waiting for 1.28, or 1.27 if backported to beta), or can we use a patched Rust in infra instead in the mean time?
FWIW, another issue is blocking us on < 1.27 at the moment: the fact that hooking the oom handler doesn't allow to get the allocation size anymore. I'm hoping to get a fix for that in 1.28. So the best we can really do for now is backporting the patch to 1.26 or wait.
Also note that building a patched rust compiler for windows might not be trivial.
As far as I can tell 62 stayed on 1.24, and 63 will get 1.28 in bug 1447116.
No longer blocks: 1445214
Blocks: 1470850
Target Milestone: --- → mozilla63
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: