Closed Bug 1451703 Opened 7 years ago Closed 6 years ago

12.53% Strings PerfIsUTF8Hundred (windows7-32) regression on push 37c026e3a6a58e1110bc19c5d4588460cc83fe25 (Fri Mar 30 2018)

Tracking

(firefox-esr52 unaffected, firefox-esr60 unaffected, firefox59 unaffected, firefox60 unaffected, firefox61 fixed, firefox62 fixed, firefox63 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla63

Tracking Flags:

Tracking

Status

firefox-esr52

---

unaffected

firefox-esr60

---

unaffected

firefox59

---

unaffected

firefox60

---

unaffected

firefox61

---

fixed

firefox62

---

fixed

firefox63

---

fixed

People

(Reporter: igoldan, Unassigned)

References

Details

(Keywords: perf, regression)

Attachments

(3 files)

1.24.0 i686 Linux asm 7 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), text/plain		Details
1.25.0 i686 Linux asm 7 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), text/plain		Details
Bug 1451703 - Compile 32-bit code for the x86-64 generic CPU. 7 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), text/x-review-board-request		Details

Ionuț Goldan [:igoldan]

Reporter

Description

•

7 years ago

We have detected a platform microbenchmarks regression from push: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=37c026e3a6a58e1110bc19c5d4588460cc83fe25 As author of one of the patches included in that push, we need your help to address this regression. Regressions: 13% Strings PerfIsUTF8Hundred windows7-32 opt 2,863.42 -> 3,222.25 Improvements: 4% Strings PerfHasRTLCharsJA windows7-32 opt 356,832.92 -> 344,256.42 3% Stylo Servo_DeclarationBlock_GetPropertyById_Bench windows7-32 opt 263,700.00 -> 256,962.17 You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=12443 On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format. To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Automated_Performance_Testing_and_Sheriffing/Platform_Microbenchmarks

Ionuț Goldan [:igoldan]

Reporter

Updated

•

7 years ago

Component: Untriaged → General

Product: Firefox → Firefox Build System

Ionuț Goldan [:igoldan]

Reporter

Comment 1

•

7 years ago

:chmanchester Either bug 1447116 or bug 1447137 caused the 13% regression. Rust builders updates usually have effects like these. I'm for marking this as WONTFIX, unless you have other opinions.

Flags: needinfo?(cmanchester)

Mike Hommey [:glandium]

Comment 2

•

7 years ago

This is a rather awful performance regression in the new rust. I'd say the relevant people should weigh in whether we want to take this hit or not, and if not, we shouldn't be afraid to back out. Henri, Bobby, what do you think about these regressions from upgrading the rust compiler?

Flags: needinfo?(hsivonen)

Flags: needinfo?(cmanchester)

Flags: needinfo?(bobbyholley)

Mike Hommey [:glandium]

Comment 3

•

7 years ago

Oh wait, the servo one is an improvement.

Flags: needinfo?(bobbyholley)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 4

•

7 years ago

What happened with 32-bit Linux? It seems implausible that 32-bit Windows would regress 13% without 32-bit Linux showing a similar regression.

Ionuț Goldan [:igoldan]

Reporter

Comment 5

•

7 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #4) > What happened with 32-bit Linux? It seems implausible that 32-bit Windows > would regress 13% without 32-bit Linux showing a similar regression. I can see a similar 64-bit Linux regression around the same time: == Change summary for alert #12444 (as of Fri, 30 Mar 2018 15:19:14 GMT) == Regressions: 10% Strings PerfIsUTF8Hundred linux64 opt 2,781.25 -> 3,058.42 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12444

Ionuț Goldan [:igoldan]

Reporter

Comment 6

•

7 years ago

I can't really pronounce myself for the 32-bit Linux perf results. Some data points are missing, others seem to regress earlier than the regression noticed in this bug, then they improve...

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox59: --- → unaffected

status-firefox60: --- → unaffected

status-firefox61: --- → affected

status-firefox-esr52: --- → unaffected

tracking-firefox61: --- → +

Bobby Holley (:bholley)

Comment 7

•

7 years ago

Sounds like the servo stuff is an improvement, and we were going to turn these tests off for windows anyway (bug 1436018).

Mike Hommey [:glandium]

Updated

•

7 years ago

Blocks: 1450078

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 8

•

7 years ago

I can't reproduce this using cargo bench in isolation, so I don't have actionable ideas. Maybe we just WONTFIX this, since we can't stop updating Rust if we don't have a concrete rustc bug to file.

Flags: needinfo?(hsivonen)

Mike Hommey [:glandium]

Comment 9

•

7 years ago

`perf stat` gives a clear difference in number of instructions when running `MOZ_RUN_GTEST=1 firefox -unittest --gtest_death_test_style=threadsafe --gtest_filter=Strings.PerfIsUTF8Hundred` with the build before and after the landing: ~ 585M instructions vs. ~ 600M. `perf annotate` /seems/ to point to differences between the code in encoding_rs::utf_8::utf8_valid_up_to in those builds, but being sample based, it's not really helping there. I wish I could get more detailed information with callgrind, but unfortunately running firefox under valgrind crashes.

Mike Hommey [:glandium]

Comment 10

•

7 years ago

I was able to gather callgrind data, and it confirms that encoding_rs::utf_8::utf8_valid_up_to is indeed compiled differently, and consumes more instructions. I'm not sure how to share this data, though. Henri, maybe you can reproduce locally? You need to download tc builds from before and after the landing, both target.tar.bz2 and target.gtest.tests.zip, extract them all, put the gtest/gtest_bin/gtest/libxul.so under firefox/gtest/, put gtest/dependentlibs.list.gtest in firefox/, and run `MOZ_RUN_GTEST=1 valgrind --tool=callgrind ../firefox/firefox -unittest --gtest_death_test_style=threadsafe --gtest_filter=Strings.PerfIsUTF8Hundred` from the gtest directory.

Flags: needinfo?(hsivonen)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 11

•

7 years ago

Attached file 1.24.0 i686 Linux asm (deleted) — Details

Attaching asm for the Linux target, because trying to get the asm out of rustc on Windows fails with LLVM deeming the IR invalid.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 12

•

7 years ago

Attached file 1.25.0 i686 Linux asm (deleted) — Details

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 13

•

7 years ago

1.24.0 does this in the innermost loops: Unaligned case: .LBB12_3: movdqu (%edx,%eax), %xmm0 pmovmskb %xmm0, %ebp testl %ebp, %ebp jne .LBB12_9 addl $16, %eax cmpl %ebx, %eax jbe .LBB12_3 jmp .LBB12_5 .p2align 4, 0x90 Aligned case: .LBB12_7: movdqa (%edx,%eax), %xmm0 pmovmskb %xmm0, %ebp testl %ebp, %ebp jne .LBB12_9 addl $16, %eax cmpl %ebx, %eax jbe .LBB12_7 .p2align 4, 0x90 1.25.0 moves an ALU instruction from the exit path to between the SSE2 load and pmovmskb: Unaligned case: .LBB16_5: leal 16(%esi), %eax cmpl 20(%esp), %eax ja .LBB16_20 movdqu (%ecx,%esi), %xmm0 movl %eax, %esi pmovmskb %xmm0, %edx testl %edx, %edx je .LBB16_5 .LBB16_7: testl %edx, %edx je .LBB16_12 bsfl %edx, %esi jmp .LBB16_13 Aligned case: .LBB16_16: cmpl 20(%esp), %esi ja .LBB16_22 movdqa (%ebx,%esi), %xmm0 addl $16, %esi pmovmskb %xmm0, %eax testl %eax, %eax je .LBB16_16 addl $-16, %esi movl %esi, %ecx testl %eax, %eax jne .LBB16_11 Additionally, both the aligned and unaligned load instructions appear a second time earlier outside the loop with 1.25.0, which at least superficially looks to me like the first trip through the loop is copied to outside the loop and mixed with a lot of ALU code.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 14

•

7 years ago

Filed https://github.com/rust-lang/rust/issues/49873

Flags: needinfo?(hsivonen)

See Also: → https://github.com/rust-lang/rust/issues/49873

Mike Hommey [:glandium]

Comment 15

•

7 years ago

Now the question is, do we go back to 1.24, or do we take the regression and hope it's eventually fixed in 1.26 or 1.27? (although, considering iirc 1.25 changed LLVM version, this could "just" be a consequence of that, which might make it harder to fix)

Mike Hommey [:glandium]

Comment 16

•

7 years ago

I'll note that you'll probably be asked for llvm-ir in the upstream bug, you might as well provide it ahead of time. It might also be worth trying to narrow the regression to a nightly date soon enough, because nightlies are only kept for 90 days and the culprit might be close to expire.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 17

•

7 years ago

What microarchitectures do we run gtest on in CI?

Flags: needinfo?(mh+mozilla)

Mike Hommey [:glandium]

Comment 18

•

7 years ago

c4.2xlarge for windows and c3.xlarge for linux, whatever flavor of Xeons those EC2 instances are actually running on. Googling around suggests the former is haswell and the latter ivy bridge.

Flags: needinfo?(mh+mozilla)

Marion Daly [:mdaly]

Updated

•

7 years ago

status-firefox61: affected → ---

tracking-firefox61: + → ---

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox61: --- → fix-optional

Comment hidden (mozreview-request)

Mike Hommey [:glandium]

Comment 20

•

7 years ago

mozreview-review

Comment on attachment 8969204 [details] Bug 1451703 - Compile 32-bit code for the x86-64 generic CPU. https://reviewboard.mozilla.org/r/237930/#review243648 ::: build/moz.configure/toolchain.configure:1444 (Diff revision 1) > opts.append('debug-assertions=%s' % > ('yes' if debug_assertions else 'no')) > if debug_info is not None: > opts.append('debuginfo=%s' % debug_info) > > + opts.append('target-cpu=x86-64') O_o how can this be valid?

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 21

•

7 years ago

(In reply to Mike Hommey [:glandium] from comment #20) > Comment on attachment 8969204 [details] > Bug 1451703 - Compile 32-bit code for the x86-64 generic CPU. > > https://reviewboard.mozilla.org/r/237930/#review243648 > > ::: build/moz.configure/toolchain.configure:1444 > (Diff revision 1) > > opts.append('debug-assertions=%s' % > > ('yes' if debug_assertions else 'no')) > > if debug_info is not None: > > opts.append('debuginfo=%s' % debug_info) > > > > + opts.append('target-cpu=x86-64') > > O_o how can this be valid? It's not for ARM targets, but this is just for a try run. x86-64 is just some kind of combination of CPU features to represent a generic CPU, so compiling 32-bit code for it doesn't seem conceptually different from compiling 32-bit code for e.g. core2 or haswell.

Chris Manchester (limited bugmail, email directly)

Comment 22

•

7 years ago

We should probably either back out 1.25 or start requiring it before the soft code freeze for 61, which starts on April 26 (next Thursday).

Chris Manchester (limited bugmail, email directly)

Comment 23

•

6 years ago

hsivonen, this seems like a significant regression in this particular test, do you think it's worth backing out the rust update for its sake? Staying on a version of rustc that uses the older llvm without the regression doesn't seem realistic long term... does it seem like the investigation in https://github.com/rust-lang/rust/issues/49873 is likely to yield a fix?

Flags: needinfo?(hsivonen)

Nathan Froyd [:froydnj]

Comment 24

•

6 years ago

Rust 1.25 also seems to have broken stack traces (bug 1456150) from Rust code, FWIW.

Mike Hommey [:glandium]

Comment 25

•

6 years ago

(In reply to Nathan Froyd [:froydnj] from comment #24) > Rust 1.25 also seems to have broken stack traces (bug 1456150) from Rust > code, FWIW. Well, just for that we should back out.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 26

•

6 years ago

(In reply to Chris Manchester (:chmanchester) from comment #23) > hsivonen, this seems like a significant regression in this particular test, > do you think it's worth backing out the rust update for its sake? > > Staying on a version of rustc that uses the older llvm without the > regression doesn't seem realistic long term... does it seem like the > investigation in https://github.com/rust-lang/rust/issues/49873 is likely to > yield a fix? Staying on 1.24.0 isn't realistic long-term, but skipping 1.25.0 might be worthwhile in the hope that https://github.com/rust-lang/rust/issues/49873 and bug 1456150 could get fixing in a short-ish timeframe. I think at this point, to make progress on https://github.com/rust-lang/rust/issues/49873 , it would be useful to get someone who works on LLVM to look at it. It would be really inefficient for me to try to locate a changeset between LLVM 4.0 and 6.0 without familiarity with LLVM source and optimization passes, so I think I shouldn't spend time trying.

Flags: needinfo?(hsivonen)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Updated

•

6 years ago

See Also: → https://bugs.llvm.org/show_bug.cgi?id=37266

Ryan VanderMeulen [:RyanVM]

Comment 27

•

6 years ago

Fx61 is fixed by backout at this point, but we should probably keep this on the radar for whenever 62 gets a newer Rust version again.

status-firefox61: fix-optional → fixed

status-firefox62: --- → affected

status-firefox-esr60: --- → unaffected

Simon Sapin (:SimonSapin)

Comment 28

•

6 years ago

Is is still an issue in 1.26 (released yesterday)? Should this still block upgrading from 1.24?

Chris Manchester (limited bugmail, email directly)

Comment 29

•

6 years ago

(In reply to Simon Sapin (:SimonSapin) from comment #28) > Is is still an issue in 1.26 (released yesterday)? Should this still block > upgrading from 1.24? According to https://github.com/rust-lang/rust/issues/49873#issuecomment-380438934 it would be. I confirmed on try the regression is still present with 1.26, at least on Linux 32 (Windows results are not back yet).

Xidorn Quan [:xidorn] UTC+11

Comment 30

•

6 years ago

It seems this issue is being fixed by rust-lang/llvm#115. Are we going to wait until this fix to reach stable (i.e. waiting for 1.28, or 1.27 if backported to beta), or can we use a patched Rust in infra instead in the mean time?

Mike Hommey [:glandium]

Comment 31

•

6 years ago

FWIW, another issue is blocking us on < 1.27 at the moment: the fact that hooking the oom handler doesn't allow to get the allocation size anymore. I'm hoping to get a fix for that in 1.28. So the best we can really do for now is backporting the patch to 1.26 or wait.

Mike Hommey [:glandium]

Comment 32

•

6 years ago

Also note that building a patched rust compiler for windows might not be trivial.

Julien Cristau [:jcristau]

Comment 33

•

6 years ago

As far as I can tell 62 stayed on 1.24, and 63 will get 1.28 in bug 1447116.

status-firefox62: affected → fixed

status-firefox63: --- → affected

Vicky Chin [:vchin]

Updated

•

6 years ago

No longer blocks: 1445214

Vicky Chin [:vchin]

Updated

•

6 years ago

Blocks: 1470850

Joel Maher ( :jmaher ) (UTC -8)

Comment 34

•

6 years ago

this was fixed by this commit range on april 24th: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=13863e08584ae95ac1262b517bf8ef06dd29dbdf&tochange=2087cea3bdf81170856fc773c2d9de684510e001

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → FIXED

Ryan VanderMeulen [:RyanVM]

Updated

•

6 years ago

status-firefox63: affected → fixed

Target Milestone: --- → mozilla63

You need to log in before you can comment on or make changes to this bug.