Closed Bug 1445147 Opened 7 years ago Closed 6 years ago

2.01 - 2.59% installer size (windows2012-32, windows2012-64) regression on push a7ab282e1d4a1aa1726017a05e04102c7adc9e33 (Tue Mar 13 2018)

Categories

(Firefox Build System :: General, defect, P2)

Unspecified
Windows
defect

Tracking

(firefox-esr52 unaffected, firefox-esr60 unaffected, firefox59 unaffected, firefox60 unaffected, firefox61+ fixed)

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox59 --- unaffected
firefox60 --- unaffected
firefox61 + fixed

People

(Reporter: igoldan, Assigned: froydnj)

References

Details

(Keywords: regression)

Attachments

(1 file)

We have detected a build metrics regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=a7ab282e1d4a1aa1726017a05e04102c7adc9e33

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  3%  installer size windows2012-64 pgo      61,624,094.50 -> 63,217,362.75
  2%  installer size windows2012-32 pgo      57,674,681.25 -> 58,833,725.25


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=12088

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Automated_Performance_Testing_and_Sheriffing/Build_Metrics
Component: Untriaged → General
Product: Firefox → Firefox Build System
Version: 56 Branch → Trunk
:RyanVM Bug 1424281 increased the installer size by a big amount. Can we reduce them to where they were previously?
Flags: needinfo?(ryanvm)
I'll run this through the various Chromium code size tools today: https://www.chromium.org/developers/windows-binary-sizes
Attached file vs156growth.txt (deleted) β€”
I spot-checked a few of the most-grown functions and we seem to have more aggressive inlining (as expected).

This is particularly bad in OnMessageReceived generated code where we have a lot of repetition: https://searchfox.org/mozilla-central/source/__GENERATED__/ipc/ipdl/PFilePickerParent.cpp#112

VS 15.6 inlined all of those helpers where the previous version didn't. Multiply that by all the IPDL classes and that's a lot.

I recommend sticking a MOZ_NEVER_INLINE on anything that sounds like an error path: FatalError, SentinelReadError, etc. Perhaps we should also mark these conditions MOZ_UNLIKELY for the sake of gcc/clang but MSVC won't benefit.

Given the sheer number of these things we might also want to MOZ_NEVER_INLINE the non-error functions: ReadSentinel, IPDLParamTraits<P>::Read (which is the guts of ReadIPDLParam).
Something interesting to consider -- when inlining functions, in addition to the obvious cost of copy-pasted code, there is another, more subtle, cost: more inlining means more variables live in a given function's stack frame. When the code needs to access variables more than 0xFF bytes away from the base pointer, a four-byte instruction turns into a seven-byte instruction:

xul!ShortestPaths+0x106 [z:\build\build\src\js\src\builtin\testingfunctions.cpp @ 3610]:
 3610 00000001`830cde32 48895518        mov     qword ptr [rbp+18h],rdx
 3610 00000001`830cde36 488b542440      mov     rdx,qword ptr [rsp+40h]
 3610 00000001`830cde3b 4883c220        add     rdx,20h
 3610 00000001`830cde3f 48895508        mov     qword ptr [rbp+8],rdx
 3610 00000001`830cde43 488b0a          mov     rcx,qword ptr [rdx]
 3610 00000001`830cde46 48894d10        mov     qword ptr [rbp+10h],rcx
 3610 00000001`830cde4a 488d4d08        lea     rcx,[rbp+8]
 3610 00000001`830cde4e 48890a          mov     qword ptr [rdx],rcx
 3611 00000001`830cde51 4c8b4518        mov     r8,qword ptr [rbp+18h]
 3611 00000001`830cde55 498b4818        mov     rcx,qword ptr [r8+18h]
 3611 00000001`830cde59 448b71f4        mov     r14d,dword ptr [rcx-0Ch]
 
xul!ShortestPaths+0x11d [z:\build\build\src\js\src\builtin\testingfunctions.cpp @ 3610]:
 3610 00000001`834aa085 488995c8010000  mov     qword ptr [rbp+1C8h],rdx
 3610 00000001`834aa08c 488b542448      mov     rdx,qword ptr [rsp+48h]
 3610 00000001`834aa091 4883c220        add     rdx,20h
 3610 00000001`834aa095 488995b8010000  mov     qword ptr [rbp+1B8h],rdx
 3610 00000001`834aa09c 488b0a          mov     rcx,qword ptr [rdx]
 3610 00000001`834aa09f 48898dc0010000  mov     qword ptr [rbp+1C0h],rcx
 3610 00000001`834aa0a6 488d8db8010000  lea     rcx,[rbp+1B8h]
 3610 00000001`834aa0ad 48890a          mov     qword ptr [rdx],rcx
 3611 00000001`834aa0b0 4c8b85c8010000  mov     r8,qword ptr [rbp+1C8h]
 3611 00000001`834aa0b7 498b4818        mov     rcx,qword ptr [r8+18h]
 3611 00000001`834aa0bb 448b79f4        mov     r15d,dword ptr [rcx-0Ch]
Noticed many big perf improvements, but also another regression:

== Change summary for alert #12102 (as of Mon, 12 Mar 2018 21:27:50 GMT) ==

Regressions:

  4%  tp5n main_startup_fileio windows7-32 pgo e10s stylo     72,419,122.33 -> 75,131,795.50
  3%  tp5n main_startup_fileio windows7-32 opt e10s stylo     69,906,224.58 -> 72,267,734.92

Improvements:

 13%  tsvg_static windows7-32 opt e10s stylo     65.24 -> 56.43
  9%  a11yr windows7-32 pgo e10s stylo           417.36 -> 381.65
  8%  displaylist_mutate windows7-32 opt e10s stylo9,550.61 -> 8,751.33
  7%  tp5o_webext responsiveness windows10-64 pgo e10s stylo2.99 -> 2.78
  7%  stylebench windows7-32 opt e10s stylo      27.07 -> 28.92
  7%  a11yr windows7-32 opt e10s stylo           698.99 -> 652.84
  6%  displaylist_mutate windows10-64 opt e10s stylo7,122.12 -> 6,677.56
  6%  a11yr windows10-64 pgo e10s stylo          393.74 -> 369.54
  5%  tp5o windows7-32 opt e10s stylo            285.94 -> 271.01
  5%  stylebench windows7-32 pgo e10s stylo      40.86 -> 42.89
  4%  speedometer windows7-32 pgo e10s stylo     43.69 -> 45.30
  3%  speedometer windows7-32 opt e10s stylo     32.99 -> 34.03
  3%  tp5o_webext windows10-64 pgo e10s stylo    293.19 -> 285.52
  2%  speedometer windows10-64 pgo e10s stylo    41.48 -> 42.41

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12102
I guess it's plausible that the main_startup_fileio regression is related to the libxul size increase.
Flags: needinfo?(ryanvm)
Blocks: 1445214
No longer blocks: 1433041
Platform_microbench test also noticed perf regressions:

== Change summary for alert #12104 (as of Mon, 12 Mar 2018 22:52:56 GMT) ==

Regressions:

 52%  Strings PerfStripCharsWhitespace windows10-64 opt      320,864.17 -> 488,867.75
 50%  Strings PerfStripCharsWhitespace windows7-32 opt       323,906.00 -> 484,405.58
 49%  Strings PerfStripCharsCRLF windows10-64 opt            191,063.42 -> 285,057.75
 19%  Strings PerfStripCRLF windows7-32 opt                  86,098.42 -> 102,425.58
 19%  Strings PerfStripWhitespace windows7-32 opt            73,790.58 -> 87,737.33
  9%  Strings PerfStripCRLF windows10-64 opt                 84,739.04 -> 92,109.50

Improvements:

 10%  Stylo Gecko_nsCSSParser_ParseSheet_Bench windows7-32 opt      73,499.08 -> 66,322.25

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12104
(In reply to IonuΘ› Goldan [:igoldan], Performance Sheriffing from comment #5)
> Noticed many big perf improvements, but also another regression:

Those speedometer wins are pretty compelling.

(In reply to Ryan VanderMeulen [:RyanVM] from comment #6)
> I guess it's plausible that the main_startup_fileio regression is related to
> the libxul size increase.

Yes, I agree.

Realistically, we're unlikely to downgrade the compiler, especially in the light of these perf wins. Some of the more-aggressive inlining is presumably helping us here, but I'm willing to bet that 90% is dead weight.

Given that we're looking at multiple megabytes of size regression here, and given that we routinely spend multiple engineer-days to fix codesize regressions that are twenty times smaller, I think we should definitely invest a fair amount of time to iterate on the before/after comparison and move things out of line.

David, are you planning to do this, or should we work with management to find someone?
Flags: needinfo?(dmajor)
I'm testing this change which ought to be the biggest bang-per-buck: https://hg.mozilla.org/try/rev/6cd1b351e18a4493d35e9fc5d7dbc907e3dcf6f3

I don't really want to sign up for more detailed subsequent work if that's not enough.
Flags: needinfo?(dmajor)
(In reply to IonuΘ› Goldan [:igoldan], Performance Sheriffing from comment #7)
> Platform_microbench test also noticed perf regressions:

Just a note that these are GTest-based performance tests, which means we won't get PGO numbers for them (GTests don't work with our Windows PGO builds), so it's kinda hard to say how severe this is :(.
Ok, ok, one more: https://hg.mozilla.org/try/rev/508d8700c861ac0bbc56186bfb80e87684218761
Here's my approach in case anyone wants to keep going even further

- Find the second "Increases in Total Size" header of attachment 8958620 [details]
- Look for repeating patterns (I saw ::OnMessageReceived, ::ToObjectInternal, and ...Binding::get...)
- In WinDbg open xul.dll from Ryan's m-c landing and the one before, side by side
- Dump a function: uf xul!mozilla::dom::HttpConnectionElement::ToObjectInternal (it takes ages)
- Scroll right so you can see line numbers like "compositioneventbinding.cpp @ 259"
- Scroll through the before and after builds at the same time, looking for places where the "after" stays on the same line for very many instructions
- Find the corresponding non-inlined `call` instruction in the "before" build.

It would be great if we had a tool that could automate this: for each function, report how many times it was inlined, and roughly how many instructions that cost us. It would probably need compiler cooperation.
Sounds good. NI David to report back on the results of the work in comment 9 once we have them. Once we have that, we can see what's left and decide how to proceed.
Flags: needinfo?(dmajor)
Ok. Sounds like there's potentially enough further improvements here that we should spend some more time here. Anthony, can you find someone to work on this? Ideally someone would spend at least a couple of days poking at the steps in comment 12 (as well as any other ideas they may come up with). If we hit a point of diminishing returns, we should check with Product to determine how much more time to sink.
Flags: needinfo?(ajones)
Can we report something on this to Microsoft?  I cannot believe that the inliner suddenly become stupid enough to inline a lot of those IPC error functions; they should never be hit in PGO profiling, so they ought to be frigidly cold code...
Ryan - the JS engine currently uses -O2. Can you try a build with -O1 -Oi to see if that reduces the amount of inlining?
Flags: needinfo?(ajones) → needinfo?(ryanvm)
Flags: needinfo?(ajones)
I pushed the below to Try:
https://hg.mozilla.org/try/rev/92152d085974cda5137fc4bc15583cab60aee4cc

win32
before: 59,167,406
after:  59,185,916
diff: +18,510 (?!)

win64
before: 63,530,164
after:  63,454,440
diff: -75,724
Flags: needinfo?(ryanvm)
Bobby, any thoughts on what to do next here?
Flags: needinfo?(bobbyholley)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #19)
> Bobby, any thoughts on what to do next here?

My suggestion from comment 15 still holds.

Time is dwindling here - this is a large regression, and the backout options aren't good. IMO this is on Anthony's plate, and he needs to do one of the following:

(1) Find resources (either on his team or elsewhere) to continue looking at this.
(2) Get signoff from product to accept the regression without further investigation.

If I were Product, I would at least insist that we pursue the steps from comment 12 for a few more days, and pursue this with Microsoft via comment 16 to see if there might be some easy fix.
Flags: needinfo?(bobbyholley)
Anthony, we've got roughly two weeks until the first merge of Gecko 61 to Beta. Is there anyone able to look into this?
Flags: needinfo?(ajones)
Flags: needinfo?(ajones)
Priority: -- → P2
Severity: normal → critical
Can you find someone to look at this?
I am getting set up to poke at this.  I think we should at least land most of dmajor's ipc/-related changes, at least the bits on the error paths.  I'm not in a position to make a judgement on the js/ changes:

https://hg.mozilla.org/try/diff/508d8700c861/js/src/jsapi.cpp
Fun fact: ipc/glue/ProtocolUtils.cpp already declares a bunch of functions as MOZ_NEVER_INLINE, mostly the same functions dmajor made MOZ_NEVER_INLINE in https://hg.mozilla.org/try/rev/6cd1b351e18a4493d35e9fc5d7dbc907e3dcf6f3 .  Maybe MS botched their logic for inlining things?
(In reply to Bobby Holley (:bholley) from comment #20)
> Time is dwindling here - this is a large regression, and the backout options
> aren't good.

Why can't we back it out?
Flags: needinfo?(ajones)
Flags: needinfo?(ajones)
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #25)
> (In reply to Bobby Holley (:bholley) from comment #20)
> > Time is dwindling here - this is a large regression, and the backout options
> > aren't good.
> 
> Why can't we back it out?

That's a reasonable question: bug 1424281 comment 0 said that there are "interesting optimizations" for us, but reading through the release notes, it looks like the notable thing might be that the compiler is slightly faster, not that it produces better code?
Flags: needinfo?(ryanvm)
I'd say the bigger issue at this point is that we've been landing a number of patches over the last month which depend on the newer version of the compiler. Trying to backout all of those changes and/or rework patches sounds like a terrible idea a week before the first beta.

And comment 5 in this bug enumerates the performance wins we saw from this update.
Flags: needinfo?(ryanvm)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #27)
> I'd say the bigger issue at this point is that we've been landing a number
> of patches over the last month which depend on the newer version of the
> compiler. Trying to backout all of those changes and/or rework patches
> sounds like a terrible idea a week before the first beta.

Well, this is a compiler update from 2017 -> 2017, not 2015 -> 2017, right?  If this was the latter, yes, that's basically impossible now.  But the former shouldn't be too big of a problem, I think?

> And comment 5 in this bug enumerates the performance wins we saw from this
> update.

Those are pretty nice!
(In reply to Nathan Froyd [:froydnj] from comment #28)
> Well, this is a compiler update from 2017 -> 2017, not 2015 -> 2017, right? 
> If this was the latter, yes, that's basically impossible now.  But the
> former shouldn't be too big of a problem, I think?

Microsoft changed their release model in 2017. The more significant part is that we updated from version 15.4 to 15.6 which brought along a toolset change, which includes new C++ features.

See also:
https://docs.microsoft.com/en-us/visualstudio/productinfo/vs2017-release-rhythm
It's also worth noting that, for reasons that I do not understand, the installer size for x64 Windows pgo on central is down to ~60.2MB, so it's ~5% smaller than the numbers cited in comment 0 (!).  Likewise for x86.

I think it's still worth applying some of dmajor's ipc/-related patches, because those will get us closer to the size numbers on release--at least assuming they don't tank Talos performance...
Wait, I'm confused. Do you mean that they went up in comment 0, and then went down again at some later point? If so, what is that point?
Flags: needinfo?(nfroyd)
(In reply to Bobby Holley (:bholley) from comment #31)
> Wait, I'm confused. Do you mean that they went up in comment 0, and then
> went down again at some later point? If so, what is that point?

There's a big drop from bug 1435230, e.g. bug 1435230 comment 12, when we stopped building SkiaPDF for Windows:

https://treeherder.mozilla.org/perf.html#/graphs?timerange=5184000&series=mozilla-inbound,1457369,1,2&zoom=1520917605190.7139,1521216207585.17,60513779.417519145,63614008.60864218

It looks like the installer size has been slowly but steadily trending downwards after that point as well.  Bug 1455071 seemed to help (although the installer size doesn't stay permanently lowered), and so did bug 1454592 (likewise on permanent lowering).
Flags: needinfo?(nfroyd)
Depends on: 1456192
(In reply to Nathan Froyd [:froydnj] from comment #24)
> Fun fact: ipc/glue/ProtocolUtils.cpp already declares a bunch of functions
> as MOZ_NEVER_INLINE, mostly the same functions dmajor made MOZ_NEVER_INLINE
> in https://hg.mozilla.org/try/rev/6cd1b351e18a4493d35e9fc5d7dbc907e3dcf6f3 .
> Maybe MS botched their logic for inlining things?

I assume you mean that ProtocolUtils.**h** already sets MOZ_NEVER_INLINE; I can't find any in the cpp.

Offhand I can't find any source to back me up, and I could well be imagining this, but I thought __declspec(noinline) had to go on the function definition rather than the declaration.
(In reply to David Major [:dmajor] from comment #33)
> (In reply to Nathan Froyd [:froydnj] from comment #24)
> > Fun fact: ipc/glue/ProtocolUtils.cpp already declares a bunch of functions
> > as MOZ_NEVER_INLINE, mostly the same functions dmajor made MOZ_NEVER_INLINE
> > in https://hg.mozilla.org/try/rev/6cd1b351e18a4493d35e9fc5d7dbc907e3dcf6f3 .
> > Maybe MS botched their logic for inlining things?
> 
> I assume you mean that ProtocolUtils.**h** already sets MOZ_NEVER_INLINE; I
> can't find any in the cpp.

I do indeed.

> Offhand I can't find any source to back me up, and I could well be imagining
> this, but I thought __declspec(noinline) had to go on the function
> definition rather than the declaration.

We are of both minds of this in various places throughout the tree.

OTOH, I did a test push that tacked on MOZ_NEVER_INLINE in a bunch of places, similar to your original ipc/ patch:

https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&originalRevision=0e45c13b34e815cb42a9f08bb44142d1a81e186e&newProject=try&newRevision=5edc0cac0c5ca15f89bd44ef8963281b3e8a53c3&originalSignature=6fe30befd371a869a2ea65b68ce54c22e6f129a6&newSignature=6fe30befd371a869a2ea65b68ce54c22e6f129a6&framework=2

that got a reduction of ~500k in xul.dll size.

I then did a test push that tacked on MOZ_NEVER_INLINE *only* on Pickle::ReadSentinel's definition (should have tried it with the declaration, doh) with the patches above, and got:

https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&originalRevision=0e45c13b34e815cb42a9f08bb44142d1a81e186e&newProject=try&newRevision=6b3ad2c1fb42e129a9888b064b51931c6fca05d1&originalSignature=6fe30befd371a869a2ea65b68ce54c22e6f129a6&newSignature=6fe30befd371a869a2ea65b68ce54c22e6f129a6&framework=2

that is, a reduction of ~800k in xul.dll size.

I don't understand MSVC PGO. :(
That is, redundantly adding MOZ_NEVER_INLINE to the definitions of the IPC error functions seems to *increase* the size of xul.dll by 300k.  So it would seem that MOZ_NEVER_INLINE on the declaration is fine...?  Or maybe not inlining ReadSentinel also encourages MSVC to not inline the error functions, or something?
Per discussion with Nathan, we've clawed back the easy wins we're going to get from bug 1456192. Given that our current installer size on 61 is smaller than it was when this bug was first reported, I think this is as good as it's going to get without a lot more engineering time than it's worth. Calling this "fixed" as a result.
Assignee: nobody → nfroyd
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(ajones)
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: