906119 - Enable incremental linking with Visual C++

Assignee

Description

•

11 years ago

So we do two stupid things which breaks incremental linking on Windows: one is a hack from the days of VC8 and the other is passing in the -RELEASE flag to the linker which disables incremental linking for a mostly useless checksum field apparently somewhere in the PE header. We need to stop doing both of these.

(not currently active) Ted Mielczarek

Comment 1

•

11 years ago

Note that the -RELEASE flag apparently made our PGO builds significantly faster (I don't know why), so we'll want to keep it on for those builds.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 2

•

11 years ago

This used to work a while ago.

Keywords: regression

(not currently active) Ted Mielczarek

Comment 3

•

11 years ago

We could either stick it behind bug 904979 to keep it on for all official builds, or flip it on solely for PGO builds somehow.

(no longer active)

Assignee

Comment 4

•

11 years ago

Wait. How does -RELEASE possibly make PGO builds faster? I don't believe you. :-)

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 5

•

11 years ago

I kind of don't either -- -LTCG implicitly disables -INCREMENTAL (from the docs: "/LTCG is not valid for use with /INCREMENTAL."), so it can't be that -RELEASE was disabling incremental linking which made PGO builds faster. Other than that, -RELEASE only does the checksum goop as far as I can find.

(not currently active) Ted Mielczarek

Comment 6

•

11 years ago

I was similarly skeptical, but we had a big PGO build time drop and this was the only thing in the range that was remotely plausible: http://people.mozilla.org/~catlee/sattap/2b3e3d61.png I didn't do the sleuthing on it because I didn't have time and it didn't seem super-important to track down.

(no longer active)

Assignee

Comment 7

•

11 years ago

Attached patch Patch (v1) (obsolete) (deleted) — Details — Splinter Review

Just to put my patch where my mouth is. ;-)

Assignee: nobody → ehsan

Status: NEW → ASSIGNED

Attachment #791440 - Flags: review?(ted)

(no longer active)

Assignee

Comment 8

•

11 years ago

Also, note that "-RELEASE" is only useful for device driver development, don't be mislead by its name. It should really be called "-ADDUSELESSCHECKSUM" or something!

(no longer active)

Assignee

Comment 9

•

11 years ago

(In reply to comment #6) > I was similarly skeptical, but we had a big PGO build time drop and this was > the only thing in the range that was remotely plausible: > http://people.mozilla.org/~catlee/sattap/2b3e3d61.png > > I didn't do the sleuthing on it because I didn't have time and it didn't seem > super-important to track down. Hmm, still that doesn't make a lot of sense to me... I guess we'll know for sure if we land this patch though.

(not currently active) Ted Mielczarek

Comment 10

•

11 years ago

(In reply to :Ehsan Akhgari (needinfo? me!) from comment #8) > Also, note that "-RELEASE" is only useful for device driver development, > don't be mislead by its name. It should really be called > "-ADDUSELESSCHECKSUM" or something! I know. The only reason I added it was to get rid of the annoying warnings in WinDBG about missing checksums. If I had known it disabled incremental linking I would not have bothered. I'd like to see you push this to try and get some PGO builds to compare build times with and without before we land this. If it does actually have an impact then we should not regress that.

Mike Hommey [:glandium]

Comment 11

•

11 years ago

I'd rather see this "protected" behind a -z "$DEVELOPER_OPTIONS" test.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 12

•

11 years ago

Let's see what it does on try to PGO build times first. The fewer options the better, especially something like this that *shouldn't* have an impact. (The checksum thing is just a one-time warning in a debug session, right? The debug info loading is noisy as it is, I don't think it matters that much.)

Mike Hommey [:glandium]

Comment 13

•

11 years ago

I'm not sure incremental linking won't add more noise to the already noisy talos results.

(no longer active)

Assignee

Comment 14

•

11 years ago

(In reply to Mike Hommey [:glandium] from comment #13) > I'm not sure incremental linking won't add more noise to the already noisy > talos results. How would incrementally linking PGO builds even work?

(no longer active)

Assignee

Comment 15

•

11 years ago

Try push: https://tbpl.mozilla.org/?tree=Try&rev=195c4d142edf

(no longer active)

Assignee

Comment 16

•

11 years ago

(In reply to :Ehsan Akhgari (needinfo? me!) from comment #14) > (In reply to Mike Hommey [:glandium] from comment #13) > > I'm not sure incremental linking won't add more noise to the already noisy > > talos results. > > How would incrementally linking PGO builds even work? FWIW, <http://msdn.microsoft.com/en-us/library/xbf3tbeh.aspx> says that builds with /LTCG will not be incremental. Which makes perfect sense to me, since the linker doesn't just link in those builds, it also code-gens.

Mike Hommey [:glandium]

Comment 17

•

11 years ago

(In reply to :Ehsan Akhgari (needinfo? me!) from comment #14) > (In reply to Mike Hommey [:glandium] from comment #13) > > I'm not sure incremental linking won't add more noise to the already noisy > > talos results. > > How would incrementally linking PGO builds even work? We're running talos on non-PGO builds.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 18

•

11 years ago

(In reply to Mike Hommey [:glandium] from comment #17) > We're running talos on non-PGO builds. That's kind of a non-sequitur, but also interesting and sad! We're not running our only performance tests using the builds that we specifically optimize for performance?

Mike Hommey [:glandium]

Comment 19

•

11 years ago

(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #18) > (In reply to Mike Hommey [:glandium] from comment #17) > > We're running talos on non-PGO builds. > > That's kind of a non-sequitur, but also interesting and sad! We're not > running our only performance tests using the builds that we specifically > optimize for performance? No, because we don't do pgo builds on every push.

Mike Hommey [:glandium]

Comment 20

•

11 years ago

Well, we *do* run talos on PGO builds, but we're *also* running it on non-PGO builds. Without that tracking regressions would be much harder.

Phil Ringnalda (:philor)

Comment 21

•

11 years ago

Whether or not it makes sense (or more accurately, whether or not it is the result of a documented effect rather than an undocumented side-effect), -RELEASE dropped our runtime for debug browser-chrome on Windows from 120 minutes (which is the maximum runtime buildbot allows, so we were frequently turning it red from being too slow) to 100 minutes, so be careful you don't regress bug 864085 by getting rid of it.

Gregory Szorc [:gps]

Comment 22

•

11 years ago

I looked at the job execution time of Talos jobs on inbound for a single non-PGO build (https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=c21468186d3d): c d o p s tp x Total Linux32 9 25 21 12 28 15 110 Linux64 9 26 21 13 28 14 111 Win XP 10 20 13 29 16 88 Win 7 13 32 22 15 31 17 10 140 Win 8 12 30 22 14 30 17 125 Comparing build times (took longest build times from 3 recent m-c builds): Non-PGO PGO Difference Linux32 117 172 55 Linux64 112 175 63 Win XP N/A N/A Win 7 147 276 129 Win 8 N/A N/A So, if we performed PGO on *every* build and eliminated non-PGO builds and their corresponding Talos jobs, we'd use less overall machine time. Of course, if we got rid of non-PGO builds, we could also kill *all* non-PGO tests, which would free up yet more resources (277 minutes more for Linux32). There are downsides to killing non-PGO builds, of course. But considering it would be a net machine time win - and that is something RelEng is always making a big deal about - it's certainly worth thinking about.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 23

•

11 years ago

Latency of build results matters too.

Mike Hommey [:glandium]

Comment 24

•

11 years ago

Having to wait for 4 hours for builds to finish would certainly affect the inbound close times.

Mike Hommey [:glandium]

Comment 25

•

11 years ago

(In reply to Gregory Szorc [:gps] from comment #22) > So, if we performed PGO on *every* build and eliminated non-PGO builds and > their corresponding Talos jobs, we'd use less overall machine time. Of > course That's not true, either. PGO builds are always clobbers. Non-PGO builds aren't. Sure, we could make the PGO builds non-clobbers with some effort, but that won't make them magically as fast as Non-PGO non-clobber builds (the fact that relinking xul.dll alone takes 90 minutes with PGO, for instance, is certainly not going to help).

Gregory Szorc [:gps]

Comment 26

•

11 years ago

(In reply to Mike Hommey [:glandium] from comment #25) > (In reply to Gregory Szorc [:gps] from comment #22) > > So, if we performed PGO on *every* build and eliminated non-PGO builds and > > their corresponding Talos jobs, we'd use less overall machine time. Of > > course > > That's not true, either. PGO builds are always clobbers. Non-PGO builds > aren't. Sure, we could make the PGO builds non-clobbers with some effort, > but that won't make them magically as fast as Non-PGO non-clobber builds > (the fact that relinking xul.dll alone takes 90 minutes with PGO, for > instance, is certainly not going to help). How is it not true? Automation machine time for tests is much greater than automation time for builds. If we eliminate a whole class of builds, most of the savings come from eliminating the tests. That may change as test suites are parallelized. But for now it holds true.

(not currently active) Ted Mielczarek

Comment 27

•

11 years ago

(In reply to Gregory Szorc [:gps] from comment #22) > So, if we performed PGO on *every* build and eliminated non-PGO builds and > their corresponding Talos jobs, we'd use less overall machine time. Of > course, if we got rid of non-PGO builds, we could also kill *all* non-PGO > tests, which would free up yet more resources (277 minutes more for Linux32). > > There are downsides to killing non-PGO builds, of course. But considering it > would be a net machine time win - and that is something RelEng is always > making a big deal about - it's certainly worth thinking about. We explicitly made this tradeoff last year because of the latency of results. PGO builds are always going to take longer to finish, meaning that it's always going to take longer to get results out. We are using more machine time to do periodic PGO builds, but the tradeoff is that we get faster results for the non-PGO builds so we can more quickly assess test breakage. I don't think we want to revert that decision.

(not currently active) Ted Mielczarek

Comment 28

•

11 years ago

(In reply to Phil Ringnalda (:philor) from comment #21) > Whether or not it makes sense (or more accurately, whether or not it is the > result of a documented effect rather than an undocumented side-effect), > -RELEASE dropped our runtime for debug browser-chrome on Windows from 120 > minutes (which is the maximum runtime buildbot allows, so we were frequently > turning it red from being too slow) to 100 minutes, so be careful you don't > regress bug 864085 by getting rid of it. Oh, perhaps this was the thing I was misremembering in comment 1. It was debug test runtimes and not PGO build speed. Exciting either way! (That's right, it was my PGO-exclusion patch that made PGO builds faster.)

Ed Morley [:emorley]

Comment 29

•

11 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #27) > We are using more > machine time to do periodic PGO builds, but the tradeoff is that we get > faster results for the non-PGO builds so we can more quickly assess test > breakage. I don't think we want to revert that decision. Indeed.

(no longer active)

Assignee

Comment 30

•

11 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #28) > (In reply to Phil Ringnalda (:philor) from comment #21) > > Whether or not it makes sense (or more accurately, whether or not it is the > > result of a documented effect rather than an undocumented side-effect), > > -RELEASE dropped our runtime for debug browser-chrome on Windows from 120 > > minutes (which is the maximum runtime buildbot allows, so we were frequently > > turning it red from being too slow) to 100 minutes, so be careful you don't > > regress bug 864085 by getting rid of it. > > Oh, perhaps this was the thing I was misremembering in comment 1. It was > debug test runtimes and not PGO build speed. Exciting either way! (That's > right, it was my PGO-exclusion patch that made PGO builds faster.) OK, I'll do another try push for debug tests. FWIW, the build times were no different with this patch. Also, can everybody please take the parts of the conversation that are not related to this bug to a better forum, such as dev.platform? This bug is *only* about enabling incremental linking with Visual C++, as suggested by the summary, discussions of things such as dropping non-PGO builds do not belong here! Thanks. :-)

(no longer active)

Assignee

Comment 31

•

11 years ago

(In reply to :Ehsan Akhgari (needinfo? me!) from comment #30) > OK, I'll do another try push for debug tests. FWIW, the build times were no > different with this patch. https://tbpl.mozilla.org/?tree=Try&rev=863d070a5909

(no longer active)

Assignee

Comment 32

•

11 years ago

(In reply to :Ehsan Akhgari (needinfo? me!) from comment #31) > (In reply to :Ehsan Akhgari (needinfo? me!) from comment #30) > > OK, I'll do another try push for debug tests. FWIW, the build times were no > > different with this patch. > > https://tbpl.mozilla.org/?tree=Try&rev=863d070a5909 o_O Comparing this to <https://tbpl.mozilla.org/?rev=764ac2c7cb0a>, we go from 88mins to 120mins running browser-chrome. This is ridiculous. I'll try to hide this behind $DEVELOPER_OPTIONS.

(no longer active)

Assignee

Comment 33

•

11 years ago

Attached patch Patch (v2) (deleted) — Details — Splinter Review

https://tbpl.mozilla.org/?tree=Try&rev=60baca47112d

Attachment #791440 - Attachment is obsolete: true

Attachment #791440 - Flags: review?(ted)

Attachment #793140 - Flags: review?(ted)

Attachment #793140 - Flags: review?(mh+mozilla)

(not currently active) Ted Mielczarek

Comment 34

•

11 years ago

If you'd like to try to profile it that might be interesting. My best guess is the dbghelp-using NS_StackWalk implementation hits some pathological case without these checksums. (I didn't have the stomach for trying to figure it out at the time.)

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 35

•

11 years ago

I was going to write a script to parse out the times for each test in the two logs. But then I realized I really shouldn't do that, not right now. :) Someone else should though!

Mike Hommey [:glandium]

Comment 36

•

11 years ago

Comment on attachment 793140 [details] [diff] [review] Patch (v2) Review of attachment 793140 [details] [diff] [review]: ----------------------------------------------------------------- ::: configure.in @@ +2075,5 @@ > DYNAMIC_XPCOM_LIBS='$(LIBXUL_DIST)/lib/xul.lib $(LIBXUL_DIST)/lib/xpcom_core.lib $(LIBXUL_DIST)/lib/mozalloc.lib' > XPCOM_FROZEN_LDOPTS='$(LIBXUL_DIST)/lib/xul.lib $(LIBXUL_DIST)/lib/mozalloc.lib' > LIBXUL_LIBS='$(LIBXUL_DIST)/lib/xul.lib $(LIBXUL_DIST)/lib/mozalloc.lib' > MOZ_COMPONENT_NSPR_LIBS='$(NSPR_LIBS)' > + if test -n "$DEVELOPER_OPTIONS"; then Would be better to do that as LDFLAGS="$LDFLAGS -LARGEADDRESSAWARE -NXCOMPAT" if test -z "$DEVELOPER_OPTIONS"; then LDFLAGS="$LDFLAGS -RELEASE" fi

Attachment #793140 - Flags: review?(mh+mozilla) → review+

(no longer active)

Assignee

Comment 37

•

11 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #34) > If you'd like to try to profile it that might be interesting. My best guess > is the dbghelp-using NS_StackWalk implementation hits some pathological case > without these checksums. (I didn't have the stomach for trying to figure it > out at the time.) Hmm, yeah that's plausible, but I don't hope that there is much that we can do about that. At any rate, this patch gives us what _I_ was looking for (faster builds for developers), so I filed bug 907701 as a follow-up in case anybody is interested to do this for RelEng builds.

(no longer active)

Assignee

Updated

•

11 years ago

Attachment #793140 - Flags: review?(ted)

(no longer active)

Assignee

Comment 38

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/e36e3bec59da

Ryan VanderMeulen [:RyanVM]

Comment 39

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/e36e3bec59da

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla26

Frank Wein [:mcsmurf]

Comment 40

•

11 years ago

Just for reference (for now): It's still possible to run into the ILK size limit of incremental linking (Bug 653662 disabled it back then) with VS2010 as the limit has only been increased to 384MB/768MB according to http://connect.microsoft.com/VisualStudio/feedback/details/283665/increase-linker-memory-to-avoid-linker-error-lnk1210-exceeded-internal-ilk-size-limit. Someone in #developers on IRC has run into this limit with a debug build/no opt with VS2010 for about a week now. But a new bug will be filed on this if necessary.

Alfred Peters

Comment 41

•

11 years ago

(In reply to Frank Wein [:mcsmurf] from comment #40) > limit of incremental linking (Bug 653662 disabled it back then) with VS2010 First: Could someone please set a reference to those old bugs (I found this bug through the rev.-history of Makefile.in <SIC!>) > Someone in #developers on IRC has run into this limit with a debug build/no > opt with VS2010 for about a week now. Confirmed! I'm just right now run into the Bug.

Frank Wein [:mcsmurf]

Comment 42

•

11 years ago

Alfred: Can you file a new bug for this via https://bugzilla.mozilla.org/enter_bug.cgi ? It looks like multiple people are affected by this. File it in Produce Core, Component "Build Config". Also include the info if you use a 32bit or a 64bit Windows.

Alfred Peters

Comment 43

•

11 years ago

(In reply to Frank Wein [:mcsmurf] from comment #42) > Alfred: Can you file a new bug for this via Done: bug 931371

Mark Banner (:standard8)

Updated

•

8 years ago

Blocks: 703845

BMO Automation

Updated

•

7 years ago

Product: Core → Firefox Build System

Patch (v1) 11 years ago (no longer active) (deleted), patch		Details \| Diff \| Splinter Review
Patch (v2) 11 years ago (no longer active) (deleted), patch	glandium : review+	Details \| Diff \| Splinter Review