Closed
Bug 906119
Opened 11 years ago
Closed 11 years ago
Enable incremental linking with Visual C++
Categories
(Firefox Build System :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla26
People
(Reporter: ehsan.akhgari, Assigned: ehsan.akhgari)
References
Details
(Keywords: regression)
Attachments
(1 file, 1 obsolete file)
(deleted),
patch
|
glandium
:
review+
|
Details | Diff | Splinter Review |
So we do two stupid things which breaks incremental linking on Windows: one is a hack from the days of VC8 and the other is passing in the -RELEASE flag to the linker which disables incremental linking for a mostly useless checksum field apparently somewhere in the PE header. We need to stop doing both of these.
Comment 1•11 years ago
|
||
Note that the -RELEASE flag apparently made our PGO builds significantly faster (I don't know why), so we'll want to keep it on for those builds.
This used to work a while ago.
Keywords: regression
Comment 3•11 years ago
|
||
We could either stick it behind bug 904979 to keep it on for all official builds, or flip it on solely for PGO builds somehow.
Assignee | ||
Comment 4•11 years ago
|
||
Wait. How does -RELEASE possibly make PGO builds faster? I don't believe you. :-)
I kind of don't either -- -LTCG implicitly disables -INCREMENTAL (from the docs: "/LTCG is not valid for use with /INCREMENTAL."), so it can't be that -RELEASE was disabling incremental linking which made PGO builds faster. Other than that, -RELEASE only does the checksum goop as far as I can find.
Comment 6•11 years ago
|
||
I was similarly skeptical, but we had a big PGO build time drop and this was the only thing in the range that was remotely plausible:
http://people.mozilla.org/~catlee/sattap/2b3e3d61.png
I didn't do the sleuthing on it because I didn't have time and it didn't seem super-important to track down.
Assignee | ||
Comment 7•11 years ago
|
||
Just to put my patch where my mouth is. ;-)
Assignee | ||
Comment 8•11 years ago
|
||
Also, note that "-RELEASE" is only useful for device driver development, don't be mislead by its name. It should really be called "-ADDUSELESSCHECKSUM" or something!
Assignee | ||
Comment 9•11 years ago
|
||
(In reply to comment #6)
> I was similarly skeptical, but we had a big PGO build time drop and this was
> the only thing in the range that was remotely plausible:
> http://people.mozilla.org/~catlee/sattap/2b3e3d61.png
>
> I didn't do the sleuthing on it because I didn't have time and it didn't seem
> super-important to track down.
Hmm, still that doesn't make a lot of sense to me... I guess we'll know for sure if we land this patch though.
Comment 10•11 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #8)
> Also, note that "-RELEASE" is only useful for device driver development,
> don't be mislead by its name. It should really be called
> "-ADDUSELESSCHECKSUM" or something!
I know. The only reason I added it was to get rid of the annoying warnings in WinDBG about missing checksums. If I had known it disabled incremental linking I would not have bothered.
I'd like to see you push this to try and get some PGO builds to compare build times with and without before we land this. If it does actually have an impact then we should not regress that.
Comment 11•11 years ago
|
||
I'd rather see this "protected" behind a -z "$DEVELOPER_OPTIONS" test.
Let's see what it does on try to PGO build times first. The fewer options the better, especially something like this that *shouldn't* have an impact. (The checksum thing is just a one-time warning in a debug session, right? The debug info loading is noisy as it is, I don't think it matters that much.)
Comment 13•11 years ago
|
||
I'm not sure incremental linking won't add more noise to the already noisy talos results.
Assignee | ||
Comment 14•11 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #13)
> I'm not sure incremental linking won't add more noise to the already noisy
> talos results.
How would incrementally linking PGO builds even work?
Assignee | ||
Comment 15•11 years ago
|
||
Assignee | ||
Comment 16•11 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #14)
> (In reply to Mike Hommey [:glandium] from comment #13)
> > I'm not sure incremental linking won't add more noise to the already noisy
> > talos results.
>
> How would incrementally linking PGO builds even work?
FWIW, <http://msdn.microsoft.com/en-us/library/xbf3tbeh.aspx> says that builds with /LTCG will not be incremental. Which makes perfect sense to me, since the linker doesn't just link in those builds, it also code-gens.
Comment 17•11 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #14)
> (In reply to Mike Hommey [:glandium] from comment #13)
> > I'm not sure incremental linking won't add more noise to the already noisy
> > talos results.
>
> How would incrementally linking PGO builds even work?
We're running talos on non-PGO builds.
(In reply to Mike Hommey [:glandium] from comment #17)
> We're running talos on non-PGO builds.
That's kind of a non-sequitur, but also interesting and sad! We're not running our only performance tests using the builds that we specifically optimize for performance?
Comment 19•11 years ago
|
||
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #18)
> (In reply to Mike Hommey [:glandium] from comment #17)
> > We're running talos on non-PGO builds.
>
> That's kind of a non-sequitur, but also interesting and sad! We're not
> running our only performance tests using the builds that we specifically
> optimize for performance?
No, because we don't do pgo builds on every push.
Comment 20•11 years ago
|
||
Well, we *do* run talos on PGO builds, but we're *also* running it on non-PGO builds. Without that tracking regressions would be much harder.
Comment 21•11 years ago
|
||
Whether or not it makes sense (or more accurately, whether or not it is the result of a documented effect rather than an undocumented side-effect), -RELEASE dropped our runtime for debug browser-chrome on Windows from 120 minutes (which is the maximum runtime buildbot allows, so we were frequently turning it red from being too slow) to 100 minutes, so be careful you don't regress bug 864085 by getting rid of it.
Comment 22•11 years ago
|
||
I looked at the job execution time of Talos jobs on inbound for a single non-PGO build (https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=c21468186d3d):
c d o p s tp x Total
Linux32 9 25 21 12 28 15 110
Linux64 9 26 21 13 28 14 111
Win XP 10 20 13 29 16 88
Win 7 13 32 22 15 31 17 10 140
Win 8 12 30 22 14 30 17 125
Comparing build times (took longest build times from 3 recent m-c builds):
Non-PGO PGO Difference
Linux32 117 172 55
Linux64 112 175 63
Win XP N/A N/A
Win 7 147 276 129
Win 8 N/A N/A
So, if we performed PGO on *every* build and eliminated non-PGO builds and their corresponding Talos jobs, we'd use less overall machine time. Of course, if we got rid of non-PGO builds, we could also kill *all* non-PGO tests, which would free up yet more resources (277 minutes more for Linux32).
There are downsides to killing non-PGO builds, of course. But considering it would be a net machine time win - and that is something RelEng is always making a big deal about - it's certainly worth thinking about.
Latency of build results matters too.
Comment 24•11 years ago
|
||
Having to wait for 4 hours for builds to finish would certainly affect the inbound close times.
Comment 25•11 years ago
|
||
(In reply to Gregory Szorc [:gps] from comment #22)
> So, if we performed PGO on *every* build and eliminated non-PGO builds and
> their corresponding Talos jobs, we'd use less overall machine time. Of
> course
That's not true, either. PGO builds are always clobbers. Non-PGO builds aren't. Sure, we could make the PGO builds non-clobbers with some effort, but that won't make them magically as fast as Non-PGO non-clobber builds (the fact that relinking xul.dll alone takes 90 minutes with PGO, for instance, is certainly not going to help).
Comment 26•11 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #25)
> (In reply to Gregory Szorc [:gps] from comment #22)
> > So, if we performed PGO on *every* build and eliminated non-PGO builds and
> > their corresponding Talos jobs, we'd use less overall machine time. Of
> > course
>
> That's not true, either. PGO builds are always clobbers. Non-PGO builds
> aren't. Sure, we could make the PGO builds non-clobbers with some effort,
> but that won't make them magically as fast as Non-PGO non-clobber builds
> (the fact that relinking xul.dll alone takes 90 minutes with PGO, for
> instance, is certainly not going to help).
How is it not true? Automation machine time for tests is much greater than automation time for builds. If we eliminate a whole class of builds, most of the savings come from eliminating the tests. That may change as test suites are parallelized. But for now it holds true.
Comment 27•11 years ago
|
||
(In reply to Gregory Szorc [:gps] from comment #22)
> So, if we performed PGO on *every* build and eliminated non-PGO builds and
> their corresponding Talos jobs, we'd use less overall machine time. Of
> course, if we got rid of non-PGO builds, we could also kill *all* non-PGO
> tests, which would free up yet more resources (277 minutes more for Linux32).
>
> There are downsides to killing non-PGO builds, of course. But considering it
> would be a net machine time win - and that is something RelEng is always
> making a big deal about - it's certainly worth thinking about.
We explicitly made this tradeoff last year because of the latency of results. PGO builds are always going to take longer to finish, meaning that it's always going to take longer to get results out. We are using more machine time to do periodic PGO builds, but the tradeoff is that we get faster results for the non-PGO builds so we can more quickly assess test breakage. I don't think we want to revert that decision.
Comment 28•11 years ago
|
||
(In reply to Phil Ringnalda (:philor) from comment #21)
> Whether or not it makes sense (or more accurately, whether or not it is the
> result of a documented effect rather than an undocumented side-effect),
> -RELEASE dropped our runtime for debug browser-chrome on Windows from 120
> minutes (which is the maximum runtime buildbot allows, so we were frequently
> turning it red from being too slow) to 100 minutes, so be careful you don't
> regress bug 864085 by getting rid of it.
Oh, perhaps this was the thing I was misremembering in comment 1. It was debug test runtimes and not PGO build speed. Exciting either way! (That's right, it was my PGO-exclusion patch that made PGO builds faster.)
Comment 29•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #27)
> We are using more
> machine time to do periodic PGO builds, but the tradeoff is that we get
> faster results for the non-PGO builds so we can more quickly assess test
> breakage. I don't think we want to revert that decision.
Indeed.
Assignee | ||
Comment 30•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #28)
> (In reply to Phil Ringnalda (:philor) from comment #21)
> > Whether or not it makes sense (or more accurately, whether or not it is the
> > result of a documented effect rather than an undocumented side-effect),
> > -RELEASE dropped our runtime for debug browser-chrome on Windows from 120
> > minutes (which is the maximum runtime buildbot allows, so we were frequently
> > turning it red from being too slow) to 100 minutes, so be careful you don't
> > regress bug 864085 by getting rid of it.
>
> Oh, perhaps this was the thing I was misremembering in comment 1. It was
> debug test runtimes and not PGO build speed. Exciting either way! (That's
> right, it was my PGO-exclusion patch that made PGO builds faster.)
OK, I'll do another try push for debug tests. FWIW, the build times were no different with this patch.
Also, can everybody please take the parts of the conversation that are not related to this bug to a better forum, such as dev.platform? This bug is *only* about enabling incremental linking with Visual C++, as suggested by the summary, discussions of things such as dropping non-PGO builds do not belong here! Thanks. :-)
Assignee | ||
Comment 31•11 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #30)
> OK, I'll do another try push for debug tests. FWIW, the build times were no
> different with this patch.
https://tbpl.mozilla.org/?tree=Try&rev=863d070a5909
Assignee | ||
Comment 32•11 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #31)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #30)
> > OK, I'll do another try push for debug tests. FWIW, the build times were no
> > different with this patch.
>
> https://tbpl.mozilla.org/?tree=Try&rev=863d070a5909
o_O
Comparing this to <https://tbpl.mozilla.org/?rev=764ac2c7cb0a>, we go from 88mins to 120mins running browser-chrome. This is ridiculous.
I'll try to hide this behind $DEVELOPER_OPTIONS.
Assignee | ||
Comment 33•11 years ago
|
||
Attachment #791440 -
Attachment is obsolete: true
Attachment #791440 -
Flags: review?(ted)
Attachment #793140 -
Flags: review?(ted)
Attachment #793140 -
Flags: review?(mh+mozilla)
Comment 34•11 years ago
|
||
If you'd like to try to profile it that might be interesting. My best guess is the dbghelp-using NS_StackWalk implementation hits some pathological case without these checksums. (I didn't have the stomach for trying to figure it out at the time.)
I was going to write a script to parse out the times for each test in the two logs. But then I realized I really shouldn't do that, not right now. :) Someone else should though!
Comment 36•11 years ago
|
||
Comment on attachment 793140 [details] [diff] [review]
Patch (v2)
Review of attachment 793140 [details] [diff] [review]:
-----------------------------------------------------------------
::: configure.in
@@ +2075,5 @@
> DYNAMIC_XPCOM_LIBS='$(LIBXUL_DIST)/lib/xul.lib $(LIBXUL_DIST)/lib/xpcom_core.lib $(LIBXUL_DIST)/lib/mozalloc.lib'
> XPCOM_FROZEN_LDOPTS='$(LIBXUL_DIST)/lib/xul.lib $(LIBXUL_DIST)/lib/mozalloc.lib'
> LIBXUL_LIBS='$(LIBXUL_DIST)/lib/xul.lib $(LIBXUL_DIST)/lib/mozalloc.lib'
> MOZ_COMPONENT_NSPR_LIBS='$(NSPR_LIBS)'
> + if test -n "$DEVELOPER_OPTIONS"; then
Would be better to do that as
LDFLAGS="$LDFLAGS -LARGEADDRESSAWARE -NXCOMPAT"
if test -z "$DEVELOPER_OPTIONS"; then
LDFLAGS="$LDFLAGS -RELEASE"
fi
Attachment #793140 -
Flags: review?(mh+mozilla) → review+
Assignee | ||
Comment 37•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #34)
> If you'd like to try to profile it that might be interesting. My best guess
> is the dbghelp-using NS_StackWalk implementation hits some pathological case
> without these checksums. (I didn't have the stomach for trying to figure it
> out at the time.)
Hmm, yeah that's plausible, but I don't hope that there is much that we can do about that. At any rate, this patch gives us what _I_ was looking for (faster builds for developers), so I filed bug 907701 as a follow-up in case anybody is interested to do this for RelEng builds.
Assignee | ||
Updated•11 years ago
|
Attachment #793140 -
Flags: review?(ted)
Assignee | ||
Comment 38•11 years ago
|
||
Comment 39•11 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
Comment 40•11 years ago
|
||
Just for reference (for now): It's still possible to run into the ILK size limit of incremental linking (Bug 653662 disabled it back then) with VS2010 as the limit has only been increased to 384MB/768MB according to http://connect.microsoft.com/VisualStudio/feedback/details/283665/increase-linker-memory-to-avoid-linker-error-lnk1210-exceeded-internal-ilk-size-limit. Someone in #developers on IRC has run into this limit with a debug build/no opt with VS2010 for about a week now. But a new bug will be filed on this if necessary.
Comment 41•11 years ago
|
||
(In reply to Frank Wein [:mcsmurf] from comment #40)
> limit of incremental linking (Bug 653662 disabled it back then) with VS2010
First: Could someone please set a reference to those old bugs (I found this bug through the rev.-history of Makefile.in <SIC!>)
> Someone in #developers on IRC has run into this limit with a debug build/no
> opt with VS2010 for about a week now.
Confirmed! I'm just right now run into the Bug.
Comment 42•11 years ago
|
||
Alfred: Can you file a new bug for this via https://bugzilla.mozilla.org/enter_bug.cgi ? It looks like multiple people are affected by this. File it in Produce Core, Component "Build Config". Also include the info if you use a 32bit or a 64bit Windows.
Comment 43•11 years ago
|
||
(In reply to Frank Wein [:mcsmurf] from comment #42)
> Alfred: Can you file a new bug for this via
Done: bug 931371
Updated•7 years ago
|
Product: Core → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•