Closed
Bug 663251
Opened 13 years ago
Closed 13 years ago
Run the profiling step many times on Linux
Categories
(Firefox Build System :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)
References
Details
Attachments
(2 files, 1 obsolete file)
(deleted),
patch
|
ted
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
catlee
:
review+
|
Details | Diff | Splinter Review |
Right now, our PGO script launches FF, visits some sites, and then closes the browser.
It's become apparent in bug 653961 that repeating this process many times increases Dromaeo DOM scores.
The short version is that we see the following ranges for Linux-32 Dromaeo DOM scores on MC, TM, and try:
MC: 206-222
TM: 204-216
Try: 204-214
I don't know how to account for the fact that the min of TM and Try is 2 points lower than on MC. But my theory is that the difference in maximum scores is due to the fact that MC does depends builds and so can reuse existing profiling data when the changes are small. (We keep profiling data around between builds [bug 659941] and it's cumulative, at least on Linux.) In contrast, Try always clobbers.
TM is tricky; it does depends builds like MC, but my hypothesis is that because it has lower volume of pushes than MC, it's unlikely that two consecutive TM builds will end up on the same builder. Thus, my guess is that most of its builds effectively clobber and throw away the profiling data. (GCC only uses profiling data from a .gcda file which corresponds to a .o file with the same number of basic blocks as the .o file which created the .gcda file.)
Surely our releases are clobbers, so it seems likely that our releases' performance looks more like TM/Try's than MC's.
The obvious solution is to run the profiling step multiple times. I did this on Try and got to builds whose scores (220 and 218) exceed all other builds on Try by a considerable margin. Although I can't know what will happen when we do this on MC, I suspect that running the profiling step many times will improve scores there.
We'll want to land bug 659942 at the same time or before we land this change, because there's no use in keeping around old profiling data.
I haven't done any tests to see what the effect of doing this is on Windows.
Assignee | ||
Comment 1•13 years ago
|
||
Here's the patch I ran on try:
http://hg.mozilla.org/try/rev/242d2ee36919
Comment 2•13 years ago
|
||
I think the simplest way to do this would be to add a --number-of-runs parameter to profileserver.py, and add that arg to the PGO script in the mozconfigs
http://mxr.mozilla.org/mozilla-central/source/build/pgo/profileserver.py
(I don't want to force everyone to run 10 times if they don't want.)
I agree with the mechanical changes suggested in comment 2.
As long as this doesn't regress any of our other perf metrics, lets do it!
Assignee | ||
Comment 4•13 years ago
|
||
Sounds good to me!
Assignee | ||
Comment 5•13 years ago
|
||
(In reply to comment #2)
> I think the simplest way to do this would be to add a --number-of-runs
> parameter to profileserver.py, and add that arg to the PGO script in the
> mozconfigs
The python script currently exits with an error when you pass it an arg it doesn't understand. So if we were to do this, you wouldn't be able to build old versions of FF with the new mozconfig.
I recall we worked around this once before with an hg hook, although I forget what that issue was.
In any case, for this one, how about we add the argument but make it optional and default it to 10. Then if somebody is doing their own build, they can set it to 1 or whatever, but we don't have to modify our mozconfigs.
Comment 6•13 years ago
|
||
I don't want it to default to 10 because we also use this script for valgrind tests, and we certainly don't want to run those 10 times. Alternately, you can just make it take an integer value on the cmdline and use that, like "profileserver.py 10". That should be harmless to old versions.
Assignee | ||
Comment 7•13 years ago
|
||
Assignee | ||
Comment 8•13 years ago
|
||
Assignee | ||
Updated•13 years ago
|
Attachment #538530 -
Attachment description: Buildbot-config changes → Buildbot-config changes (Linux 32/64 only)
Assignee | ||
Comment 9•13 years ago
|
||
It's unclear whether we want to take these Windows changes, but I thought I'd spin the patch while I'm here.
Assignee | ||
Comment 10•13 years ago
|
||
I pushed to try, so we'll see what happens with the Windows builds in a bit.
http://tbpl.mozilla.org/?tree=Try&rev=3cf4b8988482
http://tbpl.mozilla.org/?tree=Try&rev=5abe5e61e75f
Assignee | ||
Comment 11•13 years ago
|
||
It doesn't seem to help on Windows, but it sure does on Linux.
Updated•13 years ago
|
Assignee: nobody → justin.lebar+bug
Comment 12•13 years ago
|
||
Comment on attachment 538525 [details] [diff] [review]
Patch v1
Review of attachment 538525 [details] [diff] [review]:
-----------------------------------------------------------------
Pre-emptive review, this looks fine.
::: build/pgo/profileserver.py
@@ +71,5 @@
>
> + if not os.getenv('OBJDIR'):
> + parser.error('Please specify the OBJDIR environment variable.')
> +
> + if len(args) == 0:
Python nit, you can just say "if not args:" here.
@@ +75,5 @@
> + if len(args) == 0:
> + num_runs = 1
> + else:
> + try:
> + num_runs = int(args[0])
Do you want to validate that num_runs > 0? (int("-1") doesn't raise)
Attachment #538525 -
Flags: review+
Assignee | ||
Comment 13•13 years ago
|
||
Who can review the buildbot changes?
Assignee | ||
Updated•13 years ago
|
Summary: Run the profiling step many times on Linux (perhaps also on Windows) → Run the profiling step many times on Linux
Assignee | ||
Updated•13 years ago
|
Attachment #538530 -
Flags: review?(catlee)
Assignee | ||
Updated•13 years ago
|
Attachment #538535 -
Attachment is obsolete: true
Updated•13 years ago
|
Attachment #538530 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 14•13 years ago
|
||
The changes to the python code were pushed to build-system [1] and m-c [2]
Catlee, can we push the buildbot changes in a few days? I want to wait first to see that the Talos scores on m-c do what I expect with just the changes from bug 659942.
[1] http://hg.mozilla.org/projects/build-system/rev/83b4a771ea45
[2] http://hg.mozilla.org/mozilla-central/rev/27f51c5d5f9b
Assignee | ||
Updated•13 years ago
|
Attachment #538525 -
Flags: checkin+
Comment 15•13 years ago
|
||
(In reply to comment #14)
> The changes to the python code were pushed to build-system [1] and m-c [2]
>
> Catlee, can we push the buildbot changes in a few days? I want to wait
> first to see that the Talos scores on m-c do what I expect with just the
> changes from bug 659942.
>
> [1] http://hg.mozilla.org/projects/build-system/rev/83b4a771ea45
> [2] http://hg.mozilla.org/mozilla-central/rev/27f51c5d5f9b
Yeah, just let me know when.
Assignee | ||
Comment 16•13 years ago
|
||
I'm not convinced it's worth spending a lot more time to figure out why the builds perform differently on different branches if this bug will fix it.
Catlee, can you please push the build-config changes when it's convenient for you?
Updated•13 years ago
|
Attachment #538530 -
Flags: checkin+
Comment 17•13 years ago
|
||
Is there a risk of overtraining here? If this is a good idea, why doesn't the compiler just add more weight to its decisions by default?
Assignee | ||
Comment 18•13 years ago
|
||
I think what's happening -- and I definitely don't know for sure -- is that some branch is not being taken in some runs, so therefore GCC incorrectly thinks some path is cold and mis-optimizes. So my theory for why running the profiling step many times helps is that it's unlikely that this branch will not be taken 10 times in a row.
I just looked at the graphs, and I don't see any change lately. Catlee, you marked the patch as checkin+ -- does that also mean it's been deployed?
Comment 19•13 years ago
|
||
It was deployed in http://hg.mozilla.org/build/buildbot-configs/rev/58247fce99b4.
Maybe it needs a clobber to take effect?
Assignee | ||
Comment 20•13 years ago
|
||
> Maybe it needs a clobber to take effect?
It shouldn't, but even if it did, haven't all the machines clobbered since Thursday?
Hm...
Assignee | ||
Comment 21•13 years ago
|
||
It does not appear to be working properly.
From [1]:
Starting profiling run 2 of 10
args: ['/builds/slave/m-cen-lnx/build/obj-firefox/dist/firefox/firefox-bin', '-no-remote', '-profile', '/builds/slave/m-cen-lnx/build/obj-firefox/_profile/pgo/pgoprofile/', 'http://localhost:8888/index.html']
INFO | automation.py | Application pid: 17868
INFO | automation.py | Application ran for: 0:01:40.541590
INFO | automation.py | Reading PID log: /tmp/tmp69ivvUpidlog
Starting profiling run 3 of 10
Whereas for profiling run 1 of 10, it lists a series of pages loaded.
[1] http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1309878091.1309882684.1560.gz&fulltext=1
Assignee | ||
Comment 22•13 years ago
|
||
When I run it locally, it similarly doesn't output the list of URLs it visited, although it does appear to visit them.
Assignee | ||
Comment 23•13 years ago
|
||
Oh, I think it's reading from cache.
Assignee | ||
Comment 24•13 years ago
|
||
This appears to be working (modulo bug 669408), even if it's not having the expected effect on build speeds. Closing the bug.
Assignee | ||
Updated•13 years ago
|
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•7 years ago
|
Product: Core → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•