Closed Bug 663251 Opened 13 years ago Closed 13 years ago

Run the profiling step many times on Linux

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

References

Details

Attachments

(2 files, 1 obsolete file)

Patch v1 13 years ago Justin Lebar (not reading bugmail) (deleted), patch	ted : review+	Details \| Diff \| Splinter Review
Buildbot-config changes (Linux 32/64 only) 13 years ago Justin Lebar (not reading bugmail) (deleted), patch	catlee : review+	Details \| Diff \| Splinter Review
Buildbot-config changes (Windows only) 13 years ago Justin Lebar (not reading bugmail) (deleted), patch		Details \| Diff \| Splinter Review

Justin Lebar (not reading bugmail)

Assignee

Description

•

13 years ago

Right now, our PGO script launches FF, visits some sites, and then closes the browser. It's become apparent in bug 653961 that repeating this process many times increases Dromaeo DOM scores. The short version is that we see the following ranges for Linux-32 Dromaeo DOM scores on MC, TM, and try: MC: 206-222 TM: 204-216 Try: 204-214 I don't know how to account for the fact that the min of TM and Try is 2 points lower than on MC. But my theory is that the difference in maximum scores is due to the fact that MC does depends builds and so can reuse existing profiling data when the changes are small. (We keep profiling data around between builds [bug 659941] and it's cumulative, at least on Linux.) In contrast, Try always clobbers. TM is tricky; it does depends builds like MC, but my hypothesis is that because it has lower volume of pushes than MC, it's unlikely that two consecutive TM builds will end up on the same builder. Thus, my guess is that most of its builds effectively clobber and throw away the profiling data. (GCC only uses profiling data from a .gcda file which corresponds to a .o file with the same number of basic blocks as the .o file which created the .gcda file.) Surely our releases are clobbers, so it seems likely that our releases' performance looks more like TM/Try's than MC's. The obvious solution is to run the profiling step multiple times. I did this on Try and got to builds whose scores (220 and 218) exceed all other builds on Try by a considerable margin. Although I can't know what will happen when we do this on MC, I suspect that running the profiling step many times will improve scores there. We'll want to land bug 659942 at the same time or before we land this change, because there's no use in keeping around old profiling data. I haven't done any tests to see what the effect of doing this is on Windows.

Justin Lebar (not reading bugmail)

Assignee

Comment 1

•

13 years ago

Here's the patch I ran on try: http://hg.mozilla.org/try/rev/242d2ee36919

(not currently active) Ted Mielczarek

Comment 2

•

13 years ago

I think the simplest way to do this would be to add a --number-of-runs parameter to profileserver.py, and add that arg to the PGO script in the mozconfigs http://mxr.mozilla.org/mozilla-central/source/build/pgo/profileserver.py (I don't want to force everyone to run 10 times if they don't want.)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 3

•

13 years ago

I agree with the mechanical changes suggested in comment 2. As long as this doesn't regress any of our other perf metrics, lets do it!

Justin Lebar (not reading bugmail)

Assignee

Comment 4

•

13 years ago

Sounds good to me!

Justin Lebar (not reading bugmail)

Assignee

Comment 5

•

13 years ago

(In reply to comment #2) > I think the simplest way to do this would be to add a --number-of-runs > parameter to profileserver.py, and add that arg to the PGO script in the > mozconfigs The python script currently exits with an error when you pass it an arg it doesn't understand. So if we were to do this, you wouldn't be able to build old versions of FF with the new mozconfig. I recall we worked around this once before with an hg hook, although I forget what that issue was. In any case, for this one, how about we add the argument but make it optional and default it to 10. Then if somebody is doing their own build, they can set it to 1 or whatever, but we don't have to modify our mozconfigs.

(not currently active) Ted Mielczarek

Comment 6

•

13 years ago

I don't want it to default to 10 because we also use this script for valgrind tests, and we certainly don't want to run those 10 times. Alternately, you can just make it take an integer value on the cmdline and use that, like "profileserver.py 10". That should be harmless to old versions.

Justin Lebar (not reading bugmail)

Assignee

Comment 7

•

13 years ago

Attached patch Patch v1 (deleted) — Details — Splinter Review

Justin Lebar (not reading bugmail)

Assignee

Comment 8

•

13 years ago

Attached patch Buildbot-config changes (Linux 32/64 only) (deleted) — Details — Splinter Review

Justin Lebar (not reading bugmail)

Assignee

Updated

•

13 years ago

Attachment #538530 - Attachment description: Buildbot-config changes → Buildbot-config changes (Linux 32/64 only)

Justin Lebar (not reading bugmail)

Assignee

Comment 9

•

13 years ago

Attached patch Buildbot-config changes (Windows only) (obsolete) (deleted) — Details — Splinter Review

It's unclear whether we want to take these Windows changes, but I thought I'd spin the patch while I'm here.

Justin Lebar (not reading bugmail)

Assignee

Comment 10

•

13 years ago

I pushed to try, so we'll see what happens with the Windows builds in a bit. http://tbpl.mozilla.org/?tree=Try&rev=3cf4b8988482 http://tbpl.mozilla.org/?tree=Try&rev=5abe5e61e75f

Justin Lebar (not reading bugmail)

Assignee

Comment 11

•

13 years ago

It doesn't seem to help on Windows, but it sure does on Linux.

(not currently active) Ted Mielczarek

Updated

•

13 years ago

Assignee: nobody → justin.lebar+bug

(not currently active) Ted Mielczarek

Comment 12

•

13 years ago

Comment on attachment 538525 [details] [diff] [review] Patch v1 Review of attachment 538525 [details] [diff] [review]: ----------------------------------------------------------------- Pre-emptive review, this looks fine. ::: build/pgo/profileserver.py @@ +71,5 @@ > > + if not os.getenv('OBJDIR'): > + parser.error('Please specify the OBJDIR environment variable.') > + > + if len(args) == 0: Python nit, you can just say "if not args:" here. @@ +75,5 @@ > + if len(args) == 0: > + num_runs = 1 > + else: > + try: > + num_runs = int(args[0]) Do you want to validate that num_runs > 0? (int("-1") doesn't raise)

Attachment #538525 - Flags: review+

Justin Lebar (not reading bugmail)

Assignee

Comment 13

•

13 years ago

Who can review the buildbot changes?

Justin Lebar (not reading bugmail)

Assignee

Updated

•

13 years ago

Summary: Run the profiling step many times on Linux (perhaps also on Windows) → Run the profiling step many times on Linux

Justin Lebar (not reading bugmail)

Assignee

Updated

•

13 years ago

Attachment #538530 - Flags: review?(catlee)

Justin Lebar (not reading bugmail)

Assignee

Updated

•

13 years ago

Attachment #538535 - Attachment is obsolete: true

Chris AtLee [:catlee]

Updated

•

13 years ago

Attachment #538530 - Flags: review?(catlee) → review+

Justin Lebar (not reading bugmail)

Assignee

Comment 14

•

13 years ago

The changes to the python code were pushed to build-system [1] and m-c [2] Catlee, can we push the buildbot changes in a few days? I want to wait first to see that the Talos scores on m-c do what I expect with just the changes from bug 659942. [1] http://hg.mozilla.org/projects/build-system/rev/83b4a771ea45 [2] http://hg.mozilla.org/mozilla-central/rev/27f51c5d5f9b

Justin Lebar (not reading bugmail)

Assignee

Updated

•

13 years ago

Attachment #538525 - Flags: checkin+

Chris AtLee [:catlee]

Comment 15

•

13 years ago

(In reply to comment #14) > The changes to the python code were pushed to build-system [1] and m-c [2] > > Catlee, can we push the buildbot changes in a few days? I want to wait > first to see that the Talos scores on m-c do what I expect with just the > changes from bug 659942. > > [1] http://hg.mozilla.org/projects/build-system/rev/83b4a771ea45 > [2] http://hg.mozilla.org/mozilla-central/rev/27f51c5d5f9b Yeah, just let me know when.

Justin Lebar (not reading bugmail)

Assignee

Comment 16

•

13 years ago

I'm not convinced it's worth spending a lot more time to figure out why the builds perform differently on different branches if this bug will fix it. Catlee, can you please push the build-config changes when it's convenient for you?

Chris AtLee [:catlee]

Updated

•

13 years ago

Attachment #538530 - Flags: checkin+

Daniel C

Comment 17

•

13 years ago

Is there a risk of overtraining here? If this is a good idea, why doesn't the compiler just add more weight to its decisions by default?

Justin Lebar (not reading bugmail)

Assignee

Comment 18

•

13 years ago

I think what's happening -- and I definitely don't know for sure -- is that some branch is not being taken in some runs, so therefore GCC incorrectly thinks some path is cold and mis-optimizes. So my theory for why running the profiling step many times helps is that it's unlikely that this branch will not be taken 10 times in a row. I just looked at the graphs, and I don't see any change lately. Catlee, you marked the patch as checkin+ -- does that also mean it's been deployed?

Chris AtLee [:catlee]

Comment 19

•

13 years ago

It was deployed in http://hg.mozilla.org/build/buildbot-configs/rev/58247fce99b4. Maybe it needs a clobber to take effect?

Justin Lebar (not reading bugmail)

Assignee

Comment 20

•

13 years ago

> Maybe it needs a clobber to take effect? It shouldn't, but even if it did, haven't all the machines clobbered since Thursday? Hm...

Justin Lebar (not reading bugmail)

Assignee

Comment 21

•

13 years ago

It does not appear to be working properly. From [1]: Starting profiling run 2 of 10 args: ['/builds/slave/m-cen-lnx/build/obj-firefox/dist/firefox/firefox-bin', '-no-remote', '-profile', '/builds/slave/m-cen-lnx/build/obj-firefox/_profile/pgo/pgoprofile/', 'http://localhost:8888/index.html'] INFO | automation.py | Application pid: 17868 INFO | automation.py | Application ran for: 0:01:40.541590 INFO | automation.py | Reading PID log: /tmp/tmp69ivvUpidlog Starting profiling run 3 of 10 Whereas for profiling run 1 of 10, it lists a series of pages loaded. [1] http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1309878091.1309882684.1560.gz&fulltext=1

Justin Lebar (not reading bugmail)

Assignee

Comment 22

•

13 years ago

When I run it locally, it similarly doesn't output the list of URLs it visited, although it does appear to visit them.

Justin Lebar (not reading bugmail)

Assignee

Comment 23

•

13 years ago

Oh, I think it's reading from cache.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

13 years ago

Depends on: 669408

Justin Lebar (not reading bugmail)

Assignee

Comment 24

•

13 years ago

This appears to be working (modulo bug 669408), even if it's not having the expected effect on build speeds. Closing the bug.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

13 years ago

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Justin Lebar (not reading bugmail)

Assignee

Updated

•

13 years ago

Blocks: 672563

BMO Automation

Updated

•

7 years ago

Product: Core → Firefox Build System

You need to log in before you can comment on or make changes to this bug.