Closed Bug 588150 Opened 14 years ago Closed 14 years ago

Upgrade Linux buildbots to ccache 3.0

Categories

(Release Engineering :: General, enhancement, P5)

All
Linux
enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: justin.lebar+bug, Assigned: salbiz)

References

Details

(Whiteboard: [puppet][refimage])

Attachments

(2 files, 10 obsolete files)

(deleted), patch
bhearsum
: review+
bhearsum
: checked-in+
(deleted), patch
bhearsum
: review+
catlee
: checked-in+
ccache 3.0 is a pretty big improvement over previous versions.  Direct mode (plus CCACHE_BASEDIR, if necessary) has greatly sped up my builds.
Whiteboard: [puppet][refimage]
Severity: normal → enhancement
Priority: -- → P5
Assignee: nobody → salbiz
I ran a couple of experiments to estimate the performance gains we
might see from this upgrade.
Steps:
1) hg clone http://hg.mozilla.org/{mozilla-central|tracemonkey}
2) Built the ipc/chromium directory 20 times and measured the average build
time with the current ccache (or, on Mac OS, plain gcc), then built the same
directory with ccache v3.0.1 from the ccache website tarball.

2a) Set the environment variable CCACHE_BASEDIR=(parent_of_build_objdir), primed
ccache by building mozilla-central, then compiled tracemonkey.

2b) Before all build runs, purged the disk cache so that the build ran off a
cold disk cache.
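
The measurement in step 2 can be sketched roughly as below. This is a minimal illustration, not the script that was actually used: the function name `average_build_time` and the `before_each` hook (where a cold-cache run would purge the disk cache) are assumptions made for the example.

```python
import statistics
import subprocess
import sys
import time


def average_build_time(command, runs=20, before_each=None):
    """Run `command` `runs` times and return the mean wall-clock seconds."""
    timings = []
    for _ in range(runs):
        if before_each is not None:
            before_each()  # e.g. purge the disk cache for a cold-cache run
        start = time.perf_counter()
        subprocess.run(command, check=True)  # raises if the build fails
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)
```

For a real measurement, `command` would be the build invocation for the directory under test; something like `average_build_time([sys.executable, "-c", "pass"], runs=3)` is enough to check that the harness itself works.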

The results from these runs were:
On Darwin:
plain gcc: 95.15s, ccache v3 (hot disk cache): 5.24s, ccache v3 (cold disk cache): 19.33s
tracemonkey (ccache primed on mozilla-central): 5.73s

On Linux:
ccache 2.4: 14.15s, ccache v3 (hot disk cache): 3.57s, ccache v3 (cold disk cache): 8.26s
tracemonkey (ccache primed on mozilla-central): 7.34s

Based on these rough results, we will probably see some nice gains all around
by upgrading to ccache 3.
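
To put the numbers above in relative terms, here is a small sketch that computes the speedup of each ccache v3 run over the baseline on the same platform. The timings are copied from the results; the dictionary layout and the `speedups` helper are just assumptions for the illustration. Note that the baselines differ: plain gcc on Darwin, ccache 2.4 on Linux.

```python
# Average build times in seconds, copied from the results above.
results = {
    "Darwin": {"baseline": 95.15, "hot": 5.24, "cold": 19.33},
    "Linux": {"baseline": 14.15, "hot": 3.57, "cold": 8.26},
}


def speedups(times):
    """Return baseline/time ratios for every non-baseline measurement."""
    return {
        label: times["baseline"] / seconds
        for label, seconds in times.items()
        if label != "baseline"
    }


for platform, times in results.items():
    for label, factor in speedups(times).items():
        print(f"{platform} {label} cache: {factor:.1f}x faster than baseline")
```

This works out to roughly 18x (hot) and 5x (cold) on Darwin, and roughly 4x (hot) and 1.7x (cold) on Linux.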
To make this change, the following things need to happen:

- Patch buildbot-configs (see attached patch) to enable ccache on Macs and set a couple of environment variables. In particular, since cache compression is no longer enabled by default, CCACHE_COMPRESS needs to be set.

- Patch puppet-manifests to deploy the new rpm/dmg ccache packages to buildslaves (already tested, patch forthcoming).

- Patch buildbotcustom to set the environment variable CCACHE_BASEDIR to the base directory of the build, so that ccache strips out absolute paths containing platform/branch-specific directory names from its hash.
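
To illustrate why CCACHE_BASEDIR matters here, the sketch below mimics the idea in plain Python: absolute paths under the base directory are rewritten relative to the working directory before the command line is hashed, so two branches checked out in different directories produce the same key. This is a toy model, not ccache's implementation; the directory names, the file `foo.cc`, and the helpers `relativize`, `command_key` and `compile_args` are all hypothetical.

```python
import hashlib
import posixpath


def relativize(arg, basedir, cwd):
    """Rewrite an absolute path under basedir as a path relative to cwd."""
    prefix, path = ("-I", arg[2:]) if arg.startswith("-I") else ("", arg)
    under_base = path == basedir or path.startswith(basedir.rstrip("/") + "/")
    if posixpath.isabs(path) and under_base:
        return prefix + posixpath.relpath(path, cwd)
    return arg  # relative paths and paths outside basedir are left alone


def command_key(args, basedir=None, cwd=None):
    """Hash a compiler command line, optionally normalizing paths first."""
    if basedir is not None:
        args = [relativize(a, basedir, cwd) for a in args]
    return hashlib.sha1("\0".join(args).encode()).hexdigest()


def compile_args(build_root):
    # Hypothetical command line; only the branch directory differs per build.
    return ["gcc", "-c", "-I" + build_root + "/ipc/chromium/src",
            build_root + "/ipc/chromium/src/foo.cc"]


mc = "/builds/slave/mozilla-central/build"  # hypothetical build directories
tm = "/builds/slave/tracemonkey/build"

# Without a base directory, the branch name leaks into the key.
print(command_key(compile_args(mc)) == command_key(compile_args(tm)))
# With a per-build base directory, both command lines normalize identically.
print(command_key(compile_args(mc), mc, mc + "/obj")
      == command_key(compile_args(tm), tm, tm + "/obj"))
```

The first comparison prints False and the second prints True, which is the cross-branch sharing the third item is after.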
Attachment #474047 - Flags: feedback?(catlee)
Attachment #474049 - Flags: feedback?(catlee)
Comment on attachment 474047 [details] [diff] [review]
[untested]patch_buildbot_configs to add ccache to mac, set CCACHE_COMPRESS

Looks good; I wouldn't worry about updating mozilla2, mozilla2-staging.
Attachment #474047 - Flags: feedback?(catlee) → feedback+
Comment on attachment 474049 [details] [diff] [review]
[untested]patch_buildbotcustom_set_CCACHE_BASEDIR

So you'll need to do something like

env = self.env.copy()
env['CCACHE_BASEDIR'] = ...
self.addStep(..., env=env)
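
The point of the copy in the snippet above is that the factory's env dict is shared, so setting CCACHE_BASEDIR on it directly would leak into every other step. A self-contained sketch of the pattern follows; `MiniFactory` and `addCompileStep` are hypothetical stand-ins written for this example and are not the buildbot or buildbotcustom API.

```python
class MiniFactory:
    """Hypothetical stand-in for a build factory; not the buildbot API."""

    def __init__(self, env):
        self.env = env      # environment shared by all steps
        self.steps = []

    def addStep(self, name, env=None):
        # Steps without an explicit env fall back to the shared one.
        self.steps.append((name, env if env is not None else self.env))

    def addCompileStep(self, basedir):
        env = self.env.copy()             # copy so the shared env is untouched
        env["CCACHE_BASEDIR"] = basedir   # only this step sees the variable
        self.addStep("compile", env=env)


shared_env = {"CCACHE_COMPRESS": "1"}
factory = MiniFactory(shared_env)
factory.addCompileStep("/builds/example")  # hypothetical path
print("CCACHE_BASEDIR" in shared_env)      # the shared env is unchanged
```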
Attachment #474049 - Flags: feedback?(catlee) → feedback-
Attached patch [tested]patch_puppet_manifests (obsolete) (deleted) — Splinter Review
Attachment #474051 - Flags: feedback?(catlee)
Comment on attachment 474051 [details] [diff] [review]
[tested]patch_puppet_manifests

I think we need something on mac to initialize the ccache size to a non-default value.
Attachment #474051 - Flags: feedback?(catlee) → feedback-
Attachment #474049 - Attachment is obsolete: true
Attachment #474055 - Flags: feedback?(catlee)
Attached patch [untested]patch_buildbot-configs (obsolete) (deleted) — Splinter Review
Based on our discussion, moved logic to set CCACHE_BASEDIR into config.py, and backed out changes in mozilla2 & mozilla2-staging
Attachment #474047 - Attachment is obsolete: true
Attachment #474055 - Attachment is obsolete: true
Attachment #474074 - Flags: feedback?(catlee)
Attachment #474055 - Flags: feedback?(catlee)
Comment on attachment 474074 [details] [diff] [review]
[untested]patch_buildbot-configs

You'll need to add 'from buildbot.process.properties import WithProperties' at the top there, but other than that looks good.
Attachment #474074 - Flags: feedback?(catlee) → feedback+
Attached patch [untested]patch_puppet_manifests (obsolete) (deleted) — Splinter Review
Added an 'exec' stanza to os/osx.pp to ensure that Macs initialize the cache to 2G (same as Linux).
Attachment #474051 - Attachment is obsolete: true
Attachment #474083 - Flags: feedback?(catlee)
Please clean up any /home/ccache directories you've created for testing, if that's all done. moz2-linux-slave03 is down to 180MB free on /.
Done, df -h now shows 1.2G free on /.
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             8.3G  6.7G  1.2G  85% /
Attached patch [untested]patch_buildbot-configs (obsolete) (deleted) — Splinter Review
Attachment #474074 - Attachment is obsolete: true
Attachment #474692 - Flags: feedback?(catlee)
Comment on attachment 474083 [details] [diff] [review]
[untested]patch_puppet_manifests

I noticed a couple of things about how the manifests are structured that might make it worth changing the patch.

Right now I've lumped ccache into the devtools class, where it probably shouldn't be, since it doesn't live in /tools. One place for it might be os/osx.pp, or I was thinking it might be better to follow the same pattern as linux and create a class like moz-dmgs.pp and put the package and any initializing there, keeping things a bit more consistent.
Attachment #474083 - Flags: feedback?(bhearsum)
Comment on attachment 474083 [details] [diff] [review]
[untested]patch_puppet_manifests

This looks mostly fine. You'll need to use ensure => latest in moz-rpms.pp, though.
Attachment #474083 - Flags: feedback?(bhearsum) → feedback+
Attached patch [untested]buildbot-configs_changes (obsolete) (deleted) — Splinter Review
moving CCACHE_BASEDIR declaration to buildbot-configs
Attachment #474692 - Attachment is obsolete: true
Attachment #475080 - Flags: feedback?(catlee)
Attachment #474692 - Flags: feedback?(catlee)
Attachment #475080 - Flags: feedback?(catlee) → feedback+
Comment on attachment 474083 [details] [diff] [review]
[untested]patch_puppet_manifests

Let's get the ccache package definition and init_ccache stuff close together for osx.
Attachment #474083 - Flags: feedback?(catlee)
Attached patch [tested]puppet_manifests_patch (deleted) — Splinter Review
After a *lot* of testing, I'm going to have to give up on making ccache fast for Macs. NFS mounts, even on 3.1 (with several fixes to improve NFS perf), are still unbearably slow, and the new direct cache hits from branch to branch are too few to make even a dent in the overhead caused by hitting the disk for cache misses. I've uploaded the tested version of the patch that adds the new version to the Linux boxes, since the handful of direct hits does make a bit of difference there.

The only way I can see of improving the direct (fast) ccache hit rate would be to optimize the Makefiles and compilation process to use as few absolute paths/macros as possible, avoid switching out command-line arguments, etc, as these things all cause ccache to fall back to preprocessor (slow) mode.
Attachment #474083 - Attachment is obsolete: true
Attachment #481682 - Flags: review?(bhearsum)
Attached patch [tested]patch_buildbot-configs (obsolete) (deleted) — Splinter Review
Attachment #481683 - Flags: review?(bhearsum)
Comment on attachment 481682 [details] [diff] [review]
[tested]puppet_manifests_patch

lgtm
Attachment #481682 - Flags: review?(bhearsum) → review+
Comment on attachment 481683 [details] [diff] [review]
[tested]patch_buildbot-configs

Hmm, why isn't ccache used in non-debug mac builds?

Is ccache already installed on the Macs?

Don't use ccache in release builds, because we're paranoid like that.
Attachment #481683 - Flags: review?(bhearsum) → review-
(In reply to comment #22)
> Don't use ccache in release builds, because we're paranoid like that.

While I think it's totally sensible not to use ccache for nightly builds, for regular m-c or try release builds, ccache should be a huge win, no?
Oh, scratch that comment.  Release means release means release.  I see.
Attached patch set BASEDIR in PLATFORM_VARS [tested] (obsolete) (deleted) — Splinter Review
Yeah, a mistake on my part: since the ccache-on-Macs experiment didn't go so well, I should've pruned the mozconfig changes from that patch. This one just keeps the env changes that add the CCACHE_BASEDIR variable to the environments.
Attachment #481683 - Attachment is obsolete: true
Attachment #481947 - Flags: review?
Attachment #481947 - Flags: review? → review?(bhearsum)
Comment on attachment 481947 [details] [diff] [review]
set BASEDIR in PLATFORM_VARS [tested]

Why are you dropping the enable_ccache around line 460?
Attached patch setting BASEDIR for ccache (deleted) — Splinter Review
oops, yeah. I must have thought we weren't doing ccache builds for debug for some reason. Removed offending line from patch.
Attachment #481947 - Attachment is obsolete: true
Attachment #482640 - Flags: review?(bhearsum)
Attachment #481947 - Flags: review?(bhearsum)
Attachment #482640 - Flags: review?(bhearsum) → review+
I was reading through the man page for the tip of ccache trunk, and I came across this (not sure if it's in the 3.0 or 3.1 man page):

SHARING A CACHE ON NFS
       It is possible to put the cache directory on an NFS filesystem (or similar filesystems), but keep in mind that:

       ·   Having the cache on NFS may slow down compilation. Make sure to do some benchmarking to see if it’s worth it.

       ·   ccache hasn’t been tested very thoroughly on NFS.

       A tip is to set CCACHE_TEMPDIR to a directory on the local host to avoid NFS traffic for temporary files.

Have we tried CCACHE_TEMPDIR?  Also, do we even know that sharing caches over NFS is the right thing to do?
When we used NFS mounts to distribute files for Puppet, they were pretty problematic, primarily because it's tricky to make it work across multiple colos. I think Syed did some testing on them, but I'll let him speak to that.
The goal with using a shared cache over NFS is so that slave B can make use of slave A's compilation results.

But yeah, doesn't look like NFS handles this particularly well.  I wonder if rsyncing ccache directories around is safe...
(In reply to comment #28)
> Have we tried CCACHE_TEMPDIR?  Also, do we even know that sharing caches over
> NFS is the right thing to do?

I did do some tests with a TEMPDIR, and it doesn't really seem to provide much benefit. We still have to hit the NFS mount for every cache miss, which is where that bottleneck really hurts.
No further developments, so I think this is ready to land, setting CCACHE_BASEDIR for all builds using ccache 3.x, which should net us a handful of direct ccache hits and increase throughput.
Flags: needs-reconfig?
Comment on attachment 482640 [details] [diff] [review]
setting BASEDIR for ccache

changeset:   3314:45560f8b0c15
Attachment #482640 - Flags: checked-in+
Flags: needs-reconfig? → needs-reconfig+
Attachment #475080 - Attachment is obsolete: true
This patch should break release l10n repacks. 
It uses readBranchConfig from http://hg.mozilla.org/build/tools/file/aac57c95e2ed/lib/python/release/info.py#l62, which will fail executing config.py:

from buildbot.process.properties import WithProperties

At least I've got a failure testing another patch using the same approach.
Blocks: 540598
Comment on attachment 481682 [details] [diff] [review]
[tested]puppet_manifests_patch

changeset:   244:0d7eb12a4d2b
Attachment #481682 - Flags: checked-in+
Slaves are picking this up OK and seem to be building fine too.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering