Closed Bug 522719 Opened 15 years ago Closed 15 years ago

Bump up build space as required

Categories

(Release Engineering :: General, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Unassigned)

Attachments

(11 files, 2 obsolete files)

At least the following builds have run out of disk space:

Electrolysis linux depend build (asked for 5, got 5.06 GB):
http://tinderbox.mozilla.org/showlog.cgi?log=Electrolysis/1255679363.1255683377.32120.gz

du -ms on the slave shows 5199 MB used in that directory.

Mac OSX debug build (asked for 3, got 3.72 GB):
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1254853004.1254853805.26011.gz
For bonus points, it would be nice to:
- Detect when builds fail because they're out of disk, and flag that somehow.  It could be that a new change has a bug that causes too much disk space to be used, but more likely it's just regular growth in the code base.

- Monitor the peak disk usage of a build, and warn when we're getting close to the specified limit.  Note that 'du' is slow, but 'df' is fast...we could run 'df' at the beginning of the build, and periodically throughout to give an idea of how much space is being used...assuming nothing else is writing to the disk.
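A rough sketch of the 'df'-style monitoring idea above, in Python. It is only an illustration, not existing build tooling: the class name, the warn_fraction threshold and the injectable free_bytes hook are all hypothetical. It records free space at the start, samples periodically, and treats the largest drop as the peak usage. As noted above, this is only accurate if nothing else is writing to the same disk.

```python
import shutil
import threading


class DiskUsageMonitor:
    """Track how far free space drops below its starting value.

    Uses shutil.disk_usage (a statvfs-style query, like 'df'), so each
    sample is cheap, but it sees every writer on the filesystem, not
    just the build.
    """

    def __init__(self, path, limit_gb, warn_fraction=0.9, interval=30.0,
                 free_bytes=None):
        self.limit_bytes = limit_gb * 1024 ** 3
        self.warn_fraction = warn_fraction
        self.interval = interval
        # free_bytes is injectable so the logic can be exercised without
        # touching a real disk (hypothetical hook, for testing only).
        self._free_bytes = free_bytes or (lambda: shutil.disk_usage(path).free)
        self.start_free = self._free_bytes()
        self.peak_used = 0
        self._stop = threading.Event()
        self._thread = None

    def sample(self):
        """Take one reading; return the peak usage seen so far, in bytes."""
        used = max(0, self.start_free - self._free_bytes())
        self.peak_used = max(self.peak_used, used)
        return self.peak_used

    def near_limit(self):
        """True once peak usage reaches warn_fraction of the requested space."""
        return self.peak_used >= self.warn_fraction * self.limit_bytes

    def start(self):
        """Sample in a background thread every `interval` seconds."""
        def loop():
            # Event.wait returns False on timeout, True once stop() is called.
            while not self._stop.wait(self.interval):
                self.sample()
        self._thread = threading.Thread(target=loop, daemon=True)
        self._thread.start()

    def stop(self):
        """Stop sampling and return the final peak usage in bytes."""
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
        return self.sample()
```

A build wrapper could call start() before the first step and stop() after the last one, then log the peak and whether near_limit() fired.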
Electrolysis unit on linux - asked for 5G and got 5.02G, ran out of space doing make package.
http://tinderbox.mozilla.org/showlog.cgi?log=Electrolysis/1255837061.1255840559.4578.gz
I am looking at this right now.
braindump:  At the end of a build, we could have the client post to a server the amount of disk space required for the type of build and have that used instead of a hard coded value.  This would likely add a couple minutes to get the directory size for.

As an immediate mitigation, I will bump the build space requested.
Assignee: nobody → jford
(In reply to comment #4)
> braindump:  At the end of a build, we could have the client post to a server
> the amount of disk space required for the type of build and have that used
> instead of a hard coded value.  This would likely add a couple minutes to get
> the directory size for.

I'd imagine that our peak usage at some point in the build is sometimes greater than the total space in the directory tree at the end of the build, so we'd run into problems with this method.  Indeed, the initial build space settings were derived from examining the total disk space in completed build directories.
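To make the point concrete, here is a small Python sketch of deriving a requested size from observed peak usage rather than from the final directory size. The function name and the 20% headroom default are assumptions for illustration; nothing like this exists in the configs.

```python
import math

MB_PER_GB = 1024


def recommended_build_space_gb(peak_samples_mb, headroom=0.2):
    """Return a whole-GB request that covers the worst observed peak.

    peak_samples_mb: peak usage (in MB) measured during several builds,
    not the size of the tree after the build finished.
    headroom: extra fraction added on top of the worst peak (assumed value).
    """
    if not peak_samples_mb:
        raise ValueError("need at least one peak measurement")
    worst_peak_mb = max(peak_samples_mb)
    return math.ceil(worst_peak_mb * (1 + headroom) / MB_PER_GB)
```

With the 5199 MB figure reported at the top of this bug treated as a peak, the default headroom rounds up to a 7 GB request, and a 10% headroom gives 6 GB.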
I am going to set this value per platform+buildtype instead of per branch+platform+buildtype, in a patch coming soon.
Attached patch build_space dictionary (obsolete) (deleted) — Splinter Review
This patch adds a dictionary that defines the build_space parameter on a platform-buildtype basis instead of the current branch-platform-buildtype basis.

I am currently testing this in staging
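For readers following along, a minimal Python sketch of what a platform-buildtype lookup could look like. The helper name build_space_for, the default of 5 and the numbers in the table are placeholders for illustration, not necessarily what the patch contains.

```python
# Fallback used when a platform or buildtype has no entry (assumed value).
DEFAULT_BUILD_SPACE_GB = 5

# Requested free space in GB, keyed by platform and then buildtype.
# The numbers are illustrative placeholders.
BUILD_SPACE_GB = {
    "win32": {"plain": 8, "debug": 6},
    "macosx": {"plain": 6, "debug": 4},
    "linux": {"plain": 6, "debug": 4},
}


def build_space_for(platform, buildtype):
    """Look up the space request, falling back to the default."""
    return BUILD_SPACE_GB.get(platform, {}).get(buildtype, DEFAULT_BUILD_SPACE_GB)
```

The point of the two-level get() is that an unknown platform or buildtype degrades to the default instead of raising KeyError during a reconfig.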
Attachment #407166 - Flags: checked-in?
A mozilla-central debug everythingelse build got exactly the 0.5G it asked for, then died unpacking the symbols. IIRC that size is hard coded on the factory instantiation in misc.py. Could we move it into the dict too?
Yep, that sounds like a good idea.  Can I do that as a follow up patch?
Comment on attachment 407166 [details] [diff] [review]
build_space dictionary

I have tested this and it works in staging
Attachment #407166 - Flags: review?(nthomas)
Comment on attachment 407166 [details] [diff] [review]
build_space dictionary

>diff -r e19a365c45df mozilla2-staging/config.py
>+# BUILDSPACE
>+# This dictionary is used for figuring out required disk space
>+# per platform and buildtype
>+build_space = {}
>+build_space['win32'] = {'plain': 8, 'debug': 6, 'unit': 5}
>+build_space['macosx'] = {'plain': 6, 'debug': 4, 'unit': 5}
>+build_space['linux'] = {'plain': 6, 'debug': 4, 'unit': 5}
>+build_space['wince'] = {'plain': 5}
>+build_space['unittest'] = 5

r- for these two
* We don't end up using the |'unit': 5| part anywhere so it can get removed from each platform
* To address the Electrolysis unit running out of space you should bump unittest from 5 to 6 
Also
* I'd leave the wince size at 4 in the absence of reported failures. Was anything else incremented like that?
* I'd suggest setting |build_space['default'] = 5| and using that here ...
>-    'default_build_space': 5,
>+    'default_build_space': build_space['unittest'],

>-BRANCHES['mozilla-1.9.1']['platforms']['linux-debug']['build_space'] = 3
>-BRANCHES['mozilla-1.9.1']['platforms']['win32-debug']['build_space'] = 4
>-BRANCHES['mozilla-1.9.1']['platforms']['macosx-debug']['build_space'] = 3

I'm wondering if switching to the global defaults is sub-optimal here. m-1.9.1 will be growing/changing very slowly. win32-debug will jump from 4 to 6G and cause more builds to be free-space clobbers, making compile times longer across the board.

>-BRANCHES['tracemonkey']['platforms']['linux']['build_space'] = 5
>-BRANCHES['tracemonkey']['platforms']['linux64']['build_space'] = 5
>-BRANCHES['tracemonkey']['platforms']['win32']['build_space'] = 5

I could make a similar argument here, where we don't do PGO for Win32 TraceMonkey, Places or Electrolysis. This patch would ask for another 3G of space that won't be used.

bhearsum and catlee, would you choose easy configuration w/ some wasted space, or more complicated config + fewer free space clobbers?
Attachment #407166 - Flags: review?(nthomas)
Attachment #407166 - Flags: review-
Attachment #407166 - Flags: checked-in?
(In reply to comment #12)
> bhearsum and catlee, would you choose easy configuration w/ some wasted space,
> or more complicated config + fewer free space clobbers?

I'd choose complicated config + fewer free space clobbers.  It's more overhead on our part, but means more efficient slaves.
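One way the "complicated config" could be layered, sketched in Python: platform-wide defaults plus per-branch overrides for slow-moving or non-PGO branches. The function and table names are hypothetical. The override numbers are the per-branch values quoted in the review above; the defaults are taken from the proposed dictionary.

```python
# Defaults per (platform, buildtype), in GB, from the proposed dictionary.
PLATFORM_DEFAULTS_GB = {
    ("win32", "plain"): 8,
    ("win32", "debug"): 6,
    ("linux", "debug"): 4,
}

# Per-branch exceptions, using the existing per-branch settings quoted
# in the review (keys are hypothetical tuples, not the real config layout).
BRANCH_OVERRIDES_GB = {
    "mozilla-1.9.1": {("win32", "debug"): 4, ("linux", "debug"): 3},
    "tracemonkey": {("win32", "plain"): 5},
}


def resolve_build_space(branch, platform, buildtype, fallback=5):
    """Prefer a branch override, then the platform default, then fallback."""
    key = (platform, buildtype)
    overrides = BRANCH_OVERRIDES_GB.get(branch, {})
    if key in overrides:
        return overrides[key]
    return PLATFORM_DEFAULTS_GB.get(key, fallback)
```

Only branches that differ from the defaults need an entry, which keeps the extra overhead down while avoiding unnecessary free-space clobbers on branches like m-1.9.1.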
A nightly OSX build just failed:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1256205720.1256213706.23497.gz&fulltext=1

Asked for 5.2 GB, got 5.41GB.
We could have the 1.9.1 and 1.9.2 based builds have their own space requirements (i.e. build_space['win32']['debug-191']).  If we would prefer the status quo, that is fine by me and I will whip up the patch to bump up the required space.
Another mac nightly failure:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1256292120.1256296647.1121.gz
I wonder what would happen if we really over-specify the amount of space required for builds?  E.g. set it to 10 GB.

I _think_ you'd slightly increase the number of clobber builds, but not by much.  In effect you'd ensure that the build has the 5 or 6 GB free that it needs, plus an extra 4 GB of working space.
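A toy model of that trade-off in Python. This is not the real purge script; the function name and the oldest-first ordering are assumptions. It just shows which build directories would have to be clobbered to satisfy a given request.

```python
def dirs_to_clobber(free_gb, requested_gb, build_dirs):
    """Return the names of directories to delete, oldest first.

    build_dirs: list of (name, size_gb, last_used) tuples, where a
    smaller last_used means the directory was touched longer ago.
    Deleting stops as soon as free space meets the request.
    """
    to_delete = []
    for name, size_gb, _last_used in sorted(build_dirs, key=lambda d: d[2]):
        if free_gb >= requested_gb:
            break
        to_delete.append(name)
        free_gb += size_gb
    return to_delete
```

With 5.4 GB free, a 6 GB request removes only the oldest directory if that is enough, while a 10 GB request keeps deleting until 10 GB is free. Every directory removed this way forces a full rebuild for that builder next time, which is the cost of over-specifying.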
I think that over-specifying is good, but are we going to do this on old branches as well?
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1256311202.1256314247.16241.gz
Linux mozilla-central leak test build on 2009/10/23 08:20:02
Attached patch build_space dictionary v2 (deleted) — Splinter Review
Unbitrot, fix issues.

I am going to create a patch to bump the numbers without a dictionary as we need something that can get checked in now if we don't do the dictionary.
Attachment #407166 - Attachment is obsolete: true
Attachment #408035 - Flags: review?(nthomas)
Attached patch bump-buildspace non-dictionary (deleted) — Splinter Review
Here is something to stop the burning.
Attachment #408037 - Flags: review?(catlee)
Comment on attachment 408037 [details] [diff] [review]
bump-buildspace non-dictionary

good enough for now.
Attachment #408037 - Flags: review?(catlee) → review+
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1256310827.1256315010.25915.gz
OS X 10.5.2 mozilla-central leak test build on 2009/10/23 08:13:47
Comment on attachment 408037 [details] [diff] [review]
bump-buildspace non-dictionary

I can do the reconfig if somebody else can check it.
Attachment #408037 - Flags: checked-in?
Comment on attachment 408037 [details] [diff] [review]
bump-buildspace non-dictionary

Here you go,
http://hg.mozilla.org/build/buildbot-configs/rev/e5e7a1c08f9f
Attachment #408037 - Flags: checked-in? → checked-in+
This happened again, two hours after the failure in comment 24:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1256317656.1256320410.24885.gz
Linux mozilla-central leak test build on 2009/10/23 10:07:36
...and again, around 3 hours after that:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1256329760.1256332439.678.gz
Linux mozilla-central leak test build on 2009/10/23 13:29:20
Attached patch bump up debug builds as well (deleted) — Splinter Review
Linux debug build failed when it asked for 3 GB and got 4.4 GB.
Attachment #408136 - Flags: review?
Attachment #408136 - Flags: checked-in?
Attachment #408136 - Flags: review? → review?(lsblakk)
Attachment #408136 - Flags: review?(lsblakk) → review+
Comment on attachment 408136 [details] [diff] [review]
bump up debug builds as well

http://hg.mozilla.org/build/buildbot-configs/rev/cf48442b58fb
Attachment #408136 - Flags: checked-in? → checked-in+
Attached file bump up debugs even more (deleted) —
http://hg.mozilla.org/build/buildbot-configs/rev/83a8d1840448

We hit another out-of-disk for a m-c linux debug. I checked a successful build and it was using about 5400M (after the fact), so I bumped the disk space requirement from 5G to 6G. This is probably fallout from switching to gstabs+ symbols in bug 519196.

Masters reconfig'd.
Attachment #408262 - Flags: checked-in+
We should also sync these changes back to staging.
Builds like "WINNT 5.2 mozilla-central test debug mochitests-3/5" need more than 0.5G on slaves using FAT disks. I found one using 638M. NTFS uses less space.
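One plausible contributor (an assumption, the cause was not measured here) is allocation-unit slack: each file occupies a whole number of clusters, and FAT volumes are often formatted with larger clusters than NTFS. A Python sketch of the arithmetic, with illustrative cluster sizes:

```python
def allocated_bytes(file_sizes, cluster_size):
    """Total on-disk size when every file is rounded up to whole clusters."""
    # -(-a // b) is ceiling division for non-negative integers.
    return sum(-(-size // cluster_size) * cluster_size for size in file_sizes)


# Hypothetical test tree: many small files, as in an unpacked test package.
small_files = [1, 4096, 5000]
ntfs_like = allocated_bytes(small_files, 4 * 1024)   # 4 KB clusters (assumed)
fat_like = allocated_bytes(small_files, 32 * 1024)   # 32 KB clusters (assumed)
```

For a tree dominated by small files, the larger cluster size can inflate on-disk usage well beyond the sum of the file sizes, so the same package can need noticeably more free space.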
Another debug mochitest failed on windows after getting 0.53GB free.
I just theorized that this could be a possible cause of bug 509960 ... though it could also be a parallel make issue.

I'm kinda leaning towards doubling these numbers -- I think reliability takes precedence over speed.
Assignee: jford → aki
Out of disk space:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256675793.1256679213.8509.gz

I never was a fan of trying to squeeze every ounce of disk space. Maybe I'm old fashioned, but running builds on a disk where we expect to edge up against a gig or less of free space seems like we're asking for trouble.

If we double these numbers, we will clobber more often, slowing things down. We will also hit that hard limit much less often, and we can re-enable stricter nagios checks to tell us when those types of things happen.

We also expect to see significant disk usage reduction when bug 498522 lands.

Patch to double disk space requirements incoming.
(In reply to comment #36)
> If we double these numbers, we will clobber more often, slowing things down. We
> will also hit that hard limit much less often, 

(^ ... which will result in fewer build/test failures, ...)
Depends on: 489410, 524820
Attached patch double the build space numbers (deleted) — Splinter Review
While bumping these I realized that the lack of /scratchbox purging can cause a lot of linux burning; marked the bug dependent on bug 524820.

We'll see if jhford gets the bind mount fix before I can update purge_builds.py; that is possibly the ideal solution.
Comment on attachment 408035 [details] [diff] [review]
build_space dictionary v2

Apologies for not getting to this patch earlier. Please reset r? if you still want review on it given the subsequent comments. I'd note that the staging and production changes aren't the same, e.g. there are still 'unit' entries in the staging dict.
Attachment #408035 - Flags: review?(nthomas)
To fix:

(In reply to comment #33)
> Builds like "WINNT 5.2 mozilla-central test debug mochitests-3/5" need more
> than 0.5G on slaves using FAT disks. I found one using 638M. NTFS uses less
> space than FAT.
Attachment #410428 - Flags: review?(aki)
Attachment #410428 - Flags: review?(aki) → review+
Comment on attachment 410428 [details] [diff] [review]
Bump packaged unit test up to 1G.

http://hg.mozilla.org/build/buildbotcustom/rev/db5d3d010d67
Attachment #410428 - Flags: checked-in+
A couple of linux unit tests for e10s have run out of space during make package today, asking for 5G and getting 5.01G. In production this syncs the m-c unittest_build_space of 6G to the project branches, and then syncs staging to prod.
Attachment #411610 - Flags: review?(aki)
Attachment #411610 - Flags: review?(aki) → review+
Comment on attachment 411610 [details] [diff] [review]
Sync m-c unit to project branches

http://hg.mozilla.org/build/buildbot-configs/rev/9d766dbe92ea
Attachment #411610 - Flags: checked-in+
This is a somewhat vaguely worded bug that might never be closed.

If we want to keep it, it should probably be a [buildduty] bug; otherwise we should resolve + open new ones as needed.
Assignee: aki → nobody
Whiteboard: [buildduty]
Attached patch increase mac builds to 7 GB (deleted) — Splinter Review
Attachment #414058 - Flags: review?(bhearsum)
Attachment #414058 - Flags: review?(bhearsum) → review+
Comment on attachment 414058 [details] [diff] [review]
increase mac builds to 7 GB

changeset:   1776:1fd91ffe4cb3
Attachment #414058 - Flags: checked-in+
Attachment #416170 - Flags: review?(bhearsum)
Attachment #416170 - Flags: review?(bhearsum) → review+
Comment on attachment 416170 [details] [diff] [review]
increase win32-debug builds on m-c and projects to 7 GB

changeset:   1832:7cc1b34623dd
Attachment #416170 - Flags: checked-in+
"Linux mozilla-central leak test build" failed at the "make package" step by lack of space on the device.
6.03 GB of space available at the beginning of the build.
http://production-master.build.mozilla.org:8010/builders/Linux%20mozilla-central%20leak%20test%20build/builds/5075

The patch requests increasing to 7GB of space.
Attachment #417697 - Flags: review?(bhearsum)
Attachment #417697 - Flags: review?(bhearsum) → review+
Comment on attachment 417697 [details] [diff] [review]
bump unittest space from 6GB to 7GB in all branches except m191

Actually, let's bump the linux-debug space, since that's the build that actually failed.
Attachment #417697 - Flags: review+ → review-
Increasing linux-debug build space rather than unit test's build space.
Attachment #417697 - Attachment is obsolete: true
Attachment #417703 - Flags: review?(bhearsum)
Comment on attachment 417703 [details] [diff] [review]
 bump leak test space from 6GB to 7GB in all branches except m191 and m192

Might have to bump 1.9.2 later, but this seems fine for now.
Attachment #417703 - Flags: review?(bhearsum) → review+
Comment on attachment 417703 [details] [diff] [review]
 bump leak test space from 6GB to 7GB in all branches except m191 and m192

http://hg.mozilla.org/build/buildbot-configs/rev/d6872b792a77
Attachment #417703 - Flags: checked-in+
Attachment #418314 - Flags: review?
Attachment #418314 - Flags: review? → review?(nrthomas)
Attachment #418314 - Flags: review?(nrthomas) → review+
Comment on attachment 418314 [details] [diff] [review]
Bump tracemonkey build space to 7 GB

changeset:   1871:866a524e5f8d
Attachment #418314 - Attachment description: Dump tracemonkey build space to 7 GB → Bump tracemonkey build space to 7 GB
Attachment #418314 - Flags: checked-in+
Are we still holding this open for bug 489410?
Let's file new bugs for this as it becomes an issue rather than holding one bug open indefinitely.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: [buildduty]
Product: mozilla.org → Release Engineering