Closed Bug 464093 Opened 16 years ago Closed 16 years ago

Builds on Mac take too much space

Categories

(Release Engineering :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Assigned: catlee)

Attachments

(6 files, 4 obsolete files)

(deleted), patch (bhearsum: review+)
(deleted), patch (bhearsum: review+)
(deleted), patch (bhearsum: review+)
(deleted), patch (coop: review+, bhearsum: checked-in+)
(deleted), patch (bhearsum: review+, bhearsum: checked-in+)
(deleted), patch (nthomas: review+, nthomas: checked-in+)

We've run into a problem a few times on Mac slaves on staging where a build fails to clean up after itself properly, causing future builds to run out of disk space and fail as well.  This requires manual intervention to fix.

We haven't (yet) run into this problem on Linux or Windows; the Mac builds are larger because they contain both PPC and x86 code.  However, there are several new product branches coming down the pipe, each of which will require significant disk space.

There are a few issues here:

- Mac builds are done with -save-temps, which causes gcc to leave all its internal temporary files on disk.  This is to work around a bug where gcc will crash with a bus error when using '-gstabs' (which we do).

- Builds should be cleaned up regardless of whether the rest of the build steps complete successfully.
(In reply to comment #0)
> - Builds should be cleaned up regardless of if the rest of the build steps
> complete successfully or not.

This isn't really solvable right now. When a BuildStep that has haltOnFailure=True fails, the build just stops. It'd be nice to give Buildbot support for a set of steps that always get run, sort of like a "finally" statement.
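To make the "finally" idea concrete, here is a minimal sketch in plain Python. It is not Buildbot's API: the step tuple layout and the run_build helper are hypothetical names invented for illustration. It only shows the control flow being asked for, where ordinary steps are skipped after the first failure but flagged steps still run.

```python
from typing import Callable, List, Tuple

# Hypothetical step record: (name, action, always_run). Not Buildbot's API.
Step = Tuple[str, Callable[[], bool], bool]


def run_build(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order. After the first failure, skip every step
    that is not flagged always_run. Returns (success, names_of_steps_run)."""
    ok = True
    ran: List[str] = []
    for name, action, always_run in steps:
        if not ok and not always_run:
            continue  # the build has already failed; skip ordinary steps
        ran.append(name)
        try:
            step_ok = action()
        except Exception:
            step_ok = False  # treat an exception as a failed step
        if not step_ok:
            ok = False
    return ok, ran


if __name__ == "__main__":
    steps = [
        ("checkout", lambda: True, False),
        ("compile", lambda: False, False),  # simulated failure
        ("package", lambda: True, False),   # skipped because compile failed
        ("cleanup", lambda: True, True),    # still runs, like "finally"
    ]
    print(run_build(steps))
```

With steps like these, the cleanup runs even though the compile failed, which is the behaviour the original report asks for.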
Blocks: 417045
Could we add an additional command as part of the compile step to do the cleanup?

e.g.

make -f client.mk; make cleanup
I have hit this same problem when the l10n repackages happened on staging-master.

For instance moz2-darwin09-slave04's df -hi shows:
Filesystem      Size   Used  Avail Capacity  iused ifree %iused  Mounted on
/dev/disk0s2    74Gi   74Gi   55Mi   100% 19439562 14144  100%   /
devfs          105Ki  105Ki    0Bi   100%      600     0  100%   /dev
fdesc          1.0Ki  1.0Ki    0Bi   100%        4   253    2%   /dev
map -hosts       0Bi    0Bi    0Bi   100%        0     0  100%   /net
map auto_home    0Bi    0Bi    0Bi   100%        0     0  100%   /home
Blocks: 460791
Blocks: 464103
I have filed bug 464103 to look at how this high disk usage is going to affect us in the long run.
`du -msc * | sort -n` in /builds/moz2_slave yields the following:

0	macosx_build
0	macosx_update_verify
1	buildbot.tac
1	info
1	twistd.log
1	twistd.log.1
1	twistd.log.2
1	twistd.log.3
1	twistd.log.4
1	twistd.pid
105	mozilla-central-macosx-l10n-nightly
933	mozilla-central-macosx-unittest
1680	mozilla-central-macosx-debug
7533	tracemonkey-macosx-nightly
11681	mozilla-central-macosx-nightly
16468	tracemonkey-macosx
16560	mozilla-central-macosx
54962	total
As a workaround, perhaps we can make the _start_ of all build runs clean up all the nightly directories.
(In reply to comment #6)
> As a work around, perhaps we can make the _start_ of all build runs clean up
> all the nightly directories.

Yeah, that would work, but it means that each of the builders needs to know about all the other ones in order to clean them up.  It could get tricky to update those lists as we add / remove branches.
  bash -c rm -rf *-nightly/build/
except that it's brutal, and will fail on new slaves.
(In reply to comment #2)
> Could we add an additional command as part of the compile step to do the
> cleanup?
> 
> e.g.
> 
> make -f client.mk; make cleanup

This is a kludge, but it would help in some cases. Obviously, when the failure comes somewhere after the Compile step we'd still be out of luck.

(In reply to comment #8)
>   bash -c rm -rf *-nightly/build/
> except that it's brutal, and will fail on new slaves.

We could do something like...
bash -c 'for dir in ../*-nightly; do rm -rf ../$dir/build; done'

It irks me when there are BuildSteps that step outside their own build directory, but in the name of not getting people paged I think I could live with something like this :).
(In reply to comment #8)
>   bash -c rm -rf *-nightly/build/
> except that it's brutal, and will fail on new slaves.

The '-rf' should make the command oblivious to failure, no? Even so, couldn't we just warnOnFailure there?
(In reply to comment #10)
> (In reply to comment #8)
> >   bash -c rm -rf *-nightly/build/
> > except that it's brutal, and will fail on new slaves.
> 
> The '-rf' should make the command oblivious to failure, no? Even so, couldn't
> we just warnOnFailure there?

The command will fail completely if it can't glob anything, because rm -rf will be called with no arguments. But we can get Buildbot to ignore it.
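One way to sidestep the "nothing to glob" case is to do the matching in code, where an empty match list is simply a no-op. The following is only a sketch under assumptions: remove_nightly_builds is a hypothetical helper, not part of buildbotcustom, and it assumes the layout discussed above (<slave dir>/<builder>-nightly/build).

```python
import glob
import os
import shutil
from typing import List


def remove_nightly_builds(slave_dir: str) -> List[str]:
    """Delete every <slave_dir>/*-nightly/build directory.

    Returns the paths that were actually removed. If nothing matches
    (for example on a brand-new slave), this returns an empty list
    instead of failing.
    """
    # Build an absolute pattern so the result does not depend on the
    # current working directory.
    pattern = os.path.join(os.path.abspath(slave_dir), "*-nightly", "build")
    removed = []
    for path in sorted(glob.glob(pattern)):
        if not os.path.isdir(path):
            continue
        shutil.rmtree(path, ignore_errors=True)
        if not os.path.exists(path):
            removed.append(path)  # only report directories that are gone
    return removed
```

Because the pattern is absolute, the helper behaves the same whichever directory the step runs in.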
Attachment #347553 - Attachment is obsolete: true
We could also investigate if we still need --save-temps now that we switched to XCode 3.1 - Apple might have fixed gcc.
I have hit this again during the l10n repackages, but I was fast enough to "rm -rf tracemonkey*", allowing the rest of the Mac builds after "ru" to complete properly.

/me in love with the patch and even more that it tackles all 3 platforms
(In reply to comment #14)
> We could also investigate if we still need --save-temps now that we switched to
> XCode 3.1 - Apple might have fixed gcc.

I tried on my laptop which has XCode 3.1, and it crashed out pretty soon with a bus error.

Does anybody know if we have contacted Apple with this problem?  We should be able to send a copy of the preprocessed source, along with the gcc flags used to reproduce the error.
Attachment #347773 - Flags: review?(bhearsum)
Assignee: nobody → catlee
Priority: -- → P2
Looks like we'll get dwarf support in breakpad in the near future (bug 421534), so we can probably ditch --save-temps then.
Blocks: 421534
Comment on attachment 347773
[checked in] changes to unittest, mobile and l10n masters to delete previous nightly builds

We should probably add this to MercurialBuildFactory in factory.py, too.
Attachment #347773 - Flags: review?(bhearsum) → review+
Comment on attachment 347773
[checked in] changes to unittest, mobile and l10n masters to delete previous nightly builds

changeset:   509:e05ba6596c28

still need a follow-up for MercurialBuildFactory
Attachment #347773 - Attachment description: changes to unittest, mobile and l10n masters to delete previous nightly builds → [checked in] changes to unittest, mobile and l10n masters to delete previous nightly builds
Comment on attachment 347556
[checked in] remove previous nightly builds before every build

Checking in factory.py;
/cvsroot/mozilla/tools/buildbotcustom/process/factory.py,v  <--  factory.py
new revision: 1.33; previous revision: 1.32
done
Attachment #347556 - Attachment description: remove previous nightly builds before every build → [checked in] remove previous nightly builds before every build
No longer blocks: 421534
Depends on: 421534
No longer blocks: 460791
Comment on attachment 347556
[checked in] remove previous nightly builds before every build

I just noticed that production-master never picked this change up. I just reconfig'ed it for it.
Comment on attachment 348762
changes to mobile and l10n factories to clean up old nightly builds on production

changeset:   512:bf93ed906fcd

Feel free to update & reconfig the master with this.
Attachment #348762 - Flags: review?(bhearsum) → review+
Priority: P2 → --
This patch will take us down to approx. 10GB per build directory at the end of a successful build w/ save-temps. Currently, we're at 15GB per dir.
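As a rough illustration of what that saving buys, the arithmetic below combines the per-directory figures from this comment with the 74Gi root filesystem shown in the earlier df output. It assumes, purely for illustration, that build directories are the only thing on the disk and that GB and Gi can be treated as the same unit; dirs_that_fit is a hypothetical helper.

```python
DISK_GB = 74    # size of / in the df output for moz2-darwin09-slave04
BEFORE_GB = 15  # per build directory before this patch
AFTER_GB = 10   # per build directory after this patch (approximate)


def dirs_that_fit(disk_gb: int, per_dir_gb: int) -> int:
    """Number of whole build directories that fit on the disk."""
    return disk_gb // per_dir_gb


print(dirs_that_fit(DISK_GB, BEFORE_GB))  # 4
print(dirs_that_fit(DISK_GB, AFTER_GB))   # 7
```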
Attachment #349464 - Flags: review?(ccooper)
Attachment #349464 - Flags: review?(ccooper) → review+
Comment on attachment 349464
cleanup both the i386 and ppc objdirs, delete  even more temp files

Checking in factory.py;
/cvsroot/mozilla/tools/buildbotcustom/process/factory.py,v  <--  factory.py
new revision: 1.38; previous revision: 1.37
done
Attachment #349464 - Attachment description: cleanup both the i386 and ppc objdirs, deleted even more temp files → cleanup both the i386 and ppc objdirs, delete even more temp files
Attachment #349464 - Flags: checked-in+
Attached patch: Fix up path to nightly dirs (deleted)
Can you spot what's wrong with this snippet:
  bash -c rm -rf ../*-nightly/build
   in dir /builds/moz2_slave/mozilla-1.9.1-linux-nightly/build

It's missing another ../ on the path to delete
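To see why one more ../ is needed, the relative pattern can be resolved against the step's working directory. This is just an illustrative sketch; resolve is a hypothetical helper and the paths are the ones quoted in the snippet above.

```python
import posixpath


def resolve(workdir: str, relative_pattern: str) -> str:
    """Show where a relative glob pattern points when run from workdir."""
    return posixpath.normpath(posixpath.join(workdir, relative_pattern))


workdir = "/builds/moz2_slave/mozilla-1.9.1-linux-nightly/build"

# One "../" only climbs out of build/, so the pattern looks for
# *-nightly directories inside the builder's own directory.
print(resolve(workdir, "../*-nightly/build"))
# /builds/moz2_slave/mozilla-1.9.1-linux-nightly/*-nightly/build

# Two "../" reach the slave directory, where the sibling builders live.
print(resolve(workdir, "../../*-nightly/build"))
# /builds/moz2_slave/*-nightly/build
```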
Attachment #349700 - Flags: review?(bhearsum)
Attachment #349700 - Flags: checked-in+
Comment on attachment 349700
Fix up path to nightly dirs

changeset:   531:aef8d5499c06
Attachment #349700 - Flags: review?(bhearsum) → review+
Attachment #349803 - Flags: review?(bhearsum)
Actually, the same thing happens to other nightly builds. E.g.
  C:\WINDOWS\system32\cmd.exe /c bash -c rm -rf ../../*-nightly/build
   in dir e:\builds\moz2_slave\mozilla-central-win32-nightly\build
  rm: cannot remove directory `../../mozilla-central-win32-nightly/build': Permission denied
  program finished with exit code 1

What if we move the rm -rf ../../*nightly/build into an else block at line 215?
  http://mxr.mozilla.org/seamonkey/source/tools/buildbotcustom/process/factory.py#208
Except that won't clear up other failed nightlies when trying to do a new one. Teh suck.
Why don't we set the workdir to '.' instead of leaving it unspecified (in which case it defaults to "build")?

This way "rm" is not executed from within "build".
This nightly cleanup job is turning the nightly builds orange - we need to fix this before the next set of nightlies comes out.
Attached patch: Fix cleaning up other nightly builds (obsolete) (deleted)
Attachment #349803 - Attachment is obsolete: true
Attachment #349957 - Flags: review?(bhearsum)
Attachment #349803 - Flags: review?(bhearsum)
Attachment #349957 - Attachment is obsolete: true
Attachment #349967 - Flags: review?(nthomas)
Attachment #349957 - Flags: review?(bhearsum)
(In reply to comment #33)
> This nightly cleanup job is turning the nightly builds orange - we need to fix
> this before the next set of nightlies comes out.

Is this orange because of the windows mozilla-central nightly build?
"NMAKE : fatal error U1052: file 'makefile.sub' not found"?
What is this file for?
(In reply to comment #36)
> (In reply to comment #33)
> > This nightly cleanup job is turning the nightly builds orange - we need to fix
> > this before the next set of nightlies comes out.
> 
> Is this orange because of the windows mozilla-central nightly build?
> "NMAKE : fatal error U1052: file 'makefile.sub' not found"?
> What is this file for?

No, I think they turned orange because of this:
http://production-master.build.mozilla.org:8010/builders/WINNT%205.2%20mozilla-1.9.1%20nightly/builds/8/steps/shell_3/logs/stdio

rm: cannot remove directory `../../mozilla-1.9.1-win32-nightly/build': Permission denied

This is because it's trying to remove the directory that the shell is currently in.  So your suggestion of running with workdir='.' should work.
Comment on attachment 349967
Fix cleaning up other nightly builds

r+, switched the workDir lines to use single quotes on checkin (rev 1.43)

Updated moz2-master on staging and production before 2am PST.
Attachment #349967 - Flags: review?(nthomas) → review+
Attachment #349967 - Flags: checked-in+
I am making progress on bug 421534, so don't lose hope!
Attached patch: Delete l10n builds properly (obsolete) (deleted)
This is on staging right now
Attachment #351251 - Flags: review?(bhearsum)
Attachment #351251 - Attachment is obsolete: true
Attachment #351251 - Flags: review?(bhearsum)
This will probably be WFM now. I intend to try to get bug 421534 landed on the 1.9.1 branch, which should eliminate the problem.
The new dwarf support fixes this for the Mac, and bug 464103 will handle other platforms.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering