Closed Bug 1186297 Opened 9 years ago Closed 8 years ago

Switch ash branch to upload via new S3 frontends

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Unassigned)

Attachments

(2 files)

... as an initial loadtest. Ash was suggested.
Depends on: 1186300
Depends on: 1186302
I've merged m-c to ash again, and this patch should take care of all the desktop and android builds, plus the public parts of device builds. The exception is b2g manifests, which will be broken by a mismatched key and host. I'm not sure we still need these, but if we do, we can ask for a b2gbld upload endpoint or switch to using ffxbld.
There appear to be no remaining buildbot factory builds on ash. In fact, on m-c only the hg-bundle, xulrunner, and l10n dep jobs still reference 'stage.mozilla.org' explicitly, and none of those are enabled on ash.
This is waiting on an updated known_hosts being deployed on enough slaves to avoid false upload failures.
Uploads look OK, but the new post_upload still isn't printing out the URLs for the uploaded resources.
Flags: needinfo?(oremj)
Can you give me the command you ran and the output that it generated?
Flags: needinfo?(oremj) → needinfo?(catlee)
The log is here: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/ash-linux-debug/1438229081/ash-linux-debug-bm71-build1-build0.txt.gz
The command we ran was:
post_upload.py --tinderbox-builds-dir ash-linux-debug -p firefox -i 20150729210441 --revision 917678add467 --release-to-tinderbox-dated-builds --release-to-latest-tinderbox-builds
Flags: needinfo?(catlee)
The --release-to-tinderbox-dated-builds option was broken; it should be working now. In the previous post_upload script, --release-to-latest-tinderbox-builds would symlink the dated builds directory to latest. Can we stop doing that? If not, how do we want to handle that operation, given that we can't atomically flip a directory?
We agreed earlier that, instead of symlinking, you could copy the files into the latest directory.
I can copy them over, but what about files that are already there? Should I leave them or delete everything in the directory?
Flags: needinfo?(catlee)
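One possible answer to the question above, sketched in Python. This is not the post_upload implementation and the thread does not record which option was chosen. The helper name plan_latest_sync and its inputs are hypothetical. The idea is to copy the new build's files first (overwriting same-named files) and only afterwards delete files that the new build does not contain, so that "latest" is never empty even though it cannot be flipped atomically.

```python
def plan_latest_sync(dated_files, latest_files):
    """Return (to_copy, to_delete) for refreshing a 'latest' directory.

    dated_files and latest_files are iterables of file names relative to
    their directories. Both are hypothetical inputs for this sketch.
    """
    dated = set(dated_files)
    latest = set(latest_files)
    # Step 1: copy every file from the dated build, overwriting same names.
    to_copy = sorted(dated)
    # Step 2: once the copies are done, remove files the new build lacks.
    to_delete = sorted(latest - dated)
    return to_copy, to_delete


to_copy, to_delete = plan_latest_sync(
    ["firefox-42.0a1.en-US.linux-i686.tar.bz2", "test_packages.json"],
    ["firefox-41.0a1.en-US.linux-i686.tar.bz2", "test_packages.json"],
)
print("copy:", to_copy)
print("delete:", to_delete)
```

Readers can still see a mix of old and new files while the copy is in progress. Leaving stale files in place instead (skipping step 2) avoids deletions entirely, at the cost of "latest" accumulating files from older builds.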
Now getting these errors:
13:53:00 INFO - /builds/slave/ash-lx-00000000000000000000000/build/src/obj-firefox/_virtualenv/bin/python /builds/slave/ash-lx-00000000000000000000000/build/src/build/upload.py --base-path ../../dist \
13:53:00 INFO - '../../dist/firefox-42.0a1.en-US.linux-i686.tar.bz2' '../../dist/linux-i686/xpi/firefox-42.0a1.en-US.langpack.xpi' '../../dist/firefox-42.0a1.en-US.linux-i686.common.tests.zip' '../../dist/firefox-42.0a1.en-US.linux-i686.cppunittest.tests.zip' '../../dist/firefox-42.0a1.en-US.linux-i686.xpcshell.tests.zip' '../../dist/firefox-42.0a1.en-US.linux-i686.mochitest.tests.zip' '../../dist/firefox-42.0a1.en-US.linux-i686.reftest.tests.zip' '../../dist/firefox-42.0a1.en-US.linux-i686.web-platform.tests.zip' '../../dist/firefox-42.0a1.en-US.linux-i686.crashreporter-symbols.zip' '../../dist//firefox-42.0a1.en-US.linux-i686.txt' '../../dist//firefox-42.0a1.en-US.linux-i686.json' '../../dist//firefox-42.0a1.en-US.linux-i686.mozinfo.json' '../../dist//test_packages.json' '../../dist/jsshell-linux-i686.zip' ../../dist/host/bin/mar ../../dist/host/bin/mbsdiff \
13:53:00 INFO - '../../dist//firefox-42.0a1.en-US.linux-i686.checksums' '../../dist//firefox-42.0a1.en-US.linux-i686.checksums'.asc
13:56:58 INFO - 2015/07/30 20:56:58 putting /tmp/tmp.a43SLswCoz//firefox-42.0a1.en-US.linux-i686.tar.bz2 to net-mozaws-prod-delivery-firefox/pub/firefox/tinderbox-builds/ash-linux/1438229081/firefox-42.0a1.en-US.linux-i686.tar.bz2 err: NoSuchBucket: The specified bucket does not exist
13:56:58 INFO - status code: 404, request id: []
13:56:58 INFO - 2015/07/30 20:56:58 putting /tmp/tmp.a43SLswCoz/linux-i686/xpi/firefox-42.0a1.en-US.langpack.xpi to net-mozaws-prod-delivery-firefox/pub/firefox/tinderbox-builds/ash-linux/1438229081/firefox-42.0a1.en-US.langpack.xpi err: NoSuchBucket: The specified bucket does not exist
13:56:58 INFO - status code: 404, request id: []
13:56:59 INFO - 2015/07/30 20:56:59 putting /tmp/tmp.a43SLswCoz//firefox-42.0a1.en-US.linux-i686.common.tests.zip to net-mozaws-prod-delivery-firefox/pub/firefox/tinderbox-builds/ash-linux/1438229081/firefox-42.0a1.en-US.linux-i686.common.tests.zip err: NoSuchBucket: The specified bucket does not exist
13:56:59 INFO - status code: 404, request id: []
Flags: needinfo?(catlee)
You'll need the bucket-prefix flag we talked about earlier. --bucket-prefix "net-mozaws-stage-delivery"
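The failing log targets a bucket named net-mozaws-prod-delivery-firefox, and the fix is a prefix of net-mozaws-stage-delivery. Read together, these suggest the bucket name is built as the prefix, a hyphen, then a product suffix. That is an inference from this thread, not something confirmed by the post_upload source. The helper below is hypothetical and only illustrates the suspected naming scheme.

```python
def bucket_name(prefix, product):
    """Compose a bucket name as '<prefix>-<product>'.

    Hypothetical helper: the scheme is inferred from the log output in
    this bug, not taken from the post_upload source.
    """
    return f"{prefix}-{product}"


# The name seen in the failing log, which does not exist for this setup.
print(bucket_name("net-mozaws-prod-delivery", "firefox"))
# The name that would result from the suggested --bucket-prefix value.
print(bucket_name("net-mozaws-stage-delivery", "firefox"))
```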
Merged m-c to ash, and added the bucket prefix, at https://hg.mozilla.org/projects/ash/rev/b3432877c7fe.
We should turn on other jobs to make ash like m-c (e.g. nightlies, l10n, spidermonkey and other projects), in order to flush out any remaining ssh/scp/rsync usage. Transition to the new host is slated for mid-October.
Attached patch ash-buildbot-configs.diff (deleted)
The config diff can be viewed at https://gist.github.com/rail/2fc97bc7add5573af0ff
There are some unused variables, but cleaning them up would be another bug. If you prefer, we can tweak some variables, like enable_nightly_everytime and enable_weekly_bundle (shouldn't we use bundleclone?).
Attachment #8659885 - Flags: review?(nthomas)
Comment on attachment 8659885: ash-buildbot-configs.diff
Let's either back off on periodic_start_hours now or soon, unless those builds are checking for pushes as well as watching the clock. Strange that dep_signing_servers is being unset for debug builds.
Attachment #8659885 - Flags: review?(nthomas) → review+
Comment on attachment 8659885: ash-buildbot-configs.diff
https://hg.mozilla.org/build/buildbot-configs/rev/eb9fa2c84aee
pgo strategy set to per-checkin
Attachment #8659885 - Flags: checked-in+
Summary: Switch a branch to upload via new S3 frontends → Switch ash branch to upload via new S3 frontends
I investigated multiple options to figure out the optimal instance type for the upload host. One of the ideas was to simulate load comparable to what we have now.

The first step was to figure out the current upload rates. I tried to use graphite, but the data is too coarse: the tx/rx rates are not what we need. I took this approach to figure out the upload rates we experience:
* Find all files modified in a particular period of time. I applied this to Firefox, Fennec and B2G files living on stage.m.o, modified within 24h on a busy day with multiple releases in flight (around Sep 15).
* Generate a time series and analyze the rates. In our case the max is most important, because we have to plan for peaks.

The results are below:
30s max: 874 Mbps
1m max: 658 Mbps
3m max: 477 Mbps
5m max: 439 Mbps
10m max: 378 Mbps
1h max: 320 Mbps

Load simulation is a tricky task which may take a lot of resources. We thought that we could use taskcluster to spin up a lot of clients and upload some files. This would require some extra work to prep proper images with all the needed secrets baked in, and to write custom scripts to generate traffic.

From our past experience with proxxy, we will need quite a beefy instance (assuming we can't use multiple instances in parallel) to meet the needed network performance. Per http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-ec2-config.html m3.2xlarge might be what we need.
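The rate analysis described above could be sketched roughly as follows. This is not the script that produced the numbers in this comment. The function name peak_mbps and its input format are hypothetical, and it assumes each file's full size is attributed to its modification time, which tends to overstate burstiness for short windows because real uploads are spread over time.

```python
from collections import deque


def peak_mbps(uploads, window_seconds):
    """Peak average rate in Mbps over any window of the given length.

    uploads: iterable of (mtime_epoch_seconds, size_bytes) pairs, for
    example gathered by walking the upload tree and calling os.stat().
    Assumption: a file's whole size is counted at its mtime.
    """
    events = sorted(uploads)
    window = deque()
    bytes_in_window = 0
    peak_bytes = 0
    for mtime, size in events:
        window.append((mtime, size))
        bytes_in_window += size
        # Drop files older than the window that ends at this mtime.
        while window[0][0] <= mtime - window_seconds:
            _, old_size = window.popleft()
            bytes_in_window -= old_size
        peak_bytes = max(peak_bytes, bytes_in_window)
    # bytes -> bits, averaged over the window, expressed in megabits.
    return peak_bytes * 8 / window_seconds / 1e6


# Made-up sample data: three 125 MB files at t=0s, t=10s and t=100s.
sample = [(0, 125_000_000), (10, 125_000_000), (100, 125_000_000)]
for seconds in (30, 60, 3600):
    print(f"{seconds}s max: {peak_mbps(sample, seconds):.1f} Mbps")
```

Shorter windows give higher peaks, which matches the shape of the results listed above: the 30s maximum is the figure to size the instance's network capacity against, while the 1h maximum is closer to sustained load.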
rail, could you do what's necessary to turn on funsize for ash ?
Flags: needinfo?(rail)
I've pushed to ash with fixes for
* desktop balrog submission (replace archive.m.o with CDN instead of ftp.m.o), would have probably hit this on android too
* android multi-locale's missing config
See https://treeherder.mozilla.org/#/jobs?repo=ash&revision=9595d85e38b8.
B2G builds will still be busted uploading to the new system, probably because upload.ffxbld.stage needs to be allowed to write to /pub/b2g/.
Depends on: 1211297
(In reply to Nick Thomas [:nthomas] from comment #19)
> rail, could you do what's necessary to turn on funsize for ash ?
You can add ash to https://github.com/mozilla/funsize/blob/master/funsize/worker.py#L22. Deployment is a bit tricky (need to document it..), I can help with that.
Flags: needinfo?(rail)
Depends on: 1211371
Depends on: 1211374
Depends on: 1211402
Depends on: 1211770
Current status: everything is green on ash except
* public b2g uploads: the upload host isn't allowed access to b2g/, bug 1211374
* b2g manifests will fail to upload after the switch, bug 1211371
* hazard builds do their own upload, bug 1211402
* the blocklist updates will break on esr38 when we lose tinderbox-builds/foo/latest, and have no nightlies, bug 1211770

(In reply to Rail Aliiev [:rail] from comment #23)
> Deployed new funsize:
Not sure if it's not working, or if it needs more nightlies to be able to make progress. In retrospect, we only need to generate partials one build back.
Blocks: 1213721
Still using ash to finish up the source manifests and socorro json files, and then possibly for redoing the pvtbuilds uploads.
Finished using ash a long time ago. See https://github.com/mozilla-releng/funsize/pull/41/files to clean up the funsize config.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: General Automation → General