Closed Bug 983725 Opened 11 years ago Closed 9 years ago

Panda tests don't release the mozpool request when bm-remote is down

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: Callek, Unassigned)

Today a major scl3 power failure caused panda jobs to fail while trying to download binaries from bm-remote.

Once bm-remote was back up, jobs kept failing because the devices were reported as "busy" (e.g. still held by prior jobs) rather than "ready", which led to a _lot_ of retries.

Looking at the logs for the failed jobs, we never told mozpool to release the device.

https://tbpl.mozilla.org/php/getParsedLog.php?id=36148449&tree=Mozilla-Inbound&full=1#error0

09:10:50     INFO - #### Running reftest suites
09:10:50     INFO - mkdir: /builds/panda-0761/test/build/hostutils
09:10:50     INFO - Downloading http://bm-remote.build.mozilla.org/tegra/tegra-host-utils.Linux.742597.zip to /builds/panda-0761/test/build/tegra-host-utils.Linux.742597.zip
09:10:50     INFO - retry: Calling <bound method PandaTest._download_file of <__main__.PandaTest object at 0x203ac90>> with args: ('http://bm-remote.build.mozilla.org/tegra/tegra-host-utils.Linux.742597.zip', '/builds/panda-0761/test/build/tegra-host-utils.Linux.742597.zip'), kwargs: {}, attempt #1
09:10:50  WARNING - Server returned status 500 HTTP Error 500: Server Error for http://bm-remote.build.mozilla.org/tegra/tegra-host-utils.Linux.742597.zip
09:10:50     INFO - retry: Failed, sleeping 60 seconds before retrying
09:11:50     INFO - retry: Calling <bound method PandaTest._download_file of <__main__.PandaTest object at 0x203ac90>> with args: ('http://bm-remote.build.mozilla.org/tegra/tegra-host-utils.Linux.742597.zip', '/builds/panda-0761/test/build/tegra-host-utils.Linux.742597.zip'), kwargs: {}, attempt #2
09:11:50  WARNING - Server returned status 500 HTTP Error 500: Server Error for http://bm-remote.build.mozilla.org/tegra/tegra-host-utils.Linux.742597.zip
09:11:50     INFO - retry: Failed, sleeping 120 seconds before retrying
09:13:50     INFO - retry: Calling <bound method PandaTest._download_file of <__main__.PandaTest object at 0x203ac90>> with args: ('http://bm-remote.build.mozilla.org/tegra/tegra-host-utils.Linux.742597.zip', '/builds/panda-0761/test/build/tegra-host-utils.Linux.742597.zip'), kwargs: {}, attempt #3
09:13:50  WARNING - Server returned status 500 HTTP Error 500: Server Error for http://bm-remote.build.mozilla.org/tegra/tegra-host-utils.Linux.742597.zip
09:13:50     INFO - retry: Failed, sleeping 240 seconds before retrying
09:17:50     INFO - retry: Calling <bound method PandaTest._download_file of <__main__.PandaTest object at 0x203ac90>> with args: ('http://bm-remote.build.mozilla.org/tegra/tegra-host-utils.Linux.742597.zip', '/builds/panda-0761/test/build/tegra-host-utils.Linux.742597.zip'), kwargs: {}, attempt #4
09:17:50  WARNING - Server returned status 500 HTTP Error 500: Server Error for http://bm-remote.build.mozilla.org/tegra/tegra-host-utils.Linux.742597.zip
09:17:50     INFO - retry: Failed, sleeping 300 seconds before retrying
09:22:50     INFO - retry: Calling <bound method PandaTest._download_file of <__main__.PandaTest object at 0x203ac90>> with args: ('http://bm-remote.build.mozilla.org/tegra/tegra-host-utils.Linux.742597.zip', '/builds/panda-0761/test/build/tegra-host-utils.Linux.742597.zip'), kwargs: {}, attempt #5
09:22:50  WARNING - Server returned status 500 HTTP Error 500: Server Error for http://bm-remote.build.mozilla.org/tegra/tegra-host-utils.Linux.742597.zip
09:22:50    FATAL - Can't download from http://bm-remote.build.mozilla.org/tegra/tegra-host-utils.Linux.742597.zip to /builds/panda-0761/test/build/tegra-host-utils.Linux.742597.zip!
09:22:50    FATAL - Caught exception: HTTP Error 500: Server Error
09:22:50    FATAL - Caught exception: HTTP Error 500: Server Error
09:22:50    FATAL - Caught exception: HTTP Error 500: Server Error
09:22:50    FATAL - Caught exception: HTTP Error 500: Server Error
09:22:50    FATAL - Caught exception: HTTP Error 500: Server Error
09:22:50    FATAL - Running post_fatal callback...
09:22:50    FATAL - Exiting -1
09:22:50     INFO - Running post-action listener: _resource_record_post_action
09:22:50     INFO - Running post-run listener: _resource_record_post_run
09:22:50     INFO - Running post-run listener: _upload_blobber_files
09:22:50     INFO - Blob upload gear active.
09:22:50     INFO - Preparing to upload files from /builds/panda-0761/test/build/blobber_upload_dir.
09:22:50     INFO - Files from /builds/panda-0761/test/build/blobber_upload_dir are to be uploaded with <mozilla-inbound> branch at the following location(s): https://blobupload.elasticbeanstalk.com
09:22:50     INFO - Running command: ['/builds/panda-0761/test/build/venv/bin/python', '/builds/panda-0761/test/build/venv/bin/blobberc.py', '-u', 'https://blobupload.elasticbeanstalk.com', '-a', '/builds/panda-0761/test/oauth.txt', '-b', 'mozilla-inbound', '-d', '/builds/panda-0761/test/build/blobber_upload_dir']
09:22:50     INFO - Copy/paste: /builds/panda-0761/test/build/venv/bin/python /builds/panda-0761/test/build/venv/bin/blobberc.py -u https://blobupload.elasticbeanstalk.com -a /builds/panda-0761/test/oauth.txt -b mozilla-inbound -d /builds/panda-0761/test/build/blobber_upload_dir
09:22:50     INFO -  (blobuploader) - INFO - Open directory for files ...
09:22:50     INFO -  (blobuploader) - INFO - Uploading /builds/panda-0761/test/build/blobber_upload_dir/logcat.log ...
09:22:50     INFO -  (blobuploader) - INFO - Using https://blobupload.elasticbeanstalk.com
09:22:50     INFO -  (blobuploader) - INFO - Uploading, attempt #1.
09:22:52     INFO -  (blobuploader) - INFO - TinderboxPrint: Uploaded logcat.log to http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/e492493ef3c735b23ca7675bc693513e1fcb833c6236e9992a69be17b14a74a5ac77d45fefa175a4657287d8e095a14cbb330f98eb0524a26bf5c5a54b95cdc3
09:22:52     INFO -  (blobuploader) - INFO - Blobserver returned 202. File uploaded!
09:22:52     INFO -  (blobuploader) - INFO - Done attempting.
09:22:52     INFO -  (blobuploader) - INFO - Iteration through files over.
09:22:52     INFO - Return code: 0
program finished with exit code 255
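
For illustration only, here is a minimal sketch of the cleanup that appears to be missing in the log above: releasing the device request in a `finally` block so it still happens when the download step exits fatally. `FakeMozpoolClient`, `request_device`, `release`, `download_host_utils` and `run_job` are hypothetical names made up for this sketch; they are not the real mozpool client or mozharness API, and the actual fix might belong in a post-fatal callback instead.

```python
class FakeMozpoolClient:
    """Hypothetical stand-in for a mozpool client; not the real API."""

    def __init__(self):
        self.open_requests = set()
        self._next_id = 1

    def request_device(self, device):
        # Record an outstanding request for the device and return its id.
        request_id = self._next_id
        self._next_id += 1
        self.open_requests.add(request_id)
        return request_id

    def release(self, request_id):
        # discard() keeps release harmless if cleanup runs more than once.
        self.open_requests.discard(request_id)


def download_host_utils():
    """Simulates the failing download: a fatal error exits the script."""
    raise SystemExit(255)


def run_job(client, device, download=download_host_utils):
    request_id = client.request_device(device)
    try:
        download()
    finally:
        # Runs on success, on ordinary exceptions, and on SystemExit,
        # so the device is not left "busy" for the next job.
        client.release(request_id)


if __name__ == "__main__":
    client = FakeMozpoolClient()
    try:
        run_job(client, "panda-0761")
    except SystemExit:
        pass
    print("open requests after fatal exit:", len(client.open_requests))
```

The `finally` clause is used here because a fatal exit implemented with `SystemExit` bypasses any cleanup that only runs on the normal return path.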
Depends on: 1186615
Closing, since we are decommissioning pandas in bug 1186615.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard