Closed
Bug 1305752
Opened 8 years ago
Closed 8 years ago
Intermittent-infra BadZipfile: File is not a zip file
Categories
(Release Engineering :: Applications: MozharnessCore, defect)
Tracking
(firefox52 fixed)
RESOLVED DUPLICATE of bug 1306189
| | Tracking | Status |
|---|---|---|
| firefox52 | --- | fixed |
People
(Reporter: philor, Assigned: armenzg)
References
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell infra])
Attachments
(1 file)
+++ This bug was initially created as a clone of Bug #1300345 +++
https://treeherder.mozilla.org/logviewer.html#?job_id=4062041&repo=autoland
[task 2016-09-27T14:32:11.573744Z] 14:32:11 INFO - Fetch https://queue.taskcluster.net/v1/task/fJ6_ZueeTBCcSt22Qb-AHA/artifacts/public/build/target.common.tests.zip into memory
[task 2016-09-27T14:32:11.872192Z] 14:32:11 INFO - Expected file size: 16970
[task 2016-09-27T14:32:11.876818Z] 14:32:11 INFO - Running post-action listener: _resource_record_post_action
[task 2016-09-27T14:32:11.876885Z] 14:32:11 INFO - Running post-action listener: set_extra_try_arguments
[task 2016-09-27T14:32:11.877239Z] 14:32:11 INFO - [mozharness: 2016-09-27 14:32:11.873795Z] Finished download-and-extract step (failed)
[task 2016-09-27T14:32:11.880050Z] 14:32:11 FATAL - Uncaught exception: Traceback (most recent call last):
[task 2016-09-27T14:32:11.881801Z] 14:32:11 FATAL - File "/home/worker/workspace/mozharness/mozharness/base/script.py", line 2020, in run
[task 2016-09-27T14:32:11.882774Z] 14:32:11 FATAL - self.run_action(action)
[task 2016-09-27T14:32:11.885250Z] 14:32:11 FATAL - File "/home/worker/workspace/mozharness/mozharness/base/script.py", line 1959, in run_action
[task 2016-09-27T14:32:11.886042Z] 14:32:11 FATAL - self._possibly_run_method(method_name, error_if_missing=True)
[task 2016-09-27T14:32:11.889723Z] 14:32:11 FATAL - File "/home/worker/workspace/mozharness/mozharness/base/script.py", line 1899, in _possibly_run_method
[task 2016-09-27T14:32:11.890950Z] 14:32:11 FATAL - return getattr(self, method_name)()
[task 2016-09-27T14:32:11.893371Z] 14:32:11 FATAL - File "/home/worker/workspace/mozharness/scripts/desktop_unittest.py", line 531, in download_and_extract
[task 2016-09-27T14:32:11.894292Z] 14:32:11 FATAL - suite_categories=target_categories)
[task 2016-09-27T14:32:11.897164Z] 14:32:11 FATAL - File "/home/worker/workspace/mozharness/mozharness/mozilla/testing/testbase.py", line 553, in download_and_extract
[task 2016-09-27T14:32:11.908863Z] 14:32:11 FATAL - self._download_test_packages(suite_categories, extract_dirs)
[task 2016-09-27T14:32:11.909757Z] 14:32:11 FATAL - File "/home/worker/workspace/mozharness/mozharness/mozilla/testing/testbase.py", line 464, in _download_test_packages
[task 2016-09-27T14:32:11.911448Z] 14:32:11 FATAL - extract_dirs=unpack_dirs)
[task 2016-09-27T14:32:11.912220Z] 14:32:11 FATAL - File "/home/worker/workspace/mozharness/mozharness/base/script.py", line 684, in download_unpack
[task 2016-09-27T14:32:11.913196Z] 14:32:11 FATAL - function(**kwargs)
[task 2016-09-27T14:32:11.914962Z] 14:32:11 FATAL - File "/home/worker/workspace/mozharness/mozharness/base/script.py", line 556, in unzip
[task 2016-09-27T14:32:11.915713Z] 14:32:11 FATAL - with zipfile.ZipFile(compressed_file) as bundle:
[task 2016-09-27T14:32:11.921691Z] 14:32:11 FATAL - File "/usr/lib/python2.7/zipfile.py", line 714, in __init__
[task 2016-09-27T14:32:11.925240Z] 14:32:11 FATAL - self._GetContents()
[task 2016-09-27T14:32:11.926851Z] 14:32:11 FATAL - File "/usr/lib/python2.7/zipfile.py", line 748, in _GetContents
[task 2016-09-27T14:32:11.929961Z] 14:32:11 FATAL - self._RealGetContents()
[task 2016-09-27T14:32:11.937317Z] 14:32:11 FATAL - File "/usr/lib/python2.7/zipfile.py", line 763, in _RealGetContents
[task 2016-09-27T14:32:11.942097Z] 14:32:11 FATAL - raise BadZipfile, "File is not a zip file"
[task 2016-09-27T14:32:11.943868Z] 14:32:11 FATAL - BadZipfile: File is not a zip file
[task 2016-09-27T14:32:11.945209Z] 14:32:11 FATAL - Running post_fatal callback...
[task 2016-09-27T14:32:11.948238Z] 14:32:11 FATAL - Exiting -1
Assignee
Comment 1•8 years ago
gps: I remember we decided not to retry unzipping a file if what we downloaded is corrupted. If the extraction fails, would you be OK with trying *once* more to download the file into memory and extract it again?
In this log, we downloaded [1][2] with an expected file size of 16970 (lies!) and failed to unzip.
I downloaded the file on my machine and it extracts normally.
[1] https://queue.taskcluster.net/v1/task/fJ6_ZueeTBCcSt22Qb-AHA/artifacts/public/build/target.common.tests.zip
[2] https://treeherder.mozilla.org/logviewer.html#?job_id=4062041&repo=autoland#L582
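A minimal sketch of that retry-once idea (this is not the actual mozharness code; `requests` and the helper below are stand-ins assumed for illustration):

```python
import time
import zipfile
from io import BytesIO

import requests  # stand-in for mozharness's own fetch helpers


def download_and_unzip(url, extract_to, attempts=2, delay=5):
    """Fetch a zip into memory and extract it; if the archive turns out to be
    corrupt, re-download the whole file once and try the extraction again."""
    for attempt in range(1, attempts + 1):
        response = requests.get(url, timeout=300)
        response.raise_for_status()
        try:
            with zipfile.ZipFile(BytesIO(response.content)) as bundle:
                bundle.extractall(extract_to)
            return
        except zipfile.BadZipfile:
            if attempt == attempts:
                raise
            # Corrupt payload: wait briefly, then download again from scratch.
            time.sleep(delay)
```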
Flags: needinfo?(gps)
Comment hidden (Intermittent Failures Robot)
Assignee
Comment 3•8 years ago
We have a bit more information in the logs [1] from bug 1305775; however, it does not help determine what is causing this.
> Expected file size: 574026
> Content-Encoding None
> Content-Type application/zip
> Http code 200
Any ideas on what could be going on?
I think we might want to try to download once more and then unzip.
[1] https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=4110637#L614
[task 2016-09-28T03:53:31.179372Z] 03:53:31 INFO - Fetch https://queue.taskcluster.net/v1/task/SDlWxl5OQTOX9u4ZNytuTA/artifacts/public/build/target.reftest.tests.zip into memory
[task 2016-09-28T03:53:31.463397Z] 03:53:31 INFO - Expected file size: 574026
[task 2016-09-28T03:53:31.464822Z] 03:53:31 INFO - Content-Encoding None
[task 2016-09-28T03:53:31.464887Z] 03:53:31 INFO - Content-Type application/zip
[task 2016-09-28T03:53:31.465545Z] 03:53:31 INFO - Http code 200
[task 2016-09-28T03:53:31.472609Z] 03:53:31 INFO - Running post-action listener: _resource_record_post_action
[task 2016-09-28T03:53:31.473238Z] 03:53:31 INFO - Running post-action listener: set_extra_try_arguments
[task 2016-09-28T03:53:31.473324Z] 03:53:31 INFO - [mozharness: 2016-09-28 03:53:31.472948Z] Finished download-and-extract step (failed)
[task 2016-09-28T03:53:31.475168Z] 03:53:31 FATAL - Uncaught exception: Traceback (most recent call last):
[task 2016-09-28T03:53:31.475815Z] 03:53:31 FATAL - File "/home/worker/workspace/mozharness/mozharness/base/script.py", line 2022, in run
[task 2016-09-28T03:53:31.476019Z] 03:53:31 FATAL - self.run_action(action)
[task 2016-09-28T03:53:31.476597Z] 03:53:31 FATAL - File "/home/worker/workspace/mozharness/mozharness/base/script.py", line 1961, in run_action
[task 2016-09-28T03:53:31.477142Z] 03:53:31 FATAL - self._possibly_run_method(method_name, error_if_missing=True)
[task 2016-09-28T03:53:31.477611Z] 03:53:31 FATAL - File "/home/worker/workspace/mozharness/mozharness/base/script.py", line 1901, in _possibly_run_method
[task 2016-09-28T03:53:31.478145Z] 03:53:31 FATAL - return getattr(self, method_name)()
[task 2016-09-28T03:53:31.478636Z] 03:53:31 FATAL - File "/home/worker/workspace/mozharness/scripts/desktop_unittest.py", line 531, in download_and_extract
[task 2016-09-28T03:53:31.479144Z] 03:53:31 FATAL - suite_categories=target_categories)
[task 2016-09-28T03:53:31.479671Z] 03:53:31 FATAL - File "/home/worker/workspace/mozharness/mozharness/mozilla/testing/testbase.py", line 553, in download_and_extract
[task 2016-09-28T03:53:31.480235Z] 03:53:31 FATAL - self._download_test_packages(suite_categories, extract_dirs)
[task 2016-09-28T03:53:31.480689Z] 03:53:31 FATAL - File "/home/worker/workspace/mozharness/mozharness/mozilla/testing/testbase.py", line 464, in _download_test_packages
[task 2016-09-28T03:53:31.481245Z] 03:53:31 FATAL - extract_dirs=unpack_dirs)
[task 2016-09-28T03:53:31.481753Z] 03:53:31 FATAL - File "/home/worker/workspace/mozharness/mozharness/base/script.py", line 686, in download_unpack
[task 2016-09-28T03:53:31.482256Z] 03:53:31 FATAL - function(**kwargs)
[task 2016-09-28T03:53:31.482816Z] 03:53:31 FATAL - File "/home/worker/workspace/mozharness/mozharness/base/script.py", line 558, in unzip
[task 2016-09-28T03:53:31.483309Z] 03:53:31 FATAL - with zipfile.ZipFile(compressed_file) as bundle:
[task 2016-09-28T03:53:31.483806Z] 03:53:31 FATAL - File "/usr/lib/python2.7/zipfile.py", line 714, in __init__
[task 2016-09-28T03:53:31.484365Z] 03:53:31 FATAL - self._GetContents()
[task 2016-09-28T03:53:31.484900Z] 03:53:31 FATAL - File "/usr/lib/python2.7/zipfile.py", line 748, in _GetContents
[task 2016-09-28T03:53:31.485393Z] 03:53:31 FATAL - self._RealGetContents()
[task 2016-09-28T03:53:31.485887Z] 03:53:31 FATAL - File "/usr/lib/python2.7/zipfile.py", line 763, in _RealGetContents
[task 2016-09-28T03:53:31.486413Z] 03:53:31 FATAL - raise BadZipfile, "File is not a zip file"
[task 2016-09-28T03:53:31.486939Z] 03:53:31 FATAL - BadZipfile: File is not a zip file
[task 2016-09-28T03:53:31.487421Z] 03:53:31 FATAL - Running post_fatal callback...
[task 2016-09-28T03:53:31.487918Z] 03:53:31 FATAL - Exiting -1
Comment 4•8 years ago
(sandbox) dustin@lamport ~/p/m-c (bug1269443) $ wget https://queue.taskcluster.net/v1/task/SDlWxl5OQTOX9u4ZNytuTA/artifacts/public/build/target.reftest.tests.zip
...
-rw-rw-r-- 1 dustin dustin 32102631 Sep 28 03:45 target.reftest.tests.zip
and
(sandbox) dustin@lamport ~/p/m-c (bug1269443) $ curl -L -v https://queue.taskcluster.net/v1/task/SDlWxl5OQTOX9u4ZNytuTA/artifacts/public/build/target.reftest.tests.zip >/dev/null
...
< HTTP/1.1 200 OK
< Content-Type: application/zip
< Content-Length: 32102631
so somehow this file is truncated on the server side when it is read. I don't think S3 does that (its writes are atomic), yet there it is!
I don't think retrying will help here.
John, do you have any idea how this might occur? I might guess an error message, but the status is 200 and the size is 574026 bytes, which is a lot for an error response! If you can dig up some cloud-mirror logging for this request, or see a way to add some logging output that might help debug, that'd be great.
Armen, what would you think about dropping the file contents as a TC artifact (in ~/artifacts) when this occurs? It would be good to know if the contents are a prefix of the zipfile, or some XML blob, or what.
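A rough sketch of what that could look like; the `MOZ_UPLOAD_DIR` environment variable and the function name are assumptions for illustration, not necessarily what an eventual patch would use:

```python
import os


def save_corrupt_download(file_name, data, upload_dir=None):
    """Write the raw bytes we failed to unzip into the upload/artifacts
    directory so the corrupt payload survives the task for inspection."""
    upload_dir = upload_dir or os.environ.get("MOZ_UPLOAD_DIR", "artifacts")
    if not os.path.isdir(upload_dir):
        os.makedirs(upload_dir)
    dest = os.path.join(upload_dir, os.path.basename(file_name))
    with open(dest, "wb") as fh:
        fh.write(data)
    return dest
```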
Flags: needinfo?(gps) → needinfo?(jhford)
Comment 5•8 years ago
Really, that all sounds somewhat similar to bug 1219934.
Assignee
Comment 6•8 years ago
Which of these fields would be useful?
(Pdb) pp(response.headers.keys())
['content-length',
'via',
'x-cache',
'accept-ranges',
'server',
'last-modified',
'connection',
'etag',
'x-amz-cf-id',
'date',
'x-amz-version-id',
'content-type',
'age']
(Pdb) pp(response.headers.values())
['32102631',
'1.1 2e30fde54499f7dddca8f5e56939db61.cloudfront.net (CloudFront)',
'Hit from cloudfront',
'bytes',
'AmazonS3',
'Wed, 28 Sep 2016 03:45:18 GMT',
'close',
'"098bdf06f370a606a75637840c4c0b3d"',
'jjJzvka1TC6MwPTxn3CD3KzWv7bPfHFnQCJoNCLu0cjWoggh3X1kQg==',
'Wed, 28 Sep 2016 15:22:41 GMT',
'jwmF11fp84D3PjyPIbbVORt_v1ecEQvR',
'application/zip',
'182']
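The two lists above line up index for index. A small sketch of logging them as `name: value` pairs instead (assuming a `requests`-style `response.headers` mapping and any logging callable, e.g. `self.info` in a mozharness script):

```python
def log_response_headers(response, info):
    """Log each response header on its own 'name: value' line."""
    for name in sorted(response.headers):
        info("%s: %s" % (name, response.headers[name]))
```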
Comment 7•8 years ago
Armen, x-amz-* would be useful, as well as content-length (already logged), via and x-cache. In fact, all of it would be useful. Yes, this sounds a lot like bug 1219934 but I think we need to get to the bottom of it -- this is not a failure mode AWS services should be producing.
Updated•8 years ago
Comment 8•8 years ago
I checked the cloudfront logs for the download in comment 3. It was not in cloudfront, so bug 1306040 is probably unrelated. Armen's going to try to get more headers and a copy of the corrupt file.
Flags: needinfo?(jhford)
Assignee
Updated•8 years ago
Assignee: nobody → armenzg
Assignee
Comment 9•8 years ago
This is what the output will look like (aside from the exception I'm raising deliberately while developing):
12:57:05 INFO - Fetch https://queue.taskcluster.net/v1/task/SDlWxl5OQTOX9u4ZNytuTA/runs/0/artifacts/public%2Fbuild%2Ftarget.jsshell.zip into memory
12:57:05 INFO - Http code: 200
12:57:05 INFO - Content-Encoding: None
12:57:05 INFO - Content-Type: application/zip
12:57:05 INFO - via: 1.1 70bf723bd71722ac07a40107e9b08c63.cloudfront.net (CloudFront)
12:57:05 INFO - x-amz-cf-id: 2Q8tkGRI9XAoOZs9g-XCyDXKJQDHWqUrMaGil9wPNemil9Gd7irU8g==
12:57:05 INFO - x-amz-version-id: RrdLOF_lcmNQdfBVjfzMCaqofCMQ5IAK
12:57:05 INFO - x-cache: Hit from cloudfront
12:57:08 INFO - Expected file size: 8904439
12:57:08 INFO - Obtained file size: 8904439
12:57:08 INFO - Storing corrupted file to /home/armenzg/repos/firefox/testing/mozharness/build/upload/public%2Fbuild%2Ftarget.jsshell.zip
12:57:08 FATAL - Traceback (most recent call last):
12:57:08 FATAL -
12:57:08 FATAL - File "/home/armenzg/repos/firefox/testing/mozharness/mozharness/base/script.py", line 689, in download_unpack
12:57:08 FATAL - function(**kwargs)
12:57:08 FATAL -
12:57:08 FATAL - File "/home/armenzg/repos/firefox/testing/mozharness/mozharness/base/script.py", line 559, in unzip
12:57:08 FATAL - raise zipfile.BadZipfile('hey')
12:57:08 FATAL -
12:57:08 FATAL - BadZipfile: hey
12:57:08 FATAL -
12:57:08 FATAL - Running post_fatal callback...
12:57:08 FATAL - Exiting -1
Comment hidden (mozreview-request)
Comment 11•8 years ago
Are we using TLS (https://) everywhere in the artifact upload, download, and even internal proxying lifecycle? TLS provides additional prevention against stream corruption and will provide a buffer against things like bit flips.
Regarding CloudFront, when I worked on a Firefox hotfix add-on to download a Firefox installer, we were seeing a surprising number of hash verification failures on installer downloads from CloudFront. Don't over-estimate the ability of S3 or CloudFront to actually deliver bits accurately, sadly :/
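A cheap guard along those lines is to verify what was actually received before trying to unzip it: compare the byte count against Content-Length, and check a digest when one is known. This is a sketch only; the `expected_sha256` parameter is hypothetical, since nothing here says a hash is published alongside the artifact:

```python
import hashlib


def download_looks_intact(data, headers, expected_sha256=None):
    """Return True if the payload length matches Content-Length and, when a
    digest was provided, the SHA-256 matches too."""
    content_length = headers.get("Content-Length")
    if content_length is not None and len(data) != int(content_length):
        return False
    if expected_sha256 is not None and hashlib.sha256(data).hexdigest() != expected_sha256:
        return False
    return True
```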
Comment 12•8 years ago
mozreview-review
Comment on attachment 8795792 [details]
Bug 1305752 - If the file we download is a corrupted zip file store it for inspection.
https://reviewboard.mozilla.org/r/81734/#review80370
Attachment #8795792 - Flags: review?(dustin) → review+
Comment 13•8 years ago
Yes, https everywhere for artifact handling. But you're right, this could just be CF barfs (in which case, bug 1306040 will help) or S3 barfs (in which case, well, that's what you get for using Other People's Computers, right?)
Comment 14•8 years ago
Pushed by armenzg@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7eda0dd1c0a8
If the file we download is a corrupted zip file store it for inspection. r=dustin
Comment hidden (Intermittent Failures Robot)
Comment 16•8 years ago
bugherder
Comment hidden (Intermittent Failures Robot)
Assignee
Comment 18•8 years ago
We're down to 0 since last Thursday.
Comment 19•8 years ago
Well, are we? There are 100 different bugs for this issue, similar to the docker-image link intermittent (which is now closer to 1000 bugs). Filing a new bug for every 2-3 instances of an infra issue is a great way to make it seem like our intermittent rates are low, but it doesn't actually *make* those intermittent rates low.
Resolution: FIXED → DUPLICATE
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Updated•8 years ago
Whiteboard: [stockwell infra]
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment 24•7 years ago
re: comment 23
These were caused by a bad file being copied. John purged the file from cloud-mirror and things returned to normal.
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)