Closed
Bug 1300812
Opened 8 years ago
Closed 8 years ago
Make Mozharness downloads and unpacks actions handle better intermittent S3/EC2 issues
Categories
(Release Engineering :: Applications: MozharnessCore, defect)
Tracking
(firefox51 fixed)
VERIFIED
FIXED
Flag | Tracking | Status
---|---|---
firefox51 | --- | fixed
People
(Reporter: armenzg, Assigned: armenzg)
References
Details
Attachments
(2 files)
We will use this bug to solve the problem and block the intermittent oranges which have been filed.
Assignee
Comment 1•8 years ago
Hi philor, I will tackle each dependent bug in the order of worst offenders.
I might tackle more than one at a time.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee
Updated•8 years ago
Attachment #8789845 - Flags: review?(gps)
Attachment #8789845 - Flags: review?(aki)
Comment hidden (mozreview-request)
Assignee
Comment 5•8 years ago
Comment on attachment 8789845 [details]
Bug 1300812 - Make Mozharness downloads handle better intermittent S3/EC2 issues
This works; however, I'm more interested in feedback, as I'm not completely satisfied with what I wrote.
Attachment #8789845 - Flags: review?(gps)
Attachment #8789845 - Flags: review?(aki)
Attachment #8789845 - Flags: feedback?(gps)
Attachment #8789845 - Flags: feedback?(aki)
Assignee
Comment 6•8 years ago
mozreview-review
Comment on attachment 8789845 [details]
Bug 1300812 - Make Mozharness downloads handle better intermittent S3/EC2 issues
https://reviewboard.mozilla.org/r/77904/#review76320
::: testing/mozharness/mozharness/base/script.py:500
(Diff revision 3)
> + raise FetchedIncorrectFilesize(
> + 'The server file size is {} while we got instead {}'.format(
> + expected_file_size, obtained_file_size)
> + )
> +
> + compressed_file = StringIO(file_contents)
gps: should we simply pass compressed_file instead of file_object?
Perhaps it is better to completely separate the download logic from the unpacking.
We might be able to have two different retry calls with their own set of exceptions being handled.
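For illustration, a minimal sketch of that separation, assuming mozharness' self.retry() API. fetch_url_into_memory and FetchedIncorrectFilesize come from the patch; the unpack helper name and the exact exception lists are assumptions:

```python
import socket
import urllib2
import zipfile

from mozharness.base.log import FATAL

# Sketch of a ScriptMixin method; not the landed implementation.
def download_unpack(self, url, extract_to='.'):
    # Retry loop 1: only network/S3 flakiness triggers a re-download.
    file_object = self.retry(
        self.fetch_url_into_memory,
        kwargs={'url': url},
        retry_exceptions=(
            socket.timeout,
            urllib2.HTTPError,
            FetchedIncorrectFilesize,  # size-mismatch check from the patch
        ),
        error_message='Failed to download {}'.format(url),
        error_level=FATAL,
    )
    # Retry loop 2: only corrupt-archive errors trigger a re-unpack.
    self.retry(
        self.unpack,
        kwargs={'file_object': file_object, 'extract_to': extract_to},
        retry_exceptions=(zipfile.BadZipfile,),
        error_message="Can't unpack the fetched file.",
        error_level=FATAL,
    )
```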
::: testing/mozharness/mozharness/base/script.py:521
(Diff revision 3)
> if mode:
> os.chmod(fname, mode)
>
> except zipfile.BadZipfile as e:
> self.exception('{}'.format(e.message))
> + raise
aki: do I need to raise the exception if I'm not catching it inside of the retry?
Comment 7•8 years ago
Comment on attachment 8789845 [details]
Bug 1300812 - Make Mozharness downloads handle better intermittent S3/EC2 issues
> ::: testing/mozharness/mozharness/base/script.py:521
> (Diff revision 3)
> > if mode:
> > os.chmod(fname, mode)
> >
> > except zipfile.BadZipfile as e:
> > self.exception('{}'.format(e.message))
> > + raise
>
> aki: do I need to raise the exception if I'm not catching it inside of the
> retry?
What do you want to have happen?
self.exception() will log the exception at level ERROR by default:
http://hg.mozilla.org/mozilla-central/file/tip/testing/mozharness/mozharness/base/log.py#l157
Raising it without catching it will kill the script. You can get a similar result via self.exception() with level=FATAL. It looks like run() catches exceptions and runs the post-run cleanup/reporting steps anyway:
http://hg.mozilla.org/mozilla-central/file/tip/testing/mozharness/mozharness/base/script.py#l1930
So if you want the script to die at this point, raising it seems ok. I think it's equivalent to self.exception(..., level=FATAL).
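In sketch form (unzip and file_object stand in for the patch's surrounding code):

```python
import zipfile

try:
    unzip(file_object, extract_to)
except zipfile.BadZipfile as e:
    # Option 1: log at ERROR, re-raise, and let run() do its post-run
    # cleanup/reporting before the script dies.
    self.exception('{}'.format(e.message))
    raise
    # Option 2, roughly equivalent in outcome: log and die in one call
    # (FATAL would need importing from mozharness.base.log).
    # self.exception('{}'.format(e.message), level=FATAL)
```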
Attachment #8789845 - Flags: feedback?(aki) → feedback+
Assignee
Comment 8•8 years ago
Any suggestions on how to patch download_unpack()?
I want to test the scenario where the content_length is bad once and the next retry gets the right content_length.
For now, I'm only dealing with file:// URLs, as I don't know how to mock HTTP URLs.
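One way it might be done: wrap urllib2.urlopen so the first response lies about its Content-Length, then let the retry succeed against the real file:// URL. A sketch, assuming the mock backport is available and that fetch_url_into_memory() reads Content-Length via urllib2; the fixture names are invented:

```python
import urllib2
import mock  # the Python 2 mock backport is assumed here

def test_retry_on_bad_content_length(script, file_url):
    real_urlopen = urllib2.urlopen
    calls = {'count': 0}

    def flaky_urlopen(url, **kwargs):
        response = real_urlopen(url, **kwargs)
        calls['count'] += 1
        if calls['count'] == 1:
            # Lie about the size on the first fetch only, so the size
            # check raises FetchedIncorrectFilesize and retry() kicks in.
            response.headers['Content-Length'] = '1'
        return response

    with mock.patch('urllib2.urlopen', flaky_urlopen):
        script.download_unpack(url=file_url, extract_to='.')

    # The first attempt fails the size check; the second succeeds.
    assert calls['count'] == 2
```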
Assignee
Comment 9•8 years ago
Nevermind my last comment. I'm making progress on writing the test.
Assignee
Updated•8 years ago
Summary: Make Mozharness downloads handle better intermittent S3/EC2 issues → Make Mozharness downloads and unpacks actions handle better intermittent S3/EC2 issues
Comment hidden (mozreview-request)
Comment 11•8 years ago
mozreview-review
Comment on attachment 8789845 [details]
Bug 1300812 - Make Mozharness downloads handle better intermittent S3/EC2 issues
https://reviewboard.mozilla.org/r/77906/#review76684
Overall, I think this looks good. Some history and comments:
The original idea behind the download and extract methods was to speed up local testing by saving the downloaded file to disk, so we wouldn't have to re-download every time we ran. That would allow us to run other actions without necessarily needing the network... skip the download action, and the extract action would still work properly.
However, mozharness has definitely become more of a CI tool than a CI+local development tool, and we already changed the download + extract actions to download_and_extract or download_unpack here, so retrying the download to memory and extracting seems like the best solution if we continue in that direction.
One thing: we're assuming the file will always be a size that will fit into memory. That may be a valid assumption forever, or we may hit files that we can't extract because of size. If that ever becomes a concern, we could put in a check when we find the expected filesize; if it's larger than our max in-memory size, we can write to disk and treat it as a file.
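If that ever becomes a concern, the check could look something like this sketch (MAX_IN_MEMORY_SIZE, fetch_url, and the temp-file handling are invented; query_abs_dirs() is the usual mozharness helper):

```python
import os
import shutil
import urllib2
from io import BytesIO

MAX_IN_MEMORY_SIZE = 2 * 1024 ** 3  # e.g. 2 GiB; tune to the workers

def fetch_url(self, url):
    response = urllib2.urlopen(url)
    expected_file_size = int(response.headers.get('Content-Length'))

    if expected_file_size > MAX_IN_MEMORY_SIZE:
        # Too large to buffer in memory: stream to disk and hand back a
        # real file object; the unpack code can treat it like the BytesIO.
        path = os.path.join(self.query_abs_dirs()['abs_work_dir'], 'fetched.tmp')
        with open(path, 'wb') as fh:
            shutil.copyfileobj(response, fh)
        return open(path, 'rb')

    return BytesIO(response.read())
```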
::: testing/mozharness/mozharness/base/script.py:679
(Diff revision 4)
> - function,
> - kwargs=kwargs,
> + self.fetch_url_into_memory,
> + kwargs={'url': url},
> **retry_args
> )
>
> + if return_value == -1:
I think we'll never hit this because the above error_level is FATAL. It doesn't hurt, though.
::: testing/mozharness/mozharness/base/script.py:691
(Diff revision 4)
> + retry_args = dict(
> + retry_exceptions=(
> + zipfile.BadZipfile
> + ),
> + error_message="Can't unpack the fetched file.",
> + error_level=FATAL,
This hardcoded FATAL means we can't proceed without this file downloaded. That's usually what we want, and we already have a hardcoded FATAL above.
Attachment #8789845 - Flags: review?(aki) → review+
Comment hidden (mozreview-request)
Assignee
Comment 13•8 years ago
I addressed your comments.
I've also fixed an issue in the code which would make the tests fail.
I'm re-testing on try.
Comment 14•8 years ago
mozreview-review
Comment on attachment 8789845 [details]
Bug 1300812 - Make Mozharness downloads handle better intermittent S3/EC2 issues
https://reviewboard.mozilla.org/r/77906/#review76734
This is fine as a refactor; it is better than the previous code because it won't retry the download if that step succeeds. However, I'm going to challenge that archive extraction needs to be retried. Archive extraction should be a deterministic operation. Given a constant input, it will either work or it won't. I'm not sure what retrying that specific step will accomplish other than adding latency to the job eventually failing.
Since we retried archive extraction before, I'm fine giving this an r+. But please consider eliminating the retry (or possibly capping it to 1 retry) as a follow-up.
::: testing/mozharness/mozharness/base/script.py:398
(Diff revision 5)
> +
> + self.info('Expected file size: {}'.format(expected_file_size))
> + self.debug('Url: {}'.format(url))
> + self.debug('Content-Encoding {}'.format(response.headers.get('Content-Encoding')))
> +
> + file_contents = response.read()
I'm pretty sure this will result in 2x memory usage because we first buffer the entire HTTP response *then* make a copy into a BytesIO later. I don't think this will be a problem for our current use cases, however.
::: testing/mozharness/mozharness/base/script.py:409
(Diff revision 5)
> + expected_file_size, obtained_file_size)
> + )
> +
> + # Use BytesIO instead of StringIO
> + # http://stackoverflow.com/questions/34162017/unzip-buffer-with-python/34162395#34162395
> + return BytesIO(file_contents)
While it is correct to use ``io.BytesIO`` (``io.BytesIO`` represents binary data, ``io.StringIO`` represents Unicode), there is a caveat.
The ``StringIO.StringIO`` class (which only exists in Python 2) is kind of a hybrid class, and it is backed by a more efficient implementation. In Python 2, the ``io.*IO`` classes are a bit slower than ``StringIO.StringIO``. For that reason, many people still cling to ``StringIO.StringIO`` on Python 2.
However, I /think/ we're OK here since the ``BytesIO`` instance is pre-allocated with the full size of the final data. If we were streaming into the ``BytesIO``, then we'd have to worry about perf.
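The distinction in miniature (assuming ``response`` is an open urllib2 response):

```python
from io import BytesIO

# Pre-allocated: the buffer is created from the full payload at once,
# so io.BytesIO's extra Python 2 overhead barely matters.
data = response.read()
fh = BytesIO(data)

# Streaming: the buffer grows write by write; this is the pattern where
# StringIO.StringIO historically had the edge on Python 2.
fh = BytesIO()
for chunk in iter(lambda: response.read(64 * 1024), ''):
    fh.write(chunk)
fh.seek(0)
```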
Attachment #8789845 - Flags: review?(gps) → review+
Comment hidden (mozreview-request)
Assignee
Comment 16•8 years ago
Good point about that second retry.
I dropped the need for it and made a direct call instead.
If we ever saw zipfile.BadZipfile, it was because the download was bad to begin with.
I will do one last try run and land tomorrow morning.
Thank you for the reviews.
Comment hidden (mozreview-request)
Comment 18•8 years ago
Pushed by armenzg@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4c2d01fe1c38
Make Mozharness downloads handle better intermittent S3/EC2 issues r=aki,gps
Assignee
Updated•8 years ago
Comment 19•8 years ago
bugherder
Assignee
Comment 20•8 years ago
The remaining bugs that this bug 'blocks' will be dealt with in bug 1303759.
Both bugs block the same set of dependent bugs, since they share the same 'orange' signature.
Status: RESOLVED → VERIFIED