Closed Bug 1300812 Opened 8 years ago Closed 8 years ago

Make Mozharness download and unpack actions better handle intermittent S3/EC2 issues

Categories

(Release Engineering :: Applications: MozharnessCore, defect)


Tracking

(firefox51 fixed)

VERIFIED FIXED

People

(Reporter: armenzg, Assigned: armenzg)

References

Details

Attachments

(2 files)

We will use this bug to solve the problem and to block the intermittent-orange bugs that have been filed.
Blocks: 1301594
Blocks: 1301597
Blocks: 1301605
Blocks: 1300413
Blocks: 1300343
Blocks: 1300663
Blocks: 1300943
Blocks: 1301645
Hi philor, I will tackle each dependent bug in the order of worst offenders. I might tackle more than one at a time.
Attachment #8789845 - Flags: review?(gps)
Attachment #8789845 - Flags: review?(aki)
Comment on attachment 8789845 [details]
Bug 1300812 - Make Mozharness downloads handle better intermittent S3/EC2 issues

This works; however, I'm more interested in feedback, as I'm not completely satisfied with what I wrote.
Attachment #8789845 - Flags: review?(gps)
Attachment #8789845 - Flags: review?(aki)
Attachment #8789845 - Flags: feedback?(gps)
Attachment #8789845 - Flags: feedback?(aki)
Comment on attachment 8789845 [details]
Bug 1300812 - Make Mozharness downloads handle better intermittent S3/EC2 issues

https://reviewboard.mozilla.org/r/77904/#review76320

::: testing/mozharness/mozharness/base/script.py:500
(Diff revision 3)
> +            raise FetchedIncorrectFilesize(
> +                'The server file size is {} while we got instead {}'.format(
> +                    expected_file_size, obtained_file_size)
> +            )
> +
> +        compressed_file = StringIO(file_contents)

gps: should we simply pass compressed_file instead of file_object? Perhaps it is better to completely separate the download logic from the unpacking. We might be able to have two different retry calls, each handling its own set of exceptions.

::: testing/mozharness/mozharness/base/script.py:521
(Diff revision 3)
>             if mode:
>                 os.chmod(fname, mode)
> 
>         except zipfile.BadZipfile as e:
>             self.exception('{}'.format(e.message))
> +            raise

aki: do I need to raise the exception if I'm not catching it inside of the retry?
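For illustration, the split being proposed would look roughly like the sketch below, written as a method of a mozharness script so that self.retry() comes from the base classes. The exception sets and the unpack helper are assumptions, not the actual patch:

    import zipfile

    # Sketch only: assumes mozharness's retry() helper plus the
    # fetch_url_into_memory()/FetchedIncorrectFilesize names from the diff.
    def download_unpack(self, url, extract_to='.'):
        # Phase 1: retry only the network fetch, on download-specific errors.
        compressed_file = self.retry(
            self.fetch_url_into_memory,
            kwargs={'url': url},
            retry_exceptions=(FetchedIncorrectFilesize, IOError),
            error_message="Can't download from {}".format(url),
        )
        # Phase 2: retry only the extraction, on archive-specific errors.
        self.retry(
            self.unpack,
            kwargs={'compressed_file': compressed_file,
                    'extract_to': extract_to},
            retry_exceptions=(zipfile.BadZipfile,),
            error_message="Can't unpack the fetched file.",
        )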
Comment on attachment 8789845 [details]
Bug 1300812 - Make Mozharness downloads handle better intermittent S3/EC2 issues

> ::: testing/mozharness/mozharness/base/script.py:521
> (Diff revision 3)
> >             if mode:
> >                 os.chmod(fname, mode)
> > 
> >         except zipfile.BadZipfile as e:
> >             self.exception('{}'.format(e.message))
> > +            raise
> 
> aki: do I need to raise the exception if I'm not catching it inside of the
> retry?

What do you want to have happen?

self.exception() will log the exception at level ERROR by default:
http://hg.mozilla.org/mozilla-central/file/tip/testing/mozharness/mozharness/base/log.py#l157

Raising it without catching it will kill the script. A similar thing can happen via self.exception() with level=FATAL.

It looks like run() catches exceptions and runs the post-run cleanup/reporting steps anyway:
http://hg.mozilla.org/mozilla-central/file/tip/testing/mozharness/mozharness/base/script.py#l1930

So if you want the script to die at this point, raising it seems ok. I think it's equivalent to self.exception(..., level=FATAL).
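Concretely, the two options described above would look like this in the unpack step (a hedged sketch, not the patch itself):

    import zipfile
    from mozharness.base.log import FATAL

    def unpack(self, compressed_file, extract_to):
        try:
            with zipfile.ZipFile(compressed_file) as bundle:
                bundle.extractall(path=extract_to)
        except zipfile.BadZipfile as e:
            # Option 1: log at ERROR and re-raise; run() catches the
            # exception, performs the post-run steps, and the script dies.
            self.exception('{}'.format(e.message))
            raise
            # Option 2 (roughly equivalent): log at FATAL, which exits
            # the script directly:
            # self.exception('{}'.format(e.message), level=FATAL)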
Attachment #8789845 - Flags: feedback?(aki) → feedback+
Attached file Test (deleted)
Any suggestions on how to patch download_unpack()? I want to test the scenario where the content_length is bad once and the next retry gets the right content_length. For now, I'm only dealing with file:// URLs, as I don't know how to mock http URLs.
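For what it's worth, one way to simulate a flaky HTTP download is to patch urlopen with a side_effect list. Everything below (FakeResponse, the patch target, ScriptUnderTest) is a hypothetical sketch and may not match the attached test:

    import unittest
    import mock  # Python 2; on Python 3 use unittest.mock

    class FakeResponse(object):
        """Mimics the bits of a urllib2 response that the code reads."""
        def __init__(self, body, advertised_size):
            self.body = body
            self.headers = {'Content-Length': str(advertised_size)}

        def read(self):
            return self.body

    class TestDownloadUnpack(unittest.TestCase):
        @mock.patch('mozharness.base.script.urllib2.urlopen')  # assumed target
        def test_retry_on_bad_content_length(self, mock_urlopen):
            body = b'0123456789'
            # The first response lies about the size; the second is honest.
            mock_urlopen.side_effect = [
                FakeResponse(body, advertised_size=999),
                FakeResponse(body, advertised_size=len(body)),
            ]
            script = ScriptUnderTest()  # hypothetical mozharness script
            # The first fetch should raise because of the size mismatch...
            with self.assertRaises(FetchedIncorrectFilesize):
                script.fetch_url_into_memory(url='http://example.com/a.zip')
            # ...and the retry sees the corrected header and succeeds.
            data = script.fetch_url_into_memory(url='http://example.com/a.zip')
            self.assertEqual(data.read(), body)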
Nevermind my last comment. I'm making progress on writing the test.
Blocks: 1301802
Blocks: 1301807
Blocks: 1300953
Blocks: 1301855
Blocks: 1300625
No longer blocks: 1301605
Summary: Make Mozharness downloads handle better intermittent S3/EC2 issues → Make Mozharness downloads and unpacks actions handle better intermittent S3/EC2 issues
Comment on attachment 8789845 [details]
Bug 1300812 - Make Mozharness downloads handle better intermittent S3/EC2 issues

https://reviewboard.mozilla.org/r/77906/#review76684

Overall, I think this looks good. Some history and comments:

The original idea behind the download and extract methods was to speed up local testing by saving the downloaded file to disk, so we wouldn't have to re-download every time we ran. That would allow us to run other actions without necessarily needing the network... skip the download action, and the extract action would still work properly.

However, mozharness has definitely become more of a CI tool than a CI+local development tool, and we already changed the download + extract actions to download_and_extract or download_unpack here, so retrying the download to memory and extracting seems like the best solution if we continue in that direction.

One thing: we're assuming the file will always be a size that will fit into memory. That may be a valid assumption forever, or we may hit files that we can't extract because of size. If that ever becomes a concern, we could put in a check when we find the expected filesize; if it's larger than our max in-memory size, we can write to disk and treat it as a file.

::: testing/mozharness/mozharness/base/script.py:679
(Diff revision 4)
> -            function,
> -            kwargs=kwargs,
> +            self.fetch_url_into_memory,
> +            kwargs={'url': url},
>              **retry_args
>          )
> 
> +        if return_value == -1:

I think we'll never hit this because the above error_level is FATAL. It doesn't hurt, though.

::: testing/mozharness/mozharness/base/script.py:691
(Diff revision 4)
> +        retry_args = dict(
> +            retry_exceptions=(
> +                zipfile.BadZipfile
> +            ),
> +            error_message="Can't unpack the fetched file.",
> +            error_level=FATAL,

This hardcoded FATAL means we can't proceed without this file downloaded. That's usually what we want, and we already have a hardcoded FATAL above.
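The spill-to-disk check described above could look roughly like this; MAX_IN_MEMORY_SIZE and the function shape are hypothetical, not part of the actual patch:

    import shutil
    import tempfile
    import urllib2
    from io import BytesIO

    # Hypothetical cap on what we keep in memory.
    MAX_IN_MEMORY_SIZE = 512 * 1024 * 1024  # 512 MiB

    def fetch_url(url):
        response = urllib2.urlopen(url)
        expected_file_size = int(response.headers.get('Content-Length'))
        if expected_file_size > MAX_IN_MEMORY_SIZE:
            # Too large for memory: stream to disk and return a real file
            # object, which the unpack code can treat like any file.
            tmp = tempfile.TemporaryFile()
            shutil.copyfileobj(response, tmp)
            tmp.seek(0)
            return tmp
        return BytesIO(response.read())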
Attachment #8789845 - Flags: review?(aki) → review+
I addressed your comments. I've also fixed an issue in the code that made the tests fail. I'm re-testing on try.
Comment on attachment 8789845 [details]
Bug 1300812 - Make Mozharness downloads handle better intermittent S3/EC2 issues

https://reviewboard.mozilla.org/r/77906/#review76734

This is fine as a refactor, and it is better than the previous code because it won't retry the download if that step succeeds.

However, I'm going to challenge that archive extraction needs to be retried. Archive extraction should be a deterministic operation. Given a constant input, it will either work or it won't. I'm not sure what retrying that specific step will accomplish other than adding latency before the job eventually fails.

Since we retried archive extraction before, I'm fine giving this an r+. But please consider eliminating the retry (or possibly capping it to 1 retry) as a follow-up.

::: testing/mozharness/mozharness/base/script.py:398
(Diff revision 5)
> +
> +        self.info('Expected file size: {}'.format(expected_file_size))
> +        self.debug('Url: {}'.format(url))
> +        self.debug('Content-Encoding {}'.format(response.headers.get('Content-Encoding')))
> +
> +        file_contents = response.read()

I'm pretty sure this will result in 2x memory usage because we first buffer the entire HTTP response *then* make a copy into a BytesIO later. I don't think this will be a problem for our current use cases, however.

::: testing/mozharness/mozharness/base/script.py:409
(Diff revision 5)
> +                expected_file_size, obtained_file_size)
> +            )
> +
> +        # Use BytesIO instead of StringIO
> +        # http://stackoverflow.com/questions/34162017/unzip-buffer-with-python/34162395#34162395
> +        return BytesIO(file_contents)

While it is correct to use ``io.BytesIO`` (``io.BytesIO`` represents binary data, ``io.StringIO`` represents Unicode), there is a caveat. The ``StringIO.StringIO`` class (which only exists in Python 2) is kind of a hybrid class, and it is backed by a more efficient implementation. In Python 2, the ``io.*IO`` classes are a bit slower than ``StringIO.StringIO``. For that reason, many people still cling to ``StringIO.StringIO`` on Python 2.

However, I /think/ we're OK here since the ``BytesIO`` instance is pre-allocated with the full size of the final data. If we were streaming into the ``BytesIO``, then we'd have to worry about perf.
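To make the 2x-memory observation concrete, here is a hedged illustration of the two approaches (not the patch itself):

    import shutil
    from io import BytesIO

    def buffered_fetch(response):
        # What the diff does: read() buffers the entire body (copy #1),
        # then BytesIO() copies those bytes into its own buffer (copy #2),
        # so peak memory is briefly about twice the file size.
        return BytesIO(response.read())

    def streaming_fetch(response, chunk_size=1 << 20):
        # Lower peak memory: stream into the buffer chunk by chunk. The
        # trade-off is the incremental-growth perf caveat mentioned above
        # for io.BytesIO on Python 2.
        buf = BytesIO()
        shutil.copyfileobj(response, buf, chunk_size)
        buf.seek(0)
        return buf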
Attachment #8789845 - Flags: review?(gps) → review+
Good point about that second retry. I dropped the need for it and made a direct call instead. If we ever see zipfile.BadZipfile, it is because the download was bad to begin with. I will do one last try run and land tomorrow morning. Thank you for the reviews.
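Under the same assumptions as the earlier sketch, the final shape is then roughly (not the landed patch verbatim):

    import urllib2

    def download_unpack(self, url, extract_to='.'):
        # Retry only the fetch; the network/S3 side is the transient part.
        compressed_file = self.retry(
            self.fetch_url_into_memory,
            kwargs={'url': url},
            retry_exceptions=(FetchedIncorrectFilesize, urllib2.URLError),
            error_level=FATAL,
            error_message="Can't download from {}".format(url),
        )
        # Unpack directly, without a retry wrapper: once the size check has
        # passed, zipfile.BadZipfile means the archive itself is bad, and
        # retrying the extraction of the same bytes cannot fix that.
        self.unpack(compressed_file, extract_to)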
Blocks: 1302237
Pushed by armenzg@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4c2d01fe1c38
Make Mozharness downloads handle better intermittent S3/EC2 issues r=aki,gps
Blocks: 1302439
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Blocks: 1303668
The remaining bugs that this bug 'blocks' will be dealt with in bug 1303759. Both bugs share those dependent bugs because they have the same 'orange' signature.
Status: RESOLVED → VERIFIED
Blocks: 1305752