Closed
Bug 884115
Opened 11 years ago
Closed 11 years ago
Add timeouts to mozharness' urllib2.urlopen() requests
Categories
(Release Engineering :: Applications: MozharnessCore, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: jyeo)
References
Details
(Whiteboard: [mozharness])
Attachments
(2 files)
(deleted), patch | mozilla: review+ | emorley: checked-in+
(deleted), patch | mozilla: review+ | jyeo: checked-in+
Python 2.6 and higher supports specifying a timeout for urllib2.urlopen(), which determines how long a socket should wait for a response before timing out and raising socket.timeout. We should add timeouts to avoid mozharness hangs, which currently surface only as a generic buildbot timeout (and so require opening the full log before starring). Timeouts would also let us make the most of the retry wrapper that we already use to catch the urllib2.HTTPError and urllib2.URLError failure modes.
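For illustration, a minimal sketch of passing a timeout to urlopen() and treating socket.timeout like the other failure modes. The function name fetch(), the DEFAULT_TIMEOUT value and the injectable opener argument are hypothetical and not taken from mozharness; the fallback import is only there so the sketch also runs on Python 3.

```python
import socket

try:
    from urllib2 import urlopen, URLError  # Python 2, as mozharness used
except ImportError:
    from urllib.request import urlopen  # Python 3 equivalent
    from urllib.error import URLError

DEFAULT_TIMEOUT = 30  # seconds; hypothetical value, not the one from the patch


def fetch(url, timeout=DEFAULT_TIMEOUT, opener=urlopen):
    """Return the response body, or None if the request failed or timed out.

    `opener` is a hypothetical hook so the function can be exercised
    without a network connection.
    """
    try:
        response = opener(url, timeout=timeout)
        try:
            return response.read()
        finally:
            response.close()
    except socket.timeout:
        # Raised when the socket waits longer than `timeout` for data.
        return None
    except URLError:
        # HTTPError is a subclass of URLError, so this covers both.
        return None
```

Without the timeout argument, urlopen() falls back to the global default socket timeout, which is normally "no timeout", so a stalled connection can block until buildbot kills the job.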
Reporter | ||
Comment 1•11 years ago
|
||
Adds timeouts to mozharness' urllib2.urlopen() calls, makes us automatically retry on the socket.timeout failure mode during _download_file(), and cleans up the retry handling in mouse_and_screen_resolution.py::wfetch().
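As a rough sketch of the retry idea (not the actual mozharness retry() implementation, whose signature is not shown in this bug), a wrapper can retry only on a listed set of exceptions, with socket.timeout added to that set. The names retry_call and RETRYABLE are hypothetical.

```python
import socket
import time

try:
    from urllib2 import HTTPError, URLError  # Python 2
except ImportError:
    from urllib.error import HTTPError, URLError  # Python 3

# Failure modes worth retrying; socket.timeout is the newly added one.
RETRYABLE = (HTTPError, URLError, socket.timeout)


def retry_call(action, attempts=3, sleeptime=0, retry_exceptions=RETRYABLE):
    """Call action() up to `attempts` times, retrying only on retry_exceptions."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the failure
            time.sleep(sleeptime)
```

Any exception type that is not listed propagates on the first attempt, so the set of retryable exceptions has to cover every failure mode the download can actually hit.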
Attachment #763896 -
Flags: review?(aki)
Reporter | ||
Comment 2•11 years ago
|
||
Note: this requires Python 2.6 or higher. Is that present on all the machines on which we run mozharness?
Comment 3•11 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+1] from comment #2)
> Note requires Python 2.6 or higher - is this present on all the machines for
> which we run mozharness?
I seem to remember we were lagging on a 2.5.1 install on some platform when we first rolled out mozharness unittests. With the attempts to standardize on Python 2.7[.3?], that may no longer be the case. We should probably verify.
Comment 4•11 years ago
|
||
Comment on attachment 763896
Patch v1
Might be worth baking a bit on Cedar (default) before merging to every other branch (production branch).
Attachment #763896 -
Flags: review?(aki) → review+
Reporter | ||
Comment 5•11 years ago
|
||
(In reply to Aki Sasaki [:aki] from comment #3)
> I seem to remember we were lagging on a 2.5.1 install on some platform when
> we first rolled out mozharness unittests. With the attempts to standardize
> on Python 2.7[.3?], that may no longer be the case. We should probably
> verify.
A number of bug 724191's dependants have been fixed, and whilst a look through a bunch of logs shows a few places using older versions, they only seem to be talos machines, so this shouldn't affect mozharness for now :-)
Reporter | ||
Comment 6•11 years ago
|
||
(In reply to Aki Sasaki [:aki] from comment #4)
> Might be worth baking a bit on Cedar (default) before merging to every other
> branch (production branch).
Agreed. Pushed to cedar to generate a recent set of builds, then pushed this patch:
https://hg.mozilla.org/build/mozharness/rev/7a39cb9045f9
Have requested a dep set of builds after that.
Reporter | ||
Comment 7•11 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+1] from comment #6)
> Have requested a dep set of builds after that.
In fact just did another push rather than requesting more builds via buildapi to make it easier to distinguish them on TBPL.
Reporter | ||
Updated•11 years ago
|
Attachment #763896 -
Flags: checked-in+
Comment 8•11 years ago
|
||
In production.
Reporter | ||
Comment 9•11 years ago
|
||
Thank you :-)
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 10•11 years ago
|
||
We seem to not be catching socket.error somewhere.
14:05:46 INFO - #####
14:05:46 INFO - ##### Running download-and-extract step.
14:05:46 INFO - #####
14:05:46 INFO - mkdir: C:\slave\test\build
14:05:46 INFO - Downloading http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32-debug/1371671157/firefox-24.0a1.en-US.win32.tests.zip to C:\slave\test\build\firefox-24.0a1.en-US.win32.tests.zip
14:05:46 INFO - retry: Calling <bound method DesktopUnittest._download_file of <__main__.DesktopUnittest object at 0x00E66EB0>> with args: ('http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32-debug/1371671157/firefox-24.0a1.en-US.win32.tests.zip', 'C:\\slave\\test\\build\\firefox-24.0a1.en-US.win32.tests.zip'), kwargs: {}, attempt #1
Traceback (most recent call last):
  File "scripts/scripts/desktop_unittest.py", line 374, in <module>
    desktop_unittest.run()
  File "C:\slave\test\scripts\mozharness\base\script.py", line 821, in run
    self._possibly_run_method(method_name, error_if_missing=True)
  File "C:\slave\test\scripts\mozharness\base\script.py", line 780, in _possibly_run_method
    return getattr(self, method_name)()
  File "scripts/scripts/desktop_unittest.py", line 275, in download_and_extract
    super(DesktopUnittest, self).download_and_extract(target_unzip_dirs=target_unzip_dirs)
  File "C:\slave\test\scripts\mozharness\mozilla\testing\testbase.py", line 237, in download_and_extract
    self._download_test_zip()
  File "C:\slave\test\scripts\mozharness\mozilla\testing\testbase.py", line 162, in _download_test_zip
    error_level=FATAL)
  File "C:\slave\test\scripts\mozharness\base\script.py", line 228, in download_file
    error_level=error_level,
  File "C:\slave\test\scripts\mozharness\base\script.py", line 440, in retry
    status = action(*args, **kwargs)
  File "C:\slave\test\scripts\mozharness\base\script.py", line 177, in _download_file
    block = f.read(1024 ** 2)
  File "c:\mozilla-build\python27\lib\socket.py", line 380, in read
    data = self._sock.recv(left)
  File "c:\mozilla-build\python27\lib\httplib.py", line 561, in read
    s = self.fp.read(amt)
  File "c:\mozilla-build\python27\lib\socket.py", line 380, in read
    data = self._sock.recv(left)
socket.error: [Errno 10035] A non-blocking socket operation could not be completed immediately
program finished with exit code 1
elapsedTime=47.716000
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 11•11 years ago
|
||
Backed out in http://hg.mozilla.org/build/mozharness/rev/c65a155cae77 .
Reporter | ||
Comment 13•11 years ago
|
||
Hi, sorry, I was on PTO Friday and away over the weekend. I had a quick dig around at the time of the backout, but struggled to find anything useful in the Python docs. I'd like to come back to this in the not so distant future if it's not been fixed by someone else since, but I'm not sure on the timeframe, so I'm happy for someone else to take it in the meantime :-)
Flags: needinfo?(emorley)
Assignee | ||
Comment 14•11 years ago
|
||
I applied edmorley's patch and did the following experiment. I disconnected the network cable on a Windows machine and on a Linux machine during the download_and_extract step, and found that winapi and libc implement sockets differently (obviously). Disconnecting the cable on the Windows machine causes mozharness to die because of an uncaught socket.error, whereas disconnecting the cable on the Linux machine causes the timeout counter inside socket.py to start counting and eventually raises the socket.timeout exception. I think we should try catching socket.error as the last case. I will try pushing this to ash-mozharness just in case errno 10035 happens again.
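A minimal sketch of what "catching socket.error as the last case" could look like around the chunked read from the traceback. The function copy_response() and its (ok, reason) return value are hypothetical, not the code from the attached patch. The ordering matters because socket.timeout is a subclass of socket.error on Python 2.6 and later, so the more specific clause has to come first.

```python
import socket

CHUNK_SIZE = 1024 ** 2  # same block size as the f.read() call in the traceback


def copy_response(response, out, chunk_size=CHUNK_SIZE):
    """Copy `response` to `out` in chunks.

    Returns (ok, reason); a caller could retry when ok is False.
    """
    try:
        while True:
            block = response.read(chunk_size)
            if not block:
                return True, None  # end of stream
            out.write(block)
    except socket.timeout:
        # What the Linux machine raised in the cable-pull experiment.
        return False, "timeout"
    except socket.error as e:
        # Last case: any other socket-level failure, such as the
        # Windows errno 10035 seen in comment 10.
        return False, "socket error: %s" % e
```

Handling both cases in one place means the Windows and Linux behaviours end up on the same retry path instead of one of them escaping as an uncaught exception.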
Assignee: emorley → yshun
Assignee | ||
Updated•11 years ago
|
Attachment #783176 -
Flags: review?(aki)
Reporter | ||
Comment 15•11 years ago
|
||
Thank you for picking this up! :-)
Updated•11 years ago
|
Attachment #783176 -
Flags: review?(aki) → review+
Assignee | ||
Comment 16•11 years ago
|
||
Seems to be working on Ash. I don't see any socket.error :-/ https://tbpl.mozilla.org/?tree=Ash
Reporter | ||
Comment 17•11 years ago
|
||
Good to land on mozharness now? :-)
Assignee | ||
Updated•11 years ago
|
Attachment #783176 -
Flags: checked-in+
Comment 18•11 years ago
|
||
In production
Assignee | ||
Comment 19•11 years ago
|
||
I don't see any issues related to this on our production mozharness jobs. FIXED. :)
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
Updated•10 years ago
|
Component: General Automation → Mozharness
No longer blocks: 1074585