Closed Bug 1410723 Opened 7 years ago Closed 7 years ago

Intermittent DistutilsFileError: cannot copy tree

Categories

(Firefox for Android Graveyard :: Testing, defect, P5)

defect

Tracking

(firefox58 fixed, firefox59 fixed)

RESOLVED FIXED
Firefox 59
Tracking Status
firefox58 --- fixed
firefox59 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: gbrown)

References

Details

(Keywords: intermittent-failure)

Attachments

(2 files, 2 obsolete files)

Filed by: archaeopteryx [at] coole-files.de https://treeherder.mozilla.org/logviewer.html#?job_id=138730283&repo=mozilla-central https://queue.taskcluster.net/v1/task/fFKL89L5QIyY5ZIdpzJHjQ/runs/0/artifacts/public/logs/live_backing.log [task 2017-10-22T11:02:06.969Z] 11:02:06 INFO - TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetryScalars.js [task 2017-10-22T11:02:13.012Z] 11:02:13 INFO - mozdevice killProcess failed force-stop of xpcshell [task 2017-10-22T11:02:13.032Z] 11:02:13 INFO - TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetryScalars.js | took 6063ms [task 2017-10-22T11:02:13.065Z] 11:02:13 INFO - Following exceptions were raised: [task 2017-10-22T11:02:13.066Z] 11:02:13 INFO - Traceback (most recent call last): [task 2017-10-22T11:02:13.066Z] 11:02:13 INFO - File "/builds/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 181, in run [task 2017-10-22T11:02:13.067Z] 11:02:13 INFO - self.run_test() [task 2017-10-22T11:02:13.068Z] 11:02:13 INFO - File "/builds/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 792, in run_test [task 2017-10-22T11:02:13.068Z] 11:02:13 INFO - if self.checkForCrashes(self.tempDir, self.symbolsPath, test_name=name): [task 2017-10-22T11:02:13.069Z] 11:02:13 INFO - File "/builds/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 181, in checkForCrashes [task 2017-10-22T11:02:13.069Z] 11:02:13 INFO - self.device.getDirectory(self.remoteMinidumpDir, dumpDir) [task 2017-10-22T11:02:13.070Z] 11:02:13 INFO - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/devicemanagerADB.py", line 541, in getDirectory [task 2017-10-22T11:02:13.070Z] 11:02:13 INFO - dir_util.copy_tree(localDir, originalLocal) [task 2017-10-22T11:02:13.071Z] 11:02:13 INFO - File "/usr/lib/python2.7/distutils/dir_util.py", line 128, in copy_tree [task 2017-10-22T11:02:13.071Z] 11:02:13 INFO - "cannot copy tree '%s': not a directory" % src [task 2017-10-22T11:02:13.072Z] 11:02:13 INFO - DistutilsFileError: cannot copy tree '/tmp/tmps55NSj/minidumps': not a directory [task 2017-10-22T11:02:13.072Z] 11:02:13 INFO - Traceback (most recent call last): [task 2017-10-22T11:02:13.072Z] 11:02:13 INFO - File "/builds/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 625, in <module> [task 2017-10-22T11:02:13.072Z] 11:02:13 INFO - main() [task 2017-10-22T11:02:13.072Z] 11:02:13 INFO - File "/builds/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 620, in main [task 2017-10-22T11:02:13.072Z] 11:02:13 INFO - mobileArgs=xpcsh.mobileArgs): [task 2017-10-22T11:02:13.073Z] 11:02:13 INFO - File "/builds/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 1385, in runTests [task 2017-10-22T11:02:13.073Z] 11:02:13 INFO - mobileArgs, **kwargs) [task 2017-10-22T11:02:13.073Z] 11:02:13 INFO - File "/builds/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 1580, in runTestList [task 2017-10-22T11:02:13.073Z] 11:02:13 INFO - raise exceptions[0] [task 2017-10-22T11:02:13.073Z] 11:02:13 INFO - distutils.errors.DistutilsFileError: cannot copy tree '/tmp/tmps55NSj/minidumps': not a directory [task 2017-10-22T11:02:13.114Z] 11:02:13 ERROR - Return code: 1
Flags: needinfo?(gbrown)
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=138730283&lineNumber=2140 [task 2017-10-22T11:02:06.969Z] 11:02:06 INFO - TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetryScalars.js [task 2017-10-22T11:02:13.012Z] 11:02:13 INFO - mozdevice killProcess failed force-stop of xpcshell [task 2017-10-22T11:02:13.032Z] 11:02:13 INFO - TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetryScalars.js | took 6063ms [task 2017-10-22T11:02:13.065Z] 11:02:13 INFO - Following exceptions were raised: [task 2017-10-22T11:02:13.066Z] 11:02:13 INFO - Traceback (most recent call last): [task 2017-10-22T11:02:13.066Z] 11:02:13 INFO - File "/builds/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 181, in run [task 2017-10-22T11:02:13.067Z] 11:02:13 INFO - self.run_test() [task 2017-10-22T11:02:13.068Z] 11:02:13 INFO - File "/builds/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 792, in run_test [task 2017-10-22T11:02:13.068Z] 11:02:13 INFO - if self.checkForCrashes(self.tempDir, self.symbolsPath, test_name=name): [task 2017-10-22T11:02:13.069Z] 11:02:13 INFO - File "/builds/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 181, in checkForCrashes [task 2017-10-22T11:02:13.069Z] 11:02:13 INFO - self.device.getDirectory(self.remoteMinidumpDir, dumpDir) [task 2017-10-22T11:02:13.070Z] 11:02:13 INFO - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/devicemanagerADB.py", line 541, in getDirectory [task 2017-10-22T11:02:13.070Z] 11:02:13 INFO - dir_util.copy_tree(localDir, originalLocal) [task 2017-10-22T11:02:13.071Z] 11:02:13 INFO - File "/usr/lib/python2.7/distutils/dir_util.py", line 128, in copy_tree [task 2017-10-22T11:02:13.071Z] 11:02:13 INFO - "cannot copy tree '%s': not a directory" % src [task 2017-10-22T11:02:13.072Z] 11:02:13 INFO - DistutilsFileError: cannot copy tree '/tmp/tmps55NSj/minidumps': not a directory [task 2017-10-22T11:02:13.072Z] 11:02:13 INFO - Traceback (most recent call last): [task 2017-10-22T11:02:13.072Z] 11:02:13 INFO - File "/builds/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 625, in <module> [task 2017-10-22T11:02:13.072Z] 11:02:13 INFO - main() [task 2017-10-22T11:02:13.072Z] 11:02:13 INFO - File "/builds/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 620, in main [task 2017-10-22T11:02:13.072Z] 11:02:13 INFO - mobileArgs=xpcsh.mobileArgs): [task 2017-10-22T11:02:13.073Z] 11:02:13 INFO - File "/builds/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 1385, in runTests [task 2017-10-22T11:02:13.073Z] 11:02:13 INFO - mobileArgs, **kwargs) [task 2017-10-22T11:02:13.073Z] 11:02:13 INFO - File "/builds/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 1580, in runTestList [task 2017-10-22T11:02:13.073Z] 11:02:13 INFO - raise exceptions[0] [task 2017-10-22T11:02:13.073Z] 11:02:13 INFO - distutils.errors.DistutilsFileError: cannot copy tree '/tmp/tmps55NSj/minidumps': not a directory [task 2017-10-22T11:02:13.114Z] 11:02:13 ERROR - Return code: 1 [task 2017-10-22T11:02:13.114Z] 11:02:13 ERROR - No tests run or test summary not found Bug 1328301 updated getDirectory(), but that is conditional on using a modern adb. In automation, we were using an older adb until I updated it in bug 1409057.
I haven't quite figured out what is going wrong here or how to fix it, and probably won't get to it for at least a few days. :bc, checking with you to see if you have come across this, and/or if you want to take it?? (No pressure -- nothing urgent here afaik.)
Flags: needinfo?(bob)
Yes, I have seen it in autophone. It is common but not that frequent. I would like to take a look some time but it is really low on my priorities. The main issue I'd like to tackle now is disabling the tests which time out since they result in dramatic reductions in throughput. If someone would like to investigate that would be great. Otherwise, I'll get to it when I have the time. Thanks for the heads up though. I had thought this was Autophone specific. It must lie in devicemanager somewhere.
Flags: needinfo?(bob)
Assignee: nobody → gbrown
Flags: needinfo?(gbrown)
Summary: Intermittent DistutilsFileError: cannot copy tree '/tmp/tmps55NSj/minidumps': not a directory → Intermittent DistutilsFileError: cannot copy tree
Attached patch better diagnostics on getDirectory failure (obsolete) (deleted) — Splinter Review
I don't understand how/why that directory is not there after the pull. Rather than pull my hair out, here's a patch to improve the error message -- hopefully that will help me understand the problem.
Attachment #8933153 - Flags: review?(bob)
Keywords: leave-open
Comment on attachment 8933153 [details] [diff] [review] better diagnostics on getDirectory failure Review of attachment 8933153 [details] [diff] [review]: ----------------------------------------------------------------- Let's chat on irc. ::: testing/mozbase/mozdevice/mozdevice/devicemanagerADB.py @@ +538,5 @@ > localDir = os.path.join(tempParent, remoteName) > else: > localDir = '/'.join(localDir.rstrip('/').split('/')[:-1]) > + cmd = ["pull", remoteDir, localDir] > + proc = self._runCmd(cmd) The way _runCmd is written, proc will be None if _runCmd times out. At first I thought we needed the wait() but it seems _runCmd will guarantee that it is called if proc is not None. @@ +543,1 @@ > if copyRequired: Maybe handle no proc by: if not proc: raise Exception("getDirectory() _runCmd %s failed" % str(cmd)) try: ... But really, the caller's of _runCmd seem to expect proc to be returned regardless of the timeout. Perhaps we should make _runCmd always return proc even if it does timeout. Then we would need the proc.wait().
Attachment #8933153 - Flags: review?(bob)
Thanks. I like the idea of always returning proc from _runCmd, since many existing callers assume a non-None return value. Also added some warnings to _runCmd to give us more info about retries. Let's see how that works. https://treeherder.mozilla.org/#/jobs?repo=try&revision=ba362d7944eba488a31d41e2c64fa56c8d24138f
Attachment #8933153 - Attachment is obsolete: true
Attachment #8933354 - Flags: review?(bob)
Comment on attachment 8933354 [details] [diff] [review] better diagnostics on getDirectory failure Review of attachment 8933354 [details] [diff] [review]: ----------------------------------------------------------------- Minor style comments. Looks good to me. r+ ::: testing/mozbase/mozdevice/mozdevice/devicemanagerADB.py @@ +544,5 @@ > + try: > + dir_util.copy_tree(localDir, originalLocal) > + mozfile.remove(tempParent) > + except: > + self._logger.error("getDirectory() failed after %s" % str(cmd)) Since we have proc now, what do you think of adding proc.returncode to the error message? @@ +696,5 @@ > retries += 1 > else: > return proc > + self._logger.warning("_runCmd exceeded all retries") > + return proc Personal style issue. Feel free to ignore. proc = None while retries < retryLimit: proc = ProcessHandler(finalArgs, storeOutput=True, processOutputLine=self._log, onTimeout=_timeout) proc.returncode = proc.wait() if proc.returncode is not None: break proc.kill() self._logger.warning("_runCmd failed for '%s'" % ' '.join(finalArgs)) retries += 1 if retries >= retryLimit: self._logger.warning("_runCmd exceeded all retries") return proc
Attachment #8933354 - Flags: review?(bob) → review+
Great suggestions -- will incorporate both on landing.
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/0e3f8ec51869 Improve diagnostics when Android devicemanager getDirectory fails; r=bc
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=149266366&lineNumber=1875 [task 2017-12-02T08:41:40.533Z] 08:41:40 INFO - mozdevice getDirectory() failed after ['pull', '/storage/sdcard/tests/xpc/minidumps', '/tmp/tmp8XVf2s/minidumps'] [task 2017-12-02T08:41:40.533Z] 08:41:40 INFO - mozdevice rc=1 out=['adb: error: failed to get feature set: no devices/emulators found'] ... [task 2017-12-02T08:41:40.545Z] 08:41:40 INFO - DistutilsFileError: cannot copy tree '/tmp/tmp8XVf2s/minidumps': not a directory
All the recent failures with extra diagnostics show the same error message, as in comment 18: The emulator is suddenly unresponsive. That's a common intermittent problem with the emulator, usually handled by retrying the task whenever a DMError is seen. From that perspective, the only problem here is that getDirectory fails to throw DMError in this case; using _checkCmd should rectify that.
Attachment #8934718 - Flags: review?(bob)
Comment on attachment 8934718 [details] [diff] [review] use _checkCmd instead of _runCmd for getDirectory Review of attachment 8934718 [details] [diff] [review]: ----------------------------------------------------------------- ::: testing/mozbase/mozdevice/mozdevice/devicemanagerADB.py @@ +539,4 @@ > else: > localDir = '/'.join(localDir.rstrip('/').split('/')[:-1]) > cmd = ["pull", remoteDir, localDir] > + proc = self._checkCmd(cmd) _checkCmd returns the returncode of the process.
Attachment #8934718 - Flags: review?(bob)
It looks like 'adb pull' will return non-zero for other conditions -- an empty directory, or a directory that doesn't exist, so I think it is necessary to parse the output to decide if throwing a DMError and triggering a task retry is reasonable. Here I check for the "no devices/emulators found" message as an explicit indication that the emulator is down, but also allow for other conditions: if adb pull failed and didn't report any of the expected messages: 0 files pulled 1 file pulled <remote dir> does not exist Doesn't break the common cases: https://treeherder.mozilla.org/#/jobs?repo=try&revision=60e69ae6e6892d809f1c5833a7d6ac19cf399c7f
Attachment #8934718 - Attachment is obsolete: true
Attachment #8935133 - Flags: review?(bob)
Comment on attachment 8935133 [details] [diff] [review] throw DMError when a bad emulator is suspected Review of attachment 8935133 [details] [diff] [review]: ----------------------------------------------------------------- r+. Feel free to ignore the style comment. ::: testing/mozbase/mozdevice/mozdevice/devicemanagerADB.py @@ +544,5 @@ > + # Raise a DMError when the device is missing, but not when the > + # directory is empty or missing. > + if ("no devices/emulators found" in proc.output or > + not ("pulled" in proc.output or > + "does not exist" in proc.output)): A comment on style. I find I need to concentrate a bit to make sure this does what we expect. What do you think of? if ("no devices/emulators found" in proc.output or "pulled" not in proc.output and "does not exist" not in proc.output): or with parens which aren't absolutely necessary but do provide a 1 char indent of the raise from the condition? if ("no devices/emulators found" in proc.output or ("pulled" not in proc.output and "does not exist" not in proc.output)):
Attachment #8935133 - Flags: review?(bob) → review+
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/1a48edb857f3 Throw DMError when getDirectory() fails suspiciously; r=bc
Keywords: leave-open
autophone passes on try, with a simple modification to account for [] output.
Flags: needinfo?(gbrown)
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/5115d6bc9009 Throw DMError when getDirectory() fails suspiciously; r=bc
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 59
Blocks: 1425322
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: