Closed Bug 1293309 Opened 8 years ago Closed 8 years ago

cleardir throws DMError following xpcshell failures | DMError Attempted to push a file to a directory

Categories

(Firefox for Android Graveyard :: Testing, defect, P2)

49 Branch
defect

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1280570

People

(Reporter: gbrown, Assigned: gbrown)

References

Details

Attachments

(1 file)

Some recent xpcshell and cppunit failures contain: 10:58:57 INFO - mozdevice.devicemanager.DMError: Non-zero return code for command: ['/data/local/xpcb/cleardir', '/storage/sdcard/tests/xpc/minidumps'] (output: '', retval: 'None') Following a test failure, the failure is reported, then the harnesses check for minidumps and delete the minidump directory - the cleardir call fails and throws. It is unclear whether this happens on any failure, or if this is intermittent. I wonder if this is a long-standing issue that has traditionally been hidden by buildbot's retry-on-DMError policy.
https://public-artifacts.taskcluster.net/K2BN9iecRd2ogFDYy5KOsQ/0/public/logs/live_backing.log 0:58:57 INFO - PROCESS | toolkit/components/extensions/test/xpcshell/test_ext_alarms_does_not_fire.js | JavaScript strict warning: chrome://global/content/bindings/browser.xml, line 369: ReferenceError: reference to undefined property tabBrowser.localNam 10:58:57 INFO - <<<<<<< 10:58:57 INFO - Following exceptions were raised: 10:58:57 INFO - Traceback (most recent call last): 10:58:57 INFO - File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 169, in run 10:58:57 INFO - self.run_test() 10:58:57 INFO - File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 775, in run_test 10:58:57 INFO - if self.checkForCrashes(self.tempDir, self.symbolsPath, test_name=name): 10:58:57 INFO - File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 177, in checkForCrashes 10:58:57 INFO - self.clearRemoteDir(self.remoteMinidumpDir) 10:58:57 INFO - File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 206, in clearRemoteDir 10:58:57 INFO - self.device.shellCheckOutput([self.remoteClearDirScript, remoteDir]) 10:58:57 INFO - File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/mozdevice/devicemanager.py", line 405, in shellCheckOutput 10:58:57 INFO - raise DMError("Non-zero return code for command: %s (output: '%s', retval: '%s')" % (cmd, output, retval)) 10:58:57 INFO - DMError: Non-zero return code for command: ['/data/local/xpcb/cleardir', '/storage/sdcard/tests/xpc/minidumps'] (output: '', retval: 'None') 10:58:57 INFO - Traceback (most recent call last): 10:58:57 INFO - File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 628, in <module> 10:58:57 INFO - main() 10:58:57 INFO - File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 623, in main 10:58:57 INFO - **vars(options)): 10:58:57 INFO - File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 1448, in runTests 10:58:57 INFO - raise exceptions[0] 10:58:57 INFO - mozdevice.devicemanager.DMError: Non-zero return code for command: ['/data/local/xpcb/cleardir', '/storage/sdcard/tests/xpc/minidumps'] (output: '', retval: 'None') 10:58:57 ERROR - Return code: 1 10:58:57 ERROR - No tests run or test summary not found 10:58:57 INFO - TinderboxPrint: xpcshell<br/><em class="testfail">T-FAIL</em> 1
I cannot find any cppunit examples -- maybe this is just xpcshell.
Summary: cleardir throws DMError following xpcshell, cppunit failures → cleardir throws DMError following xpcshell failures
Priority: -- → P2
I tried introducing a simple xpcshell test failure, but was unable to reproduce this issue. I tried introducing a simple xpcshell test crash, but was unable to reproduce this issue. There are many calls to cleardir over the course of an xpcshell job, but this always seems to happen on minidumps, so I suspect this has something to do with a crash, but no minidumps are ever retrieved. Maybe the crashreporter is hung and locking the minidumps directory? But then shouldn't cleardir's rm or mkdir calls just report an error (instead of seeing cleardir's output: '')?
Attached patch retry clearRemoteDir failures (deleted) — Splinter Review
I do not know what is causing these infrequent failures. I wonder if it might be a temporary condition (busy file?), so I'm adding a retry here.
Attachment #8781265 - Flags: review?(jmaher)
Comment on attachment 8781265 [details] [diff] [review] retry clearRemoteDir failures Review of attachment 8781265 [details] [diff] [review]: ----------------------------------------------------------------- consider the sheriff case, it might not be critical- but if you like the idea change it and r=me. ::: testing/xpcshell/remotexpcshelltests.py @@ +213,5 @@ > + time.sleep(10) > + try: > + out = self.device.shellCheckOutput([self.remoteClearDirScript, remoteDir]) > + except mozdevice.DMError: > + self.log.error("unable to delete %s: '%s'" % (remoteDir, str(out))) would it make sense to have a different message here to search on? I think we could detect via 'retrying after 10 seconds' when inspecting the log, but...what about the sheriffs- could we have 'second time unable to delete %s: %s'
Attachment #8781265 - Flags: review?(jmaher) → review+
Thanks. I'll change the wording of the 2nd message. Also note that only the 2nd message is an error.
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/03bd46e4cf1d Improve error handling for Android xpcshell minidump clear; r=jmaher
Keywords: leave-open
With that change in place, now we are seeing: https://public-artifacts.taskcluster.net/QsgCwfT_QkqM3d5lkLsP2w/0/public/logs/live_backing.log 12:31:27 WARNING - TEST-UNEXPECTED-FAIL | xpcom/tests/unit/test_streams.js | xpcshell return code: -1 12:31:27 INFO - TEST-INFO took 1181ms 12:31:28 INFO - unable to delete /storage/sdcard/tests/xpc/minidumps: '' 12:31:28 INFO - retrying after 10 seconds... 12:31:38 INFO - failed to delete /storage/sdcard/tests/xpc/minidumps: '' 12:31:38 INFO - unable to delete /storage/sdcard/tests/xpc/p: '' 12:31:38 INFO - retrying after 10 seconds... 12:31:48 INFO - failed to delete /storage/sdcard/tests/xpc/p: '' 12:31:48 INFO - unable to delete /storage/sdcard/tests/xpc/tmp: '' 12:31:48 INFO - retrying after 10 seconds... 12:31:58 INFO - failed to delete /storage/sdcard/tests/xpc/tmp: '' 12:31:58 INFO - Following exceptions were raised: 12:31:58 INFO - Traceback (most recent call last): 12:31:58 INFO - File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 169, in run 12:31:58 INFO - self.run_test() 12:31:58 INFO - File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 620, in run_test 12:31:58 INFO - self.mozInfoJSPath = self.setupMozinfoJS() 12:31:58 INFO - File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 81, in setupMozinfoJS 12:31:58 INFO - self.device.pushFile(local, mozInfoJSPath) 12:31:58 INFO - File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/mozdevice/devicemanagerADB.py", line 229, in pushFile 12:31:58 INFO - (localname, destname)) 12:31:58 INFO - DMError: Attempted to push a file (/tmp/tmpP8Z2FR) to a directory (/storage/sdcard/tests/xpc/p/mozinfo.json)! 12:31:58 INFO - Traceback (most recent call last): 12:31:58 INFO - File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 639, in <module> 12:31:58 INFO - main() 12:31:58 INFO - File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 634, in main 12:31:58 INFO - **vars(options)): 12:31:58 INFO - File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 1448, in runTests 12:31:58 INFO - raise exceptions[0] 12:31:58 INFO - mozdevice.devicemanager.DMError: Attempted to push a file (/tmp/tmpP8Z2FR) to a directory (/storage/sdcard/tests/xpc/p/mozinfo.json)! 12:31:58 ERROR - Return code: 1 12:31:58 ERROR - No tests run or test summary not found Similarly: https://public-artifacts.taskcluster.net/LCqdcoo6RaCwI798cWhmng/0/public/logs/live_backing.log 00:29:17 WARNING - TEST-UNEXPECTED-FAIL | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | xpcshell return code: 14100 00:29:17 INFO - TEST-INFO took 4584ms 00:29:17 INFO - >>>>>>> 00:29:17 INFO - PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | xpcw: cd /storage/sdcard/tests/xpc/toolkit/components/osfile/tests/xpcshell 00:29:17 INFO - PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | xpcw: xpcshell -r /storage/sdcard/tests/xpc/c/httpd.manifest --greomni /data/local/xpcb/target.apk -m -s -e const _HEAD_JS_PATH = "/storage/sdcard/tests/xpc/head.js"; -e const _MOZINFO_JS_PATH = "/storage/sdcard/tests/xpc/p/mozinfo.json"; -e const _TESTING_MODULES_DIR = "/storage/sdcard/tests/xpc/m"; -f /storage/sdcard/tests/xpc/head.js -e const _SERVER_ADDR = "localhost" -e const _HEAD_FILES = ["/storage/sdcard/tests/xpc/toolkit/components/osfile/tests/xpcshell/head.js"]; -e const _TAIL_FILES = []; -e const _JSDEBUGGER_PORT = 0; -e const _TEST_FILE = ["test_osfile_kill.js"]; -e const _TEST_NAME = "toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js" -e _execute_test(); quit(0); 00:29:17 INFO - PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | [14100] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 171 00:29:17 INFO - PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | [14100] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 171 00:29:17 INFO - PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | [14100] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 171 00:29:17 INFO - PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | [14100] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 171 00:29:17 INFO - PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | [14100] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 171 00:29:17 INFO - PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | [14100] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 171 00:29:17 INFO - PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | [14100] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 171 00:29:17 INFO - PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | Warning: MOZILLA_FIVE_HOME not set. 00:29:17 INFO - PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | [14100] WARNING: Couldn't get the user appdata directory. Crash events may not be produced.: file /home/worker/workspace/build/src/toolkit/crashreporter/nsExceptionHandler.cpp, line 272 00:29:17 INFO - <<<<<<< 00:29:17 INFO - unable to delete /storage/sdcard/tests/xpc/minidumps: '' 00:29:17 INFO - retrying after 10 seconds... 00:29:27 INFO - failed to delete /storage/sdcard/tests/xpc/minidumps: '' 00:29:27 INFO - TEST-START | toolkit/components/osfile/tests/xpcshell/test_osfile_async_setPermissions.js 00:29:27 INFO - TEST-SKIP | toolkit/components/osfile/tests/xpcshell/test_osfile_async_setPermissions.js | took 1ms 00:29:27 INFO - TEST-START | toolkit/components/osfile/tests/xpcshell/test_osfile_win_async_setPermissions.js 00:29:27 INFO - TEST-SKIP | toolkit/components/osfile/tests/xpcshell/test_osfile_win_async_setPermissions.js | took 0ms 00:29:27 INFO - TEST-START | toolkit/components/passwordmgr/test/unit/test_module_LoginImport.js 00:29:27 INFO - TEST-SKIP | toolkit/components/passwordmgr/test/unit/test_module_LoginImport.js | took 1ms 00:29:27 INFO - TEST-START | toolkit/components/passwordmgr/test/unit/test_module_LoginStore.js 00:29:27 INFO - TEST-SKIP | toolkit/components/passwordmgr/test/unit/test_module_LoginStore.js | took 1ms 00:29:27 INFO - TEST-START | toolkit/components/passwordmgr/test/unit/test_removeLegacySignonFiles.js 00:29:27 INFO - TEST-SKIP | toolkit/components/passwordmgr/test/unit/test_removeLegacySignonFiles.js | took 0ms 00:29:27 INFO - TEST-START | toolkit/components/passwordmgr/test/unit/test_storage_mozStorage.js 00:29:27 INFO - TEST-SKIP | toolkit/components/passwordmgr/test/unit/test_storage_mozStorage.js | took 0ms 00:29:27 INFO - TEST-START | toolkit/components/passwordmgr/test/unit/test_context_menu.js 00:29:27 INFO - TEST-SKIP | toolkit/components/passwordmgr/test/unit/test_context_menu.js | took 0ms 00:29:27 INFO - unable to delete /storage/sdcard/tests/xpc/p: '' 00:29:27 INFO - retrying after 10 seconds... 00:29:37 INFO - failed to delete /storage/sdcard/tests/xpc/p: '' 00:29:37 INFO - unable to delete /storage/sdcard/tests/xpc/tmp: '' 00:29:37 INFO - retrying after 10 seconds... 00:29:47 INFO - failed to delete /storage/sdcard/tests/xpc/tmp: '' 00:29:47 INFO - Following exceptions were raised: 00:29:47 INFO - Traceback (most recent call last): 00:29:47 INFO - File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 169, in run 00:29:47 INFO - self.run_test() 00:29:47 INFO - File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 620, in run_test 00:29:47 INFO - self.mozInfoJSPath = self.setupMozinfoJS() 00:29:47 INFO - File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 81, in setupMozinfoJS 00:29:47 INFO - self.device.pushFile(local, mozInfoJSPath) 00:29:47 INFO - File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/mozdevice/devicemanagerADB.py", line 229, in pushFile 00:29:47 INFO - (localname, destname)) 00:29:47 INFO - DMError: Attempted to push a file (/tmp/tmph1Axzk) to a directory (/storage/sdcard/tests/xpc/p/mozinfo.json)! 00:29:47 INFO - Traceback (most recent call last): 00:29:47 INFO - File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 639, in <module> 00:29:47 INFO - main() 00:29:47 INFO - File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 634, in main 00:29:47 INFO - **vars(options)): 00:29:47 INFO - File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 1448, in runTests 00:29:47 INFO - raise exceptions[0] 00:29:47 INFO - mozdevice.devicemanager.DMError: Attempted to push a file (/tmp/tmph1Axzk) to a directory (/storage/sdcard/tests/xpc/p/mozinfo.json)! 00:29:47 ERROR - Return code: 1 00:29:47 ERROR - No tests run or test summary not found
I read this as "during testing, an adb command fails and then all adb communication is lost". The mozharness cleanup code sees the same thing: 12:38:28 INFO - >> Verify emulator visible to adb: Attempt #4 of 4 12:38:28 INFO - Running timeout 30 /home/worker/workspace/build/android-sdk18/platform-tools/adb devices 12:38:28 INFO - List of devices attached 12:38:28 WARNING - Unable to communicate with emulator via adb Now it all makes sense: This is the xpcshell variant of bug 1157948, where the emulator becomes unresponsive in the middle of a test. We haven't noticed this on buildbot because the DMError triggers buildbot's automatic retry; on taskcluster, it is treated as a regular error because of bug 1280570.
I can't find the bug about it (probably closed because it used Panda in the summary), but we did have a longstanding, as long as we've retried on DMError, buildbot problem where we infinitely retried xpcshell crashes on Android. This actually blocks bug 1280570 rather than depending on it - with buildbot, we had tools that would (awkwardly) let me see, eventually, that someone had pushed an xpcshell crash to try including Android, which would have retried a few dozen times before I saw it and killed it; with taskcluster, we have no such tools. I'm quite surprised you couldn't reproduce this with an xpcshell crash, because it really should be 100%: we have no intermittent xpcshell crashes on Android (other than the new ones on TC), and we've always infinitely retried non-intermittent crashes.
I think I know what you mean, but I thought common xpcshell-infinite-retry problems were fixed. I also cannot find the bugs. Bug 875814 should have resolved some cases. Also, I imagine that our migration from sutagent to adb (or maybe panda -> emulator??) changed the way xpcshell crashes are handled. An Android xpcshell crash was recently introduced on inbound, and it was consistently handled well, with crash reports and no DMError: https://public-artifacts.taskcluster.net/TJMZx5DlTAqmt7GqlMaGow/0/public/logs/live_backing.log. Of course there is a risk that permanent failures like crashes will somehow trigger DMError and produce infinite retries. I would like to have the number of retries limited by taskcluster; I've just commented to that effect in bug 1280570.
Summary: cleardir throws DMError following xpcshell failures → cleardir throws DMError following xpcshell failures | DMError Attempted to push a file to a directory
Blocks: 1303634
These are now treated as retries on taskcluster, as they have been on buildbot. https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=1215be5477392bdc60bd4da85e174d5c6417a096 I think that's the best we can do when the emulator becomes unresponsive in the middle of a test.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: