Closed Bug 799532 Opened 12 years ago Closed 12 years ago

Try to find the source of IOError when removing profile directory

Categories

(Testing :: XPCShell Harness, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla19

People

(Reporter: gps, Assigned: gps)

References

Details

(Whiteboard: [sheriff-want])

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #793855 +++ I'm getting sick of these IOError's and want to try to find the problem. This patch dumps an |ls -a| like output for the profile directory if we get an IOError when deleting it. It will probably need some refining once we uncover what's actually happening, but it's a start. Yes, I known the redundant lookups to getpwuid and friends are wasteful. This isn't critical path code: I'm not worried about perf. Sample output (with mach prepending times): 14.13 TEST-INFO | (xpcshell/head.js) | No (+ 0) checks actually run 14.13 <<<<<<< 14.13 >>>>>>> 14.13 Traceback (most recent call last): 14.13 File "/Users/gps/src/mozilla-central-hg/testing/xpcshell/runxpcshelltests.py", line 896, in runTests 14.14 self.removeDir(self.profileDir) 14.14 File "/Users/gps/src/mozilla-central-hg/testing/xpcshell/runxpcshelltests.py", line 339, in removeDir 14.14 shutil.rmtree(dirname) 14.14 File "/usr/local/Cellar/python/2.7.3/lib/python2.7/shutil.py", line 245, in rmtree 14.14 rmtree(fullname, ignore_errors, onerror) 14.14 File "/usr/local/Cellar/python/2.7.3/lib/python2.7/shutil.py", line 250, in rmtree 14.14 onerror(os.remove, fullname, sys.exc_info()) 14.14 File "/usr/local/Cellar/python/2.7.3/lib/python2.7/shutil.py", line 248, in rmtree 14.14 os.remove(fullname) 14.14 OSError: [Errno 13] Permission denied: '/var/folders/sz/9xx9hkm53_q6sgt3k50nwffc0000gn/T/tmpBwWvtS/startupCache/startupCache.8.little' 14.14 <<<<<<< 14.14 >>>>>>> 14.14 Files in profile directory: 14.14 <<<<<<< 14.14 40700 gps staff /var/folders/sz/9xx9hkm53_q6sgt3k50nwffc0000gn/T/tmpBwWvtS/ 14.14 40755 root wheel /var/folders/sz/9xx9hkm53_q6sgt3k50nwffc0000gn/T/tmpBwWvtS/startupCache/ 14.14 100644 gps staff /var/folders/sz/9xx9hkm53_q6sgt3k50nwffc0000gn/T/tmpBwWvtS/startupCache/startupCache.8.little
Attachment #669577 - Flags: review?(ted.mielczarek)
No longer blocks: 778688
Finally got a Try build to repro! From https://tbpl.mozilla.org/php/getParsedLog.php?id=16184085&tree=Try&full=1: Traceback (most recent call last): File "xpcshell/runxpcshelltests.py", line 895, in runTests self.removeDir(self.profileDir) File "xpcshell/runxpcshelltests.py", line 339, in removeDir shutil.rmtree(dirname) File "/home/cltbld/lib/python2.5/shutil.py", line 175, in rmtree rmtree(fullname, ignore_errors, onerror) File "/home/cltbld/lib/python2.5/shutil.py", line 167, in rmtree onerror(os.listdir, path, sys.exc_info()) File "/home/cltbld/lib/python2.5/shutil.py", line 165, in rmtree names = os.listdir(path) OSError: [Errno 13] Permission denied: '/tmp/tmpktTCyH/Cache.Trash949609424' <<<<<<< Files in profile directory: 40700 cltbld cltbld /tmp/tmpktTCyH/ 100664 cltbld cltbld /tmp/tmpktTCyH/runxpcshelltests_leaks.log 100600 cltbld cltbld /tmp/tmpktTCyH/secmod.db 100600 cltbld cltbld /tmp/tmpktTCyH/key3.db 100600 cltbld cltbld /tmp/tmpktTCyH/cert8.db 100600 cltbld cltbld /tmp/tmpktTCyH/_CACHE_CLEAN_ OS Error while performing os.walk! Traceback (most recent call last): File "/home/cltbld/lib/python2.5/os.py", line 286, in walk names = listdir(top) OSError: [Errno 13] Permission denied: '/tmp/tmpktTCyH/Cache.Trash949609424' 40775 cltbld cltbld /tmp/tmpktTCyH/Cache/ 100600 cltbld cltbld /tmp/tmpktTCyH/Cache/_CACHE_001_ 100600 cltbld cltbld /tmp/tmpktTCyH/Cache/_CACHE_MAP_ 100600 cltbld cltbld /tmp/tmpktTCyH/Cache/_CACHE_002_ 100600 cltbld cltbld /tmp/tmpktTCyH/Cache/_CACHE_003_ 40700 cltbld cltbld /tmp/tmpktTCyH/Cache/D/ 40700 cltbld cltbld /tmp/tmpktTCyH/Cache/3/ 40700 cltbld cltbld /tmp/tmpktTCyH/Cache/1/ 40700 cltbld cltbld /tmp/tmpktTCyH/Cache/7/ 40700 cltbld cltbld /tmp/tmpktTCyH/Cache/0/ 40700 cltbld cltbld /tmp/tmpktTCyH/Cache/5/ 40700 cltbld cltbld /tmp/tmpktTCyH/Cache/B/ 40700 cltbld cltbld /tmp/tmpktTCyH/Cache/2/ 40700 cltbld cltbld /tmp/tmpktTCyH/Cache/A/ 40700 cltbld cltbld /tmp/tmpktTCyH/Cache/6/ 40700 cltbld cltbld /tmp/tmpktTCyH/Cache/E/ 40700 cltbld cltbld /tmp/tmpktTCyH/Cache/C/ 40700 cltbld cltbld /tmp/tmpktTCyH/Cache/F/ 40700 cltbld cltbld /tmp/tmpktTCyH/Cache/4/ 40700 cltbld cltbld /tmp/tmpktTCyH/Cache/9/ 40700 cltbld cltbld /tmp/tmpktTCyH/Cache/8/ 40755 cltbld cltbld /tmp/tmpktTCyH/webapps/ 100644 cltbld cltbld /tmp/tmpktTCyH/webapps/webapps.json So, what is "Cache.Trash949609424"?
Perhaps some Necko folk can explain where "Cache.Trash949609424" comes from.
Comment on attachment 669577 [details] [diff] [review] Dump directory listing on profile removal failure, v1 Review of attachment 669577 [details] [diff] [review]: ----------------------------------------------------------------- r=me since this is just diagnostic code and we'll remove it eventually.
Attachment #669577 - Flags: review?(ted.mielczarek) → review+
Digging through source code, it appears the Necko cache has a "trash compartment." It appears that old cache directories get renamed to Cache.Trash<random> and are then deleted on a background thread after some delay. Makes sense. I suspect that permissions are getting "corrupted" as part of the move. From nsDeleteDir.cpp: // Important: must rename directory w/o changing parent directory: else on // NTFS we'll wait (with cache lock) while nsIFile's ACL reset walks file // tree: was hanging GUI for *minutes* on large cache dirs. rv = dir->MoveToNative(nullptr, leaf); dir is just an nsIFile. I suspect we are running into some kind of race condition where the OS or filesystem is attempting to "reparent" directory entries in the directory as part of the move and we try to delete the directory before this has finished. But, that's just speculation.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Flags: in-testsuite-
Resolution: --- → FIXED
> I suspect we are running into some kind of race condition where the OS or > filesystem is attempting to "reparent" directory entries in the directory as > part of the move and we try to delete the directory before this has finished I'm not much of an expert on NTFS, but AFAIK that shouldn't be happening: when nsDeleteDir renames the file we pass null for the parent directory, which winds up resolving to "skipNtfsAclReset" here: http://mxr.mozilla.org/mozilla-central/source/xpcom/io/nsLocalFileWin.cpp#1862 But perhaps there's some race condition even without doing the ACL resets. The diagnostic output will hopefully give us a clue.
So, what's the easiest way for me to enable diagnostic output on try builds?
Blocks: 809071
Blocks: 823996
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: