Closed
Bug 544727
Opened 15 years ago
Closed 15 years ago
Rev 3 Windows Talos machines not always successfully doing cleanup
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Assigned: anodelman)
References
Details
(Keywords: intermittent-failure)
Attachments
(4 files, 1 obsolete file)
(deleted), patch | catlee: review+, anodelman: checked-in+
(deleted), patch | anodelman: review-
(deleted), patch | joduinn: review+, anodelman: checked-in+
(deleted), patch | joduinn: review+, anodelman: checked-in+
Since it's new, my first suspicion would be that while nohup exists, it maybe doesn't exactly always _work_ on Windows.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265505248.1265505990.31842.gz
Rev3 WINNT 7.0 mozilla-central talos on 2010/02/06 17:14:08
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265505248.1265508537.27258.gz
Rev3 WINNT 7.0 mozilla-central talos dirty on 2010/02/06 17:14:08
C:\Windows\system32\cmd.exe /c nohup rm -vrf *
...
removed directory: `talos/tpan'
removed `talos/ttest.py'
removed `talos/ttest.pyc'
removed `talos/utils.py'
removed `talos/utils.pyc'
removed `talos/winmo.config'
program finished with exit code 0
elapsedTime=242.818000
=== Output ended ===
======== BuildStep ended ========
======== BuildStep started ========
talos dir creation failed
=== Output ===
C:\Windows\system32\cmd.exe /c mkdir talos
...
A subdirectory or file talos already exists.
program finished with exit code 1
Reporter | ||
Comment 1•15 years ago
|
||
These are not at all a useful guide to frequency: tbpl doesn't show burning Talos for some reason, so I only see failures when I happen to notice firebot mentioning a tree changing from something else to burning. Still:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265678130.1265681437.3902.gz
Rev3 WINNT 7.0 mozilla-central talos dirty on 2010/02/08 17:15:30
s: talos-r3-w7-014
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1265678320.1265679042.8042.gz
Rev3 WINNT 7.0 mozilla-central talos nochrome on 2010/02/08 17:18:40
s: talos-r3-w7-006
(Odd, though possibly coincidence, that both times I've seen it, it's been a pair of failures off the same run.)
Comment 2•15 years ago
|
||
We could try using |attrib -s -h -r /s builddir| and |rmdir /s /q builddir| instead of the msys coreutils rm. rmdir will not delete system or hidden files, so the attrib command first removes the system, hidden, and read-only flags recursively; rmdir can then remove the directory recursively and quietly.
(http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/rmdir.mspx?mfr=true)
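For comparison, the same "clear the flags, then delete" idea can be sketched in Python. This is only an illustration and not the patch attached to this bug; the function names are hypothetical. Note that os.chmod on Windows only toggles the read-only flag, so this sketch does not touch the hidden or system attributes that attrib -s -h would clear.

```python
import os
import shutil
import stat
import tempfile


def make_tree_writable(root):
    """Grant the owner read/write/execute on root and everything below it."""
    os.chmod(root, os.stat(root).st_mode | stat.S_IRWXU)
    # Top-down walk: each directory is made accessible before os.walk descends into it.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # leave symlinks alone
            os.chmod(path, os.stat(path).st_mode | stat.S_IRWXU)


def force_remove_tree(root):
    """Clear restrictive permissions first, then delete the whole tree."""
    make_tree_writable(root)
    shutil.rmtree(root)
```

Calling force_remove_tree("talos") from the builder dir would be the rough equivalent of the attrib plus rmdir pair above, under those assumptions.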
Assignee | ||
Updated•15 years ago
|
Assignee: nobody → anodelman
Priority: -- → P2
Assignee | ||
Comment 3•15 years ago
|
||
On staging I found that there were permission denied warnings during file removal on w7. This patch does a chmod a+rx before attempting to delete files. I believe that this should fix this random orange.
Attachment #426591 -
Flags: review?(catlee)
Updated•15 years ago
|
Attachment #426591 -
Flags: review?(catlee) → review+
Assignee | ||
Updated•15 years ago
|
Attachment #426591 -
Attachment description: chmod files before attempting to delete them → [checked in]chmod files before attempting to delete them
Attachment #426591 -
Flags: checked-in+
Comment 4•15 years ago
|
||
This is an untested patch. Instead of using coreutils rm and chmod, it uses the native Windows attrib and rmdir. The cleandir property is needed because rmdir and attrib do not understand the concept of 'rmdir /s /q *', so we either need to hardcode a list of directories, use the same directory name for all builders, or delete the builder dir using attrib/rmdir from .. and then recreate it (which is what this patch does).
Recent versions of buildbot have a feature that allows you to share the slave-side builder dir.
I am not sure if this will work, as I don't know all of the peculiarities of the Windows buildbot slave.
Another option might be to use the cygwin coreutils package. It might have a more up-to-date version of coreutils and should run with only rm.exe and cygwin1.dll (we don't need the whole cygwin stack).
Attachment #427174 -
Flags: review?(anodelman)
Comment 5•15 years ago
|
||
Alternatively, it seems that the directory we care about is called 'talos'. If that is the case, we could just run |rmdir /s /q talos| instead of setting up all the properties.
Assignee | ||
Comment 6•15 years ago
|
||
This issue also appears on our fed/fed64 rev3 slaves.
Assignee | ||
Comment 7•15 years ago
|
||
This will fix the redness on fed/fed64 by using nohup during cleanup (this has worked in the past and was removed during a failed attempt at using usepty=0 in the buildbot slave config).
Adding chmod a+rwx before doing the cleanup on windows to see if that will fix things there.
While this is baking I'll look into jhford's solution so that we have something to try next.
Attachment #427215 -
Flags: review?(joduinn)
Updated•15 years ago
|
Attachment #427215 -
Flags: review?(joduinn) → review+
Comment 8•15 years ago
|
||
Comment on attachment 427215 [details] [diff] [review]
[checked in]quick fix to try overnight
looks good.
Assignee | ||
Comment 9•15 years ago
|
||
Comment on attachment 427215 [details] [diff] [review]
[checked in]quick fix to try overnight
changeset: 613:0fa180cd0e4a
Attachment #427215 -
Attachment description: quick fix to try overnight → [checked in]quick fix to try overnight
Attachment #427215 -
Flags: checked-in+
Assignee | ||
Comment 10•15 years ago
|
||
Linux green overnight.
Still seeing intermittent problems on win7.
Assignee | ||
Comment 11•15 years ago
|
||
Comment on attachment 427174 [details] [diff] [review]
windows talos cleanup
Still getting a remove dir error with this.
See:
http://talos-master.mozilla.org:8012/builders/Rev3%20WINNT%206.1%20mozilla-1.9.0%20talos%20nochrome/builds/450/steps/cleanup/logs/stdio
Attachment #427174 -
Flags: review?(anodelman) → review-
Comment 12•15 years ago
|
||
That is unfortunate. I wonder if running that on the command line of the slave would change anything. It looks like a lot of those files are web page files. Maybe Apache is trying to read the files while we are doing cleanup?
Assignee | ||
Comment 13•15 years ago
|
||
Another possible fix: move talos to talos-%random%, so that even if cleanup fails we can still successfully create a new, clean talos dir and carry on with testing. We'll get another chance to clean up the old talos dir on reboot.
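The move-aside approach described here could look roughly like the following in Python. This is a sketch under assumptions, not the attached patch: the function name and the uuid-based suffix (standing in for %random%) are hypothetical, and the rename itself may still fail on Windows if another process holds files in the directory open.

```python
import os
import shutil
import tempfile
import uuid


def move_aside_and_recreate(workdir):
    """Rename workdir to a unique sibling, recreate it empty, then try to delete the old copy.

    Returns the path of the renamed directory if it could not be fully
    deleted (so a later pass can retry), or None if nothing was left behind.
    """
    workdir = os.path.normpath(workdir)
    stale = None
    if os.path.isdir(workdir):
        stale = "%s-%s" % (workdir, uuid.uuid4().hex[:8])
        os.rename(workdir, stale)
    os.mkdir(workdir)  # the fresh dir exists regardless of how cleanup goes
    if stale is not None:
        shutil.rmtree(stale, ignore_errors=True)  # best effort only
        if os.path.exists(stale):
            return stale
    return None
```

The point of the design is that the mkdir step no longer depends on the delete succeeding; a leftover talos-* directory is only wasted disk space until the next cleanup attempt.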
Attachment #427452 -
Flags: review?(joduinn)
Updated•15 years ago
|
Attachment #427452 -
Flags: review?(joduinn) → review+
Comment 14•15 years ago
|
||
Comment on attachment 427452 [details] [diff] [review]
move talos dir out of the way before attempting cleanup
Already tested in staging. Looks good.
Assignee | ||
Comment 15•15 years ago
|
||
I think that this is the real fix. tp4 contains some really, really long paths + filenames. I believe that we are exceeding the path length limit, and thus crashing when attempting to delete the files. Moving the whole tp4 directory out of talos/page_load_test means that we can successfully remove everything.
Works on stage. This also matches my observations from attempting to remove the files with long paths by hand.
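To illustrate the long-path theory, here is a small Python sketch that lists paths at or over a length limit and then deletes a deep directory by first moving it to a shorter location. It assumes the limit in question is the classic Win32 MAX_PATH of 260 characters, which this bug does not state explicitly; the function names and the destination directory are hypothetical, and os.rename only works when source and destination are on the same volume.

```python
import os
import shutil
import tempfile

# Assumption: classic Win32 MAX_PATH, which counts the terminating NUL.
WINDOWS_MAX_PATH = 260


def paths_over_limit(root, limit=WINDOWS_MAX_PATH):
    """Return absolute paths under root whose length is at least limit."""
    root = os.path.abspath(root)
    too_long = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if len(path) >= limit:
                too_long.append(path)
    return too_long


def remove_via_short_path(deep_dir, short_parent):
    """Move deep_dir directly under short_parent, then delete it there."""
    name = os.path.basename(os.path.normpath(deep_dir))
    target = os.path.join(short_parent, name)
    os.rename(deep_dir, target)  # every path inside is now shorter
    shutil.rmtree(target)
```

Running paths_over_limit on the talos dir of an affected slave would be one way to check whether tp4 really is the only offender.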
Attachment #427452 -
Attachment is obsolete: true
Attachment #427497 -
Flags: review?(joduinn)
Updated•15 years ago
|
Attachment #427497 -
Flags: review?(joduinn) → review+
Comment 16•15 years ago
|
||
Comment on attachment 427497 [details] [diff] [review]
[checked in]move tp4 dir to shorter path before attempting cleanup
Looks good, works in staging, so r+. Also, I note this is similar to a problem we hit a couple of years ago on win32 desktop builds.
Assignee | ||
Comment 17•15 years ago
|
||
Comment on attachment 427497 [details] [diff] [review]
[checked in]move tp4 dir to shorter path before attempting cleanup
changeset: 617:996d58ea54a5
Attachment #427497 -
Attachment description: move tp4 dir to shorter path before attempting cleanup → [checked in]move tp4 dir to shorter path before attempting cleanup
Attachment #427497 -
Flags: checked-in+
Assignee | ||
Comment 18•15 years ago
|
||
All green overnight. Will re-open if this recurs.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
Keywords: intermittent-failure
Updated•12 years ago
|
Whiteboard: [orange]
Updated•11 years ago
|
Product: mozilla.org → Release Engineering