Closed Bug 703996 Opened 13 years ago Closed 12 years ago

Intermittent "Found processes still running: dwwin. Please close them before running talos."

Categories

(Release Engineering :: General, defect, P3)

x86
Windows XP

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: philor, Unassigned)

References

Details

(Keywords: intermittent-failure)

https://tbpl.mozilla.org/php/getParsedLog.php?id=7496585&tree=Mozilla-Inbound
Rev3 WINNT 5.1 mozilla-inbound talos svg on 2011-11-20 04:45:49 PST for push 9a1341595afb

'python' 'run_tests.py' '--noisy' '20111120_0446_config.yml'
...
Running test tsvg: Started Sun, 20 Nov 2011 04:46:37
Failed tsvg: Stopped Sun, 20 Nov 2011 04:46:39
FAIL: Busted: tsvg
FAIL: Found processes still running: dwwin. Please close them before running talos.
Traceback (most recent call last):
  File "run_tests.py", line 596, in ?
    main()
  File "run_tests.py", line 593, in main
    test_file(arg, options.screen, options.amo)
  File "run_tests.py", line 535, in test_file
    raise e
utils.talosError: 'Found processes still running: dwwin. Please close them before running talos.'
program finished with exit code 1
elapsedTime=3.250000
That failure followed a job which was interrupted by a network glitch (i.e. purple but not cancelled), which meant the slave didn't reboot. Subsequent builds on talos-r3-xp-018 have been OK, so let's leave this open to monitor frequency. philor pointed out on IRC that talos now reports which processes are still running instead of giving some sort of 'system not clean' message.
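For reference, the kind of pre-run check that produces this failure can be sketched roughly as follows. This is a minimal illustration and not the actual talos code: the names TalosError, find_leftover_processes and check_clean are hypothetical, and the list of running processes is passed in rather than read from the OS so the sketch stays platform-neutral.

```python
class TalosError(Exception):
    """Hypothetical stand-in for the utils.talosError seen in the traceback."""


def _normalize(name):
    # Compare names case-insensitively and without a trailing ".exe".
    name = name.strip().lower()
    return name[:-4] if name.endswith(".exe") else name


def find_leftover_processes(running, watched):
    """Return the watched process names that appear in the running list."""
    running_names = {_normalize(n) for n in running}
    return [w for w in watched if _normalize(w) in running_names]


def check_clean(running, watched):
    """Raise TalosError naming every watched process that is still running."""
    leftovers = find_leftover_processes(running, watched)
    if leftovers:
        raise TalosError(
            "Found processes still running: %s. "
            "Please close them before running talos." % ", ".join(leftovers)
        )
```

Calling check_clean(["explorer.exe", "dwwin.exe"], ["firefox", "dwwin"]) would raise with a message naming dwwin, which is the behaviour philor noted: the offending process is reported by name instead of a generic "not clean" message.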
https://tbpl.mozilla.org/php/getParsedLog.php?id=7497518&tree=Firefox - talos-r3-xp-046 doing tsvg (again), following one of the bug 704010 failures which I don't think were network blips, I think they're tests-destroying-the-slave's-will-to-live. Though I could be wrong.
For info here, (from http://process.networktechs.com/dwwin.exe.php)

dwwin.exe Application Error

One of the biggest things being searched for on this site right now is dwwin.exe. The reason for this is because it displays during quite a few common fatal error messages. First of all dwwin.exe is Dr Watson which is used by the error reporting tool. Alot of security related applications will throw up warning flags about this file trying to read, write or modify a number of other .exe files. This isn't anything to worry about because it's only trying to investigate "events" that it believes is causing problems that may lead to crashing.

File can be found: *:\windows\system32\dwwin.exe

The biggest complaint about the error reporting service is it causing alot of various applications to not load or crash during use. Quite a few of these can be fixed by downloading updates from windows update and scanning for viruses/spyware but alot of others simply won't be fixed with the currently available updates. In that case you'll want to just disable the service! After that 90% of the time the program will then run properly. I have been telling people to disable this service in quite a few of my guides. If you've found this article through a search engine you'll want to disable it now. I've written an article dedicated to stopping this error reporting tool from running. Read it here [1]. An overview of the error reporting process can be found @ Microsoft's site [2].

---------
[1] - http://www.iamnotageek.com/articles.php?aid=91&page=1&topic=
[2] - http://www.microsoft.com/resources/satech/cer/GettingStartedMNU.asp
https://tbpl.mozilla.org/php/getParsedLog.php?id=7563321&tree=Firefox
Rev3 WINNT 5.1 mozilla-central talos dirty on 2011-11-23 20:23:47 PST for push cf764be32bc3
slave: talos-r3-xp-006
The links above in comment 4 are broken, btw: http://support.microsoft.com/kb/188296
This could have been introduced by bug 701700.
(In reply to Marco Bonardo [:mak] from comment #17)
> The links above in comment 4 are broken, btw:
>
> http://support.microsoft.com/kb/188296

Our ref image is set up with AeDebug removed, so Armen is correct: some later package must be re-creating it. I'll take a stab at deploying the MSI from Microsoft via OPSI.
Assignee: nobody → coop
Priority: -- → P3
https://tbpl.mozilla.org/php/getParsedLog.php?id=7834014&tree=Firefox3.6 Another possibility is that setting that pref doesn't affect whether dwwin runs on OS crashes, only whether it runs on application crashes - I couldn't find anything saying anything either way when I did some googling the other day, and I don't think we've ever had a situation where we were crashing the OS thirty or forty times a day before, to know whether or not we were triggering it.
The AeDebug registry key is definitely present on the slaves I've checked, i.e. several that have come up repeatedly in the logs: talos-r3-xp-0[09,39,48]. It's also present on the reference image, which would explain why it's on (at least) some of the slaves. As Armen indicated, this was probably re-introduced with the VC2010 Debug CRT. I notice the instructions for removing the key are conspicuously absent from that section of the ref platform doc:

https://wiki.mozilla.org/ReferencePlatforms/Test/WinXP#Microsoft_Visual_C.2B.2B_2010_Non-Redistributable_Debug_CRT_.28x86.29

The Microsoft-provided MSI doesn't actually remove the key, but the reg commands from the previous section still work:

https://wiki.mozilla.org/ReferencePlatforms/Test/WinXP#Microsoft_Visual_C.2B.2B_2005_Non-Redistributable_Debug_CRT_.28x86.29

I'll go through the rest of the XP slaves and remove the keys.
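For anyone scripting the removal, a rough sketch follows. It is not the exact set of commands from the ref platform doc, which is not reproduced in this bug. It assumes the key in question is the standard HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug location and that deleting the whole key with reg delete is acceptable. The function names are hypothetical, and the default is a dry run that only returns the command.

```python
import subprocess

# Assumed location of the AeDebug key; verify against the ref platform doc.
AEDEBUG_KEY = r"HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug"


def build_delete_command(key=AEDEBUG_KEY):
    """Build the argument list for 'reg delete <key> /f' (no confirmation prompt)."""
    return ["reg", "delete", key, "/f"]


def remove_aedebug(dry_run=True):
    """Return the command; run it as well when dry_run is False (Windows only)."""
    cmd = build_delete_command()
    if not dry_run:
        # Raises CalledProcessError if reg exits non-zero, e.g. the key is absent.
        subprocess.check_call(cmd)
    return cmd
```

Running remove_aedebug() on its own only returns the command for inspection; remove_aedebug(dry_run=False) would need to run with administrator rights on the slave itself.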
Status: NEW → ASSIGNED
Priority: P3 → P2
I went through all the XP slaves and removed the AeDebug key. The only slaves outstanding are the ones Armen is currently repurposing (063-075). Let's see if that helps.
https://tbpl.mozilla.org/php/getParsedLog.php?id=8437681&tree=Mozilla-Inbound Didn't this used to go away after the reboot on the first failure?
Assignee: coop → nobody
Priority: P2 → P3
Status: ASSIGNED → NEW
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Whiteboard: [orange]
Product: mozilla.org → Release Engineering