Closed Bug 852429 Opened 12 years ago Closed 11 years ago

Only broken slaves with a full tmpdir hit cppunittests TEST-UNEXPECTED-FAIL | TestSettingsAPI.exe | test failed with return code 1

Categories

(Testing :: General, defect)

x86
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: gwagner)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

I think there might have been another instance of this earlier today, but it's been one of those days for the tree. Given the frequency, the regression range is "everything that landed on or was merged into mozilla-inbound since just after midnight, or maybe over the weekend, roughly."

https://tbpl.mozilla.org/php/getParsedLog.php?id=20811279&tree=Mozilla-Inbound

WINNT 5.2 mozilla-inbound leak test build on 2013-03-18 20:42:01 PDT for push 4f119b3f8046
slave: w64-ix-slave16

TEST-PASS | Start TestSettingsAPI
Running TestSettingsAPI tests...
************************************************************
* Call to xpconnect wrapped JSObject produced this error: *
[Exception... "Component returned failure code: 0x80570016 (NS_ERROR_XPC_GS_RETURNED_FAILURE) [nsIJSCID.getService]" nsresult: "0x80570016 (NS_ERROR_XPC_GS_RETURNED_FAILURE)" location: "JS frame :: file:///e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/obj-firefox/dist/bin/components/SettingsService.js :: SettingsService :: line 192" data: no]
************************************************************
Finished running TestSettingsAPI tests.
nsStringStats
 => mAllocCount: 222410
 => mReallocCount: 351
 => mFreeCount: 222410
 => mShareCount: 119068
 => mAdoptCount: 91
 => mAdoptFreeCount: cppunittests TEST-UNEXPECTED-FAIL | TestSettingsAPI.exe | test failed with return code 1

(no, that's not a clipboard accident, just the way the output is intermingled)
I do not like malfunctioning slaves.
Component: XPCOM → Release Engineering: Machine Management
Product: Core → mozilla.org
QA Contact: armenzg
Summary: Intermittent cppunittests TEST-UNEXPECTED-FAIL | TestSettingsAPI.exe | test failed with return code 1 → Only w64-ix-slave16 hits cppunittests TEST-UNEXPECTED-FAIL | TestSettingsAPI.exe | test failed with return code 1
Version: Trunk → other
Depends on: b-2008-ix-0075
(In reply to Phil Ringnalda (:philor) from comment #6)
> I do not like malfunctioning slaves.

And slave12, apparently.
Slave 12 again. Updated summary.
Summary: Only w64-ix-slave16 hits cppunittests TEST-UNEXPECTED-FAIL | TestSettingsAPI.exe | test failed with return code 1 → Only w64-ix-slave{12,16} hits cppunittests TEST-UNEXPECTED-FAIL | TestSettingsAPI.exe | test failed with return code 1
So, we've found a way to break Win7 slaves that puts them in the state that causes this failure. Alas, you have to figure out what that way of breaking is, in order to tell releng to fix it and stop it from happening again.
Component: Release Engineering: Machine Management → XPCOM
Product: mozilla.org → Core
QA Contact: armenzg
Summary: Only w64-ix-slave{12,16} hits cppunittests TEST-UNEXPECTED-FAIL | TestSettingsAPI.exe | test failed with return code 1 → Only certain broken slaves hit cppunittests TEST-UNEXPECTED-FAIL | TestSettingsAPI.exe | test failed with return code 1
Version: other → Trunk
I'm not sure why this is filed in XPCOM: SettingsService.js is part of DOM, and failed getService is for the nsIIndexedDatabaseManager.

From the debug logs, it's clear that IndexedDatabaseManager::Init is failing because QuotaManager::GetOrCreate is failing, because QuotaManager::Init is failing to get both NS_APP_INDEXEDDB_PARENT_DIR and NS_APP_USER_PROFILE_50_DIR.

And why is TestSettingsAPI in xpcom/tests instead of somewhere in dom? It was added in bug 743336 without review from an XPCOM peer, and I certainly would have objected then! gwagner, please move it!

As for this bug, it appears that the bug is being caused by the TestHarness being unable to create a profile directory. Starting here, what we do is

* get the OS tempdir
* create a unique folder under it starting with "cpp-unit-profd"

This second step fails. So perhaps the temp directory is not writeable, or is so full that you can't add any more directories to it, or something else.

Typically (unless a test crashes) we should delete profile directories when the test exits (from ~ScopedXPCOM). But the slaves should also be prepared to clean out the tempdir regularly (on reboot, if they reboot regularly).
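For readers following along, the two harness steps described in the comment above can be sketched roughly in Python. This is only an illustration, not the real TestHarness code (which is C++): the helper name `create_profile_dir` is hypothetical, and only the "cpp-unit-profd" prefix is taken from the comment.

```python
import os
import shutil
import tempfile


def create_profile_dir(prefix="cpp-unit-profd"):
    """Create a unique profile directory under the OS temp dir.

    Hypothetical helper mirroring the two steps in the comment.
    Raises OSError if the temp dir is missing, not writable, or full.
    """
    tmp = tempfile.gettempdir()                      # step 1: get the OS tempdir
    return tempfile.mkdtemp(prefix=prefix, dir=tmp)  # step 2: create a unique folder


if __name__ == "__main__":
    profile = create_profile_dir()
    try:
        print("profile dir exists:", os.path.isdir(profile))
    finally:
        # Remove the profile on exit, analogous to what ~ScopedXPCOM is meant to do.
        shutil.rmtree(profile, ignore_errors=True)
```

If step 2 raises, everything that depends on a profile directory fails afterwards, which would be consistent with the chain of Init failures seen in the debug logs.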
Component: XPCOM → General
Flags: needinfo?(anygregor)
Product: Core → Testing
->bhearsum for question about rebooting and tempdir cleanup
Flags: needinfo?(bhearsum)
Test slaves reboot after every job regardless of pass/fail. I had a look at the most recent slave that failed (w64-ix-slave72) and there's tons of disk space free on C: - 34G. However, there are also a ton of tempdirs in c:\users\cltbld\appdata\local\temp - from as far back as June 2011. Not sure how/why that would cause a failure...maybe maximum number of entries in a dir or something?
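A quick way to quantify the leftover tempdirs on a slave might look like the sketch below. It is an assumption-laden illustration: the function name `stale_entries` and the seven-day threshold are hypothetical, and the "cpp-unit-profd" prefix is borrowed from the earlier comment about the harness.

```python
import os
import tempfile
import time


def stale_entries(directory, prefix, max_age_days):
    """Return sorted paths in `directory` whose names start with `prefix`
    and whose modification time is older than `max_age_days` days."""
    cutoff = time.time() - max_age_days * 86400
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name.startswith(prefix) and entry.stat().st_mtime < cutoff:
                stale.append(entry.path)
    return sorted(stale)


if __name__ == "__main__":
    # The 7-day threshold is an arbitrary choice for the example.
    found = stale_entries(tempfile.gettempdir(), "cpp-unit-profd", max_age_days=7)
    print(len(found), "stale profile directories")
```

Counting only, rather than deleting, keeps the check safe to run on a machine that is still under investigation.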
Flags: needinfo?(bhearsum)
Tangentially, the new Python harness I wrote to run C++ unit tests creates a temp dir and runs the test with that as the CWD, so we could ostensibly change these tests to put their profile directory under there, since the Python harness will clean it up even if the test crashes. http://mxr.mozilla.org/mozilla-central/source/testing/runcppunittests.py#41 (This still won't help if $TEMP is full up, or something.)
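The pattern described in the comment above (create a temp dir, run the test with it as the CWD, clean up regardless of how the test exits) can be sketched as follows. This is not the code in runcppunittests.py; `run_in_temp_cwd` is a hypothetical name, and a child Python process stands in for a test binary.

```python
import os
import subprocess
import sys
import tempfile


def run_in_temp_cwd(cmd):
    """Run `cmd` with a fresh temporary directory as its working directory.

    The directory is removed when the `with` block exits, even if the
    child process crashes or exits non-zero.
    Returns (return code, path of the temp dir that was used).
    """
    with tempfile.TemporaryDirectory() as tempdir:
        proc = subprocess.run(cmd, cwd=tempdir)
        return proc.returncode, tempdir


if __name__ == "__main__":
    # Stand-in for a failing test binary: a child process that exits with 1.
    child = [sys.executable, "-c", "import sys; sys.exit(1)"]
    code, used_dir = run_in_temp_cwd(child)
    print("return code:", code)
    print("temp dir removed:", not os.path.exists(used_dir))
```

As the comment notes, this does not help if $TEMP itself is unusable, and on Windows the removal can still fail if something is holding a file in the directory open.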
Depends on: 862355
Filed bug 862355 for automatic tmpdir cleanup for windows slaves.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #171)
> I'm not sure why this is filed in XPCOM: SettingsService.js is part of DOM,
> and failed getService is for the nsIIndexedDatabaseManager.
>
> From the debug logs, it's clear that IndexedDatabaseManager::Init is failing
> because QuotaManager::GetOrCreate is failing, because QuotaManager::Init is
> failing to get both NS_APP_INDEXEDDB_PARENT_DIR and
> NS_APP_USER_PROFILE_50_DIR.
>
> And why is TestSettingsAPI in xpcom/tests instead of somewhere in dom? It
> was added in bug 743336 without review from an XPCOM peer, and I certainly
> would have objected then! gwagner, please move it!
>
> As for this bug, it appears that the bug is being caused by the TestHarness
> being unable to create a profile directory. Starting here, what we do is
>
> * get the OS tempdir
> * create a unique folder under it starting with "cpp-unit-profd"
>
> This second step fails. So perhaps the temp directory is not writeable, or
> is so full that you can't add any more directories to it, or something else.
>
> Typically (unless a test crashes) we should delete profile directories when
> the test exits (from ~ScopedXPCOM). But the slaves should also be prepared
> to clean out the tempdir regularly (on reboot, if they reboot regularly).

I will take a look next week. (After the B2G work week)
Flags: needinfo?(anygregor)
Hi Gregor, it's next week, and as you can see, this is still extremely frequent.
Flags: needinfo?(anygregor)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #318)
> Hi Gregor, it's next week, and as you can see, this is still extremely
> frequent.

I still have some blockers left from the work week. Let's see if bug 862355 solves the problem here.
Flags: needinfo?(anygregor)
Flags: needinfo?(anygregor)
Attached patch disable test (deleted)
Disable test for now because there is no movement in bug 862355.
Assignee: nobody → anygregor
Attachment #743689 - Flags: review?(bent.mozilla)
Flags: needinfo?(anygregor)
Attachment #743689 - Flags: review?(bent.mozilla) → review+
Disabled: I will move the test and re-enable once the blocking bug is fixed. https://hg.mozilla.org/integration/mozilla-inbound/rev/06cb02df9a67
Whiteboard: leave-open
Summary: Only certain broken slaves hit cppunittests TEST-UNEXPECTED-FAIL | TestSettingsAPI.exe | test failed with return code 1 → Only broken slaves with a full tmpdir hit cppunittests TEST-UNEXPECTED-FAIL | TestSettingsAPI.exe | test failed with return code 1
(In reply to Benjamin Smedberg [:bsmedberg] from comment #171)
> This second step fails. So perhaps the temp directory is not writeable, or
> is so full that you can't add any more directories to it, or something else.
>
> Typically (unless a test crashes) we should delete profile directories when
> the test exits (from ~ScopedXPCOM). But the slaves should also be prepared
> to clean out the tempdir regularly (on reboot, if they reboot regularly).

The something else being that the unique naming only planned on needing four digits of unique, and that wasn't enough. Hard to believe we've crashed 10,000 times on all of these broken Windows slaves, so I suspect (as is the case most of the time when we try to delete files on Windows) that the harness doesn't actually succeed at deleting the profile directory at all on Windows.

https://tbpl.mozilla.org/php/getParsedLog.php?id=22772307&tree=Mozilla-B2g18
https://tbpl.mozilla.org/php/getParsedLog.php?id=22773649&tree=Mozilla-B2g18
https://tbpl.mozilla.org/php/getParsedLog.php?id=22776659&tree=Mozilla-B2g18
https://tbpl.mozilla.org/php/getParsedLog.php?id=22775969&tree=Mozilla-B2g18
https://tbpl.mozilla.org/php/getParsedLog.php?id=22787805&tree=Mozilla-B2g18
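To illustrate how a bounded unique-name scheme runs out, here is a toy allocator in Python. It is purely hypothetical: the comment above only says the naming allowed for four digits, so the `base`, `base-1`, `base-2`, ... suffix format and the `create_unique_dir` name are assumptions made for the example, not the harness's actual behaviour.

```python
import os
import tempfile


def create_unique_dir(parent, base, max_suffix=9999):
    """Create `base`, or `base-1` .. `base-<max_suffix>`, under `parent`.

    Hypothetical allocator: the suffix format is assumed for illustration.
    Raises OSError once every candidate name is already taken.
    """
    candidates = [base] + ["%s-%d" % (base, n) for n in range(1, max_suffix + 1)]
    for name in candidates:
        path = os.path.join(parent, name)
        try:
            os.mkdir(path)
            return path
        except FileExistsError:
            continue  # name already used by a leftover directory; try the next
    raise OSError("no unique name left for %r in %r" % (base, parent))


if __name__ == "__main__":
    # Use a tiny limit so exhaustion is reached quickly in the demo.
    with tempfile.TemporaryDirectory() as parent:
        for _ in range(3):
            print(os.path.basename(create_unique_dir(parent, "cpp-unit-profd", 2)))
        try:
            create_unique_dir(parent, "cpp-unit-profd", 2)
        except OSError as exc:
            print("exhausted:", exc)
```

With a four-digit limit and directories that are never deleted, a slave would start failing in this way once roughly ten thousand profile directories have accumulated, with plenty of disk space still free.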
Although the blocker isn't marked resolved (it scope-crept a bit), it is effectively resolved, so you can move and re-enable the test. It won't fail, despite the fact that the harness code apparently does completely fail to remove the profile.
Depends on: 877003
Moved test to the right directory and re-enabled test in bug 877003
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: leave-open