Closed Bug 1020345 Opened 10 years ago Closed 9 years ago

Intermittent CacheFileIOManager::Shutdown stall on Android

Categories

(Core :: Networking: Cache, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 913822

People

(Reporter: gbrown, Unassigned)

References

Details

The test job jsreftest-3 on Android 2.2 frequently fails with an apparent hang on shutdown: Bug 1008422 tracks tests that failed because shutdown took longer than 330 seconds without producing any new output. Investigation shows that the hang occurs during xpcom-shutdown notification. Even when the test succeeds, xpcom-shutdown notification often (but not always) stalls for a long time. I recently narrowed down the stall to CacheFileIOManager::Shutdown: On Android 2.2, following the jsreftest-3 tests, CacheFileIOManager::Shutdown frequently takes several *minutes* to complete. There are much less frequent shutdown hangs on other Android test jobs -- these have not been investigated. I have not noticed shutdown hangs on other platforms. Here's a try run demonstrating the stall: https://tbpl.mozilla.org/?tree=Try&rev=5ac5cdbc7cb4 Consider https://tbpl.mozilla.org/php/getParsedLog.php?id=41028743&tree=Try&full=1 06-04 07:02:04.698 I/Gecko ( 1749): >>> cache2 CacheObserver: xpcom-shutdown start 06-04 07:02:04.698 I/Gecko ( 1749): >>> cache2 CacheObserver: xpcom-shutdown -- calling CacheFileIOManager::Shutdown 06-04 07:02:04.698 I/Gecko ( 1749): >> CacheFileIOManager::Shutdown start 06-04 07:02:04.708 I/Gecko ( 1749): > CacheFileIOManager::Shutdown condVar.Wait start ... 06-04 07:06:20.141 I/Gecko ( 1749): > CacheFileIOManager::Shutdown condVar.Wait end 06-04 07:06:20.288 I/Gecko ( 1749): >> CacheFileIOManager::Shutdown end 06-04 07:06:20.288 I/Gecko ( 1749): >>> cache2 CacheObserver: xpcom-shutdown end
Can you please run it [1] with NSPR_LOG_MODULES set to "timestamp,cache2:5"? [1] http://www.janbambas.cz/get-nspr-log-from-tryserver-run/
Flags: needinfo?(gbrown)
NSPR logging is a little tricky on Android, but here's an attempt: https://tbpl.mozilla.org/?tree=Try&rev=55c2522ae1f6
Flags: needinfo?(gbrown)
Here is another experiment with non-NSPR logging: https://tbpl.mozilla.org/?tree=Try&rev=c3a3c3af525b It looks to me like the fast shutdown cases find a small number of handles at http://hg.mozilla.org/mozilla-central/annotate/668f29cd71b3/netwerk/cache2/CacheFileIOManager.cpp#l1273: 1273 for (uint32_t i=0 ; i<handles.Length() ; i++) { while the longer-running cases find substantially more handles. Contrast https://tbpl.mozilla.org/php/getParsedLog.php?id=41060397&tree=Try&full=1 (441 handles) with https://tbpl.mozilla.org/php/getParsedLog.php?id=41043339&tree=Try&full=1 (4 handles).
According the #of handles, this seems like bug 913822. Also there could be a good mitigation with bug 1013395 that we would like to have for 32.
Depends on: 913822, 1013395
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.