Open Bug 529949 Opened 15 years ago Updated 2 years ago

image caches should not be statically allocated (ABORT: imgCacheEntry release isn't thread-safe!: '_mOwningThread.GetThread() == PR_GetCurrentThread()')

Categories

(Core :: Graphics: ImageLib, defect)

x86
macOS
defect

Tracking

()

People

(Reporter: joe, Unassigned)

References

Details

Attachments

(2 files)

Attached patch allocate caches on the heap (deleted) — Splinter Review
Currently, the image caches are all static members of the imgLoader class. This means that, in the case that we crash or for some other reason call exit() on a non-main thread, we will call the caches' destructors on a non-main thread, leading to an abort: ###!!! ABORT: imgCacheEntry release isn't thread-safe!: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file /builds/slave/mozilla-central-linux-debug/build/modules/libpr0n/src/imgLoader.h, line 81 In general, it's not safe to destroy or construct things statically, and since we already have an entry/exit point for the cache (XPCOM creation and shutdown), we should just use that more aggressively, and allocate the hash tables on the heap inside them, with only a pointer to these hash tables remaining static. This patch also ensures that we never try to use the cache after we've shut down XPCOM, and thus inserts a lot of null checks.
Attachment #413454 - Flags: review?(bobbyholley)
Comment on attachment 413454 [details] [diff] [review] allocate caches on the heap >@@ -677,19 +678,23 @@ nsresult imgLoader::InitCache() >+ sCache = new imgCacheTable(); >+ sCacheQueue = new imgCacheQueue(); >+ if (!sCache || !sCache->Init() || !sCacheQueue) > return NS_ERROR_OUT_OF_MEMORY; please delete + null sCache + sCacheQueue before returning. >+ sChromeCache = new imgCacheTable(); >+ sChromeCacheQueue = new imgCacheQueue(); >+ if (!sChromeCache || !sChromeCache->Init() || !sChromeCacheQueue) > return NS_ERROR_OUT_OF_MEMORY; please delete + null everything here I guess.
Comment on attachment 413454 [details] [diff] [review] allocate caches on the heap nsresult imgLoader::ClearChromeImageCache() { - return EvictEntries(sChromeCache); + return EvictEntries(*sChromeCache); } nsresult imgLoader::ClearImageCache() { - return EvictEntries(sCache); + return EvictEntries(*sCache); } void imgLoader::MinimizeCaches() { - EvictEntries(sCacheQueue); - EvictEntries(sChromeCacheQueue); + EvictEntries(*sCacheQueue); + EvictEntries(*sChromeCacheQueue); } MinimizeCaches, at the very least, is called by an observer that doesn't seem to check for null caches. As such, I'd think we'd need a null check here. Overall, can't we just make EvictEntries take a pointer, if that's how we're treating the caches these days? - if (queue.GetNumElements() == 0) - NS_ASSERTION(queue.GetSize() == 0, + if (!cache || !queue) + return; + + if (queue->GetNumElements() == 0) + NS_ASSERTION(queue->GetSize() == 0, Any reason this isn't NS_ABORT_IF_FALSE? Looks good otherwise. r+ with those changes + timeless'.
Attachment #413454 - Flags: review?(bobbyholley) → review+
I'm not convinced that it's worth writing code (and paying the costs for heap allocation and null checks) to handle exit() being called on a different thread. For one, I don't think that we should be calling exit() after crashing and I can't really think of any reason another thread would be justified in calling exit(). Can someone justify supporting calling exit() from a different thread?
I think it's worth avoiding static construction/destruction in shared libraries in general. Static constructors and destructors run while the shared library loader is holding a lock (although with different variations between platforms). Depending on platform, things like (a) loading other shared libraries (which can be triggered by do_CreateInstance) or (b) calling dladdr (which can be triggered by many in-process debugging tools like trace-malloc or nsTraceRefcnt) may need this same lock (in the trace-malloc case, while holding the malloc lock). This creates risks for AB-BA deadlocks that worsen with the presence of such debugging tools. (Doing anything that can call into the component manager or service manager during a static constructor is a problem even without debugging tools, I think.) Can you find a way to do this without null-checks? If imgLoader::InitCache fails, you shouldn't be doing anything with these tables anyway? Can you just skip deleting them at shutdown if something has happened such that they might still be manipulated later (e.g., objects still alive)?
jrmuizel: roughly (per bug 539169 comment 2), exit will happen on the main thread at the wrong time for a long time in the future (until we rewrite something or another). Unfortunately, users do not expect Quit from the dock to result in a crash. And as dbaron kinda notes, xpcom conventions forbid doing interesting work in static constructors/destructors.
Attached file crash data (deleted) —
I'm crashing a lot during debugging, seems like it might be because of this issue. This is the full Apple crash report.
This is a frustrating bug - can we get an owner here?
Oh, I own this - I just haven't come up with a nicer patch yet.
Assignee: nobody → joe
Blocks: 549767
No longer blocks: 549767
FYI: I see this daily in the crash test automation. The domains I've reproduced it on so far are: addons.mozilla.org, blogs.ubc.ca, clbaddestfemale.wordpress.com, comparescreenshots.slicx.com, developer.mozilla.org, devices.live.com, forum.wawa-mania.ws, forums.nwp4life.com, jeremy.visser.name, member.yam.com, mercedesalkatreszek.cegvalaszto.hu, micro.msi.com, picleecher.net, readygames.com.br, romakhin.ru, screenshotcomparison.com, secure.provinet.net, services.addons.mozilla.org, sexynudesgirls.org, shineeshawols.wordpress.com, slipstreamtv.com, svithue.net, tfile.ru, thetravisty.com, tvdirect.jimdo.com, usuarios.multimania.es, www.accountcentralonline.com, www.alemdizi.com, www.biggby.com, www.cajasol.es, www.casinosbo.com, www.diziizleyelim.com, www.epubbud.com, www.esikis.net, www.fridcar.hu, www.hazirfilm.com, www.horogay.com, www.houseofpain.com, www.htcsense.com, www.izleyin.gen.tr, www.king.org, www.kvfan.net, www.minecraft.net, www.myyearbook.com, www.ndr.de, www.oberbank-banking.at, www.picleecher.net, www.pornoxo.com, www.provincial.com, www.sexyavenue.com, www.simbirskrealty.ru, www.softdownload.com.br, www.spanishdict.com, www.strawmachine-supplier.com, www.tnaondemand.com, www.usaa.com, www.veetle.com, www.vidyoara.com, www.watchop.com, www.ziddu.com Note that I only tested the urls on these sites because our users are crashing there.
Joe, This continues to be a common Abort during crash automation testing. It can cause issues for automated testing by potentially hiding other crashes. If it isn't important enough to fix, then perhaps it should be a non-fatal ASSERTION rather than an Abort?
(In reply to comment #12) > Joe, > > This continues to be a common Abort during crash automation testing. It can > cause issues for automated testing by potentially hiding other crashes. > > If it isn't important enough to fix, then perhaps it should be a non-fatal > ASSERTION rather than an Abort? Is the Abort happening during exit?
I think so. Sometimes it happens during normal shutdown of the browser Assertion failure: ready > 0, at c:/work/mozilla/builds/2.0.0/mozilla/nsprpub/pr/src/md/windows/w32poll.c:322 nsStringStats => mAllocCount: 10856 => mReallocCount: 1174 => mFreeCount: 5403 -- LEAKED 5453 !!! => mShareCount: 15003 => mAdoptCount: 1129 => mAdoptFreeCount: 1118 -- LEAKED 11 !!! ###!!! ASSERTION: nsScriptCacheCleaner not thread-safe: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file c:/work/mozilla/builds/2.0.0/mozilla/content/base/src/nsFrameMessageManager.cpp, line 729 ###!!! ABORT: imgCacheEntry release isn't thread-safe!: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file c:\work\mozilla\builds\2.0.0\mozilla\modules\libpr0n\src\imgLoader.h, line 85 other times I get an out of memory loading the page which I believe causes an exit nsStringStats => mAllocCount: 1733731 => mReallocCount: 888 => mFreeCount: 5672 -- LEAKED 1728059 !!! => mShareCount: 1758318 => mAdoptCount: 10886 => mAdoptFreeCount: 10884 -- LEAKED 2 !!! out of memory ###!!! ASSERTION: nsScriptCacheCleaner not thread-safe: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file c:/work/mozilla/builds/2.0.0/mozilla/content/base/src/nsFrameMessageManager.cpp, line 733 ###!!! ABORT: imgCacheEntry release isn't thread-safe!: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file c:\work\mozilla\builds\2.0.0\mozilla\modules\libpr0n\src\imgLoader.h, line 85
Attachment #413454 - Flags: review+ → review-
will this fix land someday ? I got stuck at exactly the same issue. Or is there an actual workaround for a proper cleanup ?
ping
Please don't do that. https://bugzilla.mozilla.org/page.cgi?id=etiquette.html When there are updates, you'll hear them.
ahh oki. Just saw it in another issue and thought it is usable this way. Never mind.
Assignee: joe → nobody
We don't see this in automation anymore.
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: