Closed Bug 82517 Opened 24 years ago Closed 23 years ago

Performing QI on async xpcom event proxy causes crash

Categories

(Core :: XPCOM, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla0.9.4

People

(Reporter: markh, Assigned: dougt)

Details

(Keywords: crash)

Attachments

(7 files)

If the following operations are performed, xpcom dies with a NULL pointer reference: * Create an _async_ proxy around some object. * Call QI on that object for some new interface (ie, an interface other than the one used to create the proxy. The problem is that nsProxyEventClass::CallQueryInterfaceOnProxy() attempts to delegate the QI to the underlying object. However, if the proxy is async, the QI call is simply posted to the event queue, and returns immediately. In this case, the QI reports success, but the result remains NULL. nsProxyEventClass::CallQueryInterfaceOnProxy() simply de-references the returned pointer. Attaching a JavaScript test case that demonstrates. Simply run the js code under xpcshell to see the crash. We create the proxy with nsISupports, but attempting to QI for the "real" interface then dies. I am not attaching a patch as I am unsure of the correct semantics - either the QI should fail, or it should be immediately delegated ignoring the async flag. Neither seems particularly appropriate.
Taking the liberty of adding jband to this bug, being the xpcom czar that he is. Note this is _not_ a blocker for Komodo, but worth knowing about anyway.
Adding dougt to cc list. He's the author of this fine proxy code. Maybe he'd like to take the bug? Upholding QI success/fail semantics is *very* important.
sure
Assignee: kandrot → dougt
Target Milestone: --- → mozilla0.9.3
Priority: -- → P2
Attached patch proposed patch v1 (deleted) — Splinter Review
Attached patch proposed patch v1 (deleted) — Splinter Review
With the last patch, all nsISupports calls will be synchronous.
Status: NEW → ASSIGNED
Keywords: patch
MarkH, jband, can you review my patch (40889).
I'm a little worried that this may lead to dead-locks. Consider: * main thread calls SomeFunction(my_proxy) my_proxy is an async proxy back to some other object on the main thread. * SomeFunction() starts a new thread, then waits for thread to signal a notification. * The new thread attempts to make some use of the passed proxy object - eg, perform a QI. At this stage we will be dead-locked. The QI will attempt to perform a synchronous call to the proxy object via the even queue. However, the event queue is waiting for the initial call to SomeFunction() to return. In effect, if there is ever a situation where using an ASYNC proxy is _necessary_ to avoid dead-locks, any attempt to QI this proxy will still deadlock. Am I missing something?
Exactly. Ignore that last patch. AddRef, Release*, and QI should be done on ANY thread. The base class should implement these threadsafe. Release is the special case where the delete will need to be proxied IFF it is the last reference.
Attached patch Proposed Patch v.2 (deleted) — Splinter Review
This looks OK to me. My only question is regarding PROXY_ALWAYS. + if ( (methodIndex < 2) || + (mProxyType & PROXY_SYNC && + NS_SUCCEEDED(mDestQueue->IsQueueOnCurrentThread(&callDirectly)) && + callDirectly)) It _appears_ to me that this condition should handle this value - ie, if we have a proxy with PROXY_SYNC | PROXY_ALWAYS, then I believe we never want to call direct (except for the methodIndex < 2 case, where we always do). However, as it seems the original code did not make this distinction I am far from sure about this!
I think that you are right; It was busted. This might cause some deadlock regressions as the code might not be expecting the calling thread to block waiting on another thread. The line should be: (mProxyType & PROXY_SYNC && !(mProxyType & PROXY_ALWAYS)
I'm a bit concerned about making *all* calls to nsISupports multithreaded. The original purpose of the proxy code was to protect code that was inheritly not-threadsafe. But with this change, that assumption is invalidated! I think that we need to decide what we need to use ASYNC proxies for... I would argue that the scenario that Mark described - leading to deadlock - is simply an illustration of when *not* to use an ASYNC proxy :-) I think that we should keep the requirement that *all* methods to the proxy MUST be called on the correct (ie. originating) thread and detail the situations where this will lead to deadlocks... Otherwise, we will end up sweeping through the codebase to make more and more objects implement THREADSAFE_ISUPPORTS.... I guess, that really, i'm just arguing for making the "smaller" change :-)
rick, currently any client of this proxy system must have threadsafe isupports impelmentation.
Is this really true? Because it certainly was NOT the intent! There is no reason why the underlying 'real' object should have a threadsafe ISupports implementation. That was part of the reason for having proxy objects in the first place... I agree that the 'proxy object' must be treadsafe, but that's what its designed for :-) I think that QI is just a degenerate situation where the proxy object MUST perform a synchronous proxy call to the original thread to perform the QI on the 'real object'... Then, if the iface which is returned must be proxied too...
someone has got to own a lock when we do addref's and release's on the REAL object. I probably made a mistake and should have had the proxy object own this addref/release lock. The only problem with that is that if the real object was threadsafe, you would have to entry two locks. :-/ If the underlying real object was threadsafe, QI can just be a synchronous freethreaded call.
I guess, i'm still not seeing why the AddRef/Release on the real object need to be locked (ie. threadsafe)... I haven't looked at the code, but i'm assuming that there is a *single* addref (on the caller's thread) of the "real object" when the proxy object is created. This "owning reference" is all that's needed to keep the real object alive... after that all AddRef/Release calls apply *only* to the proxy object. Finally when the proxy object is deleted, it proxies a call back to the "caller's thread" for the Release() of the real object... In this situation, all AddRef/Release calls are made on the correct thread so no locking is required... I believe that the issue with requiring QI to be synchronous is just an explicit example of an unstated assumption that "only interfaces which have no [out] parameters can have async proxies"... I think that in the case of ISupports we *must* implement it synchronously for the very reason that QI (and even AddRef/Release) have [out] parameters...
bulk move to 0.9.4
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Attached patch Proposed Patch v.3 (deleted) — Splinter Review
doug, i think that i'm finally beginning to see why the underlying object needs to implement a thread-safe version of nsISupports :-) The main reason being that since QI automatically does an AddRef(...) we can't call it on another thread *unless* the ISupports implementation is threadsafe... So, if we state (in VERY BIG PRINT) that all real objects *must* have a threadsafe ISupports implementation then I think your patch is fine. I've only got one question... what is the difference between PROXY_ASYNC and PROXY_ALWAYS? It looks like both are treated the same in your patch. Specifically, it looks like a proxy event will be created for PROXY_ASYNC *even* if the call is made on the same thread... -- rick
PROXY_ASYNC will not wait for any response. It is a kind of "fire and forget" function call. There is a check during object creation which determines if the destination event queue is on the caller's thread. If it is, the proxy code returns the same object (no proxy is created). This was some funky requirement back in the days of t ravis. PROXY_ALWAYS will bypass this check.
r=rpotts
John, how about a sr=?
sr=jband
a=blizzard on behalf of drivers for 0.9.4.
Whiteboard: approved for 0.9.4
I am seeing problems with this current patch. I need more time to look at it. --> 0.9.5
Whiteboard: approved for 0.9.4
Target Milestone: mozilla0.9.4 → mozilla0.9.5
i am a loser. I applied this patch by hand to a new tree and was seeing problems. The problem was that I messed up apply the patch and forgot a return rv. This caused a nice deadlock!!! I am going to check this fix in to 0.9.4.
Checking in src/nsProxyEvent.cpp; /cvsroot/mozilla/xpcom/proxy/src/nsProxyEvent.cpp,v <-- nsProxyEvent.cpp new revision: 1.64; previous revision: 1.63 done Marking fixed.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
crap. linux does not exit during the page cycler anymore. Had to backout the last patch.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla0.9.5 → mozilla0.9.4
i fumbled this one pretty bad. Too many different patches on different machines. Patch v.3 above does not apply and is missing logic to handle zero parameters. If I rediff my windows box and apply it to my linux box, I can successfully complete pagecycler. New patch coming up.
hey doug, is it too early in the morning, or does that new 'if (NS_FAILED(rv)) return rv;' code at the bottom of the patch cause 'proxyInfo' to leak if 'PostAndWait()' fails? I think it might be better to say something like: rv = PostAndWait(proxyInfo); if (NS_SUCCEEDED(rv)) { rv = proxyInfo->GetResult(); } delete proxyInfo; return rv; -- rick
yup. and the leak was in the older patches too. I will fix it now.
Keywords: crash
Comment on attachment 48176 [details] [diff] [review] "I won't say that this is the last patch" patch At some point we should probably fix up 'convertMiniVariantToVariant' so it uses nsMemory instead of malloc/free... Just for consistancy ;-)
Attachment #48176 - Flags: review+
rs=waterson, somebody else better understand these changes better than I do.
fix checked in.
Status: REOPENED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: