Closed Bug 536316 Opened 15 years ago Closed 14 years ago

e10s HTTP: channel refcounting

Categories

(Core :: Networking: HTTP, defect)

Other Branch
x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jduell.mcbugs, Assigned: jdm)

References

Details

Attachments

(1 file, 4 obsolete files)

Right now the patch for 530952 leaks memory. For HttpChannelChild we need to override Release, because when refcount == 0 it's generally time to shut down the IPDL protocol, rather than simply delete (the exception is when IPDL is doing the shutting down). I've got a handle on the basic idea, but for some reason when I debug the refcnt never hits 0. I'm planning to look into this some more, but if someone else wants to give it a try let me know and I'll fill you in on the details.
Depends on: 532221
Yoink.
Assignee: nobody → josh
This requires a little bit of thought to get right. Since the DeallocPHttpChannel methods call Release() instead of deleting the actors outright, we can be left in a situation in which the parent actor is still alive and owned, while the child actor has been abandoned and deleted. So, from IRC: <jdm> My thought process is that we could either 1) discard any events that occur after the protocol is deleted <jdm> or 2) implement a two-step delete process 1) is a simple flag that gets set in ActorDestroy() (assuming we trigger Send__delete__ in the child's destructor) and stops any further events from being sent over the pipe. 2) requires a set of messages that inform the other side that one side's refcount has reached 1 (indicating that no other objects possess an owning reference). When each side reaches this state, the actors can safely be deleted. I think I'll implement option 2.
So the comment about the refcount reaching 1 was obviously ill-thought out. The problem here is that on both ends of the pipe, the actors are AddRefed on creation, and the corresponding release doesn't happen until the __delete__ message is received. That obviously causes problems if we're waiting to send the __delete__ message until the refcount reaches 0.
Jason and I have decided that the problem is actually not as complicated as I have made it appear. We're going with the flag in the parent that indicates that the actor is no longer valid, as the chrome channel could end up keeping the actor alive after the __delete__ message is received. On the child side, the channel actor's refcount won't be affected at all by the IPDL allocation/deallocation, so it will have a normal lifetime and send the __delete__ message as it dies.
Attached patch Patch (obsolete) (deleted) — Splinter Review
This turned out to be quite a bit simpler than I originally anticipated. The reference-counted lifetime of the child does make sense if you squint a bit.
Attachment #437796 - Flags: review?(jduell.mcbugs)
Attached patch Patch (obsolete) (deleted) — Splinter Review
This time without the xpcshell valgrind changes.
Attachment #437796 - Attachment is obsolete: true
Attachment #437800 - Flags: review?(jduell.mcbugs)
Attachment #437796 - Flags: review?(jduell.mcbugs)
Blocks: 536321
Blocks: 558620
Blocks: 558623
josh, biesi and I cooked up what I think is the right scheme for refcounting the HttpChannels. I'll either explain it to you soon or just post the patch.
I look forward to it.
Attached patch Patch v2 (obsolete) (deleted) — Splinter Review
As described on IRC. I've removed the HttpHandler cruft in HttpChannelChild because a) the CallGetService didn't seem to be properly AddRefing the service (???), and the Release ended up trashing it early b) HttpBaseChannel already retains the service; doing it twice is unnecessary.
Attachment #437800 - Attachment is obsolete: true
Attachment #440205 - Flags: review?(jduell.mcbugs)
Attachment #437800 - Flags: review?(jduell.mcbugs)
Attachment #440205 - Flags: review?(jduell.mcbugs) → review+
Comment on attachment 440205 [details] [diff] [review] Patch v2 >diff --git a/netwerk/ipc/NeckoChild.cpp b/netwerk/ipc/NeckoChild.cpp > bool > NeckoChild::DeallocPHttpChannel(PHttpChannelChild* channel) > { > NS_ABORT_IF_FALSE(IsNeckoChild(), "DeallocPHttpChannel called by non-child!"); > >- HttpChannelChild *p = static_cast<HttpChannelChild*>(channel); >- p->Release(); >+ // The channel is actually reference counted, but we override the usual >+ // Release destruction so that it can occur here. >+ delete channel; > return true; > } Change comment to // Delete channel (HttpChannelChild's refcnt must already hit 0 to get here) >diff --git a/netwerk/protocol/http/src/HttpChannelChild.cpp b/netwerk/protocol/http/src/HttpChannelChild.cpp >+NS_IMETHODIMP_(nsrefcnt) HttpChannelChild::Release(void) >+{ >+ NS_PRECONDITION(0 != mRefCnt, "dup release"); >+ NS_ASSERT_OWNINGTHREAD(HttpChannelChild); >+ --mRefCnt; >+ NS_LOG_RELEASE(this, mRefCnt, "HttpChannelChild"); >+ if (mRefCnt != 0) { >+ return mRefCnt; >+ } >+ >+ if (mWasOpened) { >+ PHttpChannelChild::Send__delete__(this); >+ } >+ else { >+ mRefCnt = 1; >+ NS_DELETEXPCOM(this); >+ } >+ return 0; >+} Just in case the boilerplate XPCOM release code ever changes, please change this to use NS_IMPL_RELEASE_WITH_DESTROY, i.e. put the "if(wasOpened)..." part into its own function, and pass that to NS_IMPL_RELEASE_WITH_DESTROY. Also, I believe that HttpChannelChild may be getting its AddRef from nsHashPropertyBag (run the debugger to find out). If so, we're using expensive thread-safe ref counting for AddRef, which we don't actually need for HttpChannelChild (it's only called on the main thread). If so, also throw in an NS_IMPL_ADDREF for HttpChannelChild here too. Finally, add comment to the destroy function you create. How about // Refcount -> 0 means we initiate IPDL shutdown, which will release // HttpChannelParent (which releases the chrome nsHttpChannel). This object // gets deleted by NeckoChild::DeallocPHttpChannel. >+ // IPDL acts as the transport layer here, and the corresponding release >+ // occurs in OnStopRequest or OnRedirect. > this->AddRef(); change comment to // The socket transport layer in the chrome process now has a logical ref to // us, until either OnStopRequest or OnRedirect is called. >--- a/netwerk/protocol/http/src/HttpChannelParent.cpp >+++ b/netwerk/protocol/http/src/HttpChannelParent.cpp >@@ -45,25 +45,20 @@ > #include "nsISupportsPriority.h" > > namespace mozilla { > namespace net { > > // C++ file contents > HttpChannelParent::HttpChannelParent() > { >- // Ensure gHttpHandler is initialized: we need the atom table up and running. >- nsIHttpProtocolHandler* handler; >- CallGetService(NS_NETWORK_PROTOCOL_CONTRACTID_PREFIX "http", &handler); >- NS_ASSERTION(handler, "no http handler"); > } > > HttpChannelParent::~HttpChannelParent() > { >- NS_RELEASE(gHttpHandler); > } Wait a sec. Are you sure you can get rid of this? You said you were removing this from HttpChanneChild, but this is HttpChannelParent, which does not inherit from HttpBaseClass, so I think we may need this code (unless we don't need the atom table init'd any more: but I'm pretty sure we do, for ParamTraits to work with our IPDL msgs). What's your evidence that the GetService call isn't addreffing? Use of NS_RELEASE here is a bug: NS_RELEASE sets the passed pointer to NULL, which would have blown up all later refs to gHttpHandler by other code. This should simply be "gHttpHandler->Release". (Please replace the NS_RELEASE in the HttpBaseChannel destructor with "gHttpHandler->Release" as well: we currently use a copy of gHttpHandler to avoid nulling it, which is stupid). Finally, have you tested this code? There's no easy way to write an automated test for this, but I at least need to know that you've actually seen HttpChannelChild/Parent and the nsHttpChannel all get deleted (and not at program shutdown) before we commit this. +r with these changes and testing.
All very good points, I obviously wasn't thinking hard enough about the code removal, so that will return to the fixed patch. The changes are all tested via gdb breakpoints; the destruction occurs in both child and parent as expected.
Attached patch Patch v2 (obsolete) (deleted) — Splinter Review
Here's the patch with the requested fixes, although it's not ready for checkin yet. I'm running into a strange crash that seems to stem from too many Releases - on shutdown, the cycle collector segfaults collecting the document which holds a strong reference to the channel, but the reference isn't valid any more. Any thoughts? I'm thoroughly stumped right now.
Attachment #440205 - Attachment is obsolete: true
Attachment #441031 - Flags: feedback?(jduell.mcbugs)
Attached patch Patch v2 (deleted) — Splinter Review
So, I ended up rebasing today, and the bizarre refcount behaviour is gone. I guess this is ready to check in, since the r+ carries over. I've verified that all channels die at the expected time.
Attachment #441031 - Attachment is obsolete: true
Attachment #441031 - Flags: feedback?(jduell.mcbugs)
Comment on attachment 442098 [details] [diff] [review] Patch v2 Lovely. So far we've seen random drift where either the channel is never fully released (me about 4 months ago) and where it gets released too much (recently), and then the problem just goes away! Very confidence-building. Anyway, it's not this code's fault, so let's check it in. jdm: Didn't you run into some line of code that tried to incorrectly pass the channel to the cycle collector? Or something like that? I just remember you mentioning a line of code that everyone agreed was wrong, but wasn't your code. Should we be removing that in this patch (or opening a bug for it)? -NS_IMPL_ADDREF_INHERITED(HttpChannelChild, HttpBaseChannel) -NS_IMPL_RELEASE_INHERITED(HttpChannelChild, HttpBaseChannel) +NS_IMPL_ADDREF(HttpChannelChild) +NS_IMPL_RELEASE_WITH_DESTROY(HttpChannelChild, ReleaseDestroy()) Add comment above: // Override nsHashPropertyBag's AddRef: we don't need thread-safe refcnt +void +HttpChannelChild::ReleaseDestroy() +{ + if (mWasOpened) { + PHttpChannelChild::Send__delete__(this); + } + else { + NS_DELETEXPCOM(this); + } +} Just use "delete" instead of NS_DELETEXPCOM, which is obsolete.
http://hg.mozilla.org/projects/electrolysis/rev/88a23b9dd4c9 (I make the fixes in the last comment, except for the cycle collector one.) Thanks for slogging through this, jdm!
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: