Closed Bug 746255 Opened 13 years ago Closed 10 years ago

Investigate if there is a memory leak regression in Necko

Categories

(Core :: Networking, defect)

14 Branch
x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox14 - ---

People

(Reporter: smaug, Unassigned)

References

Details

(Whiteboard: [MemShrink:P2])

Version: 12 Branch → 14 Branch
In particular, it sounds like necko is holding on to notification callbacks for very long times. The expectation callers have is that necko drops all references to request observers and notification callbacks OnStopRequest. I did check, and nsHttpChannel seems to null out mTransaction OnStopRequest. But nsHttpConnection seems to hold a long-lived ref to mCallbacks: basically until either ~nsHttpConnection or until nsHttpConnection::Activate. It's been that way since bug 623948 looks like; before that the socket code regot the callbacks from the transaction as needed, afaict. Did we recently change something to make nsHttpConnections live longer, perhaps?
Whiteboard: [MemShrink]
When bz says "pretty serious regression" I get scared. Jason, Patrick: are either of you able to investigate this?
Whiteboard: [MemShrink] → [MemShrink:P1]
Patrick (author of code changes) is out this week. Honza reviewed them--could you take a look Honza?
Assignee: nobody → honzab.moz
(In reply to Boris Zbarsky (:bz) from comment #1) > But nsHttpConnection seems to hold a long-lived ref to mCallbacks: basically > until either ~nsHttpConnection or until nsHttpConnection::Activate. Or (the important one) until nsHttpConnection::CloseTransaction(), which should be pretty similar in timing to OnStopRequest.
Hmm. Then what happened with smaug's stack above?
(In reply to Boris Zbarsky (:bz) from comment #5) > Hmm. Then what happened with smaug's stack above? You mean this? https://bugzilla.mozilla.org/show_bug.cgi?id=743178#c36 It seems to be a proxy release of callbacks that could be invoked just at shutdown. But that is actually not anything wrong if something finished just during shutdown. Olly, what was state of your preferences for pipelining and spdy? Is there some STR? Maybe there is, just lost in the comments of bug 743178... CloseTransaction is called: - newly, on read timeout (nsHttpConnection::ReadTimeoutTick) - when read or write on the socket failed (nsHttpConnection::OnInputStreamReady, nsHttpConnection::OnOutputStreamReady) ; this, importantly, includes NS_BASE_STREAM_CLOSED when a transaction finished its job gracefully - from nsHttpConnectionMgr::OnMsgCancelTransaction, when the transaction is still in progress nsHttpConnectionMgr::OnMsgCancelTransaction is posted via gHttpHandler->CancelTransaction from: - nsHttpChannel::Cancel - nsHttpChannel::OnStopRequest when aStatus or mStatus was NS_FAILED I've checked with instrumentation of the code (at nsInterfaceRequestorAgg ctor and dtor) we release callbacks ASAP. The pref setting was default.
(In reply to Honza Bambas (:mayhemer) from comment #6) > Olly, what was state of your preferences for pipelining and spdy? Whatever the default prefs are > Is there > some STR? Maybe there is, just lost in the comments of bug 743178... Keep Facebook or tbpl open in a tab few days and reload occasionally
(not sure if the reload is needed)
For clarity's sake, is this a regression from bug 743178, or was it just found while reviewing the patch for bug 743178. It's not clear to me what the regression is or when it first occurred.
This was found while debugging bug 743178. Or more precisely, this is the underlying cause of bug 743178 as far as we can tell. In that bug we worked around it by dropping some references in the DOM code when we get OnStopRequest, but the object that was holding those references should have had all references to it dropped by necko (and hence should have gone away and dropped its references without the need for manual releasing), and as far as we can tell that didn't happen. I agree that it's not clear when the problem appeared...
There could be a problem with idle connection timer, however that is quit unlikely. When we create a connection that is never used, it keeps callbacks until it is closed and released. Such connection is however closed up in 5 seconds of idle time, thus very soon. I could create a patch to let such connections throw callbacks as soon as not needed and revert the workaround CORS listener patch. Olli, would you be willing to test a try build with such patch?
Blocks: 623948
Status: NEW → ASSIGNED
I could try. Note, the leak happened usually one in two days when keeping Facebook open.
(In reply to Olli Pettay [:smaug] from comment #12) > I could try. Note, the leak happened usually one in two days when keeping > Facebook open. Just submitted: https://tbpl.mozilla.org/?tree=Try&rev=c5a3e082a9c4 Thanks!
The leak is still there.
(In reply to Olli Pettay [:smaug] from comment #15) > The leak is still there. How did you actually check? The memory consumption went extremely high and you used some tool to check it is caused by keeping CORS listeners by necko?
I used the tool from https://bugzilla.mozilla.org/show_bug.cgi?id=726346 and clicked Find Documents, and it showed the same leak I had before.
(I use facebook.com for testing, since it shows the leak reasonable soon.)
Olli, another appeal: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011/04/2011-04-01-03-mozilla-central/ doesn't contain fix for bug 623948, so there should be no leak http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011/04/2011-04-02-03-mozilla-central/ *DOES* contain that fix, so there should be the leak Are you will to test if that makes sense for a one year old build? (Don't forget to backup or copy the profile.)
[Triage Comment] Can we get an updated status on the severity of this leak in FF14? We're about to merge 14 to the beta channel tomorrow so there will be more users affected by this if it's still present.
Bug 743178 is a workaround the issue. The real issue is possibly very old one. I'll test the builds mentioned in comment 19 - the problem is that such old builds don't have easy-to-use APIs to check whether something is leaking (one needs to create cycle collection logs and analyze them using some external tools).
In other words, I don't think this is sever enough to block for example FF14. Bug 743178 fixes nicely the issues related to cycle collection responsiveness.
Olli did a lot of testing. Bug 743178 has fixed a leak he was able to find. No we seem to be more or less safe. I wasn't able to find the leak my self, though I've spent hours by investigating the code. Since there is currently no major issue in this area, I think we don't need to block on this.
Maybe, the two reference cycles in bug 777044 might be worth a look (?)
Olli, is this bug closeable?
Whiteboard: [MemShrink:P1] → [MemShrink:P2]
I believe no.
Hasn't this been fixed with bug 795305?
Flags: needinfo?(mcmanus)
I don't know of active work here - but its olli's bug.
Flags: needinfo?(mcmanus)
I think Patrick or somebody has found and fixed a callbacks leak a longer time ago. If there is nothing of suspicion here, please close this bug.
Assignee: honzab.moz → nobody
Status: ASSIGNED → UNCONFIRMED
Ever confirmed: false
Flags: needinfo?(bugs)
I haven't seen this leak lately.
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Flags: needinfo?(bugs)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.