746255 - Investigate if there is a memory leak regression in Necko

In particular, it sounds like necko is holding on to notification callbacks for very long times. The expectation callers have is that necko drops all references to request observers and notification callbacks OnStopRequest. I did check, and nsHttpChannel seems to null out mTransaction OnStopRequest. But nsHttpConnection seems to hold a long-lived ref to mCallbacks: basically until either ~nsHttpConnection or until nsHttpConnection::Activate. It's been that way since bug 623948 looks like; before that the socket code regot the callbacks from the transaction as needed, afaict. Did we recently change something to make nsHttpConnections live longer, perhaps?

Andrew McCreight [:mccr8]

Updated

•

13 years ago

Whiteboard: [MemShrink]

Nicholas Nethercote [inactive]

Comment 2

•

13 years ago

When bz says "pretty serious regression" I get scared. Jason, Patrick: are either of you able to investigate this?

Whiteboard: [MemShrink] → [MemShrink:P1]

Jason Duell

Comment 3

•

13 years ago

Patrick (author of code changes) is out this week. Honza reviewed them--could you take a look Honza?

Assignee: nobody → honzab.moz

Patrick McManus [:mcmanus]

Comment 4

•

13 years ago

(In reply to Boris Zbarsky (:bz) from comment #1) > But nsHttpConnection seems to hold a long-lived ref to mCallbacks: basically > until either ~nsHttpConnection or until nsHttpConnection::Activate. Or (the important one) until nsHttpConnection::CloseTransaction(), which should be pretty similar in timing to OnStopRequest.

Boris Zbarsky [:bzbarsky]

Comment 5

•

13 years ago

Hmm. Then what happened with smaug's stack above?

Honza Bambas (:mayhemer)

Comment 6

•

13 years ago

(In reply to Boris Zbarsky (:bz) from comment #5) > Hmm. Then what happened with smaug's stack above? You mean this? https://bugzilla.mozilla.org/show_bug.cgi?id=743178#c36 It seems to be a proxy release of callbacks that could be invoked just at shutdown. But that is actually not anything wrong if something finished just during shutdown. Olly, what was state of your preferences for pipelining and spdy? Is there some STR? Maybe there is, just lost in the comments of bug 743178... CloseTransaction is called: - newly, on read timeout (nsHttpConnection::ReadTimeoutTick) - when read or write on the socket failed (nsHttpConnection::OnInputStreamReady, nsHttpConnection::OnOutputStreamReady) ; this, importantly, includes NS_BASE_STREAM_CLOSED when a transaction finished its job gracefully - from nsHttpConnectionMgr::OnMsgCancelTransaction, when the transaction is still in progress nsHttpConnectionMgr::OnMsgCancelTransaction is posted via gHttpHandler->CancelTransaction from: - nsHttpChannel::Cancel - nsHttpChannel::OnStopRequest when aStatus or mStatus was NS_FAILED I've checked with instrumentation of the code (at nsInterfaceRequestorAgg ctor and dtor) we release callbacks ASAP. The pref setting was default.

Olli Pettay [:smaug][bugs@pettay.fi]

Reporter

Comment 7

•

13 years ago

(In reply to Honza Bambas (:mayhemer) from comment #6) > Olly, what was state of your preferences for pipelining and spdy? Whatever the default prefs are > Is there > some STR? Maybe there is, just lost in the comments of bug 743178... Keep Facebook or tbpl open in a tab few days and reload occasionally

Olli Pettay [:smaug][bugs@pettay.fi]

Reporter

Comment 8

•

13 years ago

(not sure if the reload is needed)

Alex Keybl [:akeybl]

Comment 9

•

13 years ago

For clarity's sake, is this a regression from bug 743178, or was it just found while reviewing the patch for bug 743178. It's not clear to me what the regression is or when it first occurred.

tracking-firefox14: ? → +

Boris Zbarsky [:bzbarsky]

Comment 10

•

13 years ago

This was found while debugging bug 743178. Or more precisely, this is the underlying cause of bug 743178 as far as we can tell. In that bug we worked around it by dropping some references in the DOM code when we get OnStopRequest, but the object that was holding those references should have had all references to it dropped by necko (and hence should have gone away and dropped its references without the need for manual releasing), and as far as we can tell that didn't happen. I agree that it's not clear when the problem appeared...

Honza Bambas (:mayhemer)

Comment 11

•

13 years ago

There could be a problem with idle connection timer, however that is quit unlikely. When we create a connection that is never used, it keeps callbacks until it is closed and released. Such connection is however closed up in 5 seconds of idle time, thus very soon. I could create a patch to let such connections throw callbacks as soon as not needed and revert the workaround CORS listener patch. Olli, would you be willing to test a try build with such patch?

Blocks: 623948

Status: NEW → ASSIGNED

Olli Pettay [:smaug][bugs@pettay.fi]

Reporter

Comment 12

•

13 years ago

I could try. Note, the leak happened usually one in two days when keeping Facebook open.

Honza Bambas (:mayhemer)

Comment 13

•

13 years ago

(In reply to Olli Pettay [:smaug] from comment #12) > I could try. Note, the leak happened usually one in two days when keeping > Facebook open. Just submitted: https://tbpl.mozilla.org/?tree=Try&rev=c5a3e082a9c4 Thanks!

Honza Bambas (:mayhemer)

Comment 14

•

13 years ago

Windows build should be soon done as well: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/honzab.moz@firemni.cz-d4b9ca006df3/

Olli Pettay [:smaug][bugs@pettay.fi]

Reporter

Comment 15

•

13 years ago

The leak is still there.

Honza Bambas (:mayhemer)

Comment 16

•

13 years ago

(In reply to Olli Pettay [:smaug] from comment #15) > The leak is still there. How did you actually check? The memory consumption went extremely high and you used some tool to check it is caused by keeping CORS listeners by necko?

Olli Pettay [:smaug][bugs@pettay.fi]

Reporter

Comment 17

•

13 years ago

I used the tool from https://bugzilla.mozilla.org/show_bug.cgi?id=726346 and clicked Find Documents, and it showed the same leak I had before.

Olli Pettay [:smaug][bugs@pettay.fi]

Reporter

Comment 18

•

13 years ago

(I use facebook.com for testing, since it shows the leak reasonable soon.)

Honza Bambas (:mayhemer)

Comment 19

•

13 years ago

Olli, another appeal: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011/04/2011-04-01-03-mozilla-central/ doesn't contain fix for bug 623948, so there should be no leak http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011/04/2011-04-02-03-mozilla-central/ *DOES* contain that fix, so there should be the leak Are you will to test if that makes sense for a one year old build? (Don't forget to backup or copy the profile.)

Lukas Blakk [:lsblakk] use ?needinfo

Comment 20

•

12 years ago

[Triage Comment] Can we get an updated status on the severity of this leak in FF14? We're about to merge 14 to the beta channel tomorrow so there will be more users affected by this if it's still present.

Olli Pettay [:smaug][bugs@pettay.fi]

Reporter

Comment 21

•

12 years ago

Bug 743178 is a workaround the issue. The real issue is possibly very old one. I'll test the builds mentioned in comment 19 - the problem is that such old builds don't have easy-to-use APIs to check whether something is leaking (one needs to create cycle collection logs and analyze them using some external tools).

Olli Pettay [:smaug][bugs@pettay.fi]

Reporter

Comment 22

•

12 years ago

In other words, I don't think this is sever enough to block for example FF14. Bug 743178 fixes nicely the issues related to cycle collection responsiveness.

Honza Bambas (:mayhemer)

Comment 23

•

12 years ago

Olli did a lot of testing. Bug 743178 has fixed a leak he was able to find. No we seem to be more or less safe. I wasn't able to find the leak my self, though I've spent hours by investigating the code. Since there is currently no major issue in this area, I think we don't need to block on this.

Alex Keybl [:akeybl]

Updated

•

12 years ago

tracking-firefox14: + → -

Benoit Jacob [:bjacob] (mostly away)

Comment 24

•

12 years ago

Maybe, the two reference cycles in bug 777044 might be worth a look (?)

Nicholas Nethercote [inactive]

Comment 25

•

12 years ago

Olli, is this bug closeable?

Whiteboard: [MemShrink:P1] → [MemShrink:P2]

Olli Pettay [:smaug][bugs@pettay.fi]

Reporter

Comment 26

•

12 years ago

I believe no.

Honza Bambas (:mayhemer)

Comment 27

•

11 years ago

Hasn't this been fixed with bug 795305?

Flags: needinfo?(mcmanus)

Patrick McManus [:mcmanus]

Comment 28

•

11 years ago

I don't know of active work here - but its olli's bug.

Flags: needinfo?(mcmanus)

Honza Bambas (:mayhemer)

Comment 29

•

10 years ago

I think Patrick or somebody has found and fixed a callbacks leak a longer time ago. If there is nothing of suspicion here, please close this bug.

Assignee: honzab.moz → nobody

Status: ASSIGNED → UNCONFIRMED

Ever confirmed: false

Nicholas Nethercote [inactive]

Updated

•

10 years ago

Flags: needinfo?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Reporter

Comment 30

•

10 years ago

I haven't seen this leak lately.

Status: UNCONFIRMED → RESOLVED

Closed: 10 years ago

Flags: needinfo?(bugs)

Resolution: --- → WORKSFORME