Closed Bug 383271 Opened 17 years ago Closed 17 years ago

leak document (and sometimes windows and XBL docs) after typing in search box

Categories

(Firefox :: Search, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Firefox 3 alpha7

People

(Reporter: dbaron, Assigned: Gavin)

Details

(Keywords: memory-leak)

Attachments

(2 files, 1 obsolete file)

If I type in the search box, I leak a document with a moz-nullprincipal URI, and sometimes (if the page shown is about:blank) additional XBL documents and window objects. Steps to reproduce: 1. start trunk Firefox, with the steps for using leak-gauge as documented in http://mxr.mozilla.org/seamonkey/source/tools/footprint/leak-gauge.pl?raw=1 2. type "hello" in the search box 3. let the autocomplete popup appear 4. click in the blank space in the menu bar to make the popup go away 5. close the window Actual results: at least one leaked document (and 300K+ other objects, according to nsTraceRefcnt logging). If the document shown is about:blank, a bunch of XBL documents and some windows as well. This may well be a core bug.
The document being leaked here is the document returned from the parseFromStream call in the search service, which is used to load XML search engines. The search bar binding (search.xml) has a reference to the active search engine, which holds a reference to it's XML document. Breaking the link between the search engine and it's document once it's no longer needed fixes the leak of the document, but I suspect there's still a cycle involving the search bar and the autocomplete binding and/or controller. I also have no idea why the content document loaded in the window has any effect on this. I need to look into this further.
Assignee: nobody → gavin.sharp
Status: NEW → ASSIGNED
Nulling out pointers to potentially-large objects that you no longer need is a good idea in any case, so this seems good. (Programming languages can't handle this for you automatically, at least in the general case, since it requires solving the halting problem.)
Flags: blocking-firefox3?
Flags: blocking-firefox3? → blocking-firefox3+
Now that bug 387223 has landed, XPC_SHUTDOWN_HEAP_DUMP output made the problem rather obvious: the search suggest component (which is an nsIAutoCompleteSearch) is leaking itself and it's timer in a cycle. That leaks the active engine via _serverErrorEngine, which entrains it's XML document (which is what let leak-gauge identify the leak). It also means we leak _formHistoryResult and the XMLHTTPRequest (which ends up leaking a couple of DOM windows, I think). I'll attach a patch to fix the cycle. It fixes the leak of the document even without attachment 268356 [details] [diff] [review] applied, but we should land that attachment anyways for the reasons cited in comment 2.
OS: Linux → All
Priority: -- → P1
Hardware: PC → All
Target Milestone: --- → Firefox 3 beta1
Version: unspecified → Trunk
Attached patch break the cycle (obsolete) (deleted) — Splinter Review
Attachment #271394 - Flags: review?(mconnor)
Attachment #268356 - Attachment description: fix the document leak → don't hold on to documents we don't need
Attachment #268356 - Flags: review?(mconnor)
Attachment #268356 - Flags: review?(mconnor) → review+
Comment on attachment 271394 [details] [diff] [review] break the cycle > notify: function SAC_notify(timer) { >- // make sure we're still waiting for this response before sending >- if ((timer != this._formHistoryTimer) || !this._listener) >+ // Break the cycle between us and the timer >+ this._formHistoryTimer = null; This is because |timer != this._formHistoryTimer| is never true, correct? r=me assuming that's the case
Attachment #271394 - Flags: review?(mconnor) → review+
(In reply to comment #5) > This is because |timer != this._formHistoryTimer| is never true, correct? That's correct. The timer object itself calls Notify, and it sends "this" as the parameter.
So xpcom/threads/nsTimerImpl.cpp needs to participate in cycle collection. Is that a know bug? If not, please file and nominate as a blocker. Anything else here implemented in C++ and interfaced via XPCOM that needs the same treatment? Please don't just wallpaper, file those bugs. There are unscanned edges in the C++ codebase and we need to add traverse and unlink methods on singletons QI'd from instances of such edge-owning object classes. /be
Keywords: checkin-needed
Sorry if I sounded testy. The point is that XPCOM is broken as designed without systematic cycle collection, breaking, or avoidance of some kind. For Gecko, the only way is collection (we can't restrict the topology to guarantee breaking or avoidance). For 1.9, that means the XPCOM cycle collector. So it is incumbent on everyone who is dealing with the codebase to look for missing CC participants. Spread the word. /be
Cycle collection can't deal with objects that are reference-counted from threads other than the main thread.
(In reply to comment #9) > Cycle collection can't deal with objects that are reference-counted from > threads other than the main thread. Good point, but we should either: (a) implement the concurrent part of the Bacon-Rajan 2002 algorithm (http://citeseer.ist.psu.edu/bacon01concurrent.html). (b) do something ad-hoc for XPCOM timers, which (I blame pav) should not be MT-shared XPCOM objects in any event, and which run their observer-flavored releases on the same main thread that sets all observer-flavored timers (if this were not the case, we would have bad thread safety bugs already). So I don't think we need to do (a) yet, or ever. I do think we need (b). Do you see an alternative, other than a long, costly, and probably unsatisfyingly incomplete march through observer-using JS code to break cycles manually? /be
See the comment I wrote years ago after much debugging of MT races, in nsTimerImpl::Release. The upshot is (and pav agrees) that our timer API should use opaque, integer IDs and not MT-safe ref-counted strong refs. Even before changing the API (for Mozilla 2), we can simply make nsTimerImpl single-threaded, a "fat" ID as it were. Then the TimerThread code would not keep references to instances of nsTimerImpl. Rather it would keep duplicate bookkeeping. Those nsTimerImpl instances that are used by our DOM-based chrome and content code (and therefore are main-thread-only) could then participate in cycle collection. It might be possible to do this even without changing TimerThread to copy data instead of pointing to nsTimerImpl instances, based on the logic in Release cited above and the underlying rules. I will look into that tomorrow, but feel free to beat me to it. /be
Either way, both of these patches should land for now so we can make forward progress on debugging leaks. If we get a better fix for timers later, we can back the second one out.
(In reply to comment #12) > Either way, both of these patches should land for now so we can make forward > progress on debugging leaks. If we get a better fix for timers later, we can > back the second one out. Yeah, that's why all I ever wanted was a followup bug, not to delay things here. I've seen both wallpaper + followup (with FIXME comment citing latter), which is great; wallpaper only, not so great; and general despair with neither wallpaper nor followup (bad). Don't see much followup-only/no-wallpaper, but if others do, let me know. I'm interested in both short- and long-term fixes for hard problems we face. (Not meant for dbaron:) That XPCOM is broken as designed is old news; the Cycle Collector is still being debugged. If we have to do something special for timers, or even for MT XPCOM reference-counting, we should consider it and not just give up or be content with wallpaper only. Same goes for shutdown bugs in XPConnect, and other latent architecture bugs or weaknesses. I sense some amount of despair and ennui, which is understandable, but we can and will fix more of these design flaws -- so cheer up! I'm pretty sure the long march through timer-using JS, hacking code to break cycles, will be unsatisfying -- as in: Firefox 3 leaks too much, esp. in view of add-ons and other XUL apps. I filed bug 387341 on making XPCOM timers participate in GC, so please cite it in FIXME comments here and anywhere similar. /be
Attached patch break the cycle, with FIXME (deleted) — Splinter Review
Attachment #271394 - Attachment is obsolete: true
Checked in both patches. mozilla/browser/components/search/nsSearchService.js 1.97 mozilla/browser/components/search/nsSearchSuggestions.js 1.15
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: