Closed
Bug 383271
Opened 17 years ago
Closed 17 years ago
leak document (and sometimes windows and XBL docs) after typing in search box
Categories
(Firefox :: Search, defect, P1)
Firefox
Search
Tracking
()
RESOLVED
FIXED
Firefox 3 alpha7
People
(Reporter: dbaron, Assigned: Gavin)
Details
(Keywords: memory-leak)
Attachments
(2 files, 1 obsolete file)
(deleted),
patch
|
mconnor
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review |
If I type in the search box, I leak a document with a moz-nullprincipal URI, and sometimes (if the page shown is about:blank) additional XBL documents and window objects.
Steps to reproduce:
1. start trunk Firefox, with the steps for using leak-gauge as documented in http://mxr.mozilla.org/seamonkey/source/tools/footprint/leak-gauge.pl?raw=1
2. type "hello" in the search box
3. let the autocomplete popup appear
4. click in the blank space in the menu bar to make the popup go away
5. close the window
Actual results: at least one leaked document (and 300K+ other objects, according to nsTraceRefcnt logging). If the document shown is about:blank, a bunch of XBL documents and some windows as well.
This may well be a core bug.
Assignee | ||
Comment 1•17 years ago
|
||
The document being leaked here is the document returned from the parseFromStream call in the search service, which is used to load XML search engines. The search bar binding (search.xml) has a reference to the active search engine, which holds a reference to it's XML document. Breaking the link between the search engine and it's document once it's no longer needed fixes the leak of the document, but I suspect there's still a cycle involving the search bar and the autocomplete binding and/or controller. I also have no idea why the content document loaded in the window has any effect on this. I need to look into this further.
Assignee: nobody → gavin.sharp
Status: NEW → ASSIGNED
Reporter | ||
Comment 2•17 years ago
|
||
Nulling out pointers to potentially-large objects that you no longer need is a good idea in any case, so this seems good. (Programming languages can't handle this for you automatically, at least in the general case, since it requires solving the halting problem.)
Reporter | ||
Updated•17 years ago
|
Flags: blocking-firefox3?
Updated•17 years ago
|
Flags: blocking-firefox3? → blocking-firefox3+
Assignee | ||
Comment 3•17 years ago
|
||
Now that bug 387223 has landed, XPC_SHUTDOWN_HEAP_DUMP output made the problem rather obvious: the search suggest component (which is an nsIAutoCompleteSearch) is leaking itself and it's timer in a cycle. That leaks the active engine via _serverErrorEngine, which entrains it's XML document (which is what let leak-gauge identify the leak). It also means we leak _formHistoryResult and the XMLHTTPRequest (which ends up leaking a couple of DOM windows, I think).
I'll attach a patch to fix the cycle. It fixes the leak of the document even without attachment 268356 [details] [diff] [review] applied, but we should land that attachment anyways for the reasons cited in comment 2.
OS: Linux → All
Priority: -- → P1
Hardware: PC → All
Target Milestone: --- → Firefox 3 beta1
Version: unspecified → Trunk
Assignee | ||
Comment 4•17 years ago
|
||
Attachment #271394 -
Flags: review?(mconnor)
Assignee | ||
Updated•17 years ago
|
Attachment #268356 -
Attachment description: fix the document leak → don't hold on to documents we don't need
Attachment #268356 -
Flags: review?(mconnor)
Updated•17 years ago
|
Attachment #268356 -
Flags: review?(mconnor) → review+
Comment 5•17 years ago
|
||
Comment on attachment 271394 [details] [diff] [review]
break the cycle
> notify: function SAC_notify(timer) {
>- // make sure we're still waiting for this response before sending
>- if ((timer != this._formHistoryTimer) || !this._listener)
>+ // Break the cycle between us and the timer
>+ this._formHistoryTimer = null;
This is because |timer != this._formHistoryTimer| is never true, correct?
r=me assuming that's the case
Attachment #271394 -
Flags: review?(mconnor) → review+
Assignee | ||
Comment 6•17 years ago
|
||
(In reply to comment #5)
> This is because |timer != this._formHistoryTimer| is never true, correct?
That's correct. The timer object itself calls Notify, and it sends "this" as the parameter.
Assignee | ||
Updated•17 years ago
|
Keywords: checkin-needed
Comment 7•17 years ago
|
||
So xpcom/threads/nsTimerImpl.cpp needs to participate in cycle collection. Is that a know bug? If not, please file and nominate as a blocker.
Anything else here implemented in C++ and interfaced via XPCOM that needs the same treatment? Please don't just wallpaper, file those bugs. There are unscanned edges in the C++ codebase and we need to add traverse and unlink methods on singletons QI'd from instances of such edge-owning object classes.
/be
Keywords: checkin-needed
Updated•17 years ago
|
Keywords: checkin-needed
Comment 8•17 years ago
|
||
Sorry if I sounded testy. The point is that XPCOM is broken as designed without systematic cycle collection, breaking, or avoidance of some kind. For Gecko, the only way is collection (we can't restrict the topology to guarantee breaking or avoidance). For 1.9, that means the XPCOM cycle collector. So it is incumbent on everyone who is dealing with the codebase to look for missing CC participants. Spread the word.
/be
Reporter | ||
Comment 9•17 years ago
|
||
Cycle collection can't deal with objects that are reference-counted from threads other than the main thread.
Comment 10•17 years ago
|
||
(In reply to comment #9)
> Cycle collection can't deal with objects that are reference-counted from
> threads other than the main thread.
Good point, but we should either:
(a) implement the concurrent part of the Bacon-Rajan 2002 algorithm (http://citeseer.ist.psu.edu/bacon01concurrent.html).
(b) do something ad-hoc for XPCOM timers, which (I blame pav) should not be MT-shared XPCOM objects in any event, and which run their observer-flavored releases on the same main thread that sets all observer-flavored timers (if this were not the case, we would have bad thread safety bugs already).
So I don't think we need to do (a) yet, or ever. I do think we need (b). Do you see an alternative, other than a long, costly, and probably unsatisfyingly incomplete march through observer-using JS code to break cycles manually?
/be
Comment 11•17 years ago
|
||
See the comment I wrote years ago after much debugging of MT races, in nsTimerImpl::Release.
The upshot is (and pav agrees) that our timer API should use opaque, integer IDs and not MT-safe ref-counted strong refs. Even before changing the API (for Mozilla 2), we can simply make nsTimerImpl single-threaded, a "fat" ID as it were.
Then the TimerThread code would not keep references to instances of nsTimerImpl. Rather it would keep duplicate bookkeeping. Those nsTimerImpl instances that are used by our DOM-based chrome and content code (and therefore are main-thread-only) could then participate in cycle collection.
It might be possible to do this even without changing TimerThread to copy data instead of pointing to nsTimerImpl instances, based on the logic in Release cited above and the underlying rules. I will look into that tomorrow, but feel free to beat me to it.
/be
Reporter | ||
Comment 12•17 years ago
|
||
Either way, both of these patches should land for now so we can make forward progress on debugging leaks. If we get a better fix for timers later, we can back the second one out.
Comment 13•17 years ago
|
||
(In reply to comment #12)
> Either way, both of these patches should land for now so we can make forward
> progress on debugging leaks. If we get a better fix for timers later, we can
> back the second one out.
Yeah, that's why all I ever wanted was a followup bug, not to delay things here.
I've seen both wallpaper + followup (with FIXME comment citing latter), which is great; wallpaper only, not so great; and general despair with neither wallpaper nor followup (bad). Don't see much followup-only/no-wallpaper, but if others do, let me know. I'm interested in both short- and long-term fixes for hard problems we face.
(Not meant for dbaron:) That XPCOM is broken as designed is old news; the Cycle Collector is still being debugged. If we have to do something special for timers, or even for MT XPCOM reference-counting, we should consider it and not just give up or be content with wallpaper only. Same goes for shutdown bugs in XPConnect, and other latent architecture bugs or weaknesses. I sense some amount of despair and ennui, which is understandable, but we can and will fix more of these design flaws -- so cheer up!
I'm pretty sure the long march through timer-using JS, hacking code to break cycles, will be unsatisfying -- as in: Firefox 3 leaks too much, esp. in view of add-ons and other XUL apps.
I filed bug 387341 on making XPCOM timers participate in GC, so please cite it in FIXME comments here and anywhere similar.
/be
Assignee | ||
Comment 14•17 years ago
|
||
Attachment #271394 -
Attachment is obsolete: true
Assignee | ||
Comment 15•17 years ago
|
||
Checked in both patches.
mozilla/browser/components/search/nsSearchService.js 1.97
mozilla/browser/components/search/nsSearchSuggestions.js 1.15
You need to log in
before you can comment on or make changes to this bug.
Description
•