Closed
Bug 383269
Opened 18 years ago
Closed 18 years ago
leak of elements when quitting with 2 empty windows
Categories
(Core :: JavaScript Engine, defect, P1)
Core
JavaScript Engine
Tracking
()
VERIFIED
FIXED
mozilla1.9alpha6
People
(Reporter: dbaron, Assigned: brendan)
References
Details
(Keywords: memory-leak)
Attachments
(5 files, 2 obsolete files)
I've been looking into simple cases where we're leaking significant graphs of objects, and have found the following.
Steps to reproduce:
1. start in a fresh firefox profile
2. switch the preference in preferences -> Main -> Startup -> When Firefox Starts to "Show my windows and tabs from last time"
3. open two new windows (which should be blank)
4. close the other window, so you have only 2 blank windows
5. do File | Quit
6. start up again on that profile, with leak logging this tie
7. do File | Quit
(I see the bug intermittently with a single empty window, but reliably with 2 empty windows.)
I see a bunch of objects leaked. The cycle collector says it didn't collect one of the documents or JS contexts because (among a bunch of other reasons) there were outstanding references to an nsGenericElement that it couldn't account for, which according to the refcount balancer is a wrapped native. JS_DumpHeap says the reason we didn't collect the wrapped native is the GC trace:
0x16986c0 XULElement 16aae90 via ...(0x185e740 Object).keys_(0x185e780 Array).0(0x185e6c0 Function).__parent__(0x185e580 Call).__proto__(0x17e8fc0 Call).arguments(0x19225c0 Object).1(0x1921900 Object).phishingWarden_(0x18e7640 Object).tabbrowser_(0x1698780 XULElement).__parent__(0x1698740 XULElement).__parent__(0x1698700 XULElement).__parent__
I'm not sure what the "..." at the beginning is.
Comment 1•18 years ago
|
||
Maybe related to bug 380837?
Comment 2•18 years ago
|
||
(In reply to comment #0)
> 0x16986c0 XULElement 16aae90 via ...(0x185e740 Object).keys_(0x185e780
> Array).0(0x185e6c0 Function).__parent__(0x185e580 Call).__proto__(0x17e8fc0
> Call).arguments(0x19225c0 Object).1(0x1921900 Object).phishingWarden_(0x18e7640
> Object).tabbrowser_(0x1698780 XULElement).__parent__(0x1698740
> XULElement).__parent__(0x1698700 XULElement).__parent__
>
> I'm not sure what the "..." at the beginning is.
The first number in ...(0x185e740 Object) gives the address that can be matched against the first column of the output. Do you have it in your output? If so, it should allow to recover the full chain.
Reporter | ||
Comment 3•18 years ago
|
||
Ah, I was thinking that, but I didn't look for it in the first column, but instead at the end, so I didn't find it. It's:
0x185e740 Object 1442e14 via /home/dbaron/builds/trunk-cc/obj/firefox-debugopt/dist/bin/components/nsSafebrowsingApplication.js(0x17b0b80 BackstagePass).gDataProvider(0x181c2c0 Object).prefs_(0x181c300 Object).observers_(0x181c340 Object).browser.safebrowsing.dataProvider
Comment 4•18 years ago
|
||
Ah, the pref observers are leaking. We should null out PROT_PhishingWarden.prefs_ in PROT_PhishingWarden.prototype.shutdown. This probably fixes bug 380837 too.
Reporter | ||
Comment 5•18 years ago
|
||
It seems scary to me that you have references into a specific document from a JS component. Shouldn't something be ensuring those references are dropped when the document goes away, rather than waiting until shutdown?
Comment 6•18 years ago
|
||
PROT_PhishingWarden.prototype.shutdown gets called on XUL window unload. It's just poorly named.
Reporter | ||
Comment 7•18 years ago
|
||
It looks (from the source) like gDataProvider is a PROT_Application, not a PROT_PhishingWarden.
Reporter | ||
Comment 8•18 years ago
|
||
er, sorry, a PROT_DataProvider.
Reporter | ||
Updated•18 years ago
|
Assignee: dbaron → nobody
Flags: blocking-firefox3?
Comment 9•18 years ago
|
||
I'm not sure how gDataProvider.prefs_ gets an observer containing PROT_PhishingWarden or XUL objects. The pref observers are all within a small scope:
http://mxr.mozilla.org/seamonkey/source/browser/components/safebrowsing/content/globalstore.js
Reporter | ||
Comment 10•18 years ago
|
||
This GC trace looks like a class variable that should be an instance variable, but I can't find the problem...
Assignee | ||
Comment 11•18 years ago
|
||
(In reply to comment #9)
> I'm not sure how gDataProvider.prefs_ gets an observer containing
> PROT_PhishingWarden or XUL objects.
Comment 0 (cited in comment 2) shows a reference path.
> The pref observers are all within a small scope:
> http://mxr.mozilla.org/seamonkey/source/browser/components/safebrowsing/content/globalstore.js
If I paste the path in comment 3 to the one in comment 0, and wrap lines, I get:
.../dist/bin/components/nsSafebrowsingApplication.js(0x17b0b80 BackstagePass)
.gDataProvider(0x181c2c0 Object)
.prefs_(0x181c300 Object)
.observers_(0x181c340 Object)
.browser.safebrowsing.dataProvider.keys_(0x185e780 Array)
.0(0x185e6c0 Function)
.__parent__(0x185e580 Call)
.__proto__(0x17e8fc0 Call)
.arguments(0x19225c0 Object)
.1(0x1921900 Object)
.phishingWarden_(0x18e7640 Object)
.tabbrowser_(0x1698780 XULElement)
.__parent__(0x1698740 XULElement)
.__parent__(0x1698700 XULElement)
.__parent__
Any old Lispers around will recognize the perils of environment capture. There's a function at index 0 in the keys_ array held by gDataProvider.prefs_.observers_ &c that is a closure: it entrains an object via its arguments array, at index 1 (the second actual arg). This is an Object with a .phishingWarden_ property referencing an Object containing a .tabbrowser_ and the rest is history.
/be
Reporter | ||
Comment 12•18 years ago
|
||
(In reply to comment #11)
> .browser.safebrowsing.dataProvider.keys_(0x185e780 Array)
This should be two lines:
.browser.safebrowsing.dataProvider(0x185e740 Object)
.keys_(0x185e780 Array)
> the rest is history.
It's still not clear; the thing added as an observer/listener (a PROT_Controller, 0x1921900, presumably mangled by BindToObject into 0x185e6c0, although I don't really understand what structure BindToObject should produce) was added to something else.
Comment 13•18 years ago
|
||
Yeah, it's unclear to me how a PROT_Controller became an argument to PROT_DataProvider.prefs_. In particular, what's __parent__ or __parent__.__proto__ on a Function? The functions added to keys_ should all have |this| (PROT_DataProvider) at arguments[1].
FWIW, it seems like this is pretty straight forward to fix: break the chain during window unload by nulling out tabbrowser_ or phishingWarden_.
Comment 14•18 years ago
|
||
(In reply to comment #13)
> Yeah, it's unclear to me how a PROT_Controller became an argument to
> PROT_DataProvider.prefs_. In particular, what's __parent__ or
> __parent__.__proto__ on a Function?
Since __parent__ in this case is a Call object, it means a closure that holds all its environment as Brendan ponted out.
Comment 15•18 years ago
|
||
(In reply to comment #3)
> Ah, I was thinking that, but I didn't look for it in the first column, but
> instead at the end, so I didn't find it. It's:
>
> 0x185e740 Object 1442e14 via
> /home/dbaron/builds/trunk-cc/obj/firefox-debugopt/dist/bin/components/nsSafebrowsingApplication.js(0x17b0b80
> BackstagePass).gDataProvider(0x181c2c0 Object).prefs_(0x181c300
> Object).observers_(0x181c340 Object).browser.safebrowsing.dataProvider
>
But what XPCOM references /home/dbaron/builds/trunk-cc/obj/firefox-debugopt/dist/bin/components/nsSafebrowsingApplication.js(0x17b0b80
BackstagePass)?
I.e. what is is the full cycle and why the cycle collector can not detect it?
Reporter | ||
Comment 16•18 years ago
|
||
It's not a cycle. That's the global scope in a JS component, which is a root, and should be.
Brendan thinks this is a JS engine bug, and gave me a long explanation yesterday which I asked him to put in the bug.
Assignee | ||
Comment 17•18 years ago
|
||
I am concerned that something is polluting a prototype object's 'arguments' slot, entraining junk from the "last get" on a Call object that's defunct, keeping stuff alive that should not be. Igor, see the __proto__ of the Call object, which is also a Call object (therefore a Call prototype?). More in a bit.
/be
Comment 18•18 years ago
|
||
To David Baron:
Can you attach the whole trace dump?
Comment 19•18 years ago
|
||
(In reply to comment #17)
> I am concerned that something is polluting a prototype object's 'arguments'
> slot, entraining junk from the "last get" on a Call object that's defunct,
> keeping stuff alive that should not be. Igor, see the __proto__ of the Call
> object, which is also a Call object (therefore a Call prototype?). More in a
> bit.
Yes, this is strange. Looks like explicit __proto__ manipulation or a bug somewhere.
Assignee | ||
Comment 20•18 years ago
|
||
(In reply to comment #19)
> Yes, this is strange. Looks like explicit __proto__ manipulation or a bug
> somewhere.
__proto__ is not defined in a Call object or its prototype -- see js_InitCallClass.
/be
Comment 21•18 years ago
|
||
This looks like a bug in SpiderMonkey. Here is a simple js shell session with dumpHeap output which shows that somehow the arguments object is managed to be stored in Call.__prototype__.
Comment 22•18 years ago
|
||
This is indeed a bug in SpiderMonkey. When the second line of the example above is excuted,
function outer() { var x = arguments; return function inner() { return x }; }
var f = outer("abc");
js_PutCallObject is eventually called, http://lxr.mozilla.org/seamonkey/source/js/src/jsfun.c#625.
At line 646 that in turn calls js_GetProperty(cx, callobj, argsid, &aval) which invokes js_NativeGet, http://lxr.mozilla.org/seamonkey/source/js/src/jsobj.c#3412
At line 3445 that in turns calls
LOCKED_OBJ_SET_SLOT(pobj, slot, *vp);
since cx->runtime->propertyRemovals == sample is true. And this exactly the line where Call.__prototype__.arguments is assigned.
Comment 23•18 years ago
|
||
Here is a test case for the bug. Since __proto__ censors Call objects, it relies on the generators finally invoked by GC so when 380469 is fixed, it will stop working.
~/m/trunk/mozilla/js/src> cat ~/m/y.js
function outer() { var x = arguments; return function inner() { return x }; }
var gen_was_closed = false;
function gen()
{
try {
yield 1;
} finally {
gen_was_closed = true;
}
}
var g = gen();
g.next();
var f = outer(g);
f = null;
g = null;
gc();
gc();
if (!gen_was_closed)
throw "leaked arguments object";
~/m/trunk/mozilla/js/src> js -v 170 ~/m/y.js
before 27696, after 27696, break 08bd8000
before 27696, after 27696, break 08bd8000
uncaught exception: leaked arguments object
Comment 24•18 years ago
|
||
Here is a test case for js shell that does not relies on generators and just measures the performance of GC to detect the leak.
Comment 25•18 years ago
|
||
The new test case that passes with the bug fixed.
Updated•18 years ago
|
Attachment #267550 -
Attachment is obsolete: true
Comment 26•18 years ago
|
||
The bug happens because arguments property for Class objects is unshared but has getter and setter defined. It means that the corresponding JSScopeProperty.slot points to a valid slot and this is the slot that caches the last getter result after each execution of call_GetProperty.
I am not sure where the logic is broken , but it seems strange in js_NativeGet to cache the getter result when obj != pobj.
Assignee | ||
Comment 27•18 years ago
|
||
The arguments property of call objects must be assignable -- you can rebind arguments to 42, e.g. Hence its lack of JSPROP_SHARED. We could try starting it out as JSPROP_SHARED in the Call prototype, and then overriding it in a Call instance for a function that mutates it with an unshared (slot-ful) property. Taking this bug.
/be
Assignee: nobody → general
Component: Phishing Protection → JavaScript Engine
Flags: blocking-firefox3?
OS: Linux → All
Priority: -- → P1
Product: Firefox → Core
QA Contact: phishing.protection → general
Hardware: PC → All
Target Milestone: --- → mozilla1.9alpha6
Reporter | ||
Updated•18 years ago
|
Flags: blocking1.9?
Comment 28•18 years ago
|
||
(In reply to comment #27)
> The arguments property of call objects must be assignable -- you can rebind
> arguments to 42, e.g. Hence its lack of JSPROP_SHARED.
But is exactly the purpose of assigning a slot with a getter result in js_NativeGet when obj != pobj? I do not see how this can be useful.
Assignee | ||
Comment 29•18 years ago
|
||
(In reply to comment #28)
> (In reply to comment #27)
> > The arguments property of call objects must be assignable -- you can rebind
> > arguments to 42, e.g. Hence its lack of JSPROP_SHARED.
>
> But is exactly the purpose of assigning a slot with a getter result in
> js_NativeGet when obj != pobj? I do not see how this can be useful.
If the property has a slot, then the getter must be a JSPropertyOp. The assumption is that the slot was allocated only for the object (pobj) that owns the scope (the JS_ASSERT(scope->object == pobj)) that contains the property (sprop).
IOW, you can't safely index into obj->dslots[sprop->slot], obj could delegate to pobj but not allocate dslots at all, or big enough to hold that slot, or with the slot available for that property (instead of being claimed by another).
This kind of entrainment is why JSPROP_SHARED was added. SHARED, PERMANENT prototype properties do not entrain, but appear to be directly contained by each delegating object (obj.hasOwnProperty('foo') is true, even though 'foo' is bound in pobj's scope). Hence the idea for a fix suggested in comment 27.
/be
Assignee: general → brendan
Comment 30•18 years ago
|
||
(In reply to comment #29)
> > But is exactly the purpose of assigning a slot with a getter result in
> > js_NativeGet when obj != pobj? I do not see how this can be useful.
>
> If the property has a slot, then the getter must be a JSPropertyOp.
...
> IOW, you can't safely index into obj->dslots[sprop->slot],
...
But then js_NativeGet should assert that pobj == slot in the case of valid slots. I.e. if object does not want the shared slot, then it must ensure that the slot is created in the resolve hook.
And this is exactly what call_resolve missed to do.
Also AFAICS for sprop with getter and slot == -1 there is no need to lock the object the second time in js_NativeGet as sprop->slot can not change to something valid during the execution of getter.
Assignee | ||
Comment 31•18 years ago
|
||
(In reply to comment #30)
> But then js_NativeGet should assert that pobj == slot
pobj == obj, right?
> in the case of valid
> slots. I.e. if object does not want the shared slot, then it must ensure that
> the slot is created in the resolve hook.
The API is old and there are legitimate uses of slot-ful (not JSPROP_SHARED) prototype properties, so I am loath to add such an assert.
> And this is exactly what call_resolve missed to do.
Patch in a second with a simpler fix.
> Also AFAICS for sprop with getter and slot == -1 there is no need to lock the
> object the second time in js_NativeGet as sprop->slot can not change to
> something valid during the execution of getter.
js_NativeGet's caller expects the (pobj's) scope to be locked on return.
/be
Status: NEW → ASSIGNED
Assignee | ||
Comment 32•18 years ago
|
||
Igor, both testcases from this bug that you provided pass for me without this patch, so I'm not sure how to test in the shell. But it seems clear the rogue call that gets the current frame's argsobj and stuffs it into Call prototype's slot for the proto-property named 'arguments' is the one in js_PutCallObject.
The interaction with override bits is delicate, so I'm trying this minimal patch.
/be
Attachment #268309 -
Flags: review?(igor)
Comment 33•18 years ago
|
||
Comment on attachment 268309 [details] [diff] [review]
proposed fix
Right, the removal of js_GetProperty call is an effective cure for the problem.
Attachment #268309 -
Flags: review?(igor) → review+
Assignee | ||
Comment 34•18 years ago
|
||
Fixed on trunk:
js/src/jsfun.c 3.187
/be
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Comment 35•18 years ago
|
||
The last patch is not enough since one has to prevent any call js_GetProperty("arguments"). So here is a new test case that demonstrates the leak even with the patch committed:
~/m/trunk/mozilla/js/src $ cat ~/s/t2.js
function outer() {
var x;
with ({}) {
x = arguments;
}
return function inner() { return x };
}
var gen_was_closed = false;
function gen()
{
try {
yield 1;
} finally {
gen_was_closed = true;
}
}
var g = gen();
g.next();
var f = outer(g);
f = null;
g = null;
gc();
gc();
if (!gen_was_closed)
throw "leaked arguments object";
~/m/trunk/mozilla/js/src $ ./Linux_All_DBG.OBJ/js -v 170 ~/s/t2.js
before 27696, after 27696, break 08832000
before 27696, after 27696, break 08832000
uncaught exception: leaked arguments object
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 36•18 years ago
|
||
I think the best way to fix the bug would be to remove arguments and __callee__ from the Call.prototype and instead add them as necessary in call_resolve.
Comment 38•18 years ago
|
||
This is a test case for js shell that does not use generators and exposes the problem even with the last fix committed. Its core is essentially:
function f(obj) {
with (obj)
return arguments;
}
This triggers js_NativeGet for arguments through name lookup causing the leak through the prototype object.
Assignee | ||
Comment 39•18 years ago
|
||
Using the resolve hook is a good idea. We can avoid all the overriding tests too by not resolving when assigning.
/be
Status: REOPENED → ASSIGNED
Assignee | ||
Comment 40•18 years ago
|
||
The first fix was a fine change on its own. This is the call_resolve addition. We still need the override flag magic in order to handle
js> function f(){
let a=arguments; print(a[0]); arguments=42; print(arguments); return a
}
js> f(33)[0]
33
42
33
/be
Attachment #268550 -
Flags: review?(igor)
Comment 41•18 years ago
|
||
Comment on attachment 268550 [details] [diff] [review]
fix, part 2
With this change there is no point to keep arguments defined in the prototype.
Assignee | ||
Comment 42•18 years ago
|
||
Oops, forgot to lose that JSPropertySpec. Thanks,
/be
Attachment #268550 -
Attachment is obsolete: true
Attachment #268559 -
Flags: review?(igor)
Attachment #268550 -
Flags: review?(igor)
Updated•18 years ago
|
Attachment #268559 -
Flags: review?(igor) → review+
Assignee | ||
Comment 43•18 years ago
|
||
Fixed harder on trunk:
js/src/jsfun.c 3.188
/be
Status: ASSIGNED → RESOLVED
Closed: 18 years ago → 18 years ago
Resolution: --- → FIXED
Comment 44•18 years ago
|
||
/cvsroot/mozilla/js/tests/ecma_3/Function/arguments-002.js,v <-- arguments-002.js
initial revision: 1.1
/cvsroot/mozilla/js/tests/js1_5/GC/regress-383269-01.js,v <-- regress-383269-01.js
initial revision: 1.1
Checking in js1_5/GC/regress-383269-02.js;
/cvsroot/mozilla/js/tests/js1_5/GC/regress-383269-02.js,v <-- regress-383269-02.js
initial revision: 1.1
arguments-002.js fails in the trunk with
Expected value '33,42:33', Actual value '33,[object Object]:33'
Flags: in-testsuite+
Comment 45•17 years ago
|
||
verified fixed 1.9.0 linux/mac*/windows js1_5/GC/regress-383269-0[12].js.
filing bug 396584 for ecma_3/Function/arguments-002.js
Status: RESOLVED → VERIFIED
Comment 46•17 years ago
|
||
Igor: I get occasional failures on windows vms with js1_5/GC/regress-383269-01.js where the times will be similar to base_gc_time=0, last_gc_time=15. The problem with the windows vms are that they have variable performance depending upon the load being imposed on the host by the other vms.
I noticed that js1_5/GC/regress-383269-01.js tests time > (base_time + 1) * 3 while js1_5/GC/regress-383269-02.js tests time > (base_time + 10) * 3. Should the test in -01, be the same as in -02? Is there another adjustment that can be made to make this more reliable?
Comment 47•16 years ago
|
||
update test to handle uncaught exception
http://hg.mozilla.org/mozilla-central/rev/81d6d6ba55e1
Comment 48•15 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/25a8deec5823
/cvsroot/mozilla/js/tests/performance-1.8.0.tests,v <-- performance-1.8.0.tests
new revision: 1.4; previous revision: 1.3
/cvsroot/mozilla/js/tests/performance-1.8.1.tests,v <-- performance-1.8.1.tests
new revision: 1.4; previous revision: 1.3
/cvsroot/mozilla/js/tests/performance-1.9.0.tests,v <-- performance-1.9.0.tests
new revision: 1.4; previous revision: 1.3
/cvsroot/mozilla/js/tests/performance-1.9.1.tests,v <-- performance-1.9.1.tests
new revision: 1.2; previous revision: 1.1
/cvsroot/mozilla/js/tests/performance-1.9.2.tests,v <-- performance-1.9.2.tests
new revision: 1.2; previous revision: 1.1
/cvsroot/mozilla/js/tests/performance.tests,v <-- performance.tests
new revision: 1.3; previous revision: 1.2
You need to log in
before you can comment on or make changes to this bug.
Description
•