Closed
Bug 600139
Opened 14 years ago
Closed 14 years ago
Delayed marking can skip marking live objects
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
VERIFIED
FIXED
Tracking | Status | |
---|---|---|
blocking2.0 | --- | betaN+ |
People
(Reporter: gkw, Assigned: gwagner)
References
Details
(Keywords: assertion, regression, testcase, Whiteboard: [sg:critical?] fixed-in-tracemonkey)
Attachments
(2 files, 1 obsolete file)
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
igor
:
review+
|
Details | Diff | Splinter Review |
var count = 0;
var x = Proxy.create(
{
has: function (name) {
++count;
if (count == 320) gczeal(2);
print(count);
return name in x;
}
},
{}
);
3 in x;
asserts js debug shell on TM changeset 54700fad8cf9 with -m at Assertion failure: addr % sizeof(FreeCell) == 0, when passed as a CLI argument.
s-s because this concerns gczeal.
Many thanks go out to Jesse for helping to reduce this.
autoBisect shows this is probably related to the following changeset: (not sure if it really is related)
The first bad revision is:
changeset: 54483:1c913526c597
user: Gregor Wagner
date: Fri Sep 24 10:54:39 2010 -0700
summary: Bug 558861 - Compartmental GC (r=gal)
Reporter | ||
Updated•14 years ago
|
blocking2.0: --- → ?
Comment 1•14 years ago
|
||
This testcase is fragile. Small changes can cause it to switch between these two assertions or not happen at all.
* Assertion failure: !JSID_IS_VOID(lastProp->id), at jsscope.h:678
* Assertion failure: addr % sizeof(FreeCell) == 0, at jsgc.h:312
Valgrind sheds no light.
Blocks: JaegerFuzz
Comment 2•14 years ago
|
||
It's crashing while trying to create an exception object to report the too-much-recursion condition. I see some 0xdadadada near the top of the stack.
Reporter | ||
Updated•14 years ago
|
Summary: JM: "Assertion failure: addr % sizeof(FreeCell) == 0," with gczeal → JM: "Assertion failure: addr % sizeof(FreeCell) == 0," or "Assertion failure: !JSID_IS_VOID(lastProp->id)," with gczeal
Updated•14 years ago
|
Assignee: general → anygregor
Comment 3•14 years ago
|
||
What configuration/os did you test with?
Comment 4•14 years ago
|
||
Definitely a blocker for final. This is a GC hazard.
Reporter | ||
Comment 5•14 years ago
|
||
(In reply to comment #3)
> What configuration/os did you test with?
32-bit js debug shell on Ubuntu 10.04 is what I tested with..
Comment 6•14 years ago
|
||
I also tested with a 32-bit shell (Mac OS X 10.5).
Updated•14 years ago
|
blocking2.0: ? → betaN+
Assignee | ||
Comment 7•14 years ago
|
||
I can reproduce it now on linux. The expected output is a "too much recursion" error right?
Maybe the stack size or loopcount is a hint that something goes wrong in the jit?
On mac the loop count goes always up to 68 (interpreter, -j, -m) and returns.
On linux the loopcount is between 139-142 for interpreter and -j and returns normally. It's about 467 when it crashes with -m.
Jesse I don't see the output numbers for your stack. Are they the same?
Assignee | ||
Comment 8•14 years ago
|
||
The reason why we crash is that we are marking an object that was finalized during the previous GC run. We also hit the delay marking children point very often for each GC right before we crash but the amount of actual objects is also very limited. There is not much stack left for the GC.
Assignee | ||
Comment 9•14 years ago
|
||
We perform a few GC's way above the stack limit and go into the delayMarking mode with the very first object to mark.
Limit: bf87cee1, we start the GC at: 0xbf87c5ec
Maybe we should avoid this and set a minimum amount of stackspace that is needed for a GC. A small check but big security improvement!
Igor what do you think?
But this has not much todo with this bug. I don't know why the methodjit doesn't stop earlier. Maybe somebody else wants to take a look?
Comment 10•14 years ago
|
||
(In reply to comment #9)
> We perform a few GC's way above the stack limit and go into the delayMarking
> mode with the very first object to mark.
> Limit: bf87cee1, we start the GC at: 0xbf87c5ec
> Maybe we should avoid this and set a minimum amount of stackspace that is
> needed for a GC.
If delayMarking starts immediately, then we have problems elsewhere as some code allowed to breach the stack limit.
> A small check but big security improvement!
Could you clarify what exactly do you mean by the check and the security improvement?
Assignee | ||
Comment 11•14 years ago
|
||
> > A small check but big security improvement!
>
> Could you clarify what exactly do you mean by the check and the security
> improvement?
I don't know what the right thing is (perform no GC, report too much recursion...) but we should definitely not start a GC outside of the stack boundaries. We should check this when we start the GC.
(In reply to comment #9)
How much of the contiguous slab is needed for GC? (Why is any needed at all?)
Comment 13•14 years ago
|
||
(In reply to comment #11)
> I don't know what the right thing is (perform no GC, report too much
> recursion...) but we should definitely not start a GC outside of the stack
> boundaries. We should check this when we start the GC.
If we call the GC outside the quota, it implies that we have bigger problems to worry about as we could not bound the stack growth and the stack overflow is exploitable on some platforms. So we should just assert at js_GC and a few other places that quota is OK so we can find bugs faster.
Updated•14 years ago
|
Whiteboard: [sg:critical?]
Assignee | ||
Comment 14•14 years ago
|
||
Filed bug 600310 for the GC outside of stack quota problem but the over recursion seems to be another problem that is not GC related.
Updated•14 years ago
|
Assignee: anygregor → general
Comment 15•14 years ago
|
||
Spun of the GC-related part as 600310. Need a new owner since this is not GC related and JM-only (stack overflow).
Updated•14 years ago
|
Assignee: general → dvander
(In reply to comment #14)
What exactly is the over recursion problem? JM uses much less stack than js_Interpret, so it makes sense that it'd take longer to reach the limit. And it's not going over the limit.
Comment 17•14 years ago
|
||
we're hitting this on the tinderbox in jquery tests.
Assignee | ||
Comment 18•14 years ago
|
||
(In reply to comment #16)
> (In reply to comment #14)
> What exactly is the over recursion problem? JM uses much less stack than
> js_Interpret, so it makes sense that it'd take longer to reach the limit. And
> it's not going over the limit.
Where do you see it's not over the limit? I think its over the limit.
Assignee | ||
Comment 19•14 years ago
|
||
(In reply to comment #17)
> we're hitting this on the tinderbox in jquery tests.
Do you mean bug 583554? They are not related.
Comment 20•14 years ago
|
||
The jquery orange is bug 600622.
Assignee | ||
Comment 21•14 years ago
|
||
(In reply to comment #20)
> The jquery orange is bug 600622.
bug 600622 seems to be bug 583554 but this bug is still different.
Assignee | ||
Comment 22•14 years ago
|
||
David you should hit now the assertion from bug 600310. I guess the title is no longer valid?
Yup, thanks.
Summary: JM: "Assertion failure: addr % sizeof(FreeCell) == 0," or "Assertion failure: !JSID_IS_VOID(lastProp->id)," with gczeal → JM: Assertion failure, JS_CHECK_STACK_SIZE in jsgc.cpp
Comment 24•14 years ago
|
||
Assertion failure: JS_CHECK_STACK_SIZE(cx->stackLimit, &stackDummy),
at js/src/jsgc.cpp:2680
Summary: JM: Assertion failure, JS_CHECK_STACK_SIZE in jsgc.cpp → JM: "Assertion failure: JS_CHECK_STACK_SIZE(cx->stackLimit, &stackDummy)" in jsgc.cpp
Summarizing of findings so far:
1. The new assert is not a JM bug. It's somewhat bogus but I don't know how to fix it. The problem is that js_ReportOverRecursed wants to allocate an exception object, and this can cause a GC. JM is correctly throwing an error, and the GC doesn't account for this edge case.
2. The original assert actually went away when bug 589398 landed.
3. Gregor points out that *some* bug still exists, with an optimized, gc-zeal enabled build, we iloop marking a shape which is its own parent.
I've rolled my tree back before bug 589398 and I'm now debugging the original assert. I don't think that bug ever really went away, it's just hiding.
Updated•14 years ago
|
Summary: JM: "Assertion failure: JS_CHECK_STACK_SIZE(cx->stackLimit, &stackDummy)" in jsgc.cpp → JM: GC hazard with gczeal, proxies, recursion
Okay, bug 589398 "fixed" the problem by accidentally disabling the empty script optimization, which made some method JIT where it previously didn't. Will file another bug on this shortly.
It looks like we miss marking a funobj so its slot values get collected, and the next GC *does* mark it and crashes trying to dereference the values.
Aha. The bug is in the delayed marking code, best visualized with a call stack:
#0 js::GCMarker::delayMarkingChildren (this=0xfff83708, thing=0xf7503050) at ../jsgc.cpp:1410
#1 0x080dd196 in TypedMarker (trc=0xfff83708, thing=0xf7503050) at ../jsgcinlines.h:285
#2 0x080e11e2 in Mark<JSObject> (trc=0xfff83708, thing=0xf7503050) at ../jsgcinlines.h:148
#3 0x080dcda8 in MarkObject (trc=0xfff83708, obj=..., name=0x8303fc9 "proto") at ../jsgcinlines.h:184
#4 0x080dcedc in MarkChildren (trc=0xfff83708, fun=0xf7503eb0) at ../jsgcinlines.h:214
#5 0x080e6fa2 in js::gc::Arena<JSFunction>::markDelayedChildren (this=0xf7503000, trc=0xfff83708) at ../jsgc.cpp:1438
We're marking delayed children in arena X, and this recursively causes a wider range of X to need delayed marking. However, markDelayedChildren() clears the |link| field, so the range is reset.
Gregor, I'm not entirely comfortable attempting a fix here, are you willing to take this?
Updated•14 years ago
|
Summary: JM: GC hazard with gczeal, proxies, recursion → JM: Delayed marking can skip marking live objects
Updated•14 years ago
|
Summary: JM: Delayed marking can skip marking live objects → Delayed marking can skip marking live objects
Comment 28•14 years ago
|
||
(In reply to comment #27)
> Aha. The bug is in the delayed marking code, best visualized with a call stack:
> We're marking delayed children in arena X, and this recursively causes a wider
> range of X to need delayed marking.
Nice find - the arena should be popped of the delayed marking stack before doing any marking in it.
Assignee | ||
Comment 29•14 years ago
|
||
>
> We're marking delayed children in arena X, and this recursively causes a wider
> range of X to need delayed marking. However, markDelayedChildren() clears the
> |link| field, so the range is reset.
>
> Gregor, I'm not entirely comfortable attempting a fix here, are you willing to
> take this?
Uh that's an evil one! Nice work!
Assignee | ||
Updated•14 years ago
|
Assignee: dvander → anygregor
Assignee | ||
Comment 30•14 years ago
|
||
Comment 31•14 years ago
|
||
Comment on attachment 481700 [details] [diff] [review]
patch
>diff -r a4383c16a9b5 js/src/jsgc.cpp
>--- a/js/src/jsgc.cpp Thu Oct 07 15:35:08 2010 -0500
>+++ b/js/src/jsgc.cpp Thu Oct 07 16:22:25 2010 -0700
>@@ -1434,23 +1434,26 @@ Arena<T>::markDelayedChildren(JSTracer *
> thing++;
> }
> }
>
> void
> GCMarker::markDelayedChildren()
> {
> while (Arena<Cell> *a = unmarkedArenaStackTop) {
>+ MarkingDelay *markingDelay = a->getMarkingDelay();
>+ unmarkedArenaStackTop = (markingDelay->link != a)
>+ ? markingDelay->link
>+ : NULL;
markingDelay->link reset must also be moved here. The code should also comment on exactly what is going here noting that Arena<T>::markDelayedChildren reads getMarkingDelay()->start before it can be affected by the following marking.
Assignee | ||
Comment 32•14 years ago
|
||
I forgot that we are not pushing the arena on the stack if markingdelay->link is still set. That's some tricky code there.
Thx!
Attachment #481700 -
Attachment is obsolete: true
Assignee | ||
Updated•14 years ago
|
Attachment #481904 -
Flags: review?(igor)
Comment 33•14 years ago
|
||
Comment on attachment 481904 [details] [diff] [review]
patch
I realized that another possibility to fix this bug is to change the loop in Arena<T>::markDelayedChildren into something like:
while ((T *)getMarkingDelay()->start <= thingsEnd) {
if ((T *)getMarkingDelay()->start->asCell()->isMarked())
MarkChildren(trc, thing);
(T *)getMarkingDelay()->start++;
}
This way if the MarkChildren would need to delay marking for the arena, it would just reset the start. This would roughly correspond to what the code was doing before the compartment changes. But it was hard to follow the older logic, so lets stick to a simple solution.
>diff -r 7b037f68bccf js/src/jsgc.cpp
>--- a/js/src/jsgc.cpp Thu Oct 07 18:59:18 2010 -0700
>+++ b/js/src/jsgc.cpp Fri Oct 08 12:56:37 2010 -0700
>@@ -1435,22 +1435,31 @@ Arena<T>::markDelayedChildren(JSTracer *
> }
> }
>
> void
> GCMarker::markDelayedChildren()
> {
> while (Arena<Cell> *a = unmarkedArenaStackTop) {
> /*
>- * The following assert verifies that the current arena belongs to the
>- * unmarked stack, since DelayMarkingChildren ensures that even for
>- * the stack's bottom, prevUnmarked != 0 but rather points to
>- * itself.
>- */
>+ * Get next arena from delayed marking stack.
Nit: drop this sentence as it just rephrases the code.
>+ * markingDelay->link == current arena indicates last arena on stack.
>+ * If marking gets delayed at the same arena again, the arena is pushed
>+ * again in delayMarkingChildren. markingDelay->link has to be cleared,
>+ * otherwise the arena is not pushed again.
>+ */
Nit: it looks like the comment is not properly aligned.
I also think we should simplify the code and replace that markingDelay->link == current hack with a special sentinel address like 1 to denote the bottom of the stack. But this is for another bug.
Attachment #481904 -
Flags: review?(igor) → review+
Assignee | ||
Comment 34•14 years ago
|
||
Whiteboard: [sg:critical?] → fixed-in-tracemonkey
Updated•14 years ago
|
Whiteboard: fixed-in-tracemonkey → [sg:critical?] fixed-in-tracemonkey
Comment 35•14 years ago
|
||
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 36•13 years ago
|
||
JSBugMon: This bug has been automatically verified fixed.
Comment 37•13 years ago
|
||
JSBugMon: This bug has been automatically verified fixed.
Updated•10 years ago
|
Group: core-security
You need to log in
before you can comment on or make changes to this bug.
Description
•