Closed Bug 523793 Opened 15 years ago Closed 15 years ago

TM: Crash on trace with unexpected scope chain layout

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
status1.9.2 --- final-fixed
status1.9.1 --- unaffected

People

(Reporter: gkw, Assigned: dvander)

References

Details

(Keywords: crash, regression, testcase, Whiteboard: [ccbr] fixed-in-tracemonkey)

Attachments

(3 files, 5 obsolete files)

Attached file stacktraces along with valgrind output (deleted) β€”
(function() {
    let(x)
    (function() {
        for (let a in [0, x, 0, 0])
        (function() {
            for (let y in [0, 0]) print
        })();
    })()
    with({}) throw x;
})()

crashes js opt shell with -j on TM tip (34057:bd1b27a9da16) at 0x002efe8a and crashes js debug shell with -j on TM tip at 0x005a1e8a.

Turning security-sensitive because of scary addresses.

autoBisect shows this is probably related to bug 505591:

The first bad revision is:
changeset:   32065:1f50f79db6e7
user:        David Mandelin
date:        Thu Aug 27 15:40:37 2009 -0700
summary:     Bug 505591: trace JSOP_NAME for returned closures, r=dvander

Nominating blocking-1.9.2 because bug 505591 has landed on 1.9.2.
Flags: blocking1.9.2?
Actually Mac Crash Reporter reports them crashing at null, but the top line of the stacks seem to be memory addresses though.
Whiteboard: [ccbr]
scopeChainProp() sees |obj == globalObj| (shape 194) and calls traverseScopeChain() with its parent. |parent| is a Block object with shape 170, and its parent is a heavyweight Call object with shape 175.

so we emit:
  1: load(argv[-1])    ;args1, |obj|
  2: load(1, <parent>) ;obj.fslots[parent]
  3: load(2, <parent>) ;obj.fslots[parent].fslots[parent]
  4: load(3, <map>)    ;obj.fslots[parent].fslots[parent].map
  5: load(4, <shape)   ;obj.fslots[parent].fslots[parent].map.shape
  6: guard(shape == 175)

the next three times the trace runs, this all runs smoothly. on the fourth run, |obj|'s parent is the global object. grabbing the parent (instruction #3) gets a NULL back, and instruction 4's deref crashes.
Flags: blocking1.9.2? → blocking1.9.2+
Assignee: general → dvander
for (j = 0; j < 3; j++) {}
m = [];
m.concat();
n = [];
n.concat([]);
Function("\
  for (i = 0; i < 8; i++)\
  ((function f1(b, c) {\
  if (c) {\
    return (gc)()\
  }\
  f1(b, 1);\
  ((function f2(d, e) {\
    return d.length == e ? 0 : d[e] + f2(d, e + 1)\
  })([{}, /x/, /x/], 0))\
  })())\
")()


crashes both js opt and debug shells with -j with similar stacks. This one concerns concat and gc too.
Summary: TM: Crash [@ 0x002efe8a] or [@ 0x005a1e8a] → TM: Crash on trace with unexpected scope chain layout
(In reply to comment #3)

I can't get this one to crash on 1.9.2 or trunk, Mac 10.6.
(In reply to comment #4)
> (In reply to comment #3)
> 
> I can't get this one to crash on 1.9.2 or trunk, Mac 10.6.

Still crashes for me on 10.5.8, js debug and opt shell with -j at changeset http://hg.mozilla.org/tracemonkey/rev/970cd5f87f42

===

Exception Type:  EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x000000000000001c
Crashed Thread:  0

Thread 0 Crashed:
0   ???                           	0x00513c4c 0 + 5323852
1   js-dbg-tm-darwin              	0x00147e4f __ZL11ExecuteTreeP9JSContextP12TreeFragmentRjPP10VMSideExit + 1585
2   js-dbg-tm-darwin              	0x00151d20 js_MonitorLoopEdge(JSContext*, unsigned int&, RecordReason) + 1040
3   js-dbg-tm-darwin              	0x00089a5b js_Interpret + 93369
4   js-dbg-tm-darwin              	0x0009c39d js_Execute + 1169
5   js-dbg-tm-darwin              	0x00011892 JS_ExecuteScript + 54
6   js-dbg-tm-darwin              	0x0000a8c7 __ZL7ProcessP9JSContextP8JSObjectPci + 1347
7   js-dbg-tm-darwin              	0x0000b290 __ZL11ProcessArgsP9JSContextP8JSObjectPPci + 2272
8   js-dbg-tm-darwin              	0x0000b65d main + 953
9   js-dbg-tm-darwin              	0x0000245b _start + 209
10  js-dbg-tm-darwin              	0x00002389 start + 41
Attached patch possible fix (obsolete) (deleted) β€” β€” Splinter Review
The purpose of traverseScopeChain() is to guard against properties being shadowed by call objects. The problem is that block objects are lazily reified, which means the layout of the scope chain can vary if the full thing is not required (i.e., null or flat closure). This is even more true on trace, since it is extremely difficult (and not very useful) to reify the scope chain on trace, null and flat closures simply get the global object instead.

What happens then is that the code we emitted expects block objects, and later crashes.

Brendan, Dave Mandelin, and Luke sat down with me to fish out everything going on here and come up with a solution. What matters is whether or not there is a call object with a heavyweight function.

If there's a heavyweight on the scope chain, we cannot yet accept block objects, and abort the trace. If there are no heavyweights on the scope chain, the final object must be the global object, and in fact only the global object that the trace is invariant to (since we don't cross global scopes). In this case the block objects don't matter, and we get a nice optimization: the global object can be burned into the trace directly.
Attachment #411806 - Flags: review?(brendan)
(In reply to comment #3)
> for (j = 0; j < 3; j++) {}
> m = [];
> m.concat();
> n = [];
> n.concat([]);
> Function("\
>   for (i = 0; i < 8; i++)\
>   ((function f1(b, c) {\
>   if (c) {\
>     return (gc)()\
>   }\
>   f1(b, 1);\
>   ((function f2(d, e) {\
>     return d.length == e ? 0 : d[e] + f2(d, e + 1)\
>   })([{}, /x/, /x/], 0))\
>   })())\
> ")()
> 
> 
> crashes both js opt and debug shells with -j with similar stacks. This one
> concerns concat and gc too.


Spun off as bug 528048.
Attachment #411806 - Flags: review?(dmandelin)
Comment on attachment 411806 [details] [diff] [review]
possible fix

Requesting dmandelin's r? too since I'm touching his code.
Comment on attachment 411806 [details] [diff] [review]
possible fix

As discussed, these lines should go:

+            } else if (cls == &js_WithClass) {
+                /* If there's a with object, there must be a call object. Early-exit. */
+                break;


r+ with that.
Attachment #411806 - Flags: review?(dmandelin) → review+
Attached patch w/ dmandelin's comment (obsolete) (deleted) β€” β€” Splinter Review
Attachment #411806 - Attachment is obsolete: true
Attachment #411850 - Flags: review?(brendan)
Attachment #411806 - Flags: review?(brendan)
Attached patch 1.9.2 patch (obsolete) (deleted) β€” β€” Splinter Review
Comment on attachment 411851 [details] [diff] [review]
1.9.2 patch

>+++ b/js/src/jstracer.cpp
>+     * Scope chains are often left "incomplete", and reified lazily when necessary, since doing so
>+     * is expensive. When creating null and flat closures on trace (the only kinds supported),
>+     * the global object is hardcoded as the parent, since reifying the scope chain on trace
>+     * would be extremely difficult and ultimately unnecessary.

Say more, specifically "difficult (because it could require reifying stack frames)"?

>+     *
>+     * The problem, as exposed by bug 523793, is that this means creating a fixed traversal on
>+     * trace can be inconsistent and incorrect.

I suggest "inconsistent with the shorter scope chain used when executing a trace."

>+    bool foundCallObj = false;
>+    bool foundBlockObj = false;
>+    JSObject* searchObj = obj;

Nit: obj and obj2 are kinda terse, suggest targetObj for obj2 -- or else use tmp here, we do that in similar loops where tmp ranges from obj to obj2 (e.g., js_FillPropertyCache if memory serves). The shorter names are canonical and if you get used to them they are ok. Your call.

>+
>+    for (;;) {
>+        if (searchObj != globalObj) {
>+            JSClass* cls = STOBJ_GET_CLASS(searchObj);
>+            if (cls == &js_BlockClass) {
>+                foundBlockObj = true;
>+            } else if (cls == &js_CallClass &&
>+                       JSFUN_HEAVYWEIGHT_TEST(js_GetCallObjectFunction(searchObj)->flags)) {
>+                foundCallObj = true;
>+            }
>+        }
>+
>+        if (searchObj == obj2)
>+            break;
>+
>+        searchObj = STOBJ_GET_PARENT(searchObj);
>+        if (!searchObj)
>+            ABORT_TRACE("target object not reached on scope chain");

This (!searchObj) condition "can't happen" and should be asserted -- we do not check for such bugs in other consumers of js_FindPropertyHelper's results.

>+    }
>+
>+    if (foundBlockObj && foundCallObj)
>+        ABORT_TRACE("cannot traverse this scope chain on trace");
>+
>+    if (!foundCallObj) {
>+        JS_ASSERT(obj2 == globalObj);
>+        obj2_ins = INS_CONSTPTR(globalObj);
>+        return JSRS_CONTINUE;
>+    }
>+

Transpose these two paragraphs and you don't need the && foundCallObj test for the if (foundBlockObj...) ABORT... para's condition.

r=me with these addressed.

/be
Attachment #411851 - Flags: review+
Comment on attachment 411850 [details] [diff] [review]
w/ dmandelin's comment

Oops, see my comments on the "1.9.2 patch" (attachment 411851 [details] [diff] [review]).

/be
Attachment #411850 - Flags: review?(brendan) → review+
Attached patch 1.9.2 patch with nits fixed (obsolete) (deleted) β€” β€” Splinter Review
carrying over r+ w/ nits picked. I chose "target" over "obj2".
Attachment #411851 - Attachment is obsolete: true
Attachment #411873 - Flags: review+
Attached patch trunk patch, w/ nits (obsolete) (deleted) β€” β€” Splinter Review
Attachment #411850 - Attachment is obsolete: true
Attachment #411874 - Flags: review+
http://hg.mozilla.org/tracemonkey/rev/193b41b37d17
Whiteboard: [ccbr] → [ccbr] fixed-in-tracemonkey
Backed out: http://hg.mozilla.org/tracemonkey/rev/15608b1f23b9

This causes a regression on bug 509639's test case. traverseScopeChain is getting passed the callee's parent, not the callee, and misses a link in the scope chain. Taking out the ABORT_TRACE exposed the problem.

Will investigate tomorrow.
Whiteboard: [ccbr] fixed-in-tracemonkey → [ccbr]
After talking to dmandelin on Friday, the reason the ABORT_TRACE was there is because the property doesn't necessarily have to be there - we traverse a different chain than js_FindProperty returns.

We call js_FindProperty with the scope chain, but on trace, we don't have scope chains. The next best thing is the callee. In the test case in bug 509639, the scope chain is a Call object with property "c", and the callee is a Function object which does not have property "c". So traverseScopeChain() should determine that we can't deal with this binding on trace, and abort.

But after talking to Brendan it wasn't totally clear whether it was safe to assume that if we find a property off the callee, it won't later be shadowed by something on the scope chain.
(In reply to comment #18)
> But after talking to Brendan it wasn't totally clear whether it was safe to
> assume that if we find a property off the callee, it won't later be shadowed by
> something on the scope chain.

AFAIK, inside a heavyweight function g nested in f, at the point of a JSOP_NAME or similar, the scope chain looks like this, where things in [] are optional.

   [ block/with ] -> Call_g -> [ block/with/declenv ] -> Call_f -> ...
                                                           ^
                                         callee_g ---------+

I believe we are "already" (i.e., in the patch posted here) aborting if the second optional group is present. And I think we abort if a with is present in the first optional group. 

The other thing to note is that a shadowing hazard (I think) can cause a problem only if it happens while executing |g| before the point of the NAME operation. It should be hard for bad stuff to happen on trace, so I think we only have to worry about what happens if the trace starts inside |g| and bad stuff happened in the interpreter before that.

The hazards seem to be:

1. Adding a property to |g|'s local scope using |eval|.

2. Creating a shadowing variable with a let block inside |g|. I think this should not be a concern because let blocks are static so name resolution would not be able to differ at recording and trace execution times.

I think (1) is handled OK mostly because this test case works identically with jit on or off:

c = [ '1', '1', 'var j = 72' ];

function f(i) {
  var j = 50;

  var g = function(i) {
    eval(c[i]);

    for (var k = 0; k < 5; ++k) {
      print(j);
    }
  }

  g(i);
}

f(0);
f(1);
f(2);


But also, I think the creation of the local forces a reshape of all the Call object scopes going up, and so we correctly exit on one of the shape guards. There was a test case jorendorff came up with this summer that exposed a related bug, which I think motivated the traverseScopeChain stuff, and it was fixed then.

Anyway, hopefully this either explains why it's all OK, or brings you up to the point where it's easier to see how it's all horribly wrong. :-)
http://hg.mozilla.org/mozilla-central/rev/193b41b37d17
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
oops, still open
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch final fix (deleted) β€” β€” Splinter Review
This is the above patch, with the original abort-if-target-object-not-on-callee-parent-chain added back in.

Blake, could you take a look at comment #19 and the patch for a hopefully final sanity check?
Attachment #411873 - Attachment is obsolete: true
Attachment #411874 - Attachment is obsolete: true
Attachment #415510 - Flags: review?(mrbkap)
Attachment #415510 - Flags: review?(mrbkap) → review+
http://hg.mozilla.org/tracemonkey/rev/21b06416fa71
Whiteboard: [ccbr] → [ccbr] fixed-in-tracemonkey
this is new with this checkin:

s: moz2-darwin9-slave20REFTEST TEST-UNEXPECTED-FAIL | file:///builds/slave/tracemonkey-macosx-unittest-everythingelse/build/jsreftest/tests/jsreftest.html?test=js1_5/Regress/regress-3649-n.js | timed out waiting for onload to fire
This test looks like it loops until it runs out of memory... that seems like something that could very easily time out. Looking at the build log the test case printed the right message too?

Am I missing something? I can try backing out and pushing again, or triggering a build to see if it's erroneous.
I don't think backing out is necessary, but the change in behavior is something to consider. It was already pretty flaky on windows and linux but well behaved on mac. I'll mark it skip altogether.
It is possible that the test failure is related to bug 533148.
Attached patch 1.9.2 fix (deleted) β€” β€” Splinter Review
http://hg.mozilla.org/mozilla-central/rev/21b06416fa71
Status: REOPENED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/04ccc3cdaa53
Depends on: CVE-2010-0165
Group: core-security
Automatically extracted testcase for this bug was committed:

https://hg.mozilla.org/mozilla-central/rev/efaf8960a929
Flags: in-testsuite+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: