Closed Bug 1813533 Opened 2 years ago Closed 2 years ago

Assertion failure: framePtr->hasCachedSavedFrame() || framePtr->isRematerializedFrame(), at js/src/vm/SavedStacks.cpp:1448

Categories

(Core :: JavaScript Engine, defect, P3)

defect

Tracking

()

RESOLVED FIXED
111 Branch
Tracking Status
firefox111 --- fixed

People

(Reporter: lukas.bernhard, Assigned: iain)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file, 2 obsolete files)

Steps to reproduce:

The attached sample crashes the js-shell on commit fb1e6d6e5735dcf12d96fde70351aca305961b53 when invoked via commandline obj-x86_64-pc-linux-gnu/dist/bin/js --fuzzing-safe crash.js.

crash.js

async function f0() {
    await undefined;
    function f6(a7, a8) {
        const v10 = a8.__proto__;
        v10.principal = v10;
        v10.__proto__ = this.newGlobal(f6);
    }
    new Promise(f6);
    Reflect.toString.captureFirstSubsumedFrame(Int32Array.Debugger);
}
f0();

for (const v21 in this) {
    let v22 = this[v21];
    try {
        while (v22 < v21) {
            v22++;
        }   
        v22();
    }   
    catch(e25) { } 
}
::SavedStacks::insertFrames (this=0x7ffff7409570, cx=<optimized out>, frame=..., capture=...)
    at js/src/vm/SavedStacks.cpp:1447
#1  0x00005555573a5917 in js::SavedStacks::saveCurrentStack (this=0x7ffff7409570, cx=0x7ffff7434c00, frame=..., 
    capture=...) at js/src/vm/SavedStacks.cpp:1305
#2  0x000055555766c001 in JS::CaptureCurrentStack (cx=0x7ffff7434c00, stackp=..., capture=...)
    at js/src/jsapi.cpp:4656
#3  0x000055555758d1a7 in CaptureFirstSubsumedFrame (cx=0x7ffff7434c00, argc=<optimized out>, vp=<optimized out>)
    at js/src/builtin/TestingFunctions.cpp:2859
#4  0x0000555556f5c654 in CallJSNative (cx=0x7ffff7434c00, 
    native=0x55555758cde0 <CaptureFirstSubsumedFrame(JSContext*, unsigned int, JS::Value*)>, 
    reason=<optimized out>, args=...) at js/src/vm/Interpreter.cpp:459
#5  0x0000555556f5bace in js::InternalCallOrConstruct (cx=0x7ffff7434c00, args=..., construct=js::NO_CONSTRUCT, 
    reason=js::CallReason::Call) at js/src/vm/Interpreter.cpp:547
#6  0x0000555556f5d786 in InternalCall (cx=0x7ffff79f8a00 <_IO_stdfile_2_lock>, args=..., reason=1494189192)
    at js/src/vm/Interpreter.cpp:614
#7  0x0000555556f4e298 in js::CallFromStack (cx=0x7ffff79f8a00 <_IO_stdfile_2_lock>, args=..., 
    reason=<optimized out>) at js/src/vm/Interpreter.cpp:619
#8  Interpret (cx=0x7ffff79f8a00 <_IO_stdfile_2_lock>, state=...)
    at js/src/vm/Interpreter.cpp:3362
#9  0x0000555556f4078c in js::RunScript (cx=0x7ffff7434c00, state=...)
    at js/src/vm/Interpreter.cpp:431
#10 0x0000555556f5b995 in js::InternalCallOrConstruct (cx=0x7ffff7434c00, args=..., construct=js::NO_CONSTRUCT, 
    reason=<optimized out>) at js/src/vm/Interpreter.cpp:579
#11 0x0000555556f5d786 in InternalCall (cx=0x7ffff79f8a00 <_IO_stdfile_2_lock>, args=..., reason=1494189192)
Component: Untriaged → JavaScript Engine
Product: Firefox → Core

ni? myself to take a peek and get a preliminary diagnosis.

Severity: -- → S3
Flags: needinfo?(mgaudet)
Priority: -- → P3

Ok, so I did some digging. This test is really hard to reduce, as there's a big confluence of factors which need to go right for this to trigger. I also was mislead by my inability to reproduce initially -- even something so silly as having --enable-records-and-tuples in your MOZCONFIG can be sufficient to break reproduction.

This is a pernosco trace of the failure. We fail an assertion that for a given frame, if we saw a previous frame with a cached entry, it should in turn have a cached entry.

My reading of the trace is that we're dropping the cached frame flag when doing an OSR from the interpreter into baseline. I have a patch which adds this, which causes a variant of the above test to pass. My confidence in this is relatively low, but it seems correct to do.

The challenge is that I have no idea how to build a reliable test case out of this -- it's not clear to me at all how the current fuzz test case works.

Flags: needinfo?(mgaudet)
Assignee: nobody → mgaudet
Status: NEW → ASSIGNED
Attached file Bug 1813533 - Update out-of-date comment r?jandem (obsolete) (deleted) —

Depends on D168487

// https://bugzilla.mozilla.org/show_bug.cgi?id=1813533
async function f0() {
    await undefined;
    function f6(a7, a8) {
        const v10 = a8.__proto__;
        v10.principal = v10;
        v10.__proto__ = this.newGlobal(f6);
    }
    new Promise(f6);
    Reflect.toString.captureFirstSubsumedFrame(Int32Array.Debugger);
}
f0();

functions = [
    addPromiseReactions,
    createIsHTMLDDA,
    cacheEntry,
    enableSingleStepProfiling,
    disableSingleStepProfiling,
    enableGeckoProfiling,
    enableGeckoProfilingWithSlowAssertions,
    disableGeckoProfiling,
    enqueueJob,
    drainJobQueue,
]

for (f of functions) {
    try {
        print(f.name)
        f();
    } catch (e) { }
}

This was about as reduced as I could get the test case.

I think my patch is no good here; it bypasses the assertion but doesn't ensure the underlying invariant actually holds -- but I need some more time to be sure.

Some helpful pointers for following along my thought process

  • Bug 1451268, Comment 13 suggests that one of the reasons rematerialized ion frames don't have the bit set is that they don't have an associated entry in the cache. I suspect, but can't yet prove, this is also true of an OSR frame.
  • This comment highlights that the saved frame cache works on a variant pointer which supports both interpreter frames and CommonFrameLayouts, used by baseline. As a result, I am guessing that an OSR'd frame doesn't have an associated cache entry.

Given that re-materialized frames are already eluding the invariant intended by that assertion, and this seems like another case... perhaps the correct action here is to simply remove the assert.

I'm too fried at the moment to correctly answer the above challenge.

Based on your description, there's something weird going on here. (My gut tells me it has something to do with async.)

Suppose we have an interpreter frame with the bit set. That frame OSRs into baseline. To do so, it must be executing, which means it's on top of the stack, which means that there aren't any more frames above it. So at the time of OSR, even if we don't set the bit, the invariant is preserved: the bit isn't set here, but it should be set on all its callers, and there are no callee frames that could have the bit set.

But if that's true, then how does a frame on top of this get the bit set, without setting the bit here? That's not how any of this is supposed to work.

I'm going to poke at this a little.

Checking this out in rr, it looks like there are two relevant calls to SavedStacks::insertFrames. The first is triggered by this line in CreatePromiseObjectInternal, which captures the stack (with JS::AllFrames) in case we need to report an error. In that case, we stop walking the stack here.

The second is triggered by the call to the captureFirstSubsumedFrame testing function. That captures the stack with JS::FirstSubsumedFrame, so we don't stop walking the stack when we hit the end of the async activation.

Not entirely sure what the right behaviour is here, but if you wanted some threads to pull on, there you go.

Here's a more reduced version of this testcase:

async function f0() {
  await undefined;

  // Capture the whole stack
  new Promise(() => {});

  // Search the stack for a frame subsumed by a fresh global
  var g = newGlobal({principal: {}});
  captureFirstSubsumedFrame(g);
}
f0();

// Get into blinterp?
for (var i = 0; i < 20; i++) {}
// Resume the async function.
drainJobQueue();
Attachment #9315269 - Attachment is obsolete: true
Attachment #9315268 - Attachment is obsolete: true
Assignee: mgaudet → iireland

Matt and I took advantage of being in the same room to get to the bottom of this.

The root of the problem is that AllFrames and FirstSubsumedFrame are walking different stacks: AllFrames follows async parents, and FirstSubsumedFrame doesn't. When we create the promise (calling saveStack() also turns out to be sufficient), we stop at the async boundary and use the async parent. We therefore don't reach the frame for the top level script, where we called drainJobQueue. captureFirstSubsumedFrame ignores async parents, so it does reach the top level frame.

It doesn't make sense of us to enforce the expectation that the set of frames reached by walking one view of the stack matches the set of frames reached by a different set of rules, so the fix is just to relax the assertion in cases where we're capturing the first subsumed frame.

Pushed by iireland@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/cef496caa88e Don't enforce invariant for FirstSubsumedFrame r=mgaudet
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 111 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: