Closed Bug 459301 (tracerecursion) Opened 16 years ago Closed 15 years ago

TM: Trace recursive function calls.

Categories

(Core :: JavaScript Engine, defect, P2)

x86
macOS
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: gal, Assigned: dvander)

References

Details

(Keywords: perf, Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 13 obsolete files)

(deleted), application/x-javascript
Details
(deleted), patch
gal
: review+
Details | Diff | Splinter Review
      No description provided.
Flags: wanted1.9.1?
Assignee: general → gal
Priority: -- → P2
Target Milestone: --- → mozilla1.9.1b2
Flags: wanted1.9.1? → wanted1.9.1+
Keywords: perf
Attached patch down recursion, not fully working yet (obsolete) (deleted) — Splinter Review
Attached patch slightly more working patch (obsolete) (deleted) — Splinter Review
Freshened patch, fixed stack adjustment.
Attachment #342644 - Attachment is obsolete: true
Attached file factorial test case (deleted) —
Sample factorial test case.  First run now works fine.  Second run produces wrong results.

Side note: It looks like we're compiling a bunch of branches as the stack unwinds.

Will investigate more soon.
The patch didn't mean to support up-recursion, only down-recursion so I am not surprised we die. We have to add a guard that checks that when coming up in RETURN we loop back to the entry point only if the same script is on top of RP. If RP is empty side-exit.
Attached patch slightly slightly more working patch (obsolete) (deleted) — Splinter Review
Test case now works.  The stack adjustment is now computed correctly and the rp stack needed to be incremented.

access-binary-trees runs and triggers a bunch of traces
controlflow-recursive dies :(
Attachment #351482 - Attachment is obsolete: true
Attached patch rebased (obsolete) (deleted) — Splinter Review
Patch works against tip again, didn't need much changing though the status of access-binary-trees regressed.  Will look into this more later this week.
Attachment #351650 - Attachment is obsolete: true
Attached patch rebase again (obsolete) (deleted) — Splinter Review
Attachment #365096 - Attachment is obsolete: true
Attached patch fixes controlflow-recursive (obsolete) (deleted) — Splinter Review
Make sure recursion doesn't go too deep.
Attachment #381846 - Attachment is obsolete: true
Flags: blocking1.9.2?
Target Milestone: mozilla1.9.1b2 → ---
Depends on: 500346
Update on this. I'm finally at the point where tracing recursion doesn't _horribly_ slow down everything.  On some simple recursive loops I can get a very small perf increase, on more complex ones like ackermann/access-binary-trees I still see a 2-5X slowdown.

There are two different problems:
 * Recursive control flow is very complicated here. What seems like a straight-forward loop can be a mess of branches that push/pop frames in random directions. For example, two recursive loops in different directions can want to mutually call each other.
 * Side exits are really expensive as more and more frames get pushed on the stack.  It takes at least two full bailouts of the down-recursive tree in order to link a basic case together. For deep recursion this is awful - you don't want to spend your time rebuilding 80+ nearly identical frames.

Getting the control flow down is a matter of seeing why traces aren't linking together. This is going pretty well - when I started, controlflow-recursive had hundreds of thousands of exits. It's down to ~8,500 now.

Dealing with the expensive bailouts is a whole separate issue. Probably what we will have to do is build specialized traces that can unroll the recursive stack frames back onto the interpreter, to avoid the overhead of repeated calls to FlushNativeStackFrame and js_SynthesizeFrame. This would help beyond recursion, since eventually we want to specialize hot entry and exits from all traces.
Assignee: gal → dvander
Status: NEW → ASSIGNED
Flags: blocking1.9.2? → blocking1.9.2+
we need an update here
(In reply to comment #10)

Sorry, I missed this comment. controlflow-recursive performs nicely now, down to 150 exits and it's 10ms faster. access-binary-trees is down to about 3,500 exits but the results are not correct. It messes up linking type-unstable down frames during up recursion. I'm tracking this down today.

access-binary-trees hits the deep-bailout problem very hard, in that there are a ton of interpreter frames lying around and we keep entering single-shot up-recursive loops. We spend most of our time in js_ExecuteTree but not in native code.

If after correctness is okay, this problem is still apparent, I plan on letting up-recursive loops "eat" interpreter frames. This shouldn't be too hard.
Found the regression in access-binary-trees. in TreeNode.itemCheck(), |this.item| is pulled out as a double when taking the branch that links down recursion to up recursion. Both the up and down recursive loops want integers, but I was linking them together by accident. 

Now this exposes another problem... the trees don't link. The oracle doesn't help here because the middle link is the _last_ thing that gets recorded. So we have to record a new up-recursive tree which is certainly wasteful.

I will post back once I get perf numbers... I may take a shot at bug 486213 if it looks really bad.
Multitree bugs are worked out now. access-binary-trees is about 1.7X slower with about 3,300 exits. This might not seem like an improvement but considering that this started out 30-80X slower, it's now approaching something reasonable.

The good news: the recursive portions are all linking together now, despite bug 486213, as seen with TreeVis:

http://users.alliedmods.net/~dvander/access-binary-trees-1.png

The left side is |TreeNode.itemCheck|, and the middle is |TreeNode.bottomUp| (which seems a little grotesque - may have to look into that). On the right there is an orphaned trace which is an artifact of the problem in comment #11.

On the other hand, we're still spending a huge amount of time in js_ExecuteTree, as seen in TraceVis:

http://users.alliedmods.net/~dvander/access-binary-trees-time.png

This is the other problem described in comment #11 which I will work on tomorrow.
Attached patch down+up recursion WIP v1 (obsolete) (deleted) — Splinter Review
I've heard that hoarding things in your patch queue is bad, so posting this. Some notes:

* Does not apply cleanly yet, see "Depends on" bugs.
* guardUpRecursive and LeaveFrame are nasty - I am going to merge these into one function after performance is at parity.
* If it seems like there are completely random changes - yes there are - they will be removed when I post something for review.
Attachment #381854 - Attachment is obsolete: true
(In reply to comment #13)
> On the other hand, we're still spending a huge amount of time in
> js_ExecuteTree, as seen in TraceVis:

There's a pending change I keep putting off for speeding up js_ExecuteTree that might help here. Bug 504694. We're still, last I checked, hitting malloc each time. It's only a couple one-liners to test, you might like to fiddle with it a little on your side just to see if it makes a difference. I'll try to land it shortly, but got a lot of other mess on the go.

 /* Max native stack size. */
-#define MAX_NATIVE_STACK_SLOTS 1024
+#define MAX_NATIVE_STACK_SLOTS 512
 
 /* Max call stack size. */
-#define MAX_CALL_STACK_ENTRIES 64
+#define MAX_CALL_STACK_ENTRIES 32

(or any smaller reduction that gets MAX_INTERP_STACK_BYTES under the arena-pool malloc threshold; or an increase to the threshold)
Graydon, thats the opposite direction we need to go for recursion. We need tons of tons of native stack. I will rewrite the stack allocation code to not hit that malloc (just hit it once, grow exponentially) if nobody beats me to it.
(In reply to comment #15)

Thanks for the suggestion - I just looked to see if the malloc() path was slowing me down. Turns out I don't have that cset, I am out of date. 

(In reply to comment #16)

I brought this up a few weeks ago - was planning on doing it for recursion but didn't get to it yet. I want to keep a buffer in the TraceMonitor and grow it on a special STACK_OOM_EXIT (or whenever there are too many [call]stack slots). If you think it's orthogonal enough I can file a separate bug and make it block this one.
A little progress. I've got a very rudimentary (and hacky) POC patch to read interpreter frames on trace. access-binary-trees is now down to 1,700 exits and is ~1.4X slower (from ~1.7x).

As hoped it is now spending much more time in native code rather than js_ExecuteTree:

http://users.alliedmods.net/~dvander/access-binary-trees-time2.png
An aside - in simpler benchmarks where there is little complexity (like fibonacci) we do great, especially if the benchmark runs for a while. I can create synthetic tests that are 8X faster than the interpreter. The problem is these crazy ones where control flow is irregular.
Good news is that I got access-binary-trees at parity - 15% faster than the interpreter in fact (60ms from 70ms). Bad news is that I disabled integer speculation to get that. When multitrees doesn't have to deal with speculation, everything connects really nicely. When it doesn't, it relives some of the early multitrees nightmares (such as duplicate trees and inner trees never compiling).

Getting this right will take more analysis, but it seems like 1) everything can and does connect together with the right magic, and 2) when it does, it is faster than the interpreter.
That sounds like a patch worth attaching to a bug! Good stuff!
Attached patch down+up recursion WIP v2 (obsolete) (deleted) — Splinter Review
patch queue dump
Attachment #391539 - Attachment is obsolete: true
comment #15 ended up being on target, but from js_ReserveObjects, not arena code (if i understood it correctly). lowering the MAX_CALL_STACK_ENTRIES down to a more reasonable 500 makes controlflow-recursive 30ms faster (from 60ms) and access-binary-trees 20ms faster (from 70ms). boy would it be nice to not have that!
Lowering to 500? It's currently 64. How do you figure?

I was just reporting the hotspot I saw, which is that

    JS_ARENA_ALLOCATE(reserve, &cx->stackPool, MAX_INTERP_STACK_BYTES);

hits malloc when MAX_INTERP_STACK_BYTES is larger than the stackChunkSize the JSContext was initialized with. That's 8192 in the shell by default, and in nsJSEnvironment, but occasionally something else in other contexts. Currently MAX_INTERP_STACK_BYTES adds up to 11376 bytes. So on my build, it hits the malloc path every time.
#23: yeah, I will work on that next. I already have a patch that fixes most of that problem, but it can run into a fatal OOM condition, which we are trying to decide whether that's acceptable or not.
(In reply to comment #24)
> Lowering to 500? It's currently 64. How do you figure?
it was like 100000 in my patch <_<
Attached patch down+up recursion WIP v3 (obsolete) (deleted) — Splinter Review
reworking this into something that might be able to land one day. removed hacky code that held state until LeaveFrame, refactored SlotMap again, fixed some multitrees bugs.

I also pulled most of the jstracer.cpp files into a separate file which gets #included. I don't know how kosher that is. I really want to avoid cluttering jstracer more but it's not easy to split it into separate compilation units.
Attachment #392620 - Attachment is obsolete: true
Attached patch down+up recursion WIP v3 for real (obsolete) (deleted) — Splinter Review
forgot to hg add
Attachment #393973 - Attachment is obsolete: true
IIRC waldo had some thoughts about splitting jstracer.cpp up. We might want to split into more than two parts. I don't think #include is called for unless we can prove some linker brain-damage on a tier-1 platform really costs us.

/be
#include is just out of laziness, easy way to readably maintain this patch. I agree it's not really good.

There are so many static functions in jstracer.cpp, and it seems silly to expose them all as js_*. I think we'd need to have a JSTracer namespace/class with static member functions. A lot of functions can be moved into the recorder, or as members of JSTraceMonitor.

I tried doing this but gave up midway since it was distracting. Maybe I should be filing a separate bug (unless one is already there).
Blocks: 481419
No longer blocks: 481419
Attached patch refreshed, x64 compat (obsolete) (deleted) — Splinter Review
What's left to do?
 - Verify that frame popping code is sane.
 - Get blacklisting to work, since it doesn't, string-tagcloud regresses.
 - Figure out if there is a good way to deal with the Oracle.
 - Clean up remaining questionable bits of code.

Andreas - looking for comments/questions about anything. This needs a second pair of eyes before an actual review-to-land.

A bit more on the oracle problem. access-binary-trees is around 10-15ms faster with the oracle disabled. With the oracle on, itemCheck()'s result is undemotable because it flows from a property lookup. The outer loops never compile and we get 1,700 exits. With the oracle off we get no aborts and a beautiful nesting of trees.

Luckily, with the oracle on, access-binary-trees is still slightly faster with this patch. But it could be significantly better.
Attachment #393975 - Attachment is obsolete: true
Attachment #401357 - Flags: review?(gal)
Comment on attachment 401357 [details] [diff] [review]
refreshed, x64 compat


>+JS_REQUIRES_STACK JSBool FASTCALL
>+js_PopInterpFrame(JSContext* cx, InterpState* state)
>+{
>+    JS_ASSERT(cx->fp && cx->fp->down);
>+    JSInlineFrame* ifp = (JSInlineFrame*)cx->fp;
>+    cx->fp = cx->fp->down;
>+    JS_ASSERT(cx->fp->regs == &ifp->callerRegs);
>+    /* :TODO: is this safe? */
>+    cx->fp->regs = ifp->frame.regs;

Yeah, thats the right thing to do.

>+    JS_ARENA_RELEASE(&cx->stackPool, ifp->mark);
>+    *state->inlineCallCountp = *state->inlineCallCountp - 1;

--

>+    return JS_TRUE;
>+}
>+JS_DEFINE_CALLINFO_2(extern, BOOL, js_PopInterpFrame, CONTEXT, INTERPSTATE, 0, 0)
>+

> 
>+#define RESTORE_INTERP_VARS()                                                 \
>+    JS_BEGIN_MACRO                                                            \
>+            fp = cx->fp;                                                      \
>+            script = fp->script;                                              \
>+            atoms = FrameAtomBase(cx, fp);                                    \
>+            currentVersion = (JSVersion) script->version;                     \
>+            JS_ASSERT(fp->regs == &regs);                                     \
>+            if (cx->throwing)                                                 \
>+                goto error;                                                   \
>+    JS_END_MACRO
>+        

Why the extra newline?

>+
> #define MONITOR_BRANCH()                                                      \
>     JS_BEGIN_MACRO                                                            \
>         if (TRACING_ENABLED(cx)) {                                            \
>@@ -2888,14 +2900,8 @@ js_Interpret(JSContext *cx)
>                 MONITOR_BRANCH_TRACEVIS;                                      \
>                 ENABLE_INTERRUPTS();                                          \
>             }                                                                 \
>-            fp = cx->fp;                                                      \
>-            script = fp->script;                                              \
>-            atoms = FrameAtomBase(cx, fp);                                    \
>-            currentVersion = (JSVersion) script->version;                     \
>-            JS_ASSERT(fp->regs == &regs);                                     \
>-            if (cx->throwing)                                                 \
>-                goto error;                                                   \
>         }                                                                     \
>+        RESTORE_INTERP_VARS();                                                \
>     JS_END_MACRO

Shouldn't RESTORE_INTERP_VARS() be inside the { } ?


> #define PROPAGATE_CALLNESS()                                                  \
>     JS_BEGIN_MACRO                                                            \
>-        if (ss->opcodes[ss->top - 1] == JSOP_CALL)                            \
>+        if (ss->opcodes[ss->top - 1] == JSOP_CALL ||                          \
>+            ss->opcodes[ss->top - 1] == JSOP_CALLNOTRACE)                     \
>             saveop = JSOP_CALL;                                               \
>     JS_END_MACRO

Inline function?
> 
>+                bool recursive = fp->script == fp->down->script;
>+

We might want to try deeper recursion too, so || fp->script == fp->down->down->script. And be careful with fp->down being not NULL.

>+#ifdef JS_TRACER
>+                    if (TRACE_RECORDER(cx)) {
>+                        TRACE_1(EnterFrame, inlineCallCount);
>+                        RESTORE_INTERP_VARS();
>+                    } else {
>+                        /* Check for possible recursion */
>+                        if (fp->script == fp->down->script)
>+                                MONITOR_BRANCH();

Why do we check here too? Can we unify this?



>+    /* :TODO: use cx->fp->down here...? */

?

>+    #if 0
>+    /* :HACK: detect overflows so we dont build infinite branches */
>+    if (anchor && anchor->exitType == RECURSIVE_SLURP_FAIL_EXIT &&
>+        anchor->slurpType == TT_INT32 && downMap[anchor->slurpSlot] == TT_INT32) {
>+        downMap[anchor->slurpSlot] = TT_DOUBLE;
>+    }
>+    #endif

I don't understand.


>+LIns*
>+TraceRecorder::slurpInt32Slot(LIns* val_ins, jsval* vp, VMSideExit* exit)
>+{
>+    guard(true,
>+          lir->ins2(LIR_or,
>+                    lir->ins2(LIR_peq,
>+                              lir->ins2(LIR_piand, val_ins, INS_CONSTWORD(JSVAL_TAGMASK)),
>+                              INS_CONSTWORD(JSVAL_DOUBLE)),
>+                    lir->ins2(LIR_peq,
>+                              lir->ins2(LIR_piand, val_ins, INS_CONSTWORD(1)),
>+                              INS_CONSTWORD(1))),

LIR_or doesn't generate great code. Maybe guards + jumps?

>+          exit);
>+    LIns* space = lir->insAlloc(sizeof(int32));
>+    LIns* args[] = { space, val_ins };
>+    LIns* result = lir->insCall(&js_TryUnboxInt32_ci, args);

And thats crazy slow. Plus from above we know its an int.

>+    guard(false, lir->ins_eq0(result), exit);
>+    LIns* int32_ins = lir->insLoad(LIR_ld, space, 0);
>+    return int32_ins;
>+}

>+    guard(true,
>+          lir->ins2(LIR_or,
>+                    lir->ins2(LIR_peq,
>+                              lir->ins2(LIR_piand, val_ins, INS_CONSTWORD(JSVAL_TAGMASK)),
>+                              INS_CONSTWORD(JSVAL_DOUBLE)),
>+                    lir->ins2(LIR_peq,
>+                              lir->ins2(LIR_piand, val_ins, INS_CONSTWORD(1)),
>+                              INS_CONSTWORD(1))),
>+          exit);

Dito here.

>diff --git a/js/src/jstracer.cpp b/js/src/jstracer.cpp
>--- a/js/src/jstracer.cpp
>+++ b/js/src/jstracer.cpp
>@@ -135,10 +135,10 @@ static const char tagChar[]  = "OIDISIBI
> #define MAX_CALLDEPTH 10
> 
> /* Max native stack size. */
>-#define MAX_NATIVE_STACK_SLOTS 1024
>+#define MAX_NATIVE_STACK_SLOTS 4096
> 
> /* Max call stack size. */
>-#define MAX_CALL_STACK_ENTRIES 64
>+#define MAX_CALL_STACK_ENTRIES 500

We have to fix the reserved objects.
(In reply to comment #32)
> (From update of attachment 401357 [details] [diff] [review])
> 
> >+JS_REQUIRES_STACK JSBool FASTCALL
> >+js_PopInterpFrame(JSContext* cx, InterpState* state)
> >+{
> >+    JS_ASSERT(cx->fp && cx->fp->down);
> >+    JSInlineFrame* ifp = (JSInlineFrame*)cx->fp;
> >+    cx->fp = cx->fp->down;
> >+    JS_ASSERT(cx->fp->regs == &ifp->callerRegs);
> >+    /* :TODO: is this safe? */
> >+    cx->fp->regs = ifp->frame.regs;
> 
> Yeah, thats the right thing to do.

Right, the precious JSFrameRegs regs on the js_Interpret stack is what you want the top frame's cx->fp->regs always pointing at.

> >+#define RESTORE_INTERP_VARS()                                                 \
> >+    JS_BEGIN_MACRO                                                            \
> >+            fp = cx->fp;                                                      \
> >+            script = fp->script;                                              \
> >+            atoms = FrameAtomBase(cx, fp);                                    \
> >+            currentVersion = (JSVersion) script->version;                     \
> >+            JS_ASSERT(fp->regs == &regs);                                     \
> >+            if (cx->throwing)                                                 \
> >+                goto error;                                                   \
> >+    JS_END_MACRO
> >+        
> 
> Why the extra newline?

Body of the macro apart from JS_{BEGIN,END}_MACRO is over-indented four spaces.

/be
Flags: wanted1.9.1+
Flags: blocking1.9.2+
> Shouldn't RESTORE_INTERP_VARS() be inside the { } ?

Good catch, thanks.

> We might want to try deeper recursion too, so || fp->script ==
> fp->down->down->script. And be careful with fp->down being not NULL.

In this case one level of fp->down is always okay. For this patch I'm not going to try and support multiple levels - can be follow-up.

> Why do we check here too? Can we unify this?

Check what?

> >+    /* :TODO: use cx->fp->down here...? */
> 
> ?

Stale comment - thanks.

> 
> >+    #if 0
> >+    /* :HACK: detect overflows so we dont build infinite branches */
> >+    if (anchor && anchor->exitType == RECURSIVE_SLURP_FAIL_EXIT &&
> >+        anchor->slurpType == TT_INT32 && downMap[anchor->slurpSlot] == TT_INT32) {
> >+        downMap[anchor->slurpSlot] = TT_DOUBLE;
> >+    }
> >+    #endif
> 
> I don't understand.

It's #ifdef 0'd because it shouldn't ever happen. Will change to an assert and give a better explanation. Basically, we don't want to guard that a slot is an integer, fail because it's a double, then build another integer read on the same slot.


> LIR_or doesn't generate great code. Maybe guards + jumps?

Sounds good.

> And thats crazy slow. Plus from above we know its an int.

I can put this in the slow path - it could be a double that fits in an int, for example. I don't think it's "crazy slow" :) but certainly not ideal.

> We have to fix the reserved objects.

Yeah - is there an existing bug on this or should I file?
(In reply to comment #34)
> > And thats crazy slow. Plus from above we know its an int.

On second thought, we don't know that it's an int, and it seems hard to do a jump here because we don't have phi nodes. The new code would look something like:

 ialloc 4
 if val & JSVAL_INT:
  store val >> 1 into ialloc
 else
  guard val & JSVAL_DOUBLE
  call TryUnboxInt32(val, ialloc)
 result = *ialloc

It's not much different but it has a "fast" path. Is it worth it, or is there a better way of expressing this? (Also: is LIR_or being slow something we can fix?)
Attached patch v5 (obsolete) (deleted) — Splinter Review
Changes in this version:
 - Rebased against new (unpushed) FrameInfo patch.
 - Lots more cleanup.
 - JSOP_CALLNOTRACE removed.
 - JSOP_TRACE is emitted after every JSOP_CALL, this becomes a hint for both down and up recursion.
 - Recursive trees are trashed and blacklisted if they do not link quickly.
 - Assert-botching and massive perf regressions in string-tagcloud and v8 tests fixed.
 - Various cases of recursion we shouldn't trace in this patch are properly detected and aborted. This was mostly mutual recursion but I have a plan for that in a follow-up patch.
 - Made js_PopInterpFrame more rigorous after talking to Brendan. mrkbap reviewed it after.

I have not attempted to deal with the Oracle yet. That can be a follow-up as well, since we already gain performance - just not at full potential.

There is one minor regression still, v8/run-early-boyer.js gets ~15 "points" lower with this patch. It's 1AM here so I'll track this down tomorrow. It seems like we're running a lot of short traces somewhere.

Other than that, which I expect to be a simple heuristic somewhere, this patch is ready for a real review.
Attachment #401357 - Attachment is obsolete: true
Attachment #403438 - Flags: review?(gal)
Attachment #401357 - Flags: review?(gal)
Follow-up thought: It is probably unnecessary to block tree calls if |cx->fp->script == treeInfo->script| as long as the |pc| is different, since the up-recursive guard will fail and the frame won't be popped. So that check might be too conservative, but it won't hurt anything.
Attached patch v6 (deleted) — Splinter Review
The previous iterations of this patch subtly changed lots of start and stop recording behavior, which regressed a few tests. In this version:

 - A recorder knows whether it came from EnterFrame, LeaveFrame, or a Branch, and can use this to decide whether to keep recording or whether to allow a non-looping trace.
 - A tree knows whether it only has an up-recursive side, which is usually much smaller than the down recursive side. In this case, we don't want to bother even executing the tree.
 - The heuristics on how to deal with recursion now let old cases slide through as long as they make sense (see comment in EnterFrame). When there's recursion that will never compile, the entry of the script is immediately blacklisted.

SunSpider results are promising. "Forever" paste: http://pastebin.mozilla.org/673636

On my Windows box, this is a ~45ms net perf win, which is 5-6% of the running time. There's another 8-10ms to win on access-binary-trees if we figure out what to do with the Oracle.

The 6ms regression in 3d-cube is real. If you look at SunSpider 3d-cube.js (which is different from the one in our tree), it has a benchmark loop that actually calls itself recursively. In the case that we have a deeply recursive function that only runs once, reconstructing stack frames is still slightly problematic. In this specific case, tail call analysis would make that go away.

Otherwise this is part of a larger problem in TraceMonkey where the JIT and Interpreter frames diverge too much, and it's expensive to cross in between. For example, disabling the JIT is a large perf win for v8/run-early-boyer. This is because it has a very tiny loop that gets used throughout the program, and it's just not worth running it all by itself. While we should work on tracing the more complex cases of recursion in that benchmark, it's probably also important to take a look at either 1) heuristics to throw away useless traces, or 2) speeding up ExecuteTree/LeaveTree (Andreas and I have some ideas on this, I'll file a bug tomorrow).
Attachment #403438 - Attachment is obsolete: true
Attachment #403752 - Flags: review?(gal)
Attachment #403438 - Flags: review?(gal)
Addendum: Luke Wagner had a neat idea - delay execution of the down-recursive trace. We could magically have the interpreter know which branches would trigger a guard failure, and it could automatically start a new recorder. The tracer would not actually execute the tree until it knew the whole thing was nicely linked.

This seems worth thinking about, though it's more distant than optimizing transitions between interp/JIT, and I don't think it would necessarily supplant the interp-frame slurping code which is necessary as soon as any guard fails.
Comment on attachment 403752 [details] [diff] [review]
v6


>+#if defined DEBUG
>+static JS_REQUIRES_STACK void
>+AssertDownFrameIsConsistent(JSContext* cx, VMSideExit* anchor, FrameInfo* fi)
>+{
>+    unsigned downPostSlots = fi->callerHeight;
>+    JSTraceType* typeMap = fi->get_typemap();
>+    JS_ASSERT(anchor->recursive_down);
>+    JS_ASSERT(anchor->recursive_down->callerHeight == downPostSlots);

Move these to the top and add some space after it.

>+    js_CaptureStackTypes(cx, 1, typeMap);
>+    const JSTraceType* m1 = anchor->recursive_down->get_typemap();

Here to. Let the code breathe a little.

>+    for (unsigned i = 0; i < downPostSlots; i++) {
>+        if (m1[i] == typeMap[i])
>+            continue;
>+        if (typeMap[i] == TT_INT32 && m1[i] == TT_DOUBLE)
>+            continue;
>+        JS_NOT_REACHED("invalid RECURSIVE_MISMATCH exit");
>+    }
>+    JS_ASSERT(memcmp(anchor->recursive_down, fi, sizeof(FrameInfo)) == 0);
>+}
>+#endif

> /* Max native stack size. */
>-#define MAX_NATIVE_STACK_SLOTS 1024
>+#define MAX_NATIVE_STACK_SLOTS 4096
> 
> /* Max call stack size. */
>-#define MAX_CALL_STACK_ENTRIES 64
>+#define MAX_CALL_STACK_ENTRIES 500

This is going to cause a lot of objects to be reserved but I am about to remove the reserve list so lets just ignore this issue.
Attachment #403752 - Flags: review?(gal) → review+
Pushed as cset http://hg.mozilla.org/tracemonkey/rev/910f0c1ca2e5

Watching tree to see if this sticks.
Depends on: 520003
Depends on: 519999
Attached patch corrected arena code (obsolete) (deleted) — Splinter Review
Not sure if this needed a new bug. The arena code in the original patch is wrong, since it is unbalanced - disobeys the LIFO requirement. This patch corrects that by stashing the frames in a linked list and freeing them in LeaveTree.
Attachment #404170 - Flags: review?(brendan)
New bug if the patch for this bug landed (in which case, fixed-in-tracemonkey status whiteboard please).

/be
Comment on attachment 404170 [details] [diff] [review]
corrected arena code

>diff --git a/js/src/jsbuiltins.cpp b/js/src/jsbuiltins.cpp
>--- a/js/src/jsbuiltins.cpp
>+++ b/js/src/jsbuiltins.cpp
>@@ -452,7 +452,14 @@ js_PopInterpFrame(JSContext* cx, InterpS
>     cx->fp = cx->fp->down;
>     JS_ASSERT(cx->fp->regs == &ifp->callerRegs);
>     cx->fp->regs = ifp->frame.regs;
>-    JS_ARENA_RELEASE(&cx->stackPool, ifp->mark);
>+
>+    /* Don't release |ifp->mark| yet, since ExecuteTree uses |cx->stackPool|. */
>+    if (!state->poppedFrameHead)
>+        state->poppedFrameHead = ifp;
>+    else
>+        state->poppedFrameTail->frame.down = &ifp->frame;
>+    state->poppedFrameTail = ifp;
>+    ifp->frame.down = NULL;

Here's an old trick for singly-linked lists you only append to:

    JSStackFrame *poppedFrameHead;
    JSStackFrame **poppedFrameTail;
  . . .
    state->poppedFrameHead = NULL;
    state->poppedFrameTail = &state->poppedFrameHead;

then the above-cited code simplifies to:

    *state->poppedFrameTail = ifp;
    state->poppedFrameTail = &ifp->frame.down;
    ifp->frame.down = NULL;

>@@ -6609,6 +6611,16 @@ LeaveTree(InterpState& state, VMSideExit
>                  innermost->calldepth == 0 && callstack == rp);
> 
>     JS_ARENA_RELEASE(&cx->stackPool, state.stackMark);
>+
>+    /* Release all frames that were popped by js_PopInterpFrame. */
>+    JSInlineFrame* ifp = state.poppedFrameHead;
>+    while (ifp) {
>+        JSInlineFrame* next = (JSInlineFrame*)ifp->frame.down;
>+        JS_ASSERT_IF(!next, ifp == state.poppedFrameTail);
>+        JS_ARENA_RELEASE(&cx->stackPool, ifp->mark);
>+        ifp = next;
>+    }

Now that this is LIFO it is clear you don't need to release to each mark, only to the oldest. So even simpler code would just maintain the oldest popped frame's mark.

/be
Comment on attachment 404170 [details] [diff] [review]
corrected arena code

Spun out to bug 520210.

/be
Attachment #404170 - Attachment is obsolete: true
Attachment #404170 - Flags: review?(brendan)
Whiteboard: fixed-in-tracemonkey
Depends on: 520240
Blocks: 520321
No longer blocks: 520321
Depends on: 520321
Depends on: 520492
Depends on: 520498
Alias: tracerecursion
Depends on: 520591
Depends on: 520536
http://hg.mozilla.org/mozilla-central/rev/910f0c1ca2e5
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Depends on: 521447
Depends on: 521465
No longer depends on: 521169
Depends on: 522311
No longer depends on: 528048
No longer depends on: 531629
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: