Closed Bug 459301 (tracerecursion) Opened 16 years ago Closed 15 years ago

TM: Trace recursive function calls.

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: gal, Assigned: dvander)

References

Details

(Keywords: perf, Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 13 obsolete files)

down recursion, not fully working yet 16 years ago Andreas Gal :gal (deleted), patch		Details \| Diff \| Splinter Review
slightly more working patch 16 years ago David Anderson [:dvander] - inactive, e-mail if emergency (deleted), patch		Details \| Diff \| Splinter Review
factorial test case 16 years ago David Anderson [:dvander] - inactive, e-mail if emergency (deleted), application/x-javascript		Details
slightly slightly more working patch 16 years ago David Anderson [:dvander] - inactive, e-mail if emergency (deleted), patch		Details \| Diff \| Splinter Review
rebased 16 years ago David Anderson [:dvander] - inactive, e-mail if emergency (deleted), patch		Details \| Diff \| Splinter Review
rebase again 15 years ago David Anderson [:dvander] - inactive, e-mail if emergency (deleted), patch		Details \| Diff \| Splinter Review
fixes controlflow-recursive 15 years ago David Anderson [:dvander] - inactive, e-mail if emergency (deleted), patch		Details \| Diff \| Splinter Review
down+up recursion WIP v1 15 years ago David Anderson [:dvander] - inactive, e-mail if emergency (deleted), patch		Details \| Diff \| Splinter Review
down+up recursion WIP v2 15 years ago David Anderson [:dvander] - inactive, e-mail if emergency (deleted), patch		Details \| Diff \| Splinter Review
down+up recursion WIP v3 15 years ago David Anderson [:dvander] - inactive, e-mail if emergency (deleted), patch		Details \| Diff \| Splinter Review
down+up recursion WIP v3 for real 15 years ago David Anderson [:dvander] - inactive, e-mail if emergency (deleted), patch		Details \| Diff \| Splinter Review
refreshed, x64 compat 15 years ago David Anderson [:dvander] - inactive, e-mail if emergency (deleted), patch		Details \| Diff \| Splinter Review
v5 15 years ago David Anderson [:dvander] - inactive, e-mail if emergency (deleted), patch		Details \| Diff \| Splinter Review
v6 15 years ago David Anderson [:dvander] - inactive, e-mail if emergency (deleted), patch	gal : review+	Details \| Diff \| Splinter Review
corrected arena code 15 years ago David Anderson [:dvander] - inactive, e-mail if emergency (deleted), patch		Details \| Diff \| Splinter Review

Andreas Gal :gal

Reporter

Description

•

16 years ago

      No description provided.

Flags: wanted1.9.1?

Andreas Gal :gal

Reporter

Updated

•

16 years ago

Assignee: general → gal

Priority: -- → P2

Target Milestone: --- → mozilla1.9.1b2

Robert Sayre

Updated

•

16 years ago

Flags: wanted1.9.1? → wanted1.9.1+

Jesse Ruderman

Updated

•

16 years ago

Keywords: perf

Andreas Gal :gal

Reporter

Comment 1

•

16 years ago

Attached patch down recursion, not fully working yet (obsolete) (deleted) — Details — Splinter Review

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 2

•

16 years ago

Attached patch slightly more working patch (obsolete) (deleted) — Details — Splinter Review

Freshened patch, fixed stack adjustment.

Attachment #342644 - Attachment is obsolete: true

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 3

•

16 years ago

Attached file factorial test case (deleted) — Details

Sample factorial test case.  First run now works fine.  Second run produces wrong results.

Side note: It looks like we're compiling a bunch of branches as the stack unwinds.

Will investigate more soon.

Andreas Gal :gal

Reporter

Comment 4

•

16 years ago

The patch didn't mean to support up-recursion, only down-recursion so I am not surprised we die. We have to add a guard that checks that when coming up in RETURN we loop back to the entry point only if the same script is on top of RP. If RP is empty side-exit.

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 5

•

16 years ago

Attached patch slightly slightly more working patch (obsolete) (deleted) — Details — Splinter Review

Test case now works.  The stack adjustment is now computed correctly and the rp stack needed to be incremented.

access-binary-trees runs and triggers a bunch of traces
controlflow-recursive dies :(

Attachment #351482 - Attachment is obsolete: true

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 6

•

16 years ago

Attached patch rebased (obsolete) (deleted) — Details — Splinter Review

Patch works against tip again, didn't need much changing though the status of access-binary-trees regressed.  Will look into this more later this week.

Attachment #351650 - Attachment is obsolete: true

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 7

•

15 years ago

Attached patch rebase again (obsolete) (deleted) — Details — Splinter Review

Attachment #365096 - Attachment is obsolete: true

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 8

•

15 years ago

Attached patch fixes controlflow-recursive (obsolete) (deleted) — Details — Splinter Review

Make sure recursion doesn't go too deep.

Attachment #381846 - Attachment is obsolete: true

Nochum Sossonko [:Natch]

Updated

•

15 years ago

Flags: blocking1.9.2?

Target Milestone: mozilla1.9.1b2 → ---

Andreas Gal :gal

Reporter

Updated

•

15 years ago

Depends on: 500346

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Updated

•

15 years ago

Depends on: 501398

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 9

•

15 years ago

Update on this. I'm finally at the point where tracing recursion doesn't _horribly_ slow down everything.  On some simple recursive loops I can get a very small perf increase, on more complex ones like ackermann/access-binary-trees I still see a 2-5X slowdown.

There are two different problems:
 * Recursive control flow is very complicated here. What seems like a straight-forward loop can be a mess of branches that push/pop frames in random directions. For example, two recursive loops in different directions can want to mutually call each other.
 * Side exits are really expensive as more and more frames get pushed on the stack.  It takes at least two full bailouts of the down-recursive tree in order to link a basic case together. For deep recursion this is awful - you don't want to spend your time rebuilding 80+ nearly identical frames.

Getting the control flow down is a matter of seeing why traces aren't linking together. This is going pretty well - when I started, controlflow-recursive had hundreds of thousands of exits. It's down to ~8,500 now.

Dealing with the expensive bailouts is a whole separate issue. Probably what we will have to do is build specialized traces that can unroll the recursive stack frames back onto the interpreter, to avoid the overhead of repeated calls to FlushNativeStackFrame and js_SynthesizeFrame. This would help beyond recursion, since eventually we want to specialize hot entry and exits from all traces.

Assignee: gal → dvander

Status: NEW → ASSIGNED

Robert Sayre

Updated

•

15 years ago

Flags: blocking1.9.2? → blocking1.9.2+

Robert Sayre

Comment 10

•

15 years ago

we need an update here

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 11

•

15 years ago

(In reply to comment #10)

Sorry, I missed this comment. controlflow-recursive performs nicely now, down to 150 exits and it's 10ms faster. access-binary-trees is down to about 3,500 exits but the results are not correct. It messes up linking type-unstable down frames during up recursion. I'm tracking this down today.

access-binary-trees hits the deep-bailout problem very hard, in that there are a ton of interpreter frames lying around and we keep entering single-shot up-recursive loops. We spend most of our time in js_ExecuteTree but not in native code.

If after correctness is okay, this problem is still apparent, I plan on letting up-recursive loops "eat" interpreter frames. This shouldn't be too hard.

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 12

•

15 years ago

Found the regression in access-binary-trees. in TreeNode.itemCheck(), |this.item| is pulled out as a double when taking the branch that links down recursion to up recursion. Both the up and down recursive loops want integers, but I was linking them together by accident. 

Now this exposes another problem... the trees don't link. The oracle doesn't help here because the middle link is the _last_ thing that gets recorded. So we have to record a new up-recursive tree which is certainly wasteful.

I will post back once I get perf numbers... I may take a shot at bug 486213 if it looks really bad.

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 13

•

15 years ago

Multitree bugs are worked out now. access-binary-trees is about 1.7X slower with about 3,300 exits. This might not seem like an improvement but considering that this started out 30-80X slower, it's now approaching something reasonable.

The good news: the recursive portions are all linking together now, despite bug 486213, as seen with TreeVis:

http://users.alliedmods.net/~dvander/access-binary-trees-1.png

The left side is |TreeNode.itemCheck|, and the middle is |TreeNode.bottomUp| (which seems a little grotesque - may have to look into that). On the right there is an orphaned trace which is an artifact of the problem in comment #11.

On the other hand, we're still spending a huge amount of time in js_ExecuteTree, as seen in TraceVis:

http://users.alliedmods.net/~dvander/access-binary-trees-time.png

This is the other problem described in comment #11 which I will work on tomorrow.

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 14

•

15 years ago

Attached patch down+up recursion WIP v1 (obsolete) (deleted) — Details — Splinter Review

I've heard that hoarding things in your patch queue is bad, so posting this. Some notes:

* Does not apply cleanly yet, see "Depends on" bugs.
* guardUpRecursive and LeaveFrame are nasty - I am going to merge these into one function after performance is at parity.
* If it seems like there are completely random changes - yes there are - they will be removed when I post something for review.

Attachment #381854 - Attachment is obsolete: true

Graydon Hoare :graydon

Comment 15

•

15 years ago

(In reply to comment #13)
> On the other hand, we're still spending a huge amount of time in
> js_ExecuteTree, as seen in TraceVis:

There's a pending change I keep putting off for speeding up js_ExecuteTree that might help here. Bug 504694. We're still, last I checked, hitting malloc each time. It's only a couple one-liners to test, you might like to fiddle with it a little on your side just to see if it makes a difference. I'll try to land it shortly, but got a lot of other mess on the go.

 /* Max native stack size. */
-#define MAX_NATIVE_STACK_SLOTS 1024
+#define MAX_NATIVE_STACK_SLOTS 512
 
 /* Max call stack size. */
-#define MAX_CALL_STACK_ENTRIES 64
+#define MAX_CALL_STACK_ENTRIES 32

(or any smaller reduction that gets MAX_INTERP_STACK_BYTES under the arena-pool malloc threshold; or an increase to the threshold)

Andreas Gal :gal

Reporter

Comment 16

•

15 years ago

Graydon, thats the opposite direction we need to go for recursion. We need tons of tons of native stack. I will rewrite the stack allocation code to not hit that malloc (just hit it once, grow exponentially) if nobody beats me to it.

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 17

•

15 years ago

(In reply to comment #15)

Thanks for the suggestion - I just looked to see if the malloc() path was slowing me down. Turns out I don't have that cset, I am out of date. 

(In reply to comment #16)

I brought this up a few weeks ago - was planning on doing it for recursion but didn't get to it yet. I want to keep a buffer in the TraceMonitor and grow it on a special STACK_OOM_EXIT (or whenever there are too many [call]stack slots). If you think it's orthogonal enough I can file a separate bug and make it block this one.

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 18

•

15 years ago

A little progress. I've got a very rudimentary (and hacky) POC patch to read interpreter frames on trace. access-binary-trees is now down to 1,700 exits and is ~1.4X slower (from ~1.7x).

As hoped it is now spending much more time in native code rather than js_ExecuteTree:

http://users.alliedmods.net/~dvander/access-binary-trees-time2.png

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 19

•

15 years ago

An aside - in simpler benchmarks where there is little complexity (like fibonacci) we do great, especially if the benchmark runs for a while. I can create synthetic tests that are 8X faster than the interpreter. The problem is these crazy ones where control flow is irregular.

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 20

•

15 years ago

Good news is that I got access-binary-trees at parity - 15% faster than the interpreter in fact (60ms from 70ms). Bad news is that I disabled integer speculation to get that. When multitrees doesn't have to deal with speculation, everything connects really nicely. When it doesn't, it relives some of the early multitrees nightmares (such as duplicate trees and inner trees never compiling).

Getting this right will take more analysis, but it seems like 1) everything can and does connect together with the right magic, and 2) when it does, it is faster than the interpreter.

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 21

•

15 years ago

That sounds like a patch worth attaching to a bug! Good stuff!

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 22

•

15 years ago

Attached patch down+up recursion WIP v2 (obsolete) (deleted) — Details — Splinter Review

patch queue dump

Attachment #391539 - Attachment is obsolete: true

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 23

•

15 years ago

comment #15 ended up being on target, but from js_ReserveObjects, not arena code (if i understood it correctly). lowering the MAX_CALL_STACK_ENTRIES down to a more reasonable 500 makes controlflow-recursive 30ms faster (from 60ms) and access-binary-trees 20ms faster (from 70ms). boy would it be nice to not have that!

Graydon Hoare :graydon

Comment 24

•

15 years ago

Lowering to 500? It's currently 64. How do you figure?

I was just reporting the hotspot I saw, which is that

    JS_ARENA_ALLOCATE(reserve, &cx->stackPool, MAX_INTERP_STACK_BYTES);

hits malloc when MAX_INTERP_STACK_BYTES is larger than the stackChunkSize the JSContext was initialized with. That's 8192 in the shell by default, and in nsJSEnvironment, but occasionally something else in other contexts. Currently MAX_INTERP_STACK_BYTES adds up to 11376 bytes. So on my build, it hits the malloc path every time.

Andreas Gal :gal

Reporter

Comment 25

•

15 years ago

#23: yeah, I will work on that next. I already have a patch that fixes most of that problem, but it can run into a fatal OOM condition, which we are trying to decide whether that's acceptable or not.

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 26

•

15 years ago

(In reply to comment #24)
> Lowering to 500? It's currently 64. How do you figure?
it was like 100000 in my patch <_<

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Updated

•

15 years ago

Depends on: 509089

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Updated

•

15 years ago

Depends on: 509093

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 27

•

15 years ago

Attached patch down+up recursion WIP v3 (obsolete) (deleted) — Details — Splinter Review

reworking this into something that might be able to land one day. removed hacky code that held state until LeaveFrame, refactored SlotMap again, fixed some multitrees bugs.

I also pulled most of the jstracer.cpp files into a separate file which gets #included. I don't know how kosher that is. I really want to avoid cluttering jstracer more but it's not easy to split it into separate compilation units.

Attachment #392620 - Attachment is obsolete: true

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 28

•

15 years ago

Attached patch down+up recursion WIP v3 for real (obsolete) (deleted) — Details — Splinter Review

forgot to hg add

Attachment #393973 - Attachment is obsolete: true

Brendan Eich [:brendan]

Comment 29

•

15 years ago

IIRC waldo had some thoughts about splitting jstracer.cpp up. We might want to split into more than two parts. I don't think #include is called for unless we can prove some linker brain-damage on a tier-1 platform really costs us.

/be

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 30

•

15 years ago

#include is just out of laziness, easy way to readably maintain this patch. I agree it's not really good.

There are so many static functions in jstracer.cpp, and it seems silly to expose them all as js_*. I think we'd need to have a JSTracer namespace/class with static member functions. A lot of functions can be moved into the recorder, or as members of JSTraceMonitor.

I tried doing this but gave up midway since it was distracting. Maybe I should be filing a separate bug (unless one is already there).

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Updated

•

15 years ago

Depends on: 510300

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Updated

•

15 years ago

Depends on: 511307

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Updated

•

15 years ago

Depends on: 515806

Brian Crowder

Updated

•

15 years ago

Blocks: 481419

Brian Crowder

Updated

•

15 years ago

No longer blocks: 481419

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 31

•

15 years ago

Attached patch refreshed, x64 compat (obsolete) (deleted) — Details — Splinter Review

What's left to do?
 - Verify that frame popping code is sane.
 - Get blacklisting to work, since it doesn't, string-tagcloud regresses.
 - Figure out if there is a good way to deal with the Oracle.
 - Clean up remaining questionable bits of code.

Andreas - looking for comments/questions about anything. This needs a second pair of eyes before an actual review-to-land.

A bit more on the oracle problem. access-binary-trees is around 10-15ms faster with the oracle disabled. With the oracle on, itemCheck()'s result is undemotable because it flows from a property lookup. The outer loops never compile and we get 1,700 exits. With the oracle off we get no aborts and a beautiful nesting of trees.

Luckily, with the oracle on, access-binary-trees is still slightly faster with this patch. But it could be significantly better.

Attachment #393975 - Attachment is obsolete: true

Attachment #401357 - Flags: review?(gal)

Andreas Gal :gal

Reporter

Comment 32

•

15 years ago

Comment on attachment 401357 [details] [diff] [review]
refreshed, x64 compat


>+JS_REQUIRES_STACK JSBool FASTCALL
>+js_PopInterpFrame(JSContext* cx, InterpState* state)
>+{
>+    JS_ASSERT(cx->fp && cx->fp->down);
>+    JSInlineFrame* ifp = (JSInlineFrame*)cx->fp;
>+    cx->fp = cx->fp->down;
>+    JS_ASSERT(cx->fp->regs == &ifp->callerRegs);
>+    /* :TODO: is this safe? */
>+    cx->fp->regs = ifp->frame.regs;

Yeah, thats the right thing to do.

>+    JS_ARENA_RELEASE(&cx->stackPool, ifp->mark);
>+    *state->inlineCallCountp = *state->inlineCallCountp - 1;

--

>+    return JS_TRUE;
>+}
>+JS_DEFINE_CALLINFO_2(extern, BOOL, js_PopInterpFrame, CONTEXT, INTERPSTATE, 0, 0)
>+

> 
>+#define RESTORE_INTERP_VARS()                                                 \
>+    JS_BEGIN_MACRO                                                            \
>+            fp = cx->fp;                                                      \
>+            script = fp->script;                                              \
>+            atoms = FrameAtomBase(cx, fp);                                    \
>+            currentVersion = (JSVersion) script->version;                     \
>+            JS_ASSERT(fp->regs == &regs);                                     \
>+            if (cx->throwing)                                                 \
>+                goto error;                                                   \
>+    JS_END_MACRO
>+        

Why the extra newline?

>+
> #define MONITOR_BRANCH()                                                      \
>     JS_BEGIN_MACRO                                                            \
>         if (TRACING_ENABLED(cx)) {                                            \
>@@ -2888,14 +2900,8 @@ js_Interpret(JSContext *cx)
>                 MONITOR_BRANCH_TRACEVIS;                                      \
>                 ENABLE_INTERRUPTS();                                          \
>             }                                                                 \
>-            fp = cx->fp;                                                      \
>-            script = fp->script;                                              \
>-            atoms = FrameAtomBase(cx, fp);                                    \
>-            currentVersion = (JSVersion) script->version;                     \
>-            JS_ASSERT(fp->regs == &regs);                                     \
>-            if (cx->throwing)                                                 \
>-                goto error;                                                   \
>         }                                                                     \
>+        RESTORE_INTERP_VARS();                                                \
>     JS_END_MACRO

Shouldn't RESTORE_INTERP_VARS() be inside the { } ?


> #define PROPAGATE_CALLNESS()                                                  \
>     JS_BEGIN_MACRO                                                            \
>-        if (ss->opcodes[ss->top - 1] == JSOP_CALL)                            \
>+        if (ss->opcodes[ss->top - 1] == JSOP_CALL ||                          \
>+            ss->opcodes[ss->top - 1] == JSOP_CALLNOTRACE)                     \
>             saveop = JSOP_CALL;                                               \
>     JS_END_MACRO

Inline function?
> 
>+                bool recursive = fp->script == fp->down->script;
>+

We might want to try deeper recursion too, so || fp->script == fp->down->down->script. And be careful with fp->down being not NULL.

>+#ifdef JS_TRACER
>+                    if (TRACE_RECORDER(cx)) {
>+                        TRACE_1(EnterFrame, inlineCallCount);
>+                        RESTORE_INTERP_VARS();
>+                    } else {
>+                        /* Check for possible recursion */
>+                        if (fp->script == fp->down->script)
>+                                MONITOR_BRANCH();

Why do we check here too? Can we unify this?



>+    /* :TODO: use cx->fp->down here...? */

?

>+    #if 0
>+    /* :HACK: detect overflows so we dont build infinite branches */
>+    if (anchor && anchor->exitType == RECURSIVE_SLURP_FAIL_EXIT &&
>+        anchor->slurpType == TT_INT32 && downMap[anchor->slurpSlot] == TT_INT32) {
>+        downMap[anchor->slurpSlot] = TT_DOUBLE;
>+    }
>+    #endif

I don't understand.


>+LIns*
>+TraceRecorder::slurpInt32Slot(LIns* val_ins, jsval* vp, VMSideExit* exit)
>+{
>+    guard(true,
>+          lir->ins2(LIR_or,
>+                    lir->ins2(LIR_peq,
>+                              lir->ins2(LIR_piand, val_ins, INS_CONSTWORD(JSVAL_TAGMASK)),
>+                              INS_CONSTWORD(JSVAL_DOUBLE)),
>+                    lir->ins2(LIR_peq,
>+                              lir->ins2(LIR_piand, val_ins, INS_CONSTWORD(1)),
>+                              INS_CONSTWORD(1))),

LIR_or doesn't generate great code. Maybe guards + jumps?

>+          exit);
>+    LIns* space = lir->insAlloc(sizeof(int32));
>+    LIns* args[] = { space, val_ins };
>+    LIns* result = lir->insCall(&js_TryUnboxInt32_ci, args);

And thats crazy slow. Plus from above we know its an int.

>+    guard(false, lir->ins_eq0(result), exit);
>+    LIns* int32_ins = lir->insLoad(LIR_ld, space, 0);
>+    return int32_ins;
>+}

>+    guard(true,
>+          lir->ins2(LIR_or,
>+                    lir->ins2(LIR_peq,
>+                              lir->ins2(LIR_piand, val_ins, INS_CONSTWORD(JSVAL_TAGMASK)),
>+                              INS_CONSTWORD(JSVAL_DOUBLE)),
>+                    lir->ins2(LIR_peq,
>+                              lir->ins2(LIR_piand, val_ins, INS_CONSTWORD(1)),
>+                              INS_CONSTWORD(1))),
>+          exit);

Dito here.

>diff --git a/js/src/jstracer.cpp b/js/src/jstracer.cpp
>--- a/js/src/jstracer.cpp
>+++ b/js/src/jstracer.cpp
>@@ -135,10 +135,10 @@ static const char tagChar[]  = "OIDISIBI
> #define MAX_CALLDEPTH 10
> 
> /* Max native stack size. */
>-#define MAX_NATIVE_STACK_SLOTS 1024
>+#define MAX_NATIVE_STACK_SLOTS 4096
> 
> /* Max call stack size. */
>-#define MAX_CALL_STACK_ENTRIES 64
>+#define MAX_CALL_STACK_ENTRIES 500

We have to fix the reserved objects.

Brendan Eich [:brendan]

Comment 33

•

15 years ago

(In reply to comment #32)
> (From update of attachment 401357 [details] [diff] [review])
> 
> >+JS_REQUIRES_STACK JSBool FASTCALL
> >+js_PopInterpFrame(JSContext* cx, InterpState* state)
> >+{
> >+    JS_ASSERT(cx->fp && cx->fp->down);
> >+    JSInlineFrame* ifp = (JSInlineFrame*)cx->fp;
> >+    cx->fp = cx->fp->down;
> >+    JS_ASSERT(cx->fp->regs == &ifp->callerRegs);
> >+    /* :TODO: is this safe? */
> >+    cx->fp->regs = ifp->frame.regs;
> 
> Yeah, thats the right thing to do.

Right, the precious JSFrameRegs regs on the js_Interpret stack is what you want the top frame's cx->fp->regs always pointing at.

> >+#define RESTORE_INTERP_VARS()                                                 \
> >+    JS_BEGIN_MACRO                                                            \
> >+            fp = cx->fp;                                                      \
> >+            script = fp->script;                                              \
> >+            atoms = FrameAtomBase(cx, fp);                                    \
> >+            currentVersion = (JSVersion) script->version;                     \
> >+            JS_ASSERT(fp->regs == &regs);                                     \
> >+            if (cx->throwing)                                                 \
> >+                goto error;                                                   \
> >+    JS_END_MACRO
> >+        
> 
> Why the extra newline?

Body of the macro apart from JS_{BEGIN,END}_MACRO is over-indented four spaces.

/be

Robert Sayre

Updated

•

15 years ago

Flags: wanted1.9.1+

Flags: blocking1.9.2+

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 34

•

15 years ago

> Shouldn't RESTORE_INTERP_VARS() be inside the { } ?

Good catch, thanks.

> We might want to try deeper recursion too, so || fp->script ==
> fp->down->down->script. And be careful with fp->down being not NULL.

In this case one level of fp->down is always okay. For this patch I'm not going to try and support multiple levels - can be follow-up.

> Why do we check here too? Can we unify this?

Check what?

> >+    /* :TODO: use cx->fp->down here...? */
> 
> ?

Stale comment - thanks.

> 
> >+    #if 0
> >+    /* :HACK: detect overflows so we dont build infinite branches */
> >+    if (anchor && anchor->exitType == RECURSIVE_SLURP_FAIL_EXIT &&
> >+        anchor->slurpType == TT_INT32 && downMap[anchor->slurpSlot] == TT_INT32) {
> >+        downMap[anchor->slurpSlot] = TT_DOUBLE;
> >+    }
> >+    #endif
> 
> I don't understand.

It's #ifdef 0'd because it shouldn't ever happen. Will change to an assert and give a better explanation. Basically, we don't want to guard that a slot is an integer, fail because it's a double, then build another integer read on the same slot.


> LIR_or doesn't generate great code. Maybe guards + jumps?

Sounds good.

> And thats crazy slow. Plus from above we know its an int.

I can put this in the slow path - it could be a double that fits in an int, for example. I don't think it's "crazy slow" :) but certainly not ideal.

> We have to fix the reserved objects.

Yeah - is there an existing bug on this or should I file?

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 35

•

15 years ago

(In reply to comment #34)
> > And thats crazy slow. Plus from above we know its an int.

On second thought, we don't know that it's an int, and it seems hard to do a jump here because we don't have phi nodes. The new code would look something like:

 ialloc 4
 if val & JSVAL_INT:
  store val >> 1 into ialloc
 else
  guard val & JSVAL_DOUBLE
  call TryUnboxInt32(val, ialloc)
 result = *ialloc

It's not much different but it has a "fast" path. Is it worth it, or is there a better way of expressing this? (Also: is LIR_or being slow something we can fix?)

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 36

•

15 years ago

Attached patch v5 (obsolete) (deleted) — Details — Splinter Review

Changes in this version:
 - Rebased against new (unpushed) FrameInfo patch.
 - Lots more cleanup.
 - JSOP_CALLNOTRACE removed.
 - JSOP_TRACE is emitted after every JSOP_CALL, this becomes a hint for both down and up recursion.
 - Recursive trees are trashed and blacklisted if they do not link quickly.
 - Assert-botching and massive perf regressions in string-tagcloud and v8 tests fixed.
 - Various cases of recursion we shouldn't trace in this patch are properly detected and aborted. This was mostly mutual recursion but I have a plan for that in a follow-up patch.
 - Made js_PopInterpFrame more rigorous after talking to Brendan. mrkbap reviewed it after.

I have not attempted to deal with the Oracle yet. That can be a follow-up as well, since we already gain performance - just not at full potential.

There is one minor regression still, v8/run-early-boyer.js gets ~15 "points" lower with this patch. It's 1AM here so I'll track this down tomorrow. It seems like we're running a lot of short traces somewhere.

Other than that, which I expect to be a simple heuristic somewhere, this patch is ready for a real review.

Attachment #401357 - Attachment is obsolete: true

Attachment #403438 - Flags: review?(gal)

Attachment #401357 - Flags: review?(gal)

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 37

•

15 years ago

Follow-up thought: It is probably unnecessary to block tree calls if |cx->fp->script == treeInfo->script| as long as the |pc| is different, since the up-recursive guard will fail and the frame won't be popped. So that check might be too conservative, but it won't hurt anything.

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 38

•

15 years ago

Attached patch v6 (deleted) — Details — Splinter Review

The previous iterations of this patch subtly changed lots of start and stop recording behavior, which regressed a few tests. In this version:

- A recorder knows whether it came from EnterFrame, LeaveFrame, or a Branch, and can use this to decide whether to keep recording or whether to allow a non-looping trace.
- A tree knows whether it only has an up-recursive side, which is usually much smaller than the down recursive side. In this case, we don't want to bother even executing the tree.
- The heuristics on how to deal with recursion now let old cases slide through as long as they make sense (see comment in EnterFrame). When there's recursion that will never compile, the entry of the script is immediately blacklisted.

SunSpider results are promising. "Forever" paste: http://pastebin.mozilla.org/673636

On my Windows box, this is a ~45ms net perf win, which is 5-6% of the running time. There's another 8-10ms to win on access-binary-trees if we figure out what to do with the Oracle.

The 6ms regression in 3d-cube is real. If you look at SunSpider 3d-cube.js (which is different from the one in our tree), it has a benchmark loop that actually calls itself recursively. In the case that we have a deeply recursive function that only runs once, reconstructing stack frames is still slightly problematic. In this specific case, tail call analysis would make that go away.

Otherwise this is part of a larger problem in TraceMonkey where the JIT and Interpreter frames diverge too much, and it's expensive to cross in between. For example, disabling the JIT is a large perf win for v8/run-early-boyer. This is because it has a very tiny loop that gets used throughout the program, and it's just not worth running it all by itself. While we should work on tracing the more complex cases of recursion in that benchmark, it's probably also important to take a look at either 1) heuristics to throw away useless traces, or 2) speeding up ExecuteTree/LeaveTree (Andreas and I have some ideas on this, I'll file a bug tomorrow).

Attachment #403438 - Attachment is obsolete: true

Attachment #403752 - Flags: review?(gal)

Attachment #403438 - Flags: review?(gal)

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 39

•

15 years ago

Addendum: Luke Wagner had a neat idea - delay execution of the down-recursive trace. We could magically have the interpreter know which branches would trigger a guard failure, and it could automatically start a new recorder. The tracer would not actually execute the tree until it knew the whole thing was nicely linked.

This seems worth thinking about, though it's more distant than optimizing transitions between interp/JIT, and I don't think it would necessarily supplant the interp-frame slurping code which is necessary as soon as any guard fails.

Andreas Gal :gal

Reporter

Comment 40

•

15 years ago

Comment on attachment 403752 [details] [diff] [review]
v6


>+#if defined DEBUG
>+static JS_REQUIRES_STACK void
>+AssertDownFrameIsConsistent(JSContext* cx, VMSideExit* anchor, FrameInfo* fi)
>+{
>+    unsigned downPostSlots = fi->callerHeight;
>+    JSTraceType* typeMap = fi->get_typemap();
>+    JS_ASSERT(anchor->recursive_down);
>+    JS_ASSERT(anchor->recursive_down->callerHeight == downPostSlots);

Move these to the top and add some space after it.

>+    js_CaptureStackTypes(cx, 1, typeMap);
>+    const JSTraceType* m1 = anchor->recursive_down->get_typemap();

Here to. Let the code breathe a little.

>+    for (unsigned i = 0; i < downPostSlots; i++) {
>+        if (m1[i] == typeMap[i])
>+            continue;
>+        if (typeMap[i] == TT_INT32 && m1[i] == TT_DOUBLE)
>+            continue;
>+        JS_NOT_REACHED("invalid RECURSIVE_MISMATCH exit");
>+    }
>+    JS_ASSERT(memcmp(anchor->recursive_down, fi, sizeof(FrameInfo)) == 0);
>+}
>+#endif

> /* Max native stack size. */
>-#define MAX_NATIVE_STACK_SLOTS 1024
>+#define MAX_NATIVE_STACK_SLOTS 4096
> 
> /* Max call stack size. */
>-#define MAX_CALL_STACK_ENTRIES 64
>+#define MAX_CALL_STACK_ENTRIES 500

This is going to cause a lot of objects to be reserved but I am about to remove the reserve list so lets just ignore this issue.

Attachment #403752 - Flags: review?(gal) → review+

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 41

•

15 years ago

Pushed as cset http://hg.mozilla.org/tracemonkey/rev/910f0c1ca2e5

Watching tree to see if this sticks.

Bob Clary [:bc] (inactive)

Updated

•

15 years ago

Depends on: 520003

Bob Clary [:bc] (inactive)

Updated

•

15 years ago

Depends on: 519999

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 42

•

15 years ago

Attached patch corrected arena code (obsolete) (deleted) — Details — Splinter Review

Not sure if this needed a new bug. The arena code in the original patch is wrong, since it is unbalanced - disobeys the LIFO requirement. This patch corrects that by stashing the frames in a linked list and freeing them in LeaveTree.

Attachment #404170 - Flags: review?(brendan)

Brendan Eich [:brendan]

Comment 43

•

15 years ago

New bug if the patch for this bug landed (in which case, fixed-in-tracemonkey status whiteboard please).

/be

Brendan Eich [:brendan]

Comment 44

•

15 years ago

Comment on attachment 404170 [details] [diff] [review]
corrected arena code

>diff --git a/js/src/jsbuiltins.cpp b/js/src/jsbuiltins.cpp
>--- a/js/src/jsbuiltins.cpp
>+++ b/js/src/jsbuiltins.cpp
>@@ -452,7 +452,14 @@ js_PopInterpFrame(JSContext* cx, InterpS
>     cx->fp = cx->fp->down;
>     JS_ASSERT(cx->fp->regs == &ifp->callerRegs);
>     cx->fp->regs = ifp->frame.regs;
>-    JS_ARENA_RELEASE(&cx->stackPool, ifp->mark);
>+
>+    /* Don't release |ifp->mark| yet, since ExecuteTree uses |cx->stackPool|. */
>+    if (!state->poppedFrameHead)
>+        state->poppedFrameHead = ifp;
>+    else
>+        state->poppedFrameTail->frame.down = &ifp->frame;
>+    state->poppedFrameTail = ifp;
>+    ifp->frame.down = NULL;

Here's an old trick for singly-linked lists you only append to:

    JSStackFrame *poppedFrameHead;
    JSStackFrame **poppedFrameTail;
  . . .
    state->poppedFrameHead = NULL;
    state->poppedFrameTail = &state->poppedFrameHead;

then the above-cited code simplifies to:

    *state->poppedFrameTail = ifp;
    state->poppedFrameTail = &ifp->frame.down;
    ifp->frame.down = NULL;

>@@ -6609,6 +6611,16 @@ LeaveTree(InterpState& state, VMSideExit
>                  innermost->calldepth == 0 && callstack == rp);
> 
>     JS_ARENA_RELEASE(&cx->stackPool, state.stackMark);
>+
>+    /* Release all frames that were popped by js_PopInterpFrame. */
>+    JSInlineFrame* ifp = state.poppedFrameHead;
>+    while (ifp) {
>+        JSInlineFrame* next = (JSInlineFrame*)ifp->frame.down;
>+        JS_ASSERT_IF(!next, ifp == state.poppedFrameTail);
>+        JS_ARENA_RELEASE(&cx->stackPool, ifp->mark);
>+        ifp = next;
>+    }

Now that this is LIFO it is clear you don't need to release to each mark, only to the oldest. So even simpler code would just maintain the oldest popped frame's mark.

/be

Brendan Eich [:brendan]

Comment 45

•

15 years ago

Comment on attachment 404170 [details] [diff] [review]
corrected arena code

Spun out to bug 520210.

/be

Attachment #404170 - Attachment is obsolete: true

Attachment #404170 - Flags: review?(brendan)

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Updated

•

15 years ago

Whiteboard: fixed-in-tracemonkey

Bob Clary [:bc] (inactive)

Updated

•

15 years ago

Depends on: 520240

Robert Sayre

Updated

•

15 years ago

Blocks: 520321

Robert Sayre

Updated

•

15 years ago

No longer blocks: 520321

Depends on: 520321

Bob Clary [:bc] (inactive)

Updated

•

15 years ago

Depends on: 520492

Jesse Ruderman

Updated

•

15 years ago

Depends on: 520498

Jesse Ruderman

Updated

•

15 years ago

Alias: tracerecursion

Jesse Ruderman

Updated

•

15 years ago

Depends on: 520591

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

15 years ago

Depends on: 520503

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

15 years ago

Depends on: 520613

Bob Clary [:bc] (inactive)

Updated

•

15 years ago

Depends on: 520536

Robert Sayre

Comment 46

•

15 years ago

http://hg.mozilla.org/mozilla-central/rev/910f0c1ca2e5

Status: ASSIGNED → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

15 years ago

Depends on: 521169

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

15 years ago

No longer depends on: 521169

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

15 years ago

Depends on: 521169

Jesse Ruderman

Updated

•

15 years ago

Depends on: 521447

Jesse Ruderman

Updated

•

15 years ago

Depends on: 521465

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

15 years ago

Depends on: 521694

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Updated

•

15 years ago

No longer depends on: 521169

Jesse Ruderman

Updated

•

15 years ago

Depends on: 522311

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

15 years ago

Depends on: 522624

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

15 years ago

Depends on: 524061

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

15 years ago

Depends on: 528048

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Updated

•

15 years ago

No longer depends on: 528048

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

15 years ago

Depends on: 531298

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

15 years ago

Depends on: 531629

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Updated

•

15 years ago

No longer depends on: 531629

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Updated

•

15 years ago

Blocks: 535728

You need to log in before you can comment on or make changes to this bug.