Closed Bug 504829 Opened 15 years ago Closed 14 years ago

declining TM performance with repeated runs of test case

Categories

(Core :: JavaScript Engine, defect, P1)

Tracking

RESOLVED WORKSFORME
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: blizzard, Assigned: dvander)

Attachments

(1 file)

See report here:

http://hacks.mozilla.org/2009/07/tracemonkey-overview/#comment-2035

If you run 'example 3' in that blog post over and over again, the time it takes increases with every run, starting at 3ms and eventually topping out at around 200ms.  If you keep running it, it eventually resets and starts running at 3ms again.  (Blowing a cache, eventually?)

Tested here on the 3.5.1 final release on Windows 7.
Repro'd here as described on Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5 (.NET CLR 3.5.30729)

Was also able to reproduce with a bleeding-edge tracemonkey release-build js shell on a very slow SPARC, except instead of resetting it got VERY slow -- after a slow build to 300ms it jumped to over 30s and stayed there.

Now here's where it gets interesting: If I call JS_GC() each time the loop runs, the time intervals stay the same, but instead of getting stuck at > 30s, they jump back down into the 30ms range.
Sorry, that was JS_MaybeGC() in my loop, not JS_GC().

With JS_GC() in the loop, it runs at its fastest every time.
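A driver of roughly this shape reproduces what's described above (a hypothetical sketch, not the attached testcase -- example3() is a made-up stand-in for the blog-post loop, and the shell's built-in gc() is the closest stand-in for calling JS_GC() from the embedding):

    // Hypothetical shell driver; example3() stands in for the blog-post loop.
    for (var run = 0; run < 100; run++) {
        var start = Date.now();
        example3();
        print("run " + run + ": " + (Date.now() - start) + "ms");
        gc();  // shell builtin, roughly JS_GC(); with this, each run stays fast
    }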
Comment on attachment 389142 [details]
js shell testcase that shows the problem

Please use text/plain for .js file attachments. Easier to preview and we don't want to run 'em on click in the browser anyway (AFAIK -- anyone disagree?).

/be
Attachment #389142 - Attachment mime type: application/x-javascript → text/plain
So what's happening here is that each time we run the loop, |mul| has a different value; we can't reuse our existing trace for that loop and have to create a new one.  Things slow down more and more as we end up with more and more traces compiled for that code location (presumably it takes longer and longer to discover that none of them is relevant to us).
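For illustration, the shape being described is roughly this (a hypothetical reduction, not the attached testcase; only the name |mul| is taken from the comment above):

    // Each call to run() creates a fresh |mul| closure, so the tree recorded
    // for the inner loop against one mul cannot be reused on the next run.
    function run(n) {
        var mul = function (x) { return x * n; };  // new function object, new parent
        var sum = 0;
        for (var i = 0; i < 100000; i++)
            sum += mul(i);   // the callee guard fails, another tree gets compiled
        return sum;
    }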

This continues until the 34th iteration.  At that point, performance _really_ takes a nosedive (to much slower than interp).  I'm not sure why, exactly.  I tried bumping MAXEXIT, BL_BACKOFF, MAXPEERS, MAX_BRANCHES, and none of those change the "34" number.  We should try to figure out what the deal is there.

In any case, the reset observed in browser is presumably due to gc clearing all of our cached traces.

Brendan, I just forgot to switch from "autodetect" when attaching...
We shouldn't autodetect .js as anything but text/plain here -- it's the dominant case.  I wonder if there's a bug on that!
This bug depends on bug 471214 but the eval may be an added nasty. Leaving this dependent rather than a dup until I can verify that a fresh fix for bug 471214 also fixes this bug (thanks to the eval cache).

/be
Depends on: 471214
For what it's worth, for that testcase the stats look like this:

recorder: started(7), aborted(1), completed(70), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(4), breaks(0), returns(0), unstableInnerCalls(0), blacklisted(0)
monitor: triggered(100226), exits(100226), type mismatch(0), global mismatch(0)

Here's the relevant part of a full log:

leaving trace at /Users/bzbarsky/test.js:10@86, op=call, lr=0x27ef38, exitType=BRANCH, sp=3, calldepth=0, cycles=21237
trying to attach another branch to the tree (hits = 0)

And I just checked: I totally failed to run make in the right place when testing in comment 5.  MAX_BRANCHES is absolutely where it's at for this testcase, as it should have been.
No longer depends on: 471214
We should figure out what's going on here other than the eval thing.

What seems to be happening is that once we hit MAX_BRANCHES we continue entering the loop and then exiting it on the branch exit.  And we do that for every return to the loop header.

So in the testcase here, on the 34th iteration of the function, we enter/exit trace for that second loop about 100000 times.

If I bump the iteration count by 1, we do that for both the 34th and 35th iteration...
Depends on: 471214
So there seem to be two side exits with the same address here, or something... in any case, the relevant guards are:

    OBJ_GET_PRIVATE(cx, callee_obj)
    -2
    ld317 = ld $stack0[16]
    and69 = and ld317, -2
    eq175 = eq and69, OBJ_GET_PRIVATE(cx, callee_obj)
    xf213: xf eq175 -> pc=0x30d846 imacpc=0x0 sp+24 rp+0

    OBJ_GET_PARENT(cx, callee_obj)
    ld318 = ld $stack0[12]
    eq176 = eq ld318, OBJ_GET_PARENT(cx, callee_obj)
    xf214: xf eq176 -> pc=0x30d846 imacpc=0x0 sp+24 rp+0

Which makes sense: guarding on calling the same function with the same scope chain.
So even if we make this particular case not branch-explode.... is there any way we can avoid trying to execute that tree on every return to the loop header if we keep taking branch exits and we've hit MAX_BRANCHES?  I guess we keep hoping that we'll manage to hit one of our existing branches, but in this case that just can't happen.

It seems that if we limit that behavior to trees that have hit MAX_BRANCHES, the overhead of the checking might be worth it to avoid a slowdown relative to interp....

On the other hand, if the loop were thicker and the failing guard further down, it might not be a loss, and might even be a win, to try entering trace on each iteration.  So not sure what the best course of action is here.
There's no eval thing to figure out. We cache evals, so we get the same JSFunction every time, but different parent (scope chain link) because the lambda uses upvars that are mutated. This is a distinct bug from bug 471214 after all.

The trick here is to avoid guarding on the parent of the eval-created function object. If we indeed optimize its prod and i upvars to access active slots of the outer function's Call object, then we shouldn't guard on parent at all.
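In source terms, the shape under discussion is roughly this (a hypothetical sketch; only the prod and i names are taken from the comment above):

    function outer() {
        var prod = 1, i;
        // The eval cache hands back the same JSFunction every time, but the
        // function object it yields has outer()'s fresh Call object as parent.
        var f = eval("(function () { prod *= i; })");
        for (i = 1; i < 100000; i++)
            f();            // the hot loop; prod and i are f's (mutated) upvars
        return prod;
    }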

dmandelin, want to take this one?

/be
No longer depends on: 471214
Flags: wanted1.9.2?
Flags: blocking1.9.2?
Flags: blocking1.9.2? → blocking1.9.2+
If we compute the maximum UPVAR_FRAME_SKIP when inlining a call to a function fun having upvars that are not optimized such that FUN_FLAT_CLOSURE(fun), and the skip amount is "in range" by one or another of our upvar optimizations, then we can avoid guarding on parent.

This could be an important optimization. Have to check the popular benchmarks.
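To illustrate the skip amount in source terms (a hypothetical example, not from the testcase):

    function a() {
        var x = 1;              // x lives in a()'s frame
        function b() {
            function c() {
                return x;       // x is used two frames up from where it's
            }                   // defined; the upvar's frame-skip amount
            return c();         // records that distance
        }
        return b();
    }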

/be
OS: Windows NT → All
Hardware: x86 → All
I'm hearing from dmandelin, gal, and danderson that this is not going to be fixed in time for 1.9.2, as it's a really hard problem.  So, to make sure Sayre sees this, I'm leaving it as blocking1.9.2+ and at the same time nominating it for blocking1.9.3.

Sayre, please update accordingly.
Assignee: general → dvander
blocking2.0: --- → ?
OS: All → Windows NT
Priority: -- → P1
Hardware: All → x86
OS: Windows NT → All
Hardware: x86 → All
Flags: wanted1.9.2?
Flags: wanted1.9.2+
Flags: wanted-next+
Flags: blocking1.9.2+
Version: 1.9.1 Branch → Trunk
blocking2.0: ? → beta1
Flags: wanted1.9.2+
Depends on: 510554
blocking2.0: beta1+ → beta2+
blocking2.0: beta2+ → betaN+
On mozilla-central, I can get the test case to go from 2ms to about 10ms, but then it goes back down again.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
What happens here is that _if_ you have JM and jitprofiling both on, after we hit the MAX_BRANCHES point we stop trying to run the loop in the tracer altogether and just run it in the mjit.  So we end up doing it 2-4x slower than a single tracer loop run, but faster than interp, of course.

No idea whether that makes this bug wfm; I guess we do have bug 504881 filed on the real underlying issue here.