Closed
Bug 607539
Opened 14 years ago
Closed 14 years ago
mandelbrot "benchmark" is a lot slower with profiling on
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
blocking2.0 | --- | betaN+ |
People
(Reporter: bzbarsky, Assigned: billm)
References
()
Details
(Keywords: regression, Whiteboard: [jsperf] [fixed-in-tracemonkey][hardblocker])
Attachments
(1 file, 3 obsolete files)
(deleted),
patch
|
dmandelin
:
review+
|
Details | Diff | Splinter Review |
I see 105ms with profiling or jm disabled; 355ms with profiling enabled or tm disabled.
I thought we had this in trace-tests... but that might be as a stress-test only. :(
Building debug browser now to get some more data.
Reporter | ||
Comment 1•14 years ago
|
||
Note that the core part of the code is this:
for (var i = 0, curReal = realRange.min;
i < numCols;
++i, curReal += realStep) {
for (var j = 0, curImag = imagRange.max;
j < numRows;
++j, curImag += imagStep) {
var n = computeEscapeSpeed(curReal, curImag);
addPoint(points, n, i, j)
}
}
where computeEscapeSpeed runs a loop anywhere between 1 and 105 times. Most calls to computeEscapeSpeed run the loop 105 times, but most of the early calls run it just once.
Reporter | ||
Comment 2•14 years ago
|
||
So we first start profiling with that inner for loop there (presumably because the loop inside computeEscapeSpeed has never run enough iterations at that point yet to get profiled?). We end up deciding not to trace it because the nested loop inside computeEscapeSpeed (which has run 0 iterations, according to the "NESTED" spew!) has numSelfOps == numSelfOpsMult == 0, so that isCompilationExpensive for that loop returns true.
Then we later profile the loop inside computeEscapeSpeed and end up aborting profiling (not sure why yet). So we blacklist that loop. And that sets numSelfOps to MAX_PROFILE_OPS for that innermost loop, which means all loops containing it get blacklisted too. So we blacklist everything.
Reporter | ||
Comment 3•14 years ago
|
||
If I change the >= to > in this line:
if (numSelfOpsMult >= numSelfOps*100000)
then things seem to be good. Or at least, I only see us start profiling for that inner for loop twice, each time decide to trace it and move on. Not sure why I never see us profile the outer loop....
Also not sure yet why we ended up aborting profiling. Trying to figure that out now.
blocking2.0: --- → ?
Keywords: regression
Reporter | ||
Comment 4•14 years ago
|
||
Looks like we abort profiling if we return out of a loop we're currently profiling? But unlike aborting tracing, which will back off and try again later it looks like aborting profiling permanently blackists the loop? Is that right?
In any case, the abort happens when we take the return in computeEscapeSpeed, which is the common case for that function.
In fact, it looks like any loop that returns from inside the loop before hitting 16384 jsops will hit the abort and get blacklisted. In particular, any loop that uses return instead of a break or loop condition and happens to return before it's looped enough times to hit 16384 jsops will hit this case... Would it make more sense to just decide() at that point instead of aborting, possibly?
Updated•14 years ago
|
Whiteboard: [jsperf]
Assignee | ||
Comment 5•14 years ago
|
||
I think there are a lot of things independently going wrong here. I still have a few things to track down, but here's what I've found so far.
- The problem with the zero-iteration loop in comment 3 was tripping us up. The fix there seems good to me.
- JM wasn't compiling the main script because it used const. I filed bug 607751 to compile those opcodes. The weird thing here is that if JM can't compile a script, then we run in the interpreter, which avoids the profiler entirely. I think this is the correct thing to do: lacking the method jit, we probably should just trace as much as possible, which is what happens without the profiler.
- I don't think that aborting due to a return statement is bad. This was done by design. First, we only abort if the return breaks out of the loop being profiled (as opposed to some nested loop).
Second, I want to emphasize an important part of profiling: If we profile a loop and decide not to trace it, it's not the end of the world. We may end up profiling an outer loop, deciding that we do want to trace it, and then we'll trace the original loop as well. This is why the "un-blacklisting" mechanism was needed for profiling.
In this example, we decide not to trace the computeEscapeSpeed loop due to the return. However, we will still trace it when we decide to trace the |for (var j = 0, ...)| loop.
In general, a return statement in a loop suggests that we won't execute many iterations of the loop, so we probably don't want it to be a top-level trace. It's just another "short loop" indicator.
- This test case also seems to exhibit a problem with tracer integration, where we start recording a loop when the interpreter is running in JSINTERP_SAFEPOINT mode. It seems like this is undesirable.
- I still haven't figured out why we don't trace the outer loop. Once I figure that out, I'll post a patch for these issues.
Anyway, thanks for the great test case!
Assignee | ||
Comment 6•14 years ago
|
||
This patch fixes the issue with loops with loops of zero iterations.
It also fixes the problem with the integration between the profiler and the JITs. With the patch, we trace both the inner and outer main loops and performance with -mjp is that same as with -j.
The problem was that, while running in the interpreter, waiting to reach a safe point, we needed to make important profiling decisions. Previously, the profiler could only be called from the method JIT, never the interpreter. This patch fixes the problem. As a result, you can run in weird modes like -j -p. There's no reason to do this, but it should work now.
I pushed to the tryserver, since it's slightly risky change.
Comment on attachment 486474 [details] [diff] [review]
patch
>+ if (prof->profiled) {
>+ if (prof->traceOK) {
>+ return RecordLoopEdge(cx, inlineCallCount, prof->execOK);
>+ } else {
>+ return MONITOR_NOT_RECORDING;
>+ }
No return-after-else, just:
>+ if (prof->traceOK)
>+ return RecordLoopEdge(cx, inlineCallCount, prof->execOK);
>+ return MONITOR_NOT_RECORDING;
Attachment #486474 -
Flags: review?(dvander) → review+
Reporter | ||
Comment 8•14 years ago
|
||
> We may end up profiling an outer loop, deciding that we do want to trace it,
> and then we'll trace the original loop as well.
See, that's my concern. Once we _abort_ profiling of a loop X (which is different from just profiling and deciding not to trace), then we won't trace any loops that contain X, as far as I can tell.
In particular, AbortProfiling will set numSelfOps for X to MAX_PROFILE_OPS. So isCompilationExpensive() for X will return true. And isCompilationExpensive() for a loop containing X will return true if it returns true for X.
The only way I can see this working is if we sometimes blow away the LoopProfile structure for X and create a new one, before we start profiling the ancestor loop. As far as I can see this only happens on a ResetJIT call.
Am I just missing something?
Assignee | ||
Comment 9•14 years ago
|
||
(In reply to comment #8)
> Am I just missing something?
No, I was wrong. I forgot about that assignment to numSelfOps in AbortProfiling. I'm pretty sure I added it to avoid tracing something in one of the SunSpider date benchmarks. But I just took out the assignment and I'm not seeing any regressions. It's definitely a good idea to take it out, because it's a silly thing to do in general.
I'll post an updated patch with this change and a fix for another problem I found this morning.
Assignee | ||
Comment 10•14 years ago
|
||
This patch applies on top of the previous one. It removes the line in AbortProfiling. It also fixes a problem with PCWithinLoop. The problem was that it didn't work if you had already left the function where the loop was. Now it compares frame pointers. I should have picked up on this when you asked about it, Dave. Oh well.
Attachment #486674 -
Flags: review?(dmandelin)
Reporter | ||
Comment 11•14 years ago
|
||
We may still want to set some flag to prevent tracing of the loop being aborted (this time around; we could clear the flag the next time we profile the loop)... I do agree that a priori loops that return while being profiled should probably not get traced, since they'd just trace-abort if they did the return while recording.
Assignee | ||
Comment 12•14 years ago
|
||
Good point. I think this should do it.
Attachment #486687 -
Flags: review?(bzbarsky)
Reporter | ||
Comment 13•14 years ago
|
||
Comment on attachment 486687 [details] [diff] [review]
additional additional patch
r=me
Attachment #486687 -
Flags: review?(bzbarsky) → review+
Comment 14•14 years ago
|
||
Comment on attachment 486674 [details] [diff] [review]
additional patch
>-PCWithinLoop(JSScript *script, jsbytecode *pc, T& loop)
>-{
>- return script != loop.script || (pc >= loop.top && pc <= loop.bottom);
>+PCWithinLoop(JSStackFrame *fp, jsbytecode *pc, T& loop)
>+{
>+ return fp > loop.fp || (fp == loop.fp && pc >= loop.top && pc <= loop.bottom);
> }
|fp > loop.fp| doesn't look right to me. I know it didn't work before the new stacks, but I guess it probably does do the right thing now. Luke, does this seem safe to you? Or should it walk the stack to check?
> /*
> * When we hit a nested loop while profiling, we record where it occurs
> * and how many iterations we execute it.
> */
> struct InnerLoop {
>- JSScript *script;
>+ JSStackFrame *fp;
> jsbytecode *top, *bottom;
> uintN iters;
I'd prefer 'entryFp', 'startFp', or something like that.
Otherwise looks good.
Attachment #486674 -
Flags: review?(dmandelin)
Comment 15•14 years ago
|
||
(In reply to comment #14)
> Luke, does this seem safe to you? Or should it walk the stack to check?
The two exceptions I can think of are:
- the 'floating frames' stored in generators (which are copies of inactive generators, hence, should not appear here)
- reentering the VM (content -> C++ -> chrome), but it looks like that aborts profiling.
> I'd prefer 'entryFp', 'startFp', or something like that.
By the by, we've been trying to move toward not capitalizing either of the letters in the register-like acronyms, viz. sp, pc, and fp. (E.g., fp->prevpc(), prevPc or prevPC)
Updated•14 years ago
|
blocking2.0: ? → beta8+
Assignee | ||
Comment 16•14 years ago
|
||
Unfortunately, one of the previous patches asserted on the tryserver. I fixed the problem and added some additional assertion coverage. This patch subsumes the previous ones. Could you re-review, Dave? I think the fp comparison is safe, based on what Luke said.
Attachment #486474 -
Attachment is obsolete: true
Attachment #486674 -
Attachment is obsolete: true
Attachment #486687 -
Attachment is obsolete: true
Attachment #489328 -
Flags: review?(dmandelin)
Comment 17•14 years ago
|
||
Comment on attachment 489328 [details] [diff] [review]
new patch
I just gave it a light review since I had already reviewed it and you checked out the stack size issue. Looks good.
Attachment #489328 -
Flags: review?(dmandelin) → review+
Assignee | ||
Comment 18•14 years ago
|
||
Whiteboard: [jsperf] → [jsperf] fixed-in-tracemonkey
Assignee | ||
Comment 19•14 years ago
|
||
Had to back this out
http://hg.mozilla.org/tracemonkey/rev/cb76b2d61096
Updated•14 years ago
|
Whiteboard: [jsperf] fixed-in-tracemonkey → [jsperf]
Assignee | ||
Comment 20•14 years ago
|
||
Whiteboard: [jsperf] → [jsperf] fixed-in-tracemonkey
Comment 21•14 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 23•14 years ago
|
||
I'm reopening this bug, since the patches that actually landed here, whatever they do, don't fix it (as one can trivially see by trying a Nov 23 nightly, or a current trunk nightly, on the testcase). I still get 325ms or so, unless I disable methodjit or jitprofiling, in which case I get 100ms.
I'm not sure what should happen with the blocking+ flag here: leave as 8, or move to 9?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [jsperf] fixed-in-tracemonkey → [jsperf]
Assignee | ||
Comment 24•14 years ago
|
||
I landed the lower-risk stuff. This includes some new assertions and bug fixes:
http://hg.mozilla.org/tracemonkey/rev/4d66cf4385f0
http://hg.mozilla.org/tracemonkey/rev/cb3f0ac5494f
More to come later, I hope.
Comment 25•14 years ago
|
||
Blocking beta8+? Probably it should be beta9+. Beta8 was released on Dec 21..
Reporter | ||
Comment 26•14 years ago
|
||
Eddward: please do read the bug before commenting. Comment 23 paragraph two in particular.
Comment 27•14 years ago
|
||
I'm not sure whether bugs that block previous releases but are still open are being kept track of or not.
https://bugzilla.mozilla.org/buglist.cgi?field0-3-0=cf_blocking_20&type0-1-0=notsubstring&field0-1-0=cf_blocking_20&field0-0-0=cf_blocking_20&query_format=advanced&value0-2-0=beta9&value0-1-0=betaN&type0-3-0=notsubstring&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&field0-2-0=cf_blocking_20&type0-0-0=substring&value0-0-0=beta&type0-2-0=notsubstring
Shows only this bug and bug 586198.
Reporter | ||
Updated•14 years ago
|
blocking2.0: beta8+ → betaN+
Assignee | ||
Comment 28•14 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/94c6d2a90ae6
This is now running in ~100ms for me.
Whiteboard: [jsperf] → [jsperf] [fixed-in-tracemonkey]
Reporter | ||
Comment 29•14 years ago
|
||
Yeah, looks like at this point -mjp is only about a 10% slowdown over -j here; that's reasonable. Thanks!
Updated•14 years ago
|
Whiteboard: [jsperf] [fixed-in-tracemonkey] → [jsperf] [fixed-in-tracemonkey][hardblocker]
Comment 30•14 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•