521309 - ExecuteTree arena path is slow

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Description

•

15 years ago

bug 520321 comment #14 shows that ExecuteTree is spending way too much time allocating memory. This is exacerbated by recursion which ups the memory requirements of entering traces.

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Updated

•

15 years ago

Depends on: 521328

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Updated

•

15 years ago

No longer depends on: 521328

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 1

•

15 years ago

Attached patch patch (deleted) — Details — Splinter Review

The problem is that ExecuteTree() allocates a big chunk in the arena which gets freed very quickly by js_Interpret as it unmarks lower frames.

As discussed in the office and on IRC, this is hard to solve. There are three solutions we came up with:
 1. Make arenas not release chunks aggressively, which would increase memory usage.
 2. Make the JIT incrementally allocate frames on trace.
 3. If malloc() fails trying to reconstruct JIT frames, abort().

This patch is #3.

Assignee: general → dvander

Status: NEW → ASSIGNED

Attachment #405388 - Flags: review?(gal)

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Updated

•

15 years ago

Attachment #405388 - Flags: review?(jorendorff)

Andreas Gal :gal

Updated

•

15 years ago

Attachment #405388 - Flags: review?(gal) → review+

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 2

•

15 years ago

Comment on attachment 405388 [details] [diff] [review]
patch

jorendorff - could you review? Andreas says you wrote the original code here.

And --

<sayrer> dvander: I suggest brendan since the OOM thing is more of a module owner decision

Attachment #405388 - Flags: review?(brendan)

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 3

•

15 years ago

Preemptively pushing to TM to watch how the tree reacts.

http://hg.mozilla.org/tracemonkey/rev/b1aa519f777c

Igor Bukanov

Comment 4

•

15 years ago

(In reply to comment #1)
>  3. If malloc() fails trying to reconstruct JIT frames, abort().
> 
> This patch is #3.

This was already discussed, but using abort in one single place means that SM immediately looses the promise of handling OOM without any benefits of simpler code in other places. Moreover, just calling abort without even attempting to run GC or release browser caches is just a bad idea IMO.

Andreas Gal :gal

Comment 5

•

15 years ago

The promise is still maintained if you do not use the JIT. We can document that if you want.

Robert Sayre

Comment 6

•

15 years ago

(In reply to comment #4)
> (In reply to comment #1)
> >  3. If malloc() fails trying to reconstruct JIT frames, abort().
> > 
> > This patch is #3.

So, first off, dvander's patch did prove that the arena code is the problem. It accounted for 82-98% of allocated bytes in date-format-tofte. imho, that is totally unacceptable and we have to fix it, even though it seems jemalloc can deal better than OS X malloc. The patch here eliminated the giant regression in bug 520321 on OS X and seems to speed up Windows and Linux a bit as well. 

> 
> This was already discussed, but using abort in one single place means that SM
> immediately looses the promise of handling OOM without any benefits of simpler
> code in other places.

That's not true. The patch here has a large, measurable benefit. It doesn't make sense to require a rewrite of all of SpiderMonkey to consider a different OOM policy for new versions of SM.

> Moreover, just calling abort without even attempting to
> run GC or release browser caches is just a bad idea IMO.

Well, the allocations we're currently trying to make infallible here don't look the kind that user code can cause to ask for very large amounts of memory. Asking the embedding for some reserve memory, attempting gc, and asking to release caches all seem like reasonable things to do here.

Robert Sayre

Comment 7

•

15 years ago

My previous comment sounds like I'm attached to abort(). I'm not, I just want the allocation in ExecuteTree to go away. Maybe we can figure out a way to do it only once without wasting lots of space.

Graydon Hoare :graydon

Comment 8

•

15 years ago

IMO this goes under the same heading "come up with a non-stupid strategy for handling OOM in the nanojit allocator SPI", a deferred discussion.

My feeling is that we should provide two options, selectable at compile-time:

  1. setjmp/longjmp. You run out of memory in the JIT, you longjmp to the entry-point of the JIT. The JIT's supposed to be "optional" anyway. Treat a longjmp return from recording like an abort, a longjmp return from execution like an execute call that bails before doing anything.

  2. abort. we let clients turn this on instead if they're just as happy with infallible malloc. Possibly we modify the *normal* JS malloc-failure path to do this too, when it's enabled. I imagine (though am not certain) that firefox-the-embedding will prefer this option. I know other embedders don't. They can use option #1 (or just disable the JIT).

Anyone got a better idea? Note that "modify nanojit to check every allocation for NULL" is not an option, so don't suggest it. Nanojit is shared code and Adobe rejects this; they're perfectly happy with setjmp/longjmp.

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 9

•

15 years ago

The problem is that the state of the world is in chaos when we come out of the JIT. The only way to recover is to have enough memory to push back interpreter frames as needed. It might be possible to propagate an OOM up the call stack but embeddings would have to be very careful about properly checking and propagating errors from anything that can call LeaveTree from a deep bail.

We could try to do an early GC, but this would require teaching the GC how to walk the JIT stack. (This could but doesn't necessarily need to wait on the conservative GC.)

Brendan Eich [:brendan]

Comment 10

•

15 years ago

Comment on attachment 405388 [details] [diff] [review]
patch

>diff --git a/js/src/jstracer.cpp b/js/src/jstracer.cpp
>--- a/js/src/jstracer.cpp
>+++ b/js/src/jstracer.cpp
>@@ -5387,22 +5387,21 @@ SynthesizeFrame(JSContext* cx, const Fra
>     }
> 
>     /* Allocate the inline frame with its vars and operands. */
>     if (a->avail + nbytes <= a->limit) {
>         newsp = (jsval *) a->avail;
>         a->avail += nbytes;
>         JS_ASSERT(missing == 0);
>     } else {
>-        /*
>-         * This allocation is infallible: ExecuteTree reserved enough stack.
>-         * (But see bug 491023.)
>-         */
>         JS_ARENA_ALLOCATE_CAST(newsp, jsval *, &cx->stackPool, nbytes);
>-        JS_ASSERT(newsp);
>+        if (!newsp) {
>+            JS_NOT_REACHED("out of memory - allocator failed");
>+            abort();

File a followup on LeaveTree fallibility, which means js_DeepBail needs to record another state bit.

We can't assume OOM is rare, especially with not-small-fixed-size allocations. We can and do propagate status from js_MonitorLoopEdge, including OOM -- but then jsinterp.cpp code suppresses. Lame, and fixable with a tri-state return value.

The JIT may run ahead of interpreter-equivalent execution, but OOM is uncatchable so we should be able to unwind without any leaks or stuck locks. Semantic difference in program traces run under no-jit and jit are not a concern if OOM causes a fail-stop, in my opinion. They beat a whole-process abort any day of the week!

>@@ -6599,17 +6597,18 @@ LeaveTree(InterpState& state, VMSideExit
>         /* :TODO: don't be all squirrelin' this in here */
>         innermost->recursive_down = *(rp - 1);
>     }
> 
>     /* Slurp failure should have no frames */
>     JS_ASSERT_IF(innermost->exitType == RECURSIVE_SLURP_FAIL_EXIT,
>                  innermost->calldepth == 0 && callstack == rp);
> 
>-    JS_ARENA_RELEASE(&cx->stackPool, state.stackMark);
>+    if (state.stackMark)
>+        JS_ARENA_RELEASE(&cx->stackPool, state.stackMark);

I talked to dvander in person about releasing eagerly in the js_PopInterpFrame built-in, now that that's safe to do. We shouldn't risk bloating cx->stackPool by popping interp frames but not releasing their mem until we leave the trace.

/be

Attachment #405388 - Flags: review?(brendan)

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Updated

•

15 years ago

Blocks: 521472

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Updated

•

15 years ago

Blocks: 521476

Brendan Eich [:brendan]

Comment 11

•

15 years ago

Re: comment 6, Igor's point is true -- we add aborts but do not eliminate OOM checks elsewhere so we are paying for some OOM checking without complete checking.

But even the Electrolysis OOM plan, which tries to monitor memory use to avoid small-alloc OOMs, and thereby relieve small-allocation-request sites from having to check for null or throwing new, does still require testing at large-allocation-request sites for OOM, and propagating it rather than aborting.

We aren't clearly in the small-alloc regime with jsarena-based code, and we don't yet have the OS-based memory-use monitoring, anyway. So whatever the virtues of this patch (I agree it should go in), our design and code are not coherent. And that seemed to me to be Igor's point.

Re: comment 8, nanojit is a simpler case than running a trace. We can't longjmp out of the code that sees an OOM under LeaveTree (in js_SynthesizeFrame) to the ExecuteTree code that triggered the trace, or we'll cross native frames that might leak or leave stuck locks.

David filed bug 521472 to consider making LeaveTree propagate OOM. Another idea that would preserve LeaveTree's infallibility is to allocate an emergency reserve for synthesized frames, one such reserve per trace monitor (which is per thread), and use that if we hit OOM, stealing it from the tm and requiring it be replenished. ~300 frames' worth of memory is < 30K.

This seems straightforward to implement. Comments?

/be

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Updated

•

15 years ago

No longer blocks: 521476

Depends on: 521476

Robert Sayre

Comment 12

•

15 years ago

so, should we close this bug and focus on the follow up?

Andreas Gal :gal

Comment 13

•

15 years ago

We want this patch landed as is imo. It also reduces memory use significantly. Separately we should improve our arena code. I will file a bug. I don't think the emergency reserve for frames works though. The only way we allocate is by releasing the arena, so we would have to hook in there to detect when we can release the emergency memory again and we would make the general case slow. But lets discuss all that in a separate bug.

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 14

•

15 years ago

(In reply to comment #12)
> so, should we close this bug and focus on the follow up?

Yeah. Huge thanks for helping narrow this down, sayrer.

Status: ASSIGNED → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Jason Orendorff [:jorendorff]

Comment 15

•

15 years ago

I'm a little confused as to what happened here. Is this really RESO FIXED? Nothing landed here, right?

For the record, I don't think the patch should go in. It seems like several slightly different and much better (than calling abort) alternatives have been suggested, and reading between the lines of comment 12 and comment 14 it sounds like maybe sayrer and dvander have a plan. Do I need to do anything else here?

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 16

•

15 years ago

(In reply to comment #15)

This already landed to fix the regression. There's a follow-up bug on removing the abort() - bug 521472. I don't think there's anything else to do in this bug, just wanted more input.

Jason Orendorff [:jorendorff]

Updated

•

15 years ago

Attachment #405388 - Flags: review?(jorendorff)

Robert Sayre

Comment 17

•

15 years ago

(In reply to comment #15)
> I'm a little confused as to what happened here. Is this really RESO FIXED?
> Nothing landed here, right?

To be clear, this patch did go in.

> For the record, I don't think the patch should go in.

Uh oh. So, this patch is in. Should we take it to mozilla.dev.tech.js-engine?

Brendan Eich [:brendan]

Comment 18

•

15 years ago

I trust that the followup bug 521472 will be fixed, dvander and I are committed. It shouldn't be a kiss-off to smuggle inconsistent abort on OOM (while we sweat OOM recover elsewhere, burning code size, source complexity, and dev cycles). Jason, what do you think?

/be

Jason Orendorff [:jorendorff]

Comment 19

•

15 years ago

In reply to comment 18: I'm satisfied with that.

I apologize for taking so long on the review and then chiming in after the patch went in.  I was on the fence, but at least I should have commented.  Won't happen again.