Closed Bug 581586 Opened 14 years ago Closed 14 years ago

TM: GC heap allocate dense arrays

Categories

(Core :: JavaScript Engine, defect)

Other Branch
x86
All
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 584917

People

(Reporter: gal, Assigned: bhackett1024)

Details

Attachments

(1 file, 1 obsolete file)

We allocate a lot of small dense arrays. We might want to allocate them from the GC heap to avoid malloc overhead or maybe bump allocate the interior and move everything into the GC heap at the next GC?

bhackett, want to post some statistics? Whats the distribution and frequency of dense array allocations?
There was a talk about using a trivial bump allocator and a copy GC for arrays and perhaps for strings. The observation is that for array elements and string chars there would be only single reference from the heap and some extra references from the native  or JIT stacks. That can be accounted for and the arrays can be moved.
(In reply to comment #0)
> We might want to allocate them from
> the GC heap to avoid malloc overhead or maybe bump allocate the interior and
> move everything into the GC heap at the next GC?

The problem with moving is references into arrays from the stack. And we can account for them, we can do a copy GC.
Array allocation statistics, by capacity:

SS:

7 (ARRAY_CAPACITY_MIN): 48633 (90%)
8-19: 2812
20-99: 2263
100+: 81

V8:

7: 18506 (78%)
8-19: 2812
20-99: 2263
100+: 285

V8 skews to larger arrays, but in both cases the great majority are the initial min-capacity allocation.
Ok let's implement this. Should be easy. We could also do this for objects and rip out slots. Saves a ton of branches and memory. How many objects do we allocate? It's probably also much cheaper to null this new generation with void or holes in the gc instead of during allocation.
(In reply to comment #4)
> It's probably also much cheaper to null this new generation with void
> or holes in the gc instead of during allocation.

Most of the cost in filling the small arrays with holes is in taking page faults (about 4ms just in page faults on my laptop allocating/filling small arrays on SS --- see bug 580005).  So this will dovetail with work on allocation/GC performance.  I'll be looking this week at the performance curves for allocation/GC vs. JSC and what could/should be done there.
Attached patch crude patch for array GC arena (obsolete) (deleted) — Splinter Review
This adds a crude bump allocator for the dslots of small dense arrays.  On the following microbenchmark:

function foo(N) {
  for (var i = 0; i < N; i++) {
    var a = new Array(4);
    for (var j = 0; j < 4; j++)
      a[j] = 0;
  }
}
foo(1000000);

I get the following times:

JSC: 107ms
TM (old): 403ms
TM (old, maxBytes = 512K): 267ms
TM (new, maxBytes = arena = 512K): 116ms

If I lift the 'new Array' out, JSC is 20ms faster (45ms -> 25ms), so the remaining difference over JSC should be due largely to base array/loop issues.

Next is how to keep most of this gain on actual workloads.
Assignee: general → bhackett1024
Comment on attachment 460412 [details] [diff] [review]
crude patch for array GC arena

> static void
> array_trace(JSTracer *trc, JSObject *obj)
> {

> 
>+    JSContext *cx = trc->context;
>+    uint32 capacity = obj->getDenseArrayCapacity();
>+
>+    /* Relocate the dslots to a malloc heap if they are in the GC pool. */
>+    if (GCValueInPool(cx, obj->dslots)) {
>+        Value *slots = (Value*) cx->malloc((capacity + 1) * sizeof(Value));
>+        memcpy(slots, obj->dslots - 1, (capacity + 1) * sizeof(Value));
>+        obj->dslots = slots + 1;

The assumption here is that the code does not allow for GC to happen
while a local copy of JSObject::getDenseArrayElements is accessed so
there are no extra references to  obj->dslots. A quick grep over the
sources revelaed that with the bug 580803 fixed this is indeed the case.
Before that bug the following line from array_concat would violate the
assumption:

js_NewArrayObject(cx, JS_MIN(length, capacity), aobj->getDenseArrayElements());

Here js_NewArrayObject could trigger a GC with aobj->getDenseArrayElements()
on the stack.

Still this is extremely fragile and IMO we must have a static analyses here.
The analysis should allow to use the result of getDenseArrayElements only if
the GC cannot happen.
Taras: how difficult is to implement an analysis that would verify that between getDenseArrayElements() and the last user of the result a GC cannot happen. That is, the code can only call functions annotated with NO_GC attribute.
We no longer do last ditch GCs. The analysis is not necessary as long we avoid tripping the operation callback from within such critical sections.
(In reply to comment #9)
> We no longer do last ditch GCs. The analysis is not necessary as long we avoid
> tripping the operation callback from within such critical sections.

Look at the usage getDenseArrayElements in jstypedarray.cpp. It requires some efforts to prove that no GC (or arbitrary JS execution) is possible on that code path. Hence I would insist on a static analysis here. I suppose Taras can judge a complexity of the implementation here.
Static analysis will prove nothing here. The code in jstypedarray.cpp is not safe and has to be made safe. A got step would be eliminating the getDenseArrayElements API.
(In reply to comment #11)
> Static analysis will prove nothing here.

It will assert that the GC cannot happen and the usage of getDenseArrayElements is safe. 

> The code in jstypedarray.cpp is not safe and has to be made safe.

What is the difference between the usage of getDenseArrayElements in jstypedarray.cpp and in jsarray.cpp? In both cases one cannot tell 
that the GC is not possible just by looking at the caller of getDenseArrayElements. 

> A got step would be eliminating the getDenseArrayElements API.

That is a nice idea. If the code would only access JSObject::dslots via
indexed operation then the code would be trivially safe.
(In reply to comment #11)
> A got step would be eliminating the getDenseArrayElements API.

Can you do it without eliminating the significant performance gain we got from adding it?
(In reply to comment #13)
> (In reply to comment #11)
> > A got step would be eliminating the getDenseArrayElements API.
> 
> Can you do it without eliminating the significant performance gain we got from
> adding it?

In most cases the usage of getDenseArrayElements could be replaced with a helper that does memmove/memcpy on its own. In other cases there would be a performance cost if the compiler could not optimize obj->dslots[i] in a loop into *ptr++ or something, but those cases AFAICS are not performace critical.

So encapsulating dslots could be an alternative to a static analyses. But the latter is still preferable IMO.
Conservative static analyses are good and can be helpful. bhackett can do static analysis, I hear tell :-P.

Of course if we can get rid of hazardous copying APIs, great. But linearity is even stronger and it can be proven (cf. Cyclone).

/be
(In reply to comment #7)
> The assumption here is that the code does not allow for GC to happen
> while a local copy of JSObject::getDenseArrayElements is accessed so
> there are no extra references to  obj->dslots. A quick grep over the
> sources revelaed that with the bug 580803 fixed this is indeed the case.
> Before that bug the following line from array_concat would violate the
> assumption:
> 
> js_NewArrayObject(cx, JS_MIN(length, capacity), aobj->getDenseArrayElements());
> 
> Here js_NewArrayObject could trigger a GC with aobj->getDenseArrayElements()
> on the stack.
> 
> Still this is extremely fragile and IMO we must have a static analyses here.
> The analysis should allow to use the result of getDenseArrayElements only if
> the GC cannot happen.

Actually, this patch adds a new way to trigger GC, when the pool fills up, so with this patch the js_NewArrayObject above is not safe, nor is a similar one in slice or one from an InitArrayObject in splice.

Safe use of getDenseArrayElements should be possible to check with dehydra: make sure that pointers to the dslots are not live across any function call, except hand-picked ones you know cannot trigger GC (memmove, memcpy, array accessors).  Would be nice to have this, especially if dslots are put in the pool for broader classes of objects.
This is a pretty straight-forward analysis.
Attached patch patch working for SS (deleted) — Splinter Review
This patch fixes the array pool allocation to work for SS, and tests using mlock to amortize page fault cost.  For me, array pool allocation is a 6ms speedup on SS, amortization is 9ms, 14ms when combined.
Attachment #460412 - Attachment is obsolete: true
What happened to this?
OS: Mac OS X → All
This turned into bug 584917, which is fixed now.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: