Open Bug 1432565 Opened 7 years ago Updated 2 years ago

Optimize immutable Map usage

Categories

(Core :: JavaScript Engine, enhancement, P5)

People

(Reporter: sfink, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [sp3])

Attachments

(1 file)

In bug 1432343, it looks to me like we're observing quadratic time from code like:

  function seeNewItem(item) {
    state.itemMap = (new Map(state.itemMap)).set(item.id, item);
  }

Notice that this is O(n²) time and space across n insertions. If you see a lot of new items, you are going to hurt. I'll speculate about fixing this in later comments.
The main observation is that this is a versioned append-only Map, and we are unlikely to care about any version but the latest. So an implementation-level fix would be to simply do away with immutability and mutate a single Map.
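
A minimal sketch of that fix, reusing the names from the snippet above and assuming nothing ever needs to look at an older version of the map:

  function seeNewItem(item) {
    // Mutate the one shared Map instead of copying it on every insert:
    // O(1) per new item instead of O(n).
    state.itemMap.set(item.id, item);
  }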

But maybe you *do* occasionally need older versions. I don't know, but it seems plausible, especially if it's only a couple of them. So the next option would be to make use of the ordered nature of Map: make a Proxy data structure that refers to a single mutable Map but hand out immutable "views" of it, where a view is something like:

  {
    'map': (the mutable Map),
    'size': (the size of the Map at the moment this view was handed out)
  }

Then a lookup would be

  function lookup(m, key) {
    // The key is visible in this view only if its insertion index is below
    // the view's recorded size, i.e. it was inserted before the view existed.
    if (m.map.has(key) && [...m.map.keys()].indexOf(key) < m.size)
      return m.map.get(key);
    else
      return undefined;
  }

(probably optimized to eliminate the extra has() check). So lookups on the latest version are just Map lookups, while older versions have to do an expensive linear search through a big temporary array. If we were really fired up about this, we would try to spec Map.prototype.getWithIndex or something that returned the matching index in the same lookup. (Which, if the Map is append-only, is trivially implementable with pointer arithmetic. And if it's not append-only and we really cared, it could be optimized with a rolling count of deleted elements, which probably wouldn't be worth it.)

Alternatively, if you *did* need to do a nontrivial number of non-latest lookups, you could store the indexes along with the values in the Map so algorithmically everything is back to constant time, just with more constant overhead and more memory usage. (Well, if all views are live, it's pretty much the same memory usage, but the assumption is that most of them are dead.)
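
For illustration, here is a rough userland sketch of that variant (VersionedMap, snapshot, and lookupInView are made-up names, not anything proposed here): the backing Map stores [insertionIndex, value] pairs, each view just remembers how many entries existed when it was taken, and every lookup is one Map hit plus an index comparison.

  class VersionedMap {
    constructor() {
      this.map = new Map();             // key -> [insertionIndex, value]
    }
    set(key, value) {
      // Append-only, as in the pattern above: keys are only ever added.
      this.map.set(key, [this.map.size, value]);
      return this.snapshot();
    }
    snapshot() {
      // A "view": the shared Map plus its size at the time of the snapshot.
      return { map: this.map, size: this.map.size };
    }
  }

  function lookupInView(view, key) {
    const entry = view.map.get(key);
    // Visible only if the key was inserted before the view was taken, so
    // lookups stay O(1) for every version at the cost of a fatter value.
    return entry !== undefined && entry[0] < view.size ? entry[1] : undefined;
  }

  // const m = new VersionedMap();
  // const v1 = m.set("a", 1), v2 = m.set("b", 2);
  // lookupInView(v1, "b") === undefined; lookupInView(v2, "b") === 2;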
Now for the crazytalk.

If we wanted to accelerate this internally with no changes to the JS code, we could detect the pattern new Map(oldmap) and according to various rules and the phase of the moon, choose to reuse the underlying storage. So it would be like a copy-on-write setup, but we'd be storing the length of the shared data vector instead of barriering writes. And we'd have to copy when someone messed with Map.prototype or the Map instances or whatever.
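
As a very rough userland model of that idea (made-up names, and ignoring the hashing, duplicate keys, and deletion that the real table has to handle): copies share one entry vector, each handle records the length of the prefix it is allowed to see, and only a handle that writes after someone else has appended past its prefix has to actually duplicate the storage.

  function makeShared() {
    return { data: [], length: 0 };
  }

  function shareCopy(src) {
    // The |new Map(oldmap)| case: reuse the entry vector, record its length.
    return { data: src.data, length: src.length };
  }

  function cowSet(handle, key, value) {
    if (handle.length !== handle.data.length) {
      // Some other copy already appended past our prefix, so stop sharing
      // and take a private copy before writing.
      handle.data = handle.data.slice(0, handle.length);
    }
    handle.data.push([key, value]);
    handle.length = handle.data.length;
  }

  function cowGet(handle, key) {
    // Linear scan of the visible prefix; the real thing keeps a hash table.
    for (let i = handle.length - 1; i >= 0; i--) {
      if (handle.data[i][0] === key)
        return handle.data[i][1];
    }
    return undefined;
  }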

I think the existing machinery in OrderedHashTable is probably enough to deal with deletions already, with some small modifications to the internal API to accommodate data sharing instead of just ranges.

It would probably require making the data store its own GC cell. (Or it could be refcounted, but then we'd need a cycle collector.)

Huh. That ended up being less crazy-sounding than I thought.

CC'ing jorendorff, but note that I have no data for how prevalent this pattern might be.
(In reply to Steve Fink [:sfink] [:s:] from comment #3)
> Now for the crazytalk.
> 
> If we wanted to accelerate this internally with no changes to the JS code,
> we could detect the pattern new Map(oldmap) and according to various rules
> and the phase of the moon, choose to reuse the underlying storage. So it
> would be like a copy-on-write setup, but we'd be storing the length of the
> shared data vector instead of barriering writes. And we'd have to copy when
> someone messed with Map.prototype or the Map instances or whatever.

There are some crazy faux-immutable data structures that can do this kind of thing, but they have some performance costs. Take a Map as the example: you implement the "newest" version as a hash table, and the older versions as a log of history.

   old_map = ..some expression giving a map..

old_map currently points to a HashTable.

   new_map = insert(old_map, k, v);

new_map now points to the hash table, and old_map points to 'k', k's previous value, and new_map. That way, if someone reads k from old_map, the old value can be returned; if they read anything else, the pointer to new_map is followed and the read is done there. This can create a fair amount of memory churn, and it's probably not the right answer for this situation (but maybe it'll inspire some other idea). Still, it can be better than an immutable tree-based data structure, which allocates O(log N) cells for each insert compared to O(1) here.
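
A hedged sketch of that history-log scheme in plain JS (makeMap, insert, and read are illustration-only names, with insert matching the pseudocode above): the newest version owns the one real hash table, and each insert destructively turns the previous handle into a small node that remembers only the binding it shadowed.

  const MISSING = Symbol("missing");

  // Newest version: { map: <the hash table> }.
  // Older version:  { map: undefined, key, prevValue, next } where |next|
  // points one step toward the newest version.
  function makeMap() {
    return { map: new Map() };
  }

  function insert(oldVersion, k, v) {
    const table = oldVersion.map;
    const newVersion = { map: table };
    // Turn the old handle into a history node recording k's previous value
    // (or the fact that it was absent) before the table is updated.
    oldVersion.key = k;
    oldVersion.prevValue = table.has(k) ? table.get(k) : MISSING;
    oldVersion.next = newVersion;
    oldVersion.map = undefined;
    table.set(k, v);
    return newVersion;
  }

  function read(version, k) {
    // Walk toward the newest version, stopping at the first node that
    // recorded a binding for k; that is the value k had at this version.
    while (version.map === undefined) {
      if (version.key === k)
        return version.prevValue === MISSING ? undefined : version.prevValue;
      version = version.next;
    }
    return version.map.get(k);
  }

So reads from the newest version are one hash lookup, while reads from an old version pay one pointer hop per insert that happened since, which is where the churn and overhead come from.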
Severity: normal → enhancement
Priority: -- → P5
Attached patch bug1432565-PoC.patch (deleted) — Splinter Review
(In reply to Steve Fink [:sfink] [:s:] from comment #3)
> If we wanted to accelerate this internally with no changes to the JS code,
> we could detect the pattern new Map(oldmap) and according to various rules
> and the phase of the moon, [...].

Even just detecting |new Map(oldmap)| and then resizing the hash-table to avoid extra malloc churn could improve performance by 25-50%. Attached is a proof of concept patch (probably still contains some bugs and also bloats JSCompartment a bit) if someone wants to look into it.

Performance improvements for Map and Set copies with the attached patch:
---
// size =  0:  120ms ->  90ms
// size =  1:  130ms ->  95ms
// size =  2:  145ms -> 100ms
// size =  4:  170ms -> 110ms
// size =  8:  280ms -> 175ms
// size = 16:  450ms -> 240ms
// size = 32:  785ms -> 390ms
// size = 64: 1350ms -> 660ms
function testMapCopy() {
    var a = Array(64).fill(0);
    var m = new Map(a.entries());
    var q = 0;
    var t = dateNow();
    for (var i = 0; i < 100000; ++i) {
        q += new Map(m).size;
    }
    return [dateNow() - t, q];
}

// size =  0:  120ms -> 85ms
// size =  1:  130ms -> 90ms
// size =  2:  140ms -> 95ms
// size =  4:  155ms -> 100ms
// size =  8:  240ms -> 155ms
// size = 16:  375ms -> 205ms
// size = 32:  680ms -> 340ms
// size = 64: 1170ms -> 560ms
function testSetCopy() {
    var a = Array(64).fill(0).map((v, k) => k);
    var m = new Set(a);
    var q = 0;
    var t = dateNow();
    for (var i = 0; i < 100000; ++i) {
        q += new Set(m).size;
    }
    return [dateNow() - t, q];
}
---


Another thing worth noting is that Map/Set allocation is still slow when compared to other VMs:
---
// SpiderMonkey: 700ms
// Chakra: 135ms
// JSC: 80ms
// V8: 60ms
function testSetAlloc() {
    var q = 0;
    var t = dateNow();
    for (var i = 0; i < 1000000; ++i) {
        q += new Set().size;
    }
    return [dateNow() - t, q];
}
---

Stumbled across this old bug and decided to CC Iain in case it is relevant for the ReactDOM stuff. Plus it's some great work by anba here that seemed worth putting on the radar of someone who might conceivably do something with it. (The patch is heavily bitrotted now, but the idea still applies.)

Severity: normal → S3
Whiteboard: [sp3]