Closed Bug 630456 Opened 14 years ago Closed 14 years ago

Convert objects to dictionary mode less aggressively

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(4 files, 1 obsolete file)

MAX_HEIGHT is currently 64. In some very preliminary measurements I increased it to 128 with negligible change in speed, but a ~15% reduction in the number of Shapes allocated, which translates to a few MB in moderately large workloads. I plan to do some more rigorous measurements.
A suggestion from Shaver: Could you instrument to print out script and lineno when our height goes from 64 to 65? would be interesting to know what code patterns cause it in normal browsing.
Even better, let's histogram to count the population of objects in various obj->propertyCount() bins. See jsutil.h's JSBasicStats. I should have added this ifdef DEBUG and conditioned by a one-time-getenv, a la other such instrumentation. /be
I started by measuring the number of Shapes allocated and recycled for SS and V8 with some different values of MAX_HEIGHT: - SS, MAX_HEIGHT=32: 21723 new, 0 recycled - SS, MAX_HEIGHT=64: 22122 new, 0 recycled - SS, MAX_HEIGHT=128: 20688 new, 0 recycled - V8, MAX_HEIGHT=32: 6032 new, 0 recycled - V8, MAX_HEIGHT=64: 6132 new, 0 recycled - V8, MAX_HEIGHT=128: 5934 new, 0 recycled First point: shape recycling isn't working. What's up with that? Second point: MAX_HEIGHT=64 results in more Shapes that 32 or 128 for both SS and V8. I don't claim to understand that. I thought the idea was that when converting to dictionary mode a bunch of extra Shapes are made, so the number of Shapes allocated should increase monotonically as MAX_HEIGHT decreases. > Even better, let's histogram to count the population of objects in various > obj->propertyCount() bins. Sounds good, but I don't know where/when to count the populations. If JSObject had a destructor that would be the obvious place; is there something similar that indicates when an object's lifetime ends?
(In reply to comment #3) > I started by measuring the number of Shapes allocated and recycled for SS and > V8 with some different values of MAX_HEIGHT: > > - SS, MAX_HEIGHT=32: 21723 new, 0 recycled > - SS, MAX_HEIGHT=64: 22122 new, 0 recycled > - SS, MAX_HEIGHT=128: 20688 new, 0 recycled > > - V8, MAX_HEIGHT=32: 6032 new, 0 recycled > - V8, MAX_HEIGHT=64: 6132 new, 0 recycled > - V8, MAX_HEIGHT=128: 5934 new, 0 recycled Need to know dictionary shapes (see js::Shape::newDictionaryShape) vs. total. > First point: shape recycling isn't working. What's up with that? You have to collect some but not all shapes for the ones that were GC'ed to be on the freelist. How many GC runs were there? Perhaps the GCs (if any) collected all shapes. Try a browser session, you'll see shape recycling. > Second point: MAX_HEIGHT=64 results in more Shapes that 32 or 128 for both SS > and V8. I don't claim to understand that. I thought the idea was that when > converting to dictionary mode a bunch of extra Shapes are made, so the number > of Shapes allocated should increase monotonically as MAX_HEIGHT decreases. A dictionary-mode object owns its shapes and they are mutable, so more dicts makes for fewer shapes than if they're shared in the empty-shape-rooted forest. But sharing can win too, if there's no in-situ mutation, and good re-use of common shape lineages. Dynamics! I'm taking your numbers as gospel here, since the patch must be simple. Want to post it (with the dict-shapes accounted), though, for sanity-check review? > > Even better, let's histogram to count the population of objects in various > > obj->propertyCount() bins. > > Sounds good, but I don't know where/when to count the populations. If JSObject > had a destructor that would be the obvious place; is there something similar > that indicates when an object's lifetime ends? Sure, look for thing->finalize(cx) calls (one call site, but it's in a template) in jsgc.cpp. /be
Attached patch instrumentation patch (deleted) — Splinter Review
For sanity-check.
Attachment #508929 - Flags: feedback?(brendan)
That's too simple to go wrong. With a browser, though, you'd want not to have to grep stdout or stderr, right? /be
$3 = { alloc = 161665, oom = 0, recycle = 10514, dprop = 64965, dprop4add = 0 } is the third js_GC pre-order sample of atomic-incremented meters I just threw together. Some recycling, and dictionary shapes (dprops) account 40% of all shapes. /be
Attached patch quick-and-dirty SHAPE_METER patch (obsolete) (deleted) — Splinter Review
Could clean this up and add it to one-time-getenv-triggered DEBUG only dumping. /be
Comment 7 should have said for a browser session with gmail and news.google.com open, and third js_GC bp hit was after when loading news. Odd that so few js_GC calls are hit -- I barely recognize the old jsgc.cpp code any longer. Yay for compartment GC, I think. /be
(In reply to comment #6) > That's too simple to go wrong. With a browser, though, you'd want not to have > to grep stdout or stderr, right? I just pipe stderr into a script I have that counts how many times each unique line was printed. Works fine with the browser, I do this all the time.
Ok, I like pipes -- good to know. I am generally leery of adding to the sewage in stdout (or is it stderr)? False positive sewage matching, ewww. /be
Opt build don't have sewage problems :) And my script output makes it easy to sort the sewage from the gold.
I shut down and the final meters were $14 = { alloc = 161665, oom = 0, recycle = 105903, dprop = 116662, dprop4add = 0 } /be
Opening techcrunch.com: MAX_HEIGHT=32: 286898 new 147671 dprop 15068 recycle MAX_HEIGHT=64: 280959 new 115006 dprop 23308 recycle MAX_HEIGHT=128: 238852 new 53095 dprop 18784 recycle MAX_HEIGHT=192: 243166 new 51992 dprop 20038 recycle "new" is the most important number, that's how many shapes we allocated. With MAX_HEIGHT=128 we get a ~15% drop. If we have 20MB of Shapes (as in http://blog.mozilla.com/nnethercote/2011/02/01/memory-profiling-firefox-on-techcrunch/) that'll give us a ~3MB reduction. MAX_HEIGHT=192 gave similar results, so 128 seems a better choice as it's a smaller change from the status quo. "dprop" is a good sanity check -- it drops as MAX_HEIGHT increases, ie. we convert to dictionary mode less often. "recycle" doesn't change much; I think it's more a function of how many GCs have happened. So, MAX_HEIGHT=128 seems a good choice for minimizing memory usage. What's the speed impact? For Sunspider and V8 it's negligible: --------------------------------------------------------------- | millions of instructions executed | | total | compiled (may overestimate) | --------------------------------------------------------------- | 68.331 68.369 (0.999x) | 44.762 44.761 (------) | 3d-cube | 40.232 40.233 (------) | 25.189 25.189 (------) | 3d-morph | 63.756 63.785 (------) | 37.220 37.220 (------) | 3d-raytrace | 25.283 25.288 (------) | 11.101 11.100 (------) | access-binary- | 91.814 91.816 (------) | 85.809 85.809 (------) | access-fannkuc | 29.085 29.082 (------) | 16.514 16.514 (------) | access-nbody | 36.156 36.144 (------) | 29.335 29.335 (------) | access-nsieve | 7.373 7.373 (------) | 3.253 3.253 (------) | bitops-3bit-bi | 36.509 36.509 (------) | 31.742 31.742 (------) | bitops-bits-in | 15.818 15.818 (------) | 12.018 12.018 (------) | bitops-bitwise | 38.001 37.986 (------) | 32.968 32.967 (------) | bitops-nsieve- | 16.960 16.953 (------) | 12.874 12.874 (------) | controlflow-re | 23.123 23.169 (0.998x) | 11.792 11.792 (------) | crypto-md5 | 19.215 19.245 (0.998x) | 8.495 8.495 (------) | crypto-sha1 | 68.846 68.860 (------) | 21.930 21.930 (------) | date-format-to | 51.907 51.915 (------) | 9.617 9.617 (------) | date-format-xp | 41.597 41.590 (------) | 27.329 27.329 (------) | math-cordic | 22.733 22.745 (0.999x) | 6.305 6.305 (------) | math-partial-s | 31.510 31.514 (------) | 26.589 26.588 (------) | math-spectral- | 47.090 47.109 (------) | 34.563 34.563 (------) | regexp-dna | 25.927 25.940 (------) | 9.252 9.252 (------) | string-base64 | 60.807 60.820 (------) | 32.705 32.705 (------) | string-fasta | 95.116 95.152 (------) | 17.269 17.269 (------) | string-tagclou | 98.217 99.021 (0.992x) | 12.626 12.628 (------) | string-unpack- | 38.620 38.643 (0.999x) | 8.976 8.976 (------) | string-validat ------- | 1094.037 1095.088 (0.999x) | 570.242 570.242 (------) | all --------------------------------------------------------------- | millions of instructions executed | | total | compiled (may overestimate) | --------------------------------------------------------------- | 1207.737 1195.445 (1.010x) | 1035.899 1035.769 (------) | v8-crypto | 1144.137 1144.135 (------) | 842.543 842.543 (------) | v8-deltablue | 889.240 889.344 (------) | 448.816 448.817 (------) | v8-earley-boye | 503.378 503.386 (------) | 237.960 237.960 (------) | v8-raytrace | 870.299 870.346 (------) | 371.779 371.779 (------) | v8-regexp | 1143.380 1143.400 (------) | 1115.519 1115.519 (------) | v8-richards | 430.371 429.986 (1.001x) | 101.115 101.088 (------) | v8-splay ------- | 6188.544 6176.046 (1.002x) | 4153.634 4153.477 (------) | all The v8-crypto change is just noise. Measuring browser changes seems more difficult.
Changing MAX_HEIGHT should not affect performance much. It's a "reasonable" limit on growth of the shape tree, the lexicographic tree of all properties (the alphabet is almost everything identifying that property), since that tree is sparse and has the potential for very long linear subtrees. So we need to clamp down on the worst case, but not defeat the win of sharing shapes. So 128 seems just fine. Browser numbers would be good. And histogrammed obj->propertyCount() bins counting populations at finalization would be great. /be
(In reply to comment #13) > > $14 = { > alloc = 161665, > oom = 0, > recycle = 105903, > dprop = 116662, > dprop4add = 0 > } dprop4add is 0 because newDictionaryShapeForAdd() is dead. I'll remove it in the patch I write.
Cc'ing jorendorff -- did the SetPropHit-ectomy eliminate dprop4add? /be
Attached patch histogram instrumentation patch (deleted) — Splinter Review
Results from running a bunch of sites (techcrunch.com, google.com, theage.com.au, etc, etc): mean property counts 1.3802, std. deviation 7.66435, max 1054 [ 0]: 253252 ****************** [ 1]: 22897 *************** [ 2]: 21902 *************** [ 3]: 8488 ************** [ 4]: 6193 ************* [ 5]: 3368 ************ [ 6]: 4814 ************* [ 7]: 3036 ************ [ 8]: 3200 ************ [ 9]: 523 ********** [ 10, +inf]: 8075 ************* Seems like a lot of empty objects.
Attached patch patch (deleted) — Splinter Review
Attachment #509626 - Flags: review?(brendan)
Attachment #508929 - Flags: feedback?(brendan)
> Seems like a lot of empty objects. Your typical Function and typical DOM object will be empty (since all the interesting stuff is on the prototype), right?
(In reply to comment #20) > > Seems like a lot of empty objects. > > Your typical Function and typical DOM object will be empty (since all the > interesting stuff is on the prototype), right? Exactly. The original design for JS was a lot like Harmony proxies but using the prototype as the handler: empty objects fronting for various peer C++ (C in the old Netscape days) data structures referenced by each object's private slot. This is more efficient than copying everything from one space to the other and back. /be
(In reply to comment #20) > > Seems like a lot of empty objects. > > Your typical Function and typical DOM object will be empty (since all the > interesting stuff is on the prototype), right? (on the prototype and in the case of Function instances, in the private data structure, JSFunction, and hanging off it.) /be
Comment on attachment 509626 [details] [diff] [review] patch Seems like a win for Firefox 4, low risk (just a cutoff on exponential worst-case growth, still low). Nick, stop me if you disagree. /be
Attachment #509626 - Flags: review?(brendan)
Attachment #509626 - Flags: review+
Attachment #509626 - Flags: approval2.0?
Comment on attachment 509625 [details] [diff] [review] histogram instrumentation patch This could be made landable, but boy did you recompile the world! Instead of touching jsapi.h, a single extern in JSObject::finish would suffice. And all #ifdef DEBUG, with the JS_DumpBasicStats conditioned on a getenv naming the file? Or not, it's easy to recreate or find and rebase this patch. We do take DEBUG-only but maintained metering, though, so I wanted to raise the option. /be
(In reply to comment #24) > > This could be made landable, but boy did you recompile the world! Instead of > touching jsapi.h, a single extern in JSObject::finish would suffice. And all > #ifdef DEBUG, with the JS_DumpBasicStats conditioned on a getenv naming the > file? Sure, I was doing it as quick-and-dirty as possible :)
Nick, can you get the patch landed? I bet a lot of shapes come from (perhaps one or two) large slow arrays. We should not be using shapes for slow array elements, esp. if the array is dense but not quite dense enough to avoid being slow-ified. Can you measure? /be
(In reply to comment #26) > Nick, can you get the patch landed? Don't I have to wait for 2.0 approval?
Comment on attachment 509626 [details] [diff] [review] patch Approval, schmaproval!
Attachment #509626 - Flags: approval2.0? → approval2.0+
From a run on perfect.js, which indexes a plain object, at JS_ShutDown: $1 = { alloc = 909, oom = 0, recycle = 0, dprop = 540, aprop = 0, iprop = 501 } Rebuilding Minefield now after updating and reconfiguring. /be
Attachment #508958 - Attachment is obsolete: true
This regressed js1_8_5/regress/regress-595365-2.js which contains the following comment: /* * NB: this test hardcodes the value of PropertyTree::MAX_HEIGHT. */ (The failing test hasn't upset the tinderboxen because it uses shapeOf and is therefore shell-only.)
(In reply to comment #31) > This regressed js1_8_5/regress/regress-595365-2.js Fixed. Sorry for the breakage.
I'm going to pull a Brendan: Brendan, can you file a new bug for the indexed prop measurements and details on how those Shapes can be optimized? This bug is done. Thanks.
Is that a "Brendan"? I know what a "gal" is. Anyway, totally agree -- was gonna file separately. I'm still working on better instrumentation. My own browsing did not find a huge slow array or two causing lots of shapes. More in the new bug. /be
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: