Closed Bug 679940 Opened 13 years ago Closed 12 years ago

split JSScript into a per-compartment portion and a shareable runtime-global portion

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla21

People

(Reporter: luke, Assigned: till)

References

Details

(Whiteboard: [MemShrink:P1])

Attachments

(2 files, 4 obsolete files)

With COMPILE_N_GO removed (see blocking bug), the bytecode, source notes, and many words of JSScript can be shared between compartments. This can be useful for many reasons: - the shareable portion can be cached by necko (see dependent bug) - compartment per global wouldn't lose the memory savings of "brutal sharing" "Brutal sharing" is the JSAPI use of JS_CloneFunctionObject to avoid re-compiling source when creating new documents (e.g., for a tab/window). This only saves memory because chrome JS is in the same compartment; with compartment-per-global, JS_CloneFunctionObject would have to clone the underlying script. To see how much memory brutal sharing saves in practice, I instrumented a browser. Right after startup, brutal sharing saves ~300KB. Per new tab the savings is ~10K. Per new window the savings is ~1MB. So really only new windows are killer, but perhaps most people only use a single window? Test Pilot should be able to answer this.
Blocks: 679942
(In reply to Luke Wagner [:luke] from comment #0) > To see how much memory brutal sharing saves in practice, I instrumented a > browser. Right after startup, brutal sharing saves ~300KB. Per new tab the > savings is ~10K. Per new window the savings is ~1MB. So really only new > windows are killer, but perhaps most people only use a single window? Test > Pilot should be able to answer this. What does this look like on Android?
(In reply to David Mandelin from comment #1) Still trying to get this information; Android makes getting log spew difficult.
Attached patch measurement patch (deleted) — Splinter Review
Ah, printf_stderr. On mobile browser startup, the cross-global sharing is about 100-200KB on both the chrome and content process. Browsing around (thereby, I guess, exercising more chrome JS) causes this to rise to 500KB per process. Opening a bunch of tabs (8) brings it up to 700KB. Even after closing everything, there is still 500KB shared per tab. So this is a good bit more than desktop (which makes sense, since more of Fennec is implemented in JS). Measurement patch attached.
(In reply to Luke Wagner [:luke] from comment #3) > Created attachment 555593 [details] [diff] [review] > measurement patch > > Ah, printf_stderr. On mobile browser startup, the cross-global sharing is > about 100-200KB on both the chrome and content process. Browsing around > (thereby, I guess, exercising more chrome JS) causes this to rise to 500KB > per process. Opening a bunch of tabs (8) brings it up to 700KB. Even after > closing everything, there is still 500KB shared per tab. > > So this is a good bit more than desktop (which makes sense, since more of > Fennec is implemented in JS). Measurement patch attached. Thanks. That's not huge, I guess, but it does seem like a pretty significant chunk for those devices and definitely worth keeping the savings.
Attached patch Share bytecode and source notes (obsolete) (deleted) — Splinter Review
After some quick measurements showed that ~60% of all bytecode in a short browsing session were identical to previously stored ones, I took a shot at splitting the bytecodes and source notes off of the JSScripts and storing them uniquely per runtime. The attached patch does just that, similar to how filenames are handled in JSScripts. Ideally, we wouldn't even create the bytecodes in the first place, but then we'd have to deal with finding a lookup key that's guaranteed to be available. That'd create all sorts of complications with sourceless and nested functions. Still, this seems to work well and doesn't crash my browser. Try-servering here: https://tbpl.mozilla.org/?tree=Try&rev=58e3c9b12126 I didn't take any extensive measurements (and didn't yet adapt the memory reporters), but as one datapoint, arstechnica.com creates script-data arrays of ~1.3Mb in total size per compartment. With the patch, this reduces to ~0.4Mb, with the rest being shared among all compartments of that site. As the cache isn't domain-specific, I'm hopeful that we'll achieve considerable savings for common scripts such as jquery.
Attachment #707852 - Flags: review?(luke)
Attachment #707852 - Flags: review?(jwalden+bmo)
Whiteboard: [MemShrink]
bhackett seems to have just started working on lazy bytecode generation (bug 678037). It would probably make sense for you two to coordinate; perhaps you could give him my review?
Comment on attachment 707852 [details] [diff] [review] Share bytecode and source notes (In reply to Luke Wagner [:luke] from comment #6) > bhackett seems to have just started working on lazy bytecode generation (bug > 678037). It would probably make sense for you two to coordinate; perhaps > you could give him my review? That makes sense, yes. @bhackett: I don't think that this will immediately affect you too much, but if it does, I'm more than happy to make whatever changes you'd need to make this work better with lazy bytecode.
Attachment #707852 - Flags: review?(luke) → review?(bhackett1024)
Till, do you know how much of the code that is common to multiple compartments is ever run? (script->useCount ever becomes non-zero). It'd be nice to have a sense for whether this complements lazy bytecode generation (many scripts that run are identical between compartments) or is redundant with it (many scripts that are identical between compartments never run).
Comment on attachment 707852 [details] [diff] [review] Share bytecode and source notes Review of attachment 707852 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, but I'd like to see an updated version of the patch. Particularly interested in whether the memcmp bug affects the experiments you did in comment 5. ::: js/src/jsscript.cpp @@ +1504,5 @@ > +jsbytecode * > +js::SaveScriptBytecode(JSContext *cx, const jsbytecode *bytecode, uint32_t length) > +{ > + if (!bytecode) > + return NULL; I don't see any callers that might pass NULL, can you assert bytecode is non-NULL? @@ +1512,5 @@ > + > + ScriptBytecodeTable::AddPtr p = rt->scriptBytecodeTable.lookupForAdd(l); > + if (!p) { > + size_t size = offsetof(ScriptBytecodeEntry, bytecode) + length; > + ScriptBytecodeEntry *entry = (ScriptBytecodeEntry *)cx->malloc_(size); I think this would be neater and more efficient if SaveScriptBytecode took ownership of its bytecode parameter, storing it directly in the table when adding a new entry or freeing it when the bytecode is a duplicate. This would mean that ScriptBytecodeEntry would not be heap allocated (HashSet<ScriptBytecodeEntry> rather than HashSet<ScriptBytecodeEntry*>) or embed the bytecode in its structure. @@ +1532,5 @@ > + /* > + * During the IGC we need to ensure that bytecode is marked whenever it is > + * accessed even if the bytecode was already in the table. At this point > + * old scripts or exceptions pointing to the bytecode may no longer be > + * reachable. Might want to mention this is a form of read barrier here. @@ +1843,5 @@ > return false; > } > + > + script->length = prologLength + mainLength; > + uint32_t codeLength = sizeof(jsbytecode) * script->length + nsrcnotes * sizeof(jssrcnote); Can you remove the sizeofs here? They aren't used in XDRScript and are a bit confusing. ::: js/src/jsscript.h @@ +1269,5 @@ > + static HashNumber hash(const Lookup &l) { return mozilla::HashBytes(l.code, l.length); } > + static bool match(const ScriptBytecodeEntry *entry, const Lookup &lookup) { > + if (entry->length != lookup.length) > + return false; > + return memcmp(entry->bytecode, lookup.code, lookup.length); This should be !memcmp, memcmp returns zero when the memmory blocks are equal.
Attachment #707852 - Flags: review?(bhackett1024)
Comment on attachment 707852 [details] [diff] [review] Share bytecode and source notes Review of attachment 707852 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jsscript.cpp @@ +164,5 @@ > /* The clone has the same bindingArray_ offset as 'src'. */ > Bindings &src = srcScript->bindings; > ptrdiff_t off = (uint8_t *)src.bindingArray() - srcScript->data; > JS_ASSERT(off >= 0); > + JS_ASSERT(off <= (srcScript->dataSize)); Less parens?
Addresses comments and adds a memory reporter for the runtime-shared bytecodes. (In reply to Brian Hackett (:bhackett) from comment #9) > I think this would be neater and more efficient if SaveScriptBytecode took > ownership of its bytecode parameter, storing it directly in the table when > adding a new entry or freeing it when the bytecode is a duplicate. This > would mean that ScriptBytecodeEntry would not be heap allocated > (HashSet<ScriptBytecodeEntry> rather than HashSet<ScriptBytecodeEntry*>) or > embed the bytecode in its structure. I did something similar: Instead of passing the bytecode, callers have to malloc instances of ScriptBytecodeEntry which SaveScriptBytecode then takes ownership of. Keeping the bytecode embedded in the structure itself is required to make ScriptBytecodeEntry::fromBytecode work. > This should be !memcmp, memcmp returns zero when the memmory blocks are > equal. Urgh, how embarrasing. That used to be (and is again, now) PodEquals, which returns true on equality, and I didn't refactor it properly. Luckily, this error didn't affect the numbers I mentioned before: I only compared the sizes of the script-data fields, whose decrease was correctly reflected. The error *did* lead to no memory actually being saved, as it caused new entries to be created for each bytecode, which is fixed now. As for numbers: Our Tp tests apparently don't really contain any scenarios where identical scripts are used in many different compartments, so the automated tests don't show much of an improvement. Some manual testing did, however, show that, on average, the script-data measure sinks to about 1/5 of its previous value, so for each tab of the same site, that's open in parallel, we save about 4/5 of that data's size. The new reporter for the runtime's script-bytecodes confirms that. Oh, and addons that inject scripts into each site are much less expensive memory-wise, now, too. Try-servering here: https://tbpl.mozilla.org/?tree=Try&rev=039c404582c5
Attachment #709504 - Flags: review?(bhackett1024)
Attachment #707852 - Attachment is obsolete: true
Attachment #707852 - Flags: review?(jwalden+bmo)
Try run for 039c404582c5 is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=039c404582c5 Results (out of 330 total builds): exception: 1 success: 289 warnings: 37 failure: 3 Builds (or logs if builds failed) available at: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/tschneidereit@gmail.com-039c404582c5
The test failures are caused by the patches for bug 837416 that were in Inbound when I last rebased. They have been backed out since in rev e2cb5c5978fd. Looking good otherwise.
> As for numbers: Our Tp tests apparently don't really contain any scenarios > where identical scripts are used in many different compartments, so the > automated tests don't show much of an improvement. Yeah, Talos is crap for testing changes that affect memory consumption. areweslimyet.com is much better for that. johns has even implemented a way for you to do run a try build on the AWSY benchmarks. Unfortunately he hasn't yet written any docs on how to do so, AFAIK :( Maybe this will provide the necessary motivation...
Oh, and I forgot to mention: This new version includes changes to the propery cache to include the JSScript into the hash in addition to the pc and the shape. The hash function might not be ideal, so I'd be happy about suggestions for getting a better distribution there.
Whiteboard: [MemShrink] → [MemShrink:P1]
Comment on attachment 709504 [details] [diff] [review] Share bytecode and source notes for functions in a runtime wherever possible, v2 Review of attachment 709504 [details] [diff] [review]: ----------------------------------------------------------------- Canceling review, talked with Till on IRC and it'd be nice to make the atoms vector part of the shared portion, which should improve memory usage further and avoid the need for changes to the property cache.
Attachment #709504 - Flags: review?(bhackett1024)
Adds the atoms list to the shared things. In light of this, I have renamed the struct to SharedScriptData and fixed all related names accordingly. Alignment requirements make storing the atoms at the end of SharedScriptData's `data` field a bit annoying, but I think the encapsulation I added makes it not so bad. The field `JSScript::dataLength` annoys me, so I'll post a follow-up patch removing the need for it by making it possible to derive its value from the `bindings` class.
Attachment #712960 - Flags: review?(bhackett1024)
Attachment #709504 - Attachment is obsolete: true
Comment on attachment 712960 [details] [diff] [review] Share bytecode, source notes and atoms of functions in a runtime wherever possible, v3 Review of attachment 712960 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jsscript.cpp @@ +1547,5 @@ > + if (IsIncrementalGCInProgress(rt) && rt->gcIsFull) > + ssd->marked = true; > +#endif > + > + return ssd->data; Callers will set script->code to this returned pointer but will not modify script->atoms. It seems like that will leave script->atoms as a dangling pointer in the case where an existing match was found. Can you change the interface of this function so that it takes the JSScript*, modifies the ->code and ->atoms fields and just returns a bool? @@ +1822,5 @@ > + jsbytecode *code = ssd->data; > + code[0] = JSOP_STOP; > + code[1] = SRC_NULL; > + script->length = 1; > + script->code = SaveSharedScriptData(cx, ssd); Needs a null check.
Attachment #712960 - Flags: review?(bhackett1024)
Now with proper NULL checking and stuff
Attachment #713007 - Flags: review?(bhackett1024)
Attachment #712960 - Attachment is obsolete: true
Comment on attachment 713007 [details] [diff] [review] Share bytecode, source notes and atoms of functions in a runtime wherever possible, v4 Review of attachment 713007 [details] [diff] [review]: ----------------------------------------------------------------- Nice! ::: js/src/jsscript.h @@ +1250,5 @@ > + uint32_t length; > + jsbytecode data[1]; > + > + static SharedScriptData *withDataLengths(JSContext *cx, uint32_t codeLength, > + uint32_t srcnotesLength, uint32_t natoms); Missed this earlier, but this method's name should just be 'new_'. It's not obvious from the name what it's doing.
Attachment #713007 - Flags: review?(bhackett1024) → review+
Thanks for the quick turnaround on the reviews! I pushed to try once more for good measure and will land once that returns green. http://tbpl.mozilla.org/?tree=Try&rev=993740cfecbf
> I pushed to try once more for good measure and will land once that returns > green. Before you do that, some measurements from about:memory would be nice. Some ideas: - measure at start-up - measure after loading techcrunch.com - measure after running http://gregor-wagner.com/tmp/mem50 (wait until the dialog box pops up saying that it has finished)
Try run for 993740cfecbf is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=993740cfecbf Results (out of 222 total builds): exception: 23 success: 25 warnings: 5 failure: 169 Builds (or logs if builds failed) available at: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/tschneidereit@gmail.com-993740cfecbf
(In reply to Nicholas Nethercote [:njn] from comment #22) > > I pushed to try once more for good measure and will land once that returns > > green. > > Before you do that, some measurements from about:memory would be nice. That makes sense, yes. The try run turned up a crash I have to debug, so I'll do that and do some memory testing before landing.
Final patch (if the try run at http://tbpl.mozilla.org/?tree=Try&rev=186dba25515c doesn't go awry). Contains two fixes: one for the crasher in the last try run, the other for not initializing the padding before the atoms list and thus severely reducing cache hits.
Attachment #713007 - Attachment is obsolete: true
Comment on attachment 713349 [details] [diff] [review] Share bytecode, source notes and atoms of functions in a runtime wherever possible, v5 Thanks for making me do the measurements, nnethercote: they were suspiciously bad, so I investigated some and found the bug mentioned in the previous comment. With that fixed, I got the following stats (all values are for the `script-data` fields): Bug List + 10 Bugs + about:memory; with BugzillaJS installed nightly: 14.8MB patched: 05.2MB (1.5MB non-shared + 3.70MB shared) Startup directly to about:memory; with BugzillaJS installed nightly: 04.63MB patched: 02.98MB (0.35MB non-shared + 2.63MB shared) Startup directly to http://techcrunch.com/2011/07/07/facebook-now-lets-app-developers-see-their-spam-scores/ and about:memory, reload the latter; with BugzillaJS installed nightly: 09.27MB patched: 07.86MB (1.03MB non-shared + 6.83MB shared) Startup directly to techcrunch main page + 3 techcrunch stories and about:memory, reload the latter; with BugzillaJS installed nightly: 22.94MB patched: 11.75MB (2.51MB non-shared + 9.24MB shared) http://gregor-wagner.com/tmp/mem50 nightly: 70.11MB, after closing tabs and reducing memory: 55.80MB patched: 55.46MB (8.40MB non-shared + 47.06MB shared), after closing and reducing: 47.16MB (6.23MB non-shared + 40.93MB shared) All in all, I think these are some very nice savings, especially for those scenarios where multiple tabs with content containing the same scripts are open. The fact that mem50 also uses quite a bit less memory demonstrates nicely that the sharing works across domain-boundaries, too.
Attachment #713349 - Flags: review+
> All in all, I think these are some very nice savings Very much so! Fantastic work. One question: did you confirm that "explicit" went down by similar amounts to "script-data"? I just want to ensure that the memory saved in "script-data" didn't accidentally end up somewhere else :)
Blocks: 841132
(In reply to Nicholas Nethercote [:njn] from comment #28) > > All in all, I think these are some very nice savings > > Very much so! Fantastic work. Thanks! :) > > One question: did you confirm that "explicit" went down by similar amounts > to "script-data"? I just want to ensure that the memory saved in > "script-data" didn't accidentally end up somewhere else :) Yes, I did, and yes, it did. There really wasn't anywhere for it to go to, though ;)
Assignee: general → tschneidereit
Target Milestone: --- → mozilla21
(for future reference: you don't have to set Target Milestone when landing on inbound nowadays -- the sheriff merge tool sets this automagically when the cset is merged to central & the bug is resolved.)
I did some measurements too. start-up: - 3.58 --> 3.04 (2.68 + 0.36) == -0.54 MiB techcrunch.com - 7.03 --> 6.25 (5.49 + 0.76) == -0.78 MiB gmail.com - 10.87 --> 10.89 (9.99 + 0.90) = no change gmail.com x 3 - 22.89 --> 12.20 (10.20 + 2.00) = -10.79 MiB 10 articles from theage.com.au - 27.33 --> 15.63 (8.56 + 7.07) = -11.60 MiB mem50, just after finishing: - 69.51 --> 54.67 (46.02 + 8.65) = -14.84 MiB mem50, after closing all windows and minimizing memory usage *three* times: - 3.65 --> 3.05 (2.70 + 0.35) = -0.60 MiB These all roughly match yours, which is good. But there was no noticeable drop on AWSY, which surprised me. Looking closely, I see there is a small improvement (this is from the green line): - 7.51 --> 6.38 (5.58 + 0.80) = -1.13 MiB AWSY is sufficiently noisy that 1.13 MiB isn't enough to be noticeable. So that got me wondering why AWSY has only 6--7 MiB of script-data when it finishes and 30 tabs are open, but mem50 has 55--70 MiB when it finishes and 50 tabs are open. I talked to johns, and it turns out that AWSY's pageset (TP5) contains somewhat old snapshots of sites, and more importantly, that anything that causes external network accesses was deleted(!) So this means no Facebook/Twitter/GoogleAds/Reddit/etc widgets are loaded. I hadn't realized that.
In case that wasn't clear: AWSY showed a small drop, one which is swamped by its noise. But AWSY isn't a very good test of this change, because it greatly underplays the amount of JS code that real sites use. The other measurements I did all looked good.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Depends on: 897507
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: