Bug 614155
Opened 14 years ago
Closed 14 years ago
TM: lazily construct toSource cache
Categories: Core :: JavaScript Engine (defect)
Tracking
Status: RESOLVED FIXED
blocking2.0: -
People
(Reporter: cdleary, Assigned: cdleary)
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(2 files, 1 obsolete file)
patch (deleted)
patch (deleted): luke: review+, shaver: approval2.0-
It makes sense to make a small cache for regexp source-to-object mappings -- JSC does it, and bz says it'll pay off in some benchmark. It also seems applicable to real-world code, e.g. an unoptimized prototype.js snippet.
jseward reported a BBC regexp-recompilation bug that such a cache could also solve; I can't find the bug number at the moment.
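For illustration, a minimal sketch of the idea using standard C++ containers; the type names here are hypothetical stand-ins, not SpiderMonkey's actual RegExp/HashMap API:

#include <memory>
#include <string>
#include <unordered_map>

struct CompiledRegExp {            // stand-in for the engine's compiled regexp object
    std::string source;
    int flags;
};

class RegExpSourceCache {
    // Key on source + flags so that e.g. /foo/g and /foo/i don't collide.
    std::unordered_map<std::string, std::shared_ptr<CompiledRegExp>> map_;
public:
    std::shared_ptr<CompiledRegExp> lookupOrCompile(const std::string& source, int flags) {
        std::string key = source + '\0' + std::to_string(flags);
        auto it = map_.find(key);
        if (it != map_.end())
            return it->second;                        // hit: reuse the compiled object
        auto re = std::shared_ptr<CompiledRegExp>(new CompiledRegExp{source, flags});
        map_.emplace(std::move(key), re);             // miss: "compile" and remember
        return re;
    }
    void purge() { map_.clear(); }                    // e.g. on GC, drop cached entries
};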
Comment 1•14 years ago
The benchmark in question, fwiw, is Dromaeo: the "DOM Attributes (Prototype)" test, on which we're a lot slower than V8/JSC/Carakan. In the subpart I profiled (hasClassName), 60% of our time goes to compiling the regexp it pounds on inside prototype.js...
Blocks: domperf
blocking2.0: --- → ?
Comment 2•14 years ago
Yeah, we should really fix this for 4. Seems like a pretty easy fix, too. This will help benchmark scores disproportionately if we cache the regexps per thread and hold them across runs.
Comment 4•14 years ago
Updated•14 years ago
blocking2.0: ? → -
Assignee
Comment 5•14 years ago
Did this late at night while waiting for try results. This makes us as fast as V8 on a simple microbenchmark.
var start = new Date();
for (var i = 0; i < 1000000; ++i)
    new RegExp('foo');
print(new Date() - start);
The purge on the HashTable isn't quite right... despite all the tests in the test suite passing, I wind up in an infinite probe loop after purging in the SunSpider harness. I also have that purge-realloc-fallibility issue mentioned in a comment.
It might still be worth it to just make a fixed-size LRU circular-buffer array with JSAtom pointer comparisons and avoid all this headache. I'd just have to figure out how to pick the right size for it -- flexibility across different workloads was the reason I went with the HashMap to start with, but the performance characteristics of the fixed-size cache will probably win out in the end anyway.
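For reference, a rough sketch of that fixed-size alternative (illustrative only: 'Atom'/'CompiledRE' and the size of 8 are stand-ins, and round-robin replacement stands in for real LRU to keep it tiny):

#include <array>
#include <cstddef>

struct Atom;        // stand-in for JSAtom; atoms are interned, so keys compare by pointer
struct CompiledRE;  // stand-in for the compiled regexp object

class FixedRegExpCache {
    struct Entry { Atom* source = nullptr; CompiledRE* re = nullptr; };
    std::array<Entry, 8> entries;   // fixed size; picking this number is the open question
    std::size_t next = 0;           // circular replacement cursor

public:
    CompiledRE* lookup(Atom* source) {
        for (const Entry& e : entries)
            if (e.source == source)          // pointer comparison, no hashing or atomizing
                return e.re;
        return nullptr;
    }

    void add(Atom* source, CompiledRE* re) {
        entries[next] = Entry{source, re};   // overwrite the oldest slot
        next = (next + 1) % entries.size();
    }
};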
Attachment #505744 - Flags: feedback?(lw)
Comment 6•14 years ago
Shrinking realloc can fail, at least on one of the *BSD family. I know, right?
A sorted vector and binary search probably wins fine for you, and the cost of the cache shuffling for sorted insertion will be utterly dominated by all the real work we do compiling a regexp anyway.
Comment 7•14 years ago
Comment on attachment 505744 [details] [diff] [review]
Hash table as cache.
Cool, makes sense! f+, although I didn't look really closely for bugs.
Comments:
- I guess HashTable::clear() isn't good enough b/c it leaves the table around which might get pretty big... Instead of adding a HashTable::purge(), perhaps you could use a js::LazilyInitialized<js::RegExpCache> in JSCompartment and call .destroy() in JSCompartment::purge?
- I like the AlreadyIncRefed return type from LookupRegExpCacheEntry, but it would be nice if you pushed the type-iness further and made that the Value type of the HashMap and the argument type of AddRegExpCacheEntry.
I would assume this is for after 4.0?
Attachment #505744 - Flags: feedback?(lw) → feedback+
Comment 8•14 years ago
Oh, one more comment if you keep the HashTable: instead of js_AtomizeString'ing the input, you could observe that regexp literals will (should?) be atoms already and computed regexps won't, and computed ones seem like they'd be less likely to hit, so perhaps you could only use the cache if the string is already an atom and avoid corner cases where we spend a bunch of extra time atomizing.
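A hypothetical version of that guard, with a stand-in string type rather than the engine's:

struct StringStub {          // stand-in for the engine's string type
    bool atomized = false;   // true when the string is already an atom
};

// Only already-atomized strings are used as cache keys; computed, non-atom
// strings bypass the cache, so a miss never pays for atomization.
inline bool useRegExpCache(const StringStub& source) {
    return source.atomized;
}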
Assignee
Comment 9•14 years ago
(In reply to comment #8)
> so perhaps you could only use the cache if
> the string is already an atom and avoid corner cases where we spend a bunch of
> extra time atomizing.
But this will fall down on a workload that uses naively computed strings over and over again, e.g.:
var computed = '';
for (var i = 0; i < 10; ++i)
    computed += i;
for (var i = 0; i < 10000000; ++i)
    new RegExp(computed);
(In reply to comment #7)
> - I guess HashTable::clear() isn't good enough b/c it leaves the table around
> which might get pretty big... Instead of adding a HashTable::purge(), perhaps
> you could use a js::LazilyInitialized<js::RegExpCache> in JSCompartment and
> call .destroy() in JSCompartment::purge?
That seems like the way to go, especially given shaver's feedback in comment 6.
> - I like the AlreadyIncRefed return type from LookupRegExpCacheEntry, but it
> would be nice if you pushed the type-iness further and made that the Value type
> of the HashMap and the argument type of AddRegExpCacheEntry.
But we don't want the hash map to keep regexps alive. I think making them NeedsIncRef may be what you meant?
Comment 10•14 years ago
(In reply to comment #9)
> But this will fall down on the workload that uses naively computed strings over
> and over again. i.e.
Clearly. The question is how often we think computed regexps are going to hit the cache. I was speculating 'not often', but that's a guess I suppose. Easy enough to measure.
> But we don't want the hash map to keep regexps alive. I think making them
> NeedsIncRef may be what you meant?
Ah, I missed that RegExps remove themselves from the cache. The flush-on-gc took care of that anyway, although it potentially keeps RegExps alive longer. Either way, some type explaining the ref-counted-ness would be nice.
Comment 11•14 years ago
(In reply to comment #9)
> But this will fall down on the workload that uses naively computed strings over
> and over again. i.e.
>
> var computed = '';
> for (var i = 0; i < 10; ++i)
> computed += i;
> for (var i = 0; i < 10000000; ++i)
> new RegExp(computed);
More realistic example:
const line_regexp_parts = [
"^(?:(\\w+):)?", // optional label at start of line
"\\s*(\\.?\\w+)", // optional spaces, (pseudo-)opcode
"(?:\\s+([+-]?\\w+|\\([^)]*\\)))?", // optional first immediate operand
"(?:\\s+([\\w-,]+|\\([^)]*\\)))?", // optional second immediate operand
"(?:\\s*(?:#.*))?$" // optional spaces and comment
];
const line_regexp = new RegExp(line_regexp_parts.join(""));
and then heavy usage of line_regexp. This happens, IIRC jresig does it. But the heavy usage follows a one-time compile per app or page, so maybe this is cold in cache terms.
> That seems like the way to go, especially given shaver's feedback in comment 6.
The classic Unix kernel LRU cache used double list linkage a la jsclist.h, and embedded two links in each cache entry: one for chained hashing, the other for the LRU list. All entries go on the list; initially all are unhashed, free for claiming by Add.
O(1) recycle time and easy to fix the size of the entry pool (and the hashtable buckets vector).
/be
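A rough sketch of that scheme (illustrative only, not jsclist.h or the patch; string keys/values stand in for atoms and regexps): each entry carries a hash-chain link plus intrusive LRU links, every entry is always on the LRU list, and add() recycles the least-recently-used entry in O(1) from a fixed pool.

#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct Entry {
    std::string key;
    std::string value;
    bool hashed = false;     // false until the entry is claimed by add()
    Entry* chain = nullptr;  // hash-bucket chain link
    Entry* prev = nullptr;   // LRU list links (most recently used at the head)
    Entry* next = nullptr;
};

class FixedLruCache {
    std::vector<Entry> pool_;
    std::vector<Entry*> buckets_;
    Entry head_;             // sentinel of the circular LRU list

    std::size_t bucketOf(const std::string& k) const {
        return std::hash<std::string>{}(k) % buckets_.size();
    }
    void unlink(Entry* e) { e->prev->next = e->next; e->next->prev = e->prev; }
    void pushFront(Entry* e) {
        e->next = head_.next; e->prev = &head_;
        head_.next->prev = e; head_.next = e;
    }
    void unhash(Entry* e) {
        Entry** p = &buckets_[bucketOf(e->key)];
        while (*p != e) p = &(*p)->chain;
        *p = e->chain;
    }

public:
    explicit FixedLruCache(std::size_t n) : pool_(n), buckets_(n, nullptr) {
        head_.prev = head_.next = &head_;
        for (Entry& e : pool_) pushFront(&e);    // initially every entry is unhashed/free
    }

    Entry* lookup(const std::string& key) {
        for (Entry* e = buckets_[bucketOf(key)]; e; e = e->chain)
            if (e->key == key) { unlink(e); pushFront(e); return e; }   // touch on hit
        return nullptr;
    }

    // Caller does lookup() first; add() recycles the least-recently-used entry.
    void add(const std::string& key, const std::string& value) {
        Entry* e = head_.prev;                   // tail of the LRU list
        if (e->hashed) unhash(e);                // drop its old hash-chain membership
        e->key = key; e->value = value; e->hashed = true;
        e->chain = buckets_[bucketOf(key)];
        buckets_[bucketOf(key)] = e;
        unlink(e); pushFront(e);
    }
};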
Comment 12•14 years ago
Want to switch clear to purge in the function decompilation cache when you land this?
Comment 13•14 years ago
Shrinking realloc doesn't always work. You can just not purge in that case and undo all effects.
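A sketch of that pattern in isolation (plain C allocation, not the actual HashTable::purge code): if the shrinking realloc fails, keep the original allocation and skip the purge rather than treating it as an error.

#include <cstddef>
#include <cstdlib>

struct Buffer {
    void*       data = nullptr;
    std::size_t capacity = 0;

    // Try to shrink to newCapacity; on failure leave everything untouched.
    bool tryShrink(std::size_t newCapacity) {
        void* p = std::realloc(data, newCapacity);
        if (!p)
            return false;        // shrinking realloc failed: keep the old, larger block
        data = p;
        capacity = newCapacity;
        return true;
    }
};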
Assignee
Comment 14•14 years ago
(In reply to comment #13)
> Shrinking realloc doesn't always work. You can just not purge in that case and
> undo all effects.
Good point. If realloc won't give me *less* memory, it doesn't *deserve* to be freed! I'll finish this patch up later today, then.
Assignee
Comment 15•14 years ago
f? sayrer -- do we want to land this for FF4?
Attachment #505744 - Attachment is obsolete: true
Attachment #512284 - Flags: review?(lw)
Attachment #512284 - Flags: feedback?(sayrer)
Assignee
Comment 16•14 years ago
Need to destroy the source cache on GC in order to release the memory allocated for it.
Attachment #512285 - Flags: review?(gal)
Comment 17•14 years ago
Comment on attachment 512284 [details] [diff] [review]
Lazily constructed regexp cache.
Seems a bit late in the game for such an optimization...
>+static inline bool
>+AddRegExpCacheEntry(JSContext *cx, RegExp *re)
>+{
>+ RegExpCache &cache = cx->compartment->regExpCache.ref();
>+ return cache.put(re->getSource(), re);
>+}
Perhaps a comment explaining that the caller must have already done Lookup, which ensures the cache is constructed?
>+ RegExpCache &cache = cx->compartment->regExpCache.ref();
>+ RegExpCache::Ptr p = cache.lookup(re->getSource());
>+ /* Note that we don't bother checking flags -- it's always safe to remove these. */
>+ if (p.found())
>+ cache.remove(p);
(Keeping the comment) you can just write cache.remove(re->getSource());
>+ if (lazy.empty()) {
>+ lazy.construct();
>+ lazy.ref().init();
>+ return RetType(NULL);
>+ }
Need to oom-check init().
>+ void maybeDestroy() {
I like 'maybe*' as a naming style, but more for queries. When attached to a verb, it's somewhat disconcerting ;-) What about the verbose but unambiguous destroyIfConstructed()? (And having ~LazilyConstructed call that.)
Comment 18•14 years ago
Also, I would r+, but the point of contention in comment 10 remains: perhaps you could instrument the cache and report how many cache hits occur for computed regexps compared to hits for literal regexps?
Assignee
Updated•14 years ago
Attachment #512284 - Flags: review?(lw)
Attachment #512284 - Flags: feedback?(sayrer)
Assignee
Updated•14 years ago
Attachment #512285 - Flags: review?(gal) → review?(lw)
Comment 19•14 years ago
Comment on attachment 512285 [details] [diff] [review]
Lazily constructed source cache.
>+ LazilyConstructed<ToSourceCache> lazy = cx->compartment->toSourceCache;
I suspect you don't want a by-value copy of this hash table. (Actually, LazilyConstructed forgot to define/hide copy/assignment, so this is bad. Can you, for now, hide them to catch this sort of bug?)
r+ using a reference.
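A sketch of that fix in a modern-C++ rendering (illustrative, not the actual 2011 SpiderMonkey class): deleting the copy constructor and copy assignment makes an accidental by-value copy like the one above a compile error, and the destructor reuses the destroyIfConstructed() idea from comment 17.

#include <new>
#include <utility>

template <class T>
class LazilyConstructed {
    alignas(T) unsigned char storage[sizeof(T)];
    bool constructed = false;

public:
    LazilyConstructed() = default;
    LazilyConstructed(const LazilyConstructed&) = delete;             // catch by-value copies
    LazilyConstructed& operator=(const LazilyConstructed&) = delete;  // at compile time
    ~LazilyConstructed() { destroyIfConstructed(); }

    bool empty() const { return !constructed; }

    template <class... Args>
    void construct(Args&&... args) {
        new (storage) T(std::forward<Args>(args)...);   // placement-new into raw storage
        constructed = true;
    }

    T& ref() { return *reinterpret_cast<T*>(storage); }

    void destroyIfConstructed() {
        if (constructed) {
            ref().~T();
            constructed = false;
        }
    }
};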
Attachment #512285 - Flags: review?(lw) → review+
Assignee
Comment 20•14 years ago
Summary: TM: small regexp cache → TM: lazily construct toSource cache
Whiteboard: fixed-in-tracemonkey
Assignee
Updated•14 years ago
Comment 21•14 years ago
Should this be landing now? I don't think we should approve it for landing in m-c; it's not worth the even-small risk to 4.
Assignee
Comment 22•14 years ago
(In reply to comment #21)
> Should this be landing now? I don't think we should approve it for landing in
> m-c, it's not worth the even-small risk to 4.
Sorry, getting a= honestly slipped my mind. :-/
As brendan mentioned in bug 629590 comment 18, compartments go away, so keeping a big chunk of memory around for a while on those older-prototype-based sites is probably okay.
Shaver, if you throw an a- 2.0 on there I'll go ahead and back it out.
Comment 23•14 years ago
Comment on attachment 512285 [details] [diff] [review]
Lazily constructed source cache.
thanks, chris. gotta save some wins for FF5! :-)
Attachment #512285 - Flags: approval2.0-
Assignee
Comment 24•14 years ago
Whiteboard: fixed-in-tracemonkey
Assignee
Comment 25•14 years ago
cdleary-bot mozilla-central merge info:
http://hg.mozilla.org/mozilla-central/rev/62a979cc89a2
http://hg.mozilla.org/mozilla-central/rev/36d2d943f4d7 (backout)
Note: not marking as fixed because last changeset is a backout.
Assignee
Comment 26•14 years ago
Re-land after TM unfreeze:
http://hg.mozilla.org/tracemonkey/rev/6179a5b48142
Note that bug 634654 is still alive-and-kicking.
Whiteboard: fixed-in-tracemonkey
Assignee
Comment 27•14 years ago
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED