Closed Bug 589664 Opened 14 years ago Closed 14 years ago

Speed up the JSON parser

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla5

People

(Reporter: n.nethercote, Assigned: Waldo)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: fixed-in-tracemonkey)

Attachments

(3 files, 3 obsolete files)

Easily the hottest statement in the JSON parser is this switch in js_ConsumeJSONText(): switch (*jp->statep) { which is called once per character. Not good -- big switches are highly unpredictable. Is there any reason why the parser isn't recursive descent? That would avoid the big switch and also there'd be no need for an explicit state stack. Should be both simpler and faster.
After the patch in bug 578216 is applied, Shark says 6.4% of string-tagcloud's time is in js_ConsumeJSONText(). About 4.1% of that then goes under functions related to building the data structure, leaving 2.3% in actual parsing time, AFAICT. If that could be halved it might save around 0.5ms in string-tagcloud, ie. not a lot. So the "simpler" argument is probably stronger than the "faster" argument here, unless fast parsing of JSON is a general goal, which it may well be.
Attached patch mostly works (WIP) (obsolete) (deleted) — Splinter Review
Nb: Brendan has expressed interest in de-JSBool-ing this file. This patch might be a good chance to do it, because it changes lots of code.
(In reply to comment #2) > Created attachment 468211 [details] [diff] [review] > mostly works (WIP) This doesn't speedup vs the existing code very much, maybe about 10%.
checking out Shark on this stuff, it looks like the combined parsing + lexing makes it difficult to optimize for GCC. I will try separating them.
Assignee: general → jwalden+bmo
Status: NEW → ASSIGNED
OS: Mac OS X → All
Hardware: x86 → All
Depends on: 643532
This is a totally by-the-spec implementation that should fix every spec compliance issue we have. It passes our JSON tests, at least. There might be some more refactoring to do, it needs recursion-checks or similar, and it needs "legacy mode" support (ugh) if it's to be used in nsJSON (and not just JSON.parse directly), among other things yet to do. It could definitely use JSON-specific fuzzing. Is it fast? "Probably", but that answer's worth the paper it's printed on. I'll test later today (tomorrow) when it's a reasonable hour and I'm not just in the zone hacking-wise. This doesn't touch the existing parsing code, making it easy to turn off if necessary. If you removed the old parsing code, I think (ignoring license headers) it'd be an overall reduction in lines of code, although I haven't tried it to get exact numbers (due to the remaining work noted above).
Attachment #468211 - Attachment is obsolete: true
One thing I learned from my work on the main JS parser: GetPrefixInteger is way faster than js_strtod. If you can use it for known-integral numbers that'll save some time.
I thought about that; I also thought it might be worth just pushing improvements like that into our strtod itself. Either way it seemed followup-able, and thus definitely something to split out if at all possible to keep changes as bite-sized as possible for a change of this magnitude.
(In reply to comment #8) > I thought about that; I also thought it might be worth just pushing > improvements like that into our strtod itself. I thought about that too but it doesn't look very easy. In the parsing case you've already scanned through the entire token and know that it's a simple integer. You need that knowledge you can avoid most of the work. > Either way it seemed > followup-able, and thus definitely something to split out if at all possible to > keep changes as bite-sized as possible for a change of this magnitude. Oh, definitely. A stack of patches might be easier/quicker than filing multiple bugs; that worked well for me in bug 639420.
Attached patch Closer, may still need more work (obsolete) (deleted) — Splinter Review
Using Opera's simple JSON.parse perf-testing demo at <http://my.opera.com/core/blog/2009/12/18/native-json-support-in-opera> because it's easy to do (and not for any particular resemblance it may bear to real-world JSON.parse usage): [jwalden@find-waldo-now src]$ cat /tmp/json-perf.js /** Returns how many times JSON.parse() ran on a simple input text in a second */ function test() { gc(); // try to minimize system effects var source = '{ "key1":"value1", "array1":[1,2,null,3,"4"], "nestedObject":' + '{ "key1":"value1", "array1":[1,2,null,3,"4"] } }'; var count=0, goOn=true; var start=(new Date()).getTime(); do{ JSON.parse(source); count++; // to avoid turning this into a Date() performance test.. we // do a date check only once per 1000 loops if(count%1000==0)goOn=((new Date()).getTime()-start) <1000; }while( goOn ); var time = (new Date()).getTime()-start; return 'Got score of '+ 100000*time/count+'. The input was parsed '+ count+' times in ' + time +'ms' ; } print(test()); print(test()); print(test()); print(test()); print(test()); [jwalden@find-waldo-now src]$ opt-unmodified/js -j -m -p -f /tmp/json-perf.js Got score of 421.00840336134456. The input was parsed 238000 times in 1002ms Got score of 412.34567901234567. The input was parsed 243000 times in 1002ms Got score of 412.75720164609055. The input was parsed 243000 times in 1003ms Got score of 412.75720164609055. The input was parsed 243000 times in 1003ms Got score of 411.93415637860085. The input was parsed 243000 times in 1001ms [jwalden@find-waldo-now src]$ opt-with-rewrite/js -j -m -p -f /tmp/json-perf.js Got score of 289.3063583815029. The input was parsed 346000 times in 1001ms Got score of 270.96774193548384. The input was parsed 372000 times in 1008ms Got score of 269.35483870967744. The input was parsed 372000 times in 1002ms Got score of 269.0860215053763. The input was parsed 372000 times in 1001ms Got score of 270.16129032258067. The input was parsed 372000 times in 1005ms (Numbers with rewrite without the gc call are in the 364000 range, generally.) For comparison: [jwalden@find-waldo-now src]$ run-jsc /tmp/json-perf.js Got score of 300.6006006006006. The input was parsed 333000 times in 1001ms Got score of 300.3003003003003. The input was parsed 333000 times in 1000ms Got score of 300.3003003003003. The input was parsed 333000 times in 1000ms Got score of 301.50602409638554. The input was parsed 332000 times in 1001ms Got score of 300.9009009009009. The input was parsed 333000 times in 1002ms
Attachment #521141 - Attachment is obsolete: true
json-parse-financial.js from Kraken would be worth measuring!
On my system it's slower for some reason. I would Shark, but first I have to figure out why Apple's gcc miscompiles it when optimization is enabled, then endeavor to hack around that problem. Ain't it grand?
Sigh, somehow not initializing the reviver variable in js_json_parse manages to confound JSONSourceParser::parse, despite the reviver not even being used until *after* that method's been called. (This came of replacing AutoValueRooter with just plain Value, but JS_ConvertArguments doesn't set unspecified arguments to anything sane, so it was garbage that...somehow...trickled backward into the actual parsing. I am not a fan of JS_ConvertArguments.) Perf with patch on OS X is soundly better on both Kraken *and* on the Opera mini-test. I don't know why my laptop's seeing better perf only on the Opera test. But with some ideas from Shark maybe my laptop won't be so cranky, even if its numbers are strangely large. Shark output for the financial data test is interesting. I looked at the data for a loop parsing the data 50k times. We have: * 2.0s reading strings, 1.1s in StringBuffer::finishString * 1.6s in js_DefineNativeProperty: fixed cost as far as this bug's concerned * 743.5ms in ValueToId (588.8ms in Atomize); we'd do well having StringBuffer::finishAtom and using it for property names * ...speaking of which, we should optimize to expect a string in property name context (string/'}' in object-open context), that advance() call in source view is 97% readString * 474.2ms reading numbers, 411.9ms in js_strtod -- definitely win there * 381.8ms for advance() itself, some winnable with more context-sensitive advancing * 111.6ms for NewDenseEmptyArray, maybe some win if Array.prototype lookup were streamlined, based on js_GetClassPrototype slowness * beyond that small stuff: nice to see js_ArrayCompPush as one of those negligibles, wasn't sure how well it would hold up I think this is finishable tomorrow -- we'll see how far I get.
Attached file cachegrind output (deleted) —
I did a Cachegrind run (using a patch stack Waldo gave me). Full results are attached. It's currently a 1.08x instruction count reduction over the old version. Note that inlining can make the attribution of instruction counts a little surprising. You'll work it out with a bit of squinting. Some easy wins I see: - Force-inline readString(), readNumber() - Changing IsJSONStringCharacter to this might help: return c < 128 && isJSONStringCharacter[c]; where isJSONStringCharacter[] is a lookup table. Not sure, it's one less test but involves a memory read. The same idea will definitely be a win for IsJSONWhiteSpace(). - Use the oneCharTokens[] idea I used in jsscan.cpp for handling []{},: - Use a JS7_ISDEC to test for 0..9, it's a single comparison instead of two. It can be used in multiple places. - In readString(), do you need to copy via a StringBuffer? Can't you just create the JSString straight out of the char buffer? - In readString(), the main loop looks like it could be tightened -- esp. the for loop within the while loop, and the '"' vs '\\' vs everything else test. I realize you haven't gone all out for performance yet, and so you're probably aware of some or all of these.
Depends on: 645121
More changes (mostly from comment 13) have bumped the rewrite's performance on the comment 10 test to the 430000 range, still more to do. (My Friday got mostly eaten up by bug 645205, among other things.) A brief response to one part of your thoughts: (In reply to comment #14) > - Use a JS7_ISDEC to test for 0..9, it's a single comparison instead of two. http://www.linux-kongress.org/2009/slides/compiler_survey_felix_von_leitner.pdf
I guess you think my JS7_ISDEC suggestion is a poor one, oh well. I suggest you try benchmarking with it to see if it helps. Likewise with my other suggestions. I've sped up general JS parsing by ~1.8x over the past few months, I'm just trying to pass on some things I learned.
It's probably reasonable to use the same abstraction everything else uses, although I find doing the comparison manually to be more readable (at least with the JS7_ISDEC name, I'd probably think otherwise for a better one). The point was that the two-comparisons-instead-of-one optimization is one compilers already do, on the compilers we build with particularly, and in general on the compilers we make much of an effort to care about. (It's quite possible icc implements that particular optimization now, too.) So there's no need to rearrange the pattern, slightly obfuscating it in the process, to try to eke out a quantum of performance that the compiler already eked out for you.
There's probably more optimization which could be done here, but: * I currently lack a machine with precise perf-measuring capabilities. * We should get this done. Then we should make it fast. This fixes a bunch of bugs we have now, so it is well worth having even if we haven't eked the last iota of perf from it. Tweaks can be followups. r? to Paul since he's starting to do JSON stuff, and r? to Nick because he's done parser-y/scanner-y stuff a bit. This patch is atop a dozen other patches. I'll try to upload a bundle of everything for experimentation purposes, should anyone want to do so.
Attachment #521630 - Attachment is obsolete: true
Attachment #523649 - Flags: review?(pbiggar)
Attachment #523649 - Flags: review?(nnethercote)
Attached file Patch bundle for rewrite (deleted) —
Blocks: 647119
I just installed qimportbz because I think that's what the patch bundle requires. But "hg qimportbz 523650" isn't working for me. How should I proceed?
$ hg unbundle rewrite.bundle That'll add all the changesets to your repository, then you can update to the head of them and do whatever. And I guess you can use |hg strip| to remove them, or, perhaps easier, just use a clean tree.
Cachegrind says it's a 1.405x instruction count improvement for json-parse-financial. In my experience instruction counts usually don't differ that much from actual times, so I'd expect that means 1.3--1.5x faster.
Comment on attachment 523649 [details] [diff] [review] Patch, includes some new tests for stuff but mostly relies on existing tests This all looks fine, but I'm confused by the legacy stuff. AFAICT you've: (a) added a "legacy" mode to the new parser; this makes sense because it is required for Places data serialization; (b) kept the old parser around; this doesn't make sense to me. Can you explain why you've done this? Or have I misunderstood what's going on? Is there a follow-up bug to remove the old parser? Nit: "NewJSONParser" is a more obvious name than "JSONSourceParser" for the new parser, but hopefully it's a temporary placeholder in which case it doesn't matter much. >+ /* >+ * We're past an arbitrary JSON value, so the previous character is >+ * *somewhat* constrained, even if this assertion is pretty broad. Don't >+ * knock it till you tried it: this assertion *did* catch a bug once. >+ */ >+ JS_ASSERT((current[-1] == 'l' && >+ current[-2] == 'l' && >+ current[-3] == 'u' && >+ current[-4] == 'n') || >+ (current[-1] == 'e' && >+ ((current[-2] == 's' && >+ current[-3] == 'l' && >+ current[-4] == 'a' && >+ current[-5] == 'f') || >+ (current[-2] == 'u' && >+ current[-3] == 'r' && >+ current[-4] == 't'))) || >+ current[-1] == '}' || >+ current[-1] == ']' || >+ current[-1] == '"' || >+ JS7_ISDEC(current[-1])); >+} Nit: don't factor out the 'e' check for 'true' and 'false'; although it's one line shorter it's much less readable due to the extra nesting.
Attachment #523649 - Flags: review?(nnethercote) → review+
Depends on: 645205
(In reply to comment #23) > (b) kept the old parser around; this doesn't make sense to me. Can you > explain why you've done this? Or have I misunderstood what's going on? Is > there a follow-up bug to remove the old parser? Bake time. If something comes up, it's easy to revert to old code. (This is the whole everything-must-be-backout-able-somehow thing.) Once we're confident (well, not that I'm not confident now, but you get the idea) we can remove all the old parser. > Nit: "NewJSONParser" is a more obvious name than "JSONSourceParser" for the > new parser, but hopefully it's a temporary placeholder in which case it > doesn't matter much. Yeah. When the old parser goes away, I'll rename to JSONParser. > Nit: don't factor out the 'e' check for 'true' and 'false'; although it's > one line shorter it's much less readable due to the extra nesting. Good point, fixed locally. I still have a bunch of patches beneath this, touching JSON code, such that it's not easy to move it upward to push sooner. I'm going to push on the rest of the patches to get this landed without having to do patch-moving surgery.
Great thinking on the back out defense!
This does not build for me on OS X, 32-bit. I get errors like this: ../jsonparser.cpp: In member function ‘JSONSourceParser::Token JSONSourceParser::advance()’: ../jsonparser.cpp:307: error: ISO C++ says that these are ambiguous, even though the worst conversion for the first is better than the worst conversion for the second: ../jstl.h:605: note: candidate 1: T& js::RangeCheckedPointer<T>::operator[](intptr_t) const [with T = const short unsigned int] ../jsonparser.cpp:307: note: candidate 2: operator[](const short unsigned int*, int) <built-in> ../jsonparser.cpp:307: error: ISO C++ says that these are ambiguous, even though the worst conversion for the first is better than the worst conversion for the second: ../jstl.h:605: note: candidate 1: T& js::RangeCheckedPointer<T>::operator[](intptr_t) const [with T = const short unsigned int] ../jsonparser.cpp:307: note: candidate 2: operator[](const short unsigned int*, int) <built-in>
Blocks: 641479
The errors in comment 26 were being backed out as the comment was made, and I relanded with what appears to be the correct fix for the old and busted Mac compilers. http://hg.mozilla.org/tracemonkey/rev/6c8becdd1574
Whiteboard: fixed-in-tracemonkey
Target Milestone: --- → mozilla2.2
(Oh, forgot to note, I figured it was best to land with the one review to get it in, and to make tweaking and maintenance simpler, but I still am interested in more reviews. It's just easier to make little changes if needed than to carry around a large patch even longer, when it did get reviewed and at least some people were happy with it.)
Blocks: 577060
Blocks: 557354
+/* + * This class should be JSONParser, but the old JSON parser uses that name, so * until we remove the old parser the class name will be overlong. + * + * NB: This class must only be used on the stack as it contains a js::Value. + */ +class JSONSourceParser It would be better if the stack-only check was done by the compiler! class not_allocatable { private: void * operator new(size_t); }; class JSONSourceParser : public not_allocatable { ... };
I agree getting the compiler on your side is good, yet your suggestion wouldn't do the job fully because the instantiation could be as a member variable of a heap-allocated structure: class Foo { JSONSourceParser parser; ... }; ... new Foo; Gecko more broadly has a NS_STACK_CLASS macro which wraps up a gcc annotation for this, which code analyses can enforce, but SpiderMonkey has no such thing.
Is protecting against some cases better than protecting against none? Also, why does having a js::Value mean it can't be used on the heap? What will go wrong?
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
If the Value contains a GC-thing like a string, the conservative stack scanner won't see that string if it's referenced only from off the stack and not in the GC heap. I don't think it's all that likely that anyone would new a JSONSourceParser, so I suspect the operator new overload wouldn't amount to much in the end. And I suspect people who might want to do something like that would at least as often make it a member of some other class and then new that, which of course avoids the partial protection. And perhaps most of all, the class is internal and won't be exposed, and we seem to have called ix-nay on use of new anyway, so a new-overload alone probably wouldn't catch anything anyway.
As far as perf goes, post-patch, on my machine it looks like we're roughly on par with Chrome and WebKit (I think), but we're still somewhat short of Opera (this all using Opera's mini-perf test). So possibly there's more work to do here. I'm spinning up a build of TM on bm-right to see where time gets sunk now, will report back here (and in followup bugs as necessary) with shark results.
Keywords: perf
Attachment #523649 - Flags: review?(pbiggar)
Perf-investigation followup, from terse notes I took after bz and I investigated a couple weeks ago, using the kraken JSON parsing test: * did some sharking with bz of JSON parsing performance after bug 589664 ** one horrible cache-miss place in GC for allocating JSString ** creating atoms for already-existing strings needs to be faster (will be, when atoms move to per-thread storage rather than whole-runtime) ** rest of potential perf win seems to be in speeding up js_DefineNativeProperty, noble but out of scope here So it seems the JSON parsing code itself is in good shape, at least as concerns that benchmark. On others it's possible it might do differently depending on the vagaries of which fast paths it triggered or didn't trigger, but speculation on that point rather than responding to specific testcases seems unwise.
(In reply to comment #35) > > So it seems the JSON parsing code itself is in good shape, at least as > concerns that benchmark. That clunk I just heard sounded a like a gauntlet being thrown down! :) Hopefully I'll find some time soon to take a look...
Blocks: 572279
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: