Closed Bug 564369 Opened 15 years ago Closed 14 years ago

streamline TokenStream::getChar()

Categories

(Core :: JavaScript Engine, defect)

Platform: x86 macOS
Type: defect
Priority: Not set
Severity: normal
Tracking

RESOLVED FIXED

People

(Reporter: n.nethercote, Assigned: n.nethercote)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(10 files, 1 obsolete file)

Attached patch patch (obsolete) (deleted) — Splinter Review
getChar() gets the next character, normalizing all EOL sequences to '\n' as it goes. It's pretty horrid, largely because a \r\n sequence can get split across buffer fills, which makes everything that follows more difficult. This patch:

- Replaces the use of js_fgets() with a new method, TokenStream::fillUserbuf(), which has slightly different behaviour. (The reason for the name fillUserbuf() will become clear shortly.) I kept js_fgets() in place because it's a JS_FRIEND_API function, though I'd be happy to get rid of it since there are no other users within SpiderMonkey. fillUserbuf() ensures that a \r\n sequence is never split: it always keeps a spare char in the buffer free so a final \n can be grabbed if necessary. (A rough sketch of this idea follows after this list.)

- This allows getChar() to be simplified quite a bit. The EOL-normalizing code is currently near the end; all the code in the case marked with "Does the line segment end in \r? We must check for a \n..." is no longer necessary. (I also added some extra assertions in the related cases.)

- Removing that code removes the only place that sets TSF_CRFLAG, so earlier in the function I removed all the code that depends on TSF_CRFLAG being set, and then I removed all occurrences of TSF_CRFLAG.

- Having done that, there's no longer any need to first get the data into cbuf[] and then copy it into userbuf, so I removed cbuf and changed getLineFromFile() to write directly into userbuf.base. Hence the name 'fillUserbuf()'.
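For concreteness, here is a rough sketch of the no-split fill in plain stdio terms. The names are invented and the real code works on the TokenStream's own buffers, so treat this as an illustration of the behaviour described above rather than the patch itself:

    #include <stdio.h>

    // Sketch only: read up to 'size' chars of one line from 'file' into 'buf',
    // keeping one slot in reserve so that if the line ends in \r we can also
    // pull in a following \n rather than splitting the \r\n pair across fills.
    static int
    fillLineNoSplitCRLF(char *buf, int size, FILE *file)
    {
        int i = 0;
        while (i < size - 1) {          // leave one slot spare for the \n of a \r\n
            int c = getc(file);
            if (c == EOF)
                break;
            buf[i++] = (char) c;
            if (c == '\n')
                break;                  // plain \n ends the line
            if (c == '\r') {            // peek once for \n; no flag carried across calls
                int c2 = getc(file);
                if (c2 == '\n')
                    buf[i++] = '\n';    // the reserved slot keeps the pair together
                else if (c2 != EOF)
                    ungetc(c2, file);
                break;                  // \r (or \r\n) ends the line
            }
        }
        return i;                       // number of chars stored
    }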
Cachegrind says this reduces instruction counts for string-tagcloud by 1.4%. I haven't done any SS timings. I tried parsemark but got inconclusive results; cdleary said he would try the patch on his machine.
(In reply to comment #1)
> Cachegrind says this reduces instruction counts for string-tagcloud by 1.4%.

getChar() uses about 25% fewer instructions, and fillUserbuf() uses about 15% fewer than js_fgets() did.
I have some follow-up improvements, BTW, but this is enough changes for one patch.
Attachment #444037 - Flags: review?(cdleary)
Attachment #444037 - Flags: feedback?(brendan)
I think this change won't help much when the main activity is proper parsing, as I suspect it is for parsemark. But for benchmarks that contain whopping great strings -- like string-tagcloud, regexp-dna, string-unpack-code -- more of the work is just in the scanning, and this change has a more noticeable effect.
And even if there was no perf benefit, this patch is worth it for making things clearer! :)
Comment on attachment 444037 [details] [diff] [review]
patch

Nice work -- it pays to read old code.

You could reduce TSF_ renumbering churn by swapping the last flag for the one being removed. We do that elsewhere to minimize diff size, and sometimes it wins anyway since the last flag is "more important" than the earlier ones, esp. the one being removed (whose removal signifies its marginality!).

/be
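To make the suggestion concrete (the bit values here are invented, not the real TokenStreamFlags):

    // Before the patch, conceptually:
    //     TSF_CRFLAG          = 0x40,     /* the flag being removed */
    //     ...
    //     TSF_KEYWORD_IS_NAME = 0x4000    /* the last flag */
    //
    // With the swap, only two lines change and nothing else is renumbered:
    enum SketchFlags {
        TSF_KEYWORD_IS_NAME_SKETCH = 0x40   // takes over the bit TSF_CRFLAG held
    };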
Attachment #444037 - Flags: feedback?(brendan) → feedback+
Cool! Any effects on the aforementioned SS benchmarks?
(In reply to comment #8)
> Cool! Any effects on the aforementioned SS benchmarks?

Cachegrind says instruction counts are down by 1--1.5%, but timing differences are within noise. To give you an idea, the number of instructions in getChar() on string-tagcloud dropped from 15.4M to 11.9M, and the number of instructions in js_fgets() dropped from 3.3M to 2.8M.

And I'm hoping to get it down substantially further with a couple of follow-up changes. If I'm lucky I may even be able to split the slow code off into a separate function, which might allow the fast path to be inlined, although ungetbuf[] might make that tricky. (I should probably do a stack of mq patches to see the effect all at once, but I still haven't got into that style of working; it makes me nervous.)
(In reply to comment #7)
>
> You could reduce TSF_ renumbering churn by swapping the last flag for the one
> being removed. We do that elsewhere to minimize diff size, and sometimes it
> wins anyway since the last flag is "more important" than the earlier ones, esp.
> the one being removed (whose removal signifies its marginality!).

Which is considered the greater good -- minimize churn or keeping the flags in a sensible order? I can go either way but I lean towards the former.
(In reply to comment #10)
> (In reply to comment #7)
> >
> > You could reduce TSF_ renumbering churn by swapping the last flag for the one
> > being removed. We do that elsewhere to minimize diff size, and sometimes it
> > wins anyway since the last flag is "more important" than the earlier ones, esp.
> > the one being removed (whose removal signifies its marginality!).
>
> Which is considered the greater good -- minimize churn or keeping the flags in
> a sensible order? I can go either way but I lean towards the former.

It depends -- too much of the former and you end up needing the latter in a big "recover sensible order" patch. This particular case looks like win-win for swap-and-delete.

/be
Comment on attachment 444037 [details] [diff] [review]
patch

I'm going to redo this change and some follow-up ones as a stack of patches, a la bug 553671.
Attachment #444037 - Attachment is obsolete: true
Attachment #444037 - Flags: review?(cdleary)
Summary: don't allow \r\n EOL sequence to be split in getChar() → streamline TokenStream::getChar()
Attached patch patch 1 (deleted) — Splinter Review
This patch:

- Adds TokenStream::getLineFromFile() and replaces getChar()'s call to js_fgets() with a call to it. It's similar to js_fgets(), but it guarantees that a \r\n sequence won't be split, which will make things easier for getChar(). It's also slightly faster, as it unrolls the loop following a \r, thus avoiding the 'crflag' check every time around the loop. It also doesn't bother appending a '\0', because getChar() doesn't need it.

- Doesn't remove js_fgets(), even though it now has no callers, because it's a JS_FRIEND_API function.

string-tagcloud is the benchmark most affected by getChar() changes. This patch reduced the instruction count (as measured by Cachegrind) for string-tagcloud from 283.4M to 282.5M.
Attachment #444578 - Flags: review?(cdleary)
Attached patch patch 2 (deleted) — Splinter Review
This patch:

- Removes the handling of the split \r\n case in getChar(), because it is no longer possible.

- Adds some comments and extra assertions.

string-tagcloud: 282.5M (unchanged)
Attachment #444579 - Flags: review?(cdleary)
Attached patch patch 3 (deleted) — Splinter Review
This patch:

- Removes all code involving TSF_CRFLAG and crflag, because they are now never set.

- Declares some variables closer to their first use point rather than at the top of getChar().

string-tagcloud: 282.5M --> 281.9M
Attachment #444580 - Flags: review?(cdleary)
Attached patch patch 4 (deleted) — Splinter Review
This patch:

- Removes cbuf, which is no longer needed, instead reading directly from the file into userbuf.

- Renames getLineFromFile() to fillUserbuf() and removes its arguments accordingly.

- Tweaks the checking of fillUserbuf()'s return value.

string-tagcloud: 281.9M --> 279.4M
Attachment #444581 - Flags: review?(cdleary)
Attached patch patch 5 (deleted) — Splinter Review
This patch:

- Changes 'len' and 'olen' in getChar() to 'llen' and 'ulen', which makes it clear which buffer they refer to, and avoids updating them in confusing ways.

- Adds some assertions about things that can only happen when scanning from memory.

- Adds some comments.

string-tagcloud: 279.4M (unchanged)
Attachment #444582 - Flags: review?(cdleary)
Attached patch patch 6 (deleted) — Splinter Review
This patch changes the userbuf-to-linebuf copying in getChar(). The old version did it in a strange way:

- look in userbuf for an EOL
- copy from userbuf to linebuf, up to the end of userbuf or linebuf or the first EOL, whichever comes first
- if we copied up to an EOL, normalize it if necessary

The new version does this (a rough sketch follows below):

- for each char in userbuf:
  - if we have reached the end of userbuf or linebuf, stop
  - copy the char from userbuf to linebuf
  - if the char was an EOL, normalize it, then stop

This is shorter, simpler, avoids the need for the 'saveEOL' member, and is faster because it only iterates over userbuf once. Also, it now copies up to LINE_LIMIT chars, not just LINE_LIMIT-1. (I think the -1 might have been necessary when \r\n could be split.) This required changing an assertion in reportCompileErrorNumberVA().

string-tagcloud: 279.4M --> 277.0M
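Roughly, the new copy loop has this shape (names and types simplified -- a sketch of the idea, not the patch itself):

    #include <stddef.h>

    // Sketch: copy chars from userbuf into linebuf in a single pass, stopping
    // at the end of either buffer or just after the first EOL, and normalizing
    // the EOL to '\n' as it is copied. Relies on \r\n never being split.
    static size_t
    copyLineSketch(char *linebuf, size_t linebufSize,
                   const char *userbuf, size_t userbufLen, size_t *userbufPos)
    {
        size_t n = 0;
        while (n < linebufSize && *userbufPos < userbufLen) {
            char c = userbuf[(*userbufPos)++];
            if (c == '\r') {
                if (*userbufPos < userbufLen && userbuf[*userbufPos] == '\n')
                    (*userbufPos)++;    // \r\n arrives whole, so its \n is right here
                c = '\n';               // normalize as we copy
            }
            linebuf[n++] = c;
            if (c == '\n')
                break;                  // copied up to and including the EOL
        }
        return n;                       // may be a full linebufSize (LINE_LIMIT) chars
    }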
Attachment #444583 - Flags: review?(cdleary)
Attached patch patch 7 (deleted) — Splinter Review
This patch changes fillUserbuf() so it no longer stops when it reaches an EOL; this behaviour isn't needed any more. The change does preserve the "don't split \r\n" behaviour, however. This change makes fillUserbuf() faster, because it doesn't need to check for \n and \r every time it gets a char. It also means fillUserbuf() is called fewer times.

(Hmm, I wonder if it would be better now to replace the repeated fast_fgetc() calls with some variant of fgets()? I don't know if that would be faster.)

string-tagcloud: 277.0M --> 276.3M
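One way to keep the "don't split \r\n" guarantee without per-char checks is to look only at the char sitting at the buffer boundary. A sketch with invented names, using stdio directly rather than fast_fgetc():

    #include <stdio.h>

    // Sketch: fill the buffer to capacity without stopping at EOLs. The \r
    // check is only needed for the final char, since the buffer boundary is
    // the only place a \r\n pair could otherwise be split.
    static int
    fillBufNoEOLStop(char *buf, int size, FILE *file)
    {
        int i = 0;
        while (i < size - 1) {              // one slot still kept in reserve
            int c = getc(file);
            if (c == EOF)
                return i;                   // hit EOF: nothing left to split
            buf[i++] = (char) c;
        }
        if (i > 0 && buf[i - 1] == '\r') {  // boundary char only
            int c = getc(file);
            if (c == '\n')
                buf[i++] = '\n';            // reserved slot keeps the pair together
            else if (c != EOF)
                ungetc(c, file);
        }
        return i;
    }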
Attachment #444584 - Flags: review?(cdleary)
Attached patch patch 8 (deleted) — Splinter Review
This patch increases LINE_LIMIT from 256 to 1024, which means that fillUserbuf() can be called less often.

string-tagcloud: 276.3M --> 275.3M
Attachment #444585 - Flags: review?(cdleary)
Attached patch patch 9 (deleted) — Splinter Review
This patch:

- Simplifies the calculation of linepos. Instead of recording linelen and TSF_NLFLAG and using those two to calculate linepos the next time around, we instead calculate and record a single value, lineposNext.

- Removes TSF_NLFLAG, because it's no longer needed, and moves TSF_KEYWORD_IS_NAME over it. I also realised that the 0x10 slot was free in TokenStreamFlags, so the patch moves TSF_UNEXPECTED_EOF up as well.

string-tagcloud: 275.3M (unchanged)
Attachment #444586 - Flags: review?(cdleary)
Attached patch patch 10 (deleted) — Splinter Review
This patch moves the linebuf-refilling code out of getChar() into a new function, getCharFillLinebuf(). This makes getChar() small enough that the compiler can inline it in places.

string-tagcloud: 275.3M --> 271.3M
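Shape-wise the split looks something like this (the struct, its members, and the stubbed-out slow path are all invented for the sketch):

    #include <stdint.h>
    #include <stddef.h>

    // Sketch: getChar() keeps only the hot "next char is already in linebuf"
    // path, small enough for the compiler to inline at its call sites; the
    // rare refill work lives out of line in getCharFillLinebuf().
    struct StreamSketch {
        const char *linebuf;
        size_t linepos;
        size_t linelen;

        int32_t getChar() {
            if (linepos < linelen)                  // hot path
                return (unsigned char) linebuf[linepos++];
            return getCharFillLinebuf();            // cold path, rarely taken
        }

        int32_t getCharFillLinebuf();               // defined out of line below
    };

    // The real version refills linebuf (from userbuf, and userbuf in turn from
    // the file) and returns the first char of the new line; this stub just
    // reports end of input.
    int32_t
    StreamSketch::getCharFillLinebuf()
    {
        return -1;
    }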
Attachment #444587 - Flags: review?(cdleary)
A summary of the perf improvements:

- The instruction count for string-tagcloud started at 283.4M and ended at 271.3M; that's a 1.045x reduction overall.

- SS results are noisy, but look like improvements of roughly 3--4 ms for string-tagcloud and string-unpack-code, giving an SS improvement of about 1% overall.

- ParseMark results are noisy but good: 5--11% better on 32-bit Linux, 2--5% better on 64-bit Linux.

Plus the new code is *much* clearer :)

Apologies for the huge stack o' patches to review.
Comment on attachment 444578 [details] [diff] [review]
patch 1

Brendan, any feedback you have on any of these patches would be welcome. The first three are equivalent to the now-obsolete patch that you previously looked at.
Attachment #444578 - Flags: feedback?(brendan)
Comment on attachment 444578 [details] [diff] [review]
patch 1

Looks good -- no need for that null terminator, I suppose. The style guide disagrees with the comment style ( https://wiki.mozilla.org/JavaScript:SpiderMonkey:C_Coding_Style#Comments ).
Attachment #444578 - Flags: review?(cdleary) → review+
Attachment #444579 - Flags: review?(cdleary) → review+
Comment on attachment 444580 [details] [diff] [review]
patch 3

Destroying lexer flags makes me happy.
Attachment #444580 - Flags: review?(cdleary) → review+
Comment on attachment 444581 [details] [diff] [review]
patch 4

Nice hoisting here.

Idle thought: should we be using size_t/ssize_t instead of an int for vars that contain file-entity lengths? We're probably not seeing gigabyte payloads of minified JS quite yet, but future-proofing can't hurt.
Attachment #444581 - Flags: review?(cdleary) → review+
Attachment #444582 - Flags: review?(cdleary) → review+
Comment on attachment 444578 [details] [diff] [review]
patch 1

>+TokenStream::getLineFromFile(char *buf, int size, FILE *file)
>+{
>+    // We must be careful with ASCII newlines:
>+    //
>+    // - \n ends a line
>+    // - \r not followed by \n ends a line
>+    // - \r\n ends a line at the \n
>+    //
>+    // For the last case, we avoid splitting a \r\n pair in order to keep
>+    // things simpler for getChar(). To do this we keep one element in buf in
>+    // reserve; that way, if the nth char we get is \r, we can peek ahead one
>+    // more and get \n as the (n+1)th char if it follows.

Nits:

- Seems more readable to use prevailing major comment style (/*\n * ...\n */, indented appropriately).
- French spacing (which is really English spacing, but I'll keep calling it French spacing) is also not house style, but surely a nit.
- In any case, one space after ";" -- not two.

This is good for the shell, unused by the browser. Great to see the old CRFLAG state machine go away in this bug.

/be
Attachment #444578 - Flags: feedback?(brendan) → feedback+
(In reply to comment #28)
>
> - French spacing (which is really English spacing, but I'll keep calling it
> French spacing) is also not house style, but surely a nit.
> - In any case, one space after ";" -- not two.

Wow, is that in the style guide? I don't see it. I'll have to get past almost 20 years of muscle memory to not use French spacing -- I learnt it in my 8th grade typing lessons...
Comment on attachment 444585 [details] [diff] [review]
patch 8

Did you try any larger values to see what the marginal improvement was?

Another weird question: why do we hardcode a doubling of the linebuf allocation in TokenStream::init when there's a file? Doesn't seem to get used anywhere...
Attachment #444585 - Flags: review?(cdleary) → review+
(In reply to comment #30)
> (From update of attachment 444585 [details] [diff] [review])
> Did you try any larger values to see what the marginal improvement was?

Yeah, 2048 didn't help any more.

> Another weird question: why do we hardcode a doubling of the linebuf allocation
> in TokenStream::init when there's a file? Doesn't seem to get used anywhere...

That puzzled me at first, too. It does get used -- look closely at init(). When reading from a file, the first half of the allocated chunk serves as linebuf and the second half serves as userbuf. Pretty hacky; I think it's to avoid doing two mallocs but a comment would have been nice.
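For the record, the trick is roughly this (invented names, and the element type is simplified to plain char):

    #include <stdlib.h>

    // Sketch: when reading from a file, one allocation serves both buffers.
    static const size_t LINE_LIMIT_SKETCH = 256;    // the pre-patch value

    struct ScannerBuffers {
        char *linebuf;   // first half: the current, EOL-normalized line
        char *userbuf;   // second half: raw chars most recently read from the file
    };

    static bool
    initBuffers(ScannerBuffers *b)
    {
        char *base = (char *) malloc(2 * LINE_LIMIT_SKETCH);
        if (!base)
            return false;
        b->linebuf = base;
        b->userbuf = base + LINE_LIMIT_SKETCH;      // second half of the same chunk
        return true;                                // free(b->linebuf) releases both
    }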
(In reply to comment #31)
> I think it's to avoid doing
> two mallocs but a comment would have been nice.

Yes, that's it. Patched for bug 397210 -- sorry I didn't review harder and ask for a comment then.

/be
Attachment #444583 - Flags: review?(cdleary) → review+
Attachment #444584 - Flags: review?(cdleary) → review+
Attachment #444586 - Flags: review?(cdleary) → review+
Comment on attachment 444587 [details] [diff] [review]
patch 10

Awesome stuff, Nick! Feel free to r=me on future cleanup patches. I <3 clean code.
Attachment #444587 - Flags: review?(cdleary) → review+
(In reply to comment #27)
> (From update of attachment 444581 [details] [diff] [review])
> Idle thought: should we be using size_t/ssize_t instead of
> an int for vars that contain file-entity lengths? We're probably not seeing
> gigabyte payloads of minified JS quite yet, but future-proofing can't hurt.

Which length are you referring to -- the return value of fillUserbuf()? I just copied the 'int' from js_fgets(). Besides, fillUserbuf()'s return value will never exceed LINE_LIMIT, which is currently 1024. Or were you referring to something else?
(In reply to comment #34)

Don't mind me, you're right.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED