Closed Bug 91437 Opened 23 years ago Closed 23 years ago

Speedup of nsScanner::{ReadUntil,SkipWhitespace}

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla0.9.4

People

(Reporter: bratell, Assigned: bratell)

Details

(Keywords: perf, Whiteboard: [fix in hand] awaiting review)

Attachments

(5 files)

Scan faster, faster, faaaster... 23 years ago Daniel Bratell (deleted), patch		Details \| Diff \| Splinter Review
New patch, faster, smaller, better(?) 23 years ago Daniel Bratell (deleted), patch		Details \| Diff \| Splinter Review
Previous patch with a bug less 23 years ago Daniel Bratell (deleted), patch		Details \| Diff \| Splinter Review
Merged with tip - no functionality changes 23 years ago Daniel Bratell (deleted), patch		Details \| Diff \| Splinter Review
New updated patch after jst's review 23 years ago Daniel Bratell (deleted), patch		Details \| Diff \| Splinter Review

Daniel Bratell

Assignee

Description

•

23 years ago

You thought that nsScanner::ReadUntil couldn't be faster? Well, you were wrong. I hade an idea which I tested and it brought a slight (~10-20%) speedup when scanning for short (~5 chars) strings. The patch will work better the longer the string since it for 99% of the cases replaces the inner loop with a bit AND and a != 0 check. In the worst case there will be the bitwise AND, the != 0 check and the original loop. (I wonder if this magic should work itself into other places too) Patch coming when I'm back home with access to my mozilla tree.

Daniel Bratell

Assignee

Updated

•

23 years ago

Severity: normal → minor

Keywords: perf

Daniel Bratell

Assignee

Comment 1

•

23 years ago

In the scanner there was two functions were Mozilla spent time. That was ReadUntil and SkipWhitespace. They were taking 0.5% to 2% of the page load time for some pages I tested. With this patch that is down to 0.4% to 1% in my tests. The main reason for that is that SkipWhitespace don't call SetPosition when it hasn't skipped anything. That made SkipWhitespace about 70% faster. The rest is from ReadUntil which: 1) Has one iterator copy less 2) Does the search faster by first checking against a bitmask. That makes the inner loop faster. 3) Reuses the character from the Peek reducing iterator accesses by one. harishd, what do you think about this?

Assignee: harishd → bratell

Keywords: patch

Summary: Speedup of nsScanner::ReadUntil → Speedup of nsScanner::{ReadUntil,SkipWhitespace}

Daniel Bratell

Assignee

Comment 2

•

23 years ago

Attached patch Scan faster, faster, faaaster... (deleted) — Details — Splinter Review

Daniel Bratell

Assignee

Updated

•

23 years ago

Whiteboard: [fix in hand] awaiting review

Target Milestone: --- → mozilla0.9.3

harishd

Comment 3

•

23 years ago

Daniel: I'm currently working on another important issue...will review your patch once I'm done with it. Thanx for the patch.

Status: NEW → ASSIGNED

Daniel Bratell

Assignee

Comment 4

•

23 years ago

The review won't be in time for this milestone. Moving to 0.9.4.

Target Milestone: mozilla0.9.3 → mozilla0.9.4

Daniel Bratell

Assignee

Comment 5

•

23 years ago

I hope you haven't started reading the patch, because here comes one even faster. There is not much time left within ReadUntil with this one. The time in ReadUntil is now to 80% spent in SetPosition and AppendUnicode which I can't do much about. The last patch moves the filter calculation to an external class, nsReadEndCondition which is passed in to the ReadUntil functions. That allows the callers to precalculate the filter (by instantiating the nsReadEndCondition) and by storing the nsReadEndCondition in a static variable, there is no cost (~100 ns for the whole application runtime) for that calculation anymore. The total benefit of this change is about 5% of the ReadUntil time. And the total win for the patch is less than (or equal to) 1% of the load time, 10-15% of the tokenizer time. The patch also removes some unused code, and removes two ReadUntil methods that are no longer used so I guess the code size might be reduced. (~60 fewer code lines)

Daniel Bratell

Assignee

Comment 6

•

23 years ago

Attached patch New patch, faster, smaller, better(?) (deleted) — Details — Splinter Review

Daniel Bratell

Assignee

Comment 7

•

23 years ago

So the last patch contains a bug and I _will_ find it. Hang on.

Daniel Bratell

Assignee

Comment 8

•

23 years ago

Attached patch Previous patch with a bug less (deleted) — Details — Splinter Review

Daniel Bratell

Assignee

Comment 9

•

23 years ago

Found it. Not very good to initialize a static variable with an argument that varies for every call. Saw another bug too, that is removed by the patch: - const nsDependentString theTerminals(aTerminalChars, - sizeof(aTerminalChars)/sizeof(aTerminalChars[0]) - 1); aTerminalChars is no array, it's a pointer, so only the first char in aTerminalChars will be copied. Well, accept my patch and that bug is gone (if it ever has affected anything :-) ).

Daniel Bratell

Assignee

Comment 10

•

23 years ago

So harishd, you are welcome to look at this now. I promise to not detect any more defects. :-)

Andreas M. "Clarence" Schneider

Comment 11

•

23 years ago

Daniel, your patch will conflict with my patch for bug 57724. I want to move newline counting to scanner to avoid iterating over the string again in the tokenizer. Thus I need versions of ReadUntil() and SkipWhitespace() counting newlines (current counting in SkipWhitespace() is not good enough). I've written functions basing on your patch which count newlines: nsresult nsScanner::SkipWhitespace(PRInt32& aNewlineCount) { if (!mSlidingBuffer) { return kEOF; } nsresult result; PRUnichar ch; PRUnichar filter = PRUnichar(~0x2f); while (1) { if (!mCountRemaining) { result = Eof(); if (NS_FAILED(result)) return result; } ch = *mCurrentPosition; if (ch & filter) // this cannot be a whitespace char return NS_OK; switch (ch) { case kCR: ++aNewlineCount; ++mCurrentPosition; --mCountRemaining; if (!mCountRemaining) { result = Eof(); if (NS_FAILED(result)) return result; } ch = *mCurrentPosition; if (ch == kLF) break; continue; case kLF: ++aNewlineCount; case kSpace: case kTab: case kBackSpace: break; default: return NS_OK; } ++mCurrentPosition; --mCountRemaining; } } /** * Consume characters until you encounter one contained in given * input set. Counts newlines and does NOT stop reading on CR or LF, * even if specified in the terminal set. */ nsresult nsScanner::ReadUntil(nsReadingIterator<PRUnichar>& aStart, nsReadingIterator<PRUnichar>& aEnd, const nsReadEndCondition &aEndCondition, PRInt32& aNewlineCount, PRBool addTerminal) { if (!mSlidingBuffer) { return kEOF; } nsresult result = NS_OK; PRUnichar ch; PRUnichar filter = aEndCondition.mFilter & PRUnichar(~0x0f); const PRUnichar* setstart = aEndCondition.mChars; const PRUnichar* setcurrent; aStart = mCurrentPosition; while (1) { if (!mCountRemaining) { result = Eof(); if (NS_FAILED(result)) break; } ch = *mCurrentPosition; // Filter out completely wrong characters // Check if all bits are in the required area if (!(ch & filter)) { if (ch == kCR) { ++aNewlineCount; ++mCurrentPosition; --mCountRemaining; if (!mCountRemaining) { result = Eof(); if (NS_FAILED(result)) goto done; } ch = *mCurrentPosition; if (ch != kLF) goto check; } else if (ch == kLF) { ++aNewlineCount; } else { check: // Check a character that resemble the right ones for (setcurrent = setstart; *setcurrent; ++setcurrent) { if (*setcurrent == ch) { if (addTerminal) { ++mCurrentPosition; --mCountRemaining; } goto done; } } } } ++mCurrentPosition; --mCountRemaining; } done: aEnd = mCurrentPosition; return result; } Some Notes: - Calling SetPosition() is not neccessary. It will iterate over the chars again and we have just done that. - Everywhere in nsScanner there seem to be bugs with Eof(). If Eof() fills the buffer, we should not abort. I think the way I do it above would be the correct way to handle Eof(), but in practice it does not make a difference because mInputStream is always (?) null and thus Eof() returns kEOF. - We'll not win that much by filtering chars with my changes to nsHTMLTokenizer and nsHTMLTokens because ReadUntil() will typically be called with only 2 terminal chars ('<' and '&' for text tokens). But I think it's still an improvement. - The patch triggers 2 warnings on gcc: operator=(const nsReadEndCondition& aOther); // No assigning should return a value (e.g. void) and for mFilter(~PRUnichar(0)) a conflict between signed and unsigned is reported (you can say mFilter(PRUnichar(~0)) instead). - I'll post a new patch to bug 57724 including a modified version of your patch in case you want to test it (but note that it's still work in progress).

Daniel Bratell

Assignee

Comment 12

•

23 years ago

I think I believe I understand what you want to do. But, do you think we can keep it as seperate issues? If we merge them it will be even harder for harishd and company to review and understand what's going on. I would like to see this relatively small patch in and then work from that point. You could still build on it and your own patch might be a little smaller, or what do you think? (Comment to your changes meant for the other bug but I take it here: 0x0f: too magic, let the compiler do it for you '\n' & '\r'. ) I do the changes that removes the warnings though before the checkin (adding void before operator= and moving the ~ into the cast. Believe it or not, it is a win when searching for only two chars. The speed is highly dependent on the size of the inner loop and with the filter that becomes only about ~10 machine instructions. The only times we lose is when the char to be found is within the first 1-3 chars or when the filter becomes 0x0 so that nothing is filtered, but that doesn't happen.

Andreas M. "Clarence" Schneider

Comment 13

•

23 years ago

I do not want to merge the bugs, but you should know that SkipWhitespace() and ReadUntil() without line counting won't be called much after my changes. If you want to offer functions with line counting in your patch, I'll use them, if not I will include them in my patch. For now I had to include parts of your patch in my patch because I already changed it to use nsReadEndCondition and that's not in the tree yet. If you don't take my suggestions (avoiding calls to SetPosition() and issues with Eof()), I won't include them in bug 57724 either, but file another bug later. I just noticed that while writing my patch and wanted to tell you here. My comment saying there'd be only 2 terminal chars was wrong. Another 2 will be added inside ReadUntil() (yes, it should be ~kCR & ~kLF, not ~0x0f).

harishd

Comment 14

•

23 years ago

Daniel: Scanned thro' your patch ( haven't done a complete reivew yet! ) and I like your idea. Will update more on your patch soon.

harishd

Comment 15

•

23 years ago

daniel: why are numerals filtered out? Ex. <html> <head> <title>01234</title> </head> <body> test </body> </html> In the above test case 01234 will not be a part of mFilter.

Daniel Bratell

Assignee

Comment 16

•

23 years ago

They shouldn't be. Can you explain what happens? Or do you mean that they matches mFilter when searching for \r, >, & or something similar? That's just because the bit patterns happen to that way. '0' has ASCII value 00110000 and '<' has 00111100 so the '0' "match" the '<'. But the filter is only used to shortcut search so the loop after the filtering will see that it's not right. Now, digits are not that common in html pages so the extra (very small) cost for them is outweight by the shortcut whenever a letter with ascii value greater than 0x3f, i.e. A, B...a, b...

harishd

Comment 17

•

23 years ago

In ReadUntil() digits don't match the following condition: if(theChar & aEndCondition.mFilter) { ... } and therefore end up in the while loop looking for a match in the terminal set. >Now, digits are not that common in html pages so the extra (very small) cost >for them is outweight by the shortcut whenever a letter with ascii value >greater than 0x3f, i.e. A, B...a, b... Agreed. So for digits the performance would be no better than what we currently have...right?

Daniel Bratell

Assignee

Comment 18

•

23 years ago

Correct. There would be like 3-4 extra machine instructions* for a false match in the filter compared to no filter at all. Though, I removed an unnecessary iterator copy that was visible in the Quantify runs which may compensate for that so it _might_ be faster even for false matches. *) at least an AND, a CMP and a JEQ

harishd

Comment 19

•

23 years ago

Daniel: How much testing have you done and how confident are you that this will not break anything? Please test throughly...with that r=harishd

Daniel Bratell

Assignee

Comment 20

•

23 years ago

I've been running with it for some time without seeing anything strange but is hard to test everything. Do the parser have any regression tests? I will check the changes thoroughly before checking it in. jst, since you're listed as a super reviewer for the parser, can you take a look?

Daniel Bratell

Assignee

Comment 21

•

23 years ago

Attached patch Merged with tip - no functionality changes (deleted) — Details — Splinter Review

Johnny Stenback (:jst)

Comment 22

•

23 years ago

Style nit: switch(theChar) { case '\n': mNewlinesSkipped++; case ' ' : @@ -555,23 +574,26 @@ default: found=PR_FALSE; break; - } + } The above indentation change makes the end brace not line up with the line that starts the indentation block. + for (;current != mEndPosition; ++current, theChar = *current) { For clarity, move theChar = *current into the loop since theChar is not really part of the loop expression (this same pattern occires in a few places). - if (current == end) { - AppendUnicodeTo(origin, current, aString); - return Eof(); - } + return result; + //DoErrTest(aString); return result; -} +} Eh, many return's at the end of a method (nsScanner::ReadUntil)? With that fixed and a r= from Harish, sr=jst. I believe Harish has some regression tests for the parser.

Daniel Bratell

Assignee

Comment 23

•

23 years ago

Attached patch New updated patch after jst's review (deleted) — Details — Splinter Review

Daniel Bratell

Assignee

Comment 24

•

23 years ago

jst thanks for the look. I have incorporated your changes. I thought that I could trick the compiler into doing a better optimization the way I did it but it didn't work. When comparing the rough code: for (; end_condition; statement1, statement2) { if (check) continue; <more code> } to the better looking code with while (end_condition) { if (!check) { <more code> } statement1; statement2; } the generated machine code was identical. Not optimal but identical. I have now run the relevant parts of the smoke tests and haven't seen anything strange anywhere. Harish, can you take a new look and see if this is still (or even more) worthy of your r= ?

Keywords: review

harishd

Comment 25

•

23 years ago

if (!skipped) + return result; // from Peek instead why can't we have something like + PRUnichar theChar=0; + nsresult result=Peek(theChar); + NS_ENSURE_SUCCESS(result,result); Other than that I'm ok with your changes. I did run my regression test suite ( very rudimentary! ) and it passed :-). If you could run browser buster http://komodo.mozilla.org/buster/ I'd greatly appreciate it.

Daniel Bratell

Assignee

Comment 26

•

23 years ago

That would change the way things work. Right now, even if the Peek fails we still scan for whitespace. I only added the comment "// from Peek" because I thought it was difficult to see the origin of the return value. Do you still want that change? If so I guess we should just return NS_OK if there was nothing to skip.

harishd

Comment 27

•

23 years ago

If Peek() returns an error code then the while(condition) should also fail wouldn't it?

Daniel Bratell

Assignee

Comment 28

•

23 years ago

Yes, I too think so, so I added the NS_ENSURE_SUCCESS(result, result) and return NS_OK where result used to be returned. jst is that change ok with you so that I can check it in?

Johnny Stenback (:jst)

Comment 29

•

23 years ago

Ok with me.

Daniel Bratell

Assignee

Comment 30

•

23 years ago

Fix is in. Thanks for your patience.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Daniel Bratell

Assignee

Comment 31

•

23 years ago

Judging from the output in the tinderbox logs the improvement was 9.836%.

harishd

Comment 32

•

23 years ago

great job daniel. thanx.

Moied

Updated

•

23 years ago

QA Contact: bsharma → moied

Moied

Comment 33

•

23 years ago

daniel: could you please verify this.

Daniel Bratell

Assignee

Comment 34

•

23 years ago

Well, I can confirm that the code is still in the tree and that my measurements at that time showed a total speedup of ~1% and a speedup of ~10% in the tokenizer as measured by Quantify. In http://bugzilla.mozilla.org/show_bug.cgi?id=91437#c31 I noticed that a tinderbox showed a tokenizer improvement of the same size.

Status: RESOLVED → VERIFIED

You need to log in before you can comment on or make changes to this bug.