Closed Bug 91437 Opened 23 years ago Closed 23 years ago

Speedup of nsScanner::{ReadUntil,SkipWhitespace}

Categories

(Core :: DOM: HTML Parser, defect)

x86
Windows NT
defect
Not set
minor

Tracking

()

VERIFIED FIXED
mozilla0.9.4

People

(Reporter: bratell, Assigned: bratell)

Details

(Keywords: perf, Whiteboard: [fix in hand] awaiting review)

Attachments

(5 files)

You thought that nsScanner::ReadUntil couldn't be faster? Well, you were wrong. I hade an idea which I tested and it brought a slight (~10-20%) speedup when scanning for short (~5 chars) strings. The patch will work better the longer the string since it for 99% of the cases replaces the inner loop with a bit AND and a != 0 check. In the worst case there will be the bitwise AND, the != 0 check and the original loop. (I wonder if this magic should work itself into other places too) Patch coming when I'm back home with access to my mozilla tree.
Severity: normal → minor
Keywords: perf
In the scanner there was two functions were Mozilla spent time. That was ReadUntil and SkipWhitespace. They were taking 0.5% to 2% of the page load time for some pages I tested. With this patch that is down to 0.4% to 1% in my tests. The main reason for that is that SkipWhitespace don't call SetPosition when it hasn't skipped anything. That made SkipWhitespace about 70% faster. The rest is from ReadUntil which: 1) Has one iterator copy less 2) Does the search faster by first checking against a bitmask. That makes the inner loop faster. 3) Reuses the character from the Peek reducing iterator accesses by one. harishd, what do you think about this?
Assignee: harishd → bratell
Keywords: patch
Summary: Speedup of nsScanner::ReadUntil → Speedup of nsScanner::{ReadUntil,SkipWhitespace}
Whiteboard: [fix in hand] awaiting review
Target Milestone: --- → mozilla0.9.3
Daniel: I'm currently working on another important issue...will review your patch once I'm done with it. Thanx for the patch.
Status: NEW → ASSIGNED
The review won't be in time for this milestone. Moving to 0.9.4.
Target Milestone: mozilla0.9.3 → mozilla0.9.4
I hope you haven't started reading the patch, because here comes one even faster. There is not much time left within ReadUntil with this one. The time in ReadUntil is now to 80% spent in SetPosition and AppendUnicode which I can't do much about. The last patch moves the filter calculation to an external class, nsReadEndCondition which is passed in to the ReadUntil functions. That allows the callers to precalculate the filter (by instantiating the nsReadEndCondition) and by storing the nsReadEndCondition in a static variable, there is no cost (~100 ns for the whole application runtime) for that calculation anymore. The total benefit of this change is about 5% of the ReadUntil time. And the total win for the patch is less than (or equal to) 1% of the load time, 10-15% of the tokenizer time. The patch also removes some unused code, and removes two ReadUntil methods that are no longer used so I guess the code size might be reduced. (~60 fewer code lines)
So the last patch contains a bug and I _will_ find it. Hang on.
Attached patch Previous patch with a bug less (deleted) — Splinter Review
Found it. Not very good to initialize a static variable with an argument that varies for every call. Saw another bug too, that is removed by the patch: - const nsDependentString theTerminals(aTerminalChars, - sizeof(aTerminalChars)/sizeof(aTerminalChars[0]) - 1); aTerminalChars is no array, it's a pointer, so only the first char in aTerminalChars will be copied. Well, accept my patch and that bug is gone (if it ever has affected anything :-) ).
So harishd, you are welcome to look at this now. I promise to not detect any more defects. :-)
Daniel, your patch will conflict with my patch for bug 57724. I want to move newline counting to scanner to avoid iterating over the string again in the tokenizer. Thus I need versions of ReadUntil() and SkipWhitespace() counting newlines (current counting in SkipWhitespace() is not good enough). I've written functions basing on your patch which count newlines: nsresult nsScanner::SkipWhitespace(PRInt32& aNewlineCount) { if (!mSlidingBuffer) { return kEOF; } nsresult result; PRUnichar ch; PRUnichar filter = PRUnichar(~0x2f); while (1) { if (!mCountRemaining) { result = Eof(); if (NS_FAILED(result)) return result; } ch = *mCurrentPosition; if (ch & filter) // this cannot be a whitespace char return NS_OK; switch (ch) { case kCR: ++aNewlineCount; ++mCurrentPosition; --mCountRemaining; if (!mCountRemaining) { result = Eof(); if (NS_FAILED(result)) return result; } ch = *mCurrentPosition; if (ch == kLF) break; continue; case kLF: ++aNewlineCount; case kSpace: case kTab: case kBackSpace: break; default: return NS_OK; } ++mCurrentPosition; --mCountRemaining; } } /** * Consume characters until you encounter one contained in given * input set. Counts newlines and does NOT stop reading on CR or LF, * even if specified in the terminal set. */ nsresult nsScanner::ReadUntil(nsReadingIterator<PRUnichar>& aStart, nsReadingIterator<PRUnichar>& aEnd, const nsReadEndCondition &aEndCondition, PRInt32& aNewlineCount, PRBool addTerminal) { if (!mSlidingBuffer) { return kEOF; } nsresult result = NS_OK; PRUnichar ch; PRUnichar filter = aEndCondition.mFilter & PRUnichar(~0x0f); const PRUnichar* setstart = aEndCondition.mChars; const PRUnichar* setcurrent; aStart = mCurrentPosition; while (1) { if (!mCountRemaining) { result = Eof(); if (NS_FAILED(result)) break; } ch = *mCurrentPosition; // Filter out completely wrong characters // Check if all bits are in the required area if (!(ch & filter)) { if (ch == kCR) { ++aNewlineCount; ++mCurrentPosition; --mCountRemaining; if (!mCountRemaining) { result = Eof(); if (NS_FAILED(result)) goto done; } ch = *mCurrentPosition; if (ch != kLF) goto check; } else if (ch == kLF) { ++aNewlineCount; } else { check: // Check a character that resemble the right ones for (setcurrent = setstart; *setcurrent; ++setcurrent) { if (*setcurrent == ch) { if (addTerminal) { ++mCurrentPosition; --mCountRemaining; } goto done; } } } } ++mCurrentPosition; --mCountRemaining; } done: aEnd = mCurrentPosition; return result; } Some Notes: - Calling SetPosition() is not neccessary. It will iterate over the chars again and we have just done that. - Everywhere in nsScanner there seem to be bugs with Eof(). If Eof() fills the buffer, we should not abort. I think the way I do it above would be the correct way to handle Eof(), but in practice it does not make a difference because mInputStream is always (?) null and thus Eof() returns kEOF. - We'll not win that much by filtering chars with my changes to nsHTMLTokenizer and nsHTMLTokens because ReadUntil() will typically be called with only 2 terminal chars ('<' and '&' for text tokens). But I think it's still an improvement. - The patch triggers 2 warnings on gcc: operator=(const nsReadEndCondition& aOther); // No assigning should return a value (e.g. void) and for mFilter(~PRUnichar(0)) a conflict between signed and unsigned is reported (you can say mFilter(PRUnichar(~0)) instead). - I'll post a new patch to bug 57724 including a modified version of your patch in case you want to test it (but note that it's still work in progress).
I think I believe I understand what you want to do. But, do you think we can keep it as seperate issues? If we merge them it will be even harder for harishd and company to review and understand what's going on. I would like to see this relatively small patch in and then work from that point. You could still build on it and your own patch might be a little smaller, or what do you think? (Comment to your changes meant for the other bug but I take it here: 0x0f: too magic, let the compiler do it for you '\n' & '\r'. ) I do the changes that removes the warnings though before the checkin (adding void before operator= and moving the ~ into the cast. Believe it or not, it is a win when searching for only two chars. The speed is highly dependent on the size of the inner loop and with the filter that becomes only about ~10 machine instructions. The only times we lose is when the char to be found is within the first 1-3 chars or when the filter becomes 0x0 so that nothing is filtered, but that doesn't happen.
I do not want to merge the bugs, but you should know that SkipWhitespace() and ReadUntil() without line counting won't be called much after my changes. If you want to offer functions with line counting in your patch, I'll use them, if not I will include them in my patch. For now I had to include parts of your patch in my patch because I already changed it to use nsReadEndCondition and that's not in the tree yet. If you don't take my suggestions (avoiding calls to SetPosition() and issues with Eof()), I won't include them in bug 57724 either, but file another bug later. I just noticed that while writing my patch and wanted to tell you here. My comment saying there'd be only 2 terminal chars was wrong. Another 2 will be added inside ReadUntil() (yes, it should be ~kCR & ~kLF, not ~0x0f).
Daniel: Scanned thro' your patch ( haven't done a complete reivew yet! ) and I like your idea. Will update more on your patch soon.
daniel: why are numerals filtered out? Ex. <html> <head> <title>01234</title> </head> <body> test </body> </html> In the above test case 01234 will not be a part of mFilter.
They shouldn't be. Can you explain what happens? Or do you mean that they matches mFilter when searching for \r, >, & or something similar? That's just because the bit patterns happen to that way. '0' has ASCII value 00110000 and '<' has 00111100 so the '0' "match" the '<'. But the filter is only used to shortcut search so the loop after the filtering will see that it's not right. Now, digits are not that common in html pages so the extra (very small) cost for them is outweight by the shortcut whenever a letter with ascii value greater than 0x3f, i.e. A, B...a, b...
In ReadUntil() digits don't match the following condition: if(theChar & aEndCondition.mFilter) { ... } and therefore end up in the while loop looking for a match in the terminal set. >Now, digits are not that common in html pages so the extra (very small) cost >for them is outweight by the shortcut whenever a letter with ascii value >greater than 0x3f, i.e. A, B...a, b... Agreed. So for digits the performance would be no better than what we currently have...right?
Correct. There would be like 3-4 extra machine instructions* for a false match in the filter compared to no filter at all. Though, I removed an unnecessary iterator copy that was visible in the Quantify runs which may compensate for that so it _might_ be faster even for false matches. *) at least an AND, a CMP and a JEQ
Daniel: How much testing have you done and how confident are you that this will not break anything? Please test throughly...with that r=harishd
I've been running with it for some time without seeing anything strange but is hard to test everything. Do the parser have any regression tests? I will check the changes thoroughly before checking it in. jst, since you're listed as a super reviewer for the parser, can you take a look?
Style nit: switch(theChar) { case '\n': mNewlinesSkipped++; case ' ' : @@ -555,23 +574,26 @@ default: found=PR_FALSE; break; - } + } The above indentation change makes the end brace not line up with the line that starts the indentation block. + for (;current != mEndPosition; ++current, theChar = *current) { For clarity, move theChar = *current into the loop since theChar is not really part of the loop expression (this same pattern occires in a few places). - if (current == end) { - AppendUnicodeTo(origin, current, aString); - return Eof(); - } + return result; + //DoErrTest(aString); return result; -} +} Eh, many return's at the end of a method (nsScanner::ReadUntil)? With that fixed and a r= from Harish, sr=jst. I believe Harish has some regression tests for the parser.
jst thanks for the look. I have incorporated your changes. I thought that I could trick the compiler into doing a better optimization the way I did it but it didn't work. When comparing the rough code: for (; end_condition; statement1, statement2) { if (check) continue; <more code> } to the better looking code with while (end_condition) { if (!check) { <more code> } statement1; statement2; } the generated machine code was identical. Not optimal but identical. I have now run the relevant parts of the smoke tests and haven't seen anything strange anywhere. Harish, can you take a new look and see if this is still (or even more) worthy of your r= ?
Keywords: review
if (!skipped) + return result; // from Peek instead why can't we have something like + PRUnichar theChar=0; + nsresult result=Peek(theChar); + NS_ENSURE_SUCCESS(result,result); Other than that I'm ok with your changes. I did run my regression test suite ( very rudimentary! ) and it passed :-). If you could run browser buster http://komodo.mozilla.org/buster/ I'd greatly appreciate it.
That would change the way things work. Right now, even if the Peek fails we still scan for whitespace. I only added the comment "// from Peek" because I thought it was difficult to see the origin of the return value. Do you still want that change? If so I guess we should just return NS_OK if there was nothing to skip.
If Peek() returns an error code then the while(condition) should also fail wouldn't it?
Yes, I too think so, so I added the NS_ENSURE_SUCCESS(result, result) and return NS_OK where result used to be returned. jst is that change ok with you so that I can check it in?
Ok with me.
Fix is in. Thanks for your patience.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Judging from the output in the tinderbox logs the improvement was 9.836%.
great job daniel. thanx.
QA Contact: bsharma → moied
daniel: could you please verify this.
Well, I can confirm that the code is still in the tree and that my measurements at that time showed a total speedup of ~1% and a speedup of ~10% in the tokenizer as measured by Quantify. In http://bugzilla.mozilla.org/show_bug.cgi?id=91437#c31 I noticed that a tinderbox showed a tokenizer improvement of the same size.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: