Open
Bug 582858
Opened 14 years ago
Updated 1 year ago
Provide an nsTextFragment API which makes it possible for callers to skip ASCII scanning code
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
Status: NEW
People
(Reporter: ehsan.akhgari, Unassigned)
References
Details
(Keywords: perf, Whiteboard: [post-2.0])
Attachments
(4 files, 4 obsolete files; all deleted: one application/x-bzip2 file and three patches)
We had a discussion with roc and bz over on irc about how much perf hit/win we'll get from storing all of our text node content as UTF-16 (aka m2b) in text fragments. I took it upon myself to write the patch and experiment with it. If all goes well, we can probably land that patch on trunk and see what our talos numbers would tell us. Here is roc's idea on how to test such a patch before landing on trunk:
roc: for a worst-case test you probably want a very large text file, many <p> elements with say 1000 characters per element, loaded locally from a file
roc: compare to trunk with text-rendering: optimizeLegibility added, and without that added
I'll write that patch, and post the results here.
Reporter
Updated•14 years ago
OS: Mac OS X → All
Hardware: x86 → All
Version: 1.9.1 Branch → Trunk
Reporter
Updated•14 years ago
Reporter
Comment 1•14 years ago
So, I have a patch (which I'll attach shortly), and a test case which is basically the text of the top 9 Project Gutenberg books concatenated, with a paragraph break inserted every 1000 bytes.
Here are the results:
http://mzl.la/9nIoSI
I'm not quite sure how to interpret these results...
Reporter
Comment 2•14 years ago
This patch works (in the sense that Firefox starts and things seem to be fine judging by some smoke tests), but before I spend any more time on this, I want to make sure that the time I spend is worth it.
Reporter
Comment 3•14 years ago
(Specifically, I have not yet removed any 8bit fast path code in graphics, although they're not kicking in because I'm not setting TEXT_IS_8BIT any more.)
Reporter
Comment 4•14 years ago
Reporter
Comment 5•14 years ago
Fix some memory allocation issues in nsTextFragment.
Attachment #461619 - Attachment is obsolete: true
Reporter
Comment 6•14 years ago
Does anybody have any ideas? Do people think it's worth it for me to write the second part of the patch (to remove the TEXT_IS_8BIT stuff) and push them to see if catlee's scripts pick up any regression/improvement?
Comment 7•14 years ago
I'm not sure why you want to do this. Are you expecting some kind of performance win from moving all DOM nodes to UTF16?
Comment 8•14 years ago
There is one, actually: textnodes would no longer have to walk over their text trying to decide how to store it, so textnode creation and modification could be faster.
Reporter
Comment 9•14 years ago
(In reply to comment #8)
> There is one, actually: textnodes would no longer have to walk over their text
> trying to decide how to store it, so textnode creation and modification could
> be faster.
Yes, exactly. FWIW, my tests show that this *actually* makes us a lot better at handling things such as editing huge textareas (after bug 240933).
Comment 10•14 years ago
What are the cases that get a lot faster?
It seems to me that all we have to do is, when text is inserted/appended, scan the new text for characters >= 256.
Comment 11•14 years ago
Yes, indeed. For the editor, the problem is not having a working insert method on text fragments, which I think we should fix.
For Dromaeo's stupid createTextNode(huge-string) test, the scan is just ... expensive. It totally dwarfs all the other costs involved.
Comment 12•14 years ago
OK. I think it's totally OK for nsDocument::CreateTextNode (and any other DOM text methods that are a performance issue) to call a SetText method that doesn't scan, just assumes UTF16. We could even do that for editor too, if we wanted. I'm guessing that almost all the benefits of 8-bit text can be captured as long as we can create 8-bit text from the parser.
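The split roc proposes can be sketched as two setters on the same fragment class: one that scans and picks 8-bit vs. 16-bit storage, and a fast path that skips the scan and always stores UTF-16. This is a minimal illustrative sketch only; the class and method names here are assumptions, not the real nsTextFragment interface.

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not Gecko code. m1b/m2b mirror the 8-bit/16-bit
// storage names used in this bug.
class TextFragment {
 public:
  // Scanning path: store as 8-bit if every code unit fits in one byte.
  void SetTo(const uint16_t* aBuf, size_t aLen) {
    bool is8bit = true;
    for (size_t i = 0; i < aLen; ++i) {
      if (aBuf[i] >= 256) { is8bit = false; break; }
    }
    if (is8bit) {
      m1b.clear();
      for (size_t i = 0; i < aLen; ++i)
        m1b.push_back(static_cast<uint8_t>(aBuf[i]));  // narrowing copy
      m2b.clear();
    } else {
      m2b.assign(aBuf, aBuf + aLen);
      m1b.clear();
    }
  }

  // Proposed fast path for createTextNode-style callers: no scan,
  // always store UTF-16.
  void SetToUTF16(const uint16_t* aBuf, size_t aLen) {
    m2b.assign(aBuf, aBuf + aLen);
    m1b.clear();
  }

  bool Is2b() const { return !m2b.empty(); }

 private:
  std::vector<uint8_t> m1b;   // 8-bit storage
  std::vector<uint16_t> m2b;  // 16-bit storage
};
```

The trade-off discussed later in this bug applies here: the no-scan path saves the O(n) scan but doubles the memory copied for text that would have fit in 8 bits.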
Comment 13•14 years ago
I'd be on board with that. Between that and a better insert API, do we really need this?
Comment 14•14 years ago
Well, and I guess maybe converting editor text nodes to UTF-16.
Comment 15•14 years ago
One thing we should consider is using vectorized instructions for the scan. That could be significantly faster.
Justin: is that something you'd be interested in looking into?
Comment 16•14 years ago
(In reply to comment #15)
> One thing we should consider is using vectorized instructions for the scan.
> That could be significantly faster.
>
> Justin: Is that something you'd be interested to look into?
Sure, but I'm not sure if I'll be able to get to it by the FF4 feature freeze. I can try if we think it might really help.
Comment 17•14 years ago
I don't think improving performance counts as a feature, so I don't think we need to make beta4.
Comment 18•14 years ago
Is it just this loop?
PRBool need2 = PR_FALSE;
while (ucp < uend) {
  PRUnichar ch = *ucp++;
  if (ch >= 256) {
    need2 = PR_TRUE;
    break;
  }
}
Should be pretty easy to vectorize.
Comment 19•14 years ago
That's the loop, yes.
I think we should strongly consider having at least createTextNode just skip all that jazz and use UTF-16 in any case (separate bug?). There's no way we can get even a vectorized scan to the speed of no scan at all.
Comment 20•14 years ago
jst, you said you had an initial attempt at vectorizing this? I'm curious to take a look, since I don't see an unsigned compare operation in SSE/SSE2. I think we could
* shift each 16-bit word right 8 bits
* do a signed 16-bit int compare to 0
* pull out the high-order bit of each byte with pmovmskb
* compare the result to 0
but maybe there's a more clever solution.
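The four steps above can be sketched with SSE2 intrinsics roughly as follows. This is an illustrative sketch, not the actual patch; `HasHighUnits` is an assumed name, and a scalar tail handles lengths that aren't a multiple of 8.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>
#include <cstddef>

// Returns true if any 16-bit unit in aBuf is >= 256, i.e. the text
// cannot be stored as 8-bit.
static bool HasHighUnits(const uint16_t* aBuf, size_t aLen) {
  size_t i = 0;
  const __m128i zero = _mm_setzero_si128();
  for (; i + 8 <= aLen; i += 8) {
    __m128i v = _mm_loadu_si128(
        reinterpret_cast<const __m128i*>(aBuf + i));
    // Shift each 16-bit word right by 8: nonzero iff the unit was >= 256.
    __m128i hi = _mm_srli_epi16(v, 8);
    // Signed 16-bit compare to 0 is safe: shifted values are in [0, 255].
    __m128i gt = _mm_cmpgt_epi16(hi, zero);
    // pmovmskb pulls the high bit of each byte into a 16-bit mask;
    // any set bit means some unit was >= 256.
    if (_mm_movemask_epi8(gt) != 0) {
      return true;
    }
  }
  // Scalar tail for the remaining 0-7 units.
  for (; i < aLen; ++i) {
    if (aBuf[i] >= 256) {
      return true;
    }
  }
  return false;
}
```

(SSE4.1's `ptest` or a `pcmpeqw`-against-zero variant would also work; the version above sticks to the SSE2 operations named in the comment.)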
Comment 21•14 years ago
Here's an experiment I did quite some time ago but never got around to testing properly. Justin, please have a look at this and use it or dismiss it as you see fit. IIRC this was a performance win, but I forget by how much, and while it seemed to do the right thing, I didn't actually verify that.
Comment 22•14 years ago
Trunk: 160ms
With patch: 180ms (first run), 120ms (later runs)
Looking into why my patch causes a slowdown on the initial run...
Comment 23•14 years ago
That last testcase file had some gunk in it.
Attachment #464197 - Attachment is obsolete: true
Comment 24•14 years ago
Oh, I think I was building my patched code on top of a different revision than I was using as the unpatched version. New numbers in a moment.
Comment 25•14 years ago
All right. With everything based atop b7836c3a63db and testcase v1.1, I get:
Trunk: 220ms (first run), 155ms (later runs)
Patched: 180ms (first run), 122ms (later runs)
This is surprisingly only 30% faster, but the assembly it's generating looks OK to me.
I'll put the patch up soon; I just need to split it into a variety of pieces.
Comment 26•14 years ago
Not sure who to request for a review here. I'd tag bz, but that's just out of habit. :)
Comment 27•14 years ago
This patch is on top of the patches in bug 585818 and bug 585708.
Updated•14 years ago
Attachment #464242 - Flags: review?(vladimir)
Comment 28•14 years ago
That patch has nothing to do with this bug. Can we please file a separate bug instead of hijacking this one?
Comment 29•14 years ago
Split into bug 585978 and bug 585980. Sorry for hijacking!
Updated•14 years ago
Attachment #464198 - Attachment is obsolete: true
Updated•14 years ago
Attachment #464242 - Attachment is obsolete: true
Attachment #464242 - Flags: review?(vladimir)
Updated•14 years ago
Reporter
Comment 30•14 years ago
Potential consumers of this API would be the editor and nsDocument::CreateTextNode.
Summary: Experiment with storing all DOM text node content as UTF-16 → Provide an nsTextFragment API which makes it possible for callers to skip ASCII scanning code
Whiteboard: [post-2.0]
Comment 31•14 years ago
Would a consumer of this API pass in a UTF-16 string, or a string known to be ASCII? Or would the API support both cases?
Reporter
Comment 32•14 years ago
(In reply to comment #31)
> Would a consumer of this API pass in a UTF-16 string, or a string known to be
> ASCII? Or would the API support both cases?
A UTF-16 string. We don't need to scan strings which are known to be ASCII after all. :-)
Comment 33•13 years ago
With ehsan's help, I investigated this issue a bit, and discovered that using 16-bit strings unconditionally for createTextNode does not change performance in general, and decreases performance a lot for long ASCII strings.
The first reason is that copying a UTF-16 buffer to m2b takes twice as long as copying an ASCII buffer to m1b. What we lose here is more than what we gain by skipping the Is8Bit check.
The second, and main, reason is that we have to traverse the string anyway in UpdateBidiFlag if the string is UTF-16.
For the record, here is where most of the time is spent inside createTextNode for ASCII strings of various lengths:
100,000 chars: Is8Bit: 31 ms, copy_string: 100 ms
1,000,000 chars: Is8Bit: 931 ms, copy_string: 1050 ms
10,000,000 chars: Is8Bit: 4858 ms, copy_string: 12100 ms
And when forcing UTF-16 (i.e. calling SetTo with aForceUTF16 set to true in the attached patch):
100,000 chars: memcpy: 187 ms, bidi update: 550 ms
1,000,000 chars: memcpy: 2511 ms, bidi update: 5500 ms
10,000,000 chars: memcpy: 26940 ms, bidi update: 53000 ms
Comment 34•13 years ago
Makes sense that the memory churning is costing us more than the extra CPU cycles are.
Are we currently doing bidi scanning separately from is8bit scanning? Could we merge the two and only do bidi scanning from the point where 8bit scanning stopped? If yes could someone file a separate bug?
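The merged scan suggested above can be sketched as a single pass where the bidi check only begins at the point where the 8-bit scan stopped, since code units below U+0100 are never RTL. This is illustrative code, not Gecko's; the names are assumptions, and the rough range check stands in for the Unicode bidi-category lookup the real implementation would use.

```cpp
#include <cstdint>
#include <cstddef>

struct ScanResult {
  bool is8bit;
  bool hasBidi;
};

// Crude placeholder for an RTL-character test (Hebrew/Arabic and related
// blocks, plus Hebrew/Arabic presentation forms). The real code consults
// Unicode bidi categories.
static bool IsBidiUnit(uint16_t u) {
  return (u >= 0x0590 && u <= 0x08FF) || (u >= 0xFB1D && u <= 0xFDFF);
}

static ScanResult ScanText(const uint16_t* aBuf, size_t aLen) {
  size_t i = 0;
  // Phase 1: 8-bit scan; stops at the first unit >= 256.
  for (; i < aLen; ++i) {
    if (aBuf[i] >= 256) break;
  }
  if (i == aLen) {
    return {true, false};  // all 8-bit, so bidi is impossible
  }
  // Phase 2: bidi scan resumes exactly where the 8-bit scan stopped,
  // instead of traversing the whole string again in UpdateBidiFlag.
  bool bidi = false;
  for (; i < aLen; ++i) {
    if (IsBidiUnit(aBuf[i])) { bidi = true; break; }
  }
  return {false, bidi};
}
```

For pure-ASCII input this does exactly one pass and no bidi work at all, which is the case the measurements in comment 33 show being hurt by the unconditional UTF-16 approach.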
Comment 35•13 years ago
Good idea. I've opened bug 682592
Reporter
Updated•5 years ago
Assignee: ehsan → nobody
Status: ASSIGNED → NEW
Updated•2 years ago
Severity: normal → S3