Open Bug 582858 — opened 14 years ago, updated 1 year ago

Provide an nsTextFragment API which makes it possible for callers to skip ASCII scanning code

Categories: Core :: DOM: Core & HTML (defect)
People: Reporter: ehsan.akhgari; Unassigned
Keywords: perf
Whiteboard: [post-2.0]
Attachments: 4 files, 4 obsolete

We had a discussion with roc and bz on IRC about how much of a perf hit/win we'd get from storing all of our text node content as UTF-16 (aka m2b) in text fragments. I took it upon myself to write the patch and experiment with it. If all goes well, we can probably land that patch on trunk and see what our Talos numbers tell us.

Here is roc's idea on how to test such a patch before landing on trunk:

<roc> for a worst-case test you probably want a very large text file, many <p> elements with say 1000 characters per element, loaded locally from a file
<roc> compare to trunk with text-rendering: optimizeLegibility added, and without that added

I'll write that patch, and post the results here.
OS: Mac OS X → All
Hardware: x86 → All
Version: 1.9.1 Branch → Trunk
So, I have a patch (which I'll attach shortly) and a test case, which is basically the text of the top 9 Project Gutenberg books concatenated, with a paragraph break inserted every 1000 bytes. Here are the results: http://mzl.la/9nIoSI

I'm not quite sure how to interpret these results...
Attached patch Part 1: Remove the 1b text fragment APIs (obsolete) (deleted) — Splinter Review
This patch works (in the sense that Firefox starts and things seem to be fine judging by some smoke tests), but before I spend any more time on this, I want to make sure that the time I spend is worth it.
(Specifically, I have not yet removed any of the 8-bit fast path code in graphics, although it no longer kicks in because I'm not setting TEXT_IS_8BIT any more.)
Attached file Test case (bzipped) (deleted) —
Fix some memory allocation issues in nsTextFragment.
Attachment #461619 - Attachment is obsolete: true
Does anybody have any ideas? Do people think it's worthwhile for me to write the second part of the patch (to remove the TEXT_IS_8BIT stuff) and push them to see if catlee's scripts pick up any regression/improvement?
I'm not sure why you want to do this. Are you expecting some kind of performance win from moving all DOM nodes to UTF-16?
There is one, actually: textnodes would no longer have to walk over their text trying to decide how to store it, so textnode creation and modification could be faster.
(In reply to comment #8)
> There is one, actually: textnodes would no longer have to walk over their text
> trying to decide how to store it, so textnode creation and modification could
> be faster.

Yes, exactly. FWIW, my tests show that this *actually* makes us a lot better at handling things such as editing huge textareas (after bug 240933).
What are the cases that get a lot faster? It seems to me that all we have to do is, when text is inserted/appended, scan the new text for characters >= 256.
Yes, indeed. For the editor, the problem is the lack of a working insert method on text fragments, which I think we should fix. For Dromaeo's stupid createTextNode(huge-string) test, the scan is just ... expensive. It totally dwarfs all the other costs involved.
OK. I think it's totally OK for nsDocument::CreateTextNode (and any other DOM text methods that are a performance issue) to call a SetText method that doesn't scan, just assumes UTF16. We could even do that for editor too, if we wanted. I'm guessing that almost all the benefits of 8-bit text can be captured as long as we can create 8-bit text from the parser.
I'd be on board with that. Between that and a better insert API, do we really need this?
Well, that and I guess maybe converting editor textnodes to UTF-16.
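To make the idea from comment 12 concrete, here is a minimal, hypothetical sketch of such a non-scanning setter. The struct and member names are illustrative guesses modeled on the m1b/m2b storage mentioned in comment 0, not the real nsTextFragment internals:

    #include <cstdint>
    #include <cstring>

    struct TextFragment {
      char16_t* m2b = nullptr;   // 2-byte (UTF-16) storage, aka "m2b"
      uint32_t mLength = 0;
      bool mIs2b = false;

      // Proposed fast path: the caller vouches that UTF-16 storage is fine,
      // so we copy the buffer without scanning for code units >= 256.
      void SetToUnscannedUTF16(const char16_t* aBuffer, uint32_t aLength) {
        delete[] m2b;
        m2b = new char16_t[aLength];
        std::memcpy(m2b, aBuffer, aLength * sizeof(char16_t));
        mLength = aLength;
        mIs2b = true;
      }
    };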
One thing we should consider is using vectorized instructions for the scan. That could be significantly faster. Justin: Is that something you'd be interested in looking into?
(In reply to comment #15)
> One thing we should consider is using vectorized instructions for the scan.
> That could be significantly faster.
>
> Justin: Is that something you'd be interested in looking into?

Sure, but I'm not sure I'll be able to get to it by the FF4 feature freeze. I can try if we think it might really help.
I don't think improving performance counts as a feature, so I don't think we need to make beta4.
Is it just this loop?

    PRBool need2 = PR_FALSE;
    while (ucp < uend) {
      PRUnichar ch = *ucp++;
      if (ch >= 256) {
        need2 = PR_TRUE;
        break;
      }
    }

Should be pretty easy to vectorize.
That's the loop, yes. I think we should strongly consider having at least createTextNode just skip all that jazz and use UTF-16 in any case (separate bug?). There's no way we can get even a vectorized scan to the speed of no scan at all.
jst, you said you had an initial attempt at vectorizing this? I'm curious to take a look, since I don't see an unsigned compare operation in SSE/SSE2.

I think we could:

* shift each 16-bit word right 8 bits
* do a signed 16-bit int compare to 0
* pull out the high-order bit of each byte with pmovmskb
* compare the result to 0

but maybe there's a more clever solution.
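For illustration, here is a self-contained, untested sketch of that scan using SSE2 intrinsics. This is not jst's experiment; it also sidesteps the missing unsigned compare by testing whether the high byte of each 16-bit word is zero with an equality compare, which carries the same information as the signed-compare sequence above:

    #include <emmintrin.h>  // SSE2
    #include <cstddef>

    // Returns true if every UTF-16 code unit in str[0..len) is < 256.
    static bool Is8BitSSE2(const char16_t* str, size_t len) {
      size_t i = 0;
      // Process 8 code units (16 bytes) per iteration.
      for (; i + 8 <= len; i += 8) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(str + i));
        // Shift each 16-bit word right by 8 bits: the lane is nonzero
        // exactly when the code unit is >= 256.
        __m128i hi = _mm_srli_epi16(v, 8);
        // Lanes whose high byte was zero compare equal to zero (all ones).
        __m128i eq = _mm_cmpeq_epi16(hi, _mm_setzero_si128());
        // pmovmskb extracts one bit per byte; 0xFFFF means all lanes passed.
        if (_mm_movemask_epi8(eq) != 0xFFFF) {
          return false;  // found a code unit >= 256
        }
      }
      // Scalar tail for the remaining < 8 code units.
      for (; i < len; ++i) {
        if (str[i] >= 256) {
          return false;
        }
      }
      return true;
    }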
Here's an experiment I did quite some time ago but never got around to testing properly etc. Justin, please have a look at this and use it or dismiss it as you see fit. IIRC this was a performance win, but I forget by how much, and while it seemed to do the right thing, I didn't actually verify that.
Attached file Testcase (obsolete) (deleted) —
Trunk: 160ms
With patch: 180ms (first run), 120ms (later runs)

Looking into why my patch causes a slowdown on the initial run...
Attached file Testcase v1.1 (obsolete) (deleted) —
That last testcase file had some gunk in it.
Attachment #464197 - Attachment is obsolete: true
Oh, I think I was building my patched code on top of a different revision than I was using as the unpatched version. New numbers in a moment.
All right. With everything based atop b7836c3a63db and testcase v1.1, I get:

Trunk: 220ms (first run), 155ms (later runs)
Patched: 180ms (first run), 122ms (later runs)

This is surprisingly only 30% faster, but the assembly it's generating looks OK to me. I'll put the patch up soon; I just need to filter it into a variety of pieces.
Attached patch Patch v1 (obsolete) (deleted) — Splinter Review
Not sure who to request for a review here. I'd tag bz, but that's just out of habit. :)
This patch is on top of the patches in bug 585818 and bug 585708.
Attachment #464242 - Flags: review?(vladimir)
That patch has nothing to do with this bug. Can we please file a separate bug instead of hijacking this one?
Split into bug 585978 and bug 585980. Sorry for hijacking!
Attachment #464198 - Attachment is obsolete: true
Attachment #464242 - Attachment is obsolete: true
Attachment #464242 - Flags: review?(vladimir)
No longer depends on: 585708, 585818
Potential consumers of this API would be the editor and nsDocument::CreateTextNode.
Summary: Experiment with storing all DOM text node content as UTF-16 → Provide an nsTextFragment API which makes it possible for callers to skip ASCII scanning code
Whiteboard: [post-2.0]
Would a consumer of this API pass in a UTF-16 string, or a string known to be ASCII? Or would the API support both cases?
(In reply to comment #31)
> Would a consumer of this API pass in a UTF-16 string, or a string known to be
> ASCII? Or would the API support both cases?

A UTF-16 string. We don't need to scan strings which are known to be ASCII, after all. :-)
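As a hypothetical illustration of the calling convention, here is how a consumer such as nsDocument::CreateTextNode could hand a UTF-16 buffer straight to the fragment. This reuses the illustrative TextFragment/SetToUnscannedUTF16 sketch from earlier in this bug, which is not a real Gecko API:

    #include <string>

    // createTextNode already holds UTF-16 data, so it can skip the scan
    // entirely and store the buffer as 2-byte text.
    void CreateTextNodeFast(TextFragment& frag, const std::u16string& data) {
      frag.SetToUnscannedUTF16(data.data(),
                               static_cast<uint32_t>(data.size()));
    }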
Attached patch patch used for the tests (deleted) — Splinter Review
With ehsan's help, I investigated this issue a bit, and discovered that using 16-bit strings unconditionally for createTextNode does not change performance in general, and decreases performance a lot for long ASCII strings.

The first reason is that copying a UTF-16 buffer to m2b takes twice as long as copying an ASCII buffer to m1b. What we lose there is more than what we gain by skipping the Is8Bit check. The second, and main, reason is that we have to traverse the string anyway in updateBidiFlag if the string is UTF-16.

For the record, here is where most of the time is spent inside createTextNode:

For a 100,000-character ASCII string:
  Is8Bit call: 31 ms
  copy_string: 100 ms

For a 1,000,000-character string:
  Is8Bit: 931 ms
  copy_string: 1050 ms

For a 10,000,000-character string:
  Is8Bit: 4858 ms
  copy_string: 12100 ms

And when forcing UTF-16 (i.e. calling SetTo with aForceUTF16 set to true in the attached patch):

100,000-character string:
  memcpy: 187 ms
  bidi update: 550 ms

1,000,000-character string:
  memcpy: 2511 ms
  bidi update: 5500 ms

10,000,000-character string:
  memcpy: 26940 ms
  bidi update: 53000 ms
Makes sense that the memory churn is costing us more than the extra CPU cycles. Are we currently doing bidi scanning separately from the is8bit scanning? Could we merge the two and only do bidi scanning from the point where the 8-bit scanning stopped? If yes, could someone file a separate bug?
Good idea. I've opened bug 682592
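For reference, a hypothetical sketch of the merged scan from comment 35 (the real work is in bug 682592; the RTL ranges below are a simplified approximation, not Gecko's actual bidi classification). It relies on the fact that no character below U+0100 is right-to-left, so the bidi check only needs to start where the 8-bit scan stopped:

    #include <cstddef>

    struct ScanResult {
      bool is8Bit;   // all code units < 256
      bool hasRTL;   // saw a (probable) right-to-left character
    };

    static ScanResult ScanText(const char16_t* str, size_t len) {
      ScanResult r = { true, false };
      size_t i = 0;
      // Phase 1: find the first code unit >= 256.  Text that is entirely
      // Latin-1 cannot contain RTL characters, so we can stop there.
      for (; i < len; ++i) {
        if (str[i] >= 256) {
          r.is8Bit = false;
          break;
        }
      }
      // Phase 2: continue from the same position, looking for RTL ranges
      // (simplified: Hebrew/Arabic blocks and their presentation forms).
      for (; i < len; ++i) {
        char16_t ch = str[i];
        if ((ch >= 0x0590 && ch <= 0x08FF) ||
            (ch >= 0xFB1D && ch <= 0xFDFF) ||
            (ch >= 0xFE70 && ch <= 0xFEFC)) {
          r.hasRTL = true;
          break;
        }
      }
      return r;
    }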
Assignee: ehsan → nobody
Status: ASSIGNED → NEW
Severity: normal → S3