512272 - Store CSS in UTF16 to facilitate zero-copy IO

Reporter

Description

•

15 years ago

zwol came up with the idea of storing our css in the target utf format. This would allow us to a) save on memory copying b) utf conversion

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 1

•

15 years ago

Is that really such a good idea? Vlad has been doing a lot of profiling and IO is by far the biggest bottleneck at least in startup time. Furthermore, a bunch of us have been doing a lot of work in Bug 506430 to make the UTF 8->16 conversion a lot cheaper in terms of runtime.

Blocks: 506431

Brendan Eich [:brendan]

Comment 2

•

15 years ago

I do not believe i/o is anywhere near dominated by UTF-16 high bytes -- profiling data proving this, or it didn't happen :-). Plus, as this bug's subject says it is facilitating zero-copy (mmap) i/o by changing representation. How much that might win, I don't know, but transcoding won't win, it will be a net loss in cycles retired. Speeding up conversion still takes time converting. The web standards really want "UCS-2" nowadays stored as UTF-16. Although we might get the transcoding time down to noise for common cases, I'm skeptical this is a good use of coding time, never mind code footprint or cycles. /be

Dietrich Ayala (:dietrich)

Updated

•

15 years ago

Whiteboard: [ts]

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 3

•

15 years ago

We're about to land some stuff for optimizing utf8 to utf16 conversion (in bug 560430), at least speeding up the ASCII case. But there are a lot of wins here, one of the main ones being that we could hand a mmap'd buffer straight to our parsers (js, CSS, others), so at the very least there would be a win in doing many fewer temporary buffer allocations, fewer memory copies, etc.

Zack Weinberg (:zwol)

Comment 4

•

15 years ago

Well, as one of the CSS parser guys, what I really want (since almost everything in CSS is plain ASCII) is UTF-8. But we'd have to convert most of layout/style to use UTF-8 internally for that to be a win, I think. In the mean time, killing two memory copies should more than compensate for having to store 2x as much data on disk, especially when reading from mmapped jar files.

Alfred Kayser

Comment 5

•

15 years ago

Using UTF16 for css files stored in jar files, will still mean more I/O (even if mmapped, it is still I/O for the operating system), twice the number of bytes, and more memory allocation (even if we can save the read buffer. The real improvement would be to allow the CSS parser to directly read the UTF8 (or really ASCII) encoded CSS files stored in jar's, instead of first converting UTF8 to UTF16 and then parsing... Parsing UTF8 based CSS files directly from mmap'ed jarfiles would the fatest solution. It would only require the CSS parser to also handle UTF8 based streams/strings.

(dormant account)

Reporter

Comment 6

•

15 years ago

(In reply to comment #5) > Using UTF16 for css files stored in jar files, will still mean more I/O (even > if mmapped, it is still I/O for the operating system), twice the number of > bytes, and more memory allocation (even if we can save the read buffer. > The real improvement would be to allow the CSS parser to directly read the UTF8 > (or really ASCII) encoded CSS files stored in jar's, instead of first > converting UTF8 to UTF16 and then parsing... Parsing UTF8 based CSS files > directly from mmap'ed jarfiles would the fatest solution. > Indeed the size will double. From my measurements doubling the size of css files will not make a big difference because the extra io is continuous. We will not land this if it regresses performance in any way. Avoiding 2 memory copies, lower memory usage and no utf16 conversion overhead seems like it would be a win.

Benedict Hsieh [:benedict]

Assignee

Comment 7

•

15 years ago

Attached patch very rough draft (obsolete) (deleted) — Details — Splinter Review

this patch kills one memcpy by storing the CSS on disk in UTF-16, and awkwardly short-circuiting in the converterStream. areas I want to iterate on: getting the CSSParser to read straight from a provided buffer (killing another memcpy), and finding a more elegant way to determine which files are already converted. Ideally, we would just be able to detect the encoding, but currently the detection code returns "UTF8" when given a UTF-16(LE) encoded file.

Attachment #398468 - Flags: review?(tglek)

Benedict Hsieh [:benedict]

Assignee

Updated

•

15 years ago

Attachment #398468 - Attachment is patch: true

Attachment #398468 - Attachment mime type: application/octet-stream → text/plain

Zack Weinberg (:zwol)

Comment 8

•

15 years ago

> Areas I want to iterate on: getting the CSSParser to read straight from a > provided buffer (killing another memcpy), I got this (see bug 513149). Depending on how that and a bunch of other work comes out, we may want to go back to UTF-8 on disk; running the CSS parser directly on UTF-8, or even the entire style system, is on my list of things to experiment with. CSS is pure ASCII except for occasional string-type tokens, so that ought to be a win at least by cutting memory bandwidth in half. I won't be working on it till next week at the earliest, though.

Benedict Hsieh [:benedict]

Assignee

Comment 9

•

15 years ago

Attached patch readsegment (obsolete) (deleted) — Details — Splinter Review

changes nsCSSScanner to use ReadSegment, should avoid one more memcopy. still need to write some tests for this. also, I think I should split my changes to nsConverterInputStream into a different class that just handles already-converted files.

Attachment #398468 - Attachment is obsolete: true

Attachment #398468 - Flags: review?(tglek)

Zack Weinberg (:zwol)

Comment 10

•

15 years ago

> changes nsCSSScanner to use ReadSegment, should avoid one more memcopy. So, if I understand that patch, the ReadSegment callback puts the scanner's read pointer directly into the input stream's internal buffer, and then claims to have consumed all of the data in that buffer. Is that really safe? How long will the data in that buffer be valid?

Boris Zbarsky [:bzbarsky]

Comment 11

•

15 years ago

> Is that really safe? No. > How long will the data in that buffer be valid? Per the interface, until the callback function passed to ReadSegments returns. The stream is free to discard the data immediately afterward. Now it happens that the only current implementations of the interface either never invalidate the buffer (if the stream wraps a string) or don't invalidate it until the next Read/ReadSegments call (for the converter stream). But I don't think we want to depend on this. Basically, ReadSegments is designed for a push model of data flow, and the CSS parser does a pull instead...

Zack Weinberg (:zwol)

Comment 12

•

15 years ago

> Basically, ReadSegments is designed for a push model of data flow, and the CSS > parser does a pull instead... It might not be too hard to turn that around. We could set up the data stream and call ReadSegments in the various parser entry points, then within the callback, set up the scanner's read pointer and run the parser. The difficult part would be suspending the parser, instead of generating a spurious EOF, when we hit end-of-segment.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 13

•

15 years ago

Worst case, should be able to do a small buffer in between though, no? Pull data until the parser stops pulling, then buffer any of the rest until the next iteration. Will require some copies, but should be substantially fewer than before.

Alfred Kayser

Comment 14

•

15 years ago

Instead of doing ReadSegment (which is btw NOT_IMPLEMENTED for jar streams), one could also try to use the StreamBuffer interface access from the parser the underlying buffer of the stream directly. This will prevent the need to switch the parser from pull to push... One could easily add nsIStreamBufferAccess support to the JAR input stream, and in the nsCSSScanner test if the input stream has the nsIStreamBufferAccess interface and then access the 'buffers' of the underlying stream for the token scanning.

Boris Zbarsky [:bzbarsky]

Comment 15

•

15 years ago

> Instead of doing ReadSegment (which is btw NOT_IMPLEMENTED for jar streams), Why does that matter? The stream the CSS scanner has is never a jar stream. > One could easily add nsIStreamBufferAccess support Not to nsIUnicharInputStream implementations, which are what the CSS scanner is working with.

Boris Zbarsky [:bzbarsky]

Comment 16

•

15 years ago

But to be clear, we _can_ make API changes to nsIUnicharInputStream if needed. I believe our string/utf8/converter input streams are the only implementations of it that are floating around. We could also consider changing stylesheets to use a byte stream, I suppose, but that seems rather counterproductive...

Zack Weinberg (:zwol)

Comment 17

•

15 years ago

(In reply to comment #16) > > We could also consider changing stylesheets to use a byte stream, I suppose, > but that seems rather counterproductive... I'm currently suspecting the CSS scanner/parser would be faster if it did all its internal work in UTF-8, even if the data structures it generates remain UTF-16. (I'm doing some experiments to test that right now - more later today.) Does that change your opinion there?

Boris Zbarsky [:bzbarsky]

Comment 18

•

15 years ago

It depends. Is the cost of the UTF-16 to UTF-8 conversion for every single foo.style.whatever call ok? If so, working in UTF-8 might be reasonable.... That would also work better with all the places where we atomize, since atoms are UTF-8. The thing is that for the common case of ISO-8859-1 we'd need to convert to UTF-16 and then to UTF-8, right?

Zack Weinberg (:zwol)

Comment 19

•

15 years ago

Well, it should fast-path in most cases since CSS tends to be pure ASCII. That would also cover the conversion from iso-8859-n (if we don't have a fast path for that, it shouldn't be hard to add). I don't know yet, though, how much overhead we'd be getting rid of. It might not be an overall win for JS manipulation.

Benedict Hsieh [:benedict]

Assignee

Comment 20

•

15 years ago

Attached patch the actual patch (deleted) — Details — Splinter Review

Sorry, the patch submitted yesterday was incomplete. This doesn't address any of the comments made above, just fixes the previous patch. This patch depends on https://bug510844.bugzilla.mozilla.org/attachment.cgi?id=398462

Attachment #399373 - Attachment is obsolete: true

(dormant account)

Reporter

Comment 21

•

15 years ago

this last patch is promising. From initial testing, looks like small css files now load 1.2-2x quicker. Seems like there is a 15% regression on large files, but that might just be highlighting deficiencies in jar mmap setup.

(dormant account)

Reporter

Comment 22

•

15 years ago

(In reply to comment #21) > this last patch is promising. From initial testing, looks like small css files > now load 1.2-2x quicker. > > Seems like there is a 15% regression on large files, but that might just be > highlighting deficiencies in jar mmap setup. after tweaking the mmap code with madvise this way of doing css appears to be conclusively slower. We should put this on hold until until css lexing is done directly on utf8.

Benedict Hsieh [:benedict]

Assignee

Comment 23

•

15 years ago

Attached patch applies this patch on top of CSS minify patch (deleted) — Details — Splinter Review

additionally depends on https://bug513132.bugzilla.mozilla.org/attachment.cgi?id=400574

Attachment #400579 - Flags: review?(tglek)

Benedict Hsieh [:benedict]

Assignee

Comment 24

•

15 years ago

Just to clarify, the last patch depends on: https://bug510844.bugzilla.mozilla.org/attachment.cgi?id=398462 as well as https://bug513132.bugzilla.mozilla.org/attachment.cgi?id=400574

Boris Zbarsky [:bzbarsky]

Comment 25

•

15 years ago

I'm pretty opposed to the hardcoding of paths, etc that patch does. Especially since it is in fact wrong and will break extensions, hard. Was there a reason to not in fact use the existing @charset detection, with fallback to UTF-8 if the stream doesn't implement ReadSegments, say, and with the jar-maker thing adding the right @charset rule to sheets? I haven't looked all that carefully through the converter input stream machinery, but wouldn't it be easier to just implement a new subclass of nsIUnicharInputStream that simply wraps an nsIInputStream that has known UTF-16 little-endian data instead of adding branchy codepaths in the converter input stream?

Boris Zbarsky [:bzbarsky]

Comment 26

•

15 years ago

Or is the problem that we don't think we can trust general UTF-16LE data to the extent that we want to trust this data and hence need some magic something that will flag that we should use this high-trust mode?

(dormant account)

Reporter

Comment 27

•

15 years ago

(In reply to comment #25) > I'm pretty opposed to the hardcoding of paths, etc that patch does. Especially > since it is in fact wrong and will break extensions, hard. this is a proof-of-concept for benchmarking. A final patch would be very different.

Boris Zbarsky [:bzbarsky]

Comment 28

•

15 years ago

Ah, gotcha. That wasn't at all clear. As a POC, looks just fine. ;)

Dietrich Ayala (:dietrich)

Comment 29

•

15 years ago

not a big win until parser is redesigned by zach (bug 91242), per taras -> p3.

Priority: -- → P3

Dietrich Ayala (:dietrich)

Updated

•

15 years ago

Blocks: 447581

Benedict Hsieh [:benedict]

Assignee

Comment 30

•

14 years ago

We can revisit this in the future if it looks like the parser, etc, are redesigned enough to make this a win.

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → WONTFIX

Benedict Hsieh [:benedict]

Assignee

Updated

•

14 years ago

Attachment #400579 - Flags: review?(tglek)

Tony Mechelynck [:tonymec]

Comment 31

•

14 years ago

(In reply to comment #18) > It depends. Is the cost of the UTF-16 to UTF-8 conversion for every single > foo.style.whatever call ok? If so, working in UTF-8 might be reasonable.... > That would also work better with all the places where we atomize, since atoms > are UTF-8. > > The thing is that for the common case of ISO-8859-1 we'd need to convert to > UTF-16 and then to UTF-8, right? I know this comment is probably useless since this bug is currently already WONTFIX (and IMHO should remain so) but: - ASCII codepoints (i.e. 0x00 to 0x7F) have identical representation in all three of US-ASCII, ISO-8859-1 and UTF-8 but not in UTF-16. - ISO-8859-1 bytes 0x80 to 0xFF can be converted directly to UTF-8 just as easily as if the data were UTF-16, since conversion of ISO-8859-1 to UTF-16 simply means "convert byte to word by means of a null high byte". In MASM it's XOR AH,AH (one additional i86 operation with no memory access, which might perhaps even be optimized away depending on how the TO_UTF8 function is coded) or in C it means using an unsigned char rather than an unsigned short for the incoming character data. Sorry, I don't know Motorola machine code but I suspect it's just as easy. All in all I believe that conversion of ISO-8859-1 to UTF-8 would be made faster by converting directly, *without* using UTF-16 as a waystation.

Boris Zbarsky [:bzbarsky]

Comment 32

•

14 years ago

Yes, but the point is that our existing character conversion infrastructure can convert various encodings to and from UTF-16. While it would be possible to create new direct-to-UTF-8 converters, it would be a good bit of work. All solvable, but a nontrivial undertaking.

Tony Mechelynck [:tonymec]

Comment 33

•

14 years ago

(In reply to comment #32) > Yes, but the point is that our existing character conversion infrastructure can > convert various encodings to and from UTF-16. While it would be possible to > create new direct-to-UTF-8 converters, it would be a good bit of work. All > solvable, but a nontrivial undertaking. If nothing else, I suppose you could use the iconv library, which IIUC uses UTF-8 as a pivot. It is available on both Unix (meaning Linux and OSX) and Windows. All I'm doing here is idle speculation, of course. No need to change anything any time soon.

very rough draft 15 years ago Benedict Hsieh [:benedict] (deleted), patch		Details \| Diff \| Splinter Review
readsegment 15 years ago Benedict Hsieh [:benedict] (deleted), patch		Details \| Diff \| Splinter Review
the actual patch 15 years ago Benedict Hsieh [:benedict] (deleted), patch		Details \| Diff \| Splinter Review
applies this patch on top of CSS minify patch 15 years ago Benedict Hsieh [:benedict] (deleted), patch		Details \| Diff \| Splinter Review