Open Bug 1354566 Opened 7 years ago Updated 2 years ago

stylo: parsing a single property is still somewhat slower than Gecko

Tracking

()

Status:

NEW

Tracking Flags:

Tracking

Status

firefox55

---

affected

firefox57

---

wontfix

People

(Reporter: bzbarsky, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Boris Zbarsky [:bzbarsky]

Reporter

Description

•

7 years ago

This is sort of a continuation of bug 1344136 but the parameters are different enough that I think a different bug is a good idea.

Looking at a profile of the testcase in bug 1344131, stylo spends about 12% of its time in  memmove, called mostly from declaration_block::parse_one_declaration and glue::set_property.  Another 4% is the UTF16-to-UTF8 copy of the to-be-parsed string.  The rest of the stylo-specific time is various parsing code.

I put a profile at https://perf-html.io/public/ca2114c54af0c8a917d476430772b924b2fa9046/calltree/?thread=2 in case something in there jumps out to someone more familiar with the servo CSS parser/tokenizer.  If not, I can dig some more and try to do more comparable head-to-head profiles vs Gecko to see which parts are slower in stylo.

Flags: needinfo?(simon.sapin)

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

7 years ago

Blocks: stylo-perf

Bobby Holley (:bholley)

Comment 1

•

7 years ago

There's probably significant overlap with SimonSapin's work in bug 1331843.

Assignee: nobody → simon.sapin

Priority: -- → P1

Boris Zbarsky [:bzbarsky]

Reporter

Comment 2

•

7 years ago

Happy to reprofile once that lands.

Depends on: 1331843

Manish Goregaokar [:manishearth]

Updated

•

7 years ago

Blocks: stylo

Simon Sapin (:SimonSapin)

Comment 3

•

7 years ago

> Another 4% is the UTF16-to-UTF8 copy of the to-be-parsed string.

On the output side of this conversion/copy, rust-cssparser’s tokenizer it its current incarnation fundamentally needs UTF-8 (Rust &str) for its input. Making it generic to support UTF-16 input would be a significant amount of work.

On the input side, is it really UTF-16 rather than Latin-1? IIRC SpiderMonkey can store JS strings as Latin-1 internally when they don’t contain code units beyond U+00FF, which is the case of "10px". Based on chatting with heycam, it sounds like the WebIDL bindings layer would first convert Latin-1 to UTF-16, then the CSSOM implementation for Stylo converts that to UTF-8.

If we add a WebIDL annotation that a given string parameter is wanted in UTF-8, we could make that one conversion instead of two. We only want UTF-8 when a string is going to Stylo/Servo though, so doing this may be tricky as long as we have two style systems.

Going further, we can avoid converting/copying entirely and use std::str::from_utf8_unchecked if the string happens to only contain ASCII code units (and is stored as Latin-1). However, systematically scanning Latin-1 inputs to check whether they’re ASCII also has a cost. So ideally, SpiderMonkey would track this for us with a "is ASCII" / "might not be ASCII" bit kept in Latin-1 strings.

(This is about CSSOM, I’ve filed Bug 1354989 separately for stylesheets.)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 4

•

7 years ago

> Making it generic to support UTF-16 input would be a significant amount of work.

Right.  I'm not sure what we can do about this overhead so far.

> On the input side, is it really UTF-16 rather than Latin-1? 

It's UTF-16 (or WTF-16, I guess; more on this below) once it gets past the bindings layer.  The original JS string is in fact Latin-1 in this case.

We _could_ do something with bindings here to produce UTF-8 instead of UTF-16, yes.  As you note, that's easier to do once we don't have to worry about two style systems, or once we're ok taking a perf hit on the non-stylo style system.

> Going further, we can avoid converting/copying entirely and use std::str::from_utf8_unchecked
> if the string happens to only contain ASCII code units

As things stand, bindings will make a copy no matter what.  They do right now.  So if we implemented the "bindings output UTF-8" thing, ideally we would not need any more copies or scanning of the string or whatnot.

That said, does rust-cssparser actually expect UTF-8, or WTF-8?  That is, what happens for surrogate code points in the input?  Should bindings be outputting WTF-8 given unpaired surrogates in the WTF-16 input, or converting unpaired surrogates to replacement char, or something else?

All the encoding stuff could use a separate bug or three to track it, obviously.  ;)

Simon Sapin (:SimonSapin)

Comment 5

•

7 years ago

Filed bug 1355101 and bug 1355106.

Simon Sapin (:SimonSapin)

Comment 6

•

7 years ago

I’ve submitted https://github.com/servo/servo/pull/16954 which will hopefully reduce `memmove` traffic during parsing.

Simon Sapin (:SimonSapin)

Comment 7

•

7 years ago

https://github.com/servo/servo/pull/17820 should also help with this.

Bobby Holley (:bholley)

Updated

•

7 years ago

Priority: P1 → P4

Simon Sapin (:SimonSapin)

Comment 8

•

7 years ago

I’m kind of out of ideas at this point.

Flags: needinfo?(simon.sapin)

Chris Peterson [:cpeterson]

Comment 9

•

7 years ago

status-firefox57=wontfix unless someone thinks this bug should block 57

status-firefox57: --- → wontfix

Hiroyuki Ikezoe (:hiro)

Updated

•

7 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1430706

Simon Sapin (:SimonSapin)

Updated

•

4 years ago

Assignee: simon.sapin → nobody

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

stylo: parsing a single property is still somewhat slower than Gecko

Categories

(Core :: CSS Parsing and Computation, defect, P4)

Tracking

()

People

(Reporter: bzbarsky, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Comment 2

Updated

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Updated

Comment 8

Comment 9

Updated

Updated

Updated