Closed
Bug 1347408
Opened 8 years ago
Closed 7 years ago
stylo: reduce tokenizer overhead
Categories
(Core :: CSS Parsing and Computation, enhancement, P4)
Tracking
RESOLVED FIXED

Tracking | Status
---|---
firefox57 | fixed
People
(Reporter: bholley, Assigned: SimonSapin)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
(deleted), patch | Details | Diff | Splinter Review
(deleted), patch | Details | Diff | Splinter Review
Per bug 1331843 comment 23 this isn't a huge win on the testcase that I'm measuring, but it's easy and a small win.
I wrote a verification patch to ensure that we're hitting the cache in all the cases where parser.try() returns failure and rewinds.
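As a rough illustration of the idea (all names below are hypothetical, not the actual rust-cssparser internals), a tokenizer can remember the last (position, token) pair it produced, so that when try() fails and rewinds, re-reading the same position hits the cache instead of re-tokenizing:

```rust
// Hypothetical sketch of a 1-entry token cache; structure and names
// are illustrative, not the real cssparser implementation.
#[derive(Clone, Debug, PartialEq)]
enum Token {
    Ident(String),
    Delim(char),
}

struct Tokenizer {
    tokens: Vec<Token>, // stand-in for real byte-level tokenization
    position: usize,
    cache: Option<(usize, Token)>, // last (position, token) produced
    cache_hits: usize,             // for verification, as in the patch
}

impl Tokenizer {
    fn new(tokens: Vec<Token>) -> Self {
        Tokenizer { tokens, position: 0, cache: None, cache_hits: 0 }
    }

    fn next(&mut self) -> Option<Token> {
        // If we already tokenized at this position, reuse the result.
        if let Some((pos, tok)) = &self.cache {
            if *pos == self.position {
                self.cache_hits += 1;
                self.position += 1;
                return Some(tok.clone());
            }
        }
        // Otherwise "tokenize" and remember what we produced.
        let tok = self.tokens.get(self.position)?.clone();
        self.cache = Some((self.position, tok.clone()));
        self.position += 1;
        Some(tok)
    }

    // try()-style rewind: restore an earlier position on parse failure.
    fn rewind(&mut self, to: usize) {
        self.position = to;
    }
}

fn main() {
    let mut t = Tokenizer::new(vec![Token::Ident("px".into()), Token::Delim(';')]);
    let first = t.next();
    t.rewind(0); // simulate a failed try()
    let again = t.next();
    assert_eq!(first, again);
    assert_eq!(t.cache_hits, 1); // the rewound read hit the cache
    println!("cache_hits = {}", t.cache_hits);
}
```

With a single cache entry, the common pattern of one failed try() followed by an immediate re-read never tokenizes the same position twice.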
Reporter
Comment 1 • 8 years ago
This patch prints statistics to verify that the cache is working correctly. The number of cache hits tracks the number of try() failures.
MozReview-Commit-ID: Cpzf7eBSGyD
Reporter
Comment 2 • 8 years ago
MozReview-Commit-ID: AJDKMAemFLT
Attachment #8847407 - Flags: review?(simon.sapin)
Reporter
Comment 3 • 8 years ago
Reporter
Comment 4 • 8 years ago
Reporter
Comment 5 • 8 years ago
Comment on attachment 8847407 [details] [diff] [review]
Add a cache for the last token returned by Parser::next(). v1
Hm, the measurements are a bit noisy, but this doesn't seem to improve much (even with the verified cache hits), and _might_ actually make it slightly slower.
I'm operating under the assumption that clone()ing a Token should never be more expensive than just copying a word or two. Is that right Simon? If so, I'm not sure why this would make things slower, but combined with the oranges on try, it might be best to just punt on this one.
Flags: needinfo?(simon.sapin)
Attachment #8847407 - Flags: review?(simon.sapin)
Assignee
Comment 6 • 8 years ago
A token can own a memory allocation, but only when there’s a backslash-escape which I expect is rare in practice. (Escapes are unnecessary when using UTF-8.)
Even without allocations, Token is more than a word or two. The largest variant is:
Dimension(NumericValue, Cow<'a, str>),
Adding this to unit tests passes on x86_64:
assert_eq!(::std::mem::size_of::<Token>(), 56);
assert_eq!(::std::mem::size_of::<Cow<str>>(), 32);
assert_eq!(::std::mem::size_of::<NumericValue>(), 16);
Enums, especially nested ones, are space-inefficient because padding is often added after the discriminant to keep the variants’ fields aligned. So the discriminant effectively takes 8 bytes in Token, 8 bytes in Cow, and 4 bytes for the Option in Option<i32> inside NumericValue.
If we want to reduce this, we can easily cut 4 bytes by replacing Option<i32> with (i32, bool): there’s another bool in NumericValue whose space currently gets padded up to 4 bytes, so the new bool fits in there. We can cut another 8 bytes by replacing Cow<str> with something that uses Box<str> instead of String. (String::into_boxed_str may reallocate, but again I expect backslash-escapes to be rare.)
We could go further by "flattening" enums (more Token variants instead of nested enums or booleans) but this would negatively impact ergonomics for code matching Token.
… I kinda went down a rabbit hole here. I don’t know if reducing the size of Token would even have much impact on parsing performance.
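The Option<i32> to (i32, bool) change can be sketched with stand-in struct definitions (field names here are guesses at the shape, not the exact cssparser definitions):

```rust
use std::mem::size_of;

// Stand-in definitions, illustrative only. Before: f32 (4 bytes)
// + Option<i32> (8: 4-byte tag + 4-byte payload) + bool (1),
// padded to 16. After: the bool that replaces the Option's tag
// slots into the padding next to has_sign, shrinking to 12 bytes.
struct NumericValueBefore {
    value: f32,
    int_value: Option<i32>,
    has_sign: bool,
}

struct NumericValueAfter {
    value: f32,
    int_value: i32,      // meaningful only when has_int_value is true
    has_int_value: bool, // fits in what used to be padding
    has_sign: bool,
}

fn main() {
    // On 64-bit targets: 16 bytes before, 12 after.
    println!("before: {}", size_of::<NumericValueBefore>());
    println!("after:  {}", size_of::<NumericValueAfter>());
}
```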
Flags: needinfo?(simon.sapin)
Reporter
Updated • 8 years ago
Priority: -- → P1
Reporter
Comment 7 • 8 years ago
Simon is investigating parser performance issues in bug 1331843, so assigning this one to him. We may or may not want to land something like this.
Assignee: bobbyholley → simon.sapin
Reporter
Comment 8 • 8 years ago
Generalizing this a little bit to include the various approaches of:
(a) Shrinking tokens
(b) Changing the API to avoid retokenizing, and
(c) a 1-entry cache
Stylo spends 200-250 ms tokenizing the 100x myspace sheet, and gecko spends around half that. So there's definitely room for improvement here.
Summary: stylo: add a 1-entry cache to cssparser to avoid tokenizing twice → stylo: reduce tokenizer overhead
Assignee
Comment 9 • 7 years ago
https://github.com/servo/rust-cssparser/pull/159 and https://github.com/servo/servo/pull/17345 take size_of::<Token>() from 56 bytes to 32 bytes, on 64-bit platforms.
Next I’ll look at maintaining a "current token" which can be borrowed without advancing the tokenizer’s position. Then, progressively moving parsing code to use that new API instead of Parser::try should reduce the amount of duplicated tokenization work.
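A borrow-without-advance API could look roughly like the sketch below (names and shape are hypothetical, not the actual API the PRs landed): callers inspect the current token repeatedly instead of consuming it and rewinding on failure.

```rust
// Hypothetical sketch of a "current token" API: `current` borrows the
// token at the current position without consuming it, so callers can
// inspect it as often as needed with no re-tokenization.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Delim(char),
}

struct Parser {
    tokens: Vec<Token>,
    position: usize,
}

impl Parser {
    // Borrow the token at the current position without advancing.
    fn current(&self) -> Option<&Token> {
        self.tokens.get(self.position)
    }

    // Consume the current token and advance.
    fn advance(&mut self) -> Option<&Token> {
        let tok = self.tokens.get(self.position)?;
        self.position += 1;
        Some(tok)
    }
}

fn main() {
    let mut p = Parser { tokens: vec![Token::Delim('!')], position: 0 };
    // Two peeks see the same token; the position never moves.
    assert_eq!(p.current(), Some(&Token::Delim('!')));
    assert_eq!(p.current(), Some(&Token::Delim('!')));
    assert_eq!(p.position, 0);
    p.advance();
    assert_eq!(p.current(), None);
    println!("ok");
}
```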
Reporter
Updated • 7 years ago
Priority: P1 → P4
Assignee
Comment 10 • 7 years ago
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated • 7 years ago
status-firefox57: --- → fixed