1355101 - stylo: Add CSSOMString in WebIDL, use it in CSSOM, make it USVString for Stylo

Reporter

Description

•

7 years ago

Responding to https://bugzilla.mozilla.org/show_bug.cgi?id=1354566#c4

> That said, does rust-cssparser actually expect UTF-8, or WTF-8?
> That is, what happens for surrogate code points in the input?
> Should bindings be outputting WTF-8 given unpaired surrogates
> in the WTF-16 input, or converting unpaired surrogate
> to replacement char, or something else?

cssparser itself is agnostic to surrogates, or indeed anything non-ASCII: they come out unchanged as part of an identifier-like or string token. However the Rust &str type is defined as UTF-8, without surrogates. Unsafe code can make a &str with surrogates in it and my guess is that nothing special would happen, but that is technically undefined behavior: other unsafe code is allowed to assume that this does not happen.

Currently we use NS_ConvertUTF16toUTF8 which replaces unpaired surrogates with U+FFFD as it should. This yields a different CSSOM behavior from (as far as I know) every other browser engine: they use WTF-16 internally and thus preserve unpaired surrogates "by default". (When they occurs in places where CSS syntax allows arbitrary input, for example the 'font-family' or 'content' properties.)

For WTF-16 engines, replacing unpaired surrogate with U+FFFD would require actively looking for them with some additional code and runtime cost. For this reason CSSWG rejected my proposal to have the CSSOM spec mandate this, last time I asked: https://lists.w3.org/Archives/Public/www-style/2014Jun/0060.html

I like to think that replacing with U+FFFD is probably OK, even if not every engine does it. I argue that unpaired surrogates are typically an error case that occurs accidentally. So, while I’m filing this bug to keep a record of this difference with non-Stylo and have a place to discuss it, I think we should not change this behavior unless we find it causes significant compatibility problem with real content. (Acid 3 doesn’t count.) Only in that case, we’ll consider extending rust-cssparser to support WTF-8 or WTF-16. (This implies changing every &str or String Rust type that can contain CSS syntax to some other types.)

Boris Zbarsky [:bzbarsky]

Comment 1

•

7 years ago

I don't think we should be shipping something that violates the spec and deviates from currently interoperable browser behavior without _very_ strong reasons.  And I'm not convinced that choice of string type for our parser constitutes such a reason.  Especially when I assume we knew this was a design constraint when we made that choice....

Boris Zbarsky [:bzbarsky]

Comment 2

•

7 years ago

That said, reading the linked discussion it's not clear to me that it actually considered the CSSOM case we're talking about here, since they're talking about "aren't using the CSS parser" which is very much NOT the case here.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 3

•

7 years ago

I think we shouldn't support the preservation of unpaired surrogates. Although some accidental artifacts of the Web Platform become required features, I'm not aware of any evidence that the preservation of unpaired surrogates is needed for Web compat. Absent overwhelming evidence, preservation of unpaired surrogates in UTF-8/byte-oriented code just leads to excessive complication and inability to use code designed for Rust &str semantics.

When I asked Hixie to make document.write() not to turn unpaired surrogates into the REPLACEMENT CHARACTER, it was because I didn't want to *add* processing when all browsers were UTF-16 internally. Now we have good reasons to use UTF-8 internally, so now what constitutes an added burden is flipped.

So unless there's overwhelming evidence that the preservation of unpaired surrogates in CSSOM is a huge Web compat issue, I think we shouldn't preserve them. The same goes for document.write() if we switch to html5ever in the future.

It's also worth noting that since 2014 USVString has become a thing in WebIDL. I'd be happy for us to push s/DOMString/USVString/ more broadly in APIs. (Even if in some case it makes sense to downgrade them to DOMStrings in our own copies of the WebIDL definitions because a non-WebIDL layer deals. My "replace uconv with encoding_rs" patch, among many other things, changes TextEncoder to take DOMString, because encoding_rs internally behaves as if DOMString to USVString conversion had been applied to UTF-16 input.)

Simon Sapin (:SimonSapin)

Reporter

Comment 4

•

7 years ago

> I assume we knew this was a design constraint when we made that choice

rust-cssparser is the very first bit of Rust code that I wrote beyond Hello World, before I joined Mozilla in 2013. I believe I didn’t know about surrogates at the time.

I’ve known of this specific issue since at least 2014 (when I brought it up in CSSWG), but Servo had bigger web-compat fish to fry.

> they're talking about "aren't using the CSS parser" which is very much NOT the case here

I assume that either Tab misspoke or the minutes are inaccurate, and that he meant that CSSOM parameters are not going through the part of the CSS Syntax spec (which he wrote) that says how a byte stream is decoded into an Unicode stream: https://drafts.csswg.org/css-syntax/#determine-the-fallback-encoding

Later bits of minutes also mention "an unpaired from JS" .

Boris Zbarsky [:bzbarsky]

Comment 5

•

7 years ago

> unless there's overwhelming evidence that the preservation of unpaired surrogates in CSSOM
> is a huge Web compat issue

I think given the risks we already have with stylo, this is backwards.  We should not be making behavior changes unless we have overwhelming evidence that they are NOT a compat issue.  We're going to have enough compat issues organically (due to bugs) without going looking for them...

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 6

•

7 years ago

(In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #5)
> > unless there's overwhelming evidence that the preservation of unpaired surrogates in CSSOM
> > is a huge Web compat issue
> 
> I think given the risks we already have with stylo, this is backwards.  We
> should not be making behavior changes unless we have overwhelming evidence
> that they are NOT a compat issue.  We're going to have enough compat issues
> organically (due to bugs) without going looking for them...

If that's the way burden of proof goes, we should get telemetry in place ASAP looking for unpaired surrogates in the old CSSOM implementation.

(I still think we should go for no-preservation unless there's overwhelming evidence to the contrary. Not being able to use code written for &str because of doing WTF-8 just in case would be bad and sad. Having to get telemetry for what a priori look like (very) fringe issues slows things down. I'm worried about over-prudence getting in the way of getting stuff landed and moving on. For example, I had an isindex patch ready to go, but because of the requirement to get telemetry evidence despite the other three engines already having removed isindex, the patch is rotting, since I got busy with other stuff and haven't gotten around to adding the telemetry even though it would be trivial to add. So there's both the issue of telemetry even when executed perfectly adding a delay and the requirement to add telemetry being enough of a speed bump that things get stalled in context switches.)

Simon Sapin (:SimonSapin)

Reporter

Comment 7

•

7 years ago

> We're going to have enough compat issues organically (due to bugs) without going looking for them...

You make it sound like we’re actively adding risk here, but &str in cssparser and servo/components/style is the status-quo and has been since before Stylo started. If this were an existing switch to flip and all other things were equal I’d agree we should default to the conservative approach, but an addition to being a sad outcome implementing support for WTF-8 or WTF-16 would take engineering time that could be spent on something else.

Boris Zbarsky [:bzbarsky]

Comment 8

•

7 years ago

> You make it sound like we’re actively adding risk here

No, we're just adding risk compared to the other option.

I agree that this needs to be weighed against the cost of the engineering time needed to eliminate this risk.  I really wish we hadn't painted ourselves into this corner.  :(

Bobby Holley (:bholley)

Comment 9

•

7 years ago

My gut feeling is that we're probably not going to do anything about this until proven otherwise. We should reevaluate before shipping, marking p3.

Assignee: nobody → simon.sapin

Priority: -- → P3

Simon Sapin (:SimonSapin)

Reporter

Comment 10

•

7 years ago

Prior discussion on Servo’s issue tracker: https://github.com/servo/servo/issues/6564


I’ve filed https://github.com/w3c/csswg-drafts/issues/1217 to propose changing CSS specifications to align with Stylo, and will discuss it at CSSWG’s face-to-face meeting next week.

Simon Sapin (:SimonSapin)

Reporter

Comment 11

•

7 years ago

https://github.com/w3c/csswg-drafts/issues/1217 is resolved with https://github.com/w3c/csswg-drafts/pull/1266: the spec introduces a `CSSOMString` in WebIDL, a typedef for either `DOMString` or `USVString`, chosen by implementations. Choosing `USVString` effectively means replacing surrogates with U+FFFD.  This type is used for almost everything in CSSOM.

Re-purposing this bug to move or IDL definitions to align with the new spec.

Summary: stylo: CSSOM replaces unpaired surrogates with U+FFFD (which might be OK?) → stylo: Add CSSOMString in WebIDL, use it in CSSOM, make it USVString for Stylo

Simon Sapin (:SimonSapin)

Reporter

Updated

•

7 years ago

Blocks: 1355106

Simon Sapin (:SimonSapin)

Reporter

Updated

•

7 years ago

Assignee: simon.sapin → nobody

Bobby Holley (:bholley)

Updated

•

7 years ago

Priority: P3 → --

Bobby Holley (:bholley)

Comment 12

•

7 years ago

Boris, do we believe this will give us a perf win on CSSOM? Just trying to figure out the priority.

Flags: needinfo?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 13

•

7 years ago

> Boris, do we believe this will give us a perf win on CSSOM?

It depends.  It will save us some copying with stylo in some cases.  But it will slow down things when the Gecko style system is in use, by requiring more copying there.

The conversion from ISO-8859-1 to UTF-8 will still be there in the common case.  The requirement to validate for unpaired surrogates will still be there, though maybe we can avoid it in the ISO-8859-1 case.

I really doubt we should block 57 for this, especially because it will be a pure performance regression for our chrome, for Fennec, etc...

Flags: needinfo?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 14

•

7 years ago

Also, it's not clear to me what the scope of this bug is.  Is it to implement CSSOMString and then make it do direct to-UTF8 conversion?  Or is it really to implement it and make it an alias for USVString?  The latter won't be a performance win for stylo either, compared to what we have now.

Bobby Holley (:bholley)

Comment 15

•

7 years ago

Ok.

Priority: -- → P4

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 16

•

7 years ago

I'm developing SIMD-ASCII-optimized Latin1 to UTF-8 and UTF-8 to Latin1 conversions as part of bug 1402247.

Depends on: 1402247

Chris Peterson [:cpeterson]

Comment 17

•

7 years ago

status-firefox57=wontfix unless someone thinks this bug should block 57

status-firefox57: --- → wontfix

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Bugzilla

Quick Search

stylo: Add CSSOMString in WebIDL, use it in CSSOM, make it USVString for Stylo

Categories

(Core :: CSS Parsing and Computation, enhancement, P4)

Tracking

()

People

(Reporter: SimonSapin, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Updated

Updated

Updated

Comment 12

Comment 13

Comment 14

Comment 15

Comment 16

Comment 17

Updated