Closed Bug 796882 Opened 12 years ago Closed 11 years ago

Make BOM take precedence over HTTP for CSS

Categories

(Core :: CSS Parsing and Computation, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla19

People

(Reporter: hsivonen, Assigned: hsivonen)

References

(Blocks 1 open bug)

Details

(Keywords: dev-doc-complete)

Attachments

(3 files, 6 obsolete files)

(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
bzbarsky
: review+
Details | Diff | Splinter Review
Determining the character encoding when CSS loading should be consistent with bug 716579 and bug 687859 for the sake of the overall consistency of the platform. That is, a BOM should take precedence over HTTP-level charset in accordance with http://encoding.spec.whatwg.org/#decode .
So BOM should imply UTF-16, then?  Right now we actually look for @charset after a BOM and allow @charset to set the encoding if it's present....

In any case, this needs to be done in the working group, since it involves changing http://www.w3.org/TR/CSS21/syndata.html#charset
(In reply to Boris Zbarsky (:bz) from comment #1)
> So BOM should imply UTF-16, then?

Or UTF-8 in the case of the UTF-8 BOM.
Chrome and Opera already do what this bug asks for. We are not currently matching IE, either. (IE strips the BOM from CSS and then proceeds to decode according to the encoding specified on the HTTP level, which is really bizarre.) I did not test Safari.
(In reply to Boris Zbarsky (:bz) from comment #1)
> In any case, this needs to be done in the working group, since it involves
> changing http://www.w3.org/TR/CSS21/syndata.html#charset

http://lists.w3.org/Archives/Public/www-style/2012Oct/0516.html
Assignee: nobody → hsivonen
I don’t see mCharsetHint on SheetLoadData ever getting set to a non-empty value. What am I missing?
Attached patch WIP (obsolete) (deleted) — Splinter Review
The patch needs bug 801402 to land first.
Depends on: 801402
> I don’t see mCharsetHint on SheetLoadData ever getting set to a non-empty value.

It might not be.  It's sort of part of the public LoadSheet() API, but consumers may always be passing an empty string.

We could just nuke it.
(In reply to Boris Zbarsky (:bz) from comment #8)
> > I don’t see mCharsetHint on SheetLoadData ever getting set to a non-empty value.
> 
> It might not be.

Turns out it’s used for preloads.

- -

The tests for this bug are at https://test.csswg.org/source/contributors/opera/submitted/css3-syntax/charset/

That is, the tests are testharness.js tests but they are part of CSS WG test suite. We don’t have any precedent for importing tests with this combination. Where should I put them? Does the tooling at http://mxr.mozilla.org/mozilla-central/source/dom/imptests/ support importing a test suite under layout/ ?
Attached patch Implement CSS3 Syntax charset handling (obsolete) (deleted) — Splinter Review
Attachment #674651 - Attachment is obsolete: true
Attached patch Import CSSWG tests (deleted) — Splinter Review
This puts them under dom/imptests/css. I guess it would be a bit nicer to put them somewhere under layout; file a bug on me if you'd like me to extend the script to allow that.
This patch is needed on top of the one Ms2ger uploaded. However, this doesn’t mark dom/imptests/css/contributors/opera/submitted/css3-syntax/charset/test_page-utf16-css-bomless-utf16be.html as a known failure, because I couldn’t get parseFailures.py to do anything.
And it’s a known failure, because it ends up testing whether our UTF-16 decoders try to sniff endianness in the BOMless case. (Our decoder sniffs. The Encoding Standard says not to, at the moment.)

Outside the scope of this bug.
Attached patch Adjust old tests by flipping red and green (obsolete) (deleted) — Splinter Review
Attachment #679679 - Attachment is obsolete: true
Attachment #680628 - Flags: review?(bzbarsky)
Attachment #679680 - Attachment is obsolete: true
Attachment #680630 - Flags: review?(bzbarsky)
Comment on attachment 680627 [details] [diff] [review]
Patch on top of Ms2ger’s patch

Review of attachment 680627 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/imptests/Makefile.in
@@ +22,5 @@
>  
>  include $(srcdir)/editing.mk
>  include $(srcdir)/html.mk
>  include $(srcdir)/webapps.mk
> +include $(srcdir)/css.mk

Gah, sorry :/ I fixed the instructions in the readme to mention this:

https://hg.mozilla.org/mozilla-central/rev/96e7580fb756
Comment on attachment 680628 [details] [diff] [review]
Adjust old tests by flipping red and green

This is landing in the same changeset as the code changes, right?

r=me
Attachment #680628 - Flags: review?(bzbarsky) → review+
Comment on attachment 680630 [details] [diff] [review]
Implement CSS3 Syntax charset handling, with updated comment

r=me if you leave the logging about where the charset info is coming from instead of removing it.
Attachment #680630 - Flags: review?(bzbarsky) → review+
Comment on attachment 680630 [details] [diff] [review]
Implement CSS3 Syntax charset handling, with updated comment

Actually, two more comments:

>+++ b/layout/style/Loader.cpp
>+  for (; i < sizeof(kCharsetSym) - 1; ++i) {

Why not just strncmp?

>+        aCharset.Trim(" \t\n\f\r");

Is this specced?  Is it needed?  Seems pretty weird... especially since having an unescaped linebreak inside a quoted-string is not OK in CSS.
(In reply to Boris Zbarsky (:bz) from comment #20)
> >+++ b/layout/style/Loader.cpp
> >+  for (; i < sizeof(kCharsetSym) - 1; ++i) {
> 
> Why not just strncmp?

Because the old code did not have strncmp and I don’t tend to think in terms of the C standard library.

> >+        aCharset.Trim(" \t\n\f\r");
> 
> Is this specced?

Yes. See http://dev.w3.org/csswg/css3-syntax/#the-input-byte-stream

Step 3 says, in part: “then get an encoding for the sequence of XX bytes, decoded per windows-1252, and let temp be the return value.”

The sequence of XX bytes can contain anything except the double quote.

“Get an encoding” is defined in http://encoding.spec.whatwg.org/#concept-encoding-get which says “Remove any leading and trailing ASCII whitespace from label.” And ASCII whitespace is defined to be " ", "\t", "\n", "\f" and "\r".

I suppose it would be slightly more proper to call EncodingUtils::FindEncodingForLabel() instead of inlining “get an encoding” like this.

> Is it needed? 

It’s definitely needed in other parts of the platform that call “get an encoding”. Dunno if it is needed here. It’s a side effect of reusing the “get an encoding” concept.
Attachment #680628 - Attachment is obsolete: true
Attachment #680630 - Attachment is obsolete: true
Comment on attachment 681011 [details] [diff] [review]
Implement CSS3 Syntax charset handling, with review comments addressed

> +  for (uint32_t i = sizeof(kCharsetSym); i < aDataLength; ++i) {

Should that not be sizeof(kCharsetSym) - 1?
(In reply to Boris Zbarsky (:bz) from comment #23)
> Comment on attachment 681011 [details] [diff] [review]
> Implement CSS3 Syntax charset handling, with review comments addressed
> 
> > +  for (uint32_t i = sizeof(kCharsetSym); i < aDataLength; ++i) {
> 
> Should that not be sizeof(kCharsetSym) - 1?

Oops. Yes.

This patch uses strncmp and does not attempt to inline parts of "get an encoding".
Attachment #681011 - Attachment is obsolete: true
Attachment #681867 - Flags: review?(bzbarsky)
Comment on attachment 681867 [details] [diff] [review]
Implement CSS3 Syntax charset handling, with review comments addressed, without off-by-one error

r=me
Attachment #681867 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky (:bz) from comment #25)
> r=me

Thanks.

Landed the code now in order to get all the BOM precedence stuff in the same train:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b301f9b2e956

Leaving open pending a decision and help from Ms2ger on how to deal with the test import. (Need to sort out the failure metadata and the HTTP headers.)
Whiteboard: [leave open]
Depends on: 816849
Having landed and shipped code changes that change behavior without resolving the bug and setting a target milestone indicating when it got is _really_ confusing for regression-hunting.

Please resolve this bug appropriately and file a followup on whatever work remains.
Blocks: 859706
(In reply to Boris Zbarsky (:bz) from comment #28)
> Having landed and shipped code changes that change behavior without
> resolving the bug and setting a target milestone indicating when it got is
> _really_ confusing for regression-hunting.

Yeah. This remained open longer than I expected it to remain open.

> Please resolve this bug appropriately and file a followup on whatever work
> remains.

Filed bug 880201.
Whiteboard: [leave open]
Target Milestone: --- → mozilla19
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated https://developer.mozilla.org/en-US/docs/Web/CSS/@charset
to reflect that change.

Teoli, can you check if we have other places we document that topic ?
Flags: needinfo?(jypenator)
I have not found other places. It looks ok.
Flags: needinfo?(jypenator)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: