Open Bug 808593 (mimesniff) Opened 12 years ago Updated 2 years ago

[meta] Implement the MIME Sniffing Standard

Categories

(Firefox :: File Handling, enhancement)

People

(Reporter: GPHemsley, Unassigned)

References

(Depends on 10 open bugs, Blocks 1 open bug)

Details

(Keywords: meta)

The MIME Sniffing Standard describes how to sniff the MIME/Internet media type of files in an interoperable way.

While there are many areas where sniffing is optional and up to the User Agent, there are also a few requirements on User Agents for sniffing in particular contexts (e.g. with unknown types).

This is a tracking bug to track implementation of (or disagreement with) the sniffing standard.
I'm assuming I've been cc:ed because of my work a few years back on what happens if you process arbitrary (and potentially malicious) content as CSS.

CSS is not *sniffed* in the strict sense, but in quirks mode we accept *any* MIME type as potentially CSS as long as it's same origin with the document (see bug 524223).  I would support getting rid of that quirk, but I don't think we have a bug for it right now, and it would need extensive web-compat testing and potentially evangelism.

Bug 521039 and bug 562377 are related CSS MIME issues.  Bug 560388 and bug 560392 are related Content-Type-handling-in-general issues.
Depends on: 521039, 562377, 560388, 560392
(In reply to Zack Weinberg (:zwol) from comment #1)
> I'm assuming I've been cc:ed because of my work a few years back on what
> happens if you process arbitrary (and potentially malicious) content as CSS.

You were CC'd primarily because you were the only one I could find who expressed knowledge (in bug 560392 comment 4 and bug 560388 comment 4) of the IETF document of which this WHATWG spec is a successor:

http://tools.ietf.org/html/draft-abarth-mime-sniff
http://tools.ietf.org/html/draft-ietf-websec-mime-sniff

But I see that, yes, this was related to the issues with CSS that you were involved with. The spec still does not address CSS specifically, so I will look into it and get back to you.
Yeah, I'm not presently up to speed on that spec as it relates to anything *but* CSS.  If nobody else has the time, I can try to evaluate it in more detail (as of today I have a bunch more discretionary time than I have had for the past several months).

Regarding CSS, our standards-mode behavior is: anything with "Content-Type: text/css", or with no Content-Type header at all, is *assumed* to be CSS and parsed as such.  Anything else is discarded regardless of its contents.  I would support putting that behavior into the sniffing spec.  http://www.w3.org/TR/CSS21/syndata.html#charset defines character-set sniffing rules for CSS; I do not remember whether or not we implement this, but regardless I don't think they need to be duplicated into the sniffing spec (perhaps a normative reference would be appropriate).
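To make the rule concrete, here is a minimal sketch of the decision as described in this bug: the standards-mode rule above plus the same-origin quirks-mode exception mentioned earlier. It is not Gecko's actual code. The function name `treat_as_css` and its parameters are hypothetical, and the Content-Type parsing is deliberately simplified (it only strips parameters and lowercases the type).

```python
from typing import Optional


def treat_as_css(content_type: Optional[str], quirks_mode: bool, same_origin: bool) -> bool:
    """Decide whether a stylesheet response is parsed as CSS (illustrative only).

    content_type is the raw Content-Type header value, or None if the
    header is absent. All names here are hypothetical.
    """
    # No Content-Type header at all: assumed to be CSS.
    if content_type is None:
        return True

    # Simplified parsing (an assumption): drop parameters such as charset.
    essence = content_type.split(";", 1)[0].strip().lower()
    if essence == "text/css":
        return True

    # Quirks mode accepts any MIME type, but only for same-origin sheets.
    if quirks_mode and same_origin:
        return True

    # Anything else is discarded regardless of its contents.
    return False


print(treat_as_css("text/html", quirks_mode=False, same_origin=True))
print(treat_as_css("text/html", quirks_mode=True, same_origin=True))
```

Note that the body of the response never enters the decision, which is why this is not sniffing in the strict sense.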

cc:ing some more of the usual suspects.
(In reply to Zack Weinberg (:zwol) from comment #3)
> Regarding CSS, our standards-mode behavior is: anything with "Content-Type:
> text/css", or with no Content-Type header at all, is *assumed* to be CSS and
> parsed as such.  Anything else is discarded regardless of its contents.  I
> would support putting that behavior into the sniffing spec. 
> http://www.w3.org/TR/CSS21/syndata.html#charset defines character-set
> sniffing rules for CSS; I do not remember whether or not we implement this,
> but regardless I don't think they need to be duplicated into the sniffing
> spec (perhaps a normative reference would be appropriate).

As I understand it, the most recent spec is here:

http://dev.w3.org/csswg/css3-syntax/

But as this is intended to be a tracking bug, I imagine it would be better if we moved the CSS-specific discussion to a different bug (or mailing list).
We shouldn't be sniffing CSS.

Past that, Christian and I are probably most familiar with the sniffing stuff.  Except maybe for HTML bits where it's Henri.

Giving the linked document a read is on my todo list.  Of course it might have helped if it had not been totally rewritten from the thing that we had already read and reviewed before.... :(
On the other hand, at first glance it was mostly reformatted, with no real substantive changes so far, right?
(In reply to Boris Zbarsky (:bz) from comment #5)
> Giving the linked document a read is on my todo list.  Of course it might
> have helped if it had not been totally rewritten from the thing that we had
> already read and reviewed before.... :(

That was intended to be a feature, not a bug. :/

(In reply to Boris Zbarsky (:bz) from comment #6)
> On the other hand, at first glance it was mostly reformatted, with no real
> substantive changes so far, right?

That was the plan. (And I think I've been mostly successful with it.)
One obvious difference between the spec and what we do is that the spec allows sniffing text/plain to various types and then rendering them in the browser.  I'm not convinced we're willing to implement that.  Right now we very carefully force such types to be handled outside the browser, even for "non-scriptable" cases, because otherwise we might render content that filtering proxies would have blocked had they had the right type for it.
Apart from the above, I think the differences between what we do and what this draft proposes are not fatal...  Though I did skim pretty quickly; I may have missed something.
(In reply to Gordon P. Hemsley [:gphemsley] from comment #0)
> This is a tracking bug to track implementation of (or disagreement with) the
> sniffing standard.

I think the bits ‘type is equal to "font" or’ and ‘type is equal to "archive" or’ are highly questionable. The most popular font types are in the process of getting application/ types and the most popular archives already have application/ types.

I suspect the ‘a reasonable amount of time has elapsed, as determined by the user agent.’ provision is unnecessary. The HTML spec has the same provision for the <meta> prescan. Firefox didn’t implement it, a couple of people complained, then fixed their code, and the sky didn’t fall.

What are the use cases for ‘Sniffing archives specifically’? It appears that it sniffs ODF-style files (http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part3.html#__RefHeading__752809_826425813 ; EPUB, ODF, InDesign, etc.) and Open Packaging Conventions-based files (https://en.wikipedia.org/wiki/Open_Packaging_Conventions ; OOXML, XPS, etc.) as zip archives. Is that intended, and is it a desirable outcome in light of the use cases?
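To illustrate the concern, here is a rough sketch of signature-based archive sniffing. It is not the spec's exact algorithm: the signature table uses the well-known gzip, ZIP and RAR magic numbers, and `sniff_archive` and `make_odf_like_package` are hypothetical names. The sample package is an ordinary ZIP container with a `mimetype` entry, which is how ODF-style packages are laid out.

```python
import io
import zipfile
from typing import Optional

# Leading-byte signatures for common archive formats (well-known magic numbers).
ARCHIVE_SIGNATURES = [
    (b"\x1f\x8b\x08", "application/x-gzip"),
    (b"PK\x03\x04", "application/zip"),
    (b"Rar \x1a\x07\x00", "application/x-rar-compressed"),
]


def sniff_archive(data: bytes) -> Optional[str]:
    """Return an archive MIME type based only on the leading bytes, else None."""
    for signature, mime_type in ARCHIVE_SIGNATURES:
        if data.startswith(signature):
            return mime_type
    return None


def make_odf_like_package() -> bytes:
    """Build a tiny in-memory ZIP whose first entry is an uncompressed 'mimetype' file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", "<office:document-content/>")
    return buf.getvalue()


package = make_odf_like_package()
# The container signature wins; the declared document type inside is never consulted.
print(sniff_archive(package))  # prints "application/zip"
```

A sniffer that stops at the container signature cannot tell a document package from a plain zip file, which is the outcome being questioned here.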

Otherwise, looks good to me, but then I failed to notice the problem bz pointed out in the previous comment.

(In reply to Zack Weinberg (:zwol) from comment #3)
> http://www.w3.org/TR/CSS21/syndata.html#charset defines character-set
> sniffing rules for CSS; I do not remember whether or not we implement this,

Unfortunately, we mostly do and the rules don’t make sense. Bug 796882 has morphed into implementing Level 3 rules instead, but those rules apply only after we’ve decided the file is CSS.
Depends on: 471020
Blocks: whatwg
Depends on: 789123
Depends on: 864851
Depends on: 862088
Depends on: 877500
Depends on: 878922
Depends on: 975809
Depends on: 986924
Product: Core → Firefox
Version: Trunk → unspecified
Depends on: 1406337
Depends on: 1420575
Depends on: 1423877
Depends on: 500713
Type: defect → enhancement
Depends on: 1602277
No longer depends on: 1602277
Depends on: 1718618
Depends on: 1725933
Depends on: 1725190
Severity: normal → S3