193728 - data: Base64-encoded URIs don't support %-encoded characters

Reporter

Description

•

22 years ago

If a data: URI contains a %-encoded character and is base64 encoded, then it
plays dead, not doing anything, not even giving error messages. For example:

data:text/plain;base64,UEFTUw%3D%3D
data:text/plain;base64,%56%47%56%7A%64%43%42%6F%59%58%4D%67%63%47%46%7A%63%32%56%6B

Hixie (not reading bugmail)

Reporter

Updated

•

22 years ago

URL: data:text/plain;base64,%56%47%56%7A%6... → data:;base64,%56%47%56%7A%64%43%42%6F...

Christian :Biesinger (don't email me, ping me on IRC)

Assignee

Comment 1

•

22 years ago

Attached patch patch (deleted) — Details — Splinter Review

Christian :Biesinger (don't email me, ping me on IRC)

Assignee

Updated

•

22 years ago

Status: NEW → ASSIGNED

Priority: -- → P2

Target Milestone: --- → mozilla1.3final

Christian :Biesinger (don't email me, ping me on IRC)

Assignee

Updated

•

22 years ago

Blocks: 144766

Christian :Biesinger (don't email me, ping me on IRC)

Assignee

Updated

•

22 years ago

Attachment #114695 - Flags: superreview?(darin)

Attachment #114695 - Flags: review?(bbaetz)

benc

Comment 2

•

22 years ago

let me guess, we don't unescape the URL before we start processing it.

I thought I had written a testcase for this, but maybe I didn't do it right.

Thanks for the bug, hixie.

Keywords: testcase

Summary: Base64-encoded data: URIs don't support %-encoded characters → data: Base64-encoded URIs don't support %-encoded characters

Christian :Biesinger (don't email me, ping me on IRC)

Assignee

Comment 3

•

22 years ago

benc: That is true but only happens if the url is base64-encoded.
With my patch, we always unescape.

Bradley Baetz (:bbaetz)

Comment 4

•

22 years ago

This bug is invalid. RFC2397 says:

   The appearance of ";base64" means that the data
   is encoded as base64. Without ";base64", the data (as a sequence of
   octets) is represented using ASCII encoding for octets inside the
   range of safe URL characters and using the standard %xx hex encoding
   of URLs for octets outside that range. 

which implies that the code is correct. Does ns4 behave differently? This makes
sense; with base64, we can have % in the url, but no 'forbidden' characters.
Actually, can we have % in the url? If not, then this is OK. I can't recall the
base64 spec, though.

(According to the spec, the ";base64" bit should also be at teh beginning)

Christian :Biesinger (don't email me, ping me on IRC)

Assignee

Comment 5

•

22 years ago

ok, netscape 4 loads none of the urls from this bug. it just does nothing when I
try to load it (sounds familiar? :) )

so is this INVALID?

Christian :Biesinger (don't email me, ping me on IRC)

Assignee

Comment 6

•

22 years ago

(I tested version 4.8 on linux)

Christian :Biesinger (don't email me, ping me on IRC)

Assignee

Comment 7

•

22 years ago

sigh. this came up once, already:
http://bugzilla.mozilla.org/show_bug.cgi?id=160221#c3

I knew that this sounded familiar...

Andreas Otte

Comment 8

•

22 years ago

I think we should not unescape data uris in base 64 encoding, we should also not
encode them with url-encoding. Those data urls have their own escaping rules we
should not touch them. 

Should this example urls work?

benc

Comment 9

•

22 years ago

data       := *urlchar

in RFC 2396, the term is actually "uric"...

uric          = reserved | unreserved | escaped

I don't think anything that is corrrectly base64 encoded needs to be escaped
(that's the point of using base64 encoding right?), but there is always the case
that someone could gratuitously encode characters that do not need encoding.

RFC 2396:

   Normally, the only time
   escape encodings can safely be made is when the URI is being created
   from its component parts; each component may have its own set of
   characters that are reserved, so only the mechanism responsible for
   generating or interpreting that component can determine whether or
   not escaping a character will change its semantics. Likewise, a URI
   must be separated into its components before the escaped characters
   within those components can be safely decoded.

   In some cases, data that could be represented by an unreserved
   character may appear escaped; for example, some of the unreserved
   "mark" characters are automatically escaped by some systems.  If the
   given URI scheme defines a canonicalization algorithm, then
   unreserved characters may be unescaped according to that algorithm.
   For example, "%7e" is sometimes used instead of "~" in an http URL
   path, but the two are equivalent for an http URL.

The problem, to me, is that the data: URL RFC does not clearly state if there
are reserved characters or not. 

Base64 includes "+/=", which are potentially considered "reserved"

If the data for a URI component would conflict with the
   reserved purpose, then the conflicting data must be escaped before
   forming the URI.

      reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                    "$" | ","

   The "reserved" syntax class above refers to those characters that are
   allowed within a URI, but which may not be allowed within a
   particular component of the generic URI syntax; they are used as
   delimiters of the components described in Section 3.

   Characters in the "reserved" set are not reserved in all contexts.
   The set of characters actually reserved within any given URI
   component is defined by that component. In general, a character is
   reserved if the semantics of the URI changes if the character is
   replaced with its escaped US-ASCII encoding.

So, I think it is possible someone who did not look at these two RFC's carefully
would escape some octets in their URLs.

If you want to interpret the data: RFC as having zero reserved characters
because none are explicitly reserved, then we probably do not ned to unescape in
base64.

Andreas Otte

Comment 10

•

22 years ago

The data url is a simple one, beyond "data:" there are no more special url
characters, no need to do any escaping regarding that.

Darin Fisher

Comment 11

•

22 years ago

Comment on attachment 114695 [details] [diff] [review]
patch

if the RFC says not to try to un-URL-escape the base64 encoded data: URL, then
why do we want to bother?  just for increased compatibility?  huh?  data: URLs
are not very common (because IE doesn't support them), so  why don't we just
keep our code lean here?  stick to the standard.

my vote is INVALID or WONTFIX.

Attachment #114695 - Flags: superreview?(darin) → superreview-

Hixie (not reading bugmail)

Reporter

Comment 12

•

22 years ago

This isn't invalid. The spec is clear that the data: URI content is "urlchar"
regardless of whether the data is base64 encoded or not.

This should be fixed because it is perfectly legitimate to over-escape "uric"
characters in URIs, in order to maximise the likelyhood that the data will pass
through correctly, and the spec allows this.

Andreas Otte

Comment 13

•

22 years ago

I took a look at base64 encoding and it does not include a %, so there is no
danger of destroying the data while unescaping. I now have to agree with hixi,
lets unescape in every case, we can not destroy anything, but possibly fix urls
that got escaped when they shouldn't have.

Bradley Baetz (:bbaetz)

Comment 14

•

22 years ago

Comment on attachment 114695 [details] [diff] [review]
patch

darin, was there a final decision on this?

Christian :Biesinger (don't email me, ping me on IRC)

Assignee

Comment 15

•

22 years ago

darin: per comment 13, could you maybe reconsider your superreview-?

Darin Fisher

Comment 16

•

22 years ago

Comment on attachment 114695 [details] [diff] [review]
patch

sr=darin (sorry, didn't catch any of the last bugmails... been so behind... but
you guys definitely convinced me!)

Attachment #114695 - Flags: superreview- → superreview+

Bradley Baetz (:bbaetz)

Comment 17

•

21 years ago

Comment on attachment 114695 [details] [diff] [review]
patch

oops, sorry - I missed darin's sr/moa on this.

Attachment #114695 - Flags: review?(bbaetz) → review+

Christian :Biesinger (don't email me, ping me on IRC)

Assignee

Comment 18

•

21 years ago

Checking in netwerk/protocol/data/src/nsDataChannel.cpp;
/cvsroot/mozilla/netwerk/protocol/data/src/nsDataChannel.cpp,v  <-- 
nsDataChannel.cpp
new revision: 1.64; previous revision: 1.63
done

Status: ASSIGNED → RESOLVED

Closed: 21 years ago

Resolution: --- → FIXED

benc

Comment 19

•

21 years ago

are the provided links supposed to work now?

I'm using 20030617, 1.4, Mach-O, and it doesn't work for me.

Christian :Biesinger (don't email me, ping me on IRC)

Assignee

Comment 20

•

21 years ago

benc: Well, they are supposed to work; however, the patch has not been checked
in on the 1.4 branch, so this won't work in 1.4.

benc

Comment 21

•

21 years ago

good. I haven't added this to the tests I'm running for 1.4 branch anyhow.

sorry for the confusion on my part.