Bug 331773
Opened 19 years ago
Closed 12 years ago
encodeURI fails on decodeURI("%ED%A0%80")
Categories: Core :: JavaScript Engine (defect)
Tracking: RESOLVED FIXED
People
(Reporter: danswer, Unassigned)
References
(Blocks 1 open bug)
Details
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
It is possible to give decodeURI a string that encodeURI cannot then process. The same string could just as well come from HTML markup.
Reproducible: Always
Steps to Reproduce:
var js = decodeURI ("%ED%A0%80");
alert (js.length + "\n" + js.charCodeAt(0)); // 1, 0xD800
alert (escape (js)); // %uD800
alert (encodeURI (js)); // fails here
Example 2:
<span id=myspan>&#55296;𐀂</span>
<script type='text/javascript'>
var txt = document.getElementById('myspan').innerHTML;
alert (txt.length + "\n" + txt); // length is 3
alert (escape (txt)); // %uD800%uD800%uDC02
alert (encodeURI(txt)); // fails here
</script>
Actual Results:
In both examples, encodeURI fails on the lone surrogate (any of \uD800-\uDFFF) even though decodeURI and escape handle it successfully.
Expected Results:
I expect encodeURI to give me "%ED%A0%80", corresponding to the %uD800 that escape returns.
I have encountered this while trying to safely pass strings between the client and the server. Going from the server to the browser is not so bad, because one can use any of:
1. decodeURI on a UTF-8 percent-encoded string,
2. reading the characters out of an HTML element (such as a span) into which they have been written as &#unicodePointInDecimal; references, or
3. string escapes: "\xHH", "\uHHHH", or a surrogate pair "\uHHHH\uHHHH" for Unicode characters above U+FFFF (17-21 bits), where the eight H's are the two UTF-16 code units rather than the UTF-8 bytes. E.g. 𐀂 is code point 10002 (hex) -> decodeURI("%F0%90%80%82") (UTF-8) -> "\uD800\uDC02" (UTF-16). A short sketch of routes 1 and 3 follows this list.
To go from a JavaScript string back to an ASCII representation, one would expect to use encodeURI, but this fails on decodeURI("%ED%A0%80"). The reason it fails, I presume, is that the character is not a valid code point; but in that case escape should not work either. I would prefer consistent behaviour: if creating the string does not throw, then manipulating it, especially user-entered string data, should not throw either.
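A possible client-side workaround (a sketch only; the helper name is mine, not an existing API) is to scan for unpaired surrogates and substitute U+FFFD before calling encodeURI, so that it never sees a lone surrogate:

// Replaces unpaired surrogates with U+FFFD so encodeURI cannot throw on them.
function encodeURILenient(s) {
  var out = "";
  for (var i = 0; i < s.length; i++) {
    var c = s.charCodeAt(i);
    if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.length &&
        s.charCodeAt(i + 1) >= 0xDC00 && s.charCodeAt(i + 1) <= 0xDFFF) {
      out += s.charAt(i) + s.charAt(i + 1);  // valid surrogate pair: keep it
      i++;
    } else if (c >= 0xD800 && c <= 0xDFFF) {
      out += "\uFFFD";                       // lone surrogate: substitute
    } else {
      out += s.charAt(i);
    }
  }
  return encodeURI(out);
}
alert(encodeURILenient("\uD800"));           // "%EF%BF%BD" instead of an error

This loses the lone surrogate, but it keeps every well-formed character intact and never throws on user-entered data.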
Csaba Gabor from Vienna
For unicode charts see: http://www.macchiato.com/unicode/chart/
References: http://en.wikipedia.org/wiki/UTF-8 and
http://en.wikipedia.org/wiki/UTF-16/UCS-2
Comment 1•19 years ago
Example 1 in comment 0 is pretty much a dupe of bug 316338. Example 2 is fixed in trunk by bug 316394.
Comment 2•19 years ago
So basically, decodeURI can produce bogus UTF-16? Sounds like we should fix that in the JS engine. Same for decodeURIComponent.
Assignee: smontagu → general
Blocks: 316338
Status: UNCONFIRMED → NEW
Component: Internationalization → JavaScript Engine
Ever confirmed: true
OS: Windows XP → All
QA Contact: amyy → general
Hardware: PC → All
Comment 3•13 years ago
Now decodeURI("%ED%A0%80") throws a URIError (see bug 660612).
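For example (a sketch of the current behaviour):

try {
  decodeURI("%ED%A0%80");        // the byte sequence from comment 0
} catch (e) {
  alert(e instanceof URIError);  // true: malformed UTF-8 is now rejected
}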
Comment 4•12 years ago
Throwing is okay.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED