Closed
Bug 212308
Opened 22 years ago
Closed 21 years ago
javascript error when using unescape function in UTF8
Categories
(Core :: DOM: Core & HTML, defect)
RESOLVED
FIXED
People
(Reporter: omercier, Unassigned)
Details
(Keywords: intl)
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624
The unescape function doesn't work when I use it in an HTML page that is in the
UTF-8 charset.
Reproducible: Always
Steps to Reproduce:
1. Go to the URL http://195.25.243.18/ppp/unescape.html
Actual Results:
The page doesn't load, and I get this error in the JavaScript console:
Error: uncaught exception: [Exception... "Component returned failure code:
0x8000ffff (NS_ERROR_UNEXPECTED) [nsIDOMWindowInternal.unescape]"
nsresult: "0x8000ffff (NS_ERROR_UNEXPECTED)" location: "JS frame ::
http://195.25.243.18/ppp/unescape.html :: <TOP_LEVEL> :: line 7" data: no]
Comment 1•22 years ago
That's because the string, once unescaped, is expected to be in the character
encoding of the page (from which it can then be converted into the UTF-16
encoding JavaScript uses natively, to be returned to the script).
Your string, when unescaped, is not valid UTF-8. I assume it's valid ISO-8859-1
or something like that, but there is no reason for Mozilla to try that charset
on a UTF-8 page...
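A minimal sketch of the point above, assuming a runtime that provides the standard TextDecoder (current browsers, Node.js built with ICU). It is not the Mozilla code; the helper names decodeUtf8Strict and decodeLatin1 are made up for illustration. It shows that the single byte 0xE9 is rejected by a strict UTF-8 decoder but is a perfectly good ISO-8859-1 character:

```javascript
// The single octet that "%E9" unescapes to.
const bytes = new Uint8Array([0xe9]);

// Hypothetical helper: strict UTF-8 decoding.
// { fatal: true } makes TextDecoder throw on ill-formed input
// instead of substituting U+FFFD.
function decodeUtf8Strict(input) {
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(input);
  } catch (e) {
    return null; // ill-formed UTF-8
  }
}

// Hypothetical helper: ISO-8859-1 maps every byte value to the
// code point with the same number, so it can be decoded by hand.
function decodeLatin1(input) {
  return Array.from(input, (b) => String.fromCharCode(b)).join("");
}

const asUtf8 = decodeUtf8Strict(bytes); // null: not valid UTF-8
const asLatin1 = decodeLatin1(bytes);   // U+00E9, "e with acute"

// The same character encoded as UTF-8 takes two bytes.
const properUtf8 = decodeUtf8Strict(new Uint8Array([0xc3, 0xa9]));
```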
Comment 2•22 years ago
---> DOM for handling.
In the browser, the DOM unescape() function supersedes
the JS Engine implementation.
Assignee: rogerl → dom_bugs
Component: JavaScript Engine → DOM Level 0
QA Contact: pschwartau → ashishbhatt
Reporter
Comment 3•21 years ago
The URL has changed:
http://www.agdf.com/ppp/unescape.html
What can I do to resolve this bug?
Comment 4•21 years ago
Don't mix different charsets in the same page?
I'm not sure what we can do here that would not break thousands of other pages,
really... The issue is that once we unescape the string, all we have is a
sequence of _bytes_. To write it to the document, we need to convert the bytes
to characters. To do this, we need to assume that the byte stream is a
character string represented in some encoding. We have to guess the encoding.
We guess, reasonably, that it's the same as the encoding of the page itself
(which is what it is 99.99% of the time).
Any suggestions on what we should change there?
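One way to avoid the guessing altogether, offered as a suggestion rather than something proposed in this bug: the standard encodeURIComponent() and decodeURIComponent() functions always use UTF-8 for the escaped bytes, so the escaped string and a UTF-8 page agree by construction. A small sketch (the sample string is made up):

```javascript
// "caf\u00e9" is "cafe" with an acute accent on the last letter.
const original = "caf\u00e9";

// encodeURIComponent always escapes the UTF-8 bytes of the string.
const escapedUtf8 = encodeURIComponent(original); // "caf%C3%A9"

// decodeURIComponent reverses it, again assuming UTF-8.
const roundTripped = decodeURIComponent(escapedUtf8);

// A lone "%E9" is not valid UTF-8, so decodeURIComponent refuses it
// with a URIError instead of guessing an encoding.
let rejected = false;
try {
  decodeURIComponent("%E9");
} catch (e) {
  rejected = e instanceof URIError;
}
```

With this pair, the string to embed in a UTF-8 page would be "%C3%A9" rather than "%E9".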
Comment 5•21 years ago
window.unescape() is DOM Level 0
http://www.mozilla.org/docs/dom/domref/dom_window_ref123.html#1022042
document.write() is DOM HTML Levels 1 and 2:
http://www.w3.org/TR/2000/WD-DOM-Level-1-20000929/level-one-html.html#ID-75233634
http://www.w3.org/TR/2003/REC-DOM-Level-2-HTML-20030109/html.html#ID-75233634
document.write() should work on nothing but Unicode text.
window.unescape() unescapes a 2 digit (one octet) hex value. The escaping is
defined in RFC 2396:
http://www.apps.ietf.org/rfc/rfc2396.html#sec-2
There is no way to represent all Unicode characters in one octet, so unescape()
can't work directly on Unicode text. Its only option is to treat the binary data
it receives as encoded text, decode it into Unicode text using the encoding of
the file that carries the data, and return the result.
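The two stages described above can be sketched roughly as follows. This is an illustration of the description in this comment, not the actual Mozilla implementation; percentToBytes and decodeWithPageEncoding are hypothetical names, and TextDecoder is assumed to be available:

```javascript
// Hypothetical helper, stage 1: turn "%XX" escapes into raw octets.
// Characters outside an escape are taken as their (ASCII) code.
function percentToBytes(escaped) {
  const out = [];
  for (let i = 0; i < escaped.length; i++) {
    const hex = escaped.slice(i + 1, i + 3);
    if (escaped[i] === "%" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      out.push(parseInt(hex, 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      out.push(escaped.charCodeAt(i) & 0xff);
    }
  }
  return Uint8Array.from(out);
}

// Hypothetical helper, stage 2: decode the octets using the page encoding.
// { fatal: true } makes the decoder throw on ill-formed input.
function decodeWithPageEncoding(bytes, pageEncoding) {
  return new TextDecoder(pageEncoding, { fatal: true }).decode(bytes);
}

// Escaped UTF-8 bytes on a UTF-8 page decode cleanly.
const wellFormed = decodeWithPageEncoding(percentToBytes("caf%C3%A9"), "utf-8");

// "%E9" yields the single octet 0xE9, which stage 2 rejects on a UTF-8 page.
let stage2Failed = false;
try {
  decodeWithPageEncoding(percentToBytes("%E9"), "utf-8");
} catch (e) {
  stage2Failed = true;
}
```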
Here's a list of states that you want your "%E9" to go through during its life:
Binary Data->(Text decoder)->Unicode Text->(Parser)->MarkUp->(DOM)->JavaScript
Code->(JavaScript)->ASCII string of Escaped Binary Data->(Escape Sequence
Decoder)->Binary Data->(Text decoder)->Unicode
Text->(Parser)->MarkUp->(DOM)->Document text->(Renderer)->Screen image
The part that gets screwed up is:
ASCII string of Escaped Binary Data->(Escape Sequence Decoder)->Binary
Data->(Text decoder)->Unicode Text
Which is all done by window.unescape()
In particular, the text decoder inside it fails. It tries to use UTF-8 for
decoding, because your document indicates through its Content-Type tag that it
is UTF-8 encoded. The document also contains a UTF-8 BOM.
The %E9 octet (binary 11101001) has its first bit set, which in UTF-8 means that
it is part of a multibyte-encoded character. However, there are no other bytes
following it to complete the sequence, which makes it ill-formed UTF-8.
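To make the bit pattern concrete, here is a small sketch that reads the leading bits of a byte and reports how long a UTF-8 sequence starting with it should be. The helper utf8SequenceLength is made up for illustration and is deliberately simplified: it only looks at the bit prefix and does not reject lead bytes that UTF-8 forbids for other reasons (such as 0xC0 or 0xF5).

```javascript
// Hypothetical helper: expected length of a UTF-8 sequence that starts
// with `lead`, judged by its bit prefix only. Returns 0 when the byte
// cannot start a sequence (for example a 10xxxxxx continuation byte).
function utf8SequenceLength(lead) {
  if ((lead & 0x80) === 0x00) return 1; // 0xxxxxxx: plain ASCII
  if ((lead & 0xe0) === 0xc0) return 2; // 110xxxxx
  if ((lead & 0xf0) === 0xe0) return 3; // 1110xxxx
  if ((lead & 0xf8) === 0xf0) return 4; // 11110xxx
  return 0;
}

const lead = 0xe9;
const bits = lead.toString(2).padStart(8, "0"); // "11101001"
const expectedLength = utf8SequenceLength(lead); // 3
const missingBytes = expectedLength - 1;         // 2 continuation bytes absent
```

So 0xE9 announces a three-byte sequence, and the string in this bug supplies only the first byte of it.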
http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf#G11165
The bottom line: the text decoder inside window.unescape() fails because the
binary data you gave it is ill-formed UTF-8. Hence the error.
Marking INVALID.
Status: UNCONFIRMED → RESOLVED
Closed: 21 years ago
Resolution: --- → INVALID
Comment 6•21 years ago
This bug was valid and was fixed by the fix for bug 44272. Refer to that bug for
why this bug was valid. Reopening now. I'm going to resolve it as fixed in a moment.
Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
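For reference, this is what a current ECMAScript engine does with the input from this bug. The legacy escape()/unescape() pair defined by ECMAScript treats %XX as the code unit with that numeric value and never consults the page encoding. Whether this is exactly what the fix for bug 44272 changed is not stated in this bug, so treat the connection as an assumption:

```javascript
// unescape() maps "%E9" straight to the code unit 0xE9 (U+00E9),
// independent of the encoding of the surrounding page.
const result = unescape("%E9");
const codeUnit = result.charCodeAt(0); // 0xE9

// escape() is the inverse for code units below 256.
const back = escape(result); // "%E9"
```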
Comment 7•21 years ago
sorry for spamming, but this is the right thing to do :-)
Status: UNCONFIRMED → RESOLVED
Closed: 21 years ago → 21 years ago
Keywords: intl
Resolution: --- → FIXED