212308 - javascript error when using unescape function in UTF8

Reporter

Description

•

22 years ago

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624

The unescape function doesn't work when I use it in an HTML page that is in the UTF-8 charset.

Reproducible: Always

Steps to Reproduce:
1. Go to the URL http://195.25.243.18/ppp/unescape.html

Actual Results:
The page doesn't load, and I get this error in the JavaScript console:

Error: uncaught exception: [Exception... "Component returned failure code: 0x8000ffff (NS_ERROR_UNEXPECTED) [nsIDOMWindowInternal.unescape]" nsresult: "0x8000ffff (NS_ERROR_UNEXPECTED)" location: "JS frame :: http://195.25.243.18/ppp/unescape.html :: <TOP_LEVEL> :: line 7" data: no]

Boris Zbarsky [:bzbarsky]

Comment 1

•

22 years ago

That's because the string, once unescaped, is expected to be in the character encoding of the page (from which it can then be converted into the UTF-16 encoding that JavaScript uses natively and returned to the script). Your string, when unescaped, is not valid UTF-8. I assume it's valid ISO-8859-1 or something like that, but there is no reason for Mozilla to try that charset on a UTF-8 page...
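A minimal sketch of the check described above, written against the modern `TextDecoder` API (which did not exist when this bug was filed). It is not Mozilla's implementation; `percentDecodeToBytes` and `isValidUtf8` are hypothetical helper names, and the input `%E9` is taken from the later discussion in this bug.

```javascript
// Hypothetical helper: turn %XX escapes into raw bytes.
// Assumes any unescaped character in the input is plain ASCII.
function percentDecodeToBytes(s) {
  const bytes = [];
  for (let i = 0; i < s.length; i++) {
    const m = /^%([0-9A-Fa-f]{2})/.exec(s.slice(i, i + 3));
    if (m) {
      bytes.push(parseInt(m[1], 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      bytes.push(s.charCodeAt(i) & 0xff);
    }
  }
  return Uint8Array.from(bytes);
}

// Hypothetical helper: true if the bytes form well-formed UTF-8.
// With { fatal: true } the decoder throws instead of substituting U+FFFD.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(isValidUtf8(percentDecodeToBytes("%E9")));    // false
console.log(isValidUtf8(percentDecodeToBytes("%C3%A9"))); // true
```

The lone byte 0xE9 is "é" in ISO-8859-1 but is rejected as UTF-8, while the two-byte form `%C3%A9` is the UTF-8 encoding of the same character.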

Phil Schwartau

Comment 2

•

22 years ago

---> DOM for handling. In the browser, the DOM unescape() function supersedes the JS Engine implementation.

Assignee: rogerl → dom_bugs

Component: JavaScript Engine → DOM Level 0

QA Contact: pschwartau → ashishbhatt

Mercier Olivier

Reporter

Comment 3

•

21 years ago

The URL has changed: http://www.agdf.com/ppp/unescape.html What can I do to resolve this bug?

URL: http://195.25.243.18/ppp/unescape.html → http://www.agdf.com/ppp/unescape.html

Boris Zbarsky [:bzbarsky]

Comment 4

•

21 years ago

Don't mix separate charsets in the same page? I'm not sure what we can do here that would not break thousands of other pages, really...

The issue is that once we unescape the string, all we have is a sequence of _bytes_. To write it to the document, we need to convert the bytes to characters. To do this, we need to assume that the byte stream is a character string represented in some encoding. We have to guess the encoding. We guess, reasonably, that it's the same as the encoding of the page itself (which is what it is 99.99% of the time).

Any suggestions on what we should change there?
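To illustrate why the encoding has to be guessed, here is a small sketch in which the same bytes yield different characters under two assumptions. The helper names `decodeLatin1` and `decodeUtf8` are hypothetical; the Latin-1 path relies only on the fact that ISO-8859-1 maps each byte directly to the code point of the same value, and the UTF-8 path uses the modern `TextDecoder` API rather than anything Mozilla used at the time.

```javascript
// Hypothetical helper: ISO-8859-1 maps byte N to code point U+00NN.
function decodeLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// Hypothetical helper: strict UTF-8 decoding; throws on ill-formed input.
function decodeUtf8(bytes) {
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

const bytes = Uint8Array.from([0xc3, 0xa9]);

console.log(decodeUtf8(bytes));   // "é"  (one character, U+00E9)
console.log(decodeLatin1(bytes)); // "Ã©" (two characters, U+00C3 U+00A9)
```

Nothing in the byte sequence itself says which reading is intended, which is why the page's own encoding is used as the default guess.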

Alexey Chernyak

Comment 5

•

21 years ago

window.unescape() is DOM Level 0:
http://www.mozilla.org/docs/dom/domref/dom_window_ref123.html#1022042

document.write() is DOM HTML Levels 1 and 2:
http://www.w3.org/TR/2000/WD-DOM-Level-1-20000929/level-one-html.html#ID-75233634
http://www.w3.org/TR/2003/REC-DOM-Level-2-HTML-20030109/html.html#ID-75233634

document.write() should work on nothing but Unicode text. window.unescape() unescapes a 2-digit (one-octet) hex value. The escaping is defined in RFC 2396:
http://www.apps.ietf.org/rfc/rfc2396.html#sec-2

There is no way to represent all Unicode characters in 1 octet, so it can't work directly on Unicode text. The only option is to treat the binary data it receives as encoded text, decode it into Unicode text using the encoding of the file that carries the data, and return the result.

Here's the list of states that you want your "%E9" to go through in its life:

Binary Data->(Text decoder)->Unicode Text->(Parser)->MarkUp->(DOM)->JavaScript Code->(JavaScript)->ASCII string of Escaped Binary Data->(Escape Sequence Decoder)->Binary Data->(Text decoder)->Unicode Text->(Parser)->MarkUp->(DOM)->Document text->(Renderer)->Screen image

The part that gets screwed up is:

ASCII string of Escaped Binary Data->(Escape Sequence Decoder)->Binary Data->(Text decoder)->Unicode Text

This is all done by window.unescape(). In particular, the text decoder inside it fails. It tries to use UTF-8 for decoding, because your document indicates through its Content-Type tag that it is UTF-8 encoded. The document also contains a UTF-8 BOM. The %E9 octet has its first bit set, which in UTF-8 means that it is part of a multibyte-encoded character. However, there are no other bytes complementing it, which makes it ill-formed UTF-8.
http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf#G11165

The bottom line: the text decoder inside window.unescape() fails because the binary data you gave it is ill-formed UTF-8. Thus the error.

Marking INVALID.
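The point about the %E9 octet can be made concrete by looking at how a UTF-8 lead byte announces the length of its sequence. The sketch below is an illustration only; `utf8SequenceLength` is a hypothetical helper, and the byte ranges follow the well-formed UTF-8 table in the Unicode Standard.

```javascript
// Hypothetical helper: number of bytes in the UTF-8 sequence that a given
// lead byte starts, or 0 if the byte cannot start a sequence.
function utf8SequenceLength(leadByte) {
  if (leadByte <= 0x7f) return 1;                     // 0xxxxxxx: ASCII
  if (leadByte >= 0xc2 && leadByte <= 0xdf) return 2; // 110xxxxx
  if (leadByte >= 0xe0 && leadByte <= 0xef) return 3; // 1110xxxx
  if (leadByte >= 0xf0 && leadByte <= 0xf4) return 4; // 11110xxx
  return 0; // continuation byte (10xxxxxx) or a value never valid as a lead
}

console.log((0xe9).toString(2));       // "11101001"
console.log(utf8SequenceLength(0xe9)); // 3
```

Since 0xE9 starts a three-byte sequence, a strict UTF-8 decoder expects two continuation bytes after it; with none present, the input is ill-formed.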

Status: UNCONFIRMED → RESOLVED

Closed: 21 years ago

Resolution: --- → INVALID

Boris Zbarsky [:bzbarsky]

Updated

•

21 years ago

Depends on: 44272

Jungshik Shin

Comment 6

•

21 years ago

This bug was valid and was fixed by the fix for bug 44272. Refer to that bug for why this bug was valid. Reopening now. I'm gonna resolve it as fixed in a moment.

Status: RESOLVED → UNCONFIRMED

Resolution: INVALID → ---

Jungshik Shin

Comment 7

•

21 years ago

sorry for spamming, but this is the right thing to do :-)

Status: UNCONFIRMED → RESOLVED

Closed: 21 years ago → 21 years ago

Keywords: intl

Resolution: --- → FIXED

Categories

(Core :: DOM: Core & HTML, defect)

People

(Reporter: omercier, Unassigned)

Details

(Keywords: intl)

Security

(public)
