Closed Bug 200984 Opened 22 years ago Closed 21 years ago

[FIX]Arabic text in Javascript "unescape" function returns the wrong output

Categories

(Core :: Internationalization, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla1.4beta

People

(Reporter: neokuwait, Assigned: bzbarsky)

References

Details

Attachments

(3 files, 2 obsolete files)

Gecko/20030328 TO REPRODUCE: 1. view testcase ACTUAL: the unescape function produces Latin characters instead of Arabic EXPECTED: only characters in their %xx.. hexadecimal form should be affected? (see IE6)
Attached file unescape testcase (deleted) —
Intl -- not arabic specific.
Assignee: mkaply → smontagu
Status: UNCONFIRMED → NEW
Component: BiDi Hebrew & Arabic → Internationalization
Ever confirmed: true
OS: Windows XP → All
QA Contact: zach → ylong
Hardware: PC → All
Attached patch Proposed patch (obsolete) (deleted) — Splinter Review
Attached patch Oops, that had an extra chunk in it... (obsolete) (deleted) — Splinter Review
Attachment #119646 - Attachment is obsolete: true
Comment on attachment 119647 [details] [diff] [review] Oops, that had an extra chunk in it... Would you review? The key here is that ToNewCString does lossy "conversion" to a CString.... This patch makes the conversion use the proper charset encoder, does a bit of cleanup, and removes the unnecessary call to Reset() (new decoders already come blank).
Attachment #119647 - Flags: superreview?(jst)
Attachment #119647 - Flags: review?(smontagu)
mine.
Assignee: smontagu → bzbarsky
Priority: -- → P1
Summary: Arabic text in Javascript "unescape" function returns the wrong output → [FIX]Arabic text in Javascript "unescape" function returns the wrong output
Target Milestone: --- → mozilla1.4beta
Comment on attachment 119647 [details] [diff] [review] Oops, that had an extra chunk in it... // Allocate a buffer of the maximum length PRUnichar *dest = (PRUnichar*)nsMemory::Alloc(sizeof(PRUnichar) * maxLength); + NS_ENSURE_TRUE(dest, NS_ERROR_OUT_OF_MEMORY); + ... + rv = decoder->Convert(encodedData, &unescapedByteCount, dest, &destLen); + if (NS_FAILED(rv)) { nsMemory::Free(dest); - return result; + return rv; } aReturn.Assign(dest, destLen); nsMemory::Free(dest); Could you make dest an nsAutoPtr<PRUnichar *> here? If not, you may want to flip the code around so that you need only one call to nsMemory::Free(dest)... sr=jst
Attachment #119647 - Flags: superreview?(jst) → superreview+
nsAutoPtr will call "delete" on the pointer, so that's no good here (unless I actually do a "new PRUnichar[]" and cast or something like that.... I'll flip the test around; good catch.
Before I review, can you quiet my suspicions that this is a non-problem? Specifically, is non-ASCII text legal input to unescape(), and does calling unescape() with such text have defined output?
... and another question (possibly for another bug) : is %DA%D1%C8%ED the correct escaping of the string in the testcase? Why not %0639%0631%0628%064A? Or possibly %D8%B9%D8%B1%D8%A8%D9%89?
Comment on attachment 119647 [details] [diff] [review] Oops, that had an extra chunk in it... OK, I accept that we need this for compatibility, although we need a new bug to consider what the correct behaviour is, especially of escape(). Opera, IE, and Konqueror all render the second line of the testcase as: escaped: %u0639%u0631%u0628%u064A%20%u0639%u0631%u0628%u064A >+ // To gracefully deal with encoding issues, we have to do the following: Nit: don't split infinitives r=smontagu
Attachment #119647 - Flags: review?(smontagu) → review+
Comment on attachment 119647 [details] [diff] [review] Oops, that had an extra chunk in it... Waldemar, Simon asked me to make sure you were OK with this patch.. could you take a look, please?
Attachment #119647 - Flags: review+ → review?(waldemar)
The "new bug" I requested in comment 11 would be a dupe of bug 44272 :-)
I'll take a look at what the patch does. I'm currently building Mozilla so I can try it.
Attached file UTF-16 test case (deleted) —
I'm not familiar with the character set used in the first test case, so I used a UTF-16 test case. encodeURI/decodeURI work as expected, but escape and unescape don't produce visible output with the patch applied. Hmmmm.....
Comment on attachment 120504 [details] UTF-16 test case With this testcase, "escape" produces no output even without my patch... The problem is the use of flat char* buffers to handle UTF-16 text, of course. Things die at the first null, which is the first byte if your endianness is right. I suppose I could go ahead and rewrite all that stuff in nsGlobalWindow to not be broken like that....
Attachment #120504 - Attachment mime type: text/utf16 → text/html
ccing some intl people for opinions
Asking the more general question: why do the JavaScript 'escape' and 'unescape' functions care what the document character set is? They are always passed UTF-16 from the JS engine and should send UTF-16 back to it. 'encodeURI' and 'decodeURI' don't care and work properly.
> why do the JavaScript 'escape' and 'unescape' functions care what the document > character set is? Because that's the behavior real web sites expect. In particular, one will often have a site escape() a string, then on the server unescape it and expect it to be in the document encoding.... Also, some sites will escape a string server-side, converting a string in document encoding to %-escapes. When unescaped, we get a byte sequence that's in the document encoding; to give it back to the JS engine we have to convert it to UTF-16.
Blocks: 126524
Waldemar, please see comment 19?
Attached patch Updated to tip (deleted) — Splinter Review
Attachment #119647 - Attachment is obsolete: true
Attachment #119647 - Flags: review?(waldemar)
Attachment #125691 - Attachment filename: &#29696;&#25856;&#29440;&#29696;&#11776;&#28672;&#24832;&#29696;&#25344;&#26624; → test.patch
Attachment #125691 - Flags: review?(waldemar)
Attachment #125691 - Flags: review?(waldemar) → review+
Checked in.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Verified fixed in 06-17 trunk build on WinXP except the problem in bug 44272.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: