Closed Bug 116143 Opened 23 years ago Closed 23 years ago

windows-1252 conversion is not round trip

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

RESOLVED FIXED
mozilla0.9.8

People

(Reporter: ftang, Assigned: ftang)

Details

Attachments

(1 file)

This is true for other encodings as well; we may want to fix all single-byte encodings to make them round trip for CGI. This bug is for windows-1252.
Here is the patch from bug 87736 to make it round trip: http://bugzilla.mozilla.org/attachment.cgi?id=40024&action=view Could we get r= on this patch regardless of whether we want to fix 87736 or not?
Assignee: yokoyama → ftang
Target Milestone: --- → mozilla0.9.8
Status: NEW → ASSIGNED
The above URL contains test cases created by bclary. Here is the explanation of the cases:
** These files contain a form with hidden field values. The hidden field values can contain every possible 8-bit value, including control characters. When the page is loaded, the hidden values are sent to an echo script, which is currently Netscape-internal. We may substitute an echo script on an external server. This type of technique is apparently used in secure authentication, and we have had 2 inquiries about this problem just this week. These sites generate random encrypted values for login names or other key values and then send these values back to the server.
** Explanation of files **
The hidden field values are identical in all the test cases below:
x5Ex89x7Ex7Fx80x81xA4x82xA5x83x84x85x86x87x88x89x8Ax8Bx8Cx8Dx8E
x90x91x92x93x94x95x96x97x98x99x9Ax9Bx9Cx9Dx9Ex9FxA2xA3xA4xA5xA6
xA7xA8xA9xAAxABxACxADxAExAFxB0xB1xB2xB3xB4xB5xA7xB6xB7xB8xB9xBA
xBBxBCxBDxBExBFxC5xC6xC7xCBxCCxD0xD1x3DxD6xD7xD8xDCxDDxDExDFxE5
xE6xE7xEBxECxF0xF1x83xC7xF6xF7xF8xFCxFDxFExF
1. iso-8859-1.html: This file has meta charset info indicating that it is in ISO-8859-1.
2. windows-1252.html: The same hidden value as case 1, except that this page has meta charset info indicating that it is in Windows-1252.
3. nocharset.html: The same hidden value as case 1, except that this page has NO meta charset info. Use "User-defined" encoding for testing.
4. Client-nocharset.html: The same hidden value as case 1, except that this page has NO meta charset info. There is also client-side escaping of all 8-bit values.
Our initial test shows that the 2001-12-19 win32 trunk build fails all 4 test cases. For cases 1 - 3, the echo buffer shows only about 10 characters or fewer, with many of them missing. Each case returns somewhat different results.
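To illustrate what "client-side escaping of all 8-bit values" in case 4 could look like, here is a minimal Python sketch. It is an assumption about the general technique (percent-escaping the raw bytes so only ASCII travels through the form), not the script used in Client-nocharset.html; the function names are made up for the example.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def escape_form_value(raw: bytes) -> str:
    """Percent-escape a raw byte string (hypothetical helper).

    safe="" means no extra characters are exempted, so every byte
    outside ASCII letters, digits and "_.-~" becomes %XX.
    """
    return quote_from_bytes(raw, safe="")


def unescape_form_value(escaped: str) -> bytes:
    """Reverse escape_form_value on the server side (hypothetical helper)."""
    return unquote_to_bytes(escaped)


# First few bytes of the hidden field value from the test cases.
sample = bytes([0x5E, 0x89, 0x7E, 0x7F, 0x80, 0x81, 0xA4])
escaped = escape_form_value(sample)

# The escaped form is pure ASCII, so no charset conversion can damage it.
assert all(ord(ch) < 0x80 for ch in escaped)
assert unescape_form_value(escaped) == sample
```

Because the escaped value contains only ASCII, it does not depend on the page's charset converter at all, which would be consistent with the later observation that client-side escaping works with or without the patch.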
Forgot to add the URL.
As I said above, the 2001-12-19 Win32 trunk build truncates the input values to 10 or fewer characters in the first 3 cases. I tried the Netscape 6.2.1-RTM build with the above test cases and the results are much better. For the iso-8859-1 page, it fails to convert characters that do not exist in this code page (and also x7F) and turns them into x3F (?). But all bytes are there. For windows-1252, the results are better than in the iso-8859-1 case because it also processes the values x80 - x9F. The test showed that it missed only 1 character in conversion, i.e. x81, which is probably still undefined for this encoding. If we use User-defined encoding on the 'nocharset.html' test case, the results are correct and preserve all values in the echo buffer. Comparing these results shows that there has been a regression in this area between the 0.9.4 branch (NS 6.2.1) and the latest trunk.
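The "turned into x3F (?)" behavior can be reproduced outside the browser. The sketch below uses Python's built-in cp1252 codec, which is an assumption for illustration only: its mapping table is not Mozilla's converter, so the set of undefined bytes it reports need not match what the builds above showed. The helper names are hypothetical.

```python
def undefined_bytes(encoding: str) -> list:
    """Return the byte values that the given codec refuses to decode."""
    missing = []
    for b in range(256):
        try:
            bytes([b]).decode(encoding)
        except UnicodeDecodeError:
            missing.append(b)
    return missing


def lossy_round_trip(raw: bytes, encoding: str) -> bytes:
    """Decode and re-encode, replacing anything unmappable.

    Undefined bytes decode to U+FFFD, and U+FFFD re-encodes as b"?",
    so the length is preserved but the original value is lost.
    """
    text = raw.decode(encoding, errors="replace")
    return text.encode(encoding, errors="replace")


# In Python's cp1252 table, five bytes in x80 - x9F have no mapping.
print([hex(b) for b in undefined_bytes("cp1252")])

# x80 and x82 survive; x81 comes back as "?" (x3F).
print(lossy_round_trip(b"\x80\x81\x82", "cp1252"))
```

This is the non-truncating failure mode: every byte position is still present in the echo, but an undefined byte cannot be recovered by the server.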
I tried the patch at: http://bugzilla.mozilla.org/attachment.cgi?id=40024&action=view on the current trunk build compiled from source tonight. The 4 test cases at: http://bclary.com/dia all produced the same result: they correctly reflected that all the characters in the form hidden field value were sent to the server. The client-side escaping seems to be working with or without this patch. So this patch produces the desired result. There are 2 remaining issues: 1. One other Western encoding that is likely to be used by Latin 1 web sites is ISO-8859-15. We don't have a patch for that encoding yet. ISO-8859-1 will be taken care of by the above patch. I will file a separate bug for ISO-8859-15. It should be fixed in the next milestone. 2. In cases where Mozilla found characters that are undefined in a certain code page, form submission truncated everything after that character, as reported above in comment 2 for the current trunk builds. Such truncation does not happen for 0.9.4 builds. This truncation is likely to occur in other encodings even if we take care of 8859-1, 8859-15 and Windows-1252. I will file a separate bug for this.
Surprisingly, when I tested the 2 remaining problem scenarios with encodings other than Western ones, such as 8859-7, 8859-5, etc., I found that there was no truncation when I used the patched build. I have not tested fully yet, but can ftang explain this result?
momoi, let's move the other charsets to a separate bug and discuss them there. Let's keep the comments on this bug report clear for landing the current patch to fix the 1252 issue. nhotta, can you r= it?
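For readers who want to see what a round-trip-safe single-byte conversion means in practice, here is a minimal Python sketch. It is not taken from the attached patch. It assumes one common approach, which is to pass each undefined byte through as the Unicode code point with the same value and to map that code point back to the same byte on output. The function names are hypothetical, and Python's cp1252 codec stands in for the browser's converter.

```python
def decode_round_trip(raw: bytes) -> str:
    """Decode cp1252, passing undefined bytes through unchanged.

    Assumption: an undefined byte 0xNN becomes code point U+00NN.
    """
    out = []
    for b in raw:
        try:
            out.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            out.append(chr(b))
    return "".join(out)


def encode_round_trip(text: str) -> bytes:
    """Inverse of decode_round_trip for text it produced."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            if ord(ch) < 0x100:
                # Pass-through code point: emit the same-valued byte.
                out.append(ord(ch))
            else:
                raise
    return bytes(out)


# Every possible 8-bit value survives decode followed by encode.
all_bytes = bytes(range(256))
assert encode_round_trip(decode_round_trip(all_bytes)) == all_bytes
```

With a mapping of this shape, a hidden field containing arbitrary 8-bit values can be decoded by the page's charset and re-encoded on submission without losing or truncating any byte.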
Blocks: 104056
Looks fine. r=shanjian
Blocks: 104148
No longer blocks: 104056
Comment on attachment 62891: the same patch r=shanjian without mpl change
Recording r=shanjian too. sr=brendan@mozilla.org. /be
Attachment #62891 - Flags: superreview+
Attachment #62891 - Flags: review+
Blocks: 104160
No longer blocks: 104148
Blocks: 104060
No longer blocks: 104160
Fixed and checked in. File a separate bug for the other charsets.
Fixed and checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Let me take this QA work.
QA Contact: teruko → momoi
No longer blocks: 104060