Closed Bug 179026 Opened 22 years ago Closed 22 years ago

URL parameter containing non-ASCII characters is not parsed correctly

Categories

(Core :: Internationalization, defect)

x86
Windows 2000
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla1.3alpha

People

(Reporter: momoi, Assigned: darin.moz)

References

()

Details

(Keywords: regression, topembed)

Attachments

(1 file)

** Observed with 2002-11-06 1.0.2 branch and trunk builds ** ** This problem does not occur with 1.0.1 branch builds such as Netscape 7-rtm. ** 1. Go to the above chat site in China. 2. Click on any chat channels using the latest trunk or 1.0.2 branch build. 3. It cannot find the chat room in question unless the URL parameter for channel contains an ASCII character. This does happen with 1.0.1 branch builds such as Mozilla 1.0 or Netscape releases. These channels are accessed by an HREF URL such as the following: (View this under Chinese (GB2312) encoding. HREF="javachat/login.php?channel=»¨¼¾Óê¼¾" The URL is sent to the server side script with a channel name appended at the end. With the latest trunk and 1.0.2 branch builds, no character after "channel=" seems to be processed. So the resulting URL is: http://chat.sohu.com/javachat/login.php?channel= with no channel name. On this page there are a few channels that can be found and all of them contain at least 1 ASCII character as part of the channel name, e.g.: HREF="subclass.php?classname=boys" HREF="javachat/login.php?channel=³õÁµÔÚ14" Note the 2nd example. If the channel name is displayed in the status bar, that channel can be found when the link is clicked on. If not, it cannot be. The status bar displays the channel name correctly in the 2nd case above -- it seems that ASCII characters "14" seems to make this possible. The opening of chat channels at this chat site critically depends on processing the channel parameter correctly. This is a regression from 1.0.1 branch releases.
This is needed for embedding customers and should also be fixed ASAP for the 1.0.2. branch. Nominating fro nsbeta1 and topembed. Note that this may be related to Bug 169388. Severity is critical.
Severity: normal → critical
Keywords: nsbeta1, topembed
nhotta
Assignee: smontagu → nhotta
The Chinese channel name doesn't show in URL and staus bar is a regression. But by clicking any channel name, after I login as a random guest name, I always get the same chat page no matter which channel name that I clicked: http://chat.sohu.com/javachat/main.php this is the same result with IE. So, looks like it doesn't matter with whether the channel name is displayed or not in URL bar, I'll always get the same chat page at last. IE, N7.0RTM and latest branch or trunk build has the same result.
Keywords: regression
Sorry, there was a mistaken in my previous comment: In 11-07 branch or trunk build, when I click the Chinese channel name, I'll have the page: http://chat.sohu.com/javachat/main.php But with error message: "²ÎÊý´íÎó£¬Ã»ÓÐÖ¸¶¨·¿¼äÃû»òÕßÓû§êdzƣ¬Çë·µ»ØÖØнøÈë". If I enter channel name "³õÁµÔÚ14", then I'll see the chat room same as in IE.
Target Milestone: --- → mozilla1.3alpha
The regression started to happen from 10/31 1.0 branch build. The last good build is 10/30 branch build. Starting from 10/31, the escaped channel name is not shown in url bar when clicking on a channel, and "wrong parameter" error occurs in this case.
I reverted a change for nsURLParsers.cpp checked in 10/30 and the problem was fixed. The problem was that non ASCII characters after '?' were truncated. http://chat.sohu.com/javachat/login.php?channel=%B4%BA%CC%EC%BB%A8%BB%E1%BF%AA I think this is caused by bug 169902. Reassign to dbradley%netscape.com.
Assignee: nhotta → dbradley
Keywords: adt1.0.2
This looks like it worked before the 169902 fix by pure accident. p happens to be a char rather than unsigned char so the previous *p > 0 test (I'm assuming a test for reaching the end of the string) stopped truncation at an intl char with the high bit turned on. Now that there's a different test for the string termination all intl chars appear to be less than a control character. Are URLs guaranteed to be UTF8? If some other multibyte character set is allowed I don't think you can just strip bytes since a second byte might fall in the control character range. or maybe they all avoid that range and you're safe.
Attached patch v1 patch (deleted) — Splinter Review
cast to unsigned char...
dveditz: yes, the input to these URL parsing functions must be UTF-8, so this filtering is valid (if done correctly). -> me
Assignee: dbradley → darin
When saw this bug, I was wondering the same thing if URL's were always UTF8. - for (p = spec + specLen - 1; (*p > 0) && (*p <= ' '); --p); + for (p = spec + specLen - 1; (*p <= ' ') && (p != spec); --p); Yeah, I think I misunderstood the *p > 0 as looking for a terminator. It really wasn't. The comment could have been a little more explicit about *p > 0 I'm not well versed on UTF8, but I'm wondering what constitues a control character. I'm just a little worried that (unsigned char)*p <= ' ' may not be enough. Since we're going character by character through this, we may be in a sequence of valid non-whitespace characters that evaluate to true, depending on where p is positioned.
dbradley: w/ UTF-8, each non-ASCII character has the high-bit (the 8-th bit) set. as a result, the expression: ((unsigned char) *p <= ' ') should not evaluate to true if the byte (*p) is part of a UTF-8 multibyte character. (this is one of the properties of UTF-8 that make it so attractive.)
Comment on attachment 105740 [details] [diff] [review] v1 patch r=dbradley Ok, that makes sense then. It needs to get checked into the branch as well.
Attachment #105740 - Flags: review+
*** Bug 177730 has been marked as a duplicate of this bug. ***
*** Bug 179143 has been marked as a duplicate of this bug. ***
Comment on attachment 105740 [details] [diff] [review] v1 patch sr=dveditz
Attachment #105740 - Flags: superreview+
Status: NEW → ASSIGNED
Keywords: mozilla1.0.2
This is a pretty severe bug that will break many sites. Definitely needs to make it into blackbird.
I see the patch was checked in to the trunk. I tested my local build and the problem for http://chat.sohu.com/ is fixed.
On the 1.0.2 branch we'd like the bogus 169902 patch backed out instead. It looks like the real bug in 169902 was not applicable to the branch (no FilterString) and only the hitchhiker from comment 3 was checked into the branch. Although it should be equivalent, we're more comfortable with the code we know worked until 10/31.
adt plussing to back out 169902 patch as dveditz suggests in comment 18
Keywords: adt1.0.2adt1.0.2+
The 11-11 trunk build seems already has this fixed: 1. http://chat.suho.com, the chinese character parameter channel name is escaped in URL bar, and I don't get the wrong parameter error message any more. 2. The cases montioned in dup. bug 179143 were fixed: local and ftp file system are displayed/opened properly. 3. Can open the URL that including czech char in the test case of dup. bug 177730 I have tested on WinXP-SC, linux RH7.2-JA, Mac 10.1.5 and Mac 9.2.1.
marking FIXED... dbradley: thanks for checking in the patch!
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Marking as verified per comment #20.
Status: RESOLVED → VERIFIED
Comment on attachment 105740 [details] [diff] [review] v1 patch a=asa for checkin to the 1.2 branch (on behalf of drivers)
Attachment #105740 - Flags: approval+
The only issue with backing out on the 1.0 branch, is this will reintroduce the slight risk of a buffer under run that existed before. As long as everyone is fine with that, I'll back it out today.
i'd prefer not re-introducing the buffer under run. what's so risky about the current patch? seems a lot safer than a buffer under run.
From adt: ok, check in the new patch (after getting driver's approval).
What's the impact of the current fix, i.e. are there any specific areas QA should focus on testing in order to make sure that there are no regressions introduced? Thanks.
a=rjesup@wgate.com for 1.0 branch; change mozilla1.0.2+ to fixed1.0.2 when checked in
fixed1.0.2
fixed1.2 revision 1.19.2.1 date: 2002/11/11 20:16:23; author: darin%netscape.com; state: Exp; lines: +1 -1 fixes bug 179026 "URL parameter containing non-ASCII characters is not parsed correctly" r=dbradley sr=dveditz a=asa
Keywords: fixed1.2
Verified fixed on 11-12-13 branch build, change "fixed1.0.2" to "verified1.0.2".
Verified fixed in 11-26 1.2 build / WinME. Change "fixed1.2" to "verified1.2".
Keywords: fixed1.2verified1.2
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: