Closed
Bug 179026
Opened 22 years ago
Closed 22 years ago
URL parameter containing non-ASCII characters is not parsed correctly
Categories
(Core :: Internationalization, defect)
Tracking
()
VERIFIED
FIXED
mozilla1.3alpha
People
(Reporter: momoi, Assigned: darin.moz)
References
()
Details
(Keywords: regression, topembed)
Attachments
(1 file)
(deleted),
patch
|
dbradley
:
review+
dveditz
:
superreview+
asa
:
approval+
|
Details | Diff | Splinter Review |
** Observed with 2002-11-06 1.0.2 branch and trunk builds **
** This problem does not occur with 1.0.1 branch builds such as
Netscape 7-rtm. **
1. Go to the above chat site in China.
2. Click on any chat channels using the latest trunk or
1.0.2 branch build.
3. It cannot find the chat room in question unless the
URL parameter for channel contains an ASCII character.
This does happen with 1.0.1 branch builds such as Mozilla 1.0 or
Netscape releases.
These channels are accessed by an HREF URL such as the
following: (View this under Chinese (GB2312) encoding.
HREF="javachat/login.php?channel=»¨¼¾Óê¼¾"
The URL is sent to the server side script with a channel
name appended at the end.
With the latest trunk and 1.0.2 branch builds, no
character after "channel=" seems to be processed.
So the resulting URL is:
http://chat.sohu.com/javachat/login.php?channel=
with no channel name.
On this page there are a few channels that can be found and
all of them contain at least 1 ASCII character as part of
the channel name, e.g.:
HREF="subclass.php?classname=boys"
HREF="javachat/login.php?channel=³õÁµÔÚ14"
Note the 2nd example. If the channel name is displayed
in the status bar, that channel can be found when the
link is clicked on. If not, it cannot be.
The status bar displays the channel name correctly in
the 2nd case above -- it seems that ASCII characters "14"
seems to make this possible.
The opening of chat channels at this chat site critically
depends on processing the channel parameter correctly.
This is a regression from 1.0.1 branch releases.
Reporter | ||
Comment 1•22 years ago
|
||
This is needed for embedding customers and should also be fixed
ASAP for the 1.0.2. branch. Nominating fro nsbeta1 and topembed.
Note that this may be related to Bug 169388.
Severity is critical.
Comment 3•22 years ago
|
||
The Chinese channel name doesn't show in URL and staus bar is a regression.
But by clicking any channel name, after I login as a random guest name, I always
get the same chat page no matter which channel name that I clicked:
http://chat.sohu.com/javachat/main.php
this is the same result with IE.
So, looks like it doesn't matter with whether the channel name is displayed or
not in URL bar, I'll always get the same chat page at last. IE, N7.0RTM and
latest branch or trunk build has the same result.
Keywords: regression
Comment 4•22 years ago
|
||
Sorry, there was a mistaken in my previous comment:
In 11-07 branch or trunk build, when I click the Chinese channel name, I'll have
the page:
http://chat.sohu.com/javachat/main.php
But with error message: "²ÎÊý´íÎó£¬Ã»ÓÐÖ¸¶¨·¿¼äÃû»òÕßÓû§êdzƣ¬Çë·µ»ØÖØнøÈë".
If I enter channel name "³õÁµÔÚ14", then I'll see the chat room same as in IE.
Updated•22 years ago
|
Target Milestone: --- → mozilla1.3alpha
The regression started to happen from 10/31 1.0 branch build. The last good
build is 10/30 branch build. Starting from 10/31, the escaped channel name is
not shown in url bar when clicking on a channel, and "wrong parameter" error
occurs in this case.
Comment 6•22 years ago
|
||
I reverted a change for nsURLParsers.cpp checked in 10/30 and the problem was
fixed.
The problem was that non ASCII characters after '?' were truncated.
http://chat.sohu.com/javachat/login.php?channel=%B4%BA%CC%EC%BB%A8%BB%E1%BF%AA
I think this is caused by bug 169902.
Reassign to dbradley%netscape.com.
Assignee: nhotta → dbradley
Comment 7•22 years ago
|
||
This looks like it worked before the 169902 fix by pure accident. p happens to
be a char rather than unsigned char so the previous *p > 0 test (I'm assuming a
test for reaching the end of the string) stopped truncation at an intl char with
the high bit turned on. Now that there's a different test for the string
termination all intl chars appear to be less than a control character.
Are URLs guaranteed to be UTF8? If some other multibyte character set is allowed
I don't think you can just strip bytes since a second byte might fall in the
control character range. or maybe they all avoid that range and you're safe.
Assignee | ||
Comment 8•22 years ago
|
||
cast to unsigned char...
Assignee | ||
Comment 9•22 years ago
|
||
dveditz: yes, the input to these URL parsing functions must be UTF-8, so this
filtering is valid (if done correctly).
-> me
Assignee: dbradley → darin
Comment 10•22 years ago
|
||
When saw this bug, I was wondering the same thing if URL's were always UTF8.
- for (p = spec + specLen - 1; (*p > 0) && (*p <= ' '); --p);
+ for (p = spec + specLen - 1; (*p <= ' ') && (p != spec); --p);
Yeah, I think I misunderstood the *p > 0 as looking for a terminator. It really
wasn't. The comment could have been a little more explicit about *p > 0
I'm not well versed on UTF8, but I'm wondering what constitues a control
character. I'm just a little worried that (unsigned char)*p <= ' ' may not be
enough. Since we're going character by character through this, we may be in a
sequence of valid non-whitespace characters that evaluate to true, depending on
where p is positioned.
Assignee | ||
Comment 11•22 years ago
|
||
dbradley: w/ UTF-8, each non-ASCII character has the high-bit (the 8-th bit)
set. as a result, the expression:
((unsigned char) *p <= ' ')
should not evaluate to true if the byte (*p) is part of a UTF-8 multibyte
character. (this is one of the properties of UTF-8 that make it so attractive.)
Comment 12•22 years ago
|
||
Comment on attachment 105740 [details] [diff] [review]
v1 patch
r=dbradley
Ok, that makes sense then. It needs to get checked into the branch as well.
Attachment #105740 -
Flags: review+
Comment 13•22 years ago
|
||
*** Bug 177730 has been marked as a duplicate of this bug. ***
Comment 14•22 years ago
|
||
*** Bug 179143 has been marked as a duplicate of this bug. ***
Comment 15•22 years ago
|
||
Comment on attachment 105740 [details] [diff] [review]
v1 patch
sr=dveditz
Attachment #105740 -
Flags: superreview+
Assignee | ||
Updated•22 years ago
|
Status: NEW → ASSIGNED
Keywords: mozilla1.0.2
Comment 16•22 years ago
|
||
This is a pretty severe bug that will break many sites. Definitely needs to make
it into blackbird.
Comment 17•22 years ago
|
||
I see the patch was checked in to the trunk.
I tested my local build and the problem for http://chat.sohu.com/ is fixed.
Comment 18•22 years ago
|
||
On the 1.0.2 branch we'd like the bogus 169902 patch backed out instead. It
looks like the real bug in 169902 was not applicable to the branch (no
FilterString) and only the hitchhiker from comment 3 was checked into the branch.
Although it should be equivalent, we're more comfortable with the code we know
worked until 10/31.
Comment 19•22 years ago
|
||
adt plussing to back out 169902 patch as dveditz suggests in comment 18
Comment 20•22 years ago
|
||
The 11-11 trunk build seems already has this fixed:
1. http://chat.suho.com, the chinese character parameter channel name is escaped
in URL bar, and I don't get the wrong parameter error message any more.
2. The cases montioned in dup. bug 179143 were fixed: local and ftp file system
are displayed/opened properly.
3. Can open the URL that including czech char in the test case of dup. bug 177730
I have tested on WinXP-SC, linux RH7.2-JA, Mac 10.1.5 and Mac 9.2.1.
Assignee | ||
Comment 21•22 years ago
|
||
marking FIXED... dbradley: thanks for checking in the patch!
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Comment 23•22 years ago
|
||
Comment on attachment 105740 [details] [diff] [review]
v1 patch
a=asa for checkin to the 1.2 branch (on behalf of drivers)
Attachment #105740 -
Flags: approval+
Comment 24•22 years ago
|
||
The only issue with backing out on the 1.0 branch, is this will reintroduce the
slight risk of a buffer under run that existed before. As long as everyone is
fine with that, I'll back it out today.
Assignee | ||
Comment 25•22 years ago
|
||
i'd prefer not re-introducing the buffer under run. what's so risky about the
current patch? seems a lot safer than a buffer under run.
Comment 26•22 years ago
|
||
From adt: ok, check in the new patch (after getting driver's approval).
Comment 27•22 years ago
|
||
What's the impact of the current fix, i.e. are there any specific areas QA
should focus on testing in order to make sure that there are no regressions
introduced? Thanks.
Comment 28•22 years ago
|
||
a=rjesup@wgate.com for 1.0 branch; change mozilla1.0.2+ to fixed1.0.2 when
checked in
Keywords: mozilla1.0.2 → mozilla1.0.2+
Assignee | ||
Comment 30•22 years ago
|
||
fixed1.2
revision 1.19.2.1
date: 2002/11/11 20:16:23; author: darin%netscape.com; state: Exp; lines: +1 -1
fixes bug 179026 "URL parameter containing non-ASCII characters is not
parsed correctly" r=dbradley sr=dveditz a=asa
Keywords: fixed1.2
Comment 31•22 years ago
|
||
Verified fixed on 11-12-13 branch build, change "fixed1.0.2" to "verified1.0.2".
Keywords: fixed1.0.2 → verified1.0.2
Comment 32•22 years ago
|
||
Verified fixed in 11-26 1.2 build / WinME. Change "fixed1.2" to "verified1.2".
Keywords: fixed1.2 → verified1.2
You need to log in
before you can comment on or make changes to this bug.
Description
•