Closed
Bug 117422
Opened 23 years ago
Closed 23 years ago
Using Turkish character "I" in edit boxes causes data loss after submitting the form
Categories
(Core :: Internationalization, defect, P2)
Core
Internationalization
Tracking
()
VERIFIED
FIXED
mozilla0.9.9
People
(Reporter: mdakin, Assigned: ftang)
References
Details
(Keywords: dataloss)
Attachments
(1 file, 2 obsolete files)
(deleted),
patch
|
john
:
review+
jst
:
superreview+
|
Details | Diff | Splinter Review |
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.7+)
Gecko/20011221
BuildID: 20011221
If I use turkish character "small I (i without the dot)" in edit boxes or fields
in forms, I cannot see the the rest of thing I have written after that letter
after submitting it, all that data is lost.
Situation doent change even if I change character coding for composition to
Turkish ISO 8859-9
Reproducible: Always
Steps to Reproduce:
1. Open a page with data entry (like bugzilla, or any forum pages)
2. write something with Turkish letter "small I"
Actual Results: dataloss after "small I" character
Expected Results: I could see the things I have written correctly after submission
Reporter | ||
Comment 1•23 years ago
|
||
Test case:
This sentence is written without "small I" character
This sentence is written with "small ý" character
Reporter | ||
Comment 2•23 years ago
|
||
Well, it seems working now :)
but I swear it wasnt working.. (in linux and windows)
Turkish character test: üðþýçö ÜÐÞÝÇÖ
Reporter | ||
Comment 3•23 years ago
|
||
Ok it seems working somehow, closing the bug for now.
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME
Comment 5•23 years ago
|
||
trying with
Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.7+) Gecko/20011228
No turkish characters:
"bahar aksamlari hava ilik olur"
Turkish Characters:
"bahar akþamlarý hava ýlýk olur"
Comment 6•23 years ago
|
||
same thing with western coding
No turkish characters:
"bahar aksamlari hava ilik olur"
with Turkish Characters:
"bahar akþamlarý hava ýlýk olur"
Reporter | ||
Comment 7•23 years ago
|
||
In this system (at work) I can verify this error.
steps:
1.go to www.google.com
2.write "tirtil" to search box (but with turkish small I characters)
3.and press "Google search"
Expectes result:
Google searches the word "tirtil"
Actual Result:
Google searches for "t" not "tirtil" , all characters after first turkish I are
lost
Some prefs:
Navigator
Language
Default character encoding : ISO 8859-1
Language En-US
Appearance
Fonts: Western
can anyone try this???
you have to change your keyboard layout to Turkish-Q and also language to Turkish
Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:0.9.6+) Gecko/20011212
Status: VERIFIED → UNCONFIRMED
Resolution: WORKSFORME → ---
I think there are 2 parts to this bug. The rendering problem, I'm thinking was
due to regression bug 116230 which was causing everyone grief with text
widgets,until it was fixed on 01/04/01. Would it be possible for you to try a
more recent build? I can' seem to recreate the rendering problem you mentioned
with my debug build from today.
The 2nd problem is the dataloss on submission. I was speaking to ftang, and he
said he thinks he knows what the problem is there.
Giving this bug to ftang as he requested.
Assignee: kin → ftang
Reporter | ||
Comment 9•23 years ago
|
||
Confirmed on,
Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:0.9.7+) Gecko/20020116
installed it, and didnt touch any of the prefs, and went to
www.google.com
written "týrtýl" and
same result, it searched for "t" not "tirtil" and same situation for turkish
character "s" (s with a dot below it, pronunciated like "sh" )
I agree with your second idea, this dataloss happen on submission, they are
lost. I guess they have to be converted to special characters on submission,
but somethin is wrong, in previous versions this didnt happen, and I dont know
when this regression happened. I will try this with mozilla versions at home
and try to find the last time it was working.
This is really a show stopper for Mozilla in Turkey.
Keywords: dataloss
Reporter | ||
Comment 10•23 years ago
|
||
I checked my old mozilla nightlies and I can see this bug in
September 16 builds but not in September 4 builds.
So there is a check-in between 4-16 September that caused this regression.
bug 81203 and some others may be related to this.
I also observed that the code "mozilla/ layout/ html/ forms/ src/
nsFormFrame.cpp" is probably responsible for this. there are lots of code
additions to that fie between 4-16 September, and versions 3.175 and 3.183
If I can check the versions between 4-16 september, I can make the time interval
narrower.
hope this helped.
Reporter | ||
Comment 11•23 years ago
|
||
I meant 4-16 October for the date interval of source of regression ..
Comment 12•23 years ago
|
||
I got different results with my old OS/2 nightlies. I checked and rechecked.
I see that the problem was introduced between 8 AM on November 1st and 8 AM on
November 2nd.
I didn't see any specific checkin that look suspicious, but it is a long list.
I specifically saw that with the 8AM Nov 2nd build, the search on www.google.com
was done with "t" instead of the whole word.
Comment 13•23 years ago
|
||
duh. The new form submission code -
http://bugzilla.mozilla.org/show_bug.cgi?id=34297
Assignee | ||
Comment 14•23 years ago
|
||
If I do this in Mozilla, I generate
http://www.google.com/search?hl=en&q=t&btnG=Google+Search
as url
which mean the q is "t"
If i do this in IE6, I got ich
http://www.google.com/search?hl=en&q=t%26%23305%3Brt%26%23305%3Bl&btnG=Google+Search
which mean the q is
"t%26%23305%3Brt%26%23305%3Bl"
escaped of "tırtıl"
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Keywords: nsbeta1
Target Milestone: --- → mozilla0.9.9
Assignee | ||
Updated•23 years ago
|
Priority: -- → P2
Assignee | ||
Comment 15•23 years ago
|
||
nsbeta1+ per triage meeting. Still need to decide HOW to fix it (what to sent out)
Assignee | ||
Comment 16•23 years ago
|
||
jkeiser- I have a patch which will make our code match IE 6 behavior. Will
attached soon. Please r=
Assignee | ||
Comment 17•23 years ago
|
||
Updated•23 years ago
|
Attachment #70223 -
Flags: needs-work+
Comment 18•23 years ago
|
||
Comment on attachment 70223 [details] [diff] [review]
patch match up to IE6
nsFormSubmission::GetEncoder():
1. Could you remove the aCtrlsModAtSubmit param and change encoder param to
aEncoder while you're in there? (This wasn't your fault.) It looks like it's
not doing anything.
2. Make nsString charset an nsAutoString (no mallocs required there), and move
that block below nsresult rv = NS_OK; (with a blank line before it)
3. Use rv = CallCreateInstance(NS_SAVEASCHARSET_CONTRACTID, aEncoder) rather
than a direct CreateInstance call. This is safer. Also remove kSaveCharsetCID
definition, since you won't need it for CallCreateInstance(). (When assigning
to nsCOMPtr<>, use do_CreateInstance().)
4. Do NS_ENSURE_SUCCESS(rv, rv); rather than if (NS_SUCCEEDED(rv)). It saves
you the if() (and the indent) and does the same thing.
5. Do the same after Init(), just for clarity's sake. return NS_OK; at the end
of the function.
nsFormSubmission::UnicodeToNewBytes()
6. Not a requirement for r=, but would be really helpful: could you put a
little JavaDoc comment on UnicodeToNewBytes() (the definition of the function
in the class) for those of us who aren't int'l-savvy? nsIFormSubmission.h has
methods similarly documented, you can find a format there.
So what exactly does this buy us? Were we using nsIUnicodeEncoder wrong
before? nsISaveAsCharset seems to wrap nsIUnicodeEncoder and do some extra
stuff with it.
Assignee | ||
Comment 19•23 years ago
|
||
John:
The problem happen when you type in a form which cannot be encoded with the
character set of the page you are viewing.
For example, if you are viewing an english page which encoded in ISO-8859-1
encoding, (which have 256 characters defined). Since we are an Unicode
application, users could type some characters other than those ISO-8859-1
characters, or copy & paste from other pages (say japanese page like
http://home.netscape.com/ja)
In that case, when we convert by using unicode converter, it will STOP at the
point that contains the character which cannot be encoded.
For example, if user put in "abc" + Two chinese characters + "eft" , the unicode
converter will convert "abc" . in 6.2.1, it will convert to "abc" + two question
mark + "eft" since we set the error behavior to repalce fallback to '?' but
someone remove that code (in the current GetEncoder, youc can see that part of
code got commented out ). this is very bad because user could type "Software "
+ Japanese names + " hardware car" when they search google. if we convert to
"software ??? hardware car" then at least the search engine can match "software"
"hardware" and "car", but the current behavior will only send out "software"
the "?" approach is not idea but acceptable. However, when I look at IE6, it
seems they use NCR to encode these "cannot be converted" character, this is not
specified in any standard but nor any standard currently address this issue. I
think it is better to fix this to the IE6-non-standard way, instead of to fix it
to the old Netscape-non-standard way. And since there are currently no standard
which address it, we don't need to worry about standard issue here.
Assignee | ||
Updated•23 years ago
|
Attachment #70223 -
Attachment is obsolete: true
Assignee | ||
Comment 20•23 years ago
|
||
I also handle the case if the string is "" differently and
remove some of the "won't build" #ifdef DEBUG_ftang part.
Comment 21•23 years ago
|
||
Comment on attachment 70334 [details] [diff] [review]
v2 of the patch.
1. CallCreateInstance( - remove space after paren
2. if (aSrc && aSrc[0]) - does this if really buy us anything? At the least,
aSrc should not be null (it comes from an nsAString) so don't check that. If
there's not a significant performance penalty to call aEncoder with aSrc == "",
please just remove the if() entirely.
With that, r=jkeiser.
Attachment #70334 -
Flags: review+
Comment 22•23 years ago
|
||
I prefer to check the conversion error then send as UTF-8 (% escape if
necessary) because I see recommendation to use UTF-8.
For HREF,
http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1
International URI draft,
http://search.ietf.org/internet-drafts/draft-masinter-url-i18n-08.txt
Is there a standard for this (and IE is based on that)?
I think IE parity is also important, so I am not pushing UTF-8 if nobody
understand, cc to momoi.
Comment 23•23 years ago
|
||
That's done in a different step. We first take the Unicode value and translate
it to the charset in question, and then we URLEncode it (which does the % thing).
Assignee | ||
Comment 24•23 years ago
|
||
>2. if (aSrc && aSrc[0]) - does this if really buy us anything?
if I don't do this , the converter will return conversion error since if I pass
null string in.
chat w/ nhotta about the spec he mention. We agree we may need to revisit this
code once they Internet Draft got accept. From my view, the chances are remote
since it will break most of the cgi.
Comment 25•23 years ago
|
||
OK, fine by me.
Comment 26•23 years ago
|
||
Comment on attachment 70334 [details] [diff] [review]
v2 of the patch.
sr=jst
Attachment #70334 -
Flags: superreview+
Comment 27•23 years ago
|
||
Comment on attachment 70334 [details] [diff] [review]
v2 of the patch.
Hold the presses on this fix ... we're using EncodeVal()'s return value wrong.
If it's nsnull, we currently crash, and now this code is returning nsnull on
empty string. Emailing ftang.
Attachment #70334 -
Flags: superreview+
Attachment #70334 -
Flags: review+
Attachment #70334 -
Flags: needs-work+
Assignee | ||
Comment 28•23 years ago
|
||
are you saying that
nameStr.Adopt(EncodeVal(aName));
will crash if EncodeVal(aName) return null ?
Comment 29•23 years ago
|
||
Yes, in fact I encountered that crash yesterday. It should be a minor change
actually, if you just change it to return a new empty string, I'll r= and I'm
sure jst will sr=.
Assignee | ||
Updated•23 years ago
|
Attachment #70334 -
Attachment is obsolete: true
Assignee | ||
Comment 30•23 years ago
|
||
Updated•23 years ago
|
Attachment #70940 -
Flags: review+
Comment 31•23 years ago
|
||
Comment on attachment 70940 [details] [diff] [review]
v3 of the patch. add nsCRT::strdup("") if it is null.
If the converter returns null, the function should probably return null too
(We will start handling null as an error at some point.) I would put that
strdup in an else personally.
r=jkeiser, assuming that was the only change from before. Your call whether to
put this in an else, you know the converter better than I do.
Assignee | ||
Comment 32•23 years ago
|
||
>If the converter returns null, the function should probably return null
then it will crash as NOW, right ?
ask jst to sr
Comment 33•23 years ago
|
||
Absolutely. But *if* the encoder is returning null only when it can't new
enough memory, then your strdup will fail as well (return null) and we will
crash anyway. Even more, on *failure* case we should not return empty string.
It's a failure, we should send nothing at all rather than empty (i.e. it should
propagate all the way back to AddNameValuePair). Unless nsISaveCharset has a
legitimate reason to return null, that code is redundant and should be put in
the else case only.
The underlying problem is that we don't currently check for null from
EncodeVal() like we should. That is filed as bug 126941. There's no need to
work around that problem if it only happens in the case of out-of-memory error.
Comment 34•23 years ago
|
||
Comment on attachment 70940 [details] [diff] [review]
v3 of the patch. add nsCRT::strdup("") if it is null.
>- // Get Charset, get the encoder.
>- nsCOMPtr<nsICharsetConverterManager> ccm(
>- do_GetService(kCharsetConverterManagerCID, &rv));
>+ nsAutoString charset(aCharset);
>+ if(charset.Equals(NS_LITERAL_STRING("ISO-8859-1")))
>+ charset.Assign(NS_LITERAL_STRING("windows-1252"));
>+
The above looks weird, but I assume it's intentional and XP safe.
>+ rv = CallCreateInstance( NS_SAVEASCHARSET_CONTRACTID, aEncoder);
Loose the space after '(' here for consistency with surrounding code.
sr=jst
Attachment #70940 -
Flags: superreview+
Assignee | ||
Comment 35•23 years ago
|
||
>- // Get Charset, get the encoder.
>- nsCOMPtr<nsICharsetConverterManager> ccm(
>- do_GetService(kCharsetConverterManagerCID, &rv));
>+ nsAutoString charset(aCharset);
>+ if(charset.Equals(NS_LITERAL_STRING("ISO-8859-1")))
>+ charset.Assign(NS_LITERAL_STRING("windows-1252"));
>+
>
>The above looks weird, but I assume it's intentional and XP safe.
That logic was there, not a new code. just look at the "-" below it. yes, it is
xp and safe.
>+ rv = CallCreateInstance( NS_SAVEASCHARSET_CONTRACTID, aEncoder);
>Loose the space after '(' here for consistency with surrounding code.
will do
Assignee | ||
Updated•23 years ago
|
Reporter | ||
Comment 37•23 years ago
|
||
Well, since this bug has "r,sr" and "a", will it be in 0.9.9 ?
I cant use mozilla because of this bug.. :)
Assignee | ||
Comment 38•23 years ago
|
||
fixed and check in
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Comment 39•23 years ago
|
||
Verifying on windows 98 2002-03-01-03-trunk
and linux Redhat build 2002-03-01-09-trunk
Status: RESOLVED → VERIFIED
Comment 40•23 years ago
|
||
*** Bug 129236 has been marked as a duplicate of this bug. ***
Comment 41•22 years ago
|
||
*** Bug 125034 has been marked as a duplicate of this bug. ***
You need to log in
before you can comment on or make changes to this bug.
Description
•