Closed Bug 28474 Opened 25 years ago Closed 25 years ago

illegal use of nsString-external JavaScript convert charset incorrectly

Categories

(Core :: Layout, defect, P3)

x86
Windows 98
defect

VERIFIED FIXED

People

(Reporter: cwang, Assigned: jbetak)

Details

(Whiteboard: [PDT+]have fix. r=ftang,rickg. a=bobj, partial fix in on Friday, resolving performance issues with ftang, RickG)

Attachments

(5 files)

OS: Win 98
Build: Netscape 6 2000021808-M14
Steps to reproduce: Load page www.cww.com
Results: The links at the bottom of the page and the date at the top display incorrectly.
I'm confirming this bug.
Status: UNCONFIRMED → NEW
Ever confirmed: true
I've isolated the problem to a single instance culled from the above. This is the Chinese (GB2312) date display. Here's the image of incorrect date display with Mozilla on a test page I created. I also append the correct display image obtained on the same test page by 4.72.
Here's what the source of the page looks like:

    ....
    <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
    ....
    <font color=#000080><strong>WXYZ</strong>
    <script language="javascript" src="time.js">
    </script>

where WXYZ = 4 Chinese characters.

Note that the date is actually written by an external JS script. Note also that this page has a meta charset tag indicating the GB2312 charset. The external script looks like this:

    var today = new Date();
    document.write('(');
    document.write(today.getMonth()+1);
    document.write('A');
    document.write(today.getDate());
    document.write('B');
    document.write(')');

where A = GB2312-encoded character for "Month" and B = GB2312-encoded character for "Day".

------------

So the problem is that Mozilla does not process the characters generated by the external JS script as GB2312. Looking at the incorrect image, the values look like GB2312 bytes read in as ASCII data rather than as GB2312 as indicated by the meta tag. 4.7x does read them in and treats them as characters matching the meta tag.

I attach 2 test cases + 1 external JS file below.

Case 1: A brief page containing the 4 Chinese words in GB2312 and an external JS file which generates today's date and surrounds it with the Chinese words for "Month" and "Date". This does not display OK on Mozilla but does OK on 4.7x.

Case 2: Identical file to the above -- 4 Chinese words followed by a JS function which generates today's date. However, in this example, I wrote the JS function inside the page. This displays OK on both Mozilla and 4.7x.
If you use the first html page, china.html, with "time.js" file, Mozilla will not display the date well even with a Simplified Chinese font. 4.7x does OK with this file. If you use the 2nd file, Mozilla displays the GB2312 date OK. I don't know who should get this bug. I18n, JS, or layout?
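The symptom can be reproduced outside the browser. What follows is a minimal Python sketch, not Mozilla code. It assumes the two characters the report calls A and B are U+6708 ("month") and U+65E5 ("day"), and that the wrong interpretation maps each byte to one character, as Latin-1 does. The function names are hypothetical.

```python
# Hypothetical stand-ins for the characters the report calls A and B.
MONTH = "\u6708"  # CJK character for "month"
DAY = "\u65e5"    # CJK character for "day"


def script_bytes(month: int, day: int) -> bytes:
    """Return the bytes time.js would write if it were saved as GB2312."""
    text = "({}{}{}{})".format(month, MONTH, day, DAY)
    return text.encode("gb2312")


def render(raw: bytes, charset: str) -> str:
    """Interpret the raw script output in the given charset."""
    return raw.decode(charset)


raw = script_bytes(3, 21)
correct = render(raw, "gb2312")   # bytes read in the page's charset
garbled = render(raw, "latin-1")  # every GB2312 byte becomes its own character

# ascii() keeps the output printable on any console.
print(ascii(correct))
print(ascii(garbled))
```

Each two-byte GB2312 character turns into two unrelated accented Latin characters in the garbled string, which would be consistent with the screenshot of the incorrect date.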
Assignee: rickg → ftang
Ftang: I think this belongs to you. The problem with this page is the author has encoded the charset spec wrong. They coded it like this: WRONG: <meta http-equiv="Content-Type" content="text/html charset="gb2312"> RIGHT: <meta http-equiv="Content-Type" content="text/html" charset="gb2312"> Nonetheless, you may need to support this style.
As far as I know, the correct meta charset line should look like this: <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-5"> See this W3C document: http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#adef-http-equiv I also note that the above Chinese page(s) use this same style -- copied from their main and frame pages: <meta http-equiv="Content-Type" content="text/html; charset=gb2312"> Also, the Netscape Japanese Home Page uses exactly the same style: <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=x-sjis"> and we display this page OK on Mozilla. Actually, Mozilla can display the Chinese page OK. The only thing it cannot display OK is the Chinese date generated by the external JS. I don't think we can blame this problem on incorrect meta-tag usage.
I've dealt with this problem in Netscape 4.X. The problem is that the external JS script does not work because it is assumed to be in some charset other than the one that the HTML file uses. If the external JS script arrives over HTTP, and the HTTP Content-Type header does not have a charset parameter, the browser must make an assumption, and the assumption that we chose in Netscape 4.X is to use the HTML document's charset, which is quite reasonable. Now, for backward compatibility reasons, I would suggest that Mozilla do the same.
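The fallback rule described in this comment can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the Netscape 4.x or Mozilla implementation: the function names are hypothetical and the header parsing is deliberately simplified.

```python
from typing import Optional


def charset_from_header(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    # Everything after the first ';' is a list of name=value parameters.
    for param in content_type.split(";")[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            return value.strip().strip('"').lower() or None
    return None


def script_charset(http_content_type: Optional[str],
                   document_charset: str) -> str:
    """Prefer the script's own HTTP charset; else inherit the document's."""
    return charset_from_header(http_content_type) or document_charset


# No charset on the script response: inherit the page's GB2312.
print(script_charset("application/x-javascript", "gb2312"))
# Explicit charset on the script response wins.
print(script_charset("application/x-javascript; charset=shift_jis", "gb2312"))
```

The point of the design is that an explicit label on the script always wins, and the document charset is used only when the script carries no label of its own.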
I am very, very confused now.
1. Is the JS embedded in the HTML file or in a separate JS file?
2. If it is a separate file, http://bugzilla.mozilla.org/showattachment.cgi?attach_id=5485 is not a good case for this bug, since it is embedded in HTML.
RickG's comment about the meta tag is not related to this bug, since the "today's news" part displays correctly and only the date part is wrong. If this were related to the META tag problem, then the whole thing would not display correctly.
Status: NEW → ASSIGNED
I'm sorry about the confusion. Bugzilla does not recognize file types very well. What you need to do is save these files on your local disk:
A. Save (id=5483) and (id=5484) into the same directory. Call the first one "china.html" and the 2nd one "time.js". The 2nd file must have that name because the first page refers to it by that name. These 2 files will show you the original problem in a much shorter document.
B. Save the 3rd file, (id=5485), as "china2.html". This file is essentially the same as "china.html", but I included the content of "time.js" in the file itself. This page will not have a problem displaying GB2312 characters generated by JS.
Looking through Bugzilla, teruko reported a related issue in Bug 12813. That is a bug where the source charset for an external .js file is not supported. The current problem is the case where no source charset is indicated other than the charset given by the meta charset tag on the web page. I believe that this is more common in web pages today than the source charset case. It does not look like the browser QA people are on the CC line. CC'ing teruko and blee.
If you don't mind an internal URL, here is a page which has exactly the same data as the first (problematical) test case files. http://kaze:8000/bugs/bug28474.html
I turned on the assertion code described in 28424 and visited http://warp/u/ftang/tmp/cww.html. I caught the problem right there! This is another nsString misuse bug. It asserts in:

    4088 warren  3.263  NS_IMETHODIMP
    4089 valeski 3.301  HTMLContentSink::OnStreamComplete(nsIStreamLoader* aLoader,
    4090                                                   nsISupports* aContext,
    4091                                                   nsresult aStatus,
    4092                                                   PRUint32 stringLen,
    4093                                                   const char* string)
    4094 vidur   3.132  {
    4095 warren  3.263    nsresult rv = NS_OK;
    4096 warren  3.277    nsString aData(string, stringLen);
    4097 vidur   3.132

In revision 3.277, warren (probably changing vidur's code) passes a char* to the nsString aData constructor, which causes this problem: the string contains data that is neither ASCII nor ISO-8859-1.

How to fix it? Call GetDocumentCharacterSet() on mDocument (a method of nsIDocument) to get the charset, ask the character set converter manager for an nsIUnicodeDecoder, and use the decoder to convert the char* string into PRUnichar before passing it to nsString.

Reassigning this to vidur, cc'ing warren, and adding 28424 to the dependency list.
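The shape of the proposed fix can be sketched in Python. This is only an analogy, not the actual HTMLContentSink patch. It assumes that constructing nsString from a char* simply widens each byte to one character, which is what the Latin-1 remark later in this report suggests. `codecs.getdecoder` stands in for the character set converter manager, and the fallback for an unknown charset is an assumption added for illustration.

```python
import codecs


def inflate(data: bytes) -> str:
    """Mimic the buggy path: widen each byte to one character, no decoding."""
    return "".join(chr(b) for b in data)


def on_stream_complete(data: bytes, document_charset: str) -> str:
    """Decode script bytes with the document charset before building a string."""
    try:
        # Stand-in for asking the converter manager for a decoder.
        decode = codecs.getdecoder(document_charset)
    except LookupError:
        # Assumption: with no decoder available, keep the old behaviour.
        return inflate(data)
    text, _consumed = decode(data, "replace")
    return text


raw = "\u6708".encode("gb2312")  # one GB2312 character, two bytes
print(ascii(inflate(raw)))                       # two unrelated characters
print(ascii(on_stream_complete(raw, "gb2312")))  # the original character
```

Pure ASCII scripts come out identical on both paths, which may explain why the problem only shows up with non-Latin content.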
Assignee: ftang → vidur
Blocks: 28424
Status: ASSIGNED → NEW
Summary: Incorrect character display (day format and links) → illegal use of nsString-Incorrect character display (day format and links)
So it seems to me like it's hardly ever going to be valid to construct an nsString from a char* without a charset decoder. Maybe we should remove that constructor in favor of one that requires a decoder.
Frank, you seem to know what the right fix is. Rather than have me stumble through this, I'd appreciate it if you could make the fix and have me review it. Thanks.
Assignee: vidur → ftang
Reassigning to the module owner. Vidur, I simply cannot fix all of these kinds of bugs. I am playing your whitebox QA here. You should not expect your QA to fix your code.
Assignee: ftang → vidur
Then what exactly does the i18n group do? Fine - it'll get done at some point.
Status: NEW → ASSIGNED
Target Milestone: M16
>Then what exactly does the i18n group do?
The i18n group writes all the libraries under mozilla/intl, just as the Gecko group writes all the libraries under mozilla/layout and the JS group writes all the code under mozilla/js. The i18n group does write some sample usage of the intl library, but we don't fix every misuse of the intl library, just as the JavaScript group won't fix bugs in every .js file.
When do you plan to fix this? Please provide an ETA.
Vidur's on sabbatical. I know it's his bug, but Frank, can you just fix this for us? Thanks.
Assignee: vidur → ftang
Status: ASSIGNED → NEW
jbetak- can you do this ?
Assignee: ftang → jbetak
Status: NEW → ASSIGNED
Nominating this for beta1 because it blocks 32215. See the screenshot in 23315 for details. jbetak, can you verify this with your fix? For 32215, checking the document itself is enough.
Blocks: 32215
Keywords: beta1
Summary: illegal use of nsString-Incorrect character display (day format and links) → illegal use of nsString-external JavaScript convert charset incorrectly
As discussed with ftang, we have a fix for this problem. It's a very contained modification to one file (HTMLContentSink), ensuring that an external JavaScript file gets loaded using the HTML document encoding instead of the HTML default, Latin-1.
Must for Beta1. Sidebar must work for Japanese content. We have a Japanese 3rd party contracted for sidebar content but it's blocked by this bug. See comments in bug 32215.
*** Bug 32215 has been marked as a duplicate of this bug. ***
Whiteboard: have fix. r=ftang
Putting on PDT+ radar for beta1. Please contact rickg for approval to check in.
Whiteboard: have fix. r=ftang → [PDT+]have fix. r=ftang
If you have a chance to check this in on Sunday night or sooner, please call Rick at home to get his approval. Clearing this by very early Monday morning will get it in the verification build. We really need these items checked into the branch, or we are going to be forced to miss them RSN. Thanks, Jim
ftang and jbetak talked to rickg on Fri evening. jbetak is working on some suggestions made by rickg and expects to have this done by Monday.
Whiteboard: [PDT+]have fix. r=ftang → [PDT+]have fix. r=ftang,rickg. a=bobj
Whiteboard: [PDT+]have fix. r=ftang,rickg. a=bobj → [PDT+]have fix. r=ftang,rickg. a=bobj, partial fix in on Friday, resolving performance issues with ftang, RickG
jbetak and rickg have agreed on the remaining fix; juraj will be checking in at around 7:30 today (3/20/00).
OK, prechecking tests look good - closing down. Thanks for all your help, RickG! I'm opening a new bug, 32604, for the trunk fix; we didn't put in all the necessary functionality and changes for Beta1 because of the perceived risk.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Depends on: 32604
Resolution: --- → FIXED
I'll take this bug for verification. Thanks.
QA Contact: petersen → momoi
** Checked with 3/21/2000 Win32 build ** The original problem at the CWW China site no longer occurs, either for the date or for the boilerplate link template at the very bottom of the page. My test case and ftang's test case work. The most critical test case at Arukikata also works now. I still cannot check the portion which contains a layer, but I assume the part without a layer is working, even though the data are read in from an external source. I think these are proof enough that the fix has achieved its mission. Marking it verified as fixed.
Status: RESOLVED → VERIFIED
SPAM. HTML Element component deprecated, changing component to Layout. See bug 88132 for details.
Component: HTML Element → Layout