Closed Bug 58139 Opened 24 years ago Closed 23 years ago

Unnecessary switch to UTF-16 and disappearance of the content in certain context.

Categories

(Core :: Internationalization, defect, P3)

PowerPC
Mac System 9.x
defect

Tracking

()

VERIFIED DUPLICATE of bug 62929

People

(Reporter: tarahim, Assigned: shanjian)

References

()

Details

(Keywords: intl)

Attachments

(5 files)

Reproducible in 20001504 MacTrunk. In View|Character Coding|Auto Detect, select Auto Detect (Japanese). Go to the URL above, and click the link to http://www.asahi.com/0713/past/pnational13018.html, which is about one page-up above the first image, or a few lines below the eighth horizontal rule. The browser switches to UTF-16 and the display is garbage. However, if you copy the link, paste it into the Location window, and hit Return, the character coding is not switched to UTF-16.
Reassign to shanjian. I cannot reproduce this.
Assignee: nhotta → shanjian
The link is actually invalid and a Not Found message is delivered. The weird thing is that I am getting the Not Found message in English instead of the Japanese message you posted. The link may be routed to different servers via a proxy. Is there a way to find a more direct link to the server that is delivering the message in English? I am still seeing the bug even when clicking the link within this report.
If you can save it as a local file and still reproduce it then please attach it to the bug.
That did not work, as "Open File" behaved just like typing the link in the Location window. The bug occurs only when the link is clicked. Even Open Link in New Window does not trigger it. FYI, the saved file itself does not contain any suspect data, unlike several cases I have reported before.
Just to confirm, if Japanese auto detection is off you don't see the problem?
Exactly. Auto Detect is necessary for this to happen.
I have noticed that the last character displayed by the bug is different every time. Is this related to the Auto Detect failure?
Attached image: Four consecutive errors shown. (deleted)
When Auto Detect is Off and Character Code is set to Western, the last sentence of the Not Found message is displayed with an additional character at the end. When Reloaded, or Opened from Location window+Return, or Open in New window, the last sentence is not displayed. It must be some corrupt data generated by this server which screws up both Auto Detect On and Off.
Today, the server is sending the Not Found message in Japanese, as nhotta posted. However, the last character, which is generated by the server, changes every time. This character is also visible in the first screenshot by nhotta. I found a way to reproduce the switch to UTF-16 with this Japanese message:
1) Auto Detect (Japanese) is ON and this bug page is open.
2) Click the URL http://www.asahi.com/0713/past/pnational13018.html
3) The Japanese message is shown, including the last sentence.
4) Go back to this bug report, shown as Western (ISO-8859-1).
5) Turn Auto Detect off.
6) Click the link again. This time the coding is switched to UTF-16BE.
So the last character has two effects:
1) It causes the switch to UTF-16 with the English message, and sometimes with the Japanese (EUC) one, when the page is loaded from a clicked link.
2) It causes the last sentence to disappear in both the English and the Japanese message when the page is loaded via New Window or Open Location.
Changed the summary.
Summary: Auto Detect in Character Coding wrongly switches to UTF-16 in some context. → Unnecessary switch to UTF-16 and disappearance of the content in certain context.
BTW, I cannot help feeling deja vu when I see this last-character thing. Didn't this show up as a problem in the NC4 previews, too?
I spent some time tracing the problem, and here is what I found. If the page is loaded through the location bar or Open File, the "not found" message is the server's "native" version. If the link is invoked from another page, as in this bug, the originating site is referenced in the "not found" page. However, when the server constructs that message, it counts an additional "zero" at the end of the message. This additional zero makes the detection module believe that the page is UTF-16. (That is the only explanation.) So once again, this is a server-related problem. If a significant number of servers show this behavior, we might do something in our charset detection module.
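To illustrate the explanation above, here is a minimal sketch of how a single trailing zero byte could tip a detector toward UTF-16. The function naive_guess_charset and its rules are hypothetical and written only for this illustration; they are not the code of the actual charset detection module, which may weigh evidence quite differently.

```python
def naive_guess_charset(data: bytes) -> str:
    """Hypothetical heuristic, NOT the real detector: any NUL byte counts as UTF-16 evidence."""
    # A byte order mark settles the question directly.
    if data.startswith(b"\xfe\xff"):
        return "UTF-16BE"
    if data.startswith(b"\xff\xfe"):
        return "UTF-16LE"
    # NUL bytes almost never occur in 8-bit or EUC text, so this naive rule
    # jumps to UTF-16 and picks the byte order from where the NULs sit.
    if b"\x00" in data:
        nul_even = sum(1 for i, b in enumerate(data) if b == 0 and i % 2 == 0)
        nul_odd = sum(1 for i, b in enumerate(data) if b == 0 and i % 2 == 1)
        return "UTF-16BE" if nul_even >= nul_odd else "UTF-16LE"
    if data.isascii():
        return "US-ASCII"
    try:
        data.decode("euc_jp")
        return "EUC-JP"
    except UnicodeDecodeError:
        return "ISO-8859-1"


# A Japanese "page not found" style message, encoded as the server would send it.
body = "ページが見つかりません".encode("euc_jp")
print(naive_guess_charset(body))            # clean body
print(naive_guess_charset(body + b"\x00"))  # same body with one stray zero byte appended
```

With a rule like this, the clean body is classified as EUC-JP, while the same body followed by one zero byte is classified as UTF-16, which would match the garbage display described in this report.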
Status: UNCONFIRMED → RESOLVED
Closed: 24 years ago
Resolution: --- → WONTFIX
Is "Save Link As" supposed to send the referring page info to the server, too? I think "Save Link As" saves a file that contains the zero. Maybe this is the server on which I saw a similar problem during the NC4 betas. Interesting.
Verified as wontfix.
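One way to check whether a saved copy really ends in a zero byte is to count the trailing NUL bytes in the file. This is a small sketch; the helper names and the file name in the usage note are made up for illustration.

```python
from pathlib import Path


def count_trailing_nuls(data: bytes) -> int:
    """Return how many NUL (0x00) bytes sit at the very end of data."""
    return len(data) - len(data.rstrip(b"\x00"))


def count_trailing_nuls_in_file(path: str) -> int:
    """Read a saved page from disk and count its trailing NUL bytes."""
    return count_trailing_nuls(Path(path).read_bytes())
```

For example, count_trailing_nuls_in_file("saved_page.html") returning 1 would mean the saved file ends with a single zero byte; 0 would mean the file is clean.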
Status: RESOLVED → VERIFIED
Status: VERIFIED → UNCONFIRMED
Resolution: WONTFIX → ---
Shanjian's comment: I am going to take care of this problem with my new charset detector. Similar problems have been reported for other web pages. Keep this bug reopened until I check in the new code.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Marking NEW while Teruko works on the problem.
Keywords: intl
I have decided to fix this in 62929. *** This bug has been marked as a duplicate of 62929 ***
Status: NEW → RESOLVED
Closed: 24 years ago → 23 years ago
Resolution: --- → DUPLICATE
Verified as dup.
Status: RESOLVED → VERIFIED