Closed Bug 65093 Opened 24 years ago Closed 9 years ago

default encoding problem

Categories

(Core :: Internationalization, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME
Future

People

(Reporter: tgodouet, Assigned: jshin1987)

References

()

Details

(Keywords: intl)

Attachments

(1 file, 1 obsolete file)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.0-test12 i686; en-US; 0.7) Gecko/20010105
BuildID: 2001010517
Mozilla on Linux (unlike the Windows version) does not correctly display accented characters on HTML pages that do not use the standard "&eacute;" entity form (they put a literal "é" instead of "&eacute;" in their source file). Since many browsers handle them correctly (and several web sites use them), Mozilla should too.
Reproducible: Always
Steps to Reproduce:
1. Go to the given URL (http://ikarios.com/form/).
2. Look for an accented character: a "?" is shown instead of the correct character.
Actual Results: A "?" is shown.
Expected Results: The correct accented character should be displayed.
Reassign to bstell.
I don't know about the Linux version, but this can occur on the Windows version if you do not have the correct character coding selected. At one time Mozilla kept using Armenian as default which displayed ?'s for all extended characters.
ylong, can you reproduce this?
I cannot reproduce this with the 01-11 Mtrunk build on any platform.
I believe this happens on all non-windows platforms, at least that was the case with NS 4.x.
I believe Mozilla will place a "?" as a replacement for fonts it can't find. On Linux RH6.2/XFree86 3.3.6-20 with the MS TrueType web fonts installed, I see accented characters on the page just fine. Not sure what problem you're really seeing, apart from some font obviously missing. If you don't have TrueType fonts installed there, I wrote a little something on how to get them going, at least on RH6* (I noticed those pages were also referred to in the Mozilla 0.7 release notes): http://home.c2i.net/dark/linux.html#ttf http://home.c2i.net/dark/linux.html#fuzzy That may not be the problem, however: I can't see which font the page uses, which indicates it simply uses the user's default font. I have adobe-helvetica-iso8859-1 set up for that, which is a Type 1 font, not TTF. Could it be that you have selected a default font without the proper characters in it? The pages I refer to above also have some hints about making good fonts.alias files for Type 1 fonts, so it may be worth looking at all the same.
It seems that the problem is a mis-detection of the encoding to use, as the problem doesn't occur when I force the use of Western-ISO-8859-1. It seems not to be the default font, which is also a ISO-8859-1. I had the same problem on another web site : Mozilla set the encoding to Unicode for an unknown reason (I had set it to Western on another page before).
Do you access the site via a bookmark? (If so, this could be a dup of bug 50459)
Yes, I do access it via a bookmark.
This sounds like an encoding problem, not a font or named-entity problem. The behavior described: accented characters on an HTML page which doesn't use the standard "&eacute;" form (they put a literal "é" instead of "&eacute;" in their source file) would be produced by an incorrect encoding. The raw byte value would be misinterpreted, whereas the named reference would not.
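The difference between a raw byte and a named reference can be shown with a minimal sketch, assuming a JavaScript runtime that provides the standard `TextDecoder` and `TextEncoder` (modern browsers, Node.js). The sample text and the helper names `decodeLatin1` and `decodeUtf8` are made up for illustration; they are not taken from the page in this report.

```javascript
// "café" as stored in an unlabeled ISO-8859-1 file: é is the single byte 0xE9.
const rawBytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);

// ISO-8859-1 maps each byte directly to the code point with the same value.
function decodeLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// UTF-8 treats 0xE9 as the start of a multi-byte sequence; with no valid
// continuation bytes it is replaced by U+FFFD, the replacement character.
function decodeUtf8(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

const asLatin1 = decodeLatin1(rawBytes); // the accent survives
const asUtf8 = decodeUtf8(rawBytes);     // the accent becomes U+FFFD

// The named reference is pure ASCII, so both decodings leave it intact.
const entityBytes = new TextEncoder().encode("caf&eacute;");
const entityAsLatin1 = decodeLatin1(entityBytes);
const entityAsUtf8 = decodeUtf8(entityBytes);
```

The ASCII-only entity form is immune to the wrong guess, which matches the observation that only the literally typed accents break.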
tgodouet@caramail.com, please try the following. Can you reproduce this by creating a new user profile and visiting that page (that would isolate the problem from bookmark, cache or default charset conditions)?
OK, I've created a new user and tried to go to www.ikarios.com directly, using no bookmark: same problem. The character coding is Unicode instead of Western (when I force Mozilla to use Western, it works well).
I just noticed this happening on my start page, which has <head><title>¤</title></head>. The problem was that my default encoding was set to UTF-8 (Unicode) instead of ISO-8859-1 (the default in most browsers). Well, that, and that I don't specify a charset for my page :P I think this changed sometime in the recent past...
setting bug status to New
Status: UNCONFIRMED → NEW
Ever confirmed: true
I've seen it but only with pages stored on my local disk. If this can help locate the problem: http://www.cam.org/~tikabzy/main.html shows up properly in my browser but if I write the file to disk, I see the ? where accented characters should appear. On Linux, build 2001011421
Checked 2001-01-16-06-mtrunk linux build with above URL on my RH6.2-J. The accented characters on that page can be displayed. Saving the page to a local disk and opening it up on browser again can't reproduce the problem either. Tried with both new and old profiles. Accessing the page from a bookmark also can't make the problem happen.
tgodouet@caramail.com, we cannot reproduce it. What is your OS environment? Can you see the page correctly by other browser (e.g. Netscape 4.7)?
But I can see the question marks if I manually change the encoding to UTF-8.
The problem tgodouet@caramail.com has is that Unicode is automatically selected even with a new user profile. tgodouet@caramail.com, could you also check your default charset of the new profile? That is in "Edit -> Preferences -> Navigator -> Languages".
From my profile: default language: French [fr]; default encoding: Western.
From the new profile: default language: English; default encoding: Western.
Weird thing: it now seems to work well on new profiles, while it still poses problems on my profile. On Netscape 4.7.5, it seems to work well when I use my account.
Thanks, it could be related to either the cache or the bookmark. Please try the following with your profile (not the new one).
* Clear the cache (both memory and disk) via "Edit -> Preferences -> Advanced -> Cache". Then quit the app and visit the page again. If you still have the problem, please check your bookmark.
* Open the bookmark file "bookmarks.html" and see whether the URL is in it. If it's there, check the "LAST_CHARSET" field.
Cc to shanjian in case this is a cache/bookmark problem.
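The bookmark check described above could be scripted along these lines. This is only a sketch: the sample lines assume the usual Netscape-style bookmarks.html layout with a LAST_CHARSET attribute on each link, the charset values shown are invented, and `listBookmarkCharsets` is a hypothetical helper, not part of Mozilla.

```javascript
// Hypothetical excerpt of a bookmarks.html file (values are made up).
const sampleBookmarks = [
  '<DT><A HREF="http://ikarios.com/form/" LAST_CHARSET="UTF-8">Form</A>',
  '<DT><A HREF="http://www.libertysurf.fr/" LAST_CHARSET="ISO-8859-1">LibertySurf</A>',
  '<DT><A HREF="http://www.mozilla.org/">Mozilla</A>',
].join("\n");

// Return { url, charset } for every bookmark that stores a LAST_CHARSET.
function listBookmarkCharsets(html) {
  const results = [];
  const anchor = /<A\s+([^>]*)>/gi; // opening <A ...> tags only
  let match;
  while ((match = anchor.exec(html)) !== null) {
    const attrs = match[1];
    const href = /HREF="([^"]*)"/i.exec(attrs);
    const charset = /LAST_CHARSET="([^"]*)"/i.exec(attrs);
    if (href && charset) {
      results.push({ url: href[1], charset: charset[1] });
    }
  }
  return results;
}

const found = listBookmarkCharsets(sampleBookmarks);
for (const entry of found) {
  console.log(entry.url + " -> " + entry.charset);
}
```

Any entry whose charset differs from what the site actually serves is a candidate for the stale value discussed in this bug.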
This is an invalid bug. The user viewed the unlabeled ISO-8859-1 page as UTF-8. Marking this as invalid.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → INVALID
I think it is a bug, because *Mozilla* (not me) does not set the encoding correctly. Moreover, Mozilla on Windows works correctly, while Mozilla on Linux has some problems.
It seems to be both a bookmark problem and a cache problem: the LAST_CHARSET wasn't set correctly in my bookmarks for the page that was not shown correctly, but after correcting it by hand, I had the same problem with a page accessed from a bookmarked page (quite unclear, I know :)) ). Now I've forced the charset to Western for that page, cleared the cache, restarted Mozilla and accessed this page again: it seems to be OK now. The question is: why were the bookmarks invalid??? I tried to create the same bookmarks with a new profile, and it was just fine! It was probably set wrongly by a previous version of Mozilla, and since this release does not check the validity of the bookmark (at least the charset), the problem is never corrected.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
>It is probably a wrong set in a previous version of Mozilla I think that's the case, marking as WONTFIX, please reopen if reproducible with the latest build.
Status: REOPENED → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → WONTFIX
Well, it is not really the case : many pages keep on being shown as UTF-8 instead of Western even if I force the use of Western (I force once, and the next time I visit the page, UTF-8 is still used).
Please specify the reproducible case with a new profile then reopen the bug.
The first time I accessed the following page: www.libertysurf.fr, the encoding was wrongly set by Mozilla to UTF-8 (even though the default character encoding is Western). This was done with a new profile, so it really is a bug in Mozilla 0.7, not a wrong setting left by an old version of Mozilla. So I am reopening the bug.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
IQA, please test it (www.libertysurf.fr), use a new profile on Linux.
I cannot reproduce this. The site http://www.libertysurf.fr/ look correct for me. I changed the summary to mention encoding since this is not a font problem. (was: the accentuated characters are shown as a "?")
Summary: the accentuated characters are shown as a "?" → default encoding problem
I tested this in the 2001-01-19 Linux build. After I created a new profile, http://www.libertysurf.fr/ looks correct. I cannot reproduce this problem. tgodouet@caramail.com, could you attach the prefs.js file, which is located in the user profile directory? It should be in the .mozilla/<Profile Name>/<random name directory> directory.
tgodouet, Can you attach your prefs.js file?
user_pref("intl.charsetmenu.browser.cache", "windows-1252, UTF-8");
This means the user used the "Character Coding" menu to set the charset to "windows-1252" and "UTF-8". tgodouet, do you remember doing that?
I see that the home page setting in the attached prefs.js is set to a file. user_pref("browser.download.dir", "/home/thib"); I was able to recreate the encoding condition:
1. Create a new profile.
2. Open a browser window to a directory listing (this sets encoding to utf-8).
3. Go to a page with no charset info (encoding remains utf-8).
Assignee: nhotta → shanjian
Status: REOPENED → NEW
Keywords: intl
I cannot reproduce the problem using 6.0 RTM on Windows2000. Using trunk build ID 2001012504, I see the similar problem as Brian described. In my case, no check mark in the menu but UTF-8 was in the menu after ISO-8859-1. And the page http://www.libertysurf.fr/ was displayed correctly. Reassign to shanjian, I think you made a charset related change after RTM. Please check if this is related to your change.
In reply to : nhotta@netscape.com No I have never changed the encoding neither in my new profile nor in the old one to something else that Western ISO-8859-1. But I remember having a lot of encodings in View->Character Coding in my old profile, while I have now only (but maybe too many :)) ) Western ISO-8859-1, Western Windows-1252 and Unicode UTF-8 in my new one. I have also spotted that some pages, even after having set manually the encoding to Western ISO-8859-1, are shown after some time as UTF-8 (or other, but not as Western ISO-8859-1). For example www.caramail.com.
I could not reproduce the problem on my machine. From the discussion above, I agree that it is an encoding problem. There is one important piece of information I would like to know: what is your charset setting? Did you turn off the charset autodetector?
By forcing the use of a charset, I mean clicking on: View->Character Coding->Western (ISO-8859-1). As for the autodetector, I turned it off some time ago, but I did have problems before (I did it to check whether it works better without it). And right now, I don't know how to turn it on ... :((
Moreover, with the autodetector disabled, Mozilla should set the character encoding to my default, Western ISO-8859-1, but it just doesn't.
I believe the critical point is that tgodouet *DISPLAYS A DIRECTORY LISTING* (as his home page). I believe displaying a directory listing selects UTF-8, which gets carried over to subsequent pages.
tgodouet, I think brian pointed out a very important point. The page you visited just before the problematic page might be related. If that is the case, it is definitely a bug and I will fix it. For your information, the charset source priority order is (from low to high):
  86 kCharsetUninitialized = 0,
  87 kCharsetFromWeakDocTypeDefault,
  88 kCharsetFromUserDefault,
  89 kCharsetFromDocTypeDefault,
  90 kCharsetFromParentFrame,
  91 kCharsetFromCache,
  92 kCharsetFromBookmarks,
  93 kCharsetFromAutoDetection,
  94 kCharsetFromMetaTag,
  95 kCharsetFromByteOrderMark,
  96 kCharsetFromHTTPHeader,
  97 kCharsetFromUserForced,
  98 kCharsetFromOtherComponent,
  99 kCharsetFromPreviousLoading
Note: line 89 is for XML files, whose default is UTF-8, and line 99 applies to the same page only.
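How such a priority list would settle a conflict can be sketched as follows. This is a simplified model for illustration, not Mozilla's implementation: the source names come from the list above, while `resolveCharset` and the candidate values are assumptions.

```javascript
// Charset sources from the list above, lowest priority first.
const CHARSET_SOURCES = [
  "kCharsetUninitialized",
  "kCharsetFromWeakDocTypeDefault",
  "kCharsetFromUserDefault",
  "kCharsetFromDocTypeDefault",
  "kCharsetFromParentFrame",
  "kCharsetFromCache",
  "kCharsetFromBookmarks",
  "kCharsetFromAutoDetection",
  "kCharsetFromMetaTag",
  "kCharsetFromByteOrderMark",
  "kCharsetFromHTTPHeader",
  "kCharsetFromUserForced",
  "kCharsetFromOtherComponent",
  "kCharsetFromPreviousLoading",
];

// Hypothetical helper: pick the candidate whose source ranks highest.
function resolveCharset(candidates) {
  let best = null;
  for (const candidate of candidates) {
    const rank = CHARSET_SOURCES.indexOf(candidate.source);
    if (rank < 0) continue; // ignore sources not in the list
    if (best === null || rank > best.rank) {
      best = { charset: candidate.charset, rank };
    }
  }
  return best ? best.charset : null;
}

// An unlabeled page: a stale bookmark value outranks the user default.
const chosen = resolveCharset([
  { source: "kCharsetFromUserDefault", charset: "ISO-8859-1" },
  { source: "kCharsetFromBookmarks", charset: "UTF-8" },
]);
console.log(chosen);
```

Under this ordering, a wrong value stored in the cache or in a bookmark wins over the user's default, which is consistent with the symptoms reported earlier in this bug.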
I've updated my Mozilla to release 2001012400. I've also changed my default start page to "Blank Page", and it does seem to work better (I have not noticed any problem since that) (I changed this setting on my old release too, but as I used it like that just one day or 2, I can't confirm anything). I have to continue to test it, but the problem is probably the start page.
OK, it really seems that the bug is the encoding being wrongly (and permanently) set to UTF-8 when the start page is a directory: since I changed my start page to "blank", I haven't noticed any problem.
Changed QA contact to ylong@netscape.com.
QA Contact: teruko → ylong
I tried to visit some UTF8 page just before visit this problem page but still could not reproduce it. I will dismiss this bug for now.
Status: NEW → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → WORKSFORME
Have you tried to set your start page as a directory, restart mozilla, and then go on the problem page ?
I'll mark it as verified. Please re-open it if you disagree.
Status: RESOLVED → VERIFIED
It is not corrected : when Mozilla shows a directory at start, I still get some encoding problem with my Mozilla 2001032908. So I reopen it.
Status: VERIFIED → REOPENED
Resolution: WORKSFORME → ---
Please provide more information. What kind of directory did you set as your home page. I need to reproduce the problem before I can fix it.
Using my Linux debug build of Apr 4, 2001, I tried setting my start page to a directory. The encoding for the directory showed UTF-8. I cleared the cache and went to www.caramail.com; the page appeared to display correctly and the encoding showed ISO-8859-1. In the past I could reproduce the problem, but at present I am unable to reproduce it.
I still have this problem with build 2001041405. It doesn't occur every time, though, only under the following conditions: I set my home dir as the Mozilla start page. Then I open 3 windows, in which I load:
www.linuxfr.org
www.slashdot.org
http://perso0.free.fr/cgi-bin/wwwcount.cgi?df=fcron.dat&dd=C
When those pages are loaded, I open www.caramail.com in the window displaying either slashdot.org or perso0.free.fr (the bug does not seem to occur if I open www.caramail.com in the linuxfr.org window, which is a French page). We can tell the bug has occurred when a question mark icon ("?") is displayed instead of a comma (something like // \\ but in one character) at the bottom left of the page (when that icon appears, some other accents are not displayed properly in the pages after the login page). After the first page of www.caramail.com, where the user is asked for a login and a password, another, more complex page appears: in that page, only a few accents are displayed as "?"; the others are displayed correctly. Note that some of the accents may be displayed correctly because they use the &eacute; syntax, while others are written as a literal "é" in the page source (and are problematic). Otherwise (when start page = blank, or when I access caramail.com directly), everything is displayed correctly. I can attach an "ls -al" of my home dir (it might help) if you want, and if you need it (and can't do it yourself because you don't speak French), I can open a (free) mail account at www.caramail.com for testing purposes.
If you don't see any misinterpreted accent on the main page of www.caramail.com, try to follow the link "Créez votre compte gratuit". There are many accents there and you don't need any account to go there.
I have my default global encoding set to Arabic (in prefs). I have my startup page / home page set to my home dir. Be sure to *clear-cache* and *exit* before each test.
Test 1: Start up. See home dir in browser. Do not open second window. Go to www.caramail.com. See "?" characters. Encoding is Arabic.
Test 2: Start up. See home dir in first window. *Do* open second window. See home dir in second window. In *second* window go to www.caramail.com. See "?" characters. Encoding is UTF-8.
Is it true that this bug only happens when the default home page is set to a directory in an ftp:// or file:// URL?
move to moz0.9.3
Target Milestone: --- → mozilla0.9.3
I'd guess that when the default encoding is inherited in the second window, it gets the UTF-8 encoding from the file:///...
With bstell's steps, I was finally able to reproduce the bug. The problem is caused by the following statement (http://lxr.mozilla.org/seamonkey/source/xpfe/browser/resources/content/navigator.js#287):
281 // set default character set if provided
282 if ("arguments" in window && window.arguments.length > 1 && window.arguments[1]) {
283   if (window.arguments[1].indexOf("charset=") != -1) {
284     var arrayArgComponents = window.arguments[1].split("=");
285     if (arrayArgComponents) {
286       //we should "inherit" the charset menu setting in a new window
287       appCore.setDefaultCharacterSet(arrayArgComponents[1]); //XXXjag see bug 67442
288     }
289   }
290 }
When a new window is brought up, we pass its launcher's charset and set it as the default charset for the new webshell. From that point on, the real user default charset can no longer be retrieved. I guess this might not be the right way to use the default character set. It would be better to pass charset information through the URL if possible.
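The effect described above can be modelled in a few lines. This is a toy sketch, not Mozilla code: `WindowModel`, its methods, and the page object are hypothetical, and only the "charset=" argument handling mirrors the quoted snippet.

```javascript
// Hypothetical model of one browser window (not Mozilla's real classes).
class WindowModel {
  constructor(userDefaultCharset) {
    // Starts out as the user's preference.
    this.defaultCharset = userDefaultCharset;
  }

  // Mirrors the quoted snippet: a "charset=..." launch argument
  // replaces the window's default charset.
  applyLaunchArgument(arg) {
    if (arg && arg.indexOf("charset=") !== -1) {
      const parts = arg.split("=");
      this.defaultCharset = parts[1];
    }
  }

  // A page without its own charset falls back to the window default.
  charsetFor(page) {
    return page.declaredCharset || this.defaultCharset;
  }
}

const unlabeledPage = { url: "http://www.caramail.com/", declaredCharset: null };

// First window: no launch argument, so the user preference is used.
const firstWindow = new WindowModel("ISO-8859-1");
const firstResult = firstWindow.charsetFor(unlabeledPage);

// Second window: launched from a window showing a UTF-8 directory listing.
const secondWindow = new WindowModel("ISO-8859-1");
secondWindow.applyLaunchArgument("charset=UTF-8");
const secondResult = secondWindow.charsetFor(unlabeledPage);

console.log(firstResult, secondResult);
```

In this model the second window has no way to get back to ISO-8859-1, because the inherited value overwrote the only copy of the default it had.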
Status: REOPENED → ASSIGNED
While it makes some sense to send the charset to a new window to show the old page, it does not make sense to convey this default charset information from one webshell to another. So I propose this patch to fix the problem.
Attached patch proposed fix (obsolete) (deleted) — Splinter Review
Question: with this patch, if a user goes to a web site that does not have charset tags and sets the charset to make it display properly, will "new" windows opened from links on that web site remember the web site's charset, or will the user have to reset it every time?
For my last patch, the first page in "new" window will remember the charset used for its launching page. Subsequent pages in the same new window will work the same as links in original window. I am not very sure what that will be.
I believe that the user would have to reset the charset for every new window.
I am now seeing different behavior in a recent build. The problem can no longer be reproduced using bstell's steps. But to answer brian's question anyway: if we click a link in a web page, the new page will not use the charset of the previous page. It will resort to the user default if there are no other sources. Opening a link in a new window should not behave differently, and it is wrong to change the new window's user default charset forever. I will mark this bug as worksforme. If anybody experiences the old problem again, I will make some effort to check in my current patch.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago → 23 years ago
Resolution: --- → WORKSFORME
Yuying reported a similar problem in another bug, I need to reopen this bug and check in the fix.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Whiteboard: need r/sr
*** Bug 86234 has been marked as a duplicate of this bug. ***
brian, can you review my fix? Since we do not do anything special when opening a link in the current window, we should do the same in a new window as well.
Status: REOPENED → ASSIGNED
couldn't get r/sr in time, push to 0.9.4.
Target Milestone: mozilla0.9.3 → mozilla0.9.4
I have to withdraw my proposal. While working on 45187, I found that the code I removed in my patch is used to pass the inherited charset from the current page to its linked one. Removing it would break this function. Whether to inherit the launching page's charset in various situations seems a rather complicated issue, and it is rather hard to satisfy all needs. I need to investigate more before any decision can be made.
Whiteboard: need r/sr
Retarget this one. We need to redesign inherit charset mechanism to fix this one.
Target Milestone: mozilla0.9.4 → Future
shanjian is no longer working on mozilla for 2 years and these bugs are still here. Mark them won't fix. If you want to reopen it, find a good owner first.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 20 years ago
Resolution: --- → WONTFIX
Mass Reassign Please excuse the spam
Assignee: shanjian → nobody
Mass Re-opening Bugs Frank Tang Closed on Wednesday March 02 for no reason, all the spam is his fault feel free to tar and feather him
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Reassigning Franks old bugs to Jungshik Shin for triage - Sorry for spam
Assignee: nobody → jshin1987
Status: REOPENED → NEW
The given URL uses now a specified char-encoding. <meta http-equiv="CONTENT-TYPE" content="text/html; charset=iso-8859-1"> So the bug does not occur anymore I think. See bug 158285/bug 227631 and Bug 255241.
Similar problem with the Composer.
Mozilla/5.0 (Windows; U; Windows NT 5.0; de-AT; rv:1.8.1.2) Gecko/20070222 SeaMonkey/1.1.1, installed on Windows 2000 Professional.
First use of Composer: I write an HTML file. I save it (without checking character sets). I view it in the browser. Special characters are displayed incorrectly. The reason is obviously the following: the default setting for the character encoding is UTF-8, and this is used when writing HTML files. Nevertheless, the following line is written in the <head> section:
<meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">
This behavior can, of course, be changed by choosing the character set manually (either before composing or on saving). Thus, the bug is twofold:
- UTF-8 is used as a default without telling the user.
- The ISO specification written into the file head is a lie.
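The mismatch reported for Composer is the reverse of the original case: UTF-8 bytes read as ISO-8859-1. A minimal sketch, assuming a runtime with the standard `TextEncoder` and `TextDecoder`; the sample character is chosen only for illustration.

```javascript
// Bytes written when "é" is saved as UTF-8: 0xC3 0xA9.
const savedBytes = new TextEncoder().encode("\u00e9");

// A reader that trusts the ISO-8859-1 label decodes one byte per character,
// so the two bytes show up as two separate Latin-1 characters.
const shown = Array.from(savedBytes, (b) => String.fromCharCode(b)).join("");

// Decoding with the encoding actually used recovers the single character.
const intended = new TextDecoder("utf-8").decode(savedBytes);

console.log(savedBytes.length, shown, intended);
```

Each accented character therefore appears as two unrelated characters rather than as a single "?" when the file is viewed according to its meta tag.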
Hi, amazing that people earlier considered this not to happen on the Win platform. I have been irritated by this bug ever since I started using Firefox.... Have a look at this, using 20070309 Firefox/2.0.0.3:
01. Set default char encoding in options to Western
02. Set Autodetect in view menu to (Off)
03. Put this in the address bar: http://setiathome.berkeley.edu/forum_thread.php?id=12331
04. Set char encoding in view menu to Western => solves the problem
05. Press reload => you're staring at � again
Come on, that really shouldn't happen.... the page contains no charset attribute, so there is absolutely no reason why Firefox should not obey the user. Furthermore, without obeying the user. I have surfed the web for years and never once had to worry about char encoding before... Clearly this could and should be transparent to the end user... (e.g. you could even go to the length of running a check to see if there are �'s in the page to try and solve it at app level) Imagine what this means to a German- or French-reading person, or any language with a lot of accents and special characters. This should really be a priority bug, I reckon... Thanx for the hard work... nut
(In reply to comment #77)
> the page contains no charset attribute, so there is absolutely no reason why
> firefox should not obey the user.
The page contains no charset attribute, but the server sends an HTTP header Content-Type: text/html; charset=UTF-8 and Firefox obeys that. This is not a bug, and certainly not the issue that this bug report is about.
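A rough sketch of how a charset declared in the HTTP header ends up being used even when the markup says nothing. `charsetFromContentType` and `effectiveCharset` are hypothetical helpers written for this illustration; the fallback order is a simplification, not Firefox's actual logic.

```javascript
// Extract the charset parameter from a Content-Type header value, if any.
function charsetFromContentType(headerValue) {
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(headerValue || "");
  return match ? match[1] : null;
}

// Simplified choice: HTTP header first, then <meta>, then the user default.
function effectiveCharset({ header, metaCharset, userDefault }) {
  return charsetFromContentType(header) || metaCharset || userDefault;
}

// The page has no charset in its markup, but the server labels it UTF-8.
const result = effectiveCharset({
  header: "text/html; charset=UTF-8",
  metaCharset: null,
  userDefault: "ISO-8859-1",
});
console.log(result);
```

With no header charset and no meta tag, the same function falls back to the user default, which is the behavior the commenter expected to see.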
(In reply to comment #78)
> The page contains no charset attribute, but the server sends an HTTP header
> Content-Type: text/html; charset=UTF-8 and Firefox obeys that. This is not a
> bug, and certainly not the issue that this bug report is about.
My apologies. I did not realise this could be sent in HTTP headers... On reconsidering, however, neither do most people who build websites, probably... It is probably a server-wide setting... Therefore, the user choosing a specific encoding for a specific site should probably take priority over HTTP headers, at least for the duration of the session... nut
Attachment #37593 - Attachment is obsolete: true
QA Contact: amyy → i18n
Our defaults are now consistent across operating systems and don't come from localizers anymore (even though the defaults vary by locale).
Status: NEW → RESOLVED
Closed: 20 years ago → 9 years ago
Resolution: --- → WORKSFORME