Closed
Bug 65093
Opened 24 years ago
Closed 9 years ago
default encoding problem
Categories
(Core :: Internationalization, defect)
RESOLVED
WORKSFORME
Future
People
(Reporter: tgodouet, Assigned: jshin1987)
Details
(Keywords: intl)
Attachments
(1 file, 1 obsolete file)
(deleted),
text/plain
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.0-test12 i686; en-US; 0.7) Gecko/20010105
BuildID: 2001010517
Mozilla on Linux (unlike the Windows version) does not correctly render
accented characters on HTML pages that do not use
the standard "&eacute;" form (they put a literal "é" instead of "&eacute;"
in their source file).
Since many browsers handle them correctly (and several web sites
use them), Mozilla should too.
Reproducible: Always
Steps to Reproduce:
1. Go to the given URL (http://ikarios.com/form/).
2. Look for an accented character: a "?" symbol is shown
instead of the correct character.
Actual Results: a "?" symbol is shown.
Expected Results: the correct accented character should be displayed.
Comment 1•24 years ago
Reassign to bstell.
Comment 2•24 years ago
I don't know about the Linux version, but this can occur on the Windows version
if you do not have the correct character coding selected. At one time Mozilla
kept using Armenian as default which displayed ?'s for all extended characters.
Comment 3•24 years ago
ylong, can you reproduce this?
Comment 4•24 years ago
I cannot reproduce this with the 01-11 Mtrunk build on any platform.
Comment 5•24 years ago
I believe this happens on all non-Windows platforms; at least that was the case
with NS 4.x.
I believe Mozilla will place a "?" as a replacement for characters whose fonts it can't find.
On Linux RH6.2/XFree86 3.3.6-20 with the MS TrueType web fonts installed, I see
the accented characters on the page just fine.
Not sure what problem you're really seeing, apart from some font obviously
missing. If you don't have TrueType fonts installed there, I wrote a little
something on how to get them going, at least on RH6*
(I noticed those pages were also referred to in the Mozilla 0.7 release notes):
http://home.c2i.net/dark/linux.html#ttf
http://home.c2i.net/dark/linux.html#fuzzy
That may not be the problem, however: I can't see which font the page uses, which
indicates it simply uses the user's default font. I have adobe-helvetica-iso8859-1
set up for that, which is a Type 1 font, not TTF.
Could it be that you have selected a default font that lacks the proper characters?
The pages I refer to above also have some hints about making good fonts.alias
files for Type 1 fonts, so they may be worth looking at all the same.
It seems that the problem is a mis-detection of the encoding to use,
as the problem doesn't occur when I force the use of Western-ISO-8859-1.
It seems not to be the default font, which is also a ISO-8859-1.
I had the same problem on another web site : Mozilla set the encoding to Unicode
for an unknown reason (I had set it to Western on another page before).
Do you access the site via a bookmark?
(If so, this could be a dup of bug 50459)
Comment 10•24 years ago
This sounds like an encoding problem not a font or named-entity problem.
The behavior:
the accented characters on HTML pages that do not use
the standard "&eacute;" form (they put a literal "é" instead of "&eacute;"
in their source file).
would be produced by an incorrect encoding: the raw byte value would
be misinterpreted, whereas the named character reference would not.
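The distinction drawn here can be illustrated with a short Python sketch. This is not Mozilla code; it only assumes that the page stores "é" as the single ISO-8859-1 byte 0xE9 and that the browser guesses UTF-8. The raw byte is not valid UTF-8 on its own, while the named reference is plain ASCII and decodes the same way under either charset.

```python
import html

# "é" (U+00E9) stored as a raw ISO-8859-1 byte, as on the unlabeled page.
raw = "\u00e9".encode("iso-8859-1")          # b'\xe9'

# Decoded with the correct charset, the character survives.
as_latin1 = raw.decode("iso-8859-1")

# Decoded as UTF-8, 0xE9 starts a multi-byte sequence that never completes,
# so the decoder substitutes U+FFFD (often rendered as "?").
as_utf8 = raw.decode("utf-8", errors="replace")

# The named reference is pure ASCII, so the charset guess does not matter.
entity = b"&eacute;"
from_entity_utf8 = html.unescape(entity.decode("utf-8"))
from_entity_latin1 = html.unescape(entity.decode("iso-8859-1"))

print(repr(as_latin1), repr(as_utf8), repr(from_entity_utf8))
```

This matches the symptom in the report: only the literal accented bytes break, and only when the wrong charset is applied.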
Comment 11•24 years ago
tgodouet@caramail.com, please try the following.
Can you reproduce this by creating a new user profile and visiting that page (that
would isolate the problem from bookmark, cache, or default charset conditions)?
Reporter
Comment 12•24 years ago
OK, I've created a new user and tried to go to www.ikarios.com
directly, using no bookmark: same problem.
The character coding is Unicode instead of Western (when I force Mozilla
to use Western, it works well).
Comment 13•24 years ago
I just noticed this happening on my start page, which has
<head><title>¤</title></head>. The problem was that my default encoding was
set to UTF-8 (Unicode) instead of ISO-8859-1 (the default in most browsers).
Well, that, and that I don't specify a charset for my page :P
I think this changed sometime in the recent past...
Comment 15•24 years ago
I've seen it but only with pages stored on my local disk. If this can help
locate the problem:
http://www.cam.org/~tikabzy/main.html
shows up properly in my browser but if I write the file to disk, I see the ?
where accented characters should appear.
On Linux, build 2001011421
Comment 16•24 years ago
Checked the 2001-01-16-06-mtrunk Linux build with the above URL on my RH6.2-J. The
accented characters on that page are displayed. Saving the page to a local
disk and opening it in the browser again doesn't reproduce the problem either.
Tried with both new and old profiles. Accessing the page from a bookmark also
doesn't make the problem happen.
Comment 17•24 years ago
tgodouet@caramail.com, we cannot reproduce it.
What is your OS environment? Can you see the page correctly in another browser
(e.g. Netscape 4.7)?
Comment 18•24 years ago
But I can see the question marks if I manually change the encoding to UTF-8.
Comment 19•24 years ago
The problem tgodouet@caramail.com has is that Unicode is automatically selected
even with a new user profile.
tgodouet@caramail.com, could you also check your default charset of the new
profile? That is in "Edit -> Preferences -> Navigator -> Languages".
Reporter
Comment 20•24 years ago
From my profile :
My default language is : French[fr]
and my default encoding is : Western
From the new profile :
default language : English
default encoding : Western
Weird thing: it now seems to work well on new profiles, while
it's still causing problems on my profile.
On Netscape 4.7.5, it seems to work well when I use my account.
Comment 21•24 years ago
Thanks, it could be related to either the cache or a bookmark. Please try the following with
your profile (not the new one).
* Clear the cache (both memory and disk) under "Edit -> Preferences -> Advanced ->
Cache". Then quit the app and visit the page again. If you still have the
problem, please check your bookmarks.
* Open the bookmark file "bookmarks.html" and see whether the URL is in it. If
it's there, check its "LAST_CHARSET" field.
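The bookmark check above could be scripted roughly as follows. This is a minimal sketch, assuming LAST_CHARSET is stored as an attribute on each bookmark's anchor tag; the sample excerpt and the helper name `last_charsets` are hypothetical, and a real bookmarks.html will contain more attributes.

```python
import re

# Hypothetical excerpt of a bookmarks.html file; real files will differ.
SAMPLE = '''<DT><A HREF="http://ikarios.com/form/" LAST_CHARSET="UTF-8">Ikarios</A>
<DT><A HREF="http://www.mozilla.org/" LAST_CHARSET="ISO-8859-1">Mozilla</A>
<DT><A HREF="http://example.org/">No charset stored</A>
'''

ANCHOR = re.compile(r'<A\s+([^>]*)>', re.IGNORECASE)  # opening anchor tags
ATTR = re.compile(r'(\w+)="([^"]*)"')                  # NAME="value" pairs


def last_charsets(text):
    """Map each bookmarked URL to its stored LAST_CHARSET (None if absent)."""
    result = {}
    for match in ANCHOR.finditer(text):
        attrs = {name.upper(): value for name, value in ATTR.findall(match.group(1))}
        if "HREF" in attrs:
            result[attrs["HREF"]] = attrs.get("LAST_CHARSET")
    return result


charsets = last_charsets(SAMPLE)
for url, charset in charsets.items():
    print(url, "->", charset)
```

A bookmark whose stored charset disagrees with the page's real encoding would be the one to correct or delete.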
Cc to shanjian in case this is a cache/bookmark problem.
Comment 22•24 years ago
This is an invalid bug. The user viewed the unlabeled ISO-8859-1 page as UTF-8.
Marking this as invalid.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → INVALID
Reporter
Comment 23•24 years ago
I think it is a bug, because *Mozilla* (not me) does not set
the encoding correctly.
Moreover, Mozilla on Windows works correctly, while Mozilla
on Linux has some problems.
Reporter
Comment 24•24 years ago
It seems to be both a bookmark problem and a cache problem:
LAST_CHARSET wasn't set correctly in my bookmarks for the page
that was not shown correctly, but after correcting it by hand, I had
the same problem with a page accessed from a bookmarked page
(quite unclear, I know :)) ).
Now I've forced the charset to Western for that page, cleared the cache,
restarted Mozilla, and accessed the page again: it seems to be OK now.
The problem is: why were the bookmarks invalid???
I tried to create the same bookmarks with a new profile, and it was
just fine!
It is probably a wrong setting from a previous version of Mozilla, while this release
does not check the validity of the bookmark (at least the charset), so the
problem is never corrected.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Comment 25•24 years ago
>It is probably a wrong set in a previous version of Mozilla
I think that's the case, marking as WONTFIX, please reopen if reproducible with
the latest build.
Status: REOPENED → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → WONTFIX
Reporter
Comment 26•24 years ago
Well, it is not really the case :
many pages keep on being shown as UTF-8 instead of Western
even if I force the use of Western (I force once, and the next time
I visit the page, UTF-8 is still used).
Comment 27•24 years ago
Please specify the reproducible case with a new profile then reopen the bug.
Reporter
Comment 28•24 years ago
The first time I accessed the following page: www.libertysurf.fr,
the encoding was wrongly set by Mozilla to UTF-8 (even though the
default character encoding is Western).
This was done with a new profile, so this really is a bug in Mozilla 0.7,
not a wrong setting left over from an old version of Mozilla.
So I am reopening the bug.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Comment 29•24 years ago
IQA, please test it (www.libertysurf.fr), use a new profile on Linux.
Comment 30•24 years ago
I cannot reproduce this. The site http://www.libertysurf.fr/ looks correct to
me.
I changed the summary to mention encoding since this is not a font problem.
(was: the accentuated characters are shown as a "?")
Summary: the accentuated characters are shown as a "?" → default encoding problem
Comment 31•24 years ago
I tested this in the 2001-01-19 Linux build. After I created a new profile,
http://www.libertysurf.fr/ looks correct.
I cannot reproduce this problem.
tgodouet@caramail.com, could you attach the prefs.js file, which is located
in the user profile directory? It should be in the .mozilla/<Profile Name>/<random name
directory> directory.
Comment 32•24 years ago
tgodouet,
Can you attach your prefs.js file?
Reporter
Comment 33•24 years ago
Comment 34•24 years ago
user_pref("intl.charsetmenu.browser.cache", "windows-1252, UTF-8");
This means the user used "Character Coding" menu to set a charset to
"windows-1252" and "UTF-8".
tgodouet, do you remember doing that?
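To inspect such a preference without opening the file by hand, something like the following Python sketch could be used. It assumes only the simple `user_pref("name", "string value");` form seen in this bug; the helper name `read_string_prefs` is hypothetical, and the two sample lines are the ones quoted in this report.

```python
import re

# The two preference lines quoted in this bug, as they would appear in prefs.js.
SAMPLE_PREFS = '''
user_pref("browser.download.dir", "/home/thib");
user_pref("intl.charsetmenu.browser.cache", "windows-1252, UTF-8");
'''

# Matches only string-valued prefs; numeric and boolean prefs are ignored.
PREF = re.compile(r'user_pref\("([^"]+)",\s*"([^"]*)"\);')


def read_string_prefs(text):
    """Return a dict of string-valued user_pref entries."""
    return dict(PREF.findall(text))


prefs = read_string_prefs(SAMPLE_PREFS)
cached = [c.strip() for c in prefs["intl.charsetmenu.browser.cache"].split(",")]
print(cached)
```

The resulting list shows which charsets have been added to the Character Coding menu cache for that profile.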
Comment 35•24 years ago
I see that the home page setting in the attached prefs.js is set to a file.
user_pref("browser.download.dir", "/home/thib");
I was able to recreate the encoding condition:
create a new profile
open a browser window to a directory listing (this sets encoding to utf-8)
go to a page with no charset info (encoding remains utf-8)
Updated•24 years ago
Comment 36•24 years ago
I cannot reproduce the problem using 6.0 RTM on Windows 2000.
Using trunk build ID 2001012504, I see a problem similar to the one Brian described.
In my case, there was no check mark in the menu, but UTF-8 was in the menu after
ISO-8859-1, and the page http://www.libertysurf.fr/ was displayed correctly.
Reassigning to shanjian; I think you made a charset-related change after RTM.
Please check if this is related to your change.
Reporter
Comment 37•24 years ago
In reply to: nhotta@netscape.com
No, I have never changed the encoding, in either my new profile
or the old one, to anything other than Western ISO-8859-1.
But I remember having a lot of encodings in View->Character Coding
in my old profile, while I now have only (but maybe too many :)) )
Western ISO-8859-1, Western Windows-1252 and Unicode UTF-8 in my new one.
I have also noticed that some pages, even after I manually set
the encoding to Western ISO-8859-1, are shown after some time as UTF-8 (or
another encoding, but not Western ISO-8859-1).
For example www.caramail.com.
Comment 38•24 years ago
I could not reproduce the problem on my machine.
From the above discussion, I agree that it is an encoding problem.
There is one important piece of information I would like to know: what is your
charset setting? Did you turn off the charset autodetector?
Reporter
Comment 39•24 years ago
By forcing the use of a charset, I mean clicking on:
View->Character Coding->Western (ISO-8859-1)
As for the autodetector, I turned it off some time ago,
but I already had problems before that (I did it to check whether it works
better without).
And right now, I don't know how to turn it on ... :((
Reporter
Comment 40•24 years ago
Moreover, with the autodetector disabled, Mozilla should set
the character encoding to my default, Western ISO-8859-1,
but it just doesn't.
Comment 41•24 years ago
I believe the critical point is that tgodouet *DISPLAYS A DIRECTORY LISTING*
(as his home page).
I believe displaying a directory listing selects UTF-8, which gets carried over to
subsequent pages.
Comment 42•24 years ago
tgodouet, I think Brian raised a very important point. The page you
visited just before the problematic page might be related. If that is
the case, that is definitely a bug and I will fix it.
For your information, the charset source priority order is (from low to high):
86 kCharsetUninitialized = 0,
87 kCharsetFromWeakDocTypeDefault,
88 kCharsetFromUserDefault ,
89 kCharsetFromDocTypeDefault,
90 kCharsetFromParentFrame,
91 kCharsetFromCache,
92 kCharsetFromBookmarks,
93 kCharsetFromAutoDetection,
94 kCharsetFromMetaTag,
95 kCharsetFromByteOrderMark,
96 kCharsetFromHTTPHeader,
97 kCharsetFromUserForced,
98 kCharsetFromOtherComponent,
99 kCharsetFromPreviousLoading
Note: line 89 is for XML files, whose default is UTF-8, and line 99 applies to the
same page only.
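As an illustration of how such a priority list is typically applied, here is a small Python model. The enum names and their order are taken from the list above; the `resolve` function and the example candidates are assumptions for illustration, not Mozilla's actual resolution code.

```python
from enum import IntEnum


class CharsetSource(IntEnum):
    """Charset sources, lowest priority first (order as listed above)."""
    kCharsetUninitialized = 0
    kCharsetFromWeakDocTypeDefault = 1
    kCharsetFromUserDefault = 2
    kCharsetFromDocTypeDefault = 3
    kCharsetFromParentFrame = 4
    kCharsetFromCache = 5
    kCharsetFromBookmarks = 6
    kCharsetFromAutoDetection = 7
    kCharsetFromMetaTag = 8
    kCharsetFromByteOrderMark = 9
    kCharsetFromHTTPHeader = 10
    kCharsetFromUserForced = 11
    kCharsetFromOtherComponent = 12
    kCharsetFromPreviousLoading = 13


def resolve(candidates):
    """Hypothetical helper: pick the charset from the highest-priority source."""
    source = max(candidates)
    return source, candidates[source]


# An unlabeled page with a stale bookmark entry: the bookmark outranks the
# user's default, which is the situation described earlier in this bug.
stale = resolve({
    CharsetSource.kCharsetFromUserDefault: "ISO-8859-1",
    CharsetSource.kCharsetFromBookmarks: "UTF-8",
})
print(stale[0].name, stale[1])
```

Under this ordering, a user default only applies when no cache, bookmark, meta tag, or header source supplies a charset.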
Reporter
Comment 43•24 years ago
I've updated my Mozilla to release 2001012400.
I've also changed my default start page to "Blank Page",
and it does seem to work better (I have not noticed any problem
since that) (I changed this setting on my old release too,
but as I used it like that just one day or 2, I can't confirm
anything).
I have to continue to test it, but the problem is probably
the start page.
Reporter
Comment 44•24 years ago
OK, it really seems that the bug is a wrong (and permanent) setting of the encoding
to UTF-8 when the start page is a directory:
since I changed my start page to "blank", I haven't noticed any problem.
Comment 46•24 years ago
I tried visiting a UTF-8 page just before visiting the problem page
but still could not reproduce it. I will dismiss this bug for now.
Status: NEW → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → WORKSFORME
Reporter
Comment 47•24 years ago
Have you tried to set your start page as a directory,
restart mozilla, and then go on the problem page ?
Comment 48•24 years ago
I'll mark it as verified. Please re-open it if you disagree.
Status: RESOLVED → VERIFIED
Reporter
Comment 49•24 years ago
It is not corrected : when Mozilla shows a directory at start,
I still get some encoding problem with my Mozilla 2001032908.
So I reopen it.
Status: VERIFIED → REOPENED
Resolution: WORKSFORME → ---
Comment 50•24 years ago
Please provide more information. What kind of directory did you
set as your home page? I need to reproduce the problem before
I can fix it.
Comment 51•24 years ago
Using my Linux debug build of Apr 4, 2001, I tried setting my start
page to a directory. The encoding for the directory showed UTF-8.
I cleared the cache and went to www.caramail.com; the page appeared to
display correctly and the encoding showed ISO-8859-1.
In the past I could reproduce the problem, but at present I am unable to
reproduce it.
Reporter
Comment 52•24 years ago
I still have this problem with the build 2001041405.
Anyway, it doesn't occur every time, but only
in the following conditions:
I set my home dir as the mozilla start page. Then,
I open 3 windows, in which I load :
www.linuxfr.org
www.slashdot.org
http://perso0.free.fr/cgi-bin/wwwcount.cgi?df=fcron.dat&dd=C
When those pages are loaded, I open www.caramail.com
in the window displaying either slashdot.org or perso0.free.fr
(the bug does not seem to occur if I open www.caramail.com
in the linuxfr.org window, which is a French page).
You can tell the bug has occurred when a question mark icon ( "?" )
is displayed instead of a comma ( something like //
\\
but in one character) in the bottom left of the page
(when that icon appears, some other accents are not displayed
properly in the pages after the login page).
After the first page of www.caramail.com where user is asked
for a login and a password, another page more complex appears :
in that page, only a few accents are displayed as "?", the others
are just printed correctly on the screen.
Note that some of the accents may be displayed correctly
because they use the &eacute; syntax, but others are
written as a literal "é" in the page source (and are problematic).
Otherwise (when start page = blank, or when I access caramail.com
directly), everything is displayed correctly.
I can attach a "ls -al" of my home dir (it might help) if you want,
and if you need it (and can't do it yourself because you don't speak french),
I can open a (free) mail account at www.caramail.com for testing purpose.
Reporter
Comment 53•24 years ago
If you don't see any misinterpreted accent on the main page of
www.caramail.com, try to follow the link "Créez votre compte gratuit".
There are many accents there and you don't need any account to go there.
Comment 54•24 years ago
I have my default global encoding set to arabic (in prefs).
I have my startup page / home page set to my home dir.
Be sure to *clear-cache* and *exit* before each test.
Test 1
Start up.
See home dir in browser.
Do not open second window.
Go to www.caramail.com.
see "?" characters.
Encoding is arabic.
Test 2
Start up.
See home dir in first window.
*Do* open second window.
See home dir in second window.
In *second* window go to www.caramail.com.
see "?" characters.
Encoding is UTF-8.
Comment 55•23 years ago
Is it true that this bug only happens when the default home page is set to a
directory at an ftp:// or file:// URL?
Comment 57•23 years ago
I'd guess that when the default encoding is inherited in the second window,
it gets the UTF-8 encoding from the file:///...
Comment 58•23 years ago
With bstell's steps, I was finally able to reproduce the bug. The problem
was caused by the following statement:
(http://lxr.mozilla.org/seamonkey/source/xpfe/browser/resources/content/navigator.js#287)
281 // set default character set if provided
282 if ("arguments" in window && window.arguments.length > 1 && window.arguments[1]) {
283 if (window.arguments[1].indexOf("charset=") != -1) {
284 var arrayArgComponents = window.arguments[1].split("=");
285 if (arrayArgComponents) {
286 //we should "inherit" the charset menu setting in a new window
287 appCore.setDefaultCharacterSet(arrayArgComponents[1]); //XXXjag see bug 67442
288 }
289 }
290 }
When a new window is brought up, we pass its launcher's charset and set it as the default charset for
the new webshell. From that point on, the real user default charset can no longer be retrieved.
I guess this might not be the right way to use the default character set. It would be better to pass the charset
information through the URL if possible.
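The effect described here can be reproduced with a toy model in Python. This is a simplified sketch and not Mozilla's real classes: the `Window` class and its methods are hypothetical, and the only behaviour copied from the quoted navigator.js is that an inherited charset overwrites the new window's default.

```python
class Window:
    """Toy stand-in for a browser window; not Mozilla's actual implementation."""

    def __init__(self, user_default, inherited=None):
        # Mirrors the quoted logic: a charset passed by the launching window
        # replaces the default, so the user's real default is lost.
        self.default_charset = inherited or user_default
        self.current_charset = self.default_charset

    def load(self, declared_charset=None):
        """Load a page; unlabeled pages fall back to the window default."""
        self.current_charset = declared_charset or self.default_charset
        return self.current_charset

    def open_new_window(self, user_default):
        """Open a new window that inherits the current page's charset."""
        return Window(user_default, inherited=self.current_charset)


first = Window("ISO-8859-1")
first.load("UTF-8")                       # e.g. a directory listing shown as UTF-8
second = first.open_new_window("ISO-8859-1")
unlabeled = second.load()                 # page with no charset information
print(unlabeled)
```

In this model every unlabeled page loaded in the second window keeps getting UTF-8, which is consistent with the behaviour reported for windows opened from a directory listing.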
Status: REOPENED → ASSIGNED
Comment 59•23 years ago
While it makes some sense to send the charset to a new window to show the old page, it does not make
sense to carry this default charset information from one webshell to another. So I propose this
patch to fix the problem.
Comment 60•23 years ago
Comment 61•23 years ago
question:
With this patch,
if a user goes to a web site that does not have charset tags
and
sets the charset to make it display properly
Will "new" windows from links on that web site remember the web
site's charset or will the user have to reset them every time?
Comment 62•23 years ago
For my last patch, the first page in "new" window will remember the charset used for its launching page.
Subsequent pages in the same new window will work the same as links in original window. I am not very
sure what that will be.
Comment 63•23 years ago
I believe that the user would have to reset the charset for every new
window.
Comment 64•23 years ago
I am now seeing different behavior in recent builds. The problem can no longer
be reproduced using bstell's steps. But to answer Brian's question
anyway: if we click a link in a web page, the new page will not use the charset
of the previous page. It will fall back to the user default if there are no other
sources. Opening a link in a new window should not behave differently, and
it is wrong to change the new window's user default charset forever.
I will mark this bug as WORKSFORME. If anybody experiences the old problem
again, I will make an effort to check in my current patch.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago → 23 years ago
Resolution: --- → WORKSFORME
Comment 65•23 years ago
Yuying reported a similar problem in another bug, I need to reopen this bug and
check in the fix.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Whiteboard: need r/sr
Comment 66•23 years ago
*** Bug 86234 has been marked as a duplicate of this bug. ***
Comment 67•23 years ago
Brian, can you review my fix? Since we do not do anything when opening a link
in the current window, we should do the same in a new window as well.
Status: REOPENED → ASSIGNED
Comment 68•23 years ago
couldn't get r/sr in time, push to 0.9.4.
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Comment 69•23 years ago
I have to withdraw my proposal. While I was working on 45187, I found that the code I
removed in my patch is used to pass the inherited charset from the current page to the
page it links to. Removing it would break this feature.
Whether or not to inherit the launching page's charset in various situations
seems a rather complicated issue, and it is hard to satisfy all needs. I
need to investigate more before any decision can be made.
Whiteboard: need r/sr
Comment 70•23 years ago
Retargeting this one. We need to redesign the charset inheritance mechanism to fix
it.
Target Milestone: mozilla0.9.4 → Future
Comment 71•20 years ago
shanjian has not been working on Mozilla for 2 years and these bugs are still
here. Marking them WONTFIX. If you want to reopen it, find a good owner first.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 20 years ago
Resolution: --- → WONTFIX
Comment 73•20 years ago
Mass re-opening of bugs that Frank Tang closed on Wednesday, March 02 for no reason. All
the spam is his fault; feel free to tar and feather him.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Comment 74•20 years ago
Reassigning Frank's old bugs to Jungshik Shin for triage. Sorry for the spam.
Assignee: nobody → jshin1987
Status: REOPENED → NEW
Comment 75•20 years ago
The given URL now specifies a character encoding:
<meta http-equiv="CONTENT-TYPE" content="text/html; charset=iso-8859-1">
So I think the bug no longer occurs.
See bug 158285/bug 227631 and Bug 255241.
Comment 76•18 years ago
Similar problem with the Composer.
Mozilla/5.0 (Windows; U; Windows NT 5.0; de-AT; rv:1.8.1.2) Gecko/20070222 SeaMonkey/1.1.1
installed on Windows 2000 Professional
First use of Composer.
I write a HTML file.
I save it (without controlling character sets).
I view it in the browser.
Special characters are displayed incorrectly.
The reason is obviously the following:
The default setting in the character encoding is UTF-8. This is used in writing HTML files. Nevertheless, in the <head> section, the following line is written:
<meta content="text/html; charset=ISO-8859-1"
http-equiv="content-type">
This behavior can, of course, be changed by choosing the character set manually (either before composing or on saving). Thus, the bug is twofold:
- UTF-8 is used as a default without telling the user.
- The ISO specification written into the file head is a lie.
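The mismatch described in this comment can be shown with a couple of lines of Python. This is only a sketch of the general effect (bytes written as UTF-8 but labeled ISO-8859-1), not of Composer's code; the sample word is an arbitrary choice.

```python
text = "T\u00fcr"                       # "Tür", typed in the editor

# The file is written as UTF-8...
saved = text.encode("utf-8")            # b'T\xc3\xbcr'

# ...but the meta tag claims ISO-8859-1, so the browser decodes it that way.
shown = saved.decode("iso-8859-1")      # each UTF-8 byte becomes one character

# Decoding with the charset that was actually used restores the text.
correct = saved.decode("utf-8")

print(shown, correct)
```

The two-character garbage in place of each special character is the typical sign of this kind of mislabeling, as opposed to the "?" seen when ISO-8859-1 bytes are read as UTF-8.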
Comment 77•18 years ago
Hi,
Amazing that people previously considered this not to happen on the Windows platform. I have been irritated by this bug ever since I started using Firefox...
Have a look at this:
Using 20070309 Firefox/2.0.0.3
01. Set default char encoding in options to Western
02. Set Autodetect in view menu to (Off)
03. Put this in the address bar: http://setiathome.berkeley.edu/forum_thread.php?id=12331
04. Set char encoding in view menu to western => solves the problem
05. Press reload => you're staring at � again
Come on, that really shouldn't happen....
The page contains no charset attribute, so there is absolutely no reason why Firefox should not obey the user. I have surfed the web for years and never once had to worry about character encoding before... Clearly this could and should be transparent to the end user... (e.g. you could even go to the length of running a check to see if there are �'s in the page to try and solve it at the app level)
Imagine what this means to a german or french reading person, or any language with a lot of accents and special characters.
This should really be a priority bug i reckon...
Thanx for the hard work...
nut
Comment 78•18 years ago
(In reply to comment #77)
> the page contains no charset attribute, so there is absolutely no reason why
> firefox should not obey the user.
The page contains no charset attribute, but the server sends an HTTP header Content-Type: text/html; charset=UTF-8 and Firefox obeys that. This is not a bug, and certainly not the issue that this bug report is about.
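For readers who, like the previous commenter, have not looked at the header before, the charset parameter can be pulled out of a Content-Type value with Python's standard library. This is a minimal sketch; the function name `charset_from_content_type` is hypothetical, and the header value is the one quoted in this comment.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


declared = charset_from_content_type("text/html; charset=UTF-8")
undeclared = charset_from_content_type("text/html")
print(declared, undeclared)
```

When the header declares a charset, it applies even if the page itself contains no charset information, which is why the menu choice did not persist across a reload in the case above.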
Comment 79•18 years ago
(In reply to comment #78)
> The page contains no charset attribute, but the server sends an HTTP header
> Content-Type: text/html; charset=UTF-8 and Firefox obeys that. This is not a
> bug, and certainly not the issue that this bug report is about.
My apologies. I did not realise this could be sent in HTTP headers...
On reconsidering, however, probably neither do most people who build websites... It is probably a server-wide setting...
Therefore, the user choosing a specific encoding for a specific site should probably take priority over HTTP headers. At least for the duration of the session...
nut
Updated•16 years ago
Attachment #37593 -
Attachment is obsolete: true
Comment 80•16 years ago
Updated•15 years ago
QA Contact: amyy → i18n
Our defaults are now consistent across operating systems and don't come from localizers anymore (even though the defaults vary by locale).
Status: NEW → RESOLVED
Closed: 20 years ago → 9 years ago
Resolution: --- → WORKSFORME