Closed Bug 158285 Opened 22 years ago Closed 4 years ago

default char. encoding is ignored and the encoding of the current page is used when opening a new page by typing the url in the url bar

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: juergen, Assigned: smontagu)

References

Details

(Keywords: intl)

Attachments

(1 file)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:1.1b) Gecko/20020718
BuildID: 2002071813

Mozilla displays umlauts entered directly as Latin-1 characters as question marks. IE displays them correctly.

Reproducible: Always

Steps to Reproduce:
1. Display HTML with Latin-1 umlauts.
Attached file Test case (deleted) —
I loaded the test case from a file url and it displayed the question marks. Viewing the attachment from mozilla.org displays the umlauts correctly.
[View]-[Character Encoding]-[Western (ISO-8859-1)] will display them. To make this encoding the default, change it in [View]-[Preferences]:Navigator::Languages {-> resolve this bug as 'invalid'}. "für weitere fragen bez. öäü kannst du mir eine email senden" (for further questions regarding öäü, you can send me an email). Greetings from Switzerland, Damir Perisa
Yes, as no charset is given, the behavior is unspecified. @Damir: you should also fix your own settings ;)
Status: UNCONFIRMED → RESOLVED
Closed: 22 years ago
Resolution: --- → INVALID
My default encoding is Western (ISO-8859-1), so for all HTML files without a charset specification it should display my umlauts correctly, shouldn't it? But I saw that Mozilla chose Unicode (UTF-8) for my test file loaded from the filesystem (and also when loaded via a local Apache), but chose Western when it was loaded from this page here. So the problem seems to be that Mozilla ignores my default encoding for some files or URLs (which ones, and why?). The garbage below is the output of the TCP Tunnel Monitor: the first is from the local Apache with correct umlauts, the second one is from the attachment below and displays wrong umlauts. (BTW, pasted into jEdit there are no blank lines in between; another bug?)

HTTP/1.1 200 OK
Date: Fri, 19 Jul 2002 15:12:39 GMT
Server: Apache/1.3.19 (Unix) mod_jk mod_ssl/2.8.2 OpenSSL/0.9.6a
Last-modified: Fri, 19 Jul 2002 11:47:26 GMT
Etag: "34285-f9-3d37fc4e"
Accept-ranges: bytes
Content-length: 249
Connection: close
Content-type: text/html

<!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Leitfaden für Java</title>
</head>
<body>
<div class="titel">Leitfaden für Java und Ähnliche</div>
again with Ampersand: &uuml; &auml;
</body>
</html>

HTTP/1.1 200 OK
Date: Fri, 19 Jul 2002 15:17:32 GMT
Server: Apache/1.3.26 (Unix) mod_throttle/3.1.2
Connection: close
Content-type: text/html

<!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Leitfaden für Java</title>
</head>
<body>
<div class="titel">Leitfaden für Java und Ähnliche</div>
again with Ampersand: &uuml; &auml;
</body>
</html>
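The question marks are consistent with ISO-8859-1 bytes being decoded as UTF-8. A minimal Python sketch, assuming the test file stores "ü" as the single Latin-1 byte 0xFC (the title string is taken from the dump above):

```python
# The test case's title, saved as ISO-8859-1 (Latin-1) bytes; "ü" becomes 0xFC.
latin1_bytes = "Leitfaden für Java".encode("iso-8859-1")

# Decoded with the intended charset, the umlaut survives.
correct = latin1_bytes.decode("iso-8859-1")

# Decoded as UTF-8, the lone byte 0xFC is not a valid UTF-8 sequence, so a
# lenient decoder substitutes U+FFFD, which browsers often render as "?"
# or as a black box.
garbled = latin1_bytes.decode("utf-8", errors="replace")

print(ascii(garbled))  # 'Leitfaden f\ufffdr Java'
```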
Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
Works for me: Build 2002071408 on Windows2000 with ISO-8859-1 as default setting as well as build 2002070807 on Linux (SuSE 8.0). Umlauts show correctly.
Same problem with bookmarks. The charset value seems to change randomly. I'm having problems with emails too. (Read from file ...?) I noticed that Mozilla uses UNIX end-of-line style on Windows. Perhaps that is the reason for the empty lines?
We have this problem too. There must be a difference between the font used when composing mail and the font used for received mail. When composing a mail, the imported signature gets corrupted: all German umlauts like ä, ö, ü turn into '?'. Once the mail is sent, they are written correctly. Hope this helps a bit to solve it. Reimar
Same with me: ä, ö, ü are displayed as ? even in the source code, if not written as &xuml;. Same in email, all this using Western ISO-8859-1 in prefs and Moz 1.1. Come on people, confirm this bug, as German AOL users (not me) will find this really annoying if Moz is used as the engine. This is what comment 3 displays: für weitere fragen bez. öäü, nice!
Actually it's not only German umlauts, but also Spanish accents and ñ. Maybe the Spanish speakers in the US are a reason to fix the bug.
Actually it works with a 50-50 chance. When the name Jürgen in comment 10 is displayed as J?rgen, the bez. äöü is displayed correctly in comment 9 and comment 3. However, after restarting Mozilla and going straight to the same Bugzilla page, I saw Jürgen in comment 10 with a properly displayed umlaut, but the bits that were displayed correctly before are displayed as öäü and not äöü. Maybe that helps.
I found another hint. I tried it five times, so I hope that it is reproducible for others too. I use Moz 1.1 (de-AT, which does not matter) on WinME with standard encoding Western (ISO-8859-1). If I open the mail client and use a link that opens in a new window, the umlauts are displayed as ?, even on other pages loaded in that window from the address line. However, if I open the same page by typing in the address instead of using a link from the mail client, the umlauts are displayed properly. Except for the aöä in comment 3, which is displayed as öäü, all other umlauts, such as Jürgen in comment 10, are displayed properly. Hope this helps. Vote for it!
Here are two pages on the same site with different behavior:
1.: http://www.anno1503.com/german/index.php4?language=1
This page displays umlauts as "?". Viewing the source code in Mozilla shows "?" there, too, but saving the page and viewing it in Windows Notepad shows black rectangles instead. This page does *not* have a charset defined in the source code. The default charset I set is Western (ISO 8859-1).
2.: http://www.anno1503.com/german/productinfo/index.php4
This page displays umlauts correctly, also in the source code, where they are *not* encoded as &*uml; (which is unfortunately very common in German pages created with Windows tools). This page does have a default charset Western (ISO 8859-1).
So it seems that Mozilla obeys the charset definition in a page, but ignores the default charset if the page has none specified. I'm using Mozilla 1.1 (20020826) on Windows XP (German).
Sorry - in the text about the second page I meant: This page does have a charset defined in the source code: Western (ISO 8859-1).
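When the HTTP header carries no charset, a page can still declare one in a <meta> tag, which would explain why the second page displays correctly. A rough Python sketch of such a check; the regex and the 1024-byte window are simplifying assumptions, not Mozilla's actual prescan, and `sniff_meta_charset` is a hypothetical name:

```python
import re
from typing import Optional

# Simplified pattern covering the two common forms:
#   <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
#   <meta charset="utf-8">
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.IGNORECASE
)


def sniff_meta_charset(head: bytes) -> Optional[str]:
    """Return the lowercased charset declared in a <meta> tag, or None."""
    # Only look at the start of the document, as a prescan would.
    match = META_CHARSET.search(head[:1024])
    return match.group(1).decode("ascii").lower() if match else None
```

A page like the first one above would return None here, leaving the choice to the browser's fallback logic.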
[Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.2b) Gecko/20021016] Hint and "workaround": to prevent the '?', simply append a '?' to the URL ;-) Example:
http://www.echtenamen.de/ Umlauts are '?' in the browser and in View Source; Page Info: Encoding=UTF-8
http://www.echtenamen.de/? Umlauts are OK in the browser and in View Source; Page Info: Encoding=ISO-8859-1
This page does not have a charset defined in the source code. Possibly the differing encoding settings are due to different internal cache handling?
[Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.2) Gecko/20021126] After upgrading to 1.2, the links above (comment 15) are both OK (Encoding=ISO-8859-1). But I found an internal ISO-8859 page (our internal Bugzilla with umlauts in the comments ;-) where the problem still occurs: Page Info shows "Encoding=UTF-8" instead of "ISO-8859-1". Later, Mozilla hung due to Bug 169777 and I removed the XUL.mfl file. Then ... the internal page encoding was correct. Maybe this bug is related to some general XUL.mfl problem. As a workaround, remove XUL.mfl in the profile.
[Mozilla/5.0 (Windows; U; Win95; en-US; rv:1.3a) Gecko/20021212] As described in comment 12, I received an email pointing to <http://www.01net.com/rdn?oid=199249&thm=UNDEFINED>, which has no specified charset: Moz uses UTF-8 instead of my preference setting of ISO-8859-1. Test case: with Moz closed, start the Mail module, display the email, click the link; a new browser window opens. Workaround: manually select ISO-8859-1 from the "View > Character Coding" menu. As opposed to comment 16, deleting my XUL.mfl file did not help. (I tried twice.) Is there any means to trace what happens, and how Moz chooses the charset? PS: All bug reports are for various Windows releases (from Win95 to WinXP); has anyone seen the bug on a non-Windows system?
Regarding comment 17: Yes, I take back my comment 16. Meanwhile, I found the following workaround:
1. Clear the charset caches (the entries shown in the bottom half of View|Character Coding) by removing the following lines from prefs.js:
user_pref("intl.charsetmenu.browser.cache", "UTF-8, windows-1252, ...");
user_pref("intl.charsetmenu.composer.cache", "UTF-8, ISO-8859-1");
user_pref("intl.charsetmenu.mailview.cache", "ISO-8859-15, UTF-8");
2. Clear XUL.mfl and Cache\*.*
The bug often occurs when the link is in a mail. It does not occur when the link is pasted directly into the location bar. But I haven't found a reproducible situation yet. Does it depend on the ordering of the entries in browser.cache and mailview.cache? Possibly the mailview.cache index is erroneously used on browser.cache? Possibly related: Bug 148369, Bug 159295.
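Step 1 of this workaround can be sketched as a small filter over the text of prefs.js. `strip_charset_caches` is a hypothetical helper, not a Mozilla tool; it assumes the cache prefs have the `intl.charsetmenu.<name>.cache` form shown above and that Mozilla is closed while the file is edited, since the browser writes prefs.js itself:

```python
import re

# Matches the intl.charsetmenu.*.cache lines listed in the workaround.
CACHE_PREF = re.compile(r'^user_pref\("intl\.charsetmenu\.[a-z]+\.cache",')


def strip_charset_caches(prefs_text: str) -> str:
    """Return the prefs.js text without the charset menu cache lines."""
    kept = [
        line
        for line in prefs_text.splitlines(keepends=True)
        if not CACHE_PREF.match(line)
    ]
    return "".join(kept)
```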
[Mozilla/5.0 (Windows; U; Win95; en-US; rv:1.3a) Gecko/20021212] After my comment 17, checking comment 18:
*NB: My disk cache was (and is) disabled and size=0 (and its directory is empty).
*My prefs.js file contains:
user_pref("intl.charsetmenu.browser.cache", "ISO-8859-15, windows-1252, UTF-8");
user_pref("intl.charsetmenu.mailview.cache", "ISO-8859-15, ISO-8859-1");
NB: While ISO-8859-1 does not appear in the browser line, it does appear in the Character Coding (browser) menu alongside the 3 other charsets!
PS: Next time I get the problem, I'll try your workaround(s)...
I just found a similar problem in Mail that may be related: I received a mail with correctly displayed umlauts and with a PDF file attached. After viewing the PDF directly in the mail window and then returning to the body text of the mail, the umlauts were suddenly corrupted to "Ã¼" and similar character pairs (always two characters replacing one umlaut, not question marks as in the browser). I then looked at the source code of the mail, but there the umlauts were still displayed correctly! Some info from the mail header:
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 8BIT
Maybe this helps you track down the problem.
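Two characters per umlaut is the usual signature of UTF-8 bytes being displayed as a single-byte charset such as ISO-8859-1; whether that is what happened to this particular mail (labeled us-ascii) is an assumption. A minimal Python illustration:

```python
# "ü" is two bytes in UTF-8 (0xC3 0xBC).
utf8_bytes = "ü".encode("utf-8")

# ISO-8859-1 maps every byte to exactly one character, so the two bytes
# come out as two characters: "Ã" (U+00C3) and "¼" (U+00BC).
misread = utf8_bytes.decode("iso-8859-1")

print(ascii(misread))  # '\xc3\xbc'
```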
Mozilla sometimes, but not always, does similar things in POST requests. On Red Hat Linux 8.0.
[Mozilla/5.0 (Windows; U; Win95; en-US; rv:1.4a) Gecko/20030401] Update on my comment 17: WorksForMe now, clicking directly on the link in the comment. I'm not saying that this bug is fixed: I think I saw some '?' recently; we just have to narrow down the cause. ***** Re comment 2: WFM too (now), both from the link and from the local filesystem. ***** Re comment 13 and comment 15: WFM too (now). ***** Another probably related bug: see bug 182764 comment 1.
[Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030507] Confirming comment 22, on v1.4b and W2K... (see also bug 182764 comment 11 !)
Stumbled onto http://www.improve-technologies.com/pages/Java/IDE/Eclipse/SWT_et_JFace/_Comparaison_SWT-JFace_et_Swing/ and found that French letters still display wrong (as bullets) with Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:1.5) Gecko/20031007. Switching Character Coding from UTF-8 to Western helps.
Flags: blocking1.6b+
blocking flags can only be set (+) by drivers. you can request (?) them.
Flags: blocking1.6b+ → blocking1.6b?
Flags: blocking1.6?
OK. I have now requested blocking. This bug (and I see there is at least one more for the same thing) has existed since 1.2. That it has existed for such a long time and still is not fixed makes it look like Mozilla is not for the international user, or at least that this is no priority. Mozilla really needs to fix the two basic things a non-ASCII user needs: URLs using non-ASCII and pages using non-ASCII. Both are still not working.
Without a specific reproducible test case and clear steps to reproduce recorded in this bug, it's unlikely that this bug is going to be fixed for 1.6.
I suspect this bug is invalid, and it's certainly in the wrong component.
Assignee: sgehani → smontagu
Component: XP Apps → Internationalization
Flags: blocking1.6b? → blocking1.6b-
Flags: blocking1.6? → blocking1.6-
QA Contact: pawyskoczka → amyy
WFM with Moz 1.7b WinNT4. I never had problems with that. My settings are: View->Character encoding->Auto-Detect: OFF View->Character encoding->Western(ISO-8859-1) see bug 238782 comment 5 for some useful hints "... because without any other information available, Mozilla is 'forced' to assume that linked pages use the same encoding as that of the linking page (in case of google, it's usually UTF-8)."
I strongly recommend that the default character encoding of Mozilla is changed to Western (ISO-8859-1) in the next release. At least in my Firefox 0.8, the default seems to be UTF-8, which leads to *lots* of problems with German web pages (and probably other languages, too). ISO-8859-1 is the most commonly used encoding, at least for German pages.
I can reproduce this in Mozilla 1.7rc2 on Red Hat 9 with GNOME 2.2.0:
1. Install mozilla-i686-pc-linux-gnu-1.7rc2-installer.tar.gz
2. Run Mozilla with a new profile.
3. Confirm these settings are in effect:
   Preferences -> Languages -> Default Character Coding = "Western (ISO-8859-1)"
   View -> Character Encoding -> Western (ISO-8859-1)
   View -> Character Encoding -> Auto-Detect -> (Off)
4. Visit http://www.tpub.com/math1/3d.htm (this page does not supply a character coding)
5. Note the first paragraph: "In some division problems such as..." Division symbols are displayed as question marks.
6. Select View -> Character Encoding. The current setting is now Unicode (UTF-8).
7. Select Western (ISO-8859-1). The division symbols display correctly.
8. Reload. It's back to UTF-8 again.
(In reply to comment #31)
> 4. Visit http://www.tpub.com/math1/3d.htm
> (this page does not supply a character coding)

It supplies (the wrong) character coding in the HTTP headers:

HTTP/1.1 200 OK
Date: Tue, 25 May 2004 16:44:10 GMT
Server: Apache/2.0.46 (Red Hat)
Accept-Ranges: bytes
X-Powered-By: PHP/4.3.2
Connection: close
Content-Type: text/html; charset=UTF-8
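Because the server labels the page in its HTTP header, the user's default never comes into play; the default is only a fallback for undeclared pages. A minimal sketch of reading that label with Python's standard library (`email.message.Message.get_content_charset`, which lowercases the value), applied to the two kinds of Content-Type header seen in this bug; `charset_from_content_type` is a hypothetical wrapper name:

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))  # None
```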
Hmm, I just viewed http://www.tpub.com/math1/3d.htm with Mozilla 1.7/Windows, character encoding is UTF-8 and not my default setting ISO-8859-1. BTW, when I wget this page, I can not see the line "Content-Type: text/html; charset=UTF-8" in the page source.
(In reply to comment #33) > BTW, when I wget this page, I can not see the line "Content-Type: text/html; > charset=UTF-8" in the page source. Yes, but if you use |wget -s| you will see it in the headers.
*** Bug 238782 has been marked as a duplicate of this bug. ***
*** Bug 182764 has been marked as a duplicate of this bug. ***
Using Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8a3) Gecko/20040714 I find that opening the test case (from a file, not from Bugzilla), with AutoDetect=Universal, that Mozilla correctly determines it to be Western (the detector uses Windows-1252, not ISO-8859-1, for its western charset). I'm guessing there have been some improvements to the auto-detector since this bug was opened (Maybe bug 171813 or bug 183354 ?). And with Autodetect=off, loading the file does in fact fall back to the default charset specified in Preferences. Jürgen Weber, is this working for you?
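The distinction between the two labels only matters for bytes 0x80 to 0x9F: ISO-8859-1 maps them to C1 control characters, windows-1252 maps most of them to printable symbols and punctuation, and the two agree on 0xA0 to 0xFF, where the umlauts live. A small Python check of that, using arbitrary sample bytes:

```python
# 0x80 and 0x93 fall in the range where the two charsets differ;
# 0xFC ("ü") falls in the range where they agree.
sample = bytes([0x80, 0x93, 0xFC])

as_latin1 = sample.decode("iso-8859-1")  # two C1 controls, then "ü"
as_cp1252 = sample.decode("cp1252")      # euro sign, left curly quote, "ü"

print(ascii(as_latin1))  # '\x80\x93\xfc'
print(ascii(as_cp1252))  # '\u20ac\u201c\xfc'

# Every byte from 0xA0 to 0xFF decodes identically in both charsets.
upper = bytes(range(0xA0, 0x100))
print(upper.decode("iso-8859-1") == upper.decode("cp1252"))  # True
```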
Yes, it seems to work now, even the page from Comment #24. Thanks.
=>WFM per reporter's comment 38.
Status: NEW → RESOLVED
Closed: 22 years ago → 20 years ago
Resolution: --- → WORKSFORME
It does not work in Mozilla 1.7.1 (on Solaris) nor in Firefox 0.9.2 (on MS Windows). I have finally found a way to reproduce it that works for me every time. Do this (the preference paths are for Mozilla; the preferences are found in other places in Firefox):
Edit->Preferences->Navigator->Languages: set the default character encoding to Western (ISO 8859-1)
Edit->Preferences->Advanced->Clear cache
Go to: http://www.google.se/
View->Character Encoding now shows Unicode (UTF-8)
Search: utf-8 unicode test breve
View->Character Encoding shows Unicode (UTF-8)
Open the link "Unicode (UTF-8) Test" in a new window (by middle click or through the menu). You now see the page: http://www.unics.uni-hannover.de/nhtcapri/multilingual1.html
View->Character Encoding shows Unicode (UTF-8)
Go to http://www.jonkoping.se/ in this window by typing the new URL in the location field.
View->Character Encoding shows Unicode (UTF-8)
The menu in the left border has ?-characters in, for example, "V?r kommun" at the top.
This is wrong! The above page is in Western (ISO 8859-1)! Mozilla and Firefox both retain Unicode instead of switching to Western (ISO 8859-1) as they should.
Please remove RESOLVED and mark this as blocking for Firefox 1.0 and Mozilla 1.8. I have several users trying Firefox/Mozilla on MS Windows instead of MS IE, and they have run into this bug. This bug will give a very bad impression to my MS Windows users and they will not switch from MS IE.
I can reproduce what's described in comment #40. With the character encoding auto-detector OFF, the default character encoding pref is ignored and the character encoding of the current document is assumed when a new web page (which doesn't specify the character encoding) is opened by typing its address in the URL bar. There may be another bug already filed on this issue. Anyway, I have to see if there's a way to tell whether a page is opened by clicking a link in the current page or by typing an address in the URL bar. In the latter case, we have to respect the default character encoding preference when no character encoding is specified.
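The rule proposed here can be sketched as follows. `NavigationSource` and `fallback_charset` are hypothetical names for illustration, not Mozilla code, and auto-detection is assumed to be off:

```python
from enum import Enum
from typing import Optional


class NavigationSource(Enum):
    # Hypothetical labels for how a page load was started.
    LINK_CLICK = "link"
    URL_BAR = "urlbar"


def fallback_charset(declared: Optional[str], source: NavigationSource,
                     previous_charset: str, default_pref: str) -> str:
    """Charset to use for a new page under the rule proposed above."""
    # A charset from the HTTP header or a <meta> tag always wins.
    if declared:
        return declared
    # Typed address: honor the user's default encoding preference.
    if source is NavigationSource.URL_BAR:
        return default_pref
    # Followed link: keep today's behavior and inherit the referrer's charset.
    return previous_charset
```

For the scenario in comment #40 (Google in UTF-8, then typing http://www.jonkoping.se/), `fallback_charset(None, NavigationSource.URL_BAR, "utf-8", "iso-8859-1")` would yield the ISO-8859-1 default.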
Status: RESOLVED → REOPENED
Keywords: intl
OS: Windows NT → All
Hardware: PC → All
Resolution: WORKSFORME → ---
Summary: Umlauts display as question marks → default char. encoding is ignored and the encoding of the current page is used when opening a new page by typing the url in the url bar
(In reply to comment #41)
> I have to see if there's a way to tell whether a page is opened by clicking a
> in the current page or typing an address in the url bar. In the latter case,
> we have to respect the default character encoding preference when no character
> encoding is specified.

Wouldn't it make sense to assume the previous page's encoding only if the new page is on the same site (or maybe domain)? Clicking out of a search page is the obvious example here: there's no reason to assume a search result on Google is going to follow Google's UTF-8-ness. It shouldn't matter whether the link was clicked or entered. An extension of this technique would be to cache the encoding on a per-site basis. For instance: I open a page on somewhere.tld in a new window; it is encoded in UTF-8 but undeclared by the server or <meta>. My default 8859-1 encoding is applied, so it looks wrong. I fix the encoding via the menu. Henceforth, at least as long as the cache lasts, any undeclared page from somewhere.tld should default to UTF-8.
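The per-site idea, combined with same-site inheritance, might look like this minimal sketch. `SiteCharsetMemory` and its methods are hypothetical, and keying on the exact host rather than the registered domain is a simplifying assumption:

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    # urlsplit(...).hostname is lowercased, or None for relative URLs.
    return urlsplit(url).hostname or ""


class SiteCharsetMemory:
    """Hypothetical per-host memory of manual charset overrides."""

    def __init__(self, default_pref: str) -> None:
        self.default_pref = default_pref
        self.overrides: Dict[str, str] = {}

    def record_override(self, url: str, charset: str) -> None:
        # Called when the user corrects a page through the Character Encoding menu.
        self.overrides[host_of(url)] = charset

    def fallback_for(self, url: str, previous_url: Optional[str] = None,
                     previous_charset: Optional[str] = None) -> str:
        """Charset for a page that declares none (auto-detect off)."""
        host = host_of(url)
        # 1) A charset the user chose earlier for this host wins.
        if host in self.overrides:
            return self.overrides[host]
        # 2) Inherit the previous page's charset only within the same host.
        if previous_url and previous_charset and host_of(previous_url) == host:
            return previous_charset
        # 3) Otherwise fall back to the user's default preference.
        return self.default_pref
```

With this rule it no longer matters whether a link was clicked or typed: leaving a UTF-8 search page for another host falls back to the default.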
(In reply to comment #42) > Wouldn't make sense to assume the previous page's encoding only if the current > page is on the same site (or maybe domain)? Clicking out of a search page is > the obvious example here: Yup, I'm aware of that, but it's debatable. Anyway, that's a different issue. > The extended technique of this would be to cache the encoding on a per-site > basis. For instance: I open a page on somewhere.tld in new window, which is This had better be dealt with in yet another bug (as an enhancement), hadn't it? Why don't you file a bug for this?
(In reply to comment #43) > (In reply to comment #42) > > I'm aware of that, but it's debatable. Anyway, that's a different issue. I agree it's debatable, but it doesn't seem like a different issue to me, just a different approach. To solve this problem, there will likely need to be a new decision point to determine whether to fall back to the default encoding. The technique I suggest would (I assert) work to solve the specific problem in comment 40 and to fix the click-from-a-search-page example as well. If it ends up implemented in the long run, then the test for whether the URL was clicked or entered becomes moot. > This had better be dealt with in yet another bug (as an enhancement), hadn't > it? Why don't you file a bug for this? I'll think about this some more...
I have never really understood how the View->Character Encoding menu is supposed to work. Help does not explain it well enough.
>With the 'character encoding autodetector OFF', the default character encoding
>pref. is ignored and the character encoding of the current document is assumed
>when a new web page (which doesn't specify the character encoding) is opened by
>typing its address in the url bar
I am very doubtful that it is correct to use the same character encoding on a new page as on the current one; it does not matter whether you change pages by clicking a link or by entering a new address in the URL bar. A link can go anywhere. I have always interpreted the character encoding displayed in View->Character Encoding to mean the encoding used on the current page, not the encoding my window should always use. When I go to a web page, I would expect character encoding identification to work like this:
1) If the page sends a specified character encoding, use that.
2) If the page does not specify an encoding, the original semantics of HTTP/HTML were to use ISO 8859-1. Unfortunately many ignored this, so now most expect that you have to identify the character encoding heuristically. One way is to use the character encoding defined by the user. This is what I expected the preference defining the default character set to mean. In this case the default character encoding is always used, never the one of the previously viewed page. One good alternative is to first assume the default character encoding and run an automatic test for it. If the test fails, try to auto-detect the correct encoding.
That is the only simple, clear way I can think of just now. Assuming the next page has the same character encoding as the previous page often picks the wrong character encoding for users browsing all over the world. It is also confusing for users, who will not understand why a page (one they have been to before) is suddenly displayed with the wrong character set. It is very important to be consistent.
As it is now, the same page will sometimes be displayed correctly and sometimes not. The default character encoding preference and the View->Character Encoding menu could be better. For example:
- In preferences, you set the default encoding to use for pages without a specified encoding. You can also switch on verification, and auto-detection of the correct encoding if verification fails.
- The View->Character Encoding menu displays the assumed encoding used to decode the currently displayed page. You can change it manually through this menu. This menu is not for switching between manual and automatic detection; that belongs in preferences. A menu that both displays the encoding currently used for pages and switches to auto-detect is confusing.
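The identification order proposed in this comment (declared charset, then the user's default verified by a test, then auto-detection) can be sketched as below. The "test" and the "detector" are assumed stand-ins, not Mozilla's detector: both rely on a strict UTF-8 check, based on the observation that Latin-1 text rarely forms valid multi-byte UTF-8 by accident. All function names are hypothetical.

```python
from typing import Optional


def is_plausible_utf8(data: bytes) -> bool:
    """True if the bytes contain non-ASCII and still decode as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return any(byte >= 0x80 for byte in data)


def passes_default_check(data: bytes, default_pref: str) -> bool:
    """Assumed stand-in for the 'automatic test' of the default charset."""
    try:
        data.decode(default_pref)
    except (UnicodeDecodeError, LookupError):
        return False
    if default_pref.lower().replace("-", "") == "utf8":
        return True
    # Bytes that form valid multi-byte UTF-8 count as evidence against a
    # single-byte default such as ISO-8859-1.
    return not is_plausible_utf8(data)


def choose_charset(data: bytes, declared: Optional[str], default_pref: str) -> str:
    # 1) A charset sent by the server or declared in the page always wins.
    if declared:
        return declared
    # 2) Otherwise assume the user's default, if it passes the check.
    if passes_default_check(data, default_pref):
        return default_pref
    # 3) The check failed: try the crude detector, else keep the default
    #    (a real detector would try many more candidates here).
    if is_plausible_utf8(data):
        return "utf-8"
    return default_pref
```

Note that the previous page's charset is never consulted, which matches the consistency argument above: the same bytes always get the same answer.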
Please change the summary, it doesn't matter if I click a link (new window/tab) or enter the new URL in the URL bar. If the new page hasn't specified a charset encoding then the previous one will be used regardless of the default encoding. See for example bug 255241 and bug 266440.
Bug 65093 appears to be the same problem (encoding carried over from the linking page despite the official default being ISO-8859-1 and the user having his own default encoding preference). Since that bug appears to have been forgotten, perhaps it should be declared a duplicate of this one. A lot of German pages rely on the default; when you find them via Google (whose search result list displays in UTF-8), these pages are displayed wrongly (show gaps ~3 characters wide instead of Umlaut characters). Example: http://www.marktplatz-oberbayern.de/
(In reply to comment #47)
> Bug 65093 appears to be the same problem [...]
> Since that bug appears to have been forgotten,
> perhaps it should be declared a duplicate of this one.

I think you're right about the dupeness, although that bug thrashes around and doesn't identify the problem until its comments 52, 57 and 58. (Of course, this bug thrashes around until comment 40...) But that bug at least has a patch, however obsolete it might be now.

(In reply to comment #46)
> Please change the summary, it doesn't matter if I click a link [...]
> or enter the new URL in the URL bar.

I'll leave that up to Jungshik.
I think that bug https://bugzilla.mozilla.org/show_bug.cgi?id=227631 has a cleaner and more reproducible description of this bug.
*** Bug 284275 has been marked as a duplicate of this bug. ***
QA Contact: amyy → i18n
Status: REOPENED → RESOLVED
Closed: 20 years ago → 4 years ago
Resolution: --- → WORKSFORME