Closed Bug 227631 Opened 21 years ago Closed 4 years ago

Character encoding can be wrong when opening link in new window.

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: ryde, Assigned: jshin1987)

References

()

Details

(Keywords: intl, Whiteboard: dupeme)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20031007 Firebird/0.7
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20031007 Firebird/0.7

A webpage that HAS a specified character encoding (e.g. Google, utf-8) contains links to other locations that do NOT have character encodings specified. Opening one of these links in a NEW WINDOW renders it using the previous webpage's character encoding instead of the default iso-8859-1.

Reproducible: Always

Steps to Reproduce:
1. Exit Firebird and save its profile: rename "Application Data\Phoenix" to Phoenix_x.
2. Start Firebird (a new profile will be created) and go to http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=http%3A%2F%2Fwww.puttes.se%2Fsmorgasar%2Fraksmorgas.htm&btnG=Google+Search
3. Open the search result link in a NEW WINDOW (via the right-click context menu).

Actual Results: The webpage is rendered using the wrong character encoding, utf-8. Many of the characters are displayed as '?'.

Expected Results: The webpage should have been rendered using the default character encoding, iso-8859-1.

It is important to clean the profile by renaming or removing the "Application Data\Phoenix" dir and letting Firebird create a fresh profile each time, since Firebird saves the character encoding in several files, and this can be very confusing. Note that opening the link in a NEW TAB instead results in a correctly displayed webpage.
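The '?' symptom described in this report can be reproduced outside the browser. The following is a minimal sketch, not browser code: the sample word is an assumption chosen for illustration (it is not taken from the actual page), and Python's codecs stand in for the browser's decoders.

```python
# Illustrative sample text (an assumption, not the content of the real page).
text = "räksmörgås"

# Bytes as an ISO-8859-1 server would send them, with no charset declared.
raw = text.encode("iso-8859-1")

# Decoding with the inherited parent charset (UTF-8) fails on every
# non-ASCII byte; each failure becomes U+FFFD, which browsers of that era
# typically drew as '?'.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the user's default charset recovers the original text.
as_latin1 = raw.decode("iso-8859-1")

print(repr(as_utf8))
print(repr(as_latin1))
```

Only the accented letters are lost, which matches the report that many, but not all, characters show up as '?'.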
It shows ? no matter how I open the link. I didn't make a new profile, I don't understand how that should matter. Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6b) Gecko/20031206 Firebird/0.7+
Works for me. Mozilla/5.0 (Windows; U; Win95; en-US; rv:1.6b) Gecko/20031207 Firebird/0.7+ Have you set View --> Character Coding --> Autodetect to Universal? I've noticed when I install new builds that it defaults to Off, so, if this solves it then maybe this needs to be changed so it defaults to Universal.
Jason: As I said, Firebird saves the character information in several files (I guess in the cache and bookmarks), so it will reuse the last encoding for that website the next time you visit it.

Blair: OK, setting Autodetect to Universal works, almost. The characters are rendered correctly in this special case, but the charset is Windows-1252, wrong IMHO. Doesn't the Autodetect feature try to guess the charset depending on the content? Autodetect is only needed when a webpage does not use the default (HTML spec.) charset iso-8859-1 and also does not specify which one it is using, so it will try to guess, and in this case it guesses wrongly.

I still think there is an initialization bug lurking around here, especially considering the difference between opening a new window and opening a new tab. The latter works correctly; the former incorrectly uses the charset from the referring page.
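On the Windows-1252 versus iso-8859-1 question, a small sketch can show where the two charsets agree and where they differ. This uses Python's codecs as a stand-in; the browser's own decoders may treat the handful of bytes that Windows-1252 leaves undefined differently, so those bytes are deliberately left out.

```python
# Bytes 0x00-0x7F and 0xA0-0xFF: the range where both charsets are defined
# the same way.
shared = bytes(range(0x00, 0x80)) + bytes(range(0xA0, 0x100))
same_in_both = shared.decode("cp1252") == shared.decode("iso-8859-1")

# Bytes 0x80-0x9F are where they differ: Windows-1252 assigns printable
# characters such as curly quotes, ISO-8859-1 assigns C1 control codes.
curly = b"\x93quoted\x94"
as_cp1252 = curly.decode("cp1252")
as_latin1 = curly.decode("iso-8859-1")

print(same_in_both)
print(repr(as_cp1252), repr(as_latin1))
```

So a page that really is iso-8859-1 and uses no bytes in 0x80-0x9F renders identically under either label.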
The autodetect feature is dangerous and definitely detects wrongly sometimes. I recently visited a site that got rendered using Chinese Simplified GB18030, but it should have been iso-8859-1. Entire number series became '?'.
What's requested here is: when opening in a new window/new tab, the default charset (specified in the preferences) should be used instead of the 'parent' charset __when no other source of information exists__.

This is not always desirable. Your Google example is one of the cases where that's desirable, but there are other cases. For example, Russian web pages are split between KOI8-R and Windows-1251, with each taking about an equal share. Russian users usually set their default encoding to one of them. Suppose the default encoding is set to Windows-1251 and a link (in a KOI8-R encoded page) is requested to be opened in a new window/tab via the context menu. With what you asked for implemented, the link would be opened in Windows-1251 even though the chance is pretty high that it's in KOI8-R (assuming that the link is internal). The same can happen with Shift_JIS and EUC-JP for Japanese web pages. It can happen even for Western European pages (ISO-8859-1 vs ISO-8859-15). Needless to say, if everybody specified the charset in their pages/sites, there would be no problem.

One possible solution would be to use the default encoding instead of the parent document encoding ONLY if the parent document encoding is UTF-8 (or another encoding form of Unicode such as UTF-16 or UTF-32), because UTF-8 encoded pages are usually explicitly tagged.

> charset is Windows-1252, wrong IMHO.

Windows-1252 is a proper superset of ISO-8859-1. Are you sure the page in question doesn't have a single character not covered by ISO-8859-1 but covered by Windows-1252? Anyway, mistaking ISO-8859-1 for Windows-1252 doesn't do any harm when rendering the page.

re: comment #4
You're right that it doesn't always work. Instead of 'universal charset', you may want to use one of the more restricted detectors.
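The Russian scenario above can be illustrated with a short sketch. The sample word is an assumption made up for the example, and Python's codecs stand in for the browser's decoders.

```python
# Illustrative sample text (an assumption): a Russian word on an untagged page.
text = "Привет"

# The linked page is served as KOI8-R without declaring a charset.
raw = text.encode("koi8_r")

# Inheriting the parent's KOI8-R charset decodes the page correctly.
with_parent_charset = raw.decode("koi8_r")

# Forcing the user's default (Windows-1251 in this scenario) still yields
# Cyrillic letters, but the wrong ones, so the page is unreadable.
with_default_charset = raw.decode("cp1251")

print(with_parent_charset)
print(with_default_charset)
```

Both decodes succeed without error, which is why the browser has no cheap way to tell which of the two guesses is right.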
Assignee: blake → jshin
Severity: normal → enhancement
Component: General → Internationalization
Keywords: intl
OS: Windows 2000 → All
Product: Firebird → Browser
Hardware: PC → All
Version: unspecified → Trunk
Whiteboard: dupeme
Adding a special case for UTF* might help in the Google case, but not the others. A more reasonable approach would be to add a "Use parent charset" setting to the "View -> Character Coding" menu, and everyone will be happy, including the Russians.

Please note that opening in NEW TAB or SAME WINDOW is different from opening in NEW WINDOW. Thus I still consider this a bug, not an enhancement.

But anyway, there is a contradiction in the argument for using the parent charset. Is the entire web built from one mother page defining what charset to use? Why do we have the ability to select a default charset then? If the Russian "double default character encodings" trouble is to be solved, then we need selectable multiple default character encodings (working rather like Auto-Detect). Um, BTW, there is an Auto-Detect - Russian. Doesn't this work for the Russian websites?
Can you tell me why opening in a new tab is different from opening in a new window? Also, can you tell me your scenario where using 'the default' charset is better?

> But anyway, there is contradiction in the argument of using
> the parent charset. Is the entire web build from one mother page
> defining what charset to use? Why
> do we have the ability to select a default charset then?

You didn't pay attention to the 'when NO other source of information is available' part. Mozilla relies on several different (actually almost 10) sources of information to determine the document charset. The parent charset and the default charset take rather __low__ priority in that mechanism.
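The priority mechanism mentioned above can be pictured roughly as follows. This is only a sketch: the source names and their order are hypothetical placeholders, not Mozilla's actual list, which this thread does not spell out.

```python
from typing import Optional, Sequence, Tuple


def resolve_charset(
    candidates: Sequence[Tuple[str, Optional[str]]],
) -> Tuple[str, str]:
    """Return the first (source, charset) pair that has a charset.

    `candidates` is ordered from highest to lowest priority.
    """
    for source, charset in candidates:
        if charset:
            return source, charset
    raise ValueError("no charset candidate available")


# Hypothetical sources for an untagged page opened from a UTF-8 parent.
candidates = [
    ("http_header", None),          # no charset in Content-Type
    ("meta_tag", None),             # no <meta> charset in the page
    ("parent_document", "UTF-8"),   # low priority, but it is set
    ("user_default", "ISO-8859-1"), # lowest priority
]

print(resolve_charset(candidates))
```

In this model the reported behavior follows directly: the parent charset sits above the user default, so it wins whenever nothing stronger is available.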
> Can you tell me why opening in a new tab is different from opening in a new
> window?

Why? I have not looked at the source, but I guess it's a bug. If you follow the link http://www.ryde.net/bug/link.html there is a "how-to" to reproduce the bug. Important, please note: the profile needs to be removed between the tests. This is the very problem: Open in SAME WINDOW or NEW TAB works perfectly, but Open in NEW WINDOW does not.

> Also, can you tell me your scenario where using 'the default' charset is
> better?

Because this is the most common charset for the pages _I_ visit when __no other source of information exists__. And I don't count the referring page as relevant.
OK, it might be relevant if the parent (referring) page is in the same domain as the new one; then the parent charset can be used if no other source of information exists.
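The same-domain idea could be sketched like this. The function names are hypothetical, and comparing exact host names is an assumption; a real implementation might compare registrable domains instead.

```python
from typing import Optional
from urllib.parse import urlsplit


def same_host(url_a: str, url_b: str) -> bool:
    """True if both URLs have a host and the hosts are identical."""
    host_a = urlsplit(url_a).hostname
    host_b = urlsplit(url_b).hostname
    return host_a is not None and host_a == host_b


def fallback_charset(
    parent_url: str,
    parent_charset: Optional[str],
    target_url: str,
    default_charset: str,
) -> str:
    """Charset to use when the target page declares none.

    Inherit from the parent only when the link stays on the same host.
    """
    if parent_charset and same_host(parent_url, target_url):
        return parent_charset
    return default_charset


print(
    fallback_charset(
        "http://www.google.com/search?q=x",
        "UTF-8",
        "http://www.puttes.se/smorgasar/raksmorgas.htm",
        "ISO-8859-1",
    )
)
```

With this rule the Google case falls back to the user default, while an internal link on an untagged KOI8-R site would still inherit KOI8-R.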
In Firefox 0.9 the situation is even worse. Instead of printing '?' for the misencoded characters it randomly removes entire words and sentences.
Severity: enhancement → normal
I really think that it is a (serious) bug.

If you:
1. enter google.com;
2. hit Ctrl+N to open a new window;
3. enter bol.com.br in the new window.

The new window will use the encoding specified in google.com. This is odd. First of all, this new window isn't related to the old window. And, of course, the user has chosen to use the *default* encoding (i.e., the user has chosen *not to use* the autodetect encoding feature). Why shouldn't a new window use the default encoding?
I agree with Daniel. It simply does not make any sense that a window inherits properties from a not related window. Even though the bug disappears with the autodetect setting, this is not the default option! Another point is that, even if the autodetect setting is turned off, if you open google.com, close Firefox and open bol.com.br, the bug does not show up. That shows us that the browser works with bol.com.br with the autodetect setting off, and that the bug is caused by Firefox not correctly handling the different encodings of the two pages. I do not understand why this bug is still unconfirmed, since so many people are complaining about it.
(In reply to comment #11)
> I really think that it is a (serious) bug.
>
> If you:
> 1. enter google.com;
> 2. hit Ctrl+n to open a new window;
> 3. enter bol.com.br in the new window.
>
> The new window will use the encoding specified in google.com.

That is bug 158285.
*** Bug 266440 has been marked as a duplicate of this bug. ***
How about this? If the character encoding of the current (parent-to-be) document/window is UTF-8 (or another form of Unicode), a new window will be opened without any pre-set character encoding, so that the default character encoding (set in the user's prefs) will be used. This will not fix all the problems, but it will solve most of them. Why can't I just do the above for other encodings? Well, the 'follow the parent/referrer encoding' heuristic was introduced because it's needed (for Japanese and Russian users), and I don't want to break the cases that need it.
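The proposal above amounts to a small rule, sketched here under stated assumptions: the function name and the set of labels treated as "forms of Unicode" are illustrative, not taken from the Mozilla source.

```python
from typing import Optional

# Labels treated as Unicode encoding forms in this sketch (an assumption).
UNICODE_CHARSETS = {
    "utf-8", "utf-16", "utf-16le", "utf-16be",
    "utf-32", "utf-32le", "utf-32be",
}


def charset_for_new_window(
    parent_charset: Optional[str], default_charset: str
) -> str:
    """Fallback charset for a new window whose page declares no charset."""
    if parent_charset is None:
        return default_charset
    if parent_charset.strip().lower() in UNICODE_CHARSETS:
        # Unicode pages are usually tagged explicitly, so an untagged child
        # is unlikely to share the parent's encoding: use the user default.
        return default_charset
    # Keep the existing inherit-from-parent heuristic for legacy charsets.
    return parent_charset


print(charset_for_new_window("UTF-8", "ISO-8859-1"))
print(charset_for_new_window("KOI8-R", "windows-1251"))
```

The Google case resolves to the user default, while the KOI8-R and Shift_JIS style cases keep inheriting from the parent.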
(In reply to comment #15)
> How about this? If the character encoding of the current (parent-to-be)
> document/window is UTF-8 (or other forms of Unicode), a new window will be
> opened without any pre-set character encoding so that the default character
> encoding (set in the user's pref.) will be used. This will not fix all the
> problems, but will solve most of problems.

I think the real problem here is "what should the charset be when a new window is opened and the page doesn't contain charset info?" We have had a number of cases where the parent-to-be page had a charset but the child pages had no charset information. (If bol.com.br had a meta charset, it would be displayed correctly even after Ctrl+N from google.com.) The real fix is to have a meta charset in all pages.
(In reply to comment #16)
> I think the real problem here is "what should the charset be
> when new window is opened and the page doesn't contain charset info?"

Yes, exactly. Why not use the character encoding defined as default in user preferences? I think this is the behaviour that I expect.
(In reply to comment #17)
> Why not use the character encoding defined as default in user preferences?

I can't agree with you 100%. At some point in time, I believe, Netscape/Mozilla used the character encoding defined in the user pref. However, that behavior caused a problem: pages without a meta charset were always displayed using the default. Apparently, users wanted the SMARTer behavior of inheriting the charset from the parent page. (Our ex-Netscape evangelist momoi-san may be able to give us more info.)

Having said that, I agree with comment #12 that "It simply does not make any sense that a window inherits properties from a not related window." The key is __NOT RELATED WINDOW__. The current implementation fixes the encoding problem on the assumption that the new window is related to the parent. (Incidentally, we decided NOT to inherit properties for a new tab. Tabbed browsing was introduced later in the dev cycle.)

IMHO, THE REAL FIX IS TO HAVE META-CHARSET IN ALL HTML PAGES and we should close this bug. (I am sure the same bug will resurface in the future, though...)
Related to this seems to be bug 158285.
There should be a way to set the encoding, because we live in the real world, and many sites are only checked with IE. If we want Mozilla/Firefox to be used internationally, we must make most pages usable as they are. So let me override any settings if I know what I'm doing. I hate having to set the encoding to ISO 8859-1 on each reload. I tested it with the new 30gigs.com mail system in German. There is no encoding set. The umlauts show up as symbols (? in a box) here. I set it from Unicode to ISO, press reload, set it from Unicode to ISO again, and so on. This is a kind of NONSENSE too!
(In reply to comment #20)
> There should be a way to set the encoding, because we live in the real world,
> and many sites are only checked with IE.
> If we want Mozilla/Firefox to be used internationally,
> we must make most pages usable as they are.

Maybe, to make the behaviour more similar to IE, you should make the browser autodetect the character encoding. In my case (*), I could go to the menu "View -> Character Encoding -> Auto Detect" and choose Universal. I would also advise you to contact the webmaster of the site you are visiting to make the problem known to all concerned parties.

> So let me override any settings if I know what I'm doing.
> I hate having to set the encoding to ISO 8859-1 on each reload.
> I tested it with the new 30gigs.com mail system in German.
> There is no encoding set. The umlauts show up as symbols (? in a box) here.
> I set it from Unicode to ISO, press reload, set it from Unicode to ISO again,
> and so on.
> This is a kind of NONSENSE too!

In general, when the character encoding is unspecified through the HTTP response or through an appropriate HTML meta tag, the default encoding in Mozilla Firefox (and other Mozilla browsers, I think) is ISO-8859-1, as the HTML standards suggest. This behaviour can be changed. In my case (*), I can go to the options panel (menu "Edit -> Preferences"), and in the "General" section select "Languages", where I can change the default character encoding.

The fact that you lose the character encoding when you reload is strange but unrelated to this bug. In principle, the character encoding is cached, which means that when revisiting that page or when reloading normally (and maybe even when force-reloading), the character encoding is preserved. Are you sure that the encoding is not set through HTTP or through HTML? If it is not, you should look for a bug report that matches your description or open a new one. I cannot make a further diagnosis because I cannot visit <http://3gigs.com> in German. In English, the character encoding is specified --at least-- through HTML meta tags...

Since your report is not relevant to the bug described on this page, please continue your enquiries elsewhere, unless you have not expressed yourself correctly. Feel free to contact me via e-mail for help regarding this topic, or ask around the Mozillazine forums [http://forums.mozillazine.org/]. Cheers.

(*) I am using Mozilla Firefox 1.0.7 under Debian GNU/Linux, English version.
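For anyone wanting to check whether a page declares its encoding through HTTP or HTML, as asked above, a rough sketch follows. It is a simplified check, not the browser's algorithm: the function names are hypothetical, the regular expression only covers common `<meta>` forms, and scanning the first 1024 bytes is an assumption.

```python
import re
from email.message import Message
from typing import Optional

# Matches charset=... inside a <meta> tag (both the http-equiv and the
# short <meta charset="..."> forms). Deliberately simplistic.
META_CHARSET_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def charset_from_http(content_type: Optional[str]) -> Optional[str]:
    """Charset parameter of a Content-Type header value, if any."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def charset_from_meta(body: bytes) -> Optional[str]:
    """Charset declared in a <meta> tag near the start of the page, if any."""
    match = META_CHARSET_RE.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def declared_charset(content_type: Optional[str], body: bytes) -> Optional[str]:
    """HTTP header first, then the <meta> tag; None if neither declares one."""
    return charset_from_http(content_type) or charset_from_meta(body)


page = b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
print(declared_charset("text/html", page))
```

A result of `None` means the page falls into the case this bug is about, where the browser has to guess.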
QA Contact: i18n

Mass-removing myself from cc; search for 12b9dfe4-ece3-40dc-8d23-60e179f64ac1 or any reasonable part thereof, to mass-delete these notifications (and sorry!)

Not inheriting encodings across navigations for security reasons.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX