Closed Bug 63608 Opened 24 years ago Closed 16 years ago

What's related (related links) sidebar can't handle non-Western characters

Categories

(SeaMonkey :: Sidebar, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: jshin, Unassigned)

References

()

Details

(Keywords: intl)

* Symptom: The What's Related (related links) sidebar renders everything in a font for ISO-8859-1, so non-Western-European characters are not rendered correctly in the 'related links' sidebar. The default encoding (set in Edit|Preference|Navigator|Language) doesn't seem to affect how the 'related links' sidebar is rendered.

* To reproduce: Go to any popular non-English (or non-Western-European) site likely to have related links in a non-Western-European language and see how the related links are rendered in the 'related links' sidebar. Change the default encoding in the Preference dialog to the encoding of the site and see if there's any difference.

* Suggestion: Given that so many sites provide no encoding information or incorrect encoding information, it's very hard to get the encoding of links in the 'related links' sidebar right. However, as a zeroth order approximation, Mozilla may render everything in the 'related links' sidebar as if it were in the default encoding set in the Preference dialog. If a user sets her/his default encoding to a particular encoding, the sites (s)he visits are more likely to be in that encoding than in any other, and so are the related links to those sites. However, for some users, only a small portion of the sites visited (rather than the majority) may actually be in the default encoding. For them, the zeroth order approximation doesn't work very well. A better approach would be to use the encoding of the page currently being viewed to render links in the 'related links' sidebar. There's no guarantee that related links to the site are in the same encoding as the site, but it's pretty likely that they are. Perhaps, when the encoding of the related links can be determined explicitly (from the HTTP header or a meta tag, or from an encoding cached in memory from a previous visit), that should be honored, and the two approximations mentioned above used only as fallbacks when it cannot.
There's a problem, though, with depending on the encoding information provided via the HTTP header or a meta tag, because a lot of sites get it wrong.
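The fallback order suggested above could be sketched roughly as follows. This is only an illustration in Python, not Mozilla code: the names `pick_sidebar_charset` and `is_known_charset` and their parameters are hypothetical, and `codecs.lookup` is used merely as a stand-in for "the charset label is recognized".

```python
import codecs
from typing import Optional


def is_known_charset(label: Optional[str]) -> bool:
    """Return True if Python recognizes the charset label (stand-in check)."""
    if not label:
        return False
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


def pick_sidebar_charset(explicit: Optional[str],
                         page_charset: Optional[str],
                         default_charset: str) -> str:
    """Choose a charset for rendering related links (hypothetical helper).

    Order of preference:
      1. charset declared for the related links (HTTP header or meta tag),
      2. charset of the page currently being viewed,
      3. default charset from preferences (the "zeroth order" fallback).
    """
    for candidate in (explicit, page_charset):
        if is_known_charset(candidate):
            return candidate
    return default_charset
```

For example, `pick_sidebar_charset(None, "euc-kr", "iso-8859-1")` falls through to the page charset. An unrecognized explicit label is skipped, which only partly addresses sites that declare a valid but wrong charset.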
The url for a site with non-English related links is added.
QA contact to blee since he is familiar with the international issues involved in the Related links. I thought the related links (NS6) are tagged with UTF-8 encoding, so the problem has to do with the server database not being correct.
QA Contact: shrir → blee
This needs to be fixed for the next release -- added the nsbeta1 keyword. This may be a server-side problem. In the first What's Related release, the server only converted Latin1 and Japanese encodings to UTF-8. I don't know if the WR vendor fixed this for other encodings. So the first step would be to verify that the client is receiving good data from the WR server.
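As a first-pass check that the client is receiving well-formed data, one could test whether the raw response bytes decode as UTF-8. The sketch below is an assumption about how such a check might look; `looks_like_utf8` is a hypothetical helper, and nothing here reflects the actual WR server protocol.

```python
def looks_like_utf8(payload: bytes) -> bool:
    """Return True if the bytes are well-formed UTF-8 (hypothetical check)."""
    try:
        payload.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Properly converted titles pass; raw legacy-encoded bytes do not.
good = "日本語".encode("utf-8")
bad = "한국어".encode("euc-kr")
print(looks_like_utf8(good), looks_like_utf8(bad))
```

Passing this check does not prove the titles are correct: text that the server mis-read as Latin-1 and then converted is still valid UTF-8, so a visual comparison against the real page titles is still needed.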
Keywords: nsbeta1
Adding tpringle, kmurray and lbaliman to cc: list.
Assigning to TPringle for resolution from Alexa. May need help from Bob Jung. Adding Bobj to cc: list.
Assignee: matt → tpringle
Priority: -- → P2
nav triage team: Marking nsbeta1+
Whiteboard: nsbeta1+
nav triage team: Resetting priority so that this bug gets retriaged.
Priority: P2 → --
Changing QA Contact to andreasb.
QA Contact: blee → andreasb
Removing nsbeta1+ from status whiteboard, need to figure what to do in general with what's related.
Whiteboard: nsbeta1+ →
Changing QA contact to jonrubin@netscape.com.
QA Contact: andreasb → jonrubin
Jon: is this still a problem in NS 6.01, which uses the Netscape WR tab rather than the Alexa tab?
Is Alexa sending back info correctly converted to UTF-8? In Alexa's original implementation, it only did so for ISO-Latin1 and Japanese charset encodings. I don't know if Alexa ever fixed its server to convert other charsets (e.g., Korean charsets) to UTF-8. It appears from external usage that Mozilla only displays ISO-Latin1 WR titles. For pages with non-ISO-Latin1 titles (even Japanese), the WR sidebar displays the URL. Is this because that is what Alexa is returning? Or is the browser doing something? Netscape 4.x displayed Japanese titles in the WR dropdown. But it appears that the WR info returned to 4.x is different from what is returned to Mozilla. When I try both, I get a different list of related URLs for the same URL. Are they pointing to the same WR server/URL, or is Alexa sniffing the browser?
Ccing Matt and Myron - do you guys know the answer to this?
Does this affect 6.01, as vishy asked? We are checking in that code instead of using the Alexa tab. If it doesn't, this bug is only for Mozilla and not Netscape.
Vishy, 6.01 appears to be fine. Japanese characters do display correctly.
I checked Korean as well. I can see Korean characters in 6.01, but I cannot verify as to whether they make any sense. But Japanese is definitely displaying properly.
In NS 6.0, Korean characters look *mostly* fine, but Korean characters in some sites are treated as if they're ISO-8859-1. I guess this is due to the fact that they're regarded as *non-Korean* sites (and as a result the conversion to Unicode was not done properly) when the DB entries for them were made. For instance, try <http://www.ohmynews.com> and there are two entries in 'What's related' with Korean characters properly displayed and 5 entries with Korean characters garbled (rendered as though they're ISO-8859-1).
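The garbling described here (Korean bytes treated as ISO-8859-1 before conversion to Unicode) can be reproduced with a small sketch. The function names are hypothetical and the sample title is made up; it only illustrates the suspected mechanism and is not taken from the actual server.

```python
def garble_as_latin1(title: str, real_charset: str) -> str:
    """Simulate a converter that wrongly assumes ISO-8859-1 input."""
    raw = title.encode(real_charset)   # bytes as served by the site
    return raw.decode("iso-8859-1")    # wrong guess: every byte maps to a char


def recover(garbled: str, real_charset: str) -> str:
    """Undo the wrong guess when the real charset is known."""
    return garbled.encode("iso-8859-1").decode(real_charset)


title = "한국어"                        # made-up sample title
garbled = garble_as_latin1(title, "euc-kr")
stored = garbled.encode("utf-8")       # still valid UTF-8, but wrong text
print(garbled)
print(recover(stored.decode("utf-8"), "euc-kr") == title)
```

Because the stored value is well-formed UTF-8, the client has no way to tell it is wrong without knowing the original charset, which is consistent with this being a server-side database problem.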
marking as nsbeta1- per i18n triage.
Keywords: nsbeta1 → nsbeta1-
Todd - Who's got the answer to this one? This is a server side issue, correct?
Matt, do you think Shawkat would know?
Assigning TM = M0.9.2 | P3. Linda - Can you work with Todd on this one?
Keywords: rtm
Priority: -- → P3
Target Milestone: --- → mozilla0.9.2
Todd, this was first reported quite a while ago, so I checked W/R results today (05.25.01) and I am seeing corrupted (German) characters in the W/R results. Please let me know how you want to proceed and what I can do to help.
Lynn, I think I see What's Related extended characters working in DE 6.1b, but not FR 6.1b. Would you please confirm? Teruko, would you please check JA 6.1b?
It depends what you're looking at. The default language for the browser is currently set wrong on FR, so you won't see the right information.
Assignee: tpringle → vishy
Target Milestone: mozilla0.9.2 → mozilla1.0
Changing milestone, reassigning to vishy.
Keywords: nsBranch
I see a lot of sites are still broken, even Netscape ones, for example http://www.atour.co.jp/golf/index2.html http://home.netscape.com/zh/tw/ http://home.netscape.com/zh/cn/ http://home.netscape.com/ko/ http://www.edu.cn/ etc. This is a server side issue. I think we should first run the top 100 intl QA sites against this bug (see how many of the What's Related links for those top 100 intl sites are broken). It is very sad that this kind of problem still happens after years of integrating the "What's related" service into the client.
I quickly walked through the JA top 100 sites and at least the following sites are broken for "What's related": http://www.rakuten.co.jp http://www.cool.ne.jp http://www.tok2.com http://www.suntory.co.jp http://www.otd.co.jp http://www.fujitv.co.jp http://www.melma.com http://www.alpha-net.ne.jp
-> samir for investigation with help from Frank.
Assignee: vishy → sgehani
Target Milestone: mozilla1.0 → mozilla0.9.5
mass change, switching qa contact from jonrubin to ruixu.
QA Contact: jonrubin → ruixu
Blocks: 99227
Marking nsbranch- as it was decided in the August bug triage that we wouldn't have enough time in eMojo to fix this. Let's revisit for MachV.
Keywords: nsbranch-
I just tried these again:
> http://www.rakuten.co.jp
> http://www.cool.ne.jp
> http://www.tok2.com
> http://www.suntory.co.jp
> http://www.otd.co.jp
> http://www.fujitv.co.jp
> http://www.melma.com
> http://www.alpha-net.ne.jp
and all but fujitv returned Japanese results in the What's Related sidebar panel. fujitv reported no related links at all.
removed keyword nsbranch since it now has nsbranch-, per pdt mtg.
Keywords: nsbranch →
Mass-moving lower-priority 0.9.5 bugs off to 0.9.6 to make way for remaining 0.9.4/eMojo bugs, and MachV planning, performance and feature work. If you disagree with any of these targets, please let me know.
Target Milestone: mozilla0.9.5 → mozilla0.9.6
No longer blocks: 99227
Blocks: 107067
Keywords: nsbranch-
Moving to mozilla0.9.7.
Target Milestone: mozilla0.9.6 → mozilla0.9.7
-> mozilla0.9.9
Target Milestone: mozilla0.9.7 → mozilla0.9.9
Sidebar triage team: commercial client investigation to be done by Sujay with help from i18n QA. (The mozilla What's Related content is entirely web-based.)
Target Milestone: mozilla0.9.9 → Future
No longer blocks: 107067
As per Sujay's email, filed bug 12290 in bugscape for commercial build.
Product: Browser → Seamonkey
Assignee: samir_bugzilla → nobody
Priority: P3 → --
QA Contact: ruixu → sidebar
Target Milestone: Future → ---
Depends on: 468337
Currently the Alexa server seems smart enough to return just the URL as the link if the <title> of the linked page contains non-western characters. Please re-open if this is not the case.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → INCOMPLETE