Closed Bug 107217 Opened 23 years ago Closed 15 years ago

Cyrillic is rendered with a double-width font in UTF-8.

Categories

(Core :: Internationalization, defect)

x86
Linux
defect
Not set
major

Tracking

()

RESOLVED WORKSFORME
mozilla1.2alpha

People

(Reporter: mikhailian, Assigned: shanjian)

References

(Depends on 1 open bug)

Details

(Keywords: intl)

Attachments

(1 file)

Cyrillic text in UTF-8 is rendered with a double-width font, as if it were Chinese. Apparently, there is no way to change this setting in the user configs. See http://bellinux.sourceforge.net/mikhailian/ for an example of such text. Mozilla version is 0.9.5, built with the following options: ./configure '--enable-optimizations=-O4\ -finline\ -fno-omit-frame-pointer\ -march=pentiumpro\ -mcpu=pentiumpro' --disable-debug --enable-svg --enable-mathml --prefix=/usr/local/mozilla-9.5
A font metric change may have caused this problem. This looks OK on my Windows machine. Teruko: can you verify on Windows? -> Assigning to bstell for Linux
Assignee: yokoyama → bstell
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.7
I forgot to add that the problem appears only on Linux. Here is another, more visible example of an incorrectly rendered page: http://bellinux.sf.net/lite/ I am also attaching the output of xlsfonts. The XFree version is 4.1.0-7 from the Debian distro.
--> ftang
Assignee: bstell → ftang
Status: ASSIGNED → NEW
If you make the user-specified font for the x-unicode lang group effective (see bug 91190; it's turned off now, so you have to modify the source and build Mozilla), Cyrillic letters in UTF-8 pages will be rendered with glyphs from 'non-CJK' fonts most of the time. Why 'most of the time'? Because some iso10646-1 fonts in XF86 4.x have exactly the same FFRE (foundry, family, repertoire, encoding) but different values in the 'additional style' field, and at the moment it is not possible to tell them apart. Mozilla has to deal with many complicated things when picking appropriate fonts and glyphs in X11. On the font front, MS Windows makes developers' lives much easier than X11 does. Hopefully, X Render (a new extension introduced in XF86 4.x and already used by KDE 2.x) will make things much simpler for X11 developers, including Mozilla developers.
Depends on: 91190
Keywords: intl
shanjian- please take a look at this one.
Assignee: ftang → shanjian
Well, Cyrillic (and in particular Bulgarian) has been rendered as double-width in most (all?) UTF-8 e-mail messages for a long time (>Mozilla 0.9.3). My system: Win98SP2 Japanese distribution, Mozilla 0.9.5. The two URLs mentioned are also double-width on my Win98.
Target Milestone: mozilla0.9.7 → mozilla0.9.8
accepting
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.8 → mozilla0.9.9
reset TM.
Target Milestone: mozilla0.9.9 → mozilla1.2
I'm not sure if this is the same thing or a different bug. Both Greek and Cyrillic appear to have incorrect letter-spacing on this page (in Linux): http://www.unicode.org/unicode/standard/WhatIsUnicode.html. I think this is actually a very good page to test Unicode capabilities on (the ZWSP bug shows up here too).
> I'm not sure if this is the same thing or a different bug.
It is the same, I think. Jungshik Shin already explained why this happens. The same bug can occur for any non-CJK language for which there are symbols in CJK (double-width) fonts.
I can confirm this problem for Mozilla 1.0rc2 for Linux. Test page: http://www.cl.cam.ac.uk/~mgk25/ucs/wgl4.txt (line 0410, etc.)

Most likely cause of the problem: Mozilla takes a glyph out of a CJK font with a higher priority than out of an ISO10646-1 font, because CJK fonts provide usually full coverage (so Mozilla knows what is in them), whereas all ISO10646-1 fonts are necessarily just subset fonts, and testing a font for the presence of each glyph can be inefficient (unless implemented properly).

Suggested solution: When Mozilla finds any ISO 10646-1 font, then it should take from any higher-priority CJK fonts only those glyphs that fall into one of the following Unicode ranges: 2380..D7AF,F900..FAFF,FE30..FE6F,FF01..FF5E,FFE0..FFE6,20000..2FFFF.

For comparison, the corresponding Unicode blocks are:

2E80..2EFF; CJK Radicals Supplement
2F00..2FDF; Kangxi Radicals
2FF0..2FFF; Ideographic Description Characters
3000..303F; CJK Symbols and Punctuation
3040..309F; Hiragana
30A0..30FF; Katakana
3100..312F; Bopomofo
3130..318F; Hangul Compatibility Jamo
3190..319F; Kanbun
31A0..31BF; Bopomofo Extended
31F0..31FF; Katakana Phonetic Extensions
3200..32FF; Enclosed CJK Letters and Months
3300..33FF; CJK Compatibility
3400..4DBF; CJK Unified Ideographs Extension A
4E00..9FFF; CJK Unified Ideographs
A000..A48F; Yi Syllables
A490..A4CF; Yi Radicals
AC00..D7AF; Hangul Syllables
F900..FAFF; CJK Compatibility Ideographs
FE30..FE4F; CJK Compatibility Forms
FE50..FE6F; Small Form Variants
FF00..FFEF; Halfwidth and Fullwidth Forms
20000..2A6DF; CJK Unified Ideographs Extension B
2F800..2FA1F; CJK Compatibility Ideographs Supplement

Such a restricted mapping of a CJK font could easily be used with a higher priority than an ISO 10646-1 font, without troubling European users with doublewidth Greek, Cyrillic and Blockgraphics glyphs.
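A minimal sketch of the range test suggested in the comment above, written in Python rather than Mozilla's own code. The names `CJK_ONLY_RANGES` and `may_use_cjk_glyph` are hypothetical. One assumption is worth flagging loudly: the comment gives the first range as 2380..D7AF, but that would include the box-drawing block at 2500..257F, which the comment wants to keep single-width, so this sketch starts at 2E80, where the listed blocks begin.

```python
from bisect import bisect_right

# (first, last) code points for which a higher-priority CJK font may be used.
# ASSUMPTION: the first range starts at 0x2E80, not at the 2380 written in
# the comment, so that box-drawing characters (2500..257F) stay excluded.
CJK_ONLY_RANGES = [
    (0x2E80, 0xD7AF),
    (0xF900, 0xFAFF),
    (0xFE30, 0xFE6F),
    (0xFF01, 0xFF5E),
    (0xFFE0, 0xFFE6),
    (0x20000, 0x2FFFF),
]

# Sorted range starts, used for a binary search instead of a linear scan.
_STARTS = [first for first, _ in CJK_ONLY_RANGES]


def may_use_cjk_glyph(code_point: int) -> bool:
    """Return True if the code point lies in one of the CJK-only ranges."""
    i = bisect_right(_STARTS, code_point) - 1
    return i >= 0 and code_point <= CJK_ONLY_RANGES[i][1]
```

With such a filter, a lookup for U+0410 (the Cyrillic letter from the test page) would skip the CJK font and fall through to the next font in the list, while U+4E00 would still be served by the CJK font.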
Does the document indicate a language? If a document has only an encoding tag of Unicode, how should an app tell whether the user wants CJK glyphs or western glyphs? Japanese users could reasonably argue that Japanese-width glyphs should be used. Cyrillic users could reasonably argue that Cyrillic-width chars should be used.

For a while Mozilla ignored a variety of chars in CJK fonts, such as smart quotes. Moz did this because the width of a CJK smart quote was far too big for a western document. However, CJK users then complained that they could not access the right-width smart quotes for CJK documents; the western smart quotes were too narrow. Mozilla no longer ignores these chars but instead first tries to find a font in the language group, i.e. a western font for western documents, a Japanese font for Japanese docs, etc.

Yes, Mozilla avoids iso10646 fonts because it is so expensive to find out what is in them. The problem is that an XLFD registry-encoding of iso10646-1 only says Unicode but gives no clue about what chars a font has. To find out, Moz needs to call XLoadQueryFont for *every* font it looks at until it finds one. This is extremely expensive and avoided whenever possible.

> testing a font for the presence of each glyph can be inefficient (unless
> implemented properly).
Could you describe an efficient method?

> Suggested solution: When Mozilla finds any ISO 10646-1 font, then it should
> take from any higher-priority CJK fonts only those glyphs that fall into one
> of the following Unicode ranges:
How would this work for Japanese users that want wider chars?
> If a document just has a encoding tag of Unicode how should an app say the
> the user wants CJK glyphs or western glyphs? Japanese users could reasonably
> argue that a Japanese width glyphs should be used. Cyrillic users could
> reasonably argue that a Cyrillic width chars should be used.
Mozilla 1.0rc2 always uses double-width Cyrillic and block graphics characters from CJK fonts, even if the source is an HTML 4.01 file with LANG=en or LANG=ru as an attribute of the HTML element. As far as I can tell, language tagging does not influence Mozilla's choice of glyphs at the moment. Example file: http://www.cl.cam.ac.uk/~mgk25/ucs/wgl4.html

It would be desirable if the non-ideographic characters were taken from CJK fonts *only* if the HTML language tag (or in its absence as a fallback perhaps the URL/DNS country code) suggests that the document is in a CJK language. By default, Mozilla should follow the same width convention as xterm, which is documented in http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

The old CJK terminal emulator habit of rendering every character from a double-byte encoded font as a double-width character has always been an accidental typographic menace and must by no means be carried over and generalized for non-CJK languages into the Unicode world.
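To illustrate the kind of width convention referred to above, here is a rough Python sketch based on the East Asian Width property that the standard `unicodedata` module exposes. It is not a port of wcwidth.c: the function name `column_width` is hypothetical, and the sketch ignores combining and control characters, which a real implementation has to treat separately.

```python
import unicodedata


def column_width(ch: str) -> int:
    """Return 2 for East Asian Wide or Fullwidth characters, otherwise 1.

    Simplification: characters with "Ambiguous" width, which include
    Cyrillic letters and box-drawing characters, are treated as single-width.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
```

Under this rule only ideographs, kana, Hangul syllables and the fullwidth forms occupy two columns; Greek, Cyrillic and block graphics stay single-width regardless of which font supplies the glyph.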
> It would be desireable if the non-ideographic characters are taken from CJK
> fonts *only* if the HTML language tag (or in its absence as a fallback perhaps
> the URL/DNS country code) suggests that the document is in a CJK language.
Before falling back (if this is ever implemented), Mozilla can also check the accept-language parameter of the HTTP request.

From my day-to-day experience, I can tell that non-CJK users tend to remove all the CJK fonts from the system to bypass the problem. This is not much of a loss, because occasional CJK glyphs can be displayed with a Unicode font as well. What causes more harm is that many users cannot figure out how to deal with this double-sized vs. `normal' issue at all. Thus, proper documentation and/or a configuration parameter will also help to solve the issue.
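The fallback order discussed in the last two comments (document language tag, then accept-language, then the country code of the host) could be sketched as follows in Python. Everything here is hypothetical and illustrative: the function names, the `CJK_LANGS` and `CJK_TLDS` sets, and the shortcut of looking only at the first accept-language entry instead of honoring q-values.

```python
from typing import Optional

# ASSUMPTION: illustrative sets, not an exhaustive or official list.
CJK_LANGS = {"zh", "ja", "ko"}
CJK_TLDS = {"cn", "tw", "hk", "jp", "kr"}


def primary_subtag(tag: str) -> str:
    """Reduce a tag such as 'zh-TW;q=0.8' to its primary subtag 'zh'."""
    return tag.strip().split(";")[0].split("-")[0].lower()


def prefers_cjk_glyphs(html_lang: Optional[str],
                       accept_language: Optional[str],
                       hostname: Optional[str]) -> bool:
    """Decide whether non-ideographic glyphs may come from CJK fonts."""
    # 1. An explicit language tag on the document always wins.
    if html_lang:
        return primary_subtag(html_lang) in CJK_LANGS
    # 2. Otherwise use the first accept-language entry (q-values ignored).
    if accept_language:
        return primary_subtag(accept_language.split(",")[0]) in CJK_LANGS
    # 3. Last resort: the country code at the end of the host name.
    if hostname:
        return hostname.rsplit(".", 1)[-1].lower() in CJK_TLDS
    return False
```

For a page tagged LANG=ru this returns False even when the host is in a CJK country domain, so Cyrillic would not be taken from a double-width font.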
> As far as I can tell, language tagging does not influence Mozilla's choice
> of glyphs at the moment.
Are you seeing no language effect, or are you saying you would like iso10646 fonts to be considered equal with non-iso10646 fonts? Erik and I have struggled with the iso10646 problem for years now, and lacking some way to solve it, we have had no choice but to use them only as a last resort. Making *all* page layout performance suffer so that iso10646 fonts can be used is not attractive.

Does this system have Cyrillic fonts other than iso10646? If non-iso10646 Cyrillic fonts are available, then the Cyrillic font searching code needs work.

> It would be desireable if the non-ideographic characters are taken from CJK
> fonts *only* if the HTML language tag (or in its absence as a fallback perhaps
> the URL/DNS country code) suggests that the document is in a CJK language.
I think there is agreement that Mozilla should use glyphs appropriate for the document's language group. However, until there is a reasonable way to find out what is in an iso10646 font, those fonts will only be used as a last resort.
dup of bug 163754, perhaps?
QA Contact: teruko → i18n
WORKSFORME for some time now.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → WORKSFORME