Closed Bug 167136 Opened 22 years ago Closed 22 years ago

Allowed blank(space) glyph list have to be updated

Categories

(Core :: Internationalization, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jshin1987, Assigned: jshin1987)

References

()

Details

(Keywords: intl)

Attachments

(5 files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; ko-KR; rv:1.1b) Gecko/20020721
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; ko-KR; rv:1.1b) Gecko/20020721

In the page at the URL given above, Hangul vowel filler (U+1160) is rendered as a question mark. The font specified in the page (CODE2000: http://home.att.net/~jameskass) has a non-spacing blank glyph for U+1160, but Mozilla regards the (blank) glyph as invalid and falls back to the question mark for U+1160.

Reproducible: Always

Steps to Reproduce:
1. Install the CODE2000 font, available at http://home.att.net/~jameskass
2. Launch Mozilla.
3. Go to http://jshin.net/i18n/korean/fillers.html

Actual Results: Hangul vowel filler (U+1160) following a Hangul leading consonant is rendered as a question mark.

Expected Results: Hangul vowel filler (U+1160) should be rendered as a non-spacing/combining/zero-width blank.

It's easy to fix and I'll attach the patch.
Attached patch patch (deleted) — Splinter Review
Add U+1160 to the list of characters that are allowed to have a 'blank' glyph. I haven't added U+115F (Hangul leading consonant filler) because it appears to be rendered fine without being added to the list.
Keywords: intl
Attached image a screenshot revealing the problem (deleted) —
intl.
Assignee: kmcclusk → yokoyama
Status: UNCONFIRMED → NEW
Component: GFX Compositor → Internationalization
Ever confirmed: true
QA Contact: petersen → ruixu
Keith Packard (a member of the XFree86 Core team and the maintainer of the fontconfig package) went through the Unicode character table and came up with a more extensive list of characters that are supposed to have a 'blank' visual representation (empty outline). His original list came from the Mozilla source. Below is the list taken from his email about the issue:

range              added to fc  comments
U+180B - U+180E    no           (but I don't have a Mongolian font to check against)
U+200C - U+200F    yes          (the Unicode description isn't clear)
U+2028 - U+2029    no           (these seem like they're supposed to be drawn)
U+202A - U+202F    yes          (these also appear blank from the description)
U+3164             yes          (HANGUL FILLER, similar to U+1160)
U+FEFF             yes          (byte order detector (ZERO WIDTH NO-BREAK SPACE))
U+FFA0             yes          HALFWIDTH HANGUL FILLER (similar to U+3164)
U+FFF9 - U+FFFB    yes          INTERLINEAR ANNOTATION marks for furigana

I guess some of the characters listed above are already taken care of by Mozilla (e.g. ZWNBS/BOM), but I believe the others have to be added. FYI, the related thread on the XF86-font list begins at http://www.xfree86.org/pipermail/fonts/2002-September/002099.html
Although deprecated, U+206A - U+206D appear to have to be included as well. As for U+206E and U+206F, I'm not sure. BTW, I'm wondering how these characters are handled in MacOS 9/X, gtk and X11. At least in gtk, Mozilla doesn't have this problem rendering the page given at the URL with the same TrueType font (CODE2000). Are they handled at a higher layer before reaching the lower level of font access?
QA Contact: ruixu → ylong
Changing the summary line because this is not just about Hangul vowel filler but also involves many other characters. Also reassigning the bug to myself.
Assignee: yokoyama → jshin
Summary: U+1160(Hangul Vowel filler) is rendered as a question mark → Allowed blank(space) glyph list have to be updated
A simplistic patch for this bug is to just modify the macro that checks whether a character is allowed to be blank. However, as comment #5 shows, there are a few too many of them for a simple macro. Would there be a better way to deal with this list (a data structure?)?
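One simple alternative to a long macro would be a sorted table of inclusive ranges searched with a binary search. The following is only a minimal sketch under assumptions: the names `BlankRange`, `kBlankRanges` and `IsAllowedBlank` are hypothetical and do not come from the Mozilla source, and the table holds just U+1160 plus the rows marked "yes" in the list quoted earlier, not the full list Mozilla already checks.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

// Hypothetical inclusive range of BMP code points.
struct BlankRange {
    std::uint16_t first;
    std::uint16_t last;
};

// Assumed contents: U+1160 plus the rows marked "yes" in the quoted list.
// The table must stay sorted and non-overlapping for the search below.
static const BlankRange kBlankRanges[] = {
    {0x1160, 0x1160},  // HANGUL JUNGSEONG FILLER
    {0x200C, 0x200F},
    {0x202A, 0x202F},
    {0x3164, 0x3164},  // HANGUL FILLER
    {0xFEFF, 0xFEFF},  // ZERO WIDTH NO-BREAK SPACE
    {0xFFA0, 0xFFA0},  // HALFWIDTH HANGUL FILLER
    {0xFFF9, 0xFFFB},  // interlinear annotation marks
};

// Returns true if ch falls inside one of the ranges above.
bool IsAllowedBlank(std::uint16_t ch) {
    const BlankRange* end = std::end(kBlankRanges);
    // Find the first range whose upper bound is not below ch.
    const BlankRange* it = std::lower_bound(
        std::begin(kBlankRanges), end, ch,
        [](const BlankRange& r, std::uint16_t c) { return r.last < c; });
    return it != end && it->first <= ch;
}
```

A range table like this is compact and easy to read, but each lookup costs a few comparisons, which may matter if the check runs for every glyph.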
Adding shanjian to CC to seek his opinion on the best way to represent the list of blank characters as he was the last one to change the line in question :-)
I ended up using CCMap. This may or may not be excessive for this simple task. It seems to be all right considering that the map is created only once per session at the beginning and CCMap accessor macro is fast. Shanjian, can you review?
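To illustrate why a map that is built once and then queried through a cheap accessor suits this check, here is a minimal sketch of a paged bitmap over the BMP. It is not the actual CCMap layout or API: the class `BlankCharMap` and the helper `MakeBlankMap` are hypothetical names, and the ranges filled in are an assumed subset of the list quoted earlier.

```cpp
#include <array>
#include <cstdint>
#include <memory>

// Hypothetical two-level map: 256 pages of 256 bits each.
// Pages are allocated only when a character in them is set.
class BlankCharMap {
public:
    void Set(std::uint16_t ch) {
        std::unique_ptr<Page>& page = pages_[ch >> 8];
        if (!page) {
            page = std::make_unique<Page>();  // value-initialized, all bits 0
        }
        (*page)[(ch & 0xFF) >> 5] |= std::uint32_t{1} << (ch & 31);
    }

    void SetRange(std::uint16_t first, std::uint16_t last) {
        // Use a wider counter so the loop ends even when last == 0xFFFF.
        for (std::uint32_t c = first; c <= last; ++c) {
            Set(static_cast<std::uint16_t>(c));
        }
    }

    // Lookup is one pointer check, one index and one bit test.
    bool Has(std::uint16_t ch) const {
        const std::unique_ptr<Page>& page = pages_[ch >> 8];
        return page && (((*page)[(ch & 0xFF) >> 5] >> (ch & 31)) & 1u);
    }

private:
    using Page = std::array<std::uint32_t, 8>;  // 8 * 32 = 256 bits
    std::array<std::unique_ptr<Page>, 256> pages_;
};

// Built once per session; the ranges here are an assumed subset.
BlankCharMap MakeBlankMap() {
    BlankCharMap map;
    map.Set(0x1160);
    map.SetRange(0x200C, 0x200F);
    map.SetRange(0x202A, 0x202F);
    map.Set(0x3164);
    map.Set(0xFFA0);
    return map;
}
```

Because the blank characters cluster in a handful of pages, most of the 256 page slots stay empty, which keeps the structure small while the lookup cost stays constant.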
A couple of issues to resolve:
- Find out which characters currently in the list are reliably filtered out upstream (possibly in a platform-independent way) and remove them from the list. It seems that which characters are filtered out is not platform-independent (e.g. nsFontMetricsWin does not get U+115F from upstream, while nsFontMetricsXft gets it unfiltered; I can't check how this is handled on Mac).
- Think about whether the list needs to be user-configurable (in prefs.js). Some fonts have _legitimate_ blank glyphs at code points in the PUA. Obviously, this cannot be hard-coded. With CCMap, it's easy to make this user-configurable.
jshin, Thanks a lot for doing this. Using CCMap is the right approach. This has been on my mind for quite some time and I haven't found the time to do it. I have a suggestion. Can you write a perl tool to generate the CCMap in binary form instead of generating it at run time? That will shrink the memory footprint and improve startup performance. We will need to apply a similar approach in several other places. (The punctuation mark check in layout is one example.)
Attached file ccmapbin.pl (deleted) —
Shanjian, Attached is a simple perl tool to generate a PRUint16 array for CCMap. Actually, it generates three files, for LE/BE (16bit), BE (32bit) and BE (64bit). I tested the result (with a simple test program modified from printCCMap() in nsCompressedCharMap.cpp) on ix86 (32bit LE), Alpha (64bit LE), Sparc (32bit BE), and PA-RISC (32bit BE) and it worked fine. I couldn't find a 64bit BE machine (the PA-RISC machine I used is 64bit but its long is only 32bit), but I believe it should work well there, too. Can you tell me where else we need this (nsFontMetricsGTK.cpp is one of them)? Perhaps I'll file a new bug (to put a 'precompiled CCMap' in place of the character list) and make this bug dependent on it. BTW, currently it just works on the BMP, but it can be extended easily.
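The idea of shipping the map as constant data instead of building it at startup can be sketched with a table computed at compile time. This swaps in a different technique from the attached perl tool: it uses a C++17 `constexpr` function and a flat bitmap, not a generated PRUint16 CCMap array. All names here (`Range`, `kRanges`, `BuildBitmap`, `kBlankBitmap`, `IsBlankPrecomputed`) are hypothetical, and the ranges are an assumed subset of the list quoted earlier.

```cpp
#include <array>
#include <cstdint>

// Hypothetical inclusive range of BMP code points.
struct Range {
    std::uint16_t first;
    std::uint16_t last;
};

// Assumed subset of the blank characters discussed in this bug.
constexpr Range kRanges[] = {
    {0x1160, 0x1160},
    {0x200C, 0x200F},
    {0x3164, 0x3164},
    {0xFFA0, 0xFFA0},
};

// One bit per BMP code point: 65536 / 32 = 2048 words (8 KB).
constexpr std::array<std::uint32_t, 2048> BuildBitmap() {
    std::array<std::uint32_t, 2048> bits{};
    for (const Range& r : kRanges) {
        for (std::uint32_t c = r.first; c <= r.last; ++c) {
            bits[c >> 5] |= std::uint32_t{1} << (c & 31);
        }
    }
    return bits;
}

// Evaluated by the compiler, so no table is built at startup.
constexpr auto kBlankBitmap = BuildBitmap();

constexpr bool IsBlankPrecomputed(std::uint16_t ch) {
    return (kBlankBitmap[ch >> 5] >> (ch & 31)) & 1u;
}

// The check itself can be verified at compile time.
static_assert(IsBlankPrecomputed(0x1160), "U+1160 should be allowed to be blank");
```

Since the compiler emits the words for the target it is building for, this sketch does not need separate output per byte order or word size. That observation applies only to this flat layout and says nothing about the real CCMap format.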
Thanks for your great work!!! Punctuation checking in nsTextFrame is a sure thing:
http://lxr.mozilla.org/seamonkey/source/layout/html/base/src/nsTextFrame.cpp#4645
The CJK and Hangul check in the line breaker is questionable:
http://lxr.mozilla.org/seamonkey/source/intl/lwbrk/src/nsJISx4501LineBreaker.cpp
I am sure that we will need this in some other places, now and in the future.
Shanjian, Thank you for your kind words. I filed a new bug 180266 for this and am going to make this bug depend on it. I didn't have to, but it seems 'conceptually' clearer that way. I added Shanjian to the CC list of bug 180266, and anyone here is welcome to add her/himself to the CC list there.

> I am sure that we will need this in some other places now and in future.

So am I :-) In particular, I guess Mozilla may need to look up Unicode character class tables in several places (line breaking, rendering/layout - bidi, text boundary identification, editing - search and replace, etc.). Most TRs at http://www.unicode.org/reports appear relevant to adopting this approach in Mozilla in one way or another (UTR 14, UTR 29, UTR 9, UTR 13 to name just a few).
Depends on: 180266
180266 was just resolved and accordingly this is, too.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED