Closed Bug 167136 Opened 22 years ago Closed 22 years ago

Allowed blank(space) glyph list have to be updated

Status:

RESOLVED FIXED

People

(Reporter: jshin1987, Assigned: jshin1987)

References

(
URL
)

Details

(Keywords: intl)

Attachments

(5 files)

patch 22 years ago Jungshik Shin (deleted), patch		Details \| Diff \| Splinter Review
a screenshot revealing the problem 22 years ago Jungshik Shin (deleted), image/jpeg		Details
a screenshot taken with a patched mozilla 22 years ago Jungshik Shin (deleted), image/jpeg		Details
a new patch using CCMap (with a more extensive list of blank chars) 22 years ago Jungshik Shin (deleted), patch		Details \| Diff \| Splinter Review
ccmapbin.pl 22 years ago Jungshik Shin (deleted), text/plain		Details

Jungshik Shin

Assignee

Description

•

22 years ago

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; ko-KR; rv:1.1b) Gecko/20020721
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; ko-KR; rv:1.1b) Gecko/20020721

In the page at the URL given above, Hangul Vowel filler (U+1160) is rendered as a question mark. The font specified in the page (CODE2000: http://home.att.net/~jameskass) has a non-spacing blank glyph for U+1160, but Mozilla regards the blank glyph as invalid and falls back to the question mark for U+1160.

Reproducible: Always

Steps to Reproduce:
1. Install the CODE2000 font available at http://home.att.net/~jameskass
2. Launch Mozilla
3. Go to http://jshin.net/i18n/korean/fillers.html

Actual Results: Hangul vowel filler (U+1160) following a Hangul leading consonant is rendered as a question mark.

Expected Results: Hangul vowel filler (U+1160) should be rendered as a non-spacing/combining/zero-width blank.

It's easy to fix and I'll attach the patch.

Jungshik Shin

Assignee

Comment 1

•

22 years ago

Attached patch patch (deleted) — Details — Splinter Review

Add U+1160 to the list of characters that are allowed to have a 'blank' glyph. I haven't added U+115F (Hangul leading consonant filler) because it appears to be rendered fine without being added to the list.

Jungshik Shin

Assignee

Updated

•

22 years ago

Keywords: intl

Jungshik Shin

Assignee

Comment 2

•

22 years ago

Attached image a screenshot revealing the problem (deleted) — Details

Jungshik Shin

Assignee

Comment 3

•

22 years ago

Attached image a screenshot taken with a patched mozilla (deleted) — Details

Boris Zbarsky [:bzbarsky]

Comment 4

•

22 years ago

intl.

Assignee: kmcclusk → yokoyama

Status: UNCONFIRMED → NEW

Component: GFX Compositor → Internationalization

Ever confirmed: true

QA Contact: petersen → ruixu

Jungshik Shin

Assignee

Comment 5

•

22 years ago

Keith Packard (a member of the XFree86 Core team and the maintainer of the fontconfig package) went through the Unicode character table and came up with a more extensive list of characters that are supposed to have a 'blank' visual representation (empty outline). His original list came from the Mozilla source. Below is the list taken from his email about the issue:

range              added to fc   comments
U+180B - U+180E    no            (but I don't have a Mongolian font to check against)
U+200C - U+200F    yes           (the Unicode description isn't clear)
U+2028 - U+2029    no            (these seem like they're supposed to be drawn)
U+202A - U+202F    yes           (these also appear blank from the description)
U+3164             yes           (HANGUL FILLER, similar to U+1160)
U+FEFF             yes           (byte order detector (ZERO WIDTH NO-BREAK SPACE))
U+FFA0             yes           HALFWIDTH HANGUL FILLER (similar to U+3164)
U+FFF9 - U+FFFB    yes           INTERLINEAR ANNOTATION marks for furigana

I guess some of the characters listed above are already taken care of by Mozilla (e.g. ZWNBS/BOM), but I believe the others have to be added. FYI, the related thread on the XF86-font list begins at http://www.xfree86.org/pipermail/fonts/2002-September/002099.html
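For illustration, the list could be held as a sorted table of inclusive ranges and searched with a binary search. This is a minimal sketch, not the code from any patch on this bug: the table holds only the rows marked "yes" above plus U+1160 from the original report, and the names `BLANK_RANGES` and `may_be_blank` are hypothetical.

```python
from bisect import bisect_right

# Illustrative table only: the "yes" rows from the list above plus U+1160.
# Inclusive (first, last) pairs, sorted by first code point.
BLANK_RANGES = [
    (0x1160, 0x1160),  # Hangul vowel filler (the original report)
    (0x200C, 0x200F),
    (0x202A, 0x202F),
    (0x3164, 0x3164),  # HANGUL FILLER
    (0xFEFF, 0xFEFF),  # ZERO WIDTH NO-BREAK SPACE
    (0xFFA0, 0xFFA0),  # HALFWIDTH HANGUL FILLER
    (0xFFF9, 0xFFFB),  # interlinear annotation marks
]
_STARTS = [first for first, _ in BLANK_RANGES]


def may_be_blank(cp: int) -> bool:
    """Return True if code point cp falls inside one of the listed ranges."""
    # Find the last range whose start is <= cp, then check its end.
    i = bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= BLANK_RANGES[i][1]
```

A table like this keeps the list readable and easy to extend, at the cost of a search per lookup.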

Jungshik Shin

Assignee

Comment 6

•

22 years ago

Although deprecated, U+206A - U+206D appear to have to be included as well. As for U+206E and U+206F, I'm not sure. BTW, I'm wondering how these characters are handled in MacOS 9/X, gtk and X11. At least in gtk, Mozilla doesn't have this problem rendering the page given at the URL with the same TrueType font (CODE2000). Are they handled at a higher layer before reaching the lower level of font access?

Rui Xu

Updated

•

22 years ago

QA Contact: ruixu → ylong

Jungshik Shin

Assignee

Comment 7

•

22 years ago

Changing the summary line because this is not just about the Hangul Vowel filler but also involves many other characters. Also reassigning it to myself.

Assignee: yokoyama → jshin

Summary: U+1160(Hangul Vowel filler) is rendered as a question mark → Allowed blank(space) glyph list have to be updated

Jungshik Shin

Assignee

Comment 8

•

22 years ago

A simplistic patch for this bug would just modify the macro that checks whether a char. is allowed to be blank. However, as comment #5 shows, there are a few too many of them for a simple macro. Would there be a better way to deal with this list (a data structure?)?

Jungshik Shin

Assignee

Comment 9

•

22 years ago

Adding shanjian to CC to seek his opinion on the best way to represent the list of blank characters as he was the last one to change the line in question :-)

Jungshik Shin

Assignee

Comment 10

•

22 years ago

Attached patch a new patch using CCMap (with a more extensive list of blank chars) (deleted) — Details — Splinter Review

I ended up using CCMap. This may or may not be excessive for this simple task. It seems to be all right considering that the map is created only once per session at the beginning and the CCMap accessor macro is fast. Shanjian, can you review?
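To show why a map built once with a cheap accessor suits this kind of check, here is a simplified two-level bitmap. It is only a sketch of the general idea and does not reproduce the actual CCMap layout in nsCompressedCharMap.cpp. The page size and the `PagedCharMap` name are assumptions made for the example.

```python
PAGE_BITS = 8                 # assumed: 256 code points per page
PAGE_SIZE = 1 << PAGE_BITS


class PagedCharMap:
    """Two-level bitmap: a page index, each page an integer used as a bitset."""

    def __init__(self, ranges):
        self.pages = {}       # page number -> bitset of members in that page
        for first, last in ranges:
            for cp in range(first, last + 1):
                page, bit = cp >> PAGE_BITS, cp & (PAGE_SIZE - 1)
                self.pages[page] = self.pages.get(page, 0) | (1 << bit)

    def has(self, cp: int) -> bool:
        # Two steps per lookup: pick the page, then test one bit.
        bits = self.pages.get(cp >> PAGE_BITS, 0)
        return bool((bits >> (cp & (PAGE_SIZE - 1))) & 1)
```

Pages with no members are never stored, so a sparse list such as the blank characters takes only a handful of pages.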

Jungshik Shin

Assignee

Comment 11

•

22 years ago

A couple of issues to resolve:

- Find out which characters currently in the list are reliably filtered out upstream (possibly in a platform-independent way) and remove them from the list. It seems that which chars are filtered out is not platform-independent (e.g. nsFontMetricsWin does not get U+115f from upstream, while nsFontMetricsXft gets it unfiltered. I can't check how this is handled on Mac).
- Think about whether the list needs to be user-configurable (in prefs.js). Some fonts have _legitimate_ blank glyphs at code points in the PUA. Obviously, this cannot be hard-coded. With CCMap, it's easy to make this user-configurable.
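As a sketch of what a user-configurable list might involve, the snippet below parses a comma-separated string of code points and ranges into inclusive pairs. The string format and the `parse_blank_pref` name are hypothetical. No pref name or syntax is defined anywhere in this bug.

```python
import re

# Hypothetical syntax: "U+E000-U+E0FF, U+F8FF" (single code points or ranges).
_ITEM = re.compile(r"^U\+([0-9A-Fa-f]{4,6})(?:-U\+([0-9A-Fa-f]{4,6}))?$")


def parse_blank_pref(value: str):
    """Parse the hypothetical pref string into a list of (first, last) pairs."""
    ranges = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue                      # tolerate empty entries
        m = _ITEM.match(item)
        if m is None:
            raise ValueError(f"bad entry: {item!r}")
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        if last < first:
            raise ValueError(f"reversed range: {item!r}")
        ranges.append((first, last))
    return ranges
```

The resulting pairs could be merged with the built-in list before the map is built.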

Shanjian Li

Comment 12

•

22 years ago

jshin, thanks a lot for doing this. Using CCMap is the right approach. This has been on my mind for quite some time and I haven't found the time to do it. I have a suggestion: can you write a perl tool to generate the CCMap in binary form instead of generating it at run time? That will shrink the memory footprint and improve startup performance. We will need to apply a similar approach in several other places. (The punctuation mark check in layout is one example.)

Jungshik Shin

Assignee

Comment 13

•

22 years ago

Attached file ccmapbin.pl (deleted) — Details

Shanjian, attached is a simple perl tool to generate a PRUint16 array for CCMap. Actually, it generates three files for LE/BE(16bit), BE(32bit) and BE(64bit). I tested the result (with a simple test program modified from printCCMap() in nsCompressedCharMap.cpp) on ix86 (32bit LE), Alpha (64bit LE), Sparc (32bit BE), and PA-RISC (32bit BE) and it worked fine. I couldn't find a 64bit BE machine (the PA-RISC machine I used is 64bit but its long is only 32bit), but I believe it should work well there, too.

Can you tell me where else we need this (nsFontMetricsGTK.cpp is one of them)? Perhaps I'll file a new bug (to put a 'precompiled CCMap' in place of the character list) and make this bug dependent on it. BTW, currently it just works on the BMP, but it can be extended easily.
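The idea of precompiling can be sketched as follows: build the bitmap offline and print it as a C array initializer that is compiled into the binary. This is not ccmapbin.pl and does not produce the CCMap format. It emits a flat BMP bitmap of 16-bit words, ignores the LE/BE and word-size variants described above, and uses hypothetical function and array names.

```python
def bmp_bitmap_words(ranges):
    """Return a flat bitmap of the BMP as 4096 16-bit words (bit 0 = lowest cp)."""
    words = [0] * (0x10000 // 16)
    for first, last in ranges:
        for cp in range(first, last + 1):
            words[cp >> 4] |= 1 << (cp & 0xF)
    return words


def emit_c_array(name, words, per_line=8):
    """Render the words as C source text for a static PRUint16 array."""
    lines = [f"static const PRUint16 {name}[{len(words)}] = {{"]
    for i in range(0, len(words), per_line):
        chunk = ", ".join(f"0x{w:04X}" for w in words[i:i + per_line])
        lines.append(f"  {chunk},")
    lines.append("};")
    return "\n".join(lines)
```

Calling `emit_c_array("gBlankBitmap", bmp_bitmap_words([(0x1160, 0x1160)]))` returns the source text, which a build step could write to a header so that no map has to be constructed at startup.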

Shanjian Li

Comment 14

•

22 years ago

Thanks for your great work!!!

Punctuation checking in nsTextFrame is a sure thing:
http://lxr.mozilla.org/seamonkey/source/layout/html/base/src/nsTextFrame.cpp#4645

The CJK and Hangul check in the line breaker is questionable:
http://lxr.mozilla.org/seamonkey/source/intl/lwbrk/src/nsJISx4501LineBreaker.cpp

I am sure that we will need this in some other places now and in future.

Jungshik Shin

Assignee

Comment 15

•

22 years ago

Shanjian, thank you for your kind words. I filed a new bug 180266 for this and am going to make this bug depend on it. I didn't have to, but it seems like it's more 'conceptually' clear... I added Shanjian to CC of bug 180266 and anyone here is welcome to add her/himself to the CC list there.

> I am sure that we will need this in some other places now and in future.

So am I :-) Especially, I guess Mozilla may need to look up the Unicode char. class table in several places (line breaking, rendering/layout - bidi, ... text boundary identification, editing - search and replace, etc). Most TRs at http://www.unicode.org/reports appear relevant to adopting this approach in Mozilla in one way or another (UTR 14, UTR 29, UTR 9, UTR 13 to name just a few)...

Depends on: 180266

Jungshik Shin

Assignee

Comment 16

•

22 years ago

180266 was just resolved and accordingly this is, too.

Status: NEW → RESOLVED

Closed: 22 years ago

Resolution: --- → FIXED
