Closed Bug 449292 Opened 16 years ago Closed 15 years ago

Opentype font shaping using Harfbuzz

Categories

(Core :: Graphics, enhancement)

x86
Linux
enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla2.0b1

People

(Reporter: mreddy, Assigned: jfkthame)

References

Details

Attachments

(15 files, 41 obsolete files)

(deleted), patch
gerv
: review+
Details | Diff | Splinter Review
(deleted), image/png
Details
(deleted), image/png
Details
(deleted), patch
gerv
: review+
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
jtd
: review+
smontagu
: review+
Details | Diff | Splinter Review
(deleted), patch
jtd
: review+
smontagu
: review+
Details | Diff | Splinter Review
(deleted), text/html
Details
(deleted), text/html
Details
(deleted), patch
jtd
: review+
Details | Diff | Splinter Review
(deleted), patch
jtd
: review+
Details | Diff | Splinter Review
(deleted), patch
jtd
: review+
Details | Diff | Splinter Review
(deleted), patch
jtd
: review+
Details | Diff | Splinter Review
(deleted), patch
jfkthame
: review+
Details | Diff | Splinter Review
(deleted), patch
jtd
: review+
Details | Diff | Splinter Review
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1 Build Identifier: We have got Freetype2 font backend working for GTK2 (code derived from QT backend). Next step is to add Opentype shaping using Harfbuzz and the language/script specific shapers. Stuart suggested to file a bug for this and get the discussion going to find the best way to achieve this. Reproducible: Always Steps to Reproduce: 1. 2. 3.
Status: UNCONFIRMED → NEW
Component: GFX: Gtk → GFX: Thebes
Ever confirmed: true
The code they are usually is basically the same as the QT freetype code. It needs a way to support shaping in some fashion, be it pango or harfbuzz.
Bug 449356 will provide code for using Pango shapers.
Ideally we would want to get rid of Pango usage completely.
(In reply to comment #4) > Ideally we would want to get rid of Pango usage completely. If you are using GTK+, then aren't you using Pango anyway?
So the issue here is that we want to move everything to using freetype directly, and using an external library for shaping as necessary. The problem is that Harfbuzz, as is, does not handle opentype sufficiently by itself, and instead relies on pango to do part of that work. There is a project to push all the necessary bits into harfbuzz proper and expose harfbuzz as a separate library, but that is still a few months off from completion. An alternate approach is to hook into pango's shapers directly; that is, use the freetype font back end, but for shaping call into pango's back-end. This should hopefully avoid much of the memory consumption problems, but still provide full shaping even before harfbuzz is ready by itself. I think this is going to be the short-term plan -- so the feature availability will look something like: - Full Pango (now) - Freetype only, no shaping (now-ish) - Freetype + Pango shaping (soon) - Freetype + Harfbuzz shaping (later) I don't think we can jump straight to the last step there as there is a bunch of work that still needs to be done in Harfbuzz itself.
(In reply to comment #6) > all the necessary bits into harfbuzz proper and expose harfbuzz as a separate > library, but that is still a few months off from completion. Great to hear that. > > - Full Pango (now) > - Freetype only, no shaping (now-ish) > - Freetype + Pango shaping (soon) > - Freetype + Harfbuzz shaping (later) This roadmap makes sense. For our short term goals we could live with Freetype + Pango Shaping and wait for Harfbuzz to be ready.
Depends on: 502906
Attached patch harfbuzz shaping code (obsolete) (deleted) — Splinter Review
This is the harfbuzz code from Behdad plus my additions that I'm currently using for testing. (There will be updates before we're ready to land this.)
Attached patch gfx interface to harfbuzz shaping library (obsolete) (deleted) — Splinter Review
Thebes font shaping back-end using harfbuzz. This is currently hooked in to the Mac platform only, others will follow.
(In reply to comment #9) > Created an attachment (id=395653) [details] > gfx interface to harfbuzz shaping library > > Thebes font shaping back-end using harfbuzz. This is currently hooked in to the > Mac platform only, others will follow. This appears to be out-of-sync, it no longer applies cleanly. The MacOSFontEntry::CreateFont method no longer exists. I applied the patches in the order of attachment.
Did you apply the patches from bug 502906 (which this depends on) first? Sorry, I should have noted that. It's still possible that they're out-of-sync, though, as the whole stack of patches is in flux - e.g, the CreateFont method has to be renamed because it conflicts on Windows.
QA Contact: gtk → thebes
Blocks: 24139
Attached patch part 1 - harfbuzz library code from upstream (obsolete) (deleted) — Splinter Review
Assignee: nobody → jfkthame
Attachment #395651 - Attachment is obsolete: true
Attachment #395653 - Attachment is obsolete: true
Attachment #433560 - Flags: review?(jdaggett)
Attachment #433560 - Attachment description: harfbuzz library code from upstream → part 1 - harfbuzz library code from upstream
Attached patch part 3 - harfbuzz shaper backend for thebes (obsolete) (deleted) — Splinter Review
Attachment #433563 - Flags: review?(jdaggett)
Ted, are you OK to review these configure/make changes for me? If not, can you suggest who else should? - thanks. Obviously this is dependent on Part 1, which imports a copy of the harfbuzz source directory.
Attachment #433561 - Attachment is obsolete: true
Attachment #433715 - Flags: review?
Attachment #433715 - Flags: review? → review?(ted.mielczarek)
Attached patch part 3, v2 - harfbuzz shaper backend for thebes (obsolete) (deleted) — Splinter Review
Tidied up table caching a bit. (Note that this is just an initial patch to get a default harfbuzz shaper in place; follow-on work will be needed to hook up OpenType language and feature support, etc.)
Attachment #433563 - Attachment is obsolete: true
Attachment #433716 - Flags: review?(jdaggett)
Attachment #433563 - Flags: review?(jdaggett)
Comment on attachment 433716 [details] [diff] [review] part 3, v2 - harfbuzz shaper backend for thebes Cancelled review request on part 3 as I'm reworking significantly for performance.
Attachment #433716 - Flags: review?(jdaggett)
Harfbuzz's license will need to be added to about:license.
Attached patch part 1.1 - fix leak in harfbuzz blobs (obsolete) (deleted) — Splinter Review
This is to fix a leak in the harfbuzz code from upstream, until such time as the patch gets incorporated there and we refresh the code. (Reported on harfbuzz mailing list, 2010-03-20.)
Attachment #434053 - Flags: review?(jdaggett)
Attached patch part 1.2 - add harfbuzz to about:license (obsolete) (deleted) — Splinter Review
(In reply to comment #18) > Harfbuzz's license will need to be added to about:license. The actual copyright line varies among the files, so I wasn't sure how we prefer to handle that.... something like this?
Attachment #434054 - Flags: review?(reed)
Attached patch part 3, v3 - harfbuzz shaper backend for thebes (obsolete) (deleted) — Splinter Review
This seems to work pretty well, as far as it goes - i.e, no bidi or complex-script support, so it fails a bunch of reftests (e.g. bidi) and mochitests (e.g. Thai editing); and feature control is not yet hooked up. But it passes as well as expected on tryserver within those limits, on both Mac and Windows (it doesn't touch the Linux code). So I'd like to try and get it landed as a first step, so that we can check that it doesn't break the existing code (it's preffed off by default); maybe flip the pref on trunk for a couple of test cycles to get some talos results there; and so that it doesn't keep bitrotting in my queue. :) I'm not filing separate bugs on the specific test failures when the pref is enabled, as I currently consider them to still be part of this bug. If there are remaining issues after we consider this to be largely complete, we can file them at that stage. Further patches will follow to improve script support and to provide feature control....
Attachment #433716 - Attachment is obsolete: true
Attachment #434064 - Flags: review?(jdaggett)
This will get fixed upstream in due course, but meanwhile adding the new constants here allows us to use Unicode 5.2 data files in our script run itemizer. (Not that we actually have any specific support for the new scripts!)
Attachment #434257 - Flags: review?(jdaggett)
Attached patch part 3, v4 - harfbuzz shaper backend for thebes (obsolete) (deleted) — Splinter Review
Functionally unchanged from the previous version - just some minor optimizations, and removed unused Unicode properties to reduce data table size.
Attachment #434064 - Attachment is obsolete: true
Attachment #434261 - Flags: review?(jdaggett)
Attachment #434064 - Flags: review?(jdaggett)
Comment on attachment 433715 [details] [diff] [review] part 2, v2 - configure/makefile changes for harfbuzz >bug 449292 - part 2 - add harfbuzz library to the gfx build process. > >diff --git a/config/autoconf.mk.in b/config/autoconf.mk.in >--- a/config/autoconf.mk.in >+++ b/config/autoconf.mk.in >@@ -450,16 +450,18 @@ PNG_REQUIRES = > else > PNG_CFLAGS = @MOZ_PNG_CFLAGS@ > PNG_LIBS = @MOZ_PNG_LIBS@ > PNG_REQUIRES = png > endif > > QCMS_LIBS = @QCMS_LIBS@ > >+MOZ_HARFBUZZ_LIBS = @MOZ_HARFBUZZ_LIBS@ >+ > MOZ_NATIVE_SQLITE = @MOZ_NATIVE_SQLITE@ > SQLITE_CFLAGS = @SQLITE_CFLAGS@ > SQLITE_LIBS = @SQLITE_LIBS@ > > NSPR_CONFIG = @NSPR_CONFIG@ > NSPR_CFLAGS = @NSPR_CFLAGS@ > NSPR_LIBS = @NSPR_LIBS@ > >diff --git a/configure.in b/configure.in >--- a/configure.in >+++ b/configure.in >@@ -7970,16 +7970,21 @@ AC_SUBST(MOZ_CAIRO_CFLAGS) > AC_SUBST(MOZ_CAIRO_LIBS) > > dnl qcms > dnl ======================================================== > > QCMS_LIBS='$(DEPTH)/gfx/qcms/$(LIB_PREFIX)mozqcms.$(LIB_SUFFIX)' > AC_SUBST(QCMS_LIBS) > >+dnl HarfBuzz >+dnl ======================================================== I think you're missing a paired line of === above the title line. (qcms is too, can you add one above that as well?) >diff --git a/gfx/harfbuzz/Makefile.in b/gfx/harfbuzz/Makefile.in >new file mode 100644 >--- /dev/null >+++ b/gfx/harfbuzz/Makefile.in >+DIRS = src Don't do this. Makefiles with nothing but DIRS in them are pointless and add extra time to our build with no gain. Either add harfbuzz/src directly to DIRS in gfx/Makefile, or use VPATH = src in this Makefile, and include everything from the Makefile below in this Makefile directly, and leave that Makefile out. >diff --git a/gfx/harfbuzz/src/Makefile.in b/gfx/harfbuzz/src/Makefile.in >new file mode 100644 >--- /dev/null >+++ b/gfx/harfbuzz/src/Makefile.in >@@ -0,0 +1,101 @@ >+ifdef GNU_CC >+MODULE_OPTIMIZE_FLAGS = -O2 >+else >+ifeq ($(OS_ARCH),SunOS) >+MODULE_OPTIMIZE_FLAGS = -xO5 >+endif >+ifeq ($(OS_ARCH),WINNT) >+# FIXME: bug 413019 I think you should leave this out unless proven otherwise. (In fact, that bug calls for reversing this change in the cairo makefile!) >+CSRCS = \ >+ hb-blob.c \ Please use two-space indents when adding new line-continuations to Makefiles. (Only use tabs in Makefiles where absolutely required by Makefile syntax.) >+# not used in moz build: >+# hb-ft.c >+# hb-glib.c Any particular reason to call these out here? >+EXPORTS_NAMESPACES = harfbuzz >+ >+EXPORTS_harfbuzz = hb.h hb-blob.h hb-buffer.h hb-common.h hb-font.h hb-language.h hb-shape.h hb-unicode.h Does this fit in 80 chars? I can't tell? r- for the src/Makefile thing.
Attachment #433715 - Flags: review?(ted.mielczarek) → review-
(In reply to comment #24) > Don't do this. Makefiles with nothing but DIRS in them are pointless and add > extra time to our build with no gain. Either add harfbuzz/src directly to DIRS > in gfx/Makefile, or use VPATH = src in this Makefile, and include everything > from the Makefile below in this Makefile directly, and leave that Makefile out. I've gone with the first option here, as it seemed tidiest for now. > >+# not used in moz build: > >+# hb-ft.c > >+# hb-glib.c > > Any particular reason to call these out here? Only that I thought it might be a helpful reminder when updating harfbuzz sources in the future (and likely modifying the list of files), to keep track of which files from the upstream code drop are NOT being included in our build. If you don't mind too much, I'll leave it in there.
Attachment #433715 - Attachment is obsolete: true
Attachment #434511 - Flags: review?(ted.mielczarek)
Comment on attachment 434054 [details] [diff] [review] part 1.2 - add harfbuzz to about:license Sorry, I don't have time to go through all the harfbuzz files in order to give this the review it deserves, so deferring to gerv. Quick comments, however... >+ <p class="correctme">This license, with various combinations >+ of dates and copyright holders, applies to most files in the >+ directory <span class="path">gfx/harfbuzz/</span>.</p> "most files"? What about the other files? Also, comma usage in this paragraph seems a bit off... >+Copyright (C) 2009 Keith Stribley <devel@thanlwinsoft.org> The e-mail address seems weird here...
Attachment #434054 - Flags: review?(reed) → review?(gerv)
Comment on attachment 434511 [details] [diff] [review] part 2, v3 - configure/makefile changes for harfbuzz - address review comments Looks good. Ok, seems fine to leave the list of unused sources in there.
Attachment #434511 - Flags: review?(ted.mielczarek) → review+
(In reply to comment #26) > (From update of attachment 434054 [details] [diff] [review]) > Sorry, I don't have time to go through all the harfbuzz files in order to give > this the review it deserves, so deferring to gerv. Ok, thanks. > >+ <p class="correctme">This license, with various combinations > >+ of dates and copyright holders, applies to most files in the > >+ directory <span class="path">gfx/harfbuzz/</span>.</p> > > "most files"? What about the other files? I'm adding a moz-specific Makefile.in, though I suppose we could put the harfbuzz license on that as well if you prefer, rather than the standard tri-license statement - I don't think it matters. > >+Copyright (C) 2009 Keith Stribley <devel@thanlwinsoft.org> > > The e-mail address seems weird here... It was correct as of late 2009, at least - I've seen him use it.
Comment on attachment 434054 [details] [diff] [review] part 1.2 - add harfbuzz to about:license jfkthame: if it actually works like that, can you please lay it out like Cairo Component Licences: about:licence#cairo ? As reed says, what do you mean by "most"? That there are also some tri-licensed files in there? Or PD? Which files? As for the email address, if he puts that in his copyright line, keep it :-) Thanks. Gerv
Attachment #434054 - Flags: review?(gerv) → review-
Reformatted the harfbuzz section following the cairo model. The only exception to the license within /gfx/harfbuzz was the Makefile.in that I added to build harfbuzz as part of gecko; I'd put a standard tri-license statement in that file. But I'd suggest just switching it to use the same simple license as the actual harfbuzz code; then we can omit the potentially confusing "some of the files" phrase. If you're in agreement with this (I don't think it really matters which of the licenses we use there for our added Makefile), I'll modify it in the "part 2" patch before any of this lands.
Attachment #434054 - Attachment is obsolete: true
Attachment #435311 - Flags: review?(gerv)
Comment on attachment 435311 [details] [diff] [review] part 1.2, v2 - add harfbuzz to about:license - address review comments r=gerv. Gerv
Attachment #435311 - Flags: review?(gerv) → review+
This takes the script-run itemization originally introduced in the harfbuzz shaper and moves it to the font group, prior to the font-matching process. This means we will find script runs before calling any shaper, rather than relying on the shapers to do it internally (although for the platform shapers, we don't really control how they handle it). There are two benefits to doing this. The immediate reason for it here (in the harfbuzz context) is that it will allow us to decide which shaper to use on a per-script basis, rather than per-font. This is important because many fonts cover more than one script, so we cannot simply decide at font instantiation time whether to use harfbuzz or platform shaping. The second benefit, though, is a long-term one even if/when we get to the stage of using harfbuzz for all scripts: this will allow script to be a primary input to the font-matching process, in place of our (obsolete) language-group concept that was based on legacy charsets. Font preferences can then be configured in terms of script (with language/region as a secondary key where needed), which makes much more sense than the existing ad-hoc collection of labels. Although script-run itemizing is pretty fast, I was concerned this could have a negative performance impact, as it's an extra step we're doing before passing text runs to the platform. But tryserver results look good, with no sign of a regression. It's possible that breaking the text into script runs earlier actually helps platform shaping APIs to use faster paths and thus compensate for the extra work we're doing.
Attachment #435739 - Flags: review?(jdaggett)
This revises the harfbuzz preference to be an integer instead of a boolean, allowing multiple levels of use (primarily for testing purposes, while harfbuzz script support is incomplete). With gfx.font_rendering.harfbuzz.level set to 0, the platform shaper will always be used; at level 1, harfbuzz will be used for "simple" scripts (Latin, Cyrillic, CJK) that do not require specific complex shaping; and at level 2, it will be used for all scripts (even though the results will currently be broken). This patch also makes the preference setting "live", so it can be changed at any time and text will immediately be re-rendered accordingly.
Attachment #435741 - Flags: review?(jdaggett)
Attached patch part 3, v5 - harfbuzz shaper backend for thebes (obsolete) (deleted) — Splinter Review
Minor update to the harfbuzz integration to support basic RTL directionality (no actual Arabic shaping in the harfbuzz library, though).
Attachment #434261 - Attachment is obsolete: true
Attachment #436338 - Flags: review?(jdaggett)
Attachment #434261 - Flags: review?(jdaggett)
Rebased on top of updated Part 3 patch.
Attachment #435739 - Attachment is obsolete: true
Attachment #436342 - Flags: review?(jdaggett)
Attachment #435739 - Flags: review?(jdaggett)
(In reply to comment #34) > Created an attachment (id=436338) [details] > part 3, v5 - harfbuzz shaper backend for thebes > > Minor update to the harfbuzz integration to support basic RTL directionality > (no actual Arabic shaping in the harfbuzz library, though). pairedChars isn't includes Full-width pair mapped into U+FF00-U+FFFF. Why? If it is unnecessary to add it (due to normalize?) to array, you should add comment. Also, this seems that this code doesn't consider IVS (http://unicode.org/reports/tr37/ or bug 552460). If so, you should add comment or support it. And, my concern is about PUA. In Japanese, some people uses PUA to use specific glyph. It has to use gfxUniscribeShaper or gfxGDIShaper.
Re paired characters: Last I looked at this, gfxTextRunWordCache was creating text runs from space-separated lists of unrelated "words". Characters in "words" included things like half of paired characters. So no interpretations could be made from context beyond the nearest space. Bug 416725 comment 0 and Bug 416475 comment 5 are related.
Blocks: 543353
(In reply to comment #36) > pairedChars isn't includes Full-width pair mapped into U+FF00-U+FFFF. Why? Only because they weren't included in the ICU code that was used as a model. I think we should add these for consistency. > Also, this seems that this code doesn't consider IVS > (http://unicode.org/reports/tr37/ or bug 552460). If so, you should add > comment or support it. Right, variation selectors are not yet supported by our cmap-interpreting code. There is a TODO comment somewhere in the code, IIRC.... ah yes, see gfxHarfBuzzShaper::GetGlyph(). > And, my concern is about PUA. In Japanese, some people uses PUA to use > specific glyph. It has to use gfxUniscribeShaper or gfxGDIShaper. Why? PUA characters should work with the harfbuzz shaper just as well as any other, I expect.
(In reply to comment #37) > Re paired characters: > > Last I looked at this, gfxTextRunWordCache was creating text runs from > space-separated lists of unrelated "words". Characters in "words" included > things like half of paired characters. So no interpretations could be made > from context beyond the nearest space. Bug 416725 comment 0 and Bug 416475 > comment 5 are related. You're right; the use of paired-character information has to be limited to the current "word", as we cannot rely on wider context being correct or consistent.
When you update paired-chars, please leave a comment at the following bug so I update Pango also: https://bugzilla.gnome.org/show_bug.cgi?id=615192
Attached image harfbuzz memory usage vs. coretext (deleted) —
Ran through some simple load tests to evaluate memory usage, looks like the with harfbuzz in use memory usage almost doubles. Simple load tests (run on OSX 10.5): 1. Create two new profiles, and set gfx.font_rendering.harfbuzz.level to 1 for one of the profiles 2. In the Startup pref panel set "When Firefox starts" to "Show my windows and tabs from last time" 3. Create 4 tabs and load them with the URLs below, then select the first tab: http://people.mozilla.org/~jdaggett/memtesting/ http://people.mozilla.org/~jdaggett/memtesting/iterateblogs.html#5643 http://people.mozilla.org/~jdaggett/memtesting/iteratebuildlogs.html#12153 http://people.mozilla.org/~jdaggett/memtesting/iteratelongtext.html#7677 4. Open a terminal window and enter: for ((a=0;a<100000;a++)) do ps -o etime,rss,command | grep firefox | grep -v grep; sleep 5; done This will keep a running tab on the resident set size of the firefox process. The number will bounce up and down but generally stabilize after 30 minutes. With CoreText shaping, memory usage averages 220MB, with Harfbuzz around 420MB for these tests. Chart with data: https://spreadsheets.google.com/ccc?key=0ArCKGq7OfNmMdG45elVjSFlDV3J0NzVLX3ZuYXF1V1E&hl=en
HarfBuzz itself doesn't allocate much memory or cache anything. The only exception is when a font file has errors that HarfBuzz needs to fix. In that situation it makes a copy. Would be good to see where all that memory is going.
I'm pretty confident it represents font tables that gfx is caching (for performance). The code caches tables in the gfxFontEntry, and will only release them when all gfxFont instances that were using them are destroyed. And because we cache gfxFonts, and expire them from the cache on a timer, a test that cycles through lots of pages can end up keeping large numbers of fonts alive in the cache. On some platforms, we can improve this by getting direct access to font data in the system's caches, or memory-mapping the font file, but not all platform APIs support these things. So the current code provides a "generic" implementation that caches its own copies of the tables; I'm working on followups that should improve the footprint.
This provides additional character properties to support the harfbuzz Unicode callbacks. In particular, it is needed for bidi mirroring to work (in the absence of 'rtlm' support in fonts).
Attachment #438729 - Flags: review?(jdaggett)
Although ATS does not offer any API to access font table data from an ATSFontRef without copying, there is a Core Text API that claims to do so. Therefore, by instantiating the CTFont in gfxMacFont (rather than only within gfxCoreTextShaper) and overriding the table access methods there, we should be able to reduce the OS X memory footprint by bypassing the use of the private table cache on gfxFontEntry. (A similar technique will be possible on Windows when using DirectWrite font APIs, but not when using GDI, afaict.)
Attachment #438730 - Flags: review?(jdaggett)
I have not been able to reproduce the memory usage results described in comment #41. In particular, running the test pages listed there, I see the numbers continuing to climb (gradually, erratically), regardless of the shaping back-end chosen. This also happens with a shipping 3.6.3 build; e.g. after 20 minutes, it was in the region of 500MB.
(In reply to comment #46) > I have not been able to reproduce the memory usage results described in comment > #41. In particular, running the test pages listed there, I see the numbers > continuing to climb (gradually, erratically), regardless of the shaping > back-end chosen. This also happens with a shipping 3.6.3 build; e.g. after 20 > minutes, it was in the region of 500MB. Yeah, looks like there's some sort of leak running trunk code under 10.6, this is completely unrelated to the patches here. Logged as bug 558958. If you look at the chart attached there you can see the difference between Harfbuzz/CoreText even on 10.6 but it's quickly swamped by the leak. Under 10.5 you'll see the behavior (i.e. the Harfbuzz shaper code taking up a lot more memory).
Testing on 10.5 indicates that the patch in part 7, which overrides the font table accessors for gfxMacFont, resolves the RAM footprint issue.
What about on Windows, what's the Harfbuzz memory overhead there?
Attached patch consolidated patch, parts 1-5 (obsolete) (deleted) — Splinter Review
To make things easier for others to review the attached patches, attaching a consolidated patch for the current versions of parts 1-5.
I think the integration work looks good so far but I have some real issues with the Harfbuzz patch itself. I realize this is upstream code that's subject to the constraints of other projects but I'm concerned about the structure of the current code. Given the relative complexity of OpenType layout and the structure of the GSUB/GPOS/GDEF tables, OpenType shaping code is bound to be somewhat complex. But I think the structure of Harfbuzz code currently is much, much too complex. One part of this complexity is the mishmash of language conventions used, the API is C but the code is limited subset of C++. On the mailing list, there seems to be confusion about this: http://lists.freedesktop.org/archives/harfbuzz/2009-November/000460.html > On Fri Nov 6 07:14:27 PST 2009, Adam Langley agl at google.com wrote: > On Thu, Nov 5, 2009 at 11:46 PM, Martin Hosken <martin_hosken at sil.org> wrote: >> I'm about to try adding graphite support as a contrib to >> harfbuzz-ng. Graphite itself is in C++ and so the linking code is >> going to have to bridge between C and C++. But I notice that you go >> to great lengths to ensure that harfbuzz doesn't link to libstdc++. >> I am wondering why this is the case. > > Harfbuzz is a C library. It would be very sad if it needed to pull in > libstdc++. That would pollute Pango, and then the rest of the > GNOME/GTK world. > > AGL Despite this, some of the code in Harfbuzz is clearly using C++ but in a way so as to avoid pulling in libstd++. There's a big mix of macro-ized code, template use and classes-as-structs coding. The code for sanitizing OpenType tables is a good example of this (hb-open-type-private.hh). It uses a combination of templatized structs and macros to implement a generalized way of assuring that font table data is copasetic. This is code required to prevent things like potential buffer overflow bugs. But the way it's written it's difficult to confirm easily what is being checked and how well. There are also parts of the API that involve resource management that I think really don't belong there. For example, font tables are handled as arbitrary binary objects, hb_blob's: struct _hb_blob_t { hb_reference_count_t ref_count; unsigned int length; hb_mutex_t lock; /* the rest are protected by lock */ unsigned int lock_count; hb_memory_mode_t mode; const char *data; hb_destroy_func_t destroy; void *user_data; }; There's a ref count, a lock and a memory mode. This seems like complete overkill for a shaper. Part of the reason for the lock and memory mode is the way font tables are sanitized, the Harfbuzz code tries to clean up font tables directly, either on the original table data or by making a copy of the data. The table sanitizing code is currently embedded within the layout object: void _hb_ot_layout_init (hb_face_t *face) { hb_ot_layout_t *layout = &face->ot_layout; memset (layout, 0, sizeof (*layout)); layout->gdef_blob = Sanitizer<GDEF>::sanitize (hb_face_get_table (face, HB_OT_TAG_GDEF)); layout->gdef = &Sanitizer<GDEF>::lock_instance (layout->gdef_blob); layout->gsub_blob = Sanitizer<GSUB>::sanitize (hb_face_get_table (face, HB_OT_TAG_GSUB)); layout->gsub = &Sanitizer<GSUB>::lock_instance (layout->gsub_blob); layout->gpos_blob = Sanitizer<GPOS>::sanitize (hb_face_get_table (face, HB_OT_TAG_GPOS)); layout->gpos = &Sanitizer<GPOS>::lock_instance (layout->gpos_blob); } Font table sanitizing really belongs in a separate font management layer that Harfbuzz accesses. In some cases sanitizing may be expensive enough that it would make sense to cache the sanitized results. In other environments, a closed embedded environment with a fixed set of fonts for example, it would make sense to sanitize the fonts once at build time and not do the sanitizing at runtime. The sanitizing process may be part of a larger process that sanitizes all tables before use. Chrome for example completely reconstructs downloaded font data, so doing it again as part of the shaper would be redundant. I also think the degree of caching will depend a lot on access patterns and the size and complexity of the data in a given table. For small, simple tables, simply caching the table data may make the most sense. But for tables like the GSUB/GPOS tables with all their nested offsets, some other form of data structure could easily make better sense in a given environment. There are some other design choices that don't make sense to me, the code appears to be grouping feature tags with script and language tags and making them all strings, even though all of these tags are by definition four-byte tags in OpenType: typedef struct _hb_feature_t { const char *name; const char *value; unsigned int start; unsigned int end; } hb_feature_t; There was discussion on the list about this [2]. From the design doc description [1] this appears to be an attempt at future-proofing but the library only attempts to implement shaping of OpenType, I don't think it makes sense to design for things that don't exist. This leads to wasted cycles going between tags and strings. Summary of my suggestions for Harfbuzz: 1. Remove resource mgmt APIs, replace with callbacks for accessing font data. No locking/refcounting/mem swizzling internally. 2. Make the API and implemenentation C code if C-only code is a requirement, or normal C++ if it's not 3. Simplify font table sanitizing code and move it to a utility library [1] Behdad's design overview of Harfbuzz: http://lists.freedesktop.org/archives/harfbuzz/2009-August/000359.html [2] Feature/Script/Language tags discussion http://lists.freedesktop.org/archives/harfbuzz/2009-December/000495.html
Attachment #433560 - Flags: review?(jdaggett) → review-
Attachment #434053 - Flags: review?(jdaggett) → review+
Attachment #434257 - Flags: review?(jdaggett) → review+
(In reply to comment #49) > What about on Windows, what's the Harfbuzz memory overhead there? Rather surprisingly, I'm not seeing a large overhead. Running John's page-cycling tests with GDI, Task Manager is showing memory usage fluctuating between 80-200MB (I'm assuming the peaks come from image-heavy pages). Turning on Harfbuzz doesn't seem to make a huge difference; it drops below 90MB range at times, and occasionally peaks as high as 220MB. I'd guess the average may be 20-30MB higher - say 150MB compared to 120 - but the fluctuation during the run is large and rapid enough to make this difference hard to assess.
This chart compares memory usage (as reported by pslist from Microsoft sysinternals tools) for John's page-cycling tests, with harfbuzz enabled vs using the existing uniscribe/gdi path. RAM usage was sampled every second for 10 minutes; the chart shows a 10-second moving average, to somewhat smooth out the rapid fluctuations and make overall comparison easier. Raw data is at https://spreadsheets.google.com/ccc?key=0Ao272K9CbMGldC1fbVZOWm85Njk2TUpzejg5bS13SFE&hl=en_GB
Hi John, Thanks for the feedback. I really appreciate. It would have been more appropriate on the harfbuzz list though. I'll reply briefly below. I think it's more appropriate to move the discussion to the list however. (In reply to comment #51) > I think the integration work looks good so far but I have some real > issues with the Harfbuzz patch itself. I realize this is upstream > code that's subject to the constraints of other projects but I'm > concerned about the structure of the current code. Given the relative > complexity of OpenType layout and the structure of the GSUB/GPOS/GDEF > tables, OpenType shaping code is bound to be somewhat complex. But I > think the structure of Harfbuzz code currently is much, much too > complex. It's definitely more complex than the code it's replacing (which was plain C). However, it's very carefully designed, easier to audit, and I'm much more confident in it than in the code it replaces. > One part of this complexity is the mishmash of language conventions > used, the API is C but the code is limited subset of C++. On the What's so complex about it? The API is C. Anything else is implementation details. The previous HarfBuzz also used C++ for some of the implementation. The way the hb-layout code uses C++ *is* very complex though, I admit, but not complicated. [...] > Despite this, some of the code in Harfbuzz is clearly using C++ but in > a way so as to avoid pulling in libstd++. There's a big mix of > macro-ized code, template use and classes-as-structs coding. Suggestions for improvements are welcome. The code works with gcc and MSVC, and have been tested to build fine on ARM (and has compile-time safeguards), so I don't see what's wrong with complexity if it works great. That said, there *is* an outstanding mystery bug with the Apple toolchain. We have not been able to nail that one down yet. > The code for sanitizing OpenType tables is a good example of this > (hb-open-type-private.hh). It uses a combination of templatized > structs and macros to implement a generalized way of assuring that > font table data is copasetic. This is code required to prevent things > like potential buffer overflow bugs. But the way it's written it's > difficult to confirm easily what is being checked and how well. It's difficult for you right now maybe. I completely agree that it's underdocumented. I do plan on documenting it all though. Couple points re sanitize() though: it's not aiming for copacetic tables at all. It's goal is easy to define, and easy to confirm, much easier than you would imagine in fact: - sanitize() ensures that it's safe to all the other methods on the struct. With that in mind, and 10 minutes studying of the macros used, you can confirm the sanitize code for all the 100 different structs in an hour or two. I actually would appreciate someone doing that. > There are also parts of the API that involve resource management that > I think really don't belong there. For example, font tables are > handled as arbitrary binary objects, hb_blob's: At the outset they are binary objects. What's wrong with that? > struct _hb_blob_t { > hb_reference_count_t ref_count; > > unsigned int length; > > hb_mutex_t lock; > /* the rest are protected by lock */ > > unsigned int lock_count; > hb_memory_mode_t mode; > > const char *data; > > hb_destroy_func_t destroy; > void *user_data; > }; > > There's a ref count, a lock and a memory mode. This seems like > complete overkill for a shaper. Part of the reason for the lock and > memory mode is the way font tables are sanitized, the Harfbuzz code > tries to clean up font tables directly, either on the original table > data or by making a copy of the data. This code may very well be suboptimal. However, having had to maintain code dealing with libraries like FreeType, Fontconfig, and Pango that lack ref counting and / or thready-safety, my goal for HarfBuzz has been getting the API right and ensure correctness initially. That code works, and not in an awfully inefficient way. Suggestions for improvements are welcome, but I don't think it's even worth it given the usage pattern. > The table sanitizing code is currently embedded within the layout object: I plan to make that be delayed for a bit. See below. > Font table sanitizing really belongs in a separate font management > layer that Harfbuzz accesses. I've considered that before. The problem however is: there are numerous fonts out there that fail the literal word of the spec in one way or another. Using an external sanitizer will result in rejecting the layout tables in that fonts. The aim of the harfbuzz sanitizer is to clean that up just enough such that we can use the correct bits of the tables. It must closely match the code using the tables and that's why it's tied closely to the layout code (instead of being put in a separate module logically). > In some cases sanitizing may be > expensive enough that it would make sense to cache the sanitized > results. In other environments, a closed embedded environment with a > fixed set of fonts for example, it would make sense to sanitize the > fonts once at build time and not do the sanitizing at runtime. Indeed. In the future, I'll add API and cmdline tools to sanitize fonts. On Linux for example we'll cache the sanitizing result in the fontconfig cache and skip sanitizing for sane fonts. Other environments will be able to do similarly. It's planned, just not there yet. > The sanitizing process may be part of a larger process that sanitizes > all tables before use. Chrome for example completely reconstructs > downloaded font data, so doing it again as part of the shaper would be > redundant. The Chrome sanitizer doesn't check the OpenType Layout tables. Given all the offsets in the GSUB/GPOS/GDEF tables, reconstructing them is a tedious task, and would be pointless when one can do with the much simpler sanitizer in harfbuzz. And lets not forget why Chrome reconstructs fonts: because the Uniscribe shaper does a bad job dealing with broken fonts. That's not a good excuse to say harfbuzz also should crash on crap font. It does belong in harfbuzz to be robust against whatever font data is thrown at it. > I also think the degree of caching will depend a lot on access > patterns and the size and complexity of the data in a given table. > For small, simple tables, simply caching the table data may make the > most sense. But for tables like the GSUB/GPOS tables with all their > nested offsets, some other form of data structure could easily make > better sense in a given environment. Not sure what kind of caching you are referring to. Harfbuzz itself doesn't do any caching. > There are some other design choices that don't make sense to me, the > code appears to be grouping feature tags with script and language > tags and making them all strings, What grouping are you referring to? Language tags are used all over the net and software as strings. It just happens that OpenType has assigned a limited set of them four-byte tags. > even though all of these tags are by definition four-byte tags in OpenType: > > typedef struct _hb_feature_t { > const char *name; > const char *value; > unsigned int start; > unsigned int end; > } hb_feature_t; > > There was discussion on the list about this [2]. From the design doc > description [1] this appears to be an attempt at future-proofing but > the library only attempts to implement shaping of OpenType, I don't > think it makes sense to design for things that don't exist. This > leads to wasted cycles going between tags and strings. Going from tags to strings and vice versa is at most a byteswap. The API though is designed to allow finer-level control over features. For example, to allow disabling GPOS kerning and enabling old-style TrueType kerning. Moreover, HarfBuzz aims to be an SFNT text shaper, not necessarily limited to OpenType. There's work on adding Graphite support already, and AAT can be added in the future. Moreover, there is need to have a string-based API. I mentioned in the thread that a separate uint32_t-based API may be added in the future. > Summary of my suggestions for Harfbuzz: > > 1. Remove resource mgmt APIs, replace with callbacks for accessing font data. > No locking/refcounting/mem swizzling internally. You seem to drastically under-estimate the headaches of lifecycle-management in a library. Mozilla also disables cairo's mutex use. So I don't think Mozilla is the perfect use case of those features. When you are dealing with FT_Face and hb_face_t objects passed around between cairo, pango, poppler, Python wrappers, etc, the resource management API is absolutely necessary IMO. > 2. Make the API and implemenentation C code if C-only code is a requirement, > or normal C++ if it's not There is no C-only requirement. The requirement is C-only API. And the C++ code is standard. Not sureewhat you mean by "normal C++". Everyone uses their own subset of C++. HarfBuzz is no exception. > 3. Simplify font table sanitizing code and move it to a utility library Again, the sanitize code is an absolutely necessary part of the design. Can't be moved out, and is not of much use outside harfbuzz as it doesn't try to be sanitize the tables to the word of the spec (such a sanitizer exists in FreeType, and is useless IMO other than to font designers). Cheers, behdad > [1] Behdad's design overview of Harfbuzz: > http://lists.freedesktop.org/archives/harfbuzz/2009-August/000359.html > > [2] Feature/Script/Language tags discussion > http://lists.freedesktop.org/archives/harfbuzz/2009-December/000495.html
(In reply to comment #54) Hi Behdad, Thanks for the comments. I'll cross-post my comments on the harfbuzz list. > It's definitely more complex than the code it's replacing (which was > plain C). However, it's very carefully designed, easier to audit, > and I'm much more confident in it than in the code it replaces. Complex code for dealing with complex data is one thing but the Harfbuzz code makes even simple things complicated in some cases. Take as an example some of the struct's defined in hb-open-type-private.hh, there's a whole set of *structs* defined for basic data types. There are far, far simpler ways of dealing with endian data than this. What advantage does this complexity add? Does it somehow simplify the code? ============== from hb-open-type-private.hh ============== #define _DEFINE_INT_TYPE1_UNALIGNED(NAME, TYPE, BIG_ENDIAN, BYTES) \ struct NAME \ { \ inline NAME& set (TYPE i) { (TYPE&) v = BIG_ENDIAN (i); return *this; } \ inline operator TYPE(void) const { return BIG_ENDIAN ((TYPE&) v); } \ inline bool operator== (NAME o) const { return (TYPE&) v == (TYPE&) o.v; } \ inline bool sanitize (SANITIZE_ARG_DEF) { \ TRACE_SANITIZE (); \ return SANITIZE_SELF (); \ } \ private: unsigned char v[BYTES]; \ }; \ ASSERT_SIZE (NAME, BYTES) #define DEFINE_INT_TYPE1(NAME, TYPE, BIG_ENDIAN, BYTES) \ struct NAME \ { \ inline NAME& set (TYPE i) { BIG_ENDIAN##_put_unaligned(v, i); return *this; } \ inline operator TYPE(void) const { return BIG_ENDIAN##_get_unaligned (v); } \ inline bool operator== (NAME o) const { return BIG_ENDIAN##_cmp_unaligned (v, o.v); } \ inline bool sanitize (SANITIZE_ARG_DEF) { \ TRACE_SANITIZE (); \ return SANITIZE_SELF (); \ } \ private: unsigned char v[BYTES]; \ }; \ ASSERT_SIZE (NAME, BYTES) #define DEFINE_INT_TYPE0(NAME, type, b) DEFINE_INT_TYPE1 (NAME, type##_t, hb_be_##type, b) #define DEFINE_INT_TYPE(NAME, u, w) DEFINE_INT_TYPE0 (NAME, u##int##w, (w / 8)) DEFINE_INT_TYPE (USHORT, u, 16); /* 16-bit unsigned integer. */ DEFINE_INT_TYPE (SHORT, , 16); /* 16-bit signed integer. */ DEFINE_INT_TYPE (ULONG, u, 32); /* 32-bit unsigned integer. */ DEFINE_INT_TYPE (LONG, , 32); /* 32-bit signed integer. */ /* Array of four uint8s (length = 32 bits) used to identify a script, language * system, feature, or baseline */ struct Tag : ULONG { inline Tag (const Tag &o) { *(ULONG*)this = (ULONG&) o; } inline Tag (uint32_t i) { (*(ULONG*)this).set (i); } inline Tag (const char *c) { *(ULONG*)this = *(ULONG*)c; } inline bool operator== (const char *c) const { return *(ULONG*)this == *(ULONG*)c; } /* What the char* converters return is NOT nul-terminated. Print using "%.4s" */ inline operator const char* (void) const { return CONST_CHARP(this); } inline operator char* (void) { return CHARP(this); } inline bool sanitize (SANITIZE_ARG_DEF) { TRACE_SANITIZE (); /* Note: Only accept ASCII-visible tags (mind DEL) * This is one of the few times (only time?) we check * for data integrity, as opposed o just boundary checks */ return SANITIZE_SELF () && (((uint32_t) *this) & 0x80808080) == 0; } }; ASSERT_SIZE (Tag, 4); #define _NULL_TAG_INIT {' ', ' ', ' ', ' '} DEFINE_NULL_DATA (Tag, 4, _NULL_TAG_INIT); #undef _NULL_TAG_INIT ========================================================== > > One part of this complexity is the mishmash of language > > conventions used, the API is C but the code is limited subset of > > C++. On the > > What's so complex about it? The API is C. Anything else is > implementation details. The previous HarfBuzz also used C++ for > some of the implementation. The way the hb-layout code uses C++ *is* > very complex though, I admit, but not complicated. > > > Despite this, some of the code in Harfbuzz is clearly using C++ > > but in a way so as to avoid pulling in libstd++. There's a big > > mix of macro-ized code, template use and classes-as-structs coding. > > Suggestions for improvements are welcome. The code works with gcc > and MSVC, and have been tested to build fine on ARM (and has > compile-time safeguards), so I don't see what's wrong with > complexity if it works great. That said, there *is* an outstanding > mystery bug with the Apple toolchain. We have not been able to nail > that one down yet. The point is not that it compiles correctly, it's whether the code is simple and easy to maintain. If all text in Firefox is going to pass through this code we need for it to be sufficiently simple and easy to debug by an engineer unfamiliar with the code. Using complicated nested macro-ized structs makes that job much, much harder. > > There are also parts of the API that involve resource management > > that I think really don't belong there. For example, font tables > > are handled as arbitrary binary objects, hb_blob's: > > At the outset they are binary objects. What's wrong with that? Because font tables have a much more specific usage pattern than arbitrary binary data. Why obfuscate that fact? > > Font table sanitizing really belongs in a separate font management > > layer that Harfbuzz accesses. > > I've considered that before. The problem however is: there are > numerous fonts out there that fail the literal word of the spec in > one way or another. Using an external sanitizer will result in > rejecting the layout tables in that fonts. The aim of the harfbuzz > sanitizer is to clean that up just enough such that we can use the > correct bits of the tables. It must closely match the code using > the tables and that's why it's tied closely to the layout code > (instead of being put in a separate module logically). Sorry, can't an external sanitizer do the clean up you suggest? I'm suggesting that the sanitizing happen outside the shaper code, so that apps are free to manage the sanitized data based on their needs. For example, a lot of platform fonts may already by sufficiently sanitized that this isn't needed, the code could easily cache that fact and skip the sanitizing process the next time. Or for platforms with a locked-down set of fonts that sanitizing could take place as part of the build process, no need to do it at runtime. The current code doesn't permit this, it ties the sanitization to the create of a layout object. > The Chrome sanitizer doesn't check the OpenType Layout tables. > Given all the offsets in the GSUB/GPOS/GDEF tables, reconstructing > them is a tedious task, and would be pointless when one can do with > the much simpler sanitizer in harfbuzz. Right, Chrome dumps the GSUB/GPOS/GDEF tables. Reconstructing or sanitizing are the same thing, just with different degrees of strictness. If it's tedious for these tables, all the more reason to allow the sanitized versions to be easily cached outside of Harfbuzz code. That's sort of what the hb_blob memory mode stuff is trying to allow. > In the future, I'll add API and cmdline tools to sanitize fonts. On > Linux for example we'll cache the sanitizing result in the > fontconfig cache and skip sanitizing for sane fonts. Other > environments will be able to do similarly. It's planned, just not > there yet. The fontconfig cache is a good example of a more appropriate place for resource management. > > I also think the degree of caching will depend a lot on access > > patterns and the size and complexity of the data in a given table. > > For small, simple tables, simply caching the table data may make > > the most sense. But for tables like the GSUB/GPOS tables with all > > their nested offsets, some other form of data structure could > > easily make better sense in a given environment. > > Not sure what kind of caching you are referring to. Harfbuzz itself > doesn't do any caching. That's my point, Harfbuzz doesn't allow the caching of sanitized font tables. It would be better to allow apps to do this resource management rather than Harfbuzz. App usage patterns vary. In the case of GSUB/GPOS tables, the sanitized table structure itself may not be the ideal structure to cache because untangling all the interrelated intra-table and inter-table references requires still more processing before use. > Going from tags to strings and vice versa is at most a byteswap. Below is the code in hb-ot-tag.c that converts between OpenType language tags in string form to 4-byte tag form. It's a little more than a byteswap, note the bsearch call. ============== from hb-ot-tag.c ============== hb_tag_t hb_ot_tag_from_language (hb_language_t language) { const char *lang_str; LangTag *lang_tag; if (language == NULL) return HB_OT_TAG_DEFAULT_LANGUAGE; lang_str = hb_language_to_string (language); if (0 == strcmp (lang_str, "ot:")) { char tag[4]; int i; lang_str += 3; i = 0; while (i < 4 && lang_str[i]) { tag[i] = lang_str[i]; } while (i < 4) tag[i] = ' '; return HB_TAG_STR (tag); } /* find a language matching in the first component */ lang_tag = bsearch (lang_str, ot_languages, ARRAY_LENGTH (ot_languages), sizeof (LangTag), lang_compare_first_component); /* we now need to find the best language matching */ if (lang_tag) { hb_bool_t found = FALSE; /* go to the final one matching in the first component */ while (lang_tag + 1 < ot_languages + ARRAY_LENGTH (ot_languages) && lang_compare_first_component (lang_str, lang_tag + 1) == 0) lang_tag++; /* go back, find which one matches completely */ while (lang_tag >= ot_languages && lang_compare_first_component (lang_str, lang_tag) == 0) { if (lang_matches (lang_str, lang_tag->language)) { found = TRUE; break; } lang_tag--; } if (!found) lang_tag = NULL; } if (lang_tag) return lang_tag->tag; return HB_OT_TAG_DEFAULT_LANGUAGE; } hb_language_t hb_ot_tag_to_language (hb_tag_t tag) { unsigned int i; unsigned char buf[8] = "ot:"; for (i = 0; i < ARRAY_LENGTH (ot_languages); i++) if (ot_languages[i].tag == tag) return hb_language_from_string (ot_languages[i].language); buf[3] = tag >> 24; buf[4] = (tag >> 16) & 0xFF; buf[5] = (tag >> 8) & 0xFF; buf[6] = tag & 0xFF; return hb_language_from_string ((char *) buf); } ============================================== > The API though is designed to allow finer-level control over features. > For example, to allow disabling GPOS kerning and enabling old-style > TrueType kerning. Moreover, HarfBuzz aims to be an SFNT text shaper, > not necessarily limited to OpenType. There's work on adding Graphite > support already, and AAT can be added in the future. Moreover, there > is need to have a string-based API. I mentioned in the thread that a > separate uint32_t-based API may be added in the future. For a library whose main purpose is to do OpenType layout this seems like complete overdesign. All OpenType language, script and feature tags are 4-byte tags that use only ASCII charaters in the 0x20 to 0x7F range. So there's plenty of space for Harfbuzz-only tags to support things like kern table kerning (e.g. HB_OT_TAG_TTKERN = 0x80000001). Graphite uses similar tags. If strings are required for AAT, add an API when support for that is added. This applies equally to hb_language_t, hb_script_t, and hb_feature_t. > > Summary of my suggestions for Harfbuzz: > > > > 1. Remove resource mgmt APIs, replace with callbacks for accessing font data. > > No locking/refcounting/mem swizzling internally. > > You seem to drastically under-estimate the headaches of lifecycle-management in > a library. Mozilla also disables cairo's mutex use. So I don't think Mozilla > is the perfect use case of those features. When you are dealing with FT_Face > and hb_face_t objects passed around between cairo, pango, poppler, Python > wrappers, etc, the resource management API is absolutely necessary IMO. I'm not underestimating it, I'm simply arguing that the code for doing that resource management does not belong in Harfbuzz. > > 2. Make the API and implemenentation C code if C-only code is a requirement, > > or normal C++ if it's not > > There is no C-only requirement. The requirement is C-only API. And the C++ > code is standard. Not sureewhat you mean by "normal C++". Everyone uses their > own subset of C++. HarfBuzz is no exception. There are C design patterns and C++ design patterns. You're mixing them here in ways that make the code hard to maintain. Either use a simple set of C structs and associated functions or create a set of straight-forward C++ classes with a C-API. > > 3. Simplify font table sanitizing code and move it to a utility library > > Again, the sanitize code is an absolutely necessary part of the design. Can't > be moved out, and is not of much use outside harfbuzz as it doesn't try to be > sanitize the tables to the word of the spec (such a sanitizer exists in > FreeType, and is useless IMO other than to font designers). Just because it's necessary doesn't mean it needs to be tied directly to the code that uses it in ways that make caching the sanitized data difficult.
(In reply to comment #55) > (In reply to comment #54) > Hi Behdad, > > Thanks for the comments. I'll cross-post my comments on the harfbuzz list. Thanks. Since I don't think the branched discussion is not relevant to the review here anymore, I'm not copying my reply here. Here it is on the list: http://lists.freedesktop.org/archives/harfbuzz/2010-April/000575.html
Whether we'll eventually want to use harfbuzz by default in a dwrite environment, when dwrite's own shaping should also be fine, remains to be seen. But it's a useful option for testing/comparison purposes, at least, and it's cheap to implement.
Attachment #439496 - Flags: review?(bas.schouten)
Comment on attachment 436338 [details] [diff] [review] part 3, v5 - harfbuzz shaper backend for thebes + // subclasses should override these if they can do something more efficient + // than getting tables with GetFontTable() and caching them in the entry + // (e.g. get a pointer directly from the OS without copying font data) + virtual const void *GetFontTablePtr(PRUint32 aTag, PRUint32 *aLength); + virtual void ReleaseFontTablePtr(PRUint32 aTag); + virtual PRBool HasFontTable(PRUint32 aTag); + class FontTableCacheEntry { + friend class gfxFontEntry; // for entry destructor to check refcounts + public: + nsrefcnt AddRef(void) { + NS_PRECONDITION(PRInt32(mRefCnt) >= 0, "illegal refcnt"); + ++mRefCnt; + NS_LOG_ADDREF(this, mRefCnt, "FontTableCacheEntry", sizeof(*this)); + return mRefCnt; + } + nsrefcnt Release(void) { + NS_PRECONDITION(0 != mRefCnt, "dup release"); + --mRefCnt; + NS_LOG_RELEASE(this, mRefCnt, "FontTableCacheEntry"); + return mRefCnt; + } + + FontTableCacheEntry(nsTArray<PRUint8>& aBuffer) { + MOZ_COUNT_CTOR(FontTableCacheEntry); + mData.SwapElements(aBuffer); + } + + ~FontTableCacheEntry() { + MOZ_COUNT_DTOR(FontTableCacheEntry); + } + + const void *GetTablePtr() const { return mData.Elements(); } + PRUint32 GetTableLength() const { return mData.Length(); } + + protected: + nsAutoRefCnt mRefCnt; + nsTArray<PRUint8> mData; + + private: + // not implemented + FontTableCacheEntry(const FontTableCacheEntry&); + }; Seems like this could be better structured. I think you could change around the FontTableCacheEntry to be gfxFontTable, use NS_INLINE_DECL_REFCOUNTING(gfxFontTable), then use a nsRefPtrHashtable: + nsClassHashtable<nsUint32HashKey,FontTableCacheEntry> mFontTableCache; ==> nsRefPtrHashtable<nsUint32HashKey, gfxFontFamily> mFontTableCache; Clients accessing font tables would keep around a ref ptr, obviating the need for the explicit release call. I'm still concerned about the memory impact of caching lots of tables for a long time but that can be evaluated and optimized. + // fontEntry's cache until the shaper is destroyed. + mutable nsDataHashtable<nsUint32HashKey,hb_blob_t*> mBlobCache; This too would be better as a hashtable of refptr's I think. + gfxFont *GetFont() const { return mFont; } Assert not null? + PRBool FontCanSupportHarfBuzz() { + return (mFUnitsConvFactor > 0.0f) && HasFontTable(TRUETYPE_TAG('c','m','a','p')); + } HasFontTable pulls in the entire table! We're reading in the cmap just after startup, so we should cache that info. If the table exists we should cache that too! What's the mFUnitsConvFactor > 0.0f expression for? > +#!/usr/bin/env perl > + > +# usage: > +# perl genHBUnicodeScriptData.pl hb-unicode.h Scripts.txt > gfxHBUnicodeScriptData.cpp > + > +# split 16-bit values into two fields, for "page" and "char" indexing Could you include a couple more lines of description here on how to run this utility? Like "pull down the Unicode database from xxx, then do yyy". Also the output appears to have tabs, spaces would be better. Comments describing what the different arrays are would be useful too for documentation. +static PRUint32 +FindHighestBit(PRUint32 aValue) +{ + // propagate highest bit into all lower bits of the value + aValue |= (aValue >> 1); + aValue |= (aValue >> 2); + aValue |= (aValue >> 4); + aValue |= (aValue >> 8); + aValue |= (aValue >> 16); + // isolate the leftmost bit + return (aValue & ~(aValue >> 1)); +} Use ffs(v) << 1 instead? If this doesn't work, define in gfxFontUtils. > static PRUint32 > findHighestBit(PRUint32 value) > { > /* propagate highest bit into all lower bits of the value */ > value |= (value >> 1); > value |= (value >> 2); > value |= (value >> 4); > value |= (value >> 8); > value |= (value >> 16); > /* isolate the leftmost bit */ > return (value & ~(value >> 1)); > } Ditto. +hb_script_t +gfxHBUnicodeScript::getScriptCode(hb_codepoint_t ch) +{ + if (ch < 0x10000) { + return hb_script_t(script[pages[0][ch >> CHAR_BITS]] + [ch & ((1 << CHAR_BITS) - 1)]); + } + if (ch >= 0x110000) { + return HB_SCRIPT_UNKNOWN; + } + return hb_script_t(script[pages[planes[(ch >> 16) - 1]] + [(ch & 0xffff) >> CHAR_BITS]] + [ch & ((1 << CHAR_BITS) - 1)]); +} Use named constants. UNICODE_BMP_LIMIT, UNICODE_LIMIT? Maybe there's a common convention for naming these limits? +// if (posInfo->back) { // mark attachment +// gfxPoint *basePos = outPos - posInfo->back; +// outPos->MoveTo(basePos->x + mFont->FUnitsToDevUnits(posInfo->x_offset), +// basePos->y + mFont->FUnitsToDevUnits(posInfo->y_offset)); +// } else { +// } Removed commented out code here. > - if (fontRef == (ATSFontRef)kATSUInvalidFontID) > + if (fontRef == (ATSFontRef)kATSUInvalidFontID) { > return NS_ERROR_FAILURE; > + } > > ByteCount dataLength; > OSStatus status = ::ATSFontGetTable(fontRef, aTableTag, 0, 0, 0, &dataLength); > - NS_ENSURE_TRUE(status == noErr, NS_ERROR_FAILURE); > + if (status != noErr) { > + return NS_ERROR_FAILURE; > + } Actually, these should both be NS_ENSURE_TRUE. Why the second change? +// return a font table as a HarfBuzz blob; these are cached for the +// lifetime of the shaper, to avoid repeated calls to the font +hb_blob_t * +gfxHarfBuzzShaper::GetTableBlob(hb_tag_t aTag) const We shouldn't have method names like this, this is returning a font table in hb_blob form. GetFontTable? Ditto for the 'mBlobCache'. +void +gfxHarfBuzzShaper::GetGlyphMetrics(hb_codepoint_t glyph, + hb_glyph_metrics_t *metrics) const +{ + metrics->x_advance = metrics->y_advance = 0; + metrics->x_offset = metrics->y_offset = 0; + metrics->width = metrics->height = 0; + + hb_blob_t *hheaBlob = GetTableBlob(TRUETYPE_TAG('h','h','e','a')); + if (hb_blob_get_length(hheaBlob) < sizeof(HMetricsHeader)) { + hb_blob_destroy(hheaBlob); + return; + } + + hb_blob_t *hmtxBlob = GetTableBlob(TRUETYPE_TAG('h','m','t','x')); + + const HMetricsHeader* hhea = + reinterpret_cast<const HMetricsHeader*>(hb_blob_lock(hheaBlob)); + PRUint32 numLongMetrics = hhea->numberOfHMetrics; + hb_blob_unlock(hheaBlob); + hb_blob_destroy(hheaBlob); + + if (glyph >= numLongMetrics) { + glyph = numLongMetrics - 1; + } + + // TODO: set additional metrics if/when harfbuzz needs them + const HMetrics* hmtx = + reinterpret_cast<const HMetrics*>(hb_blob_lock(hmtxBlob)); + metrics->x_advance = hmtx->metrics[glyph].advanceWidth; + hb_blob_unlock(hmtxBlob); + hb_blob_destroy(hmtxBlob); +} This is really inefficient, both at the API level and at the implementation level. Harfbuzz calls this callback for every character in a textrun. Seems like a version of GetGlyphMetrics that takes an array of codepoints and returns an array of metrics would make more sense. And you're calling the same code per-character to pull in the 'hhea' and 'hmtx' tables repeatedly, that's inefficient. In the case of the 'hhea' you're only using the numLongMetrics value, there's no reason for pulling in the table every time, even if it is already cached in the font entry. Pull in metrics once and dump them when appropriate. And the locking/unlocking here is just wrong, this level of locking will lead to deadlock if it was ever implemented across threads. But that's really a criticism of the Harfbuzz design, not this patch per se. +hb_codepoint_t +gfxHarfBuzzShaper::GetGlyph(hb_codepoint_t unicode, + hb_codepoint_t variation_selector) const +{ + hb_blob_t *cmapBlob = GetTableBlob(TRUETYPE_TAG('c','m','a','p')); + const PRUint8 *cmapData = + reinterpret_cast<const PRUint8*>(hb_blob_lock(cmapBlob)); + hb_codepoint_t gid = + gfxFontUtils::MapCharToGlyph(cmapData, hb_blob_get_length(cmapBlob), + unicode); // TODO: variation selectors + hb_blob_unlock(cmapBlob); + hb_blob_destroy(cmapBlob); + return gid; +} Similar problem here, MapCharToGlyph should be called once, cached and dumped when appropriate. +struct HMetrics { + HLongMetric metrics[1]; // actually numberOfHMetrics +// mozilla::AutoSwap_PRUint16 leftSideBearing[]; +}; Remove the commented out code, I don't think that definition works since metrics is variable-length. > * This file is based on usc_impl.c from ICU 4.2.0.1, slightly adapted > * for use within Mozilla Gecko, separate from a standard ICU build. > * > * The original ICU license of the code follows: > * > * ICU License - ICU 1.8.1 and later > * Does this require about:license addition? Jonathan, when you update the patches could you clean up and consolidate them? For example, there are several renames of files and classes and the script itemizer code is spread across patches. It's hard to review individual patches when the contents of that patch are swizzled around by a later patch.
Attachment #436338 - Flags: review?(jdaggett) → review-
(In reply to comment #58) > Seems like this could be better structured. I think you could change > around the FontTableCacheEntry to be gfxFontTable, use > NS_INLINE_DECL_REFCOUNTING(gfxFontTable), then use a nsRefPtrHashtable: > > + nsClassHashtable<nsUint32HashKey,FontTableCacheEntry> mFontTableCache; > > ==> nsRefPtrHashtable<nsUint32HashKey, gfxFontFamily> mFontTableCache; > > Clients accessing font tables would keep around a ref ptr, obviating > the need for the explicit release call. Maybe I'm missing something obvious, but if the client doesn't do a release call, how will the hashtable know when an entry can be removed?
(In reply to comment #58) > + PRBool FontCanSupportHarfBuzz() { > + return (mFUnitsConvFactor > 0.0f) && > HasFontTable(TRUETYPE_TAG('c','m','a','p')); > + } > > HasFontTable pulls in the entire table! The generic "fallback" currently does, yes, but this can be overridden on platforms where we have a cheaper way to determine this. > We're reading in the cmap just after > startup, so we should cache that info. If the table exists we should cache > that too! Yes, we could add an mHasCmap flag to the font. I doubt the impact will be measurable, though (based on the profiling I've done so far). > What's the mFUnitsConvFactor > 0.0f expression for? The harfbuzz positioning code needs to be able to convert between font units and device units, which requires the conversion factor to have been set up by the font during InitMetrics() or equivalent. This just checks that it was in fact initialized (especially as our current platform font subclasses don't do that, it's a new field); if not, we won't attempt to use harfbuzz with that font.
(In reply to comment #58) > > - if (fontRef == (ATSFontRef)kATSUInvalidFontID) > > + if (fontRef == (ATSFontRef)kATSUInvalidFontID) { > > return NS_ERROR_FAILURE; > > + } > > > > ByteCount dataLength; > > OSStatus status = ::ATSFontGetTable(fontRef, aTableTag, 0, 0, 0, &dataLength); > > - NS_ENSURE_TRUE(status == noErr, NS_ERROR_FAILURE); > > + if (status != noErr) { > > + return NS_ERROR_FAILURE; > > + } > > Actually, these should both be NS_ENSURE_TRUE. No, because NS_ENSURE_TRUE will cause a warning if the table is not present; this is inappropriate. If the table is not present, the function should simply return a code telling the caller that the table didn't exist. The *first* one could be done with NS_ENSURE_FALSE, I suppose, as we shouldn't be getting invalid font refs. But the second one leads to massive spew of irrelevant warnings.
(In reply to comment #58) > Use ffs(v) << 1 instead? We'd need fls(), not ffs() here. That would actually be + return aValue ? 1 << (fls(aValue) - 1) : 0; However, that's a FreeBSD-ism, doesn't seem to be available on Linux or Windows.
(In reply to comment #62) > We'd need fls(), not ffs() here. > However, that's a FreeBSD-ism, doesn't seem to be available on Linux or > Windows. Argh, well we should at least eliminate the duplicate versions of the HSB function.
Attachment #433560 - Attachment is obsolete: true
Attachment #434053 - Attachment is obsolete: true
Attached patch part 1 - harfbuzz code from upstream (obsolete) (deleted) — Splinter Review
Updated to Behdad's latest code from harfbuzz-ng repository.
Attachment #441005 - Flags: review?(jdaggett)
This will get incorporated upstream eventually, so this is just a temporary patch here.
Attachment #441006 - Flags: review?(jdaggett)
This is also temporary, pending updates to the core harfbuzz API - it's just so we have a suitable shaping API to use for the time being.
Attachment #441007 - Flags: review?(jdaggett)
This implements the script-run itemizer; following patch will connect it to the gfx font code.
Attachment #441009 - Flags: review?(jdaggett)
Attachment #441010 - Flags: review?(gerv)
Attachment #441011 - Flags: review?(jdaggett)
Attachment #441012 - Flags: review?(jdaggett)
Attachment #441014 - Flags: review?(bas.schouten)
Attachment #441010 - Flags: review?(gerv) → review+
Attachment #435741 - Attachment is obsolete: true
Attachment #436338 - Attachment is obsolete: true
Attachment #436342 - Attachment is obsolete: true
Attachment #438729 - Attachment is obsolete: true
Attachment #438730 - Attachment is obsolete: true
Attachment #438936 - Attachment is obsolete: true
Attachment #439496 - Attachment is obsolete: true
Attachment #441015 - Flags: review?(jdaggett)
Attachment #439496 - Flags: review?(bas.schouten)
Attachment #435741 - Flags: review?(jdaggett)
Attachment #436342 - Flags: review?(jdaggett)
Attachment #438729 - Flags: review?(jdaggett)
Attachment #438730 - Flags: review?(jdaggett)
(In reply to comment #58) > I'm still concerned about the memory impact of caching lots of tables for a > long time but that can be evaluated and optimized. On OS X and Windows DWrite, we don't actually have to cache tables; we can get table pointers from the OS (which is presumably doing its own caching) and just hold those, so we don't use the cache in gfxFontEntry. That exists as a default implementation for platforms (currently GDI) where we don't have direct table access. The actual tables are cached in the gfxFontEntry rather than gfxFont because this allows them to be shared between multiple instances of gfxFont (different sizes, different features) based on the same physical font. But the gfxFont may bypass this cache if it can get table pointers more efficiently. > + PRBool FontCanSupportHarfBuzz() { > + return (mFUnitsConvFactor > 0.0f) && > HasFontTable(TRUETYPE_TAG('c','m','a','p')); > + } > > HasFontTable pulls in the entire table! The default implementation in gfxFontEntry does, yes, but any platform that can answer the question without doing so can provide an override. Currently OS X and DWrite do so, by using APIs that get a reference to the table data without actually copying. > We're reading in the cmap just after > startup, so we should cache that info. If the table exists we should cache > that too! We could add an mHasCmapTable flag to the font entry, but we'd still have to deal with fonts being instantiated before the cmap has been read, so it gets a bit messy. I'm not sure it's an optimization worth doing, but we can if you really want. > +void > +gfxHarfBuzzShaper::GetGlyphMetrics(hb_codepoint_t glyph, > + hb_glyph_metrics_t *metrics) const ... > This is really inefficient, both at the API level and at the > implementation level. Harfbuzz calls this callback for every > character in a textrun. Seems like a version of GetGlyphMetrics that > takes an array of codepoints and returns an array of metrics would > make more sense. That would be a possible modification to harfbuzz, I suppose, but it's not how it currently works. Moreover, unless there's a platform API that can efficiently return metrics for an array of glyphs, it probably wouldn't make much difference. In most cases we'd have to measure them individually, to avoid the platform text APIs doing their own layout processing... > And you're calling the same code per-character to > pull in the 'hhea' and 'hmtx' tables repeatedly, that's inefficient. The "blobs" are cached by the shaper, so it's cheap to refer to them. > In the case of the 'hhea' you're only using the numLongMetrics value, > there's no reason for pulling in the table every time, even if it is > already cached in the font entry. OK, it now caches the numLongMetrics value in the shaper. > Similar problem here, MapCharToGlyph should be called once, cached and dumped > when appropriate. I don't see how this would help. cmap tables are designed to allow fairly efficient lookup; if we were to try and cache those results, the overhead of managing the cache and looking up characters in that would probably be just as great. What we can do as a slight optimization is to cache the specific cmap subtable that we're using, so that we don't have to re-scan the available subtables.
Sorry, forgot to de-tab the Unicode data and generator script last time. Should be clean now.
Attachment #441008 - Attachment is obsolete: true
Attachment #441143 - Flags: review?(jdaggett)
Attachment #441008 - Flags: review?(jdaggett)
Comment on attachment 441014 [details] [diff] [review] part 6.2 - use DWrite APIs to access font tables directly Looks good to me! What's the rationale of the following though? >+void >+gfxDWriteFont::ReleaseFontTablePtr(PRUint32 aTag) >+{ >+ // do nothing: we will release in our destructor >+} >+
Attachment #441014 - Flags: review?(bas.schouten) → review+
Because harfbuzz will request the same font tables many times, the font instance has a cache of the pointers and only releases them when the font is destroyed. This is an override of the [Get/Release]FontTablePtr implementation in gfxFont, which is based on a reference-counting cache in the gfxFontEntry. So although we don't need to handle the release call here because we want the table pointers to persist as long as the font, we do have to override the superclass's implementation so that it doesn't try to release the wrong thing.
This is an alternative implementation of font table access/caching. This uses the hb_blob_t wrapper to return font tables, and to cache them in the gfxFontEntry for sharing across gfxFont instances; this leads to a somewhat simpler implementation, IMO, as the blob already provides the necessary refcounting and a "destroy" callback that can be used to remove entries from the cache when the last instance disappears. AFAICT, there is no significant difference in real resource usage (or performance) between this and the previous implementation; in either case, the data is cached and shared at the gfxFontEntry level, or not copied/cached at all (merely "wrapped" at the gfxFont level) if the platform APIs provide direct access. This version makes the gfxFont and gfxFontEntry accessors less generic, as they are specifically working with the hb_blob_t type, but the benefit is a somewhat simpler, more direct API and implementation. I'm leaning towards this approach for its relative simplicity, but leaving both versions of the patch here for now to invite comments.
Attachment #441453 - Flags: review?(jdaggett)
The alternate version of part 6.1 using the hb_blob_t font access API; avoids copying table data to an in-process cache on OS X.
Attachment #441454 - Flags: review?(jdaggett)
Ditto, for DWrite fonts.
Attachment #441455 - Flags: review?(jdaggett)
This is the version of the harfbuzz shaping backend that works with the direct hb_blob_t font table APIs. This simplifies the shaper, which no longer needs to handle the wrapping of generic data pointers in blob form.
Attachment #441456 - Flags: review?(jdaggett)
The makefile patch (part 2, v3) needs refreshing.
Updated the harfbuzz code again to the latest git version. This will keep evolving, but I think we need to get an initial version into the tree so that we can move forward with the integration; we'll keep pulling updates periodically as harfbuzz itself matures.
Attachment #441005 - Attachment is obsolete: true
Attachment #441730 - Flags: review?(jdaggett)
Attachment #441005 - Flags: review?(jdaggett)
Refreshed part 3 due to makefile bitrot.
Attachment #441746 - Flags: review?(jdaggett)
Refreshed part 4 due to makefile bitrot.
Attachment #441009 - Attachment is obsolete: true
Attachment #441143 - Attachment is obsolete: true
Attachment #441747 - Flags: review?(jdaggett)
Attachment #441009 - Flags: review?(jdaggett)
Attachment #441143 - Flags: review?(jdaggett)
Attached patch part 7 [ALT] - gfxHarfBuzzShaper - refreshed (obsolete) (deleted) — Splinter Review
Refreshed part 7 [ALT] due to bitrot.
Attachment #441456 - Attachment is obsolete: true
Attachment #441748 - Flags: review?(jdaggett)
Attachment #441456 - Flags: review?(jdaggett)
Attached patch part 7 [ALT] - gfxHarfBuzzShaper - refreshed (obsolete) (deleted) — Splinter Review
Oops, the patch wasn't properly refreshed last time (I have a love/hate relationship with mq!). This should be the right version now.
Attachment #441748 - Attachment is obsolete: true
Attachment #441758 - Flags: review?(jdaggett)
Attachment #441748 - Flags: review?(jdaggett)
The latest hb changes (part 1) contains code that trips up your alternate shaping api (part 1.5).
Attachment #441007 - Attachment is obsolete: true
Attachment #442034 - Flags: review?(jdaggett)
Attachment #441007 - Flags: review?(jdaggett)
A mercurial patch queue (with the "alternate" table cache and shaper implementation) is available at http://hg.mozilla.org/users/jkew_mozilla.com/harfbuzz-mq/.
Handy as a way of loading all local fonts to test the sanitizer. Roughly 10% of the fonts on my system require GPOS fixup, something to do with PairPos subtables. To run, download and run locally. When run, allow/deny dialog will appear. To see sanitizer output build with "#define HB_DEBUG_SANITIZE 500" in hb-open-type-private.hh.
On OS X and DWrite, harfbuzz layout works well using metrics directly from the hmtx table. However, on GDI it's essential to use glyph widths obtained from Windows APIs, otherwise we get poor spacing because the "ideal" widths from hmtx don't work well with the hinted, grid-fitted glyph shapes GDI renders. So this patch allows the harfbuzz shaper to request hinted metrics from the platform font implementation if it chooses to support this. This is included (along with refreshed versions of other patches) in the mq repository at http://hg.mozilla.org/users/jkew_mozilla.com/harfbuzz-mq/.
Attachment #445110 - Flags: review?(jdaggett)
Attachment #441006 - Flags: review?(jdaggett) → review+
Comment on attachment 442034 [details] [diff] [review] part 1.5 - add alternate shaping API to take opentype feature tags - refreshed Looks good but having a more descriptive function names would be better instead of xxx_2 (e.g. _hb_ot_substitute_complex_2).
Comment on attachment 442034 [details] [diff] [review] part 1.5 - add alternate shaping API to take opentype feature tags - refreshed Looks good but having a more descriptive function names would be better instead of xxx_2 (e.g. _hb_ot_substitute_complex_2).
Attachment #442034 - Flags: review?(jdaggett) → review+
Comment on attachment 441746 [details] [diff] [review] part 3 - add unicode character properties - refreshed Part 3. Looks fine but I'd like Simon to take a look also.
Attachment #441746 - Flags: review?(smontagu)
Attachment #441746 - Flags: review?(jdaggett)
Attachment #441746 - Flags: review+
Comment on attachment 441747 [details] [diff] [review] part 4 - script run itemizer - refreshed + void setText(const PRUnichar * src, PRUint32 length); + + PRBool next(PRUint32 *pRunStart, PRUint32 *pRunLimit, + PRInt32 *pRunScript); In general gfx code uses capitalized method names. + if (pRunStart != NULL) { + *pRunStart = scriptStart; + } + if (pRunLimit != NULL) { + *pRunLimit = scriptLimit; + } + if (pRunScript != NULL) { + *pRunScript = scriptCode; + } Are there cases when these are null pointers? If not, might be better to use a reference types rather than pointers.
Attachment #441747 - Flags: review?(smontagu)
Attachment #441747 - Flags: review?(jdaggett)
Attachment #441747 - Flags: review+
Comment on attachment 441011 [details] [diff] [review] part 5 - break text into script runs before font-matching and shaping > void > gfxFontGroup::InitTextRun(gfxContext *aContext, > gfxTextRun *aTextRun, > const PRUnichar *aString, > PRUint32 aLength) > { > + // split into script runs so that script can potentially influence > + // the font matching process below > + gfxScriptItemizer scriptRuns(aString, aLength); > + > + PRUint32 runStart = 0, runLimit = aLength; > + PRInt32 runScript = HB_SCRIPT_LATIN; > + while (scriptRuns.next(&runStart, &runLimit, &runScript)) { > + InitTextRun(aContext, aTextRun, aString, aLength, > + runStart, runLimit, runScript); > + } > +} Seems like this should only happen when the Harfbuzz shaper is being used or at least only when Harfbuzz is enabled. In the Uniscribe case, this leads to breaking up first with gfxScriptItemizer, then with Uniscribe's ScriptItemize for each individual script run. Better to skip gfxScriptItemizer when not using Harfbuzz I think. Just curious, how similar is the output of gfxScriptItemizer to that of ScriptItemize?
Comment on attachment 441012 [details] [diff] [review] part 6 - generic font table cache and accessors in gfxFontEntry and gfxFont Part 6, couple quick comments: -gfxFontUtils::MapCharToGlyph(PRUint8 *aBuf, PRUint32 aBufLength, PRUnichar aCh) +gfxFontUtils::MapCharToGlyphFormat12(const PRUint8 *aBuf, PRUint32 aCh) +{ + const Format12Cmap *cmap12 = reinterpret_cast<const Format12Cmap*>(aBuf); + PRUint32 numGroups = cmap12->numGroups; + PRUint32 powerOf2 = mozilla::FindHighestBit(numGroups); + PRUint32 rangeOffset = numGroups - powerOf2; + PRUint32 range = 0; + PRUint32 startCharCode; + + const Format12Group *groups = &cmap12->groups[0]; + if (groups[rangeOffset].startCharCode <= aCh) { + range = rangeOffset; + } + + while (powerOf2 > 1) { + powerOf2 >>= 1; + if (groups[range + powerOf2].startCharCode <= aCh) { + range += powerOf2; + } + } + + startCharCode = groups[range].startCharCode; + if (startCharCode <= aCh && groups[range].endCharCode >= aCh) { + return groups[range].startGlyphId + aCh - startCharCode; + } + + return 0; +} Seems like you need to pass in aBufLength and confirm that numGroups is valid. Or maybe better, confirm this during the initial cmap table read. + case 4: + return aCh < 0x10000 ? MapCharToGlyphFormat4(aBuf + offset, aCh) : 0; Please use named constants (UNICODE_BMP_LIMIT here?).
Comment on attachment 441012 [details] [diff] [review] part 6 - generic font table cache and accessors in gfxFontEntry and gfxFont Marking this as obsolete; I think we should go with the "ALT" implementation that has simpler code overall.
Attachment #441012 - Attachment is obsolete: true
Attachment #441012 - Flags: review?(jdaggett)
Attachment #441013 - Attachment is obsolete: true
Attachment #441013 - Flags: review?(jdaggett)
Attachment #441014 - Attachment is obsolete: true
Attachment #441015 - Attachment is obsolete: true
Attachment #441015 - Flags: review?(jdaggett)
Comment on attachment 441006 [details] [diff] [review] part 1.4 - fix for bidi mirroring bug in hb_shape No longer needed, this is now fixed in the upstream harfbuzz code.
Attachment #441006 - Attachment is obsolete: true
(In reply to comment #99) > (From update of attachment 441011 [details] [diff] [review]) > > void > > gfxFontGroup::InitTextRun(gfxContext *aContext, > > gfxTextRun *aTextRun, > > const PRUnichar *aString, > > PRUint32 aLength) > > { > > + // split into script runs so that script can potentially influence > > + // the font matching process below > > + gfxScriptItemizer scriptRuns(aString, aLength); > > + > > + PRUint32 runStart = 0, runLimit = aLength; > > + PRInt32 runScript = HB_SCRIPT_LATIN; > > + while (scriptRuns.next(&runStart, &runLimit, &runScript)) { > > + InitTextRun(aContext, aTextRun, aString, aLength, > > + runStart, runLimit, runScript); > > + } > > +} > > Seems like this should only happen when the Harfbuzz shaper is being > used or at least only when Harfbuzz is enabled. In the Uniscribe > case, this leads to breaking up first with gfxScriptItemizer, then > with Uniscribe's ScriptItemize for each individual script run. Better > to skip gfxScriptItemizer when not using Harfbuzz I think. I'd rather keep things simple by using a consistent overall processing model as far as possible, rather than switching between different paths at this level depending which shaper is enabled. Note that we may want to choose different shapers for the individual script runs; to support this, we *have* to do the script itemization before the shaper choice. In addition, there are a couple of side benefits to doing this: first, it works around a nasty Core Text bug on 10.5; see bug 543353. And second, it seems to actually give us a performance win with Uniscribe in some cases (especially on older systems - maybe XP's Uniscribe is less efficient). > Just curious, how similar is the output of gfxScriptItemizer to that > of ScriptItemize? I haven't tried to compare. Just from looking at the complexity of the structures that ScriptItemize returns, I'd guess it may be doing something more fine-grained - e.g., it looks like it may separate digit runs into their own "items".
(In reply to comment #98) > (From update of attachment 441747 [details] [diff] [review]) > + void setText(const PRUnichar * src, PRUint32 length); > + > + PRBool next(PRUint32 *pRunStart, PRUint32 *pRunLimit, > + PRInt32 *pRunScript); > > In general gfx code uses capitalized method names. > > + if (pRunStart != NULL) { > + *pRunStart = scriptStart; > + } > + if (pRunLimit != NULL) { > + *pRunLimit = scriptLimit; > + } > + if (pRunScript != NULL) { > + *pRunScript = scriptCode; > + } > > Are there cases when these are null pointers? If not, might be better to use a > reference types rather than pointers. Yeah, might as well. All this was closely following the original ICU itemizer code, which had a pure-C API, but I'm happy to adapt these to make the API fit our code better. (Just FTR, I'd prefer _not_ to modify the various internal method names and instance vars to fit our conventions, as that will just add to the integration hassle for any future updates we might want to merge from ICU.)
(In reply to comment #100) > (From update of attachment 441012 [details] [diff] [review]) > Part 6, couple quick comments: Oops, this belongs more properly in part 7, it's not part of the table cache! Moved the code. > Seems like you need to pass in aBufLength and confirm that numGroups is valid. > Or maybe better, > confirm this during the initial cmap table read. We validate the numGroups value in gfxFontUtils::ReadCMAPTableFormat12... const PRUint32 numGroups = ReadLongAt(aBuf, OffsetNumberGroups); NS_ENSURE_TRUE(tablelen >= 16 + (12 * numGroups), NS_ERROR_GFX_CMAP_MALFORMED); So if the table isn't big enough for the claimed number of groups, we'll refuse to use it. However, I notice that the arithmetic expression there could actually overflow, so I've just rewritten the check to avoid that risk.
(In reply to comment #96) > (From update of attachment 442034 [details] [diff] [review]) > Looks good but having a more descriptive function names would be better instead > of xxx_2 (e.g. _hb_ot_substitute_complex_2). I've renamed them to _hb_ot_substitute_complex_with_tags, etc. (I see these as an interim implementation, pending the finalization of internal harfbuzz feature processing.)
Depends on: 566209
Comment on attachment 441746 [details] [diff] [review] part 3 - add unicode character properties - refreshed What bothers me about this is that it means we will have three different implementations of the UCD (or overlapping subsets of it) in our codebase: this one, the one in intl, and the one in js (which I am supposed to be updating in bug 394604). Could we put this somewhere else and unify the other ones to it?
(In reply to comment #107) > (From update of attachment 441746 [details] [diff] [review]) > What bothers me about this is that it means we will have three different > implementations of the UCD (or overlapping subsets of it) in our codebase: this > one, the one in intl, and the one in js (which I am supposed to be updating in > bug 394604). Could we put this somewhere else and unify the other ones to it? I wondered about that, too, and would be interested to hear suggestions. Just bear in mind that the access to character properties here will be critical to text-rendering performance, so I don't want to do something that will involve additional cost on those lookups. But if we can keep property lookup equally efficient while unifying them, I'm all for it.
FWIW I have a long-term plan to write a library just exposing the UCD data...
Jonathan, your patch queue needs to be updated for changes in gfx/thebes/src/Makefile.in.
With jkew's patches as of 5/17, I'm seeing a crash in hb_font_get_contour_point with DejaVu Sans, using the diacritics testcase: -1608640736[a0ab10]: InitTextRun 210ec960 fontgroup 1acf74e0 (DejaVu Sans,serif) lang: en-jp len 81 TEXTRUN " à á â ã ä å ç è é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý ÿ" ENDTEXTRUN #0 0x00000000 in ?? () #1 0x04da270e in hb_font_get_contour_point (font=0x16ceb40, face=0x1acbf6d0, point_index=4, glyph=689, x=0xbfff2ec0, y=0xbfff2ebc) at /builds/mozharfbuzz/gfx/harfbuzz/src/hb-font.cc:202 #2 0x04dab97b in AnchorFormat2::get_anchor (this=0x1ba36d72, layout=0xbfff319c, glyph_id=689, x=0xbfff2f50, y=0xbfff2f4c) at hb-ot-layout-gpos-private.hh:246 #3 0x04dabb6a in Anchor::get_anchor (this=0x1ba36d72, layout=0xbfff319c, glyph_id=689, x=0xbfff2f50, y=0xbfff2f4c) at hb-ot-layout-gpos-private.hh:314 #4 0x04dac073 in MarkArray::apply (this=0x1ba36c8c, context=0xbfff30cc, mark_index=0, glyph_index=36, anchors=@0x1ba35b6c, class_count=1, glyph_pos=1) at hb-ot-layout-gpos-private.hh:406 #5 0x04dac318 in MarkBasePosFormat1::apply (this=0x1ba35b60, context=0xbfff30cc) at hb-ot-layout-gpos-private.hh:1092 #6 0x04dac38b in MarkBasePos::apply (this=0x1ba35b60, context=0xbfff30cc) at hb-ot-layout-gpos-private.hh:1132 #7 0x04dadcfd in PosLookupSubTable::apply (this=0x1ba35b60, context=0xbfff30cc, lookup_type=4) at hb-ot-layout-gpos-private.hh:1457 #8 0x04dade7d in PosLookup::apply_once (this=0x1ba315c6, layout=0xbfff319c, buffer=0x1ac2ba10, context_length=1114112, nesting_level_left=8) at hb-ot-layout-gpos-private.hh:1524 #9 0x04dadf39 in PosLookup::apply_string (this=0x1ba315c6, layout=0xbfff319c, buffer=0x1ac2ba10, mask=65535) at hb-ot-layout-gpos-private.hh:1549 #10 0x04dadfcf in GPOS::position_lookup (this=0x1ba313c0, layout=0xbfff319c, buffer=0x1ac2ba10, lookup_index=13, mask=65535) at hb-ot-layout-gpos-private.hh:1592 #11 0x04da643b in hb_ot_layout_position_lookup (font=0x16ceb40, face=0x1acbf6d0, buffer=0x1ac2ba10, lookup_index=13, mask=65535) at /builds/mozharfbuzz/gfx/harfbuzz/src/hb-ot-layout.cc:594 #12 0x04da4476 in _hb_ot_position_complex_with_tags (font=0x16ceb40, face=0x1acbf6d0, buffer=0x1ac2ba10, feature_tags=0x5092120, num_tags=11) at /builds/mozharfbuzz/gfx/harfbuzz/src/hb-ot-shape.cc:223 #13 0x04dbaef2 in hb_shape_ot_generic (font=0x16ceb40, face=0x1acbf6d0, buffer=0x1ac2ba10, user_features=0xbfff4414, num_user_features=0) at /builds/mozharfbuzz/gfx/harfbuzz/src/hb-shape.cc:352 #14 0x04c3c2ec in gfxHarfBuzzShaper::InitTextRun (this=0x210f49f0, aContext=0x2134ee50, aTextRun=0x210ec960, aString=0xbfff4d94, aRunStart=0, aRunLength=81, aRunScript=25) at /builds/mozharfbuzz/gfx/thebes/src/gfxHarfBuzzShaper.cpp:419 #15 0x04c18fb5 in gfxFont::InitTextRun (this=0x21375b40, aContext=0x2134ee50, aTextRun=0x210ec960, aString=0xbfff4d94, aRunStart=0, aRunLength=81, aRunScript=25) at /builds/mozharfbuzz/gfx/thebes/src/gfxFont.cpp:1285 #16 0x04c1ee44 in gfxFontGroup::InitTextRun (this=0x1acf74e0, aContext=0x2134ee50, aTextRun=0x210ec960, aString=0xbfff4d94, aTotalLength=81, aScriptRunStart=0, aScriptRunEnd=81, aRunScript=25) at /builds/mozharfbuzz/gfx/thebes/src/gfxFont.cpp:2067 #17 0x04c1f07f in gfxFontGroup::InitTextRun (this=0x1acf74e0, aContext=0x2134ee50, aTextRun=0x210ec960, aString=0xbfff4d94, aLength=81) at /builds/mozharfbuzz/gfx/thebes/src/gfxFont.cpp:2017 #18 0x04c1fb40 in gfxFontGroup::MakeTextRun (this=0x1acf74e0, aString=0xbfff4d94, aLength=81, aParams=0xbfff4f8c, aFlags=17825857) at /builds/mozharfbuzz/gfx/thebes/src/gfxFont.cpp:1992 #19 0x04c36dd8 in TextRunWordCache::MakeTextRun (this=0x16a7220, aText=0xbfff654c, aLength=80, aFontGroup=0x1acf74e0, aParams=0xbfff63d0, aFlags=17825856) at /builds/mozharfbuzz/gfx/thebes/src/gfxTextRunWordCache.cpp:693 There have been a lot of checkin's to harfbuzz-ng in the last week so this may already be fixed.
Crashes with hb level == 1 and DejaVu Sans v.2.21 (from OpenOffice 3.0)
Attachment #441746 - Flags: review?(smontagu) → review+
Attachment #441747 - Flags: review?(smontagu) → review+
(In reply to comment #112) > Created an attachment (id=446943) [details] > simplified testcase for crash in hb_font_get_contour_point > > Crashes with hb level == 1 and DejaVu Sans v.2.21 (from OpenOffice 3.0) Yup, this is because I hadn't implemented the relevant font callback yet - but didn't realize this would actually crash, I thought it would just mean certain font features wouldn't fully work. I'll have an update shortly with this fixed. (Avoiding the crash is trivial, but given that DejaVu actually relies on this, I want to support it properly.)
The patch queue now fixes the DejaVu crash, and is updated to apply cleanly to current trunk. Note that parts 1.3 and 1.5 are now obsolete, following updates to the harfbuzz-ng code upstream.
Comment on attachment 434257 [details] [diff] [review] part 1.3 - update harfbuzz script code list for Unicode 5.2 Marking patch as obsolete (incorporated upstream).
Attachment #434257 - Attachment is obsolete: true
Comment on attachment 442034 [details] [diff] [review] part 1.5 - add alternate shaping API to take opentype feature tags - refreshed Marking as obsolete (upstream API has been updated to the point where we can use it, although it is not yet fully implemented internally).
Comment on attachment 442034 [details] [diff] [review] part 1.5 - add alternate shaping API to take opentype feature tags - refreshed Marking as obsolete (upstream API has been updated to the point where we can use it, although it is not yet fully implemented internally).
Attachment #442034 - Attachment is obsolete: true
Fixed the crash upstream.
Consolidated review of non-harfbuzz code (i.e. all parts except part 1) based on jkew's patch queue as of 5/24: +++ b/gfx/thebes/public/gfxFont.h +typedef struct _hb_blob_t hb_blob_t; Shouldn't this be in a hb header? + // check whether this is an sfnt we can potentially use with harfbuzz + PRBool FontCanSupportHarfBuzz() { + return HasFontTable(TRUETYPE_TAG('c','m','a','p')); + } As noted in comment 58, this doesn't seem like good style, the code should be based on a bool in the gfxFontEntry, instead of pulling in the cmap for the sole purpose of deciding whether it exists or not. + static PRUint32 + MapCharToGlyph(const PRUint8 *aBuf, PRUint32 aBufLength, PRUint32 aCh); This is no longer used, we have format specific versions. Better to omit this. -void -gfxCoreTextShaper::CreateDefaultFeaturesDescriptor() -{ You seem to be moving this from gfxCoreTextShaper to gfxMacFont but it looks like it's still declared in both. Is this just code cleanup or are you making some behavioral change with this? Hard to tell in a big patch. +hb_blob_t * +gfxFontEntry::GetFontTable(PRUint32 aTag) +{ + if (!mFontTableCache.IsInitialized()) { + // we do this here rather than on fontEntry construction + // because not all shapers will access the table cache at all + mFontTableCache.Init(10); + } + + FontTableCacheEntry *entry = nsnull; + if (!mFontTableCache.Get(aTag, &entry)) { + nsTArray<PRUint8> buffer; + if (NS_SUCCEEDED(GetFontTable(aTag, buffer))) { + entry = new FontTableCacheEntry(buffer, // adopts buffer elements + aTag, mFontTableCache); + if (mFontTableCache.Put(aTag, entry)) { + return entry->GetBlob(); + } + hb_blob_destroy(entry->GetBlob()); + delete entry; // we failed to cache it! + return hb_blob_create_empty(); + } + } + + if (entry) { + return hb_blob_reference(entry->GetBlob()); + } + + return hb_blob_create_empty(); +} Just to confirm, hb_blob_reference needs to be called when Get succeeds but not when Put succeeds? Seems odd. I'm still fuzzy on what the exact protocol is for using these. Are the returned hb_blob_t objects ref counted? How are they released? I also really don't like the harbuzz-ism of returning "empty" objects when a simple null pointer would suffice. The HB callback could use the empty object pattern if that's required for harfbuzz. + PRBool ok = PR_FALSE; + + NS_ASSERTION(mPlatformShaper || mHarfBuzzShaper, "no shaper?!"); + if (!mPlatformShaper && !mHarfBuzzShaper) { + return ok; } Minor nit, return PR_FALSE here instead of using the default value of ok. In gfxFontGroup::InitTextRun > + PRUint32 runStart = aScriptRunStart; > nsAutoTArray<gfxTextRange,3> fontRanges; > - ComputeRanges(fontRanges, aString, 0, aLength); > + ComputeRanges(fontRanges, aString, aScriptRunStart, aScriptRunEnd); > PRUint32 numRanges = fontRanges.Length(); > > nsAutoTArray<PRPackedBool,SMALL_GLYPH_RUN> unmatchedArray; > PRPackedBool *unmatched = NULL; > > for (PRUint32 r = 0; r < numRanges; r++) { > const gfxTextRange& range = fontRanges[r]; > PRUint32 matchedLength = range.Length(); > gfxFont *matchedFont = (range.font ? range.font.get() : nsnull); > > if (matchedFont) { > // create the glyph run for this range > aTextRun->AddGlyphRun(matchedFont, runStart, (matchedLength > 0)); > > // do glyph layout and record the resulting positioned glyphs > matchedFont->InitTextRun(aContext, aTextRun, aString, > - runStart, matchedLength); > + runStart, matchedLength, aRunScript); > } else { > // no font available, so record missing glyph info instead > if (unmatched == NULL) { > - if (unmatchedArray.SetLength(aLength)) { > + if (unmatchedArray.SetLength(aTotalLength)) { > unmatched = unmatchedArray.Elements(); > - ::memset(unmatched, PR_FALSE, aLength*sizeof(PRPackedBool)); > + ::memset(unmatched, PR_FALSE, > + aTotalLength*sizeof(PRPackedBool)); > } > } Er, hang on, why is this code messing about aTotalLength for the missing array? Shouldn't that be aScriptRunEnd - aScriptRunStart? +typedef struct { + AutoSwap_PRUint32 startCharCode; + AutoSwap_PRUint32 endCharCode; + AutoSwap_PRUint32 startGlyphId; +} Format12Group; + +typedef struct { + AutoSwap_PRUint16 format; + AutoSwap_PRUint16 reserved; + AutoSwap_PRUint32 length; + AutoSwap_PRUint32 language; + AutoSwap_PRUint32 numGroups; + + Format12Group groups[1]; +} Format12Cmap; + const PRUint32 numGroups = ReadLongAt(aBuf, OffsetNumberGroups); + NS_ENSURE_TRUE((tablelen - 16) / 12 >= numGroups, NS_ERROR_GFX_CMAP_MALFORMED); I think it would be better to define Format12CmapHeader instead of Format12Cmap, then use sizeof(Format12CmapHeader) and sizeof(Format12Group) in the range checks. I realize the code has other places where size constants are there but in this case I think it's important to understand the structural checks that are being done because the code in the MapCharxxx routines appear to rely on this later. +gfxFontUtils::MapCharToGlyphFormat12(const PRUint8 *aBuf, PRUint32 aCh) +{ + const Format12Cmap *cmap12 = reinterpret_cast<const Format12Cmap*>(aBuf); + PRUint32 numGroups = cmap12->numGroups; + PRUint32 powerOf2 = mozilla::FindHighestBit(numGroups); + PRUint32 rangeOffset = numGroups - powerOf2; + PRUint32 range = 0; + PRUint32 startCharCode; + + const Format12Group *groups = &cmap12->groups[0]; + if (groups[rangeOffset].startCharCode <= aCh) { + range = rangeOffset; + } + + while (powerOf2 > 1) { + powerOf2 >>= 1; + if (groups[range + powerOf2].startCharCode <= aCh) { + range += powerOf2; + } + } + + startCharCode = groups[range].startCharCode; + if (startCharCode <= aCh && groups[range].endCharCode >= aCh) { + return groups[range].startGlyphId + aCh - startCharCode; + } + + return 0; +} After puzzling over this, this code looks correct but it really cries out for a comment explaining the algorithm being used. Asserting (range + powerOf2 < numGroups) after powerOf2 >>= 1 would be appropriate. And a comment noting that numGroups is verified in the ReadFormat routine would make sense. + cairo_win32_scaled_font_select_font(mScaledFont, DCFromContext(aContext)); Hmm, what's this? Is this a bug fix for something else? + mFUnitsConvFactor = GetAdjustedSize() / oMetrics.otmEMSquare; The denominator here is guaranteed not to be zero? One would hope but this is font data after all, anything is possible. + if (GetCharWidthI(DCFromContext(aCtx), aGID, 1, NULL, &devWidth)) { + width = devWidth << 16; + mGlyphWidths.Put(aGID, width); + return width; + } Although unlikely, devWidth is a 32-bit int, so I think you need an error check or a MAX calc for (devWidth << 16). + mHmtxBlob(nsnull), + mNumLongMetrics(0), + mCmapBlob(nsnull), Within the FontTableCache is one thing but I really hate the generic term "blob" here, it obscures that this is a *font table* not a generic bunch of bits. I would suggest 'mHmtxTable', 'mCmapTable'. Same type (hb_blob_t) just different names. + mCmapFormat = + gfxFontUtils::FindPreferredSubtable((const PRUint8*)data, + hb_blob_get_length(mCmapBlob), + &mSubtableOffset, &sym); Better to cache the preferred format in the font entry, since the cmap has already been scanned once at this point. + gid = unicode < 0x10000 ? Use 'UNICODE_BMP_LIMIT'. + // read this from the hhea table without caching the blob + hb_blob_t *hheaBlob = mFont->GetFontTable(TRUETYPE_TAG('h','h','e','a')); + if (hb_blob_get_length(hheaBlob) >= sizeof(HMetricsHeader)) { + const HMetricsHeader* hhea = reinterpret_cast<const HMetricsHeader*> + (hb_blob_lock(hheaBlob)); + mNumLongMetrics = hhea->numberOfHMetrics; + hb_blob_unlock(hheaBlob); + } + hb_blob_destroy(hheaBlob); s/Blob/Table/ + if (fcd->mShaper->GetFont()->GetContourPoint(fcd->mContext, point_index, + glyph, fx, fy)) { + // convert to 16.16 fixed-point + *x = fx * 65536; + *y = fy * 65536; + return true; + } There's no implementation for GetContourPoint currently, only a default implementation. And the conversion to 16.16 fixed ints should really be done with a macro like 'Float2Fixed', defined in gfx or in harfbuzz. + // Cached copy of the hmtx table and numLongMetrics field from hhea, + // for use when looking up glyph metrics; initialized to 0 by the + // constructor so we can tell it hasn't been set yet. + // This is a signed value so that we can use -1 to indicate + // an error (if the hhea table was not available). + mutable hb_blob_t *mHmtxBlob; + mutable PRInt32 mNumLongMetrics; + + // Cached pointer to cmap subtable to be used for char-to-glyph mapping. + // This comes from GetFontTablePtr; if it is non-null, our destructor + // must call ReleaseFontTablePtr to avoid permanently caching the table. + mutable hb_blob_t *mCmapBlob; + mutable PRInt32 mCmapFormat; + mutable PRUint32 mSubtableOffset; Sorry, why do all these fields need to be mutable? + if (FontCanSupportHarfBuzz()) { + mHarfBuzzShaper = new gfxHarfBuzzShaper(this); + } + mPlatformShaper = new gfxCoreTextShaper(this); If harfbuzz is in use, seems like most of the time the platform shaper won't be needed. Do this lazily? Same for DWrite code. + mFUnitsConvFactor = GetAdjustedSize() / ::CTFontGetUnitsPerEm(aCTFont); + Zero check? > + const PRUint8* data = (const PRUint8*)hb_blob_lock(mCmapBlob); > > + (hb_blob_lock(hheaBlob)); > > + reinterpret_cast<const HMetrics*>(hb_blob_lock(mHmtxBlob)); This is more a harfbuzz API point but table level locking sounds crazy to me. I realize it's stubbed out at this point but if real mutexes were actually implemented, I can easily imagine deadlocking scenarios where two threads trying to lock the same set of tables in different order cause deadlock. I would really prefer if our code could avoid any pretense of thread safety here. I still need to go through some of the larger routines in gfxHarfBuzzShaper but I'm going to wait until tomorrow, my brain's kinda fried currently.
(In reply to comment #119) > Consolidated review of non-harfbuzz code (i.e. all parts except part 1) based > on jkew's patch queue as of 5/24: > > +++ b/gfx/thebes/public/gfxFont.h > +typedef struct _hb_blob_t hb_blob_t; > > Shouldn't this be in a hb header? IIRC, at one point #include-ing hb-blob.h in gfxFont.h led to conflicts elsewhere in the build (as gfxFont.h is a public header that gets used lots more places), but I'll check whether this is still an issue. As hb_blob_t is only needed for pointer parameters within this header, the opaque struct declaration is sufficient. (FWIW, we do the same thing with some cairo types in other gfx headers, for example.) > As noted in comment 58, this doesn't seem like good style, the code > should be based on a bool in the gfxFontEntry, instead of pulling in > the cmap for the sole purpose of deciding whether it exists or not. I'll see if I can rework it in that way - that'll mean ensuring that the cmap has in fact been read by the time we are needing to decide whether to instantiate a harfbuzz shaper. > -void > -gfxCoreTextShaper::CreateDefaultFeaturesDescriptor() > -{ > > You seem to be moving this from gfxCoreTextShaper to gfxMacFont but it > looks like it's still declared in both. Is this just code cleanup or > are you making some behavioral change with this? Hard to tell in a > big patch. It's needed in gfxMacFont because this now creates a CTFontRef in order to support font table access via the Core Text APIs, to bypass the gfxFontEntry cache and reduce RAM footprint. > +hb_blob_t * > +gfxFontEntry::GetFontTable(PRUint32 aTag) > +{ .... > + if (entry) { > + return hb_blob_reference(entry->GetBlob()); > + } > + > + return hb_blob_create_empty(); > +} > > Just to confirm, hb_blob_reference needs to be called when Get > succeeds but not when Put succeeds? Seems odd. Put() is used with a newly-created blob that has had its ref count initialized to 1; if harfbuzz then calls destroy() on the blob, it will be deleted and removed from the cache. Get() is returning an additional reference to a pre-existing blob; it needs to bump the ref count so that the blob is not actually deleted and removed from cache until harfbuzz calls destroy() for EACH reference it has been holding. > I'm still fuzzy on what the exact protocol is for using these. Are > the returned hb_blob_t objects ref counted? How are they released? Yes, they're refcounted (internally by harfbuzz, not our AddRef/Release protocol). Harfbuzz calls hb_blob_destroy() when it is finished with them; this actually decrements the refcount, and destroys "for real" only if it reaches zero. At that point the blob's destruction callback gets called (which removes the entry from the table cache here). > I > also really don't like the harbuzz-ism of returning "empty" objects > when a simple null pointer would suffice. The HB callback could use > the empty object pattern if that's required for harfbuzz. I think Behdad has just updated HB to allow us to return null for nonexistent tables, so I can revise this. > + cairo_win32_scaled_font_select_font(mScaledFont, DCFromContext(aContext)); > > Hmm, what's this? Is this a bug fix for something else? We have to select the font into the device context in order for GetCharWidthI to work. But it's better to do it once for the run, at SetupCairoFont() time, than on each individual glyph-metrics call. > + if (fcd->mShaper->GetFont()->GetContourPoint(fcd->mContext, point_index, > + glyph, fx, fy)) { > + // convert to 16.16 fixed-point > + *x = fx * 65536; > + *y = fy * 65536; > + return true; > + } > > There's no implementation for GetContourPoint currently, only a > default implementation. Right, it's a hook where we could potentially use platform-specific APIs to get a hinted point location. I was considering implementing it for GDI but it's non-critical. > + // Cached copy of the hmtx table and numLongMetrics field from hhea, > + // for use when looking up glyph metrics; initialized to 0 by the > + // constructor so we can tell it hasn't been set yet. > + // This is a signed value so that we can use -1 to indicate > + // an error (if the hhea table was not available). > + mutable hb_blob_t *mHmtxBlob; > + mutable PRInt32 mNumLongMetrics; > + > + // Cached pointer to cmap subtable to be used for char-to-glyph mapping. > + // This comes from GetFontTablePtr; if it is non-null, our destructor > + // must call ReleaseFontTablePtr to avoid permanently caching the table. > + mutable hb_blob_t *mCmapBlob; > + mutable PRInt32 mCmapFormat; > + mutable PRUint32 mSubtableOffset; > > Sorry, why do all these fields need to be mutable? Because they're modified from within const methods. The harfbuzz callbacks pass user_data as a const pointer, through which we ultimately call back to shaper object methods. Conceptually, those methods are just accessors that can reasonably be declared const. These fields are mutable so that the accessors can use them to cache things internally, even though the methods themselves are declared as const. As an alternative, we could cast away the const-ness of the user_data pointer in the harfbuzz callbacks. > + if (FontCanSupportHarfBuzz()) { > + mHarfBuzzShaper = new gfxHarfBuzzShaper(this); > + } > + mPlatformShaper = new gfxCoreTextShaper(this); > > If harfbuzz is in use, seems like most of the time the platform shaper > won't be needed. Do this lazily? Same for DWrite code. Currently, the platform shaper is used for any complex script runs we encounter. We could create them lazily but that would mean an additional check every time we want to use them. I guess that doesn't really matter though. > > + const PRUint8* data = (const PRUint8*)hb_blob_lock(mCmapBlob); > > + (hb_blob_lock(hheaBlob)); > > + reinterpret_cast<const HMetrics*>(hb_blob_lock(mHmtxBlob)); > > This is more a harfbuzz API point but table level locking sounds crazy > to me. I realize it's stubbed out at this point but if real mutexes were > actually implemented, I can easily imagine deadlocking scenarios > where two threads trying to lock the same set of tables in different > order cause deadlock. I would really prefer if our code could avoid > any pretense of thread safety here. I think Behdad is intending to change/eliminate this. For now, just think of hb_blob_lock() as being hb_blob_get_data_ptr(); it's how we get a pointer to the actual block of data inside the blob wrapper. I'll work through a bunch of these points and update the queue in a while; thanks.
(In reply to comment #120) > (In reply to comment #119) > > +++ b/gfx/thebes/public/gfxFont.h > > +typedef struct _hb_blob_t hb_blob_t; > > > > Shouldn't this be in a hb header? > > IIRC, at one point #include-ing hb-blob.h in gfxFont.h led to conflicts > elsewhere in the build (as gfxFont.h is a public header that gets used lots > more places), but I'll check whether this is still an issue. As hb_blob_t is > only needed for pointer parameters within this header, the opaque struct > declaration is sufficient. (FWIW, we do the same thing with some cairo types in > other gfx headers, for example.) It ends up causing a conflict with woff.h when compiling gfxUserFontSet.cpp on windows, because both harfbuzz and woff headers hack around the lack of <stdint.h> on windows by defining [u]int[8,16,32]_t types themselves in this case. We could resolve this by additional hacking in the woff header, I suppose, but I think the simplest solution is to avoid including harfbuzz headers in the big gfxFont.h header, so their use remains more localized.
I have pushed an update to the patch queue, with a bunch of the above review comments addressed. Currently the fixes are in an additional patch added to the queue, but I plan to merge them into the appropriate patches in the stack once things are closer to finalized.
I did some tests today with fonts whose GPOS/GSUB tables intentionally mangled. It was a bit depressing, I can basically crash on any platform with these. I tried with the harfbuzz build, but even here we crash: #0 0x94546673 in AppendOTFeaturesFromTable () #1 0x945468ac in AcquireFeatureTableForOTFont () #2 0x94519d4f in TFontFeatureTable::LoadFeatureTable () #3 0x94532d45 in TFontFeatures::TFontFeatures () #4 0x9454b99c in TBaseFont::CopyFeatures () #5 0x944febbd in TFont::SetExtras () #6 0x9452fe08 in TFont::TFont () #7 0x945198a5 in CTFontCreateWithPlatformFont () #8 0x04c21918 in gfxMacFont::gfxMacFont (this=0x1ca88020, aFontEntry=0x1ca7b0e0, aFontStyle=0x1ca5c868, aNeedsBold=0) at /builds/mozharfbuzz/gfx/thebes/src/gfxMacFont.cpp:134 #9 0x04c24e13 in MacOSFontEntry::CreateFontInstance (this=0x1ca7b0e0, aFontStyle=0x1ca5c868, aNeedsBold=0) at /builds/mozharfbuzz/gfx/thebes/src/gfxMacPlatformFontList.mm:319 #10 0x04bf7d15 in gfxFontEntry::FindOrMakeFont (this=0x1ca7b0e0, aStyle=0x1ca5c868, aNeedsBold=0) at /builds/mozharfbuzz/gfx/thebes/src/gfxFont.cpp:124 So I think if we're going to use harfbuzz for safety reasons we may have to work around relying on using CoreText, as sucky as that may be.
(In reply to comment #123) > I did some tests today with fonts whose GPOS/GSUB tables intentionally mangled. > It was a bit depressing, I can basically crash on any platform with these. This doesn't surprise me at all. > I > tried with the harfbuzz build, but even here we crash: > > #0 0x94546673 in AppendOTFeaturesFromTable () > #1 0x945468ac in AcquireFeatureTableForOTFont () > #2 0x94519d4f in TFontFeatureTable::LoadFeatureTable () > #3 0x94532d45 in TFontFeatures::TFontFeatures () > #4 0x9454b99c in TBaseFont::CopyFeatures () > #5 0x944febbd in TFont::SetExtras () > #6 0x9452fe08 in TFont::TFont () > #7 0x945198a5 in CTFontCreateWithPlatformFont () > #8 0x04c21918 in gfxMacFont::gfxMacFont (this=0x1ca88020, > aFontEntry=0x1ca7b0e0, aFontStyle=0x1ca5c868, aNeedsBold=0) at > /builds/mozharfbuzz/gfx/thebes/src/gfxMacFont.cpp:134 > #9 0x04c24e13 in MacOSFontEntry::CreateFontInstance (this=0x1ca7b0e0, > aFontStyle=0x1ca5c868, aNeedsBold=0) at > /builds/mozharfbuzz/gfx/thebes/src/gfxMacPlatformFontList.mm:319 > #10 0x04bf7d15 in gfxFontEntry::FindOrMakeFont (this=0x1ca7b0e0, > aStyle=0x1ca5c868, aNeedsBold=0) at > /builds/mozharfbuzz/gfx/thebes/src/gfxFont.cpp:124 > > So I think if we're going to use harfbuzz for safety reasons we may have to > work around relying on using CoreText, as sucky as that may be. Actually, that's not particularly sucky; I think we can rely solely on CGFont, which I'd guess is unlikely to look at the OTL tables. So then checking these will be purely up to harfbuzz - which means we can fix any issues that come up. I've got a patch locally that replaces most of our Apple Type Services API usage with CGFont (I started to look at this as ATS is deprecated in 10.6), but currently it's built on top of the metrics-reading patch in bug 532533 that you've r-minused, so it's not immediately usable. I can rework this, though, to remove all Core Text dependency from the harfbuzz path, but I don't think we should let that hold up the existing harfbuzz patches. It's a separate issue and will not affect the actual harfbuzz integration as such. How about Windows? I'm curious how the harfbuzz build fares there, as I'm guessing that GDI probably doesn't try to read the OTL tables if we don't call Uniscribe.
Blocks: 569531
(In reply to comment #124) > How about Windows? I'm curious how the harfbuzz build fares there, as I'm > guessing that GDI probably doesn't try to read the OTL tables if we don't call > Uniscribe. Tested today on Windows 7, both Uniscribe and harfbuzz appear to be fine. Rendering is goofy but nothing crashes.
(In reply to comment #119) > In gfxFontGroup::InitTextRun > > nsAutoTArray<PRPackedBool,SMALL_GLYPH_RUN> unmatchedArray; > > PRPackedBool *unmatched = NULL; (etc...) > Er, hang on, why is this code messing about aTotalLength for the > missing array? Shouldn't that be aScriptRunEnd - aScriptRunStart? Actually, the missing array is now redundant, so I've removed it. (That was a leftover workaround from the old Core Text code; with the new script- and font-run management the missing-glyph ranges no longer get passed to the shaper at all.)
The patch queue has numerous updates, including a replacement of CTFontRef-based table access with CGFontRef code. This should mean that for fonts that are only using the harfbuzz shaper, we don't need to instantiate a CTFontRef at all, and may thereby avoid vulnerabilities in the platform code. Ultimately, I'd like to eliminate use of ATSFontRef altogether, and just use CGFontRef in the Mac font entries, but that involves further work on reading font metrics activation of downloaded fonts.
(In reply to comment #127) > Ultimately, I'd like to eliminate use of ATSFontRef altogether, and just use > CGFontRef in the Mac font entries, but that involves further work on reading > font metrics activation of downloaded fonts. Not sure what "font metrics activation" is but I think I know where you're heading, getting rid of ATSFontRef's sounds fine. Webkit code uses an entirely different way of activating fonts in 10.5/10.6 without using ATS. Just to clarify, I was opposed to reading in font metrics directly to fix bug 532533 because I thought that was a big impact patch for something that's basically an Apple problem. Big changes in font metrics have a wide impact and there's lots of potential for regressions. But if problems with Apple code are more widespread then I think we have to bite the bullet and work through the regressions.
Blocks: 569770
Additional review of gfx/thebes patches, based on patch queue (http://hg.mozilla.org/users/jkew_mozilla.com/harfbuzz-mq/), revision 2845074026e8, June 1. + const PRUint32 numGroups = ReadLongAt(aBuf, OffsetNumberGroups); + NS_ENSURE_TRUE((tablelen - 16) / 12 >= numGroups, From the last review comments, MapCharToGlyphFormat12 needs comments and code above should use sizeof(). +inline PRBool +operator<(const gfxFontFeature& a, const gfxFontFeature& b) +{ + return (a.mTag < b.mTag) || ((a.mTag == b.mTag) && (a.mValue < b.mValue)); +} + Why is this needed, sorting? Looks to be unused. + // custom opentype feature settings + nsTArray<gfxFontFeature> featureSettings; + + // Language system tag, to override document language; + // this is an OpenType "language system" tag represented as a 32-bit integer + // (see http://www.microsoft.com/typography/otspec/languagetags.htm). + PRUint32 languageOverride; Neither of these properties are part of the hash but I guess that's not so terrible. + nsAutoTArray<hb_feature_t,20> features; + if (disableLigatures) { + hb_feature_t ligaOff = { HB_TAG('l','i','g','a'), 0, 0, -1 }; + hb_feature_t cligOff = { HB_TAG('c','l','i','g'), 0, 0, -1 }; + features.AppendElement(ligaOff); + features.AppendElement(cligOff); + } Default features are defined in harfbuzz? At least a comment stating that liga and clig are assumed as defaults would be nice. + gfxFloat mAdjustedSize; This field is now present in both gfxFont and gfxDWriteFont. + PR_LOG(gFontSelection, PR_LOG_DEBUG,\ + ("InitTextRun %p fontgroup %p (%s) lang: %s len %d features: %s" + "TEXTRUN \"%s\" ENDTEXTRUN\n", Add a space after "features: %s". Separate log for text runs? - // prefer to get xHeight from ATS metrics (unhinted) rather than Core Text (hinted), - // see bug 429605. if (atsMetrics.xHeight > 0) mMetrics.xHeight = atsMetrics.xHeight * size; else - mMetrics.xHeight = GetCharHeight(aCTFont, 'x'); + mMetrics.xHeight = ::CGFontGetXHeight(mCGFont) * size / upem; So CGFontGetXHeight returns unhinted values? + pref->AddObserver("gfx.font_rendering.", observer, PR_FALSE); + + PRInt32 level; + nsresult rv = pref->GetIntPref(PREF_HARFBUZZ_LEVEL, &level); + if (NS_SUCCEEDED(rv)) { + SetUseHarfBuzzLevel(level); + } Current trunk code observes "gfx.font_rendering.", so you just need to add code to gfxPlatform::FontsPrefsChanged. + nsCOMPtr<nsIPrefBranch2> pref = do_GetService(NS_PREFSERVICE_CONTRACTID); + if (pref) { + PRInt32 level; + nsresult rv = pref->GetIntPref(PREF_HARFBUZZ_LEVEL, &level); + if (NS_SUCCEEDED(rv)) { + gfxPlatformFontList::PlatformFontList()->SetUseHarfBuzzLevel(level); + gfxTextRunWordCache::Flush(); + } + } + gfxFontCache::GetCache()->AgeAllGenerations(); I bumped into this working on bug 504698, simply calling AgeAllGenerations() on the font cache only removes expired fonts, it doesn't clear out the existing hashtable so as long as a reference exists in a back cache, the old font will continue to get picked up. I added a Flush() method to both clear out the hashtable and to call AgeAllGenerations(). Patch is waiting your review. +PRInt32 +gfxUnicodeProperties::ScriptShapingLevel(PRInt32 aScriptCode) +{ + switch (aScriptCode) { + case HB_SCRIPT_LATIN: + case HB_SCRIPT_CYRILLIC: + case HB_SCRIPT_HAN: + case HB_SCRIPT_HIRAGANA: + case HB_SCRIPT_KATAKANA: + case HB_SCRIPT_COMMON: + case HB_SCRIPT_INHERITED: + case HB_SCRIPT_UNKNOWN: + return 1; /* level 1 scripts that can use the "generic" shaper */ + } + + return 2; /* all others are considered level 2 */ +} Might be nice to have an enum here rather than 1 and 2 but maybe you have plans for this? > +++ b/xpcom/glue/nsTArray.h > @@ -902,16 +902,70 @@ class nsTArray : public nsTArray_base { > > // A variation on the Sort method defined above that assumes that > // 'operator<' is defined for elem_type. > void Sort() { > Sort(nsDefaultComparator<elem_type, elem_type>()); > } > > // > + // Comparison > + // > + > + // This method compares two arrays, element by element, using the LessThan > + // method defined on the given Comparator object. It returns -1, 0 or 1, > + // according to whether this array is considered less than, equal to, or > + // greater than the other. > + template<class Comparator> > + PRInt32 Compare(const self_type& other, const Comparator& comp) const { Hmmm, what's this? Just some hacking code? In gfxHarfBuzzShaper::GetGlyphMetrics > if (mUseHintedWidths) { > metrics->x_advance = mFont->GetHintedGlyphWidth(aContext, glyph); > return; > } No need to zero the other metrics fields? In gfxHarfBuzzShaper::SetGlyphsFromRun + gfxTextRun::CompressedGlyph g; + + static const PRInt32 NO_GLYPH = -1; + nsAutoTArray<PRInt32,SMALL_GLYPH_RUN> charToGlyphArray; + if (!charToGlyphArray.SetLength(aRunLength)) + return NS_ERROR_OUT_OF_MEMORY; + + PRInt32 *charToGlyph = charToGlyphArray.Elements(); + for (PRUint32 offset = 0; offset < aRunLength; ++offset) { + charToGlyph[offset] = NO_GLYPH; + } + + for (PRInt32 g = 0; g < numGlyphs; ++g) { + PRInt32 loc = ginfo[g].cluster; + if (loc < aRunLength) { + charToGlyph[loc] = g; + } + } You're reusing the variable name 'g' here. In gfxHarfBuzzShaper::GetGlyphPositions > while (glyphInfo < limit) { > *outPos = curPos; > if (posInfo->x_offset) { > outPos->x += NS_roundf(hb2appUnits * posInfo->x_offset); > } > if (posInfo->y_offset) { > outPos->y += NS_roundf(hb2appUnits * posInfo->y_offset); > } > curPos.x += NS_roundf(hb2appUnits * posInfo->x_advance); > if (posInfo->y_advance) { > curPos.y += NS_roundf(hb2appUnits * posInfo->y_advance); > } > ++posInfo; > ++glyphInfo; > ++outPos; > } I don't think it makes much sense to be testing the offset/advances and then adding them, I doubt you're optimizing anything and it makes the code harder to read. You also don't need to involve glyphInfo in this loop, you can make the limit (outPos + hb_buffer_get_length(aBuffer)). This code points down to: limit = outPos + hb_buffer_get_length(aBuffer); while (outPos < limit) { outPos->x = curPos.x + NS_roundf(hb2appUnits * posInfo->x_offset); outPos->y = curPos.y + NS_roundf(hb2appUnits * posInfo->y_offset); curPos.x += NS_roundf(hb2appUnits * posInfo->x_advance); curPos.y += NS_roundf(hb2appUnits * posInfo->y_advance); ++posInfo; ++outPos; } For the other half of the loop: > } else { > while (glyphInfo < limit) { > hb_glyph_metrics_t metrics; > GetGlyphMetrics(aContext, glyphInfo->codepoint, &metrics); > curPos.x += NS_roundf(hb2appUnits * metrics.x_advance); > ++glyphInfo; > ++outPos; > } > } In this case the code is just calculating the total width? If so, don't need to be incrementing outPos. This code is going to call GetGlyphMetrics <buffer-length> times. Why not this? gfxFloat conv = mFont->FUnitsToDevUnitsFactor(); while (glyphInfo < limit) { glyph = (glyphInfo->codepoint < mNumLongMetrics) ? glyphInfo->codepoint : mNumLongMetrics - 1; curPos.x += FloatToFixed(conv * PRUint16(hmtx->metrics[glyph].advanceWidth)); ++glyphInfo; }
Attachment #445110 - Flags: review?(jdaggett) → review-
(In reply to comment #129) > From the last review comments, MapCharToGlyphFormat12 needs comments and code > above should use sizeof(). Oops, I'd done that but it was dropped from the queue. Restored it. > +inline PRBool > +operator<(const gfxFontFeature& a, const gfxFontFeature& b) > +{ > + return (a.mTag < b.mTag) || ((a.mTag == b.mTag) && (a.mValue < b.mValue)); > +} > + > > Why is this needed, sorting? Looks to be unused. Currently, the -moz-font-feature-settings patch (in bug 511339) puts the features into a sorted array (so that arrays can easily be compared for equality), so this exists to support the InsertElementSorted method. That implementation may change, though. > - mMetrics.xHeight = GetCharHeight(aCTFont, 'x'); > + mMetrics.xHeight = ::CGFontGetXHeight(mCGFont) * size / upem; > > So CGFontGetXHeight returns unhinted values? Yes, CGFonts are size-independent and return everything in font space (hence the multiplication by size/upem here). > +PRInt32 > +gfxUnicodeProperties::ScriptShapingLevel(PRInt32 aScriptCode) > +{ > + switch (aScriptCode) { > + case HB_SCRIPT_LATIN: > + case HB_SCRIPT_CYRILLIC: > + case HB_SCRIPT_HAN: > + case HB_SCRIPT_HIRAGANA: > + case HB_SCRIPT_KATAKANA: > + case HB_SCRIPT_COMMON: > + case HB_SCRIPT_INHERITED: > + case HB_SCRIPT_UNKNOWN: > + return 1; /* level 1 scripts that can use the "generic" shaper */ > + } > + > + return 2; /* all others are considered level 2 */ > +} > > Might be nice to have an enum here rather than 1 and 2 but maybe you have plans > for this? Yeah, this is a temporary hack; e.g. if we get Arabic shaping into HB soon, I'd probably make that "level 2" and the indic stuff, etc, would be "level 3". The idea is to make it possible to try harfbuzz on varying levels of script complexity, but the actual numbers are purely ad hoc based on what's useful for current experimentation. Once harfbuzz has more script support we may end up discarding this altogether, or exposing script-specific shaper selection in some other way. If we keep some such functionality in the long term, we might want to move it to a properties file and expose it via about:config. > > +++ b/xpcom/glue/nsTArray.h > Hmmm, what's this? Just some hacking code? No, it's part of the code from bug 511339 to store features in sorted arrays. But in the light of dbaron's comments there, I expect that will be changing... > In gfxHarfBuzzShaper::GetGlyphMetrics > > > if (mUseHintedWidths) { > > metrics->x_advance = mFont->GetHintedGlyphWidth(aContext, glyph); > > return; > > } > > No need to zero the other metrics fields? Not currently, as HB doesn't use them. This will change, but before then I'm expecting Behdad to split the callbacks into a GetGlyphMetrics() one for all the values, and a simplified (cheaper) GetGlyphAdvance() for the common case where that's the only value we need. > In gfxHarfBuzzShaper::GetGlyphPositions .... > I don't think it makes much sense to be testing the offset/advances > and then adding them, I doubt you're optimizing anything and it makes > the code harder to read. Profiling showed that the integer/floating-point conversions are the most expensive part of this loop, and so testing for the (very common) zero cases gives a win because it avoids most of those conversions altogether. I've simplified this a bit, but kept the zero-checks (in a modified form) as they do help here. > For the other half of the loop: This is actually obsolete now, and would never be used; it was left from when I was using an earlier iteration of the low-level harfbuzz code. Removed.
Part 5 (early script itemization), refreshed.
Attachment #441011 - Attachment is obsolete: true
Attachment #449251 - Flags: review?
Attachment #441011 - Flags: review?(jdaggett)
Attachment #449251 - Flags: review? → review?(jdaggett)
Part 6, the font table access and caching, including platform-specific implementations for CoreGraphics and DWrite backends.
Attachment #441453 - Attachment is obsolete: true
Attachment #441454 - Attachment is obsolete: true
Attachment #441455 - Attachment is obsolete: true
Attachment #449252 - Flags: review?(jdaggett)
Attachment #441453 - Flags: review?(jdaggett)
Attachment #441454 - Flags: review?(jdaggett)
Attachment #441455 - Flags: review?(jdaggett)
Attachment #441758 - Attachment is obsolete: true
Attachment #445110 - Attachment is obsolete: true
Attachment #449253 - Flags: review?(jdaggett)
Attachment #441758 - Flags: review?(jdaggett)
Latest harfbuzz-ng code (f2a1b411b1d48c3dfac0df8e78c848d9aa3bb047).
Attachment #441730 - Attachment is obsolete: true
Attachment #449254 - Flags: review?(jdaggett)
Attachment #441730 - Flags: review?(jdaggett)
Attachment #449251 - Flags: review?(jdaggett) → review+
Attachment #449252 - Flags: review?(jdaggett) → review+
Attachment #449253 - Flags: review?(jdaggett) → review+
Comment on attachment 449253 [details] [diff] [review] part 7 - gfxHarfBuzzShaper interface to the harfbuzz library This will need to be refreshed for changes in bug 504698 (which used some of the code in this patch), otherwise looks good.
Attachment #449251 - Flags: superreview?(roc)
Comment on attachment 449251 [details] [diff] [review] part 5 - break text into script runs before font-matching and shaping Roc, requesting sr on this patch as it could be considered an architectural change in text layout - it breaks text runs up according to Unicode scripts *before* doing font-matching and shaping, rather than leaving this to the individual shapers. This is needed in order for us to be able to choose different shaping back-ends on the fly according to the script of the text being rendered. (It may also be beneficial in the future if we want to update our font preference system to be based on script rather than charset, which I think we should do eventually.) In testing locally and on tryserver, I have not seen any regression from this change - in fact, it has some benefits even for existing shapers. On WinXP, Uniscribe performance appears to be slightly better; and on OS X 10.5, it works around a Core Text issue (bug 543353).
Attachment #449251 - Flags: superreview?(roc) → superreview+
Attachment #449254 - Attachment is patch: true
Attachment #449254 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 449253 [details] [diff] [review] part 7 - gfxHarfBuzzShaper interface to the harfbuzz library + /** + * Whether to use the harfbuzz shaper (depending on script "complexity level") + */ + PRInt8 UseHarfBuzzLevel(); The possible results here should be documented. + mozilla::AutoSwap_PRUint32 tableVersionNumber; Avoid using mozilla:: namespace qualifiers where possible. Here you can just add "using namespace mozilla" to the top of the file after the #includes.
Attachment #449253 - Flags: superreview+
(In reply to comment #137) > (From update of attachment 449253 [details] [diff] [review]) > + /** > + * Whether to use the harfbuzz shaper (depending on script "complexity > level") > + */ > + PRInt8 UseHarfBuzzLevel(); > > The possible results here should be documented. Added comments here and in gfxUnicodeProperties.cpp to explain what's going on. (This is temporary, pending more complete shaping implementations within harfbuzz - I don't see it as a long-term feature in its present form, it's primarily for testing purposes.) > + mozilla::AutoSwap_PRUint32 tableVersionNumber; > > Avoid using mozilla:: namespace qualifiers where possible. Here you can just > add "using namespace mozilla" to the top of the file after the #includes. Done. Refreshed patches are in my mercurial queue.
Attachment #441747 - Flags: superreview?(roc)
Attachment #441747 - Flags: superreview?(roc) → superreview+
Attachment #449254 - Flags: superreview?(jmuizelaar)
Attachment #449254 - Flags: review?(jdaggett)
Attachment #449254 - Flags: review+
Comment on attachment 449254 [details] [diff] [review] part 1 - harfbuzz code from upstream I've gone through the code once more and while I have some concerns, I think those can be addressed in follow-on bugs. I've logged what I think are outstanding issues. The biggest one I think is the fact that fonts are sanitized per-use as opposed to once per app run. This is logged as bug xxx. The code in hb_buffer is still not completely allocation failure tolerant but since we're Mozilla code is using "infallible" malloc, this isn't an issue. It might be wise to enforce an upper limit on the size of strings so that a long string (e.g. >32K) can't cause a memory squeeze to cause the browser to exit. One concern I still have is how harfbuzz handles funky fonts. All the callbacks provide no way to deal with "this font is fucked" type errors. Obviously these errors don't usually occur but with downloadable fonts and things like Star Wars fonts around, anything is possible. > hb_codepoint_t > hb_font_get_glyph (hb_font_t *font, hb_face_t *face, > hb_codepoint_t unicode, hb_codepoint_t variation_selector); > > hb_bool_t > hb_font_get_contour_point (hb_font_t *font, hb_face_t *face, > unsigned int point_index, > hb_codepoint_t glyph, hb_position_t *x, hb_position_t *y); > > void > hb_font_get_glyph_metrics (hb_font_t *font, hb_face_t *face, > hb_codepoint_t glyph, hb_glyph_metrics_t *metrics); > > hb_position_t > hb_font_get_kerning (hb_font_t *font, hb_face_t *face, > hb_codepoint_t first_glyph, hb_codepoint_t second_glyph); One other minor nit, it would be nice if there were comments documenting initialization rules for callback routines. For example, hb_font_get_glyph_metrics clears the metrics struct passed into the callback, so some of the code in gfxHarfBuzzShaper::GetGlyphMetrics is unneeded: > metrics->x_advance = metrics->y_advance = 0; > metrics->x_offset = metrics->y_offset = 0; > metrics->width = metrics->height = 0;
Sanitizing fonts once per app run is bug 570915.
Attachment #449254 - Flags: superreview?(jmuizelaar) → superreview?(roc)
Comment on attachment 449254 [details] [diff] [review] part 1 - harfbuzz code from upstream I don't think this really requires superreview...
Attachment #449254 - Flags: superreview?(roc) → superreview+
(In reply to comment #139) > One other minor nit, it would be nice if there were comments > documenting initialization rules for callback routines. For example, > hb_font_get_glyph_metrics clears the metrics struct passed into the > callback, so some of the code in gfxHarfBuzzShaper::GetGlyphMetrics is > unneeded: > > > metrics->x_advance = metrics->y_advance = 0; > > metrics->x_offset = metrics->y_offset = 0; > > metrics->width = metrics->height = 0; Good point. I've removed these redundant lines from gfxHarfBuzzShaper. Agree that it would be good to have more extensive documentation of the harfbuzz APIs, but as long as it's a work in progress...
Blocks: 511339
Does this mean this bug is FIXED (also: AWESOME)? Seems like tracking follow-ons in other bugs will be less confusing when the time comes to deal with regressions or any such unlikely pain.
Now that these patches have landed, can I just set gfx.font_rendering.harfbuzz.level to 1 or 2 in any build? Or do I have to pass a special option to configure?
Attached patch mingw fix (deleted) — Splinter Review
Part 7 broke compilation on mingw, where wchar_t != unsigned short. See bug 508905 for details. The attached patch fixes the problem.
Attachment #450850 - Flags: review?(jdaggett)
Blocks: 508905, 570342
I'm worried that this is really a more fundamental problem with the mingw build. The parameter to hb_buffer_add_utf16 is declared as "const uint16_t *", and the local variable (function parameter) in gfxHarfBuzzShaper::InitTextRun is declared as "const PRUnichar *", which is supposed to be a 16-bit UTF16 code unit. Looking at http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/include/prtypes.h#453, I see that PRUnichar is typedef'd to wchar_t on WIN32; if mingw has a 32-bit wchar_t, then I think this is wrong and should be changed. If PRUnichar is a 32-bit type, then simply casting pointers will give WRONG results here.
(In reply to comment #145) > Now that these patches have landed, can I just set > gfx.font_rendering.harfbuzz.level to 1 or 2 in any build? Yes, but only on Mac or Windows; it's not hooked up on Linux yet (that's bug 569770.) Setting gfx.font_rendering.harfbuzz.level will use harfbuzz for "simple" scripts (Latin, Cyrillic, CJK, etc), and setting it to 2 will use it for everything, even though we know that scripts requiring complex shaping (i.e. Arabic, Indic, etc) will not be rendered correctly at present. (As we get more complex-script support in harfbuzz, this pref setting may well evolve to support more distinctions, depending what seems useful for testing purposes.)
(In reply to comment #144) > Does this mean this bug is FIXED (also: AWESOME)? Seems like tracking > follow-ons in other bugs will be less confusing when the time comes to deal > with regressions or any such unlikely pain. Yes, I think we can mark this FIXED; there's still work to do, but we have separate bugs for the Linux implementation (bug 569770) and for turning it on by default (bug 569531), as well as for exposing font feature control to CSS (bug 511339).
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
(In reply to comment #147) > Looking at > http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/include/prtypes.h#453, > I see that PRUnichar is typedef'd to wchar_t on WIN32; if mingw has a 32-bit > wchar_t, then I think this is wrong and should be changed. If PRUnichar is a > 32-bit type, then simply casting pointers will give WRONG results here. It's not a matter of 32-bit wchar_t. mingw has 16-bit wchar_t and its values are compatible. The problem is that it's different builtin type, so an explicit cast is needed. It would be needed on MSC as well if it didn't use -Zc:wchar_t-. There were more similar problems in the past, see: http://hg.mozilla.org/mozilla-central/rev/c97c0621f626 for an example.
(In reply to comment #150) > (In reply to comment #147) > > Looking at > > http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/include/prtypes.h#453, > > I see that PRUnichar is typedef'd to wchar_t on WIN32; if mingw has a 32-bit > > wchar_t, then I think this is wrong and should be changed. If PRUnichar is a > > 32-bit type, then simply casting pointers will give WRONG results here. > > It's not a matter of 32-bit wchar_t. mingw has 16-bit wchar_t and its values > are compatible. The problem is that it's different builtin type, so an explicit > cast is needed. Oh yes, I've seen that previously, but was forgetting the details and jumped to the wrong conclusion - sorry!
Blocks: 571652
This introduced a bug/compile error in the FT2 backend: in FontEntry::ReadCMAP: 1.29 - return gfxFontUtils::ReadCMAP(buf, len, mCharacterMap, mUVSOffset, 1.30 - unicodeFont, symbolFont); 1.31 + nsresult rv = gfxFontUtils::ReadCMAP(buf, len, mCharacterMap, mUVSOffset, 1.32 + unicodeFont, symbolFont); 1.33 + mHasCmapTable = NS_SUCCEEDED(rv); 1.34 } This is missing a "return rv;" here, otherwise the function returns no value.
Attachment #450997 - Flags: review?(vladimir)
Comment on attachment 450997 [details] [diff] [review] patch to fix up FT2 ReadCMAP omission Pushed to trunk http://hg.mozilla.org/mozilla-central/rev/84c1cc66b411
Attachment #450997 - Flags: review?(vladimir) → review+
Comment on attachment 450850 [details] [diff] [review] mingw fix Better for Jonathan to review this I think.
Attachment #450850 - Flags: review?(jdaggett) → review?(jfkthame)
Comment on attachment 450850 [details] [diff] [review] mingw fix This is fine, to fix the issue here. (Do we have a static assertion anywhere to verify that PRUnichar is in fact a 16-bit type? I'm sure this isn't the only place where our code makes this assumption, but I don't recall seeing anywhere we actually check it at build time. Perhaps we should...)
Attachment #450850 - Flags: review?(jfkthame) → review+
(In reply to comment #156) > (From update of attachment 450850 [details] [diff] [review]) > This is fine, to fix the issue here. Thanks for review. > (Do we have a static assertion anywhere to verify that PRUnichar is in fact a > 16-bit type? I'm sure this isn't the only place where our code makes this > assumption, but I don't recall seeing anywhere we actually check it at build > time. Perhaps we should...) I think that the assumption PRUnichar = utf16 char = 16 bits is safe enough that we don't need such assertions.
Keywords: checkin-needed
Whiteboard: checkin: comment 146
(In reply to comment #156) > (From update of attachment 450850 [details] [diff] [review]) > This is fine, to fix the issue here. > > (Do we have a static assertion anywhere to verify that PRUnichar is in fact a > 16-bit type? I'm sure this isn't the only place where our code makes this > assumption, but I don't recall seeing anywhere we actually check it at build > time. Perhaps we should...) We do, when HAVE_CPP_2BYTE_WCHAR_T: http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsStringAPI.h#1045 I believe wchar_t is not used when that's not set (a short is used instead), but my experience has mostly been non-libxul things.
Target Milestone: --- → mozilla1.9.3a6
Version: unspecified → Trunk
Depends on: 572620
Keywords: checkin-needed
Whiteboard: checkin: comment 146
Other bugs cover the doc-facing issues here.
Keywords: dev-doc-needed
Depends on: 609604
Blocks: 463413
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: