Closed Bug 1487212. Opened 6 years ago; closed 5 years ago.

Can we share hyphenation data across processes?

Tracking

Status: RESOLVED FIXED

Milestone: mozilla72

Tracking flags: firefox72 (tracking: ---, status: fixed)

People: Reporter: bzbarsky; Assignee: jfkthame

References: Blocks 1 open bug

Details: Whiteboard: [overhead:655K]

Attachments (1 file):

Bug 1487212 - When hyphenation resources are compressed in omnijar, load them into shared memory and share among all content processes. r=heycam (Jonathan Kew [:jfkthame], 5 years ago; deleted; text/x-phabricator-request)

Description: Boris Zbarsky [:bzbarsky] (Reporter), 6 years ago

Looking at a DMD report for a content process, I have this:

Unreported {
  1 block in heap block record 31 of 6,988
  655,360 bytes (655,360 requested / 0 slop)
  0.14% of the heap (25.97% cumulative)
  0.37% of unreported (70.56% cumulative)
  Allocated at {
    #01: replace_realloc(void*, unsigned long) (DMD.cpp:1307, in libmozglue.dylib)
    #02: moz_xrealloc (mozalloc.cpp:94, in libmozglue.dylib)
    #03: hnj_get_state (hyphen.c:194, in XUL)
    #04: hnj_hyphen_load_line (hyphen.c:370, in XUL)
    #05: hnj_hyphen_load_file (hyphen.c:441, in XUL)
    #06: hnj_hyphen_load (hyphen.c:384, in XUL)
    #07: nsHyphenationManager::GetHyphenator(nsAtom*) (nsHyphenator.cpp:22, in XUL)
    #08: nsHyphenationManager::GetHyphenator(nsAtom*) (nsHyphenationManager.cpp:119, in XUL)
  }
}

If I read that right, we fopen and read the hyphenation file and end up allocating a 655,360-byte buffer for the hyphenation states. Each state looks like 40 bytes on 64-bit (though it seems we could get it down to 32 bytes with better packing), so that array has room for 16384 HyphenStates, and presumably more than 8192 of them are in use. Anyway, this seems like data that would be nice to share across processes if possible.
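A quick way to check the arithmetic above is to model the state record with Python's `ctypes`, which lays out structure fields using the platform's native C alignment rules. The field list is our assumption about libhyphen's `_HyphenState` and `_HyphenTrans` in `hyphen.h` (two string pointers, two signed chars, an int fallback state, an int transition count, and a pointer to the transition array), not a copy of that header; the class and function names are hypothetical.

```python
import ctypes


class HyphenTransModel(ctypes.Structure):
    # Assumed layout: one input byte plus an int target-state index.
    _fields_ = [
        ("ch", ctypes.c_char),
        ("new_state", ctypes.c_int),
    ]


class HyphenStateModel(ctypes.Structure):
    # Assumed layout approximating libhyphen's _HyphenState.
    _fields_ = [
        ("match", ctypes.c_char_p),
        ("repl", ctypes.c_char_p),
        ("replindex", ctypes.c_byte),
        ("replcut", ctypes.c_byte),
        ("fallback_state", ctypes.c_int),
        ("num_trans", ctypes.c_int),
        ("trans", ctypes.POINTER(HyphenTransModel)),
    ]


BLOCK_BYTES = 655_360  # size of the single block in the DMD report


def state_capacity(block_bytes: int, state_size: int) -> int:
    """Number of whole state records that fit in one array allocation."""
    return block_bytes // state_size


if __name__ == "__main__":
    size = ctypes.sizeof(HyphenStateModel)
    print("sizeof(HyphenStateModel) =", size)
    print("states that fit in the block =", state_capacity(BLOCK_BYTES, size))
```

On a 64-bit build this model comes out at 40 bytes (8 + 8 + 1 + 1 + 2 padding + 4 + 4 + 4 padding + 8), so the 655,360-byte block holds exactly 16384 states; the same model gives 24 bytes on a 32-bit build.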

Comment 1: Xidorn Quan [:xidorn] UTC+11, 6 years ago

We may have to allocate the data in shared memory... I don't know how feasible that is. Or we could eagerly load it before forking, into pages that would then be shared?

Component: Layout: Text and Fonts → Internationalization

Comment 2: Kris Maglione [:kmag], 6 years ago

(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #1)
> We may have to allocate the data in a shared memory... I don't know how
> feasible is it. Or we can eagerly load it before forking in a page supposed
> to be shared?

The only place we're sure we'll be able to fork content processes at all in the future is desktop Linux. Possibly also OS-X. Definitely not Windows. And we don't do it anywhere now. So we'll need to explicitly allocate it in shared memory.
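Explicit sharing without fork can be sketched with Python's `multiprocessing.shared_memory`, used here only as a stand-in for Gecko's own shared-memory facilities (the helper names are hypothetical): one process copies the data into a named segment once, and any other process attaches to it by name. For brevity the attach happens in the same process; a real content process would receive the segment name or handle over IPC.

```python
from multiprocessing import shared_memory


def publish(data: bytes) -> shared_memory.SharedMemory:
    """Parent side: copy data into a new named shared-memory segment, once."""
    shm = shared_memory.SharedMemory(create=True, size=max(1, len(data)))
    shm.buf[: len(data)] = data
    return shm


def attach_and_read(name: str, length: int) -> bytes:
    """Child side: attach to an existing segment by name, as a separate process would."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return bytes(shm.buf[:length])
    finally:
        shm.close()


if __name__ == "__main__":
    table = b"hypothetical hyphenation table bytes"
    owner = publish(table)
    try:
        print(attach_and_read(owner.name, len(table)))
    finally:
        owner.close()
        owner.unlink()  # the creator removes the segment when nobody needs it
```

The length travels alongside the name because some platforms round the segment size up to a page boundary, so the segment's own size may exceed the data.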

Update: Eric Rahm [:erahm], 6 years ago

Whiteboard: [overhead:655K]

Comment 3: Jonathan Kew [:jfkthame] (Assignee), 6 years ago

Using shared memory here seems like it'll be difficult, unless we're prepared to fork libhyphen and re-write its runtime data structures to not rely on a bunch of structs that have pointers to each other. Note that we don't load hyphenation data for any given language until a page tries to use it (i.e. we encounter content with hyphens:auto and lang=...), so the memory usage here is highly dependent on the site that's loaded. In the extreme case where a site applies hyphenation to a number of different languages, it might be considerably higher; but in the (common?) case where hyphens:auto isn't used, this shouldn't show up at all.
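The on-demand loading described above can be sketched as a per-language cache. This is a hypothetical illustration (the class and method names only loosely echo `nsHyphenationManager::GetHyphenator`), showing why memory use depends on which languages a page actually hyphenates, and why each content process pays separately when nothing is shared.

```python
from typing import Callable, Dict, List


class HyphenationManager:
    """Hypothetical sketch: load per-language data on first use, then reuse it."""

    def __init__(self, loader: Callable[[str], bytes]) -> None:
        self._loader = loader
        self._cache: Dict[str, bytes] = {}
        self.load_count = 0

    def get_hyphenator(self, lang: str) -> bytes:
        if lang not in self._cache:
            # Only the first hyphens:auto content in this language pays the load cost.
            self._cache[lang] = self._loader(lang)
            self.load_count += 1
        return self._cache[lang]

    def loaded_languages(self) -> List[str]:
        return sorted(self._cache)


if __name__ == "__main__":
    mgr = HyphenationManager(lambda lang: f"patterns for {lang}".encode())
    for lang in ["en-US", "de", "en-US"]:
        mgr.get_hyphenator(lang)
    print(mgr.loaded_languages(), mgr.load_count)
```

A page that never uses `hyphens: auto` never triggers a load at all; a page that hyphenates several languages pays once per language in every process that renders it.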

Comment 4: Jonathan Kew [:jfkthame] (Assignee), 6 years ago

FWIW, the total memory used by hyphenation data (when loaded) will be considerably more than just the large block used for the array of HyphenStates: each state has pointers to separately-malloc'd match and repl strings, and to an array of HyphenTrans records (again, separately malloc'd). So there are potentially thousands more small malloc'd objects (and associated slop and overhead!) hanging off the array of states. This could certainly be done in a more memory-efficient way (and potentially in a cross-process-sharable way), but unfortunately I think this would be a pretty intrusive modification of the libhyphen code from upstream. :\
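Comments 3 and 4 name the two obstacles: thousands of small heap objects, and pointers that only mean something in the process that created them. A common remedy, sketched here under our own assumptions (this is not libhyphen's or mapped_hyph's actual format), is to flatten the graph into one contiguous buffer that uses array indices and byte offsets instead of pointers. The record layouts (`HEADER`, `STATE`, `TRANS`) and all helper names are hypothetical, and the lookup only follows a single transition rather than running the full hyphenation automaton.

```python
import struct
from typing import List, Optional, Tuple

NONE = 0xFFFFFFFF                  # sentinel for "no string" / "no state"
HEADER = struct.Struct("<III")     # num_states, num_trans, pool_offset
STATE = struct.Struct("<IIIII")    # match_off, repl_off, fallback, first_trans, num_trans
TRANS = struct.Struct("<BI")       # input byte, target state index


class State:
    """Pointer-style state, as a loader might build it (hypothetical shape)."""

    def __init__(self, match: Optional[bytes], repl: Optional[bytes],
                 fallback: int, trans: List[Tuple[int, int]]) -> None:
        self.match = match
        self.repl = repl
        self.fallback = fallback
        self.trans = trans


def flatten(states: List[State]) -> bytes:
    """Serialize states into one buffer that uses indices and offsets, not pointers."""
    pool = bytearray()

    def add_string(s: Optional[bytes]) -> int:
        if s is None:
            return NONE
        offset = len(pool)
        pool.extend(s + b"\0")
        return offset

    state_recs: List[bytes] = []
    trans_recs: List[bytes] = []
    for st in states:
        state_recs.append(STATE.pack(add_string(st.match), add_string(st.repl),
                                     st.fallback, len(trans_recs), len(st.trans)))
        trans_recs.extend(TRANS.pack(ch, target) for ch, target in st.trans)
    pool_offset = HEADER.size + STATE.size * len(states) + TRANS.size * len(trans_recs)
    return (HEADER.pack(len(states), len(trans_recs), pool_offset)
            + b"".join(state_recs) + b"".join(trans_recs) + bytes(pool))


def _state(buf: bytes, index: int) -> Tuple[int, ...]:
    return STATE.unpack_from(buf, HEADER.size + STATE.size * index)


def lookup_transition(buf: bytes, state: int, ch: int) -> Optional[int]:
    """Target state for input byte ch, or None if this state has no such transition."""
    num_states = HEADER.unpack_from(buf, 0)[0]
    _, _, _, first, count = _state(buf, state)
    trans_base = HEADER.size + STATE.size * num_states
    for i in range(first, first + count):
        c, target = TRANS.unpack_from(buf, trans_base + TRANS.size * i)
        if c == ch:
            return target
    return None


def match_of(buf: bytes, state: int) -> Optional[bytes]:
    """The state's match string, read straight out of the string pool."""
    pool_offset = HEADER.unpack_from(buf, 0)[2]
    match_off = _state(buf, state)[0]
    if match_off == NONE:
        return None
    start = pool_offset + match_off
    return buf[start:buf.index(b"\0", start)]


if __name__ == "__main__":
    blob = flatten([State(None, None, NONE, [(ord("a"), 1)]),
                    State(b"a1", None, 0, [])])
    print(len(blob), lookup_transition(blob, 0, ord("a")), match_of(blob, 1))
```

Because every cross-reference is an index or an offset from the start of the buffer, the same bytes stay valid wherever they are mapped in each process, and the many small allocations collapse into one.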

Comment 5: Xidorn Quan [:xidorn] UTC+11, 6 years ago

Another approach might be to put hyphenation into the font server, if we are going to have one? I'm not sure how that would work, though.

Update: Makoto Kato [:m_kato], 6 years ago

Priority: -- → P3

Comment 6: Nathan Froyd [:froydnj], 6 years ago

One decent start would be reducing the space required for a state index:

https://searchfox.org/mozilla-central/source/intl/hyphenation/hyphen/hyphen.h#88-89
https://searchfox.org/mozilla-central/source/intl/hyphenation/hyphen/hyphen.h#95

Looking at the hyphenation files we have, the most states in any one file is 113772 (intl/locales/hu/hyphenation/hyph_hu.dic), so we only need 17 bits to store a state. (en-US has 15618 states, FWIW.) Storing a state number in 24 bits would let _HyphenState in the first link above be represented in 20/32 bytes on a 32/64-bit system, down from 24/40. That would provide the easy win bz mentions.

We'd also win by shrinking _HyphenTrans to 4/4 bytes (down from 8/8), which might provide a little improvement. There are just as many _HyphenTrans objects floating around as _HyphenState objects, though they're not contiguously allocated, as jfkthame notes. I don't know whether upstream would accept the changes necessary to shrink those members. But this doesn't address cross-process sharing issues...
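The numbers in this comment can be reproduced with a small sketch. `bits_needed` gives the 17-bit figure for the largest (Hungarian) table, the 24-bit helpers show one way to store a state index in three bytes, and the two `ctypes` structures are one hypothetical packed layout (our guess, not a proposed patch; it packs both the fallback state and the transition count into 24 bits) that yields the 20/32-byte `_HyphenState` and 4-byte `_HyphenTrans` sizes.

```python
import ctypes


def bits_needed(num_states: int) -> int:
    """Bits required to index states 0 .. num_states - 1."""
    return max(1, (num_states - 1).bit_length())


def pack_u24(value: int) -> bytes:
    """Store a state index in three little-endian bytes."""
    if not 0 <= value < 1 << 24:
        raise ValueError("state index does not fit in 24 bits")
    return value.to_bytes(3, "little")


def unpack_u24(raw: bytes) -> int:
    return int.from_bytes(raw, "little")


class PackedTrans(ctypes.Structure):
    # Hypothetical packed layout: 1-byte input char + 3-byte target state.
    _fields_ = [("ch", ctypes.c_char), ("new_state", ctypes.c_ubyte * 3)]


class PackedState(ctypes.Structure):
    # Hypothetical packed layout: 24-bit fallback state and 24-bit transition count.
    _fields_ = [
        ("match", ctypes.c_char_p),
        ("repl", ctypes.c_char_p),
        ("replindex", ctypes.c_byte),
        ("replcut", ctypes.c_byte),
        ("fallback_state", ctypes.c_ubyte * 3),
        ("num_trans", ctypes.c_ubyte * 3),
        ("trans", ctypes.POINTER(PackedTrans)),
    ]


if __name__ == "__main__":
    print(bits_needed(113772), bits_needed(15618))
    print(ctypes.sizeof(PackedState), ctypes.sizeof(PackedTrans))
    print(unpack_u24(pack_u24(113771)))
```

The byte arrays have an alignment of 1, so the eight bytes of small fields sit between the pointers without padding, which is what brings the 64-bit state down to 32 bytes.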

Update: Jonathan Kew [:jfkthame] (Assignee), 5 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1567437

Comment 7: Jonathan Kew [:jfkthame] (Assignee), 5 years ago

Bug 1567437 comment 2 gives examples of the total memory footprint of loading hyphenation patterns.

Update: Jonathan Kew [:jfkthame] (Assignee), 5 years ago

Depends on: 1590167

Comment 8: Jonathan Kew [:jfkthame] (Assignee), 5 years ago

Attached file: Bug 1487212 - When hyphenation resources are compressed in omnijar, load them into shared memory and share among all content processes. r=heycam (deleted)

Comment 9: Jonathan Kew [:jfkthame] (Assignee), 5 years ago

Bug 1590167 makes this no longer an issue for desktop Firefox, as the hyphenation resources are stored uncompressed in the omnijar (which is mapped into memory already) and the new mapped_hyph library uses the resources directly from there.
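Why stored (uncompressed) entries are cheap can be sketched with Python's `zipfile` and `mmap`: for an entry stored without compression, the bytes inside the archive are the resource itself, so a view into the mapped archive can be used in place. This is an illustration under our own assumptions, not how Gecko or `mapped_hyph` actually locate resources; the archive and entry names are hypothetical. The data offset follows the ZIP local file header (30 fixed bytes, then the file name and extra field).

```python
import mmap
import os
import struct
import tempfile
import zipfile
from typing import Tuple

# ZIP local file header: signature, version, flags, method, time, date,
# crc32, compressed size, uncompressed size, name length, extra length (30 bytes).
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIG = 0x04034B50


def map_stored_entry(archive_path: str, name: str) -> Tuple[mmap.mmap, memoryview]:
    """Map the archive read-only and return a zero-copy view of one STORED entry.

    The caller must release the view before closing the mmap.
    """
    with zipfile.ZipFile(archive_path) as zf:
        info = zf.getinfo(name)
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(f"{name} is compressed and would have to be inflated first")
    with open(archive_path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)  # outlives the file object
    fields = LOCAL_HEADER.unpack_from(mm, info.header_offset)
    if fields[0] != LOCAL_SIG:
        mm.close()
        raise ValueError("bad ZIP local file header")
    start = info.header_offset + LOCAL_HEADER.size + fields[9] + fields[10]
    return mm, memoryview(mm)[start:start + info.file_size]


if __name__ == "__main__":
    payload = b"DEMO" + bytes(range(16))  # stand-in bytes, not a real hyphenation file
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "omni.ja")
        with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_STORED) as zf:
            zf.writestr("hyph/example.hyf", payload)
        mm, view = map_stored_entry(path, "hyph/example.hyf")
        print(bytes(view[:4]), len(view))
        view.release()
        mm.close()
```

Read-only file mappings like this are backed by the page cache, so several processes mapping the same archive can share the physical pages without any explicit shared-memory setup.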

On Android (GeckoView), the omnijar is compressed, so in order to use a hyphenation table it must first be uncompressed. The RAM footprint of this is much smaller with mapped_hyph than it was with libhyphen, but may be as much as a megabyte for the largest hyphenation resources. Therefore, using shared memory to share a single uncompressed copy across all processes will still be beneficial in a multi-content-process world.
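For the compressed-omnijar case, the approach in the patch title can be sketched with Python's `multiprocessing.shared_memory`, again only as a stand-in for Gecko's shared-memory machinery: the parent inflates the entry once into a named segment, and each content process attaches and reads the table in place instead of inflating a private copy. Function names and the archive layout are hypothetical.

```python
import os
import tempfile
import zipfile
from multiprocessing import shared_memory
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def inflate_into_shared_memory(archive_path: str,
                               name: str) -> Tuple[shared_memory.SharedMemory, int]:
    """Parent side: decompress one entry a single time into a named shared segment."""
    with zipfile.ZipFile(archive_path) as zf:
        data = zf.read(name)  # the only inflation of this resource
    shm = shared_memory.SharedMemory(create=True, size=max(1, len(data)))
    shm.buf[: len(data)] = data
    return shm, len(data)


def with_shared_table(segment_name: str, length: int,
                      use: Callable[[memoryview], T]) -> T:
    """Content-process side: attach by name and use the table in place, without copying."""
    shm = shared_memory.SharedMemory(name=segment_name)
    try:
        view = shm.buf[:length]
        try:
            return use(view)
        finally:
            view.release()  # must happen before the segment is closed
    finally:
        shm.close()


if __name__ == "__main__":
    table = b"hypothetical hyphenation table " * 1000
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "omni.ja")
        with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.writestr("hyph/example.hyf", table)
        shm, length = inflate_into_shared_memory(path, "hyph/example.hyf")
        try:
            # Three simulated content processes share the one inflated copy.
            print([with_shared_table(shm.name, length, len) for _ in range(3)])
        finally:
            shm.close()
            shm.unlink()
```

Inflation happens once however many content processes attach; each attachment costs address space, not another private copy of the decompressed table.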

Comment 10: Pulsebot, 5 years ago

Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d519e5920a23
When hyphenation resources are compressed in omnijar, load them into shared memory and share among all content processes. r=heycam,froydnj

Comment 11: Arthur Iakab [arthur_iakab], 5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/d519e5920a23

Status: NEW → RESOLVED

Closed: 5 years ago

status-firefox72: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla72

Update: BugBot [:suhaib / :marco/ :calixte], 5 years ago

Assignee: nobody → jfkthame

Update: Alice0775 White, 3 years ago

Regressions: 1751840
