Open Bug 67715 Opened 24 years ago Updated 2 years ago

layout of full-justified text is sub-optimal

Categories

(Core :: Layout: Text and Fonts, enhancement, P5)

enhancement

Tracking

()

Future

People

(Reporter: jmcbray, Unassigned)

References

()

Details

(Keywords: helpwanted)

(full-) justified text in Mozilla is not as good as it could be -- there are often wide spaces within a line, even for reasonably wide columns. Is there some fundamental reason that Mozilla can't use a multiline H&J (hyphenation and justification) algorithm, like the one in TeX?
Apart from the fact that hyphenation is very language-specific while Mozilla does not do (and cannot easily do) language detection on the page source? (Note that this is not the same as charset detection, since different languages often use the same charset.) This would be very nice to have, though difficult. Setting status to NEW.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Dang, I thought I was going to be the one who was going to file this RFE eventually. :-)

We can do hyphenation where the author has included the &shy; (soft hyphen) character, but smart justification can be done independently of that. So when a justified block element has completely loaded, something like the following should occur:

    finishedJustifying = false;
    while (not finishedJustifying) {
        finishedJustifying = true;
        worstLine = number of line in block element which has the widest spaces;
        if ( (worstLine is not the first line of the block element)
             and (last word, or soft-hyphenated part-word, of line(worstLine - 1) can fit on line(worstLine))
             and (moving this word would not make line(worstLine - 1) worse than worstLine is currently)
             /* though ideally you'd probably want to recurse right back to the first line of the block element */ ) {
            shift the last word of line(worstLine - 1) to the beginning of worstLine;
            finishedJustifying = false;
        }
    }

That's probably not the perfect algorithm, but it would be a start. (The perfect algorithm probably takes time which increases exponentially with the number of lines in the block element, or something horrid like that.)

Now Ian is going to come along and tell us all this is impossible because it can't be done with CSS, and things which can't be done with CSS are sinful.
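The shift-a-word-down heuristic in the comment above can be sketched in Python. This is a minimal sketch under assumptions that are not in the original: words are measured by character count, a normal space is 1 unit wide, soft hyphens are ignored, the last line of the block is set ragged (as with text-align: justify), and a one-word line is treated as having a single gap. The names greedy_lines, gap_width and shift_words_down are hypothetical.

```python
def natural_length(line, space=1):
    """Width of a line set with normal, unstretched spaces."""
    return sum(len(w) for w in line) + space * (len(line) - 1)


def gap_width(line, width):
    """Width of each space once the line is stretched to `width`.

    Assumption: a one-word line counts as having a single gap.
    """
    gaps = max(len(line) - 1, 1)
    return (width - sum(len(w) for w in line)) / gaps


def greedy_lines(words, width, space=1):
    """First-fit breaking: put as many words on each line as will fit."""
    lines, current = [], []
    for w in words:
        if current and natural_length(current + [w], space) > width:
            lines.append(current)
            current = []
        current.append(w)
    if current:
        lines.append(current)
    return lines


def shift_words_down(lines, width, space=1):
    """Repeatedly move the last word of the line above into the worst line."""
    lines = [list(line) for line in lines]  # don't mutate the caller's lines
    while True:
        justified = range(len(lines) - 1)  # the last line is not stretched
        if not justified:
            return lines
        worst = max(justified, key=lambda i: gap_width(lines[i], width))
        if worst == 0 or len(lines[worst - 1]) < 2:
            return lines  # no line above, or it would be left empty
        prev, cur = lines[worst - 1], lines[worst]
        moved, shortened = [prev[-1]] + cur, prev[:-1]
        if (natural_length(moved, space) <= width
                and gap_width(shortened, width) <= gap_width(cur, width)):
            lines[worst - 1], lines[worst] = shortened, moved
        else:
            return lines


words = ["aaa", "bb", "cc", "dd", "e", "f", "g" * 10]
before = greedy_lines(words, 12)      # [['aaa','bb','cc','dd'], ['e','f'], ['gggggggggg']]
after = shift_words_down(before, 12)  # [['aaa','bb','cc'], ['dd','e','f'], ['gggggggggg']]
```

Here the widest gap drops from 10 units to 4. Because words only ever move to a later line, each break point can only move earlier, so the loop always terminates; it is still a local heuristic, not an optimal layout.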
Reassigning to Buster.
Assignee: karnaze → buster
I'd love to see something like this, but we (Netscape) don't have the resources for it. Any takers?
Keywords: helpwanted
Priority: -- → P5
Target Milestone: --- → Future
I believe scc has an even better algorithm, and since he has experience with word processors, he'd be the guy to talk to about this. mpt: It isn't against CSS; you can use whatever text justification algorithm you want. However, I would have thought that it would be better to have a suboptimal text justification algorithm than one that causes jumpiness while the page is loading or is otherwise incrementally reflowed. (Note: Moving the mouse over the page can cause an incremental reflow.) In other words, I would have thought you'd be the one against this, not me...
It may be that smart justification should only be used when printing -- it would help counter the 'printed HTML sucks' attitude, which causes so much useful information that might otherwise be on the Web to be locked up as PDF or PS files instead of HTML (on the grounds that PDF or PS looks better when printed). (Note: If a mouseover can cause an incremental reflow, the W3C should be thoroughly ashamed of itself.)
TeX's layout and hyphenation algorithms probably produce the best-looking output; however, they are a great deal of work to implement, and the entire paragraph can re-wrap when a single word is added at the end (e.g., to better balance white space). I'll happily point you to the appropriate doc, if you so desire.
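To see why a whole paragraph can re-wrap, here is a minimal Python sketch of paragraph-at-a-time line breaking by dynamic programming. It is a much-simplified relative of TeX's approach, not TeX's algorithm: it breaks only at spaces, measures words by character count, never shrinks spaces, ignores hyphenation and penalties, and uses the squared unused width of each line (except the last) as its cost, all of which are assumptions. The function names are hypothetical.

```python
import math


def optimal_breaks(words, width, space=1):
    """Split `words` into lines minimizing the total squared slack.

    The final line is set ragged and costs nothing; a word wider than
    `width` is allowed a line of its own.
    """
    n = len(words)
    best = [0.0] * (n + 1)  # best[i]: cheapest cost of setting words[i:]
    nxt = [n] * (n + 1)     # nxt[i]: index just past the first line of words[i:]
    for i in range(n - 1, -1, -1):
        best[i] = math.inf
        length = -space
        for j in range(i + 1, n + 1):
            length += space + len(words[j - 1])
            if length > width and j > i + 1:
                break
            cost = 0.0 if j == n else (width - length) ** 2
            if cost + best[j] < best[i]:
                best[i], nxt[i] = cost + best[j], j
    lines, i = [], 0
    while i < n:
        lines.append(words[i:nxt[i]])
        i = nxt[i]
    return lines


def greedy_breaks(words, width, space=1):
    """First-fit breaking, for comparison."""
    lines, current, length = [], [], 0
    for w in words:
        if current and length + space + len(w) > width:
            lines.append(current)
            current, length = [], 0
        length = len(w) if not current else length + space + len(w)
        current.append(w)
    if current:
        lines.append(current)
    return lines


short = ["aaa", "bb", "cc"]
longer = short + ["ddddd"]
print(optimal_breaks(short, 6))   # [['aaa', 'bb'], ['cc']]
print(optimal_breaks(longer, 6))  # [['aaa'], ['bb', 'cc'], ['ddddd']]
print(greedy_breaks(longer, 6))   # [['aaa', 'bb'], ['cc'], ['ddddd']]
```

Appending "ddddd" moves the very first break, whereas first-fit never revisits earlier lines. The nested loop is at most quadratic in the number of words, so paragraph-wide optimization of this kind is not exponential; the cost is in re-running it over the whole paragraph on every change.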
Please set the URL to it; even if we don't implement it, I'd like to read it.
Heh. Should have guessed Hermann Zapf would be behind this somehow.
I have text that simply doesn't justify at some browser window sizes -- it reverts to left-aligned... not nice at all. I'm using CSS on a plain paragraph not nested inside anything at all, although there are <br>s within the paragraph. This is on an FAQ page, and most of the other paragraphs with the same style behave just fine. Bizarre. Grant
Grant, that should be filed as a separate bug. This RFE is something else entirely.
Build reassigning Buster's bugs to Marc.
Assignee: buster → attinasi
I've looked a bit at descriptions of the TeX algorithm and read over the CSS specs, and here's a brief review of the subject:

CSS appears to separate hyphenation from justification. CSS 2.1 describes justification as solely a matter of adjusting the spacing within each line; there's really not much of anything in there about hyphenation, but the current CSS 3 Text CR provides a "word-break-inside" property which must be explicitly set to "hyphenate" to invoke language-specific hyphenation from UAs. For the time being, then, hyphenation is out of the picture, which is fine given the likely performance and debugging costs of coming up with hyphenation dictionaries. Note also that CSS limits stretching/compression of inter-word spaces to text formatted with text-align: justify, so at least initially, the introduction of a more sophisticated justification algorithm is likely to have minimal impact on real Web pages.

The TeX algorithm operates on a per-paragraph basis. It assigns each line a degree of "badness" depending on how much it needs to be stretched/compressed at spaces to fit the desired width; this also accommodates control over how much spaces can be stretched in a given line, e.g., not at all when CSS "word-spacing" is set. A penalty is also attached to the different points in the line where it can be broken (spaces, hyphens, soft hyphens, etc.). The TeX algorithm lays out the paragraph so as to minimize the least-squares sum of line-breaking penalties and stretching badness.

This leads to two major issues, namely, incremental reflow at the paragraph level and the effects of triggering incremental reflow after initial layout (e.g., by :hover, as mentioned above). How "jumpy" we'd look if we did incremental reflow paragraph-by-paragraph rather than line-by-line is a question the layout gurus will have to answer.
The second issue is a little trickier: if CSS sets a larger font-size on :hover, the line will expand (although, technically, applying styles that would cause reflow on :hover is optional per CSS), and the lines below it in the block may be reflowed. Using the TeX algorithm, it appears that the entire contents of the block would have to be reflowed, instead of just the lines below.

So using TeX algorithms (sans dictionary-based hyphenation) is not inherently impractical, but the tradeoffs (block-based reflow, probable perf costs, some additional reflow on :hover) would have to be carefully weighed against the gains (improved readability and appearance of justified text).

As the original URL has rotted, I've replaced it with a link to Hàn Thế Thành's thesis, which describes the Knuth hyphenation & justification algorithm as a prelude to discussion of the Zapf techniques, which build on the H&J algorithm to achieve a more uniform text density and other desirable typographic qualities.
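The per-line "badness" and "penalty" described above can be sketched in Python roughly as TeX defines them: badness is about 100 r^3, capped at 10000, where r is the fraction of the line's total stretch (or shrink) that must be used, and a line's demerits are (l + b)^2 plus or minus the square of the break's penalty p, with l being TeX's \linepenalty (10 by default). This sketch leaves out TeX's extra demerits for consecutive hyphenated lines and for adjacent lines of very different tightness; treating text with a fixed CSS word-spacing as having zero stretch is an assumption for illustration. The function names are hypothetical.

```python
import math

INF_BAD = 10000  # TeX's cap for badness ("infinitely bad")


def badness(natural, stretch, shrink, width):
    """Approximate TeX badness of setting a line of `natural` width at `width`.

    `stretch`/`shrink` are the total stretchability/shrinkability of the
    line's spaces. Returns math.inf if the line would be overfull.
    """
    if natural == width:
        return 0
    if natural < width:
        if stretch <= 0:
            return INF_BAD  # spaces cannot grow at all (e.g. fixed word-spacing)
        ratio = (width - natural) / stretch
    else:
        if natural - width > shrink:
            return math.inf  # cannot shrink enough: not a feasible break
        ratio = (natural - width) / shrink
    return min(INF_BAD, round(100 * ratio ** 3))


def demerits(bad, penalty=0, line_penalty=10):
    """Demerits for one line, ignoring hyphenation and fitness extras."""
    if penalty >= INF_BAD:
        return math.inf  # a forbidden break
    base = (line_penalty + bad) ** 2
    if penalty >= 0:
        return base + penalty ** 2
    if penalty > -INF_BAD:
        return base - penalty ** 2
    return base  # a forced break


print(badness(90, 10, 5, 100))  # 100: the line uses all of its stretch
print(badness(80, 10, 5, 100))  # 800: it needs twice its stretch
```

Summing demerits over every line of a candidate layout and keeping the layout with the smallest total is the paragraph-wide minimization described above, which is why a change anywhere in the paragraph can move any of its breaks.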
Note comment 6 (about printing); :hover is a non-issue there, as are incremental reflows and so forth. And printing is really where this would come in handy.
Depends on: 253317
(In reply to comment #1)
> Apart from the fact that hyphenation is very language-specific while Mozilla
> does not do (and cannot easily do) language detection on the page source?

Yes, you could, by using the lang attribute. Hyphenation is even stated as an application of this attribute in the HTML4 specification: http://www.w3.org/TR/html4/struct/dirlang.html#h-8.1
For XHTML, xml:lang should take precedence according to http://www.w3.org/TR/xhtml1/#C_7

(In reply to comment #13)
> but the current CSS 3 Text CR
> provides a "word-break-inside" property which must be explicitly set to
> "hyphenate" to invoke language-specific hyphenation from UAs.

I can't find a "word-break-inside" property, but instead a "hyphenate" property that can be set to "auto": http://www.w3.org/TR/css3-text/#hyphenate

Note that HTML4 also mentions explicit break points in http://www.w3.org/TR/html4/struct/text.html#hyphenation
As they affect justified layout as well, I'd make this depend on bug #9101 too, if I had the right to change this setting. Would it be possible to use the OpenOffice Hyphenator, or one of its predecessors? http://lingucomponent.openoffice.org/hyphenator.html
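A minimal Python sketch of the two pieces suggested above: finding an element's language from lang/xml:lang (the nearest element that sets one wins, and xml:lang beats lang on the same element, per the XHTML note cited), then hyphenating with TeX-style Liang patterns selected by that language. Assumptions: an element is represented as a list of attribute dicts from the element up to the root; the pattern table is a tiny, illustrative English list chosen so that "hyphenation" comes out as hy-phen-ation (real pattern files contain thousands of patterns); left_min=2 and right_min=3 mirror plain TeX's English defaults. All names are hypothetical, and this is not the OpenOffice hyphenator's API.

```python
# Hypothetical pattern table keyed by primary language subtag.
PATTERNS = {
    "en": ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"],
}


def effective_lang(chain):
    """`chain` lists attribute dicts from the element up to the root."""
    for attrs in chain:
        for name in ("xml:lang", "lang"):  # xml:lang takes precedence
            if name in attrs:
                return attrs[name] or None  # lang="" means "unknown"
    return None


def parse_pattern(pattern):
    """Split e.g. 'hy3ph' into ('hyph', [0, 0, 3, 0, 0])."""
    letters, digits = [], [0]
    for ch in pattern:
        if ch.isdigit():
            digits[-1] = int(ch)
        else:
            letters.append(ch)
            digits.append(0)
    return "".join(letters), digits


def hyphenate(word, lang, left_min=2, right_min=3):
    """Return the pieces of `word` between allowed hyphenation points."""
    table = PATTERNS.get((lang or "").split("-")[0].lower())
    if not table:
        return [word]  # no patterns for this language: don't hyphenate
    padded = "." + word.lower() + "."  # '.' marks the word boundaries
    points = [0] * (len(padded) + 1)   # points[k] sits just before padded[k]
    for pattern in table:
        letters, digits = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:
            for k, d in enumerate(digits):
                points[start + k] = max(points[start + k], d)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if points[i + 1] % 2 == 1:  # odd value: a break is allowed before word[i]
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


lang = effective_lang([{}, {"xml:lang": "en-GB", "lang": "fr"}])  # 'en-GB'
print(hyphenate("hyphenation", lang))  # ['hy', 'phen', 'ation']
```

Resulting break points would feed the justification step the same way author-supplied &shy; characters do; a language with no pattern table simply never hyphenates.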
Assignee: attinasi → nobody
QA Contact: chrispetersen → layout
Component: Layout → Layout: Text
Severity: normal → S3