Closed
Bug 809020
Opened 12 years ago
Closed 8 years ago
Hyphenated text in narrow box sometimes fails to wrap and exceeds bounds
Categories
(Core :: Layout: Text and Fonts, defect)
Core
Layout: Text and Fonts
Tracking
()
RESOLVED
FIXED
mozilla54
Tracking | Status | |
---|---|---|
firefox54 | --- | fixed |
People
(Reporter: u444543, Assigned: jfkthame)
References
(Blocks 1 open bug, )
Details
Attachments
(7 files, 1 obsolete file)
(deleted),
text/html
|
Details | |
(deleted),
text/html
|
Details | |
(deleted),
image/png
|
Details | |
(deleted),
image/png
|
Details | |
(deleted),
patch
|
masayuki
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
masayuki
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
masayuki
:
review+
|
Details | Diff | Splinter Review |
User Agent: Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0
Build ID: 20121024073032
Steps to reproduce:
Browsed the TestCase.htm page in Firefox 16.0.2 on Windows XP SP3.
Actual results:
The hyphenated text does not wrap to a second line as expected but instead exceeds the bounds of the table cell, even though the cell has width and max-width inline CSS set.
Expected results:
The text should have wrapped at the hyphen like it does in Chrome, IE and Opera. Similarly if the max-width inline CSS attribute is stripped.
Updated•12 years ago
|
Attachment #678720 -
Attachment mime type: text/plain → text/html
Comment 1•12 years ago
|
||
Component: Untriaged → Layout: Text
OS: Windows XP → All
Product: Firefox → Core
Updated•12 years ago
|
Status: UNCONFIRMED → RESOLVED
Closed: 12 years ago
Component: Layout: Text → Untriaged
OS: All → Windows XP
Product: Core → Firefox
Resolution: --- → DUPLICATE
Updated•12 years ago
|
Component: Untriaged → Layout: Text
OS: Windows XP → All
Product: Firefox → Core
Hardware: x86 → All
Assignee | ||
Comment 3•12 years ago
|
||
I don't think this is a dupe of 95067. We do break lines after hyphen when needed - usually. But there appears to be a bug whereby we sometimes fail to break when we ought to; I've been able to reproduce this with a small <div>, too, it's not unique to table cells.
The exact width needed to demonstrate the problem may be system-dependent (as fonts and hence text width may vary), but these examples illustrate the issue for me on OS X with default font settings:
(a) data:text/html,<div style="border:1px solid red;width:80px">Flaschen-teste</div>
(b) data:text/html,<div style="border:1px solid red;width:80px">Flasch-en-tes-te</div>
Example (a) fails to wrap at the hyphen, although it clearly should. Example (b) does wrap, showing that we are capable of line-breaking after a hyphen (sometimes!).
Even more curious is example (c):
(c) data:text/html,<div style="border:1px solid red;width:80px">Flaschen-testes</div>
This breaks correctly at the hyphen, which is the same breakpoint as we failed to find in example (a), although the only difference is the addition of an extra letter at the end of the word! It seems as though there have to be at least 6 characters following the hyphen, otherwise we don't break the line. (I'm not sure why 6 should be the magic number, or whether this is really consistent in all cases.)
Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: DUPLICATE → ---
Summary: Table cell hyphenated text fails to wrap and exceeds bounds → Hyphenated text in narrow box sometimes fails to wrap and exceeds bounds
Version: 16 Branch → Trunk
Assignee | ||
Comment 4•12 years ago
|
||
Aha - I think the "magic number" 6 comes from here:
http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/src/nsJISx4501LineBreaker.cpp#566
While avoiding breaks very near the start/end of a word is generally a good idea, I think the number here may be "too conservative", particularly when we're dealing with narrow lines (as in the original example here), or when no other line break opportunity exists. We need to relax this constraint, or find a way to make this more flexible.
Comment 7•11 years ago
|
||
The source link in comment 4 is broken (by now).
So what would be a solution to this issue? (I can't see the responsible code to make a good suggestion...)
Proper link for comment 4:
http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/nsJISx4051LineBreaker.cpp#565
>#define CONSERVATIVE_BREAK_RANGE 6
added in bug 389056
http://hg.mozilla.org/mozilla-central/rev/579d8ab862ee
ni? the author of these changes regarding comment 3.
Flags: needinfo?(masayuki)
Comment 10•9 years ago
|
||
The limitation is necessary for preventing odd line-breaking in smiley (face mark).
Flags: needinfo?(masayuki)
Comment 11•9 years ago
|
||
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #10)
> The limitation is necessary for preventing odd line-breaking in smiley (face
> mark).
Not a day goes by without me encountering another case of Firefox's reluctance for linebreaking breaking the layout of text.
Isn't there something which can be done?
Comment 12•9 years ago
|
||
Are smileys more important than proper text-layout?
Flags: needinfo?(masayuki)
Comment 13•9 years ago
|
||
(In reply to Elbart from comment #12)
> Created attachment 8657177 [details]
> hyphens matter.png
>
> Are smileys more important than proper text-layout?
Basically, I think so. However, if you can suggest better idea to distinguish "-" between 2 words, we should take it.
Flags: needinfo?(masayuki)
Comment 14•9 years ago
|
||
How about: There are only letters (and maybe digits, like in "100-jährig") and no punctuation around the hyphen. Smileys with a hyphen usually have a dot or colon or parentheses around it. Other smileys might not even be recognised by a human... It's simple and probably still a hack, but less so than right now.
Comment 17•8 years ago
|
||
I just filed 1341198 because of this, after I couldn't explain to my college why a word with two columns breaks as expected in Edge, Chrome, etc. but not in Firefox. It would be great if this could be changed so that four-letters or 100-jährig wraps if needed (and as expected).
Assignee | ||
Comment 18•8 years ago
|
||
Here's a reftest (currently marked as failing) with a bunch of examples of the problem here: hyphenated compounds that would reasonably be expected to break (if necessary), along with some examples where we really don't want to break at a hyphen (smiley sequences, and IMO also single-character prefixes or suffixes such as "T-shirt"). I don't think it's possible to come up with a heuristic that will be "right" 100% of the time, partly because the desirability (or otherwise) of allowing a break at an explicit hyphen depends on many details of context and semantics, and even then is often a subjective matter; but I do think we can make things substantially better than they are right now, such that the examples in this test all behave as per the reference file.
Attachment #8841248 -
Flags: review?(VYV03354)
Assignee | ||
Comment 19•8 years ago
|
||
Proposed patch that adjusts the "conservative" breaking range where we avoid breaks at hyphens (among other things) near word edges. The basic idea here is to check whether we're dealing with letter or non-letter characters, and use different "conservative" ranges for the two cases. This makes the above reftest pass. It also leads to failures on a few existing tests, but IMO they are acceptable changes to cases that are not well-specified and where browsers differ radically in their behavior, so there's no existing interop to worry about. Patch to adjust the existing tests follows...
Attachment #8841249 -
Flags: review?(VYV03354)
Assignee | ||
Comment 20•8 years ago
|
||
(To be folded into the preceding patch.) Here are the adjustments needed to existing reftests if we take the patch above. I think these behavior changes are reasonable, and the improvement in behavior for "normal" hyphenated compounds is well worth the impact here.
Attachment #8841251 -
Flags: review?(VYV03354)
Comment 21•8 years ago
|
||
Comment on attachment 8841248 [details] [diff] [review]
Reftest for breaking near word beginning/end (the "conservative" breaking range)
I think Nakano-san should review this.
Attachment #8841248 -
Flags: review?(VYV03354) → review?(masayuki)
Updated•8 years ago
|
Attachment #8841249 -
Flags: review?(VYV03354) → review?(masayuki)
Updated•8 years ago
|
Attachment #8841251 -
Flags: review?(VYV03354) → review?(masayuki)
Comment 22•8 years ago
|
||
Comment on attachment 8841251 [details] [diff] [review]
part 1.1 - Update existing reftests that are affected by the changed behavior
>-<p>2007-Jan-01</p>
>+<p>2007-<br>Jan-01</p>
Hmm, is this change really okay? Although, this is hacky idea, shouldn't we keep not breaking after "-" if it's previous character is a numeric?
>-<p>2007-Jan-01 00:00:00</p>
>+<p>2007-<br>Jan-01 00:00:00</p>
Same here.
But I'm not a native speaker of Western languages and I'm Japanese speaker, so, breaking at almost all points doesn't make me feel odd. So, I don't know if this change is acceptable for Western users. How do you think (Or, do you know somebody who is familiar with typography of Western languages)?
Flags: needinfo?(jfkthame)
Assignee | ||
Comment 23•8 years ago
|
||
(In reply to Masayuki Nakano [:masayuki] from comment #22)
> Comment on attachment 8841251 [details] [diff] [review]
> part 1.1 - Update existing reftests that are affected by the changed behavior
>
> >-<p>2007-Jan-01</p>
> >+<p>2007-<br>Jan-01</p>
>
> Hmm, is this change really okay? Although, this is hacky idea, shouldn't we
> keep not breaking after "-" if it's previous character is a numeric?
It's hard to say. In general, it might be preferable not to break such a date; but on the other hand, if it comes down to a choice between breaking the date or overflowing the line, it's better to break. Moreover, there are plenty of examples (e.g. "100-jährig", see comment 14) where allowing the break seems quite reasonable.
Also note that Chrome allows these breaks (more widely than even the patched version of Firefox); it breaks
<p>2007-Jan-01</p>
as
2007-
Jan-
01
in that reftest. So there is far from being any interop that authors can rely on; and our (patched) behavior is still more conservative than Chrome's.
>
> >-<p>2007-Jan-01 00:00:00</p>
> >+<p>2007-<br>Jan-01 00:00:00</p>
>
> Same here.
The here suggests an author who is deliberately trying to avoid line-breaks within the time/date string; but in that case, they should have also used non-breaking hyphens (‑) to make sure of that. Again, note that Chrome would break this as
2007-
Jan-
01 00:00:00
so authors who want to avoid such breaks already need to be more careful about their content.
>
> But I'm not a native speaker of Western languages and I'm Japanese speaker,
> so, breaking at almost all points doesn't make me feel odd. So, I don't know
> if this change is acceptable for Western users. How do you think (Or, do you
> know somebody who is familiar with typography of Western languages)?
My view here is that relaxing our "conservative" rules slightly for these examples is acceptable. Ideally, we'd have multiple "levels" of hyphenation points with differing priorities: some that are always available to use, and others that are only used as a last resort to avoid overflow. But our current line-breaking architecture isn't that sophisticated.
The problem of -failing- to break in examples where the hyphen would be expected to allow a break is much more serious (see comment 11).
Flags: needinfo?(jfkthame)
Assignee | ||
Comment 24•8 years ago
|
||
Updated patch to fix a debug-build assertion failure in UseConservativeBreaking().
Attachment #8841634 -
Flags: review?(masayuki)
Assignee | ||
Updated•8 years ago
|
Attachment #8841249 -
Attachment is obsolete: true
Attachment #8841249 -
Flags: review?(masayuki)
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → jfkthame
Status: REOPENED → ASSIGNED
Comment 25•8 years ago
|
||
(In reply to Jonathan Kew (:jfkthame) from comment #23)
> (In reply to Masayuki Nakano [:masayuki] from comment #22)
> Also note that Chrome allows these breaks (more widely than even the patched
> version of Firefox); it breaks
Well. Makoto-san and I discussed the compatibility with Chrome and our traditional behavior, though, perhaps, both browsers should use ICU for Unicode characters but ASCII characters should keep traditional behavior if it's better than other browser's behavior.
As you said, I think that breaking after 000- may be better for generic case, but not so for date-time. The important thing is, which is major case for the users.
> > >-<p>2007-Jan-01 00:00:00</p>
> > >+<p>2007-<br>Jan-01 00:00:00</p>
> >
> > Same here.
>
> The here suggests an author who is deliberately trying to avoid
> line-breaks within the time/date string; but in that case, they should have
> also used non-breaking hyphens (‑) to make sure of that.
Indeed.
> > But I'm not a native speaker of Western languages and I'm Japanese speaker,
> > so, breaking at almost all points doesn't make me feel odd. So, I don't know
> > if this change is acceptable for Western users. How do you think (Or, do you
> > know somebody who is familiar with typography of Western languages)?
>
> My view here is that relaxing our "conservative" rules slightly for these
> examples is acceptable. Ideally, we'd have multiple "levels" of hyphenation
> points with differing priorities: some that are always available to use, and
> others that are only used as a last resort to avoid overflow. But our
> current line-breaking architecture isn't that sophisticated.
Yeah, if we have prioritized line breaker, this issue must be simpler (bug 389710). It's okay to break date-time format if there is no other line break opportunity.
I'll be back soon, let me think this deeper before review.
Updated•8 years ago
|
Blocks: line-breaking
Assignee | ||
Comment 26•8 years ago
|
||
(In reply to Masayuki Nakano [:masayuki] from comment #25)
> As you said, I think that breaking after 000- may be better for generic
> case, but not so for date-time. The important thing is, which is major case
> for the users.
My gut feeling (not supported by any actual data) is that date-time formats such as 2017-Feb-28, where we should arguably try to avoid breaking at the hyphen after 2017-, are not very commonly used in running text (where line-breaking is more important); they're more likely to occur in contexts such as tabular data or lists, where they occur as standalone items that are not usually subject to line breaking anyway. If breaking -is- needed in such a context, it's because the column allocated for the date is very narrow, and in that case breaking at the hyphen may be needed anyway to avoid overflow.
So my view is currently that the proposed change here provides more improvement than regression, on balance. Specifically, it improves behavior for common natural-language cases that authors cannot be expected to handle in any other way, whereas the (debatable) regression is in a specific context where the author does have the option to control the behavior more precisely by use of non-breaking hyphens -- and indeed, the author already needs to do this for the sake of other browsers.
Comment 27•8 years ago
|
||
Ah, tables... I worry about that date-time is separated by 2 lines when a date-time is unfortunately positioned end of a line. However, Chromium and Edge breaks in it.
https://jsfiddle.net/d_toybox/22q5jcL7/
Okay, let's take it. I think that when we support prioritized line break, we should mark such point as lower priority.
Updated•8 years ago
|
Attachment #8841248 -
Flags: review?(masayuki) → review+
Updated•8 years ago
|
Attachment #8841251 -
Flags: review?(masayuki) → review+
Comment 28•8 years ago
|
||
Comment on attachment 8841634 [details] [diff] [review]
part 1 - Use a shorter "conservative breaking" range at word edges when dealing with letters rather than punctuation etc
>- // Note that index is always larger than CONSERVATIVE_BREAK_RANGE here.
>- for (uint32_t i = index; index - CONSERVATIVE_BREAK_RANGE < i; --i) {
>+ // Note that index is always larger than conservativeRange here.
It might be better to use MOZ_ASSERT here.
>- // Note that index is always less than mLength - CONSERVATIVE_BREAK_RANGE.
>+ // Note that index is always less than mLength - conservativeRange.
And here.
>+ // If the character at index is a letter (rather than various punctuation
>+ // characters, etc) then we want a shorter "conservative" range
>+ uint32_t conservativeRangeStart, conservativeRangeEnd;
>+ if (index < mLength &&
>+ GetGenCategory(GetCharAt(index)) == nsIUGenCategory::kLetter) {
>+ // Primarily for hyphenated word prefixes/suffixes; we add 1 to Start
>+ // to get more balanced behavior (if we break off a 2-letter prefix,
>+ // that means the break will actually be three letters from start of
>+ // word, to include the hyphen; whereas a 2-letter suffix will be
>+ // broken only two letters from end of word).
>+ conservativeRangeEnd = CONSERVATIVE_RANGE_LETTER;
>+ conservativeRangeStart = CONSERVATIVE_RANGE_LETTER + 1;
Hmm, I may not understand this well, though, looks like that this block assumes that the last line break is caused by a close punctuation. If it's caused by a open punctuation like "(", does this work as you expected? E.g., "/a/ab/abc/abcd", "/abcd/abc/ab/a", etc.
If it works as you expected, r=masayuki (and it's better to add above cases to the testcase.)
Attachment #8841634 -
Flags: review?(masayuki) → review+
Assignee | ||
Comment 29•8 years ago
|
||
A sequence such as "/a/ab/abc/abcd" is unaffected by the change here, it continues to use the 6-char "conservative breaking" range. We have some URL examples somewhat like that already, but I'll add a test file with additional sequences of varying lengths, which passes unchanged both before and after the patch.
Assignee | ||
Comment 30•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/283f43e8790b268c6f5316b007fddb6658b6dfba
Bug 809020 - Reftest for breaking near word beginning/end (the "conservative" breaking range). r=masayuki
https://hg.mozilla.org/integration/mozilla-inbound/rev/e980c683af8cb856dbb7409373d2cef7972fb4ca
Bug 809020 - Use a shorter "conservative breaking" range at word edges when dealing with letters rather than punctuation etc., and adjust existing tests accordingly. r=masayuki
Comment 31•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/283f43e8790b
https://hg.mozilla.org/mozilla-central/rev/e980c683af8c
Status: ASSIGNED → RESOLVED
Closed: 12 years ago → 8 years ago
status-firefox54:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
You need to log in
before you can comment on or make changes to this bug.
Description
•