
Comment 5

•

23 years ago

Build reassigning Buster's bugs to Marc.

Assignee: buster → attinasi

Simon Fraser [no longer active]

Comment 6

•

23 years ago

FWIW, <URL:http://www.unicode.org/unicode/reports/tr14/> provides some guidance on line-breaking implementation.

Aaron Kaluszka

Comment 7

•

23 years ago

*** Bug 147836 has been marked as a duplicate of this bug. ***

Martin Tomasek

Comment 8

•

23 years ago

As I see it, this bug is two years old. I ran into it again today and reported it as bug 147836. There possibly isn't a simple solution for all languages, but what do you think about starting with simple word splitting, using a fixed maximum number of characters per word? It would solve the worst cases of this bug. I have seen this on a page with many paragraphs, all of which were turned into lines ten monitors wide by this bug.
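A minimal sketch of the fixed-length word splitting proposed in this comment, in Python for illustration only. The function name `split_long_words` and the default of 20 characters are assumptions, not anything from Mozilla's code. It slices by code point, so it can split a base character from its combining marks; a real implementation would have to respect grapheme clusters.

```python
def split_long_words(text, max_len=20):
    """Split whitespace-separated words into pieces of at most max_len characters.

    Hypothetical helper: each returned piece ends at a place where a layout
    engine could be allowed to break the line.
    """
    pieces = []
    for word in text.split():
        # Slice the word into consecutive chunks of max_len code points.
        for start in range(0, len(word), max_len):
            pieces.append(word[start:start + max_len])
    return pieces
```

With `max_len=4`, the word `cdefghij` becomes the two pieces `cdef` and `ghij`, while shorter words pass through unchanged.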

Comment 9

•

22 years ago

Layout doesn't even break lines on hyphens, for crying out loud.

OS: Windows 98 → All

Hardware: PC → All

Comment 10

•

22 years ago

There are two types of hyphens according to the HTML 4.01 spec: http://www.w3.org/TR/html401/struct/text.html#h-9.3.3

In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character (i.e., no special breaking behavior). The soft hyphen tells the user agent where a line break can occur.

Those browsers that interpret soft hyphens must observe the following semantics: If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored.

In HTML, the plain hyphen is represented by the "-" character (&#45; or &#x2D;). The soft hyphen is represented by the character entity reference &shy; (&#173; or &#xAD;)
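The soft-hyphen semantics quoted above can be sketched as follows. This is an illustrative Python sketch, not Gecko code; `render` and `search_key` are hypothetical names, and the caller is assumed to have already decided whether to break and at which soft hyphen.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, the character behind &shy;

def render(text, break_at=None):
    """Return the displayed lines for text.

    break_at is the index of the soft hyphen chosen as the break point,
    or None if the line is not broken.
    """
    if break_at is None:
        # Not broken: soft hyphens must not be displayed at all.
        return [text.replace(SOFT_HYPHEN, "")]
    if text[break_at] != SOFT_HYPHEN:
        raise ValueError("break_at must point at a soft hyphen")
    # Broken: a visible hyphen ends the first line.
    first = text[:break_at].replace(SOFT_HYPHEN, "") + "-"
    rest = text[break_at + 1:].replace(SOFT_HYPHEN, "")
    return [first, rest]

def search_key(text):
    """Soft hyphens are ignored for searching and sorting."""
    return text.replace(SOFT_HYPHEN, "")
```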

David Baron :dbaron:

Comment 11

•

22 years ago

"(i.e., no special breaking behavior)" isn't part of the spec and isn't really what the spec means. Rather, the relevant section is lower, in 9.3.5, which uses Western scripts as an example and gives incorrect rules. I'd say the example is non-normative and can be ignored.

Comment 12

•

22 years ago

dbaron: Should we be breaking a long repeating line of hyphens? This is an issue in http://bugscape.mcom.com/show_bug.cgi?id=15288

Henry Jia

Updated

•

22 years ago

Blocks: 168902

Jo Hermans

Comment 13

•

22 years ago

see also bug 157967 : Mac OS X needs to use the ATSUI-services which, among other things, will do the line breaking for you.

Comment 14

•

22 years ago

Altering summary: Unicode seems to lay down some normative linebreaking behavior, although they don't define a full linebreaking algorithm per se. <URL:http://www.unicode.org/unicode/reports/tr14/>. (More precisely, it defines in what places lines may, must, or must not break; whether or not the layout takes advantage of a possible linebreak is left to a higher-level algorithm.)
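The split described here, between classifying boundaries as "may", "must", or "must not" break and a higher-level algorithm that decides which optional breaks to take, can be sketched roughly like this. The classification rules below are toy assumptions (newline forces a break, a break is allowed after a run of spaces), not the UAX #14 tables, and all names are hypothetical.

```python
from enum import Enum

class Break(Enum):
    MUST = "must"
    MAY = "may"
    MUST_NOT = "must not"

def classify(text):
    """Classify the boundary after each character except the last (toy rules)."""
    result = []
    for i, ch in enumerate(text[:-1]):
        if ch == "\n":
            result.append(Break.MUST)
        elif ch == " " and text[i + 1] != " ":
            result.append(Break.MAY)
        else:
            result.append(Break.MUST_NOT)
    return result

def segments(text):
    """Cut text at every MAY or MUST boundary; the end of text ends a line."""
    segs, start = [], 0
    for i, action in enumerate(classify(text)):
        if action is not Break.MUST_NOT:
            segs.append((text[start:i + 1], action))
            start = i + 1
    segs.append((text[start:], Break.MUST))
    return segs

def fill_lines(text, width):
    """Higher-level layout: take a MAY break only when the line would overflow."""
    lines, current = [], ""
    for seg, action in segments(text):
        if current and len((current + seg).rstrip()) > width:
            lines.append(current.rstrip())
            current = ""
        current += seg
        if action is Break.MUST:
            lines.append(current.rstrip())
            current = ""
    return lines
```

A segment with no break opportunity inside it simply overflows the width in this sketch, which is the behavior the bug complains about for long unbroken strings.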

Keywords: testcase

Summary: More intelligent linebreaking algorithms needed → More intelligent Unicode-compatible linebreaking algorithms needed

Comment 15

•

22 years ago

*** Bug 175578 has been marked as a duplicate of this bug. ***

kaldari

Comment 16

•

22 years ago

IE breaks on slashes (as the Unicode standard recommends), Mozilla does not. Adding [p-ie] to whiteboard.

Whiteboard: [p-ie]

S Woodside

Comment 17

•

22 years ago

ATSUI claims to support Unicode 3.2 (which defines the line breaks as noted above, http://www.unicode.org/unicode/reports/tr14/): http://developer.apple.com/techpubs/macosx/Carbon/text/ATSUI/ ATSUI_Concepts/atsui_app_unicode/index.html (yeah I inserted a space ... ;-) "ATSUI provides full layout support for Unicode 3.2 and supports text rendering for all the features required by scripts included with version 2.1 of the Unicode standard or later. " So using ATSUI to render at the paragraph level on Mac OS X would fix this bug on Mac OS X at least (obviously this == fixing bug 157967).

Comment 18

•

22 years ago

Mozilla currently implements JIS X 4051 and Thai linebreaking (see the files in http://lxr.mozilla.org/seamonkey/source/intl/lwbrk/src/). There are several differences between JIS X 4051 (the only linebreaker implemented so far) and UTR #14. They include (but are not limited to):

- Treatment of NBSP, ZWNBSP, CGJ: the current linebreaker doesn't implement 'do not break after, and do not break before either'.
  UTR #14> GL * Non-breaking (“Glue”) NBSP, ZWNBSP, CGJ prohibit line breaks before or after
  Currently, Mozilla breaks before (after) NBSP if what follows (precedes) it is a CJK ideograph or a Hangul syllable.
- In the (current) JIS X 4051 implementation, Euro (U+20AC) and other currency signs are class 8 while Yen (U+00A5) and Pound (U+00A3) are class 3. UTR #14 stipulates that they be treated consistently.
- Comma is treated per UTR, but fullstop is not (see bug 164759. A simple 2-line patch will fix this). Other characters in the UTR #14 IS category need to be taken care of.
  UTR> IS - Numeric Separator (Infix) (XB) Characters that usually occur inside a numerical expression may not be separated from following numeric characters, unless space character intervenes. Since they are otherwise sentence ending punctuation, they prevent breaks before.
- UTR #14 prohibits break before ‘]’ or ‘!’ or ‘;’ or ‘/’, even after spaces, but JIS X 4051 allows break before '/'.

FYI, other bugs on linebreaking (we may need a tracking bug opened) are: bug 193212, bug 203016, bug 178290, bug 172052, bug 164759 (dup: bug 202833), bug 162049 (closed), bug 162940 and more.

It has to be noted that *not all* rules in UTR #14 are normative and we can tailor them (the non-normative rules) as we see fit (per lang/locale or based on other criteria).
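To make the first difference in the list above concrete, here is a rough Python sketch contrasting a breaker that ignores the glue class with one that honors it as summarized in this comment. The function names, the `glue_aware` flag, and the single-block ideograph test are all assumptions for illustration; this is neither the JIS X 4051 table nor the full UTR #14 rule set.

```python
NBSP, ZWNBSP, CGJ = "\u00a0", "\ufeff", "\u034f"
GLUE = {NBSP, ZWNBSP, CGJ}  # the GL members listed in the comment

def is_ideograph(ch):
    # Assumption: only the CJK Unified Ideographs block, for brevity.
    return "\u4e00" <= ch <= "\u9fff"

def can_break_between(before, after, glue_aware=True):
    """Toy pair rule: may a line break fall between these two characters?"""
    if glue_aware and (before in GLUE or after in GLUE):
        # Glue prohibits a break on either side of it.
        return False
    # Toy stand-in for the existing behavior: a break is allowed next to
    # an ideograph, even when the neighbor is NBSP.
    return is_ideograph(before) or is_ideograph(after)
```

With `glue_aware=False` the pair (ideograph, NBSP) allows a break, which mirrors the behavior described as current; with `glue_aware=True` the same pair does not.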

Keywords: intl

Comment 19

•

22 years ago

> fullstop is not (see bug 164759. A simple 2-line patch will fix this).

Actually, a bit more work is necessary. We have to add a new class (break neither before nor after) and assign that class to fullstop in some context (for instance, between 'e' and 'g' in 'e.g.'). We need that class anyway for NBSP/ZWNBSP/CGJ.
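The context-dependent classification described here might look something like the sketch below. The class names and the "alphanumeric on both sides" test are assumptions chosen to cover the 'e.g.' example; the actual patch in bug 164759 may use different criteria.

```python
NO_BREAK_EITHER_SIDE = "no-break"  # hypothetical name for the new class
ORDINARY = "ordinary"

def full_stop_class(text, index):
    """Pick a line-breaking class for the full stop at text[index]."""
    if text[index] != ".":
        raise ValueError("index must point at a full stop")
    prev_ok = index > 0 and text[index - 1].isalnum()
    next_ok = index + 1 < len(text) and text[index + 1].isalnum()
    # Between two letters or digits (as in 'e.g' or '3.14') the full stop
    # glues its neighbors together; elsewhere it behaves as usual.
    return NO_BREAK_EITHER_SIDE if prev_ok and next_ok else ORDINARY
```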

Torsten Bronger

Comment 20

•

22 years ago

Will Mozilla's linebreaking algorithm include BPH -- break permitted here? (Or something equivalent.) Does bug 172819 belong to this "bug family" too?

Comment 21

•

22 years ago

Sorry for spamming. Adding a few more to CC. BTW, note that I uploaded a fix for fullstop case in bug 164759 (attachment 121406 [details] [diff] [review]).

Updated

•

22 years ago

Depends on: line-breaking

Boris Zbarsky [:bzbarsky]

Comment 22

•

21 years ago

->Fonts & Text

Assignee: attinasi → font

Component: Layout → Layout: Fonts and Text

QA Contact: petersen → ian

Updated

•

21 years ago

Priority: P3 → --

Target Milestone: Future → ---

S Woodside

Comment 23

•

21 years ago

This is a serious usability problem, because it causes pages with long hyperlinks to grow excessively wide. If fixing to Unicode is going to be futured, then a workaround to line-break on slashes and hyphens should be put in place in the meantime.

Updated

•

21 years ago

Priority: -- → P4

Target Milestone: --- → Future

Boris Zbarsky [:bzbarsky]

Updated

•

21 years ago

Blocks: line-breaking

No longer depends on: line-breaking

Mike Cowperthwaite

Comment 24

•

21 years ago

re: comment 23 (Simon Woodside): Breaking after ASCII hyphen (hyphen-minus) is now bug 95067; breaking after slash is bug 218580. Also, handling of soft-hyphen is bug 9101.

re: comment 20 (Torsten Bronger): I am unfamiliar with "BPH", but Unicode does provide a zero-width space (U+200B -- note that 'zwsp' is not a defined entity name) to use as a non-visible author-specified break point -- and Mozilla does in fact handle this correctly.
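As an illustration of how an author-specified break point works, the sketch below treats U+200B as the only break opportunity and never displays it. It is a simplified Python model with a hypothetical function name, not a description of Gecko's implementation.

```python
ZWSP = "\u200b"  # zero-width space

def wrap_at_zwsp(text, width):
    """Greedily wrap text, breaking only where the author placed a ZWSP."""
    lines, current = [], ""
    for piece in text.split(ZWSP):
        # Start a new line if adding the next piece would overflow.
        if current and len(current) + len(piece) > width:
            lines.append(current)
            current = ""
        current += piece
    if current:
        lines.append(current)
    return lines
```

For example, a URL written as `http://` + ZWSP + `example.org/` + ZWSP + `path` wraps into three lines at width 12 and stays on one line, with no visible trace of the break points, when the width is large enough.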

Torsten Bronger

Comment 25

•

21 years ago

I had a discussion about U+200B in comp.text.xml last year. I don't like it because the Unicode specs say that it "may expand in justification", which is totally unacceptable of course. But with this Unicode description I must assume that some XML interpreters do/will treat it like this.

David Baron :dbaron:

Comment 26

•

21 years ago

*** Bug 222057 has been marked as a duplicate of this bug. ***

David Feuer

Comment 27

•

19 years ago

This bug hasn't seen any activity in a couple years, but it still seems to be a problem. What's up?

Comment 28

•

19 years ago

Lack of resources. Patches accepted.

Simon Montagu :smontagu

Updated

•

18 years ago

Blocks: 359179

Simon Montagu :smontagu

Updated

•

18 years ago

Blocks: 346969

fantasai

Comment 29

•

18 years ago

smontagu asked me to comment here -- basically I agree with Chris Hoess's assessment in comment 14. Most of UAX #14 is non-normative. (This will be even clearer in the next revision.) The normative rules deal mainly with line breaking control characters, and should be implemented as specced in the proposed update. The rest of it is tailorable, and many of those rules are impractical unless we also implement prioritization. E.g. we should allow breaks after hyphens as suggested, but only if they are at a lower priority than spaces.

IMHO UAX #14's non-normative rules should be viewed more as hints on how to do things right rather than a specification for how to do things right. It collects together a lot of hard-to-find and useful information about line breaking, but it's not a complete usable algorithm, its heuristics are not always the best, and sometimes it's just wrong.

So, to summarize, work on line breaking at punctuation other than spaces should
a) implement prioritization
b) use UAX #14 as a starting point, but also
c) use common sense, expert opinion, and/or research to support any changes from what we do today, not just blindly implement UAX #14's pairs table
d) use the latest proposed update to UAX #14 [1], as it fixes some substantial errors in the latest approved version

[1] http://unicode.org/reports/tr14/tr14-20.html
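One simple reading of "prioritization" is sketched below: every break opportunity carries a priority, and a break after a hyphen is used only when no space break fits on the line. The two-level priority scheme and the function names are assumptions for illustration; a real implementation could weigh priorities against how badly a line is filled.

```python
def opportunities(text):
    """Return (next_line_start, priority) pairs; a lower number is preferred."""
    ops = []
    for i, ch in enumerate(text[:-1]):
        if ch == " ":
            ops.append((i + 1, 0))  # break after a space: preferred
        elif ch == "-":
            ops.append((i + 1, 1))  # break after a hyphen: fallback
    return ops

def choose_break(text, width):
    """Return the offset where the second line starts, or None.

    None means the text already fits or no opportunity fits within width.
    """
    if len(text) <= width:
        return None
    fitting = [(prio, pos) for pos, prio in opportunities(text)
               if len(text[:pos].rstrip()) <= width]
    if not fitting:
        return None
    best_prio = min(prio for prio, _ in fitting)
    # Within the best available priority, take the latest break that fits.
    return max(pos for prio, pos in fitting if prio == best_prio)
```

In `"a well-known-long-word"` at width 10, the break after the space wins even though a hyphen break would fill the line better; in `"well-known-long-word"`, where no space exists, the hyphen break is used.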

Simo Kaupinmäki

Comment 30

•

17 years ago

This bug is affected by the recent fix to bug 95067 (which, I suppose, is actually a duplicate, although at a slightly more specific level). The fix allows linebreaking in connection with hyphens, slashes and a number of other characters, mainly by imitating the linebreaking behavior of WinIE 7. See the comparison table between Gecko, IE 7 and Opera 9.2: http://lxr.mozilla.org/seamonkey/source/intl/lwbrk/tools/spec_table.html

While offering a solution to the layout problems caused by URLs and other very long strings, the fix seems to introduce rather undesirable side effects. For example, linebreaking is allowed after the slash in "c/o", and both before and after the parentheses in "colo(u)ring". I was considering filing bugs for some of the new issues but I haven't had the opportunity to test them properly. And then I found this bug and realized that basically they all concern the same subject, so perhaps the discussion should continue here.

I don't think imitating IE's over-simplified linebreaking algorithms is the right thing to do. Mozilla has made its reputation by being better than IE, even if doing so caused some websites that were optimized for IE to look bad. Now, the competition has finally forced even Microsoft to bring its browser into the 21st century. This is not the time to lower the standards and start trailing them.

Simo Kaupinmäki

Comment 31

•

17 years ago

By the way, when considering the applicability of UAX #14, the general criticism by Jukka Korpela might be worth taking into account (although it isn't quite up to date with the most recent revisions): http://www.cs.tut.fi/~jkorpela/unicode/linebr.html There is also a more extensive article about word division in IE and the problems it causes, especially from the point of view of web authoring: http://www.cs.tut.fi/~jkorpela/html/nobr.html

jrblier

Comment 32

•

17 years ago

Can we have an update on this bug? By the way, bug 346969 is now fixed, but I cannot close it.

Wu Yongwei

Comment 33

•

16 years ago

Bug 450088 is related to this issue. I also have a zlib-licensed implementation of UAX #14 available at: http://vimgadgets.cvs.sourceforge.net/vimgadgets/common/tools/linebreak/

Phil Ringnalda (:philor)

Updated

•

15 years ago

Assignee: layout.fonts-and-text → nobody

QA Contact: ian → layout.fonts-and-text

Comment 34

•

12 years ago

I think that this is now important for compatibility with other browsers. I'll try to implement it as a new class that can be selected with a pref. Once we enable the new class by default, I think we should remove the pref and the current implementation.

Assignee: nobody → masayuki

Severity: minor → normal

Component: Layout: Text → Internationalization

Priority: P4 → --

Summary: More intelligent Unicode-compatible linebreaking algorithms needed → More intelligent Unicode-compatible linebreaking algorithms (UAX #14) needed

Makoto Kato [:m_kato]

Assignee

Comment 35

•

12 years ago

If we import libicu (bug 724531 and bug 820261) into our code, we can handle this more easily instead of creating a new table.

Comment 36

•

12 years ago

(In reply to Makoto Kato from comment #35)
> If we import libicu (bug 724531 and bug 820261) into our code, we can handle
> this more easily instead of creating new table.

Is it enough for our requirements? If we implement UAX #14 strictly, we will probably break compatibility with a lot of websites. So we would need to add customizations similar to those in the current line breaker. Is that possible?

Comment 37

•

12 years ago

Looks like it's not capable of CSS3 text, such as line-break. I don't think that we should use a third-party library for the line breaker, because line breaking is too sensitive for compatibility and performance.

John Daggett (:jtd)

Comment 38

•

12 years ago

(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #37)
> Looks like it's not capable of CSS3 text, such as line-break.

What exactly do you mean here? Do you think we need to change the spec? If so, which part (5.1? 5.2?)

Comment 39

•

12 years ago

It seems that Chromium also uses their own table for compatibility: http://mxr.mozilla.org/chromium/source/src/third_party/WebKit/Source/WebCore/rendering/break_lines.cpp#71

(In reply to John Daggett (:jtd) from comment #38)
> (In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #37)
> > Looks like it's not capable of CSS3 text, such as line-break.
>
> What exactly do you mean here? Do you think we need to change the spec? If
> so, which part (5.1? 5.2?)

No. If we were to use the ICU line breaker, the library would need to provide all the behavior defined by CSS3 Text, and that behavior would need to be reasonably compatible with current Gecko and other browsers, especially in the ASCII character range.

Comment 40

•

12 years ago

If ICU supports line breaking for complex scripts, I think it's worthwhile to use ICU only for them. Currently, we use the native API's line breaker for them, so Gecko doesn't behave the same on all platforms for users of such languages.

Makoto Kato [:m_kato]

Assignee

Comment 41

•

12 years ago

(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #40)
> If ICU supports complex line breaking script, it's worthwhile to use ICU
> only for them, I think. Currently, we use native API's line breaker for
> them. So, Gecko doesn't behave same on all platforms for such language users.

The platform's line breaker may not find the correct line break positions for complex languages such as Khmer. We should use another approach (e.g. libicu) for these languages. Also, even for non-complex scripts, line breaking isn't compatible across browser implementations. See http://w3c-test.org/framework/results/i18n-css3-text/.

Comment 42

•

12 years ago

Hmm, Chromium might use ICU as a fallback for non-ASCII characters. But I'm not sure whether the build option (ICU_UNICODE) is enabled by default. And if it is, I'm not sure how they plan to support the line-break property in the future.

(In reply to Makoto Kato from comment #41)
> Also, actually, even if not complex script, line breaker isn't compatible on
> each browser implementation. See
> http://w3c-test.org/framework/results/i18n-css3-text/.

Yes, but I think we can improve the compatibility in the non-ASCII range since we have never used UAX #14 yet.

Desigan Chinniah [:cyberdees] [:dees] [London - GMT]

Comment 43

•

12 years ago

And probably, if we use ICU, it becomes more difficult to fix bug 389710.

Updated

•

8 years ago

Whiteboard: [p-ie] → [p-ie] [platform-rel-Intel]

Comment hidden (offtopic)

Desigan Chinniah [:cyberdees] [:dees] [London - GMT]

Updated

•

8 years ago

Whiteboard: [p-ie] [platform-rel-Intel] → [p-ie]