Closed Bug 389056 Opened 17 years ago Closed 17 years ago

Don't break line between periods and quote

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: jruderman, Assigned: masayuki)

References

(
URL
)

Details

(Keywords: regression, testcase)

Attachments

(4 files, 21 obsolete files)

Patch rv1.0 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
Patch rv2.0pre1 (work in progress) 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
Patch rv2.0pre2 (work in progress) 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
Patch rv2.0pre3 (work in progress) 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
Patch rv2.0 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
Patch rv2.0.1 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
Patch rv3.0 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
Patch rv3.0.1 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
testcases #1 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), text/html		Details
testcase #2 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), text/html		Details
Patch rv4.0 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
Patch rv5.0 (work in progress) 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
Patch rv5.1 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
testcase #3 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), text/html		Details
Patch rv5.2 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
testcase #4 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), text/html		Details
Patch rv5.3 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
Patch rv5.4 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
Patch rv5.5 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
Patch rv5.5.1 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch	roc : review+ roc : superreview+	Details \| Diff \| Splinter Review
fix some nits. 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
fix some nits. v2. 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
fix some nits. v3. 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch	roc : review+ roc : superreview+ roc : approval1.9+	Details \| Diff \| Splinter Review
fix some nits. v4. (for check-in) 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch		Details \| Diff \| Splinter Review
reftests for line-breaking 17 years ago Masayuki Nakano [:masayuki] (he/him)(JST, +0900) (deleted), patch	roc : review+ roc : superreview+ roc : approval1.9+	Details \| Diff \| Splinter Review

Jesse Ruderman

Reporter

Description

•

17 years ago

Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.9a7pre) Gecko/2007072004 Minefield/3.0a7pre <div style="width: 1px">"foo..."</div> Renders as "foo... " This looks silly. Apparently it only happens if there are two or more periods.

Jesse Ruderman

Reporter

Comment 1

•

17 years ago

This regressed some time in July. Guessing it's due to the change in bug 255990.

Blocks: 255990

Flags: blocking1.9?

Keywords: regression

j.j.

Comment 2

•

17 years ago

confirmed on win XP. Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a7pre) Gecko/2007072005 Minefield/3.0a7pre (btw <div style="width:1px"> foo=bar </div> renders as foo= bar )

j.j.

Comment 3

•

17 years ago

(In reply to comment #2) > foo= > bar This is intended, of course. Sorry for the spam. j.j.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 4

•

17 years ago

This can be fixed easy, I think. # The equal issue is designed. We should break in the point for the URL params. # And please file new bugs for other issues! I think that always we should not break around |"|. We defined it as "starting parenthesis" (e.g., like as |(|). But it is often used as "finished parenthesis". It should be character class, I think. (|'| is character class.) I'll attach a patch.

Assignee: nobody → masayuki

OS: Mac OS X → All

Hardware: PC → All

Version: unspecified → Trunk

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 5

•

17 years ago

(In reply to comment #4) > I think that always we should not break around |"|. We defined it as "starting > parenthesis" (e.g., like as |(|). But it is often used as "finished > parenthesis". It should be character class, I think. (|'| is character class.) Oops, this comment is wrong... U+0022(") is character U+0027(') is character

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Updated

•

17 years ago

Status: NEW → ASSIGNED

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 6

•

17 years ago

Attached patch Patch rv1.0 (obsolete) (deleted) — Details — Splinter Review

I think that this patch can fix this and similar issues. 1. If current and next character is U_PERIOD, current character class should be 8 (character). 2. ContextualAnalysis should be able to know the previous character class. Because the previous character may be "contextual analysis"ed. 3. "," should be character class as default. e.g., we have same problem in |i.e.,"|

Attachment #273311 - Flags: superreview?(roc)

Attachment #273311 - Flags: review?(roc)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 7

•

17 years ago

(In reply to comment #6) > 1. If current and next character is U_PERIOD, current character class should be > 8 (character). Or, should we define the default class is 8? (U_PERIOD default class is 1, it means "end parenthesis".)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 8

•

17 years ago

Perhaps we could fix this (and some other issues) in a simple way by requiring that there be at least one white-space or non-punctuation character between any two breaks (including treating the end of the string as a break)? This means that we would never break inside any string of punctuation at the end of a word. However currently we don't distinguish punctuation from normal characters in the character classes. How hard would that be?

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 9

•

17 years ago

I think that we have two ways: 1. we should have punctuation tables. 2. we should add 'a' class for punctuations. I think that 2 is better. It makes simple code.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 10

•

17 years ago

The punctuations are several classes. There are connector, dash, open, close, initial quote, final quote and others, so, we may not be able to fix this by "2" easy. "1" may be better...

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 11

•

17 years ago

We don't need to distinguish all those, right? We just need one new class for all the punctuation that currently is classified as "character". BTW can you add comments somewhere explaining what all the (compressed) classes actually mean?

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 12

•

17 years ago

Attached patch Patch rv2.0pre1 (work in progress) (obsolete) (deleted) — Details — Splinter Review

This is a draft for new approach, but I have following issues: 1. The class name translating of JIS X 4051 is so hard for me. I need help of native speakers. 2. Need to change over U+0080 too. 3. Now, we are using class 7 (Alphabet) *only* for U+2126 (OHM SIGN), I think that it should be class 8 (Character), therefore, we can remove class 7. (And we can use class 7 for new class.)

Attachment #273311 - Attachment is obsolete: true

Attachment #273311 - Flags: superreview?(roc)

Attachment #273311 - Flags: review?(roc)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 13

•

17 years ago

er, the table of breakable (section 5) is wrong, we should break between [c] and (16 or 17 or 18). 1 [a] 7 8 9 [b]15 16 18 COMPLEX [c] 1 X X X X X X X X X X X [a] X 7 X X 8 X X 9 X [b] X 15 X X X X X 16 X X X X 18 X X X X X COMPLEX X T [c] X X X X X should be: 1 [a] 7 8 9 [b]15 16 18 COMPLEX [c] 1 X X X X X X X X X X X [a] X 7 X X 8 X X 9 X [b] X 15 X X X X X 16 X X X X 18 X X X X X COMPLEX X T [c] X X But I think that we need special handling for period (for file path) and single/double quote(They can be both open/close parenthesis).

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 14

•

17 years ago

er, I'm being confused... please ignore comment 13. the table should be: 1 [a] 7 8 9 [b]15 16 18 COMPLEX [c] 1 X X X X X X X X X X X [a] X X 7 X X 8 X X 9 X [b] X 15 X X X X X 16 X X X X 18 X X X X X COMPLEX X T [c] X X X X X We don't need to break after [c] when next is character/numeric... And we should not break before [c] when prev is [a], right?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 15

•

17 years ago

> 1. The class name translating of JIS X 4051 is so hard for me. I need help of > native speakers. What you have there is pretty good except I don't know what some of them are. In particular can you explain a bit more: + 5: Middle dot + 8: Prefix (ellipsis) symbol + 9: Suffix (ellipsis) symbol + 19: Split line note (Warichu) begin quote + 20: Split line note (Warichu) end quote ? Yes, make OHM a character. #define NUMERIC_CLASS 6 // JIS x4051 class 15 is now map to simplified class 6 #define CHARACTER_CLASS 8 // JIS x4051 class 18 is now map to simplified class 8 +#define PUNCTUATION_CLASS 0xA // see bug 389056 Also create #defines for all our compressed classes. I'll help you come up with names.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 16

•

17 years ago

(In reply to comment #15) > > 1. The class name translating of JIS X 4051 is so hard for me. I need help of > > native speakers. > > What you have there is pretty good except I don't know what some of them are. > In particular can you explain a bit more: > + 5: Middle dot e.g., · > + 8: Prefix (ellipsis) symbol e.g., $, ¥ and "No." > + 9: Suffix (ellipsis) symbol e.g., cent sign, % and degree sign > + 19: Split line note (Warichu) begin quote > + 20: Split line note (Warichu) end quote http://www.w3.org/TR/2003/CR-css3-text-20030514/#text-combine-prop Warichu is a ruby layout pattern. It will be styled as "text-combine: lines;". 19 is open parenthesis of Warichu, 20 is close it. We don't support these class.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 17

•

17 years ago

I'm rearranging the definition table (jisx4501class.txt). bug 388096 will be fixed by the next patch.

Blocks: 388096

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 18

•

17 years ago

Attached patch Patch rv2.0pre2 (work in progress) (obsolete) (deleted) — Details — Splinter Review

I misunderstood the class 7 of JIS X 4051. It is not a class for hyphens. It means that the characters are breakable before and after, but they are not breakable between same class characters.

Attachment #273615 - Attachment is obsolete: true

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 19

•

17 years ago

And this patch changes the class of '<' and '>' to new punctuation class.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Updated

•

17 years ago

Blocks: 389595

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 20

•

17 years ago

Attached patch Patch rv2.0pre3 (work in progress) (obsolete) (deleted) — Details — Splinter Review

Don't break around '(' and ')' and ';'. Cleaning up the code with Mozilla standard coding style.

Attachment #273852 - Attachment is obsolete: true

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 21

•

17 years ago

umm... by comment 19, the rendering of https://bugzilla.mozilla.org/attachment.cgi?id=273944&action=diff#mozilla/intl/lwbrk/tools/spec_table.html_sec1 is broken. Should we break only between '>' and '<' for SGML/HTML/XML?

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 22

•

17 years ago

er, the current approach for '(' and ')' are wrong. data:text/html,<div%20style="width:1px;">a[([b])]c</div> This case, cannot clear. I'll try to fix it...

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 23

•

17 years ago

Attached patch Patch rv2.0 (obsolete) (deleted) — Details — Splinter Review

O.K. This patch works fine for me. I create new two classes, [c] and [d]. [c] is character like open parenthesis. [d] is character like close parenthesis (or punctuation). By the two class, '><' problem is cleared. And also '[([' problem too. And contextual analysis is changed to very simple.

Attachment #273944 - Attachment is obsolete: true

Attachment #274126 - Flags: superreview?(roc)

Attachment #274126 - Flags: review?(roc)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 24

•

17 years ago

Attached patch Patch rv2.0.1 (obsolete) (deleted) — Details — Splinter Review

sorry, for the spam. this is better names.

Attachment #274126 - Attachment is obsolete: true

Attachment #274128 - Flags: superreview?(roc)

Attachment #274128 - Flags: review?(roc)

Attachment #274126 - Flags: superreview?(roc)

Attachment #274126 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 25

•

17 years ago

It looks OK, but I don't see how it fixes this bug. The text ..." is just a run of punctuation of class [d], and we can break between those characters. Right? http://groups.google.com/group/mozilla.dev.tech.layout/browse_thread/thread/47c49202f2d13f94

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 26

•

17 years ago

Er, what I was going to say is, I think we should discuss this on this thread: http://groups.google.com/group/mozilla.dev.tech.layout/browse_thread/thread/47c49202f2d13f94 before we go ahead. Also I'd like to find out what the jp-critical bugs are that require breaking of Latin-1 text around punctuation.

David Baron :dbaron:

Updated

•

17 years ago

Flags: blocking1.9? → blocking1.9+

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Updated

•

17 years ago

Blocks: 390920

Serge Gautherie (:sgautherie)

Updated

•

17 years ago

Component: Layout → Layout: Fonts and Text

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 27

•

17 years ago

Attached patch Patch rv3.0 (obsolete) (deleted) — Details — Splinter Review

This changes many character's behaviors by the feedbacks. 1. Added the new class which is not breakable around them even if there are breakable. But if there is SPACE, it is breakable. This behavior is defined in UAX#14. 2. [] is not breakable now if around characters are character/numeric class. 3. / and \ are not breakable now. But if a word has two or more /s or \s, they are breakable except the first one. And they are breakable *before* them at the time. Therefore, the path text always has / or \. I believe that this is best way. But / is not breakable everytime if next is numeric for date format. (2007/01/01) 4. DEGREE SIGN is not breakable if after character is character class. But it's still breakable if after character is numeric, for compatibility. 5. The hyphen is not breakable if the text doesn't have character class, or next is numeric class. Therefore, we cannot break 2007-Aug-07 6. Pound/Yen sign are not same behavior with IE7.

Attachment #274128 - Attachment is obsolete: true

Attachment #275307 - Flags: superreview?(roc)

Attachment #275307 - Flags: review?(roc)

Attachment #274128 - Flags: superreview?(roc)

Attachment #274128 - Flags: review?(roc)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 28

•

17 years ago

Attached patch Patch rv3.0.1 (obsolete) (deleted) — Details — Splinter Review

Oops, sorry for the mistake and the spam.

Attachment #275307 - Attachment is obsolete: true

Attachment #275308 - Flags: superreview?(roc)

Attachment #275308 - Flags: review?(roc)

Attachment #275307 - Flags: superreview?(roc)

Attachment #275307 - Flags: review?(roc)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 29

•

17 years ago

Attached file testcases #1 (obsolete) (deleted) — Details

Hmm... I'm working for this testcase... I hate smiley :-(

Attachment #275308 - Attachment is obsolete: true

Attachment #275308 - Flags: superreview?(roc)

Attachment #275308 - Flags: review?(roc)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 30

•

17 years ago

Attached file testcase #2 (obsolete) (deleted) — Details

Attachment #275854 - Attachment is obsolete: true

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 31

•

17 years ago

Attached patch Patch rv4.0 (obsolete) (deleted) — Details — Splinter Review

I suppress the breaking in most ASCII characters if the around characters are characters or numerics. This patch is changed from previous version: 1. Add new class that is for punctuation like character, it is same as character class, but it is needed for hypen processing. 2. The date format is checked the word length, therefore, the date format is not broken, but the chemical context is breakable. 3. If the last character of a word is CLASS_CLOSE or CLASS_CLOSE_LIKE_CHARACTER, we never break before it. 4. '%' is not breakable. But if the 3rd before or after character is '%', It may be '%' encoded URL param, therefore, it is broken before only then. 5. '&' and ';' is not breakable, But if there are one or more '='s before it, they are broken before them. It may be a param of URL. 6. '-' is not breakable, But if the both sides are numeric or character class, it is broken after. Note that they are checked the context only in a word, therefore, the all testcases are cleared by this patch.

Attachment #275972 - Attachment is obsolete: true

Attachment #275975 - Flags: superreview?(roc)

Attachment #275975 - Flags: review?(roc)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Updated

•

17 years ago

Attachment #275972 - Attachment is obsolete: false

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 32

•

17 years ago

Are these testcases added to the reftests?

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 33

•

17 years ago

It's difficult issue, because the line-breaking spec may be fluid now... # If some languages users have serious issue, we may give up some non-important cases. But for non-intended regressions, it's a good thing...

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 34

•

17 years ago

Sure, you can change the tests as you change the desired behaviour; the tests are there to tell you when you make changes that _aren't_ desired. :)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 35

•

17 years ago

roc: Should we land it for getting new feedbacks?

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 36

•

17 years ago

Attached patch Patch rv5.0 (work in progress) (obsolete) (deleted) — Details — Splinter Review

work in progress for new "safety" approach.

Attachment #275975 - Attachment is obsolete: true

Attachment #275975 - Flags: superreview?(roc)

Attachment #275975 - Flags: review?(roc)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 37

•

17 years ago

Attached patch Patch rv5.1 (obsolete) (deleted) — Details — Splinter Review

This should be 'safety' for western language. 1. Using CLASS_OPEN_LIKE_CHARACTER and CLASS_CLOSE_LIKE_CHARACTER for western punctuations. 2. If a breakable point is 'near' from start of word, end of word and previous breakable point in western language context, we use GetPairForWestern instead of GetPair. It is not breaking between numeric, western characters, CLASS_OPEN_LIKE_CHARACTER and CLASS_CLOSE_LIKE_CHARACTER. # now, I defined that the 'near' is 6 characters. By this, the date format is not broken. (E.g., 01/01/2007 and 2007-01-01) 3. If the word has CJK characters, we always use GetPair. So, CJK context is always broken by JIS X 4051 rules.

Attachment #279309 - Attachment is obsolete: true

Attachment #279339 - Flags: superreview?(roc)

Attachment #279339 - Flags: review?(roc)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 38

•

17 years ago

Attached file testcase #3 (obsolete) (deleted) — Details

Attachment #275972 - Attachment is obsolete: true

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 39

•

17 years ago

Comment on attachment 279339 [details] [diff] [review] Patch rv5.1 er, this has a mistake, please wait next.

Attachment #279339 - Attachment is obsolete: true

Attachment #279339 - Flags: superreview?(roc)

Attachment #279339 - Flags: review?(roc)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 40

•

17 years ago

Attached patch Patch rv5.2 (obsolete) (deleted) — Details — Splinter Review

Attachment #279352 - Flags: superreview?(roc)

Attachment #279352 - Flags: review?(roc)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 41

•

17 years ago

Attached file testcase #4 (deleted) — Details

Attachment #279340 - Attachment is obsolete: true

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 42

•

17 years ago

+IS_WORDSEPARATOR(PRUnichar u) +{ + return (0x0009 <= u && u <= 0x000D) || u == 0x0020 || u == 0x0085 || + u == 0x00A0 || u == 0x1680 || u == 0x180E || + (0x2000 <= u && u <= 0x200B) || u == 0x2028 || u == 0x2029 || + u == 0x202F || u == 0x205F || u == 0x3000; +} Could you add comments or #defines explaining what all these are? Also for the first range, if we're going to include 0x000B and 0x000C, why not include all characters <= 0x0020? +GetPairForWestern(PRInt8 c1, PRInt8 c2) I think this name is misleading. It's not "western" in particular because the other GetPair is also used to break "Western" words if they're long enough. This function is more like GetPairConservative; we use it where we want to err on the side of *not* breaking. This would mean changes to some of the comments. + void SetIndex(PRUint32 aIndex) { + NS_ASSERTION(mIndex < aIndex || mIndex == aIndex && mIndex == 0, + "not supported to index decreasing"); Call this AdvanceIndexTo. The comment can say "the index cannot decrease." You could just test mIndex <= aIndex. + PRBool UseJISX4051() { We're not using JISx4051, really, so we shouldn't add new references to it, right? + PRUnichar mPrevOfHyphen; Call it "mCharBeforeHyphen" and add a comment explaining exactly what it is. + PRPackedBool mHasCJKChar; Call it mWordHasCJKChar and document exactly what it means. + PRUint32 mCountOfSlash; + PRUint32 mCountOfBackSlash; + PRUint32 mCountOfEqual; Instead of storing these, why not just calculate them when they're needed, since they're hardly ever needed? + if (prevIsNum && nextIsNum) + return CLASS_NUMERIC; + // If both sides are numeric or character, this hyphen should be breakable. Hmm ... it looks like if both sides are numeric, we don't allow breaking? + // This text has two or more (BACK)SLASHs, this may be file path or URL. Should be "If this text..." + // This text has two or more (BACK)SLASHs, this may be file path or URL. + if (aState.UseJISX4051()) { + PRInt32 count = + (cur == U_SLASH) ? aState.CountOfSlash() : aState.CountOfBackSlash(); + if (count > 1) + return CLASS_OPEN; + } Why do we need this? We would still break after a slash without this code, right? I also don't see why we need special code for % characters. + // If this may be a separator of params of URL, we should break after. + if (aState.UseJISX4051AtNext() && aState.CountOfEqual() > 0) + return CLASS_CLOSE; Why don't we make these CLASS_CLOSE always?

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 43

•

17 years ago

(In reply to comment #42) > + PRBool UseJISX4051() { > > We're not using JISx4051, really, so we shouldn't add new references to it, > right? Do you have the idea for the better name? > + if (prevIsNum && nextIsNum) > + return CLASS_NUMERIC; > + // If both sides are numeric or character, this hyphen should be > breakable. > > Hmm ... it looks like if both sides are numeric, we don't allow breaking? Right. Do we need to break? > + // This text has two or more (BACK)SLASHs, this may be file path or URL. > + if (aState.UseJISX4051()) { > + PRInt32 count = > + (cur == U_SLASH) ? aState.CountOfSlash() : aState.CountOfBackSlash(); > + if (count > 1) > + return CLASS_OPEN; > + } > > Why do we need this? We would still break after a slash without this code, > right? No, U_SLASH and U_BACKSLASH is CLASS_CHARACTER now. Therefore, they are not breakable. We need to change the CLASS when we need to break. > I also don't see why we need special code for % characters. > > + // If this may be a separator of params of URL, we should break after. > + if (aState.UseJISX4051AtNext() && aState.CountOfEqual() > 0) > + return CLASS_CLOSE; > > Why don't we make these CLASS_CLOSE always? I think U_AMPERSAND and U_SEMICOLON should not be breakable in normal sentence, right? E.g., "A&B" in western language or " " in the testcase.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 44

•

17 years ago

(In reply to comment #43) > (In reply to comment #42) > > + PRBool UseJISX4051() { > > > > We're not using JISx4051, really, so we shouldn't add new references to it, > > right? > > Do you have the idea for the better name? How about UseConservativeBreaking() (which is the opposite of what UseJISX4051 returns now)? > > + if (prevIsNum && nextIsNum) > > + return CLASS_NUMERIC; > > + // If both sides are numeric or character, this hyphen should be > > breakable. > > > > Hmm ... it looks like if both sides are numeric, we don't allow breaking? > > Right. Do we need to break? Ok, the comment needs to be more clear. I guess it should say "if one side is numeric and the other is a character, the hyphen should be breakable". > > + // This text has two or more (BACK)SLASHs, this may be file path or URL. > > + if (aState.UseJISX4051()) { > > + PRInt32 count = > > + (cur == U_SLASH) ? aState.CountOfSlash() : aState.CountOfBackSlash(); > > + if (count > 1) > > + return CLASS_OPEN; > > + } > > > > Why do we need this? We would still break after a slash without this code, > > right? > > No, U_SLASH and U_BACKSLASH is CLASS_CHARACTER now. Therefore, they are not > breakable. We need to change the CLASS when we need to break. OK > > I also don't see why we need special code for % characters. > > > > + // If this may be a separator of params of URL, we should break after. > > + if (aState.UseJISX4051AtNext() && aState.CountOfEqual() > 0) > > + return CLASS_CLOSE; > > > > Why don't we make these CLASS_CLOSE always? > > I think U_AMPERSAND and U_SEMICOLON should not be breakable in normal sentence, > right? E.g., "A&B" in western language or " " in the testcase. OK, although both of those would not be broken thanks to DONT_BREAK_RANGE. BTW "DONT_BREAK_RANGE" would be better named "CONSERVATIVE_BREAK_RANGE" since we can actually break in that region... I don't like this special-case (approximate) detection of URLs and file paths. But I don't have a better idea right now.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 45

•

17 years ago

Attached patch Patch rv5.3 (obsolete) (deleted) — Details — Splinter Review

> +IS_WORDSEPARATOR(PRUnichar u) > +{ > + return (0x0009 <= u && u <= 0x000D) || u == 0x0020 || u == 0x0085 || > + u == 0x00A0 || u == 0x1680 || u == 0x180E || > + (0x2000 <= u && u <= 0x200B) || u == 0x2028 || u == 0x2029 || > + u == 0x202F || u == 0x205F || u == 0x3000; > +} > > Could you add comments or #defines explaining what all these are? Also for the > first range, if we're going to include 0x000B and 0x000C, why not include all > characters <= 0x0020? They are listed in Unicode spec as white spaces. http://www.unicode.org/Public/UNIDATA/PropList.txt

Attachment #279352 - Attachment is obsolete: true

Attachment #279495 - Flags: superreview?(roc)

Attachment #279495 - Flags: review?(roc)

Attachment #279352 - Flags: superreview?(roc)

Attachment #279352 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 46

•

17 years ago

> HasCharacterInWordAlrady HasCharacterInWordAlready + void Reset() { Call this "NextWord" instead ... "Reset" sounds like it will reset everything to the initial state. IS_WORDSEPARATOR is really complicated. I don't think we need to care about all those spaces. How about we just limit it to 0x0A, 0x0D, 0x20, 0xA0, 0x200B, and 0x3000? Then we could change nsLineBreaker::IsSpace to match, and then simplify your nsLineBreakder code by removing the word-separator related code from ContextState, since we know IS_WORDSEPARATOR characters will not occur in the parameter to GetJISx4051Breaks. Likewise you would change WordMove so that it stops at IS_WORDSEPARATOR instead of IS_SPACE. What do you think of that? I still don't understand why we need to break before %. I also still don't like the URL-detection stuff that's going on here, although I still haven't thought of a better way.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 47

•

17 years ago

+ PRBool UseConservativeBreakingAtNext() { Instead of writing the same function twice, how about taking a PRUint32 parameter that represents the offset from the current position that you want to test (which will be 1 or 0)? Is semicolon really a URL parameter separator?

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 48

•

17 years ago

Sorry, for the delay. (In reply to comment #46) > I still don't understand why we need to break before %. Most Japanese character is 3 bytes in UTF-8. Therefore, the encoded Japanese text is: (the count of Japanese character) * 3 bytes * 3 (%xx) So, the encoded string is usually very long word. (In reply to comment #47) > Is semicolon really a URL parameter separator? HTML4 spec said: http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2 > We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner. But I don't know whether such implementations are on the Earth :-(

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 49

•

17 years ago

(In reply to comment #46) > IS_WORDSEPARATOR is really complicated. I don't think we need to care about all > those spaces. How about we just limit it to 0x0A, 0x0D, 0x20, 0xA0, 0x200B, and > 0x3000? Then we could change nsLineBreaker::IsSpace to match, and then simplify > your nsLineBreakder code by removing the word-separator related code from > ContextState, since we know IS_WORDSEPARATOR characters will not occur in the > parameter to GetJISx4051Breaks. Likewise you would change WordMove so that it > stops at IS_WORDSEPARATOR instead of IS_SPACE. What do you think of that? Sounds good. But if I change nsLineBreaker::IsSpace, it seems that I need to change other codes. Because the current code assume that arround |nsLineBreaker::IsSpace| can break. But 0x0A is not so... And we should keep to check |0x2000 <= u && u <= 0x200B|. Because they can be in Western text. We should keep the spec in Western context, I think.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 50

•

17 years ago

OK, drop 0x0A from IS_WORDSEPARATOR/IsSpace. 0x0D can stay; I think it's OK to allow a soft break after that. The text-frame code won't ever use a soft break at the very start of a line. > And we should keep to check |0x2000 <= u && u <= 0x200B| OK Let's drop breaking after ';', I've never seen that as a URL parameter separator. Breaking before a % is OK.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 51

•

17 years ago

(In reply to comment #50) > OK, drop 0x0A from IS_WORDSEPARATOR/IsSpace. 0x0D can stay; I think it's OK to > allow a soft break after that. The text-frame code won't ever use a soft break > at the very start of a line. hmm... If I drop 0x0A from them, we cannot clear the test of 48, 49 and 50 in the latest testcase, is it O.K.? It seems that they are realistic cases...

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 52

•

17 years ago

> 0x0A er, 0xA0.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 53

•

17 years ago

Attached patch Patch rv5.4 (obsolete) (deleted) — Details — Splinter Review

how about this? In UseConservativeBreaking, the no-breakable spaces are searched directly. (Only when the text has one or more non-breakable spaces.) # and 0x2007 is not a space, it's no-breakable, sorry.

Attachment #279495 - Attachment is obsolete: true

Attachment #280777 - Flags: superreview?(roc)

Attachment #280777 - Flags: review?(roc)

Attachment #279495 - Flags: superreview?(roc)

Attachment #279495 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 54

•

17 years ago

> test of 48, 49 and 50 in the latest testcase What testcase? + u == 0x000A || // LINE FEED Don't treat this as nsLineBreaker::IsSpace or IS_SPACE +static inline PRBool IS_SPACE(PRUnichar u) Can we move this to a static inline function in nsILineBreaker? Then it could be shared with nsLineBreaker.h so we don't have to duplicate code and keep it consistent. There is a problem here which is that spaces which combine with an adjacent combining mark are not really breakable. I guess I will try to solve this at a higher level by changing the text that gets input to the linebreaker and textrun. + for (PRUint32 i = index; index - CONSERVATIVE_BREAK_RANGE < i;) { + if (IS_NO_BREAKABLE_SPACE(GetCharAt(--i))) Rewrite this to not use predecrement, please. I find it confusing. Also here: + if (GetCharAt(--i) == aCh) + PRUnichar ch = GetCharAt(--i); mHasNoBreakableSpace should be "mHasNonbreakableSpace" and IS_NO_BREAKABLE_SPACE should be IS_NONBREAKABLE_SPACE. ("mHasNoBreakableSpace" would mean that the word has no breakable spaces in it, which is always true. Also, "non" is not an English word on its own so we shouldn't treat it as one.) + // If this text has no-breakable space, we need to check whether the index + // is near it. Delete "If" --- at this point, we know the text has nonbreakable space. This is looking good.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 55

•

17 years ago

Comment on attachment 280777 [details] [diff] [review] Patch rv5.4 clearing request pending comments

Attachment #280777 - Flags: superreview?(roc)

Attachment #280777 - Flags: review?(roc)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 56

•

17 years ago

Attached patch Patch rv5.5 (obsolete) (deleted) — Details — Splinter Review

>> test of 48, 49 and 50 in the latest testcase > > What testcase? attachment 279353 [details].

Attachment #280777 - Attachment is obsolete: true

Attachment #281318 - Flags: superreview?(roc)

Attachment #281318 - Flags: review?(roc)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 57

•

17 years ago

Attached patch Patch rv5.5.1 (deleted) — Details — Splinter Review

er, sorry, I killed wrong line...

Attachment #281318 - Attachment is obsolete: true

Attachment #281322 - Flags: superreview?(roc)

Attachment #281322 - Flags: review?(roc)

Attachment #281318 - Flags: superreview?(roc)

Attachment #281318 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 58

•

17 years ago

Comment on attachment 281322 [details] [diff] [review] Patch rv5.5.1 OK I like this. Let's land it. Can you turn that testcase into a reftest?

Attachment #281322 - Flags: superreview?(roc)

Attachment #281322 - Flags: superreview+

Attachment #281322 - Flags: review?(roc)

Attachment #281322 - Flags: review+

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 59

•

17 years ago

checked-in the patch. (In reply to comment #58) > Can you turn that testcase into a reftest? O.K. I'll try to create it tonight.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 60

•

17 years ago

ah, my working files were lost by a trouble of my system... :-( please wait the reftests. sorry.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 61

•

17 years ago

Roc: (In reply to comment #48) > (In reply to comment #46) > (In reply to comment #47) > > Is semicolon really a URL parameter separator? > > HTML4 spec said: > > http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2 > > > We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner. > > But I don't know whether such implementations are on the Earth :-( I found such implementation: http://git.kernel.org/

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 62

•

17 years ago

Attached patch fix some nits. (obsolete) (deleted) — Details — Splinter Review

I found 3 bugs at creating reftests. 1. '?' is defined as CLASS_CLOSE, it should be CLASS_CLOSE_LIKE_CHARACTER for prohibiting the breaking between alphabets. 2. ContextState::GetPreviousNonHyphenCharacter returns PR_FALSE if mIndex == 0, it should be U_NULL. 3. The class of HYPHEN and FIGURE DASH is deferent from HYPHEN/MINUS in ASCII. They should be same class for same behavior. And for comment 61, we should break after SEMICOLON if it is a part of param of URL. We should respect the HTML spec. And PHP can use ';' as '&'. See http://www.php.net/manual/en/ini.core.php#ini.arg-separator.input

Attachment #281998 - Flags: superreview?(roc)

Attachment #281998 - Flags: review?(roc)

Attachment #281998 - Flags: approval1.9?

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 63

•

17 years ago

er, And EN DASH was not added to IS_HYPHEN, it should be hyphen in UAX#14.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 64

•

17 years ago

Attached patch fix some nits. v2. (obsolete) (deleted) — Details — Splinter Review

And the leaders (U+2024 ONE DOT LEADER, U+2025 TWO DOT LEADER and U+2026 HORIZONTAL ELLIPSIS) should be CLASS_CLOSE_LIKE_CHARACTER. (They should be same class as FULL STOP of ASCII.)

Attachment #281998 - Attachment is obsolete: true

Attachment #281999 - Flags: superreview?(roc)

Attachment #281999 - Flags: review?(roc)

Attachment #281999 - Flags: approval1.9?

Attachment #281998 - Flags: superreview?(roc)

Attachment #281998 - Flags: review?(roc)

Attachment #281998 - Flags: approval1.9?

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 65

•

17 years ago

Attached patch fix some nits. v3. (obsolete) (deleted) — Details — Splinter Review

hmmm.... The for loops were changed the meaning after roc's review... I hate the unsigned index loop :-(

Attachment #281999 - Attachment is obsolete: true

Attachment #282034 - Flags: superreview?(roc)

Attachment #282034 - Flags: review?(roc)

Attachment #282034 - Flags: approval1.9?

Attachment #281999 - Flags: superreview?(roc)

Attachment #281999 - Flags: review?(roc)

Attachment #281999 - Flags: approval1.9?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 66

•

17 years ago

Comment on attachment 282034 [details] [diff] [review] fix some nits. v3. you could write those for loops as for (PRUint32 i = mIndex; i > 0; --i) { if (GetCharAt(i - 1) == aCh) return PR_TRUE; }

Attachment #282034 - Flags: superreview?(roc)

Attachment #282034 - Flags: superreview+

Attachment #282034 - Flags: review?(roc)

Attachment #282034 - Flags: review+

Attachment #282034 - Flags: approval1.9?

Attachment #282034 - Flags: approval1.9+

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 67

•

17 years ago

Attached patch fix some nits. v4. (for check-in) (deleted) — Details — Splinter Review

Thank you roc.

Attachment #282034 - Attachment is obsolete: true

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 68

•

17 years ago

Attached patch reftests for line-breaking (deleted) — Details — Splinter Review

sorry, for the delay. This is a first post for the testing. If I find more tests, I'll post is in new bug.

Attachment #283979 - Flags: superreview?(roc)

Attachment #283979 - Flags: review?(roc)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 69

•

17 years ago

I updated the wiki to latest line-breaking spec. And I also removed the fixed issues documents. http://wiki.mozilla.org/Gecko:Line_Breaking

j.j.

Comment 70

•

17 years ago

What's the deeper sense of the "don't break near  " rule? Isn't it reasonable to break after the hyphen here: xxx xxx-yyyyyy ?

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 71

•

17 years ago

(In reply to comment #70) > Isn't it reasonable to break after the hyphen here: > xxx xxx-yyyyyy ? no, 'xxx-' is too short. non-breakable spaces might be word separator.

j.j.

Comment 72

•

17 years ago

But xxx xxx isn't too short, and xxx xxx-yyyyyy is long enough to destroy layout. Something like this should definitly break after "-": xxxxxxxxxxxxxxxxx xxxx-yyyyyyyyyyyyyyyy Furthermore, I dislike "don't break between hyphens". Very long chains of "-" should break IMHO (min. 6 characters?). And something like this should break also: -------------------------------------------------------------xxxxxxxxxxxxx---------------------------------------------------------------------------------------------

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 73

•

17 years ago

(In reply to comment #72) > But xxx xxx isn't too short, and xxx xxx-yyyyyy is long enough to > destroy layout. I have a question, do you think what cases? It seems that we cannot care your suggestion without hyphen. Isn't it a very special case? Note that we cannot support all cases on Earth. > Furthermore, I dislike "don't break between hyphens". > Very long chains of "-" should break IMHO (min. 6 characters?). I think that this case really needs prioritized line-breaking. So, we should not care this case on Gecko 1.9. Note that I know the current spec is not enough for many cases. Before this bug's patch was checked-in, very many ASCII characters were breakable. But it broke many text layout of Western language. See the bugs which was blocked by this bug. Current approach is designed with following: 1. Don't break western text layout compatibility with Fx2. 2. Should break in URI for Japanese marketing. 3. Should break after hyphen for some Western language. 4. Don't use a risky way.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Updated

•

17 years ago

Depends on: 399321

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 74

•

17 years ago

I don't think we should break long chains of consecutive hyphens. Someone's probably using them for graphic effect. It's best to err on the side of caution.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 75

•

17 years ago

Comment on attachment 283979 [details] [diff] [review] reftests for line-breaking + WON SIGN? These look good! Thanks!

Attachment #283979 - Flags: superreview?(roc)

Attachment #283979 - Flags: superreview+

Attachment #283979 - Flags: review?(roc)

Attachment #283979 - Flags: review+

Attachment #283979 - Flags: approval1.9+

Stuart Parmenter

Comment 76

•

17 years ago

can this be marked FIXED?

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Comment 77

•

17 years ago

er, exactly. Sorry. -> FIXED

Status: ASSIGNED → RESOLVED

Closed: 17 years ago

Resolution: --- → FIXED

Jeff Walden [:Waldo]

Updated

•

17 years ago

Flags: in-testsuite+

j.j.

Comment 78

•

17 years ago

(In reply to comment #74) > I don't think we should break long chains of consecutive hyphens. Someone's > probably using them for graphic effect. That's unlikely since IE, Safari3 and Opera9.2 break between any "--". (Opera 9.5, latest weekly, doesn't) > It's best to err on the side of caution. Isn't it best to reduce browser incompatibilities?

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Assignee

Updated

•

14 years ago

Depends on: 608721

Daniel Holbert [:dholbert]

Updated

•

12 years ago

Depends on: 805456

You need to log in before you can comment on or make changes to this bug.