Closed Bug 161328 Opened 22 years ago Closed 22 years ago

CJK string is not breakable before joined frames

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla1.0.2

People

(Reporter: shanjian, Assigned: shanjian)

References

Details

(Keywords: intl, topembed+, Whiteboard: [adt2] [ETA 09/12])

Attachments

(4 files, 3 obsolete files)

test case 22 years ago Shanjian Li (deleted), text/html		Details
patch (include patch for 160547) 22 years ago Shanjian Li (deleted), patch	bzbarsky : superreview+	Details \| Diff \| Splinter Review
a test case copied from dup bug 161319 22 years ago Yuying Long (deleted), text/html		Details
new patch (IsBreakable replaced CanBreakBefore) 22 years ago Shanjian Li (deleted), patch	shanjian : review+ shanjian : superreview+	Details \| Diff \| Splinter Review
new patch to address the regression 22 years ago Shanjian Li (deleted), patch	rbs : review+ bzbarsky : superreview+	Details \| Diff \| Splinter Review
revised patch to address regression 163034 22 years ago Shanjian Li (deleted), patch	rbs : review+ rbs : superreview+	Details \| Diff \| Splinter Review
the patch updated for branch 22 years ago Shanjian Li (deleted), patch	shanjian : review+ shanjian : superreview+ chofmann : approval+	Details \| Diff \| Splinter Review

Shanjian Li

Assignee

Description

•

22 years ago

This bug is splitted from bug 160547. For CJK string (or mixed) without space, we failed to mark it breakable when processing joined word. Using following string as an example, if joined string "..123456789.123456789.123456789." is later found not fit , it should be wrapped to next line. But we failed to do so because the CJK part of the string is not marked as breakable., Ç°ÒÑÇ°ÒÑÇ°ÒÑÇ°ÒÑÇ°ÒÑÇ°ÒÑ.<font>.123456789.123456789.123456789.</font>

Shanjian Li

Assignee

Comment 1

•

22 years ago

Attached file test case (deleted) — Details

Shanjian Li

Assignee

Comment 2

•

22 years ago

from bug 160547: The name "aIsBreakable" is a little misleading. If it is false, it indicates that the string is the first word in the line, and must be kept there (unbreakable). Otherwise, it means the current word can be moved to next line. In MeasureText function, aTextData.mIsBreakable is set to true when a space character is first met. This trick works perfect for latin language, but not for language that does not use space as word breaker, like CJK. A more generic solution is to set aTextData.mIsBreakable to true after the first word is processed. In Yuying's testcase, since there is not space character before dot, the whole string is considered unbreakable, which mean it must remain in the current line. but that is incorrect. It should break after the last chinese character. Not to break dot and dot (ie. "..") is a correct behavior. I

Status: NEW → ASSIGNED

Shanjian Li

Assignee

Comment 3

•

22 years ago

Attached patch patch (include patch for 160547) (obsolete) (deleted) — Details — Splinter Review

Shanjian Li

Assignee

Comment 4

•

22 years ago

*** Bug 161319 has been marked as a duplicate of this bug. ***

Boris Zbarsky [:bzbarsky]

Comment 5

•

22 years ago

So... are there two independent issues here? 1) What should aTextData.mIsBreakable be set to? 2) What should aIsBreakable be set to? Note that my original intent was to set the latter based on aTextData.mWrapping (which I did not notice at the time, so went with mIsBreakable instead). I'm not sure what mIsBreakable being misset affects right now, other than the value of aIsBreakable... but it sounds like we want to change the way we set mIsBreakable no matter what and then consider what aIsBreakable should be set to....

Yuying Long

Comment 6

•

22 years ago

Attached file a test case copied from dup bug 161319 (deleted) — Details

A test page that contains some double byte period sign case. Also by following html standard, is it legal that a single byte dot "." in the beginning of line in CJK pages? if we break line like: <CJK characters> .12345

Shanjian Li

Assignee

Comment 7

•

22 years ago

>1) What should aTextData.mIsBreakable be set to? This variable should only be set to true for first word in the line. >2) What should aIsBreakable be set to? It is reasonable to use aTextData.mIsBreakable. aTextData.mWrapping is not a good choice. If we can't keep at least one word in the line, it is possible to enter a endless loop.

Shanjian Li

Assignee

Comment 8

•

22 years ago

> Also by following html standard, is it legal that a single byte dot "." in the > beginning of line in CJK pages? if we break line like: > <CJK characters> > .12345 Yes. "." can be interpret together with number as a decimal number. In fact, there is no html standard for this.

Chris Petersen

Updated

•

22 years ago

Priority: -- → P3

Shanjian Li

Assignee

Comment 9

•

22 years ago

Boris, could you review my patch?

Boris Zbarsky [:bzbarsky]

Comment 10

•

22 years ago

Comment on attachment 94204 [details] [diff] [review] patch (include patch for 160547) >+ if (!aTextData.mIsBreakable && !firstThing && !isWhitespace) >+ aTextData.mIsBreakable = PR_TRUE; >+ } Why not just if (!firstThing && !isWhitespace) { aTextData.mIsBreakable = PR_TRUE; } ? That's just a no-op if aTextData.mIsBreakable is already true, so... You're missing a '{' after that |if|'s condition. Have you compiled and tested this? ;) sr=bzbarsky otherwise, but someone who really understands this code needs to do the r=...

Attachment #94204 - Flags: superreview+

Shanjian Li

Assignee

Comment 11

•

22 years ago

> Why not just > if (!firstThing && !isWhitespace) { > aTextData.mIsBreakable = PR_TRUE; > } !firstThing is almost always true except the first time, and isWhitespace varies. But aTextData.mIsBreakable only only be false the for the first word. To test aTextData.mIsBreakable first will save 2 testing for most situations. That's a performance consideration. > You're missing a '{' after that |if|'s condition. Have you compiled and tested > this? ;) I found out that when I compiled the patch, but forgot to update the patch before I posted it.

Boris Zbarsky [:bzbarsky]

Comment 12

•

22 years ago

OK. Sounds good, then.

Shanjian Li

Assignee

Comment 13

•

22 years ago

frank raised a question regarding to this bug and 161139. He asked if we did something inconsistent in wordbreaking when measuring text for cell length and when measuring text for rendering? In my understanding, maxWordLength is used to determine cell length, but we failed to break 2 words. That leads to a long word longer than the cell width calculated from maxwordlength.

Shanjian Li

Assignee

Comment 14

•

22 years ago

rbs, could you r=?

rbs

Comment 15

•

22 years ago

From comment 2, comment 5, and bug 160547 comment 8, it is now apparent that the name "aIsBreakable" is the cause of much confusion. So it can be changed to avoid further confusion in the future. What about using aTextData.mIsFirstWord (i.e., rename and negate the clauses throughout). With the rename, it will be clear why you still needed the |else|, and it might help to avoid other regressions. Wasn't that possible for the word-breaker to break after the punctuations, i.e., after "..", "...", as generally the case in typography? People often do "<a>link</a>." or "<a>link</a>,"

Shanjian Li

Assignee

Comment 16

•

22 years ago

A second thought about this,and I think the existing name is OK. In the context, we really don't care if it is the first word or not, but only care if it is breakable. A more accurate description might be "aIsBreakableBefore", but that sounds a little bit clumsy. After we make it clear that word is processed as a unit, and word is unbreakable, the name stands well. "mIsFirstWord" has its own problem. Is whitespace a word? Non-skipping whitespace make next word breakable, and skipping whitespace does not. > Wasn't that possible for the word-breaker to break after the punctuations, i.e., > after "..", "...", as generally the case in typography? People often do > "<a>link</a>." or "<a>link</a>," that's a separate issue. We considered about this when design the word breaker. In order to handle smily face sequence, like :-), we can't break just because it is after the punc.

rbs

Comment 17

•

22 years ago

Since there is now ample evidence that the other name is confusing, I like "aIsBreakableBefore". r=rbs with that. (LXR:breakbefore shows that the name isn't as clumsy as it seems.)

Shanjian Li

Assignee

Comment 18

•

22 years ago

I just got another idea. Howabout "aIsWrappable"?

rbs

Comment 19

•

22 years ago

Not as interesting. If you don't like very much aIsBreakableBefore, what about aCanBreakBefore? My line of thought was for a name such that when you do if (aCanBreakBefore) ... do this else // ---> nobody wonders why there is still an 'else'... ... do that

Shanjian Li

Assignee

Comment 20

•

22 years ago

Attached patch new patch (IsBreakable replaced CanBreakBefore) (obsolete) (deleted) — Details — Splinter Review

Attachment #94204 - Attachment is obsolete: true

Shanjian Li

Assignee

Comment 21

•

22 years ago

Comment on attachment 94933 [details] [diff] [review] new patch (IsBreakable replaced CanBreakBefore) carry over r/sr

Attachment #94933 - Flags: superreview+

Attachment #94933 - Flags: review+

Shanjian Li

Assignee

Comment 22

•

22 years ago

fix checked in to trunk.

Status: ASSIGNED → RESOLVED

Closed: 22 years ago

Resolution: --- → FIXED

David Baron :dbaron:

Comment 23

•

22 years ago

This caused bug 162670, which is a smoketest blocker. (The browser is hanging reliably on bonsai query results and intermittently on various bugzilla pages.)

Shanjian Li

Assignee

Comment 24

•

22 years ago

backed out my patch, reopen this bug. I need to figure out why it cause the regression.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Aaron Leventhal

Comment 25

•

22 years ago

Please say "Chinese Japenese Korean" in the comments you check in. It took me a while to figure out what it means, not everyone knows that. Thanks :)

Shanjian Li

Assignee

Comment 26

•

22 years ago

Attached patch new patch to address the regression (obsolete) (deleted) — Details — Splinter Review

Attachment #94933 - Attachment is obsolete: true

Shanjian Li

Assignee

Comment 27

•

22 years ago

My guess what's happening: When the currect frame only contains part of the word, mCanBreakBefore is set to true in 2nd loop. Since there is no more text in current frame, we will continue to check next frame to check the possibility of joined word. Problem will happen if the joined word is too long. Since mCanBreakBefore is set to true, we back up the part, and thus make no progress in the whole call. That will lead to endless loop. Follow this theory, I made a patch and it fix the regression. boris, rbs, r/sr?

Status: REOPENED → ASSIGNED

Boris Zbarsky [:bzbarsky]

Comment 28

•

22 years ago

Comment on attachment 95472 [details] [diff] [review] new patch to address the regression sr=bzbarsky

Attachment #95472 - Flags: superreview+

rbs

Comment 29

•

22 years ago

Comment on attachment 95472 [details] [diff] [review] new patch to address the regression r=rbs

Attachment #95472 - Flags: review+

Shanjian Li

Assignee

Comment 30

•

22 years ago

Attached patch revised patch to address regression 163034 (deleted) — Details — Splinter Review

Shanjian Li

Assignee

Comment 31

•

22 years ago

Just as rbs pointed out in bug 160547, the regression is because we failed to do the "RevertSpacesToNBSP" before line break. To check if the 2 connecting piece is breakable in between is absolutely correct. Since this is a linebreaker call, "RevertSpacesToNBSP" should also be called in this case. As a result, "RevertSpacesToNBSP" should be called regardless of "aCanBreakBefore".

Shanjian Li

Assignee

Updated

•

22 years ago

Attachment #95472 - Attachment is obsolete: true

Shanjian Li

Assignee

Comment 32

•

22 years ago

boris, rbs, r/sr?

rbs

Comment 33

•

22 years ago

Comment on attachment 96368 [details] [diff] [review] revised patch to address regression 163034 r/sr=rbs bz isn't there -- but I think you can carry his review in this instance (known regression if the patch isn't updated).

Attachment #96368 - Flags: superreview+

Attachment #96368 - Flags: review+

rbs

Comment 34

•

22 years ago

Cc:ing jrgm so that he can keep this in his pageload radar when the patch lands.

Shanjian Li

Assignee

Comment 35

•

22 years ago

checked in to trunk. We need to verify this ASAP so that we are confident to land it one branch.

Status: ASSIGNED → RESOLVED

Closed: 22 years ago → 22 years ago

Resolution: --- → FIXED

Malcolm Rowe

Updated

•

22 years ago

Blocks: 163034

Teruko Kobayashi

Comment 36

•

22 years ago

I verified this in 2002-08-24-04-trunk Win32 and Linux builds. However, we have not received new Mac trunk build, yet. I cannot verify this on Mac build.

Keywords: intl

Boris Zbarsky [:bzbarsky]

Comment 37

•

22 years ago

posthumous sr=bzbarsky for attachment 96368 [details] [diff] [review]

Teruko Kobayashi

Comment 38

•

22 years ago

Verified fixed in 8-26-08 Mac trunk build.

Jaime Rodriguez, Jr.

Comment 39

•

22 years ago

petersen: Chris, is it ok that we mark this as verified per Comment #38 From Teruko Kobayashi? Thanks! This is needed by a major embeddor. Marking as Topembed+ per Valeski.

Severity: normal → major

Keywords: nsbeta1+, topembed+

Priority: P3 → P2

Whiteboard: [adt2]

Target Milestone: --- → mozilla1.0.2

Jaime Rodriguez, Jr.

Updated

•

22 years ago

Blocks: 154896

Keywords: approval, edt1.0.2, mozilla1.0.2

Chris Petersen

Comment 40

•

22 years ago

Verified on Windows ME (2002-09-10-08 ) and OS X (2002-09-09-08) Trunk.

Status: RESOLVED → VERIFIED

Randell Jesup [:jesup] (needinfo me)

Comment 41

•

22 years ago

Is this still wanted for 1.0.2? Is the final patch there one that will apply against branch cleanly?

Jaime Rodriguez, Jr.

Comment 42

•

22 years ago

Thanks for asking Randell. The answer is *YES*, we still need this checked into the 1.0 branch, prior to Mozilla1.0.2 being tagged, to satisfy the need of a majpr gecko embedding customer.

Shanjian Li

Assignee

Comment 43

•

22 years ago

Attached patch the patch updated for branch (deleted) — Details — Splinter Review

Shanjian Li

Assignee

Comment 44

•

22 years ago

Comment on attachment 98676 [details] [diff] [review] the patch updated for branch This patch is a little different from the trunk one because bug 160547 was checked into the branch but not trunk. So the trunk patch in fact includes the patch for 160547. Carry over r/sr

Attachment #98676 - Flags: superreview+

Attachment #98676 - Flags: review+

Shanjian Li

Assignee

Comment 45

•

22 years ago

I verified the patch on branch with test case in bug 160547, 161328, and 162034.

chris hofmann

Comment 46

•

22 years ago

Comment on attachment 98676 [details] [diff] [review] the patch updated for branch a=chofmann for 1.0.2

Attachment #98676 - Flags: approval+

Jaime Rodriguez, Jr.

Comment 47

•

22 years ago

Marking as mozilla1.0.2+ per Comment #46 From chris hofmann. pls check this in asap, then replace "mozilla1.0.2+" with "fixed1.0.2". thanks!

Keywords: edt1.0.2, mozilla1.0.2 → edt1.0.2+, mozilla1.0.2+

Whiteboard: [adt2] → [adt2] [ETA 09/12]

Shanjian Li

Assignee

Comment 48

•

22 years ago

fix checked into branch.

Keywords: mozilla1.0.2+ → fixed1.0.2

Chris Petersen

Comment 49

•

22 years ago

Verified on branch.

Keywords: verified1.0.2

You need to log in before you can comment on or make changes to this bug.