Closed Bug 161328 Opened 22 years ago Closed 22 years ago

CJK string is not breakable before joined frames

Categories

(Core :: Layout, defect, P2)

x86
All
defect

Tracking

()

VERIFIED FIXED
mozilla1.0.2

People

(Reporter: shanjian, Assigned: shanjian)

References

Details

(Keywords: intl, topembed+, Whiteboard: [adt2] [ETA 09/12])

Attachments

(4 files, 3 obsolete files)

This bug is splitted from bug 160547. For CJK string (or mixed) without space, we failed to mark it breakable when processing joined word. Using following string as an example, if joined string "..123456789.123456789.123456789." is later found not fit , it should be wrapped to next line. But we failed to do so because the CJK part of the string is not marked as breakable., Ç°ÒÑÇ°ÒÑÇ°ÒÑÇ°ÒÑÇ°ÒÑÇ°ÒÑ.<font>.123456789.123456789.123456789.</font>
Attached file test case (deleted) —
from bug 160547: The name "aIsBreakable" is a little misleading. If it is false, it indicates that the string is the first word in the line, and must be kept there (unbreakable). Otherwise, it means the current word can be moved to next line. In MeasureText function, aTextData.mIsBreakable is set to true when a space character is first met. This trick works perfect for latin language, but not for language that does not use space as word breaker, like CJK. A more generic solution is to set aTextData.mIsBreakable to true after the first word is processed. In Yuying's testcase, since there is not space character before dot, the whole string is considered unbreakable, which mean it must remain in the current line. but that is incorrect. It should break after the last chinese character. Not to break dot and dot (ie. "..") is a correct behavior. I
Status: NEW → ASSIGNED
Attached patch patch (include patch for 160547) (obsolete) (deleted) — Splinter Review
*** Bug 161319 has been marked as a duplicate of this bug. ***
So... are there two independent issues here? 1) What should aTextData.mIsBreakable be set to? 2) What should aIsBreakable be set to? Note that my original intent was to set the latter based on aTextData.mWrapping (which I did not notice at the time, so went with mIsBreakable instead). I'm not sure what mIsBreakable being misset affects right now, other than the value of aIsBreakable... but it sounds like we want to change the way we set mIsBreakable no matter what and then consider what aIsBreakable should be set to....
A test page that contains some double byte period sign case. Also by following html standard, is it legal that a single byte dot "." in the beginning of line in CJK pages? if we break line like: <CJK characters> .12345
>1) What should aTextData.mIsBreakable be set to? This variable should only be set to true for first word in the line. >2) What should aIsBreakable be set to? It is reasonable to use aTextData.mIsBreakable. aTextData.mWrapping is not a good choice. If we can't keep at least one word in the line, it is possible to enter a endless loop.
> Also by following html standard, is it legal that a single byte dot "." in the > beginning of line in CJK pages? if we break line like: > <CJK characters> > .12345 Yes. "." can be interpret together with number as a decimal number. In fact, there is no html standard for this.
Priority: -- → P3
Boris, could you review my patch?
Comment on attachment 94204 [details] [diff] [review] patch (include patch for 160547) >+ if (!aTextData.mIsBreakable && !firstThing && !isWhitespace) >+ aTextData.mIsBreakable = PR_TRUE; >+ } Why not just if (!firstThing && !isWhitespace) { aTextData.mIsBreakable = PR_TRUE; } ? That's just a no-op if aTextData.mIsBreakable is already true, so... You're missing a '{' after that |if|'s condition. Have you compiled and tested this? ;) sr=bzbarsky otherwise, but someone who really understands this code needs to do the r=...
Attachment #94204 - Flags: superreview+
> Why not just > if (!firstThing && !isWhitespace) { > aTextData.mIsBreakable = PR_TRUE; > } !firstThing is almost always true except the first time, and isWhitespace varies. But aTextData.mIsBreakable only only be false the for the first word. To test aTextData.mIsBreakable first will save 2 testing for most situations. That's a performance consideration. > You're missing a '{' after that |if|'s condition. Have you compiled and tested > this? ;) I found out that when I compiled the patch, but forgot to update the patch before I posted it.
OK. Sounds good, then.
frank raised a question regarding to this bug and 161139. He asked if we did something inconsistent in wordbreaking when measuring text for cell length and when measuring text for rendering? In my understanding, maxWordLength is used to determine cell length, but we failed to break 2 words. That leads to a long word longer than the cell width calculated from maxwordlength.
rbs, could you r=?
From comment 2, comment 5, and bug 160547 comment 8, it is now apparent that the name "aIsBreakable" is the cause of much confusion. So it can be changed to avoid further confusion in the future. What about using aTextData.mIsFirstWord (i.e., rename and negate the clauses throughout). With the rename, it will be clear why you still needed the |else|, and it might help to avoid other regressions. Wasn't that possible for the word-breaker to break after the punctuations, i.e., after "..", "...", as generally the case in typography? People often do "<a>link</a>." or "<a>link</a>,"
A second thought about this,and I think the existing name is OK. In the context, we really don't care if it is the first word or not, but only care if it is breakable. A more accurate description might be "aIsBreakableBefore", but that sounds a little bit clumsy. After we make it clear that word is processed as a unit, and word is unbreakable, the name stands well. "mIsFirstWord" has its own problem. Is whitespace a word? Non-skipping whitespace make next word breakable, and skipping whitespace does not. > Wasn't that possible for the word-breaker to break after the punctuations, i.e., > after "..", "...", as generally the case in typography? People often do > "<a>link</a>." or "<a>link</a>," that's a separate issue. We considered about this when design the word breaker. In order to handle smily face sequence, like :-), we can't break just because it is after the punc.
Since there is now ample evidence that the other name is confusing, I like "aIsBreakableBefore". r=rbs with that. (LXR:breakbefore shows that the name isn't as clumsy as it seems.)
I just got another idea. Howabout "aIsWrappable"?
Not as interesting. If you don't like very much aIsBreakableBefore, what about aCanBreakBefore? My line of thought was for a name such that when you do if (aCanBreakBefore) ... do this else // ---> nobody wonders why there is still an 'else'... ... do that
Attached patch new patch (IsBreakable replaced CanBreakBefore) (obsolete) (deleted) — Splinter Review
Attachment #94204 - Attachment is obsolete: true
Comment on attachment 94933 [details] [diff] [review] new patch (IsBreakable replaced CanBreakBefore) carry over r/sr
Attachment #94933 - Flags: superreview+
Attachment #94933 - Flags: review+
fix checked in to trunk.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
This caused bug 162670, which is a smoketest blocker. (The browser is hanging reliably on bonsai query results and intermittently on various bugzilla pages.)
backed out my patch, reopen this bug. I need to figure out why it cause the regression.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Please say "Chinese Japenese Korean" in the comments you check in. It took me a while to figure out what it means, not everyone knows that. Thanks :)
Attached patch new patch to address the regression (obsolete) (deleted) — Splinter Review
Attachment #94933 - Attachment is obsolete: true
My guess what's happening: When the currect frame only contains part of the word, mCanBreakBefore is set to true in 2nd loop. Since there is no more text in current frame, we will continue to check next frame to check the possibility of joined word. Problem will happen if the joined word is too long. Since mCanBreakBefore is set to true, we back up the part, and thus make no progress in the whole call. That will lead to endless loop. Follow this theory, I made a patch and it fix the regression. boris, rbs, r/sr?
Status: REOPENED → ASSIGNED
Comment on attachment 95472 [details] [diff] [review] new patch to address the regression sr=bzbarsky
Attachment #95472 - Flags: superreview+
Comment on attachment 95472 [details] [diff] [review] new patch to address the regression r=rbs
Attachment #95472 - Flags: review+
Just as rbs pointed out in bug 160547, the regression is because we failed to do the "RevertSpacesToNBSP" before line break. To check if the 2 connecting piece is breakable in between is absolutely correct. Since this is a linebreaker call, "RevertSpacesToNBSP" should also be called in this case. As a result, "RevertSpacesToNBSP" should be called regardless of "aCanBreakBefore".
Attachment #95472 - Attachment is obsolete: true
boris, rbs, r/sr?
Comment on attachment 96368 [details] [diff] [review] revised patch to address regression 163034 r/sr=rbs bz isn't there -- but I think you can carry his review in this instance (known regression if the patch isn't updated).
Attachment #96368 - Flags: superreview+
Attachment #96368 - Flags: review+
Cc:ing jrgm so that he can keep this in his pageload radar when the patch lands.
checked in to trunk. We need to verify this ASAP so that we are confident to land it one branch.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago22 years ago
Resolution: --- → FIXED
Blocks: 163034
I verified this in 2002-08-24-04-trunk Win32 and Linux builds. However, we have not received new Mac trunk build, yet. I cannot verify this on Mac build.
Keywords: intl
Verified fixed in 8-26-08 Mac trunk build.
petersen: Chris, is it ok that we mark this as verified per Comment #38 From Teruko Kobayashi? Thanks! This is needed by a major embeddor. Marking as Topembed+ per Valeski.
Severity: normal → major
Keywords: nsbeta1+, topembed+
Priority: P3 → P2
Whiteboard: [adt2]
Target Milestone: --- → mozilla1.0.2
Verified on Windows ME (2002-09-10-08 ) and OS X (2002-09-09-08) Trunk.
Status: RESOLVED → VERIFIED
Is this still wanted for 1.0.2? Is the final patch there one that will apply against branch cleanly?
Thanks for asking Randell. The answer is *YES*, we still need this checked into the 1.0 branch, prior to Mozilla1.0.2 being tagged, to satisfy the need of a majpr gecko embedding customer.
Attached patch the patch updated for branch (deleted) — Splinter Review
Comment on attachment 98676 [details] [diff] [review] the patch updated for branch This patch is a little different from the trunk one because bug 160547 was checked into the branch but not trunk. So the trunk patch in fact includes the patch for 160547. Carry over r/sr
Attachment #98676 - Flags: superreview+
Attachment #98676 - Flags: review+
I verified the patch on branch with test case in bug 160547, 161328, and 162034.
Comment on attachment 98676 [details] [diff] [review] the patch updated for branch a=chofmann for 1.0.2
Attachment #98676 - Flags: approval+
Marking as mozilla1.0.2+ per Comment #46 From chris hofmann. pls check this in asap, then replace "mozilla1.0.2+" with "fixed1.0.2". thanks!
Whiteboard: [adt2] → [adt2] [ETA 09/12]
fix checked into branch.
Verified on branch.
Keywords: verified1.0.2
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: