Closed
Bug 161328
Opened 22 years ago
Closed 22 years ago
CJK string is not breakable before joined frames
Categories
(Core :: Layout, defect, P2)
Tracking
()
VERIFIED
FIXED
mozilla1.0.2
People
(Reporter: shanjian, Assigned: shanjian)
References
Details
(Keywords: intl, topembed+, Whiteboard: [adt2] [ETA 09/12])
Attachments
(4 files, 3 obsolete files)
(deleted),
text/html
|
Details | |
(deleted),
text/html
|
Details | |
(deleted),
patch
|
rbs
:
review+
rbs
:
superreview+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
shanjian
:
review+
shanjian
:
superreview+
chofmann
:
approval+
|
Details | Diff | Splinter Review |
This bug is splitted from bug 160547.
For CJK string (or mixed) without space, we failed to mark it breakable when
processing joined word.
Using following string as an example, if joined string
"..123456789.123456789.123456789." is later found not fit , it should be wrapped
to next line. But we failed to do so because the CJK part of the string is not
marked as breakable.,
Ç°ÒÑÇ°ÒÑÇ°ÒÑÇ°ÒÑÇ°ÒÑÇ°ÒÑ.<font>.123456789.123456789.123456789.</font>
Assignee | ||
Comment 1•22 years ago
|
||
Assignee | ||
Comment 2•22 years ago
|
||
from bug 160547:
The name "aIsBreakable" is a little misleading. If it is false, it indicates
that the string is the first word in the line, and must be kept there
(unbreakable). Otherwise, it means the current word can be moved to next line.
In MeasureText function, aTextData.mIsBreakable is set to true when a space
character is first met. This trick works perfect for latin language, but not for
language that does not use space as word breaker, like CJK. A more generic
solution is to set aTextData.mIsBreakable to true after the first word is
processed.
In Yuying's testcase, since there is not space character before dot, the whole
string is considered unbreakable, which mean it must remain in the current line.
but that is incorrect. It should break after the last chinese character. Not to
break dot and dot (ie. "..") is a correct behavior.
I
Status: NEW → ASSIGNED
Assignee | ||
Comment 3•22 years ago
|
||
Assignee | ||
Comment 4•22 years ago
|
||
*** Bug 161319 has been marked as a duplicate of this bug. ***
Comment 5•22 years ago
|
||
So... are there two independent issues here?
1) What should aTextData.mIsBreakable be set to?
2) What should aIsBreakable be set to?
Note that my original intent was to set the latter based on aTextData.mWrapping
(which I did not notice at the time, so went with mIsBreakable instead).
I'm not sure what mIsBreakable being misset affects right now, other than the
value of aIsBreakable... but it sounds like we want to change the way we set
mIsBreakable no matter what and then consider what aIsBreakable should be set to....
Comment 6•22 years ago
|
||
A test page that contains some double byte period sign case.
Also by following html standard, is it legal that a single byte dot "." in the
beginning of line in CJK pages? if we break line like:
<CJK characters>
.12345
Assignee | ||
Comment 7•22 years ago
|
||
>1) What should aTextData.mIsBreakable be set to?
This variable should only be set to true for first word in the line.
>2) What should aIsBreakable be set to?
It is reasonable to use aTextData.mIsBreakable. aTextData.mWrapping is not a
good choice. If we can't keep at least one word in the line, it is possible to
enter a endless loop.
Assignee | ||
Comment 8•22 years ago
|
||
> Also by following html standard, is it legal that a single byte dot "." in the
> beginning of line in CJK pages? if we break line like:
> <CJK characters>
> .12345
Yes. "." can be interpret together with number as a decimal number. In fact,
there is no html
standard for this.
Updated•22 years ago
|
Priority: -- → P3
Assignee | ||
Comment 9•22 years ago
|
||
Boris, could you review my patch?
Comment 10•22 years ago
|
||
Comment on attachment 94204 [details] [diff] [review]
patch (include patch for 160547)
>+ if (!aTextData.mIsBreakable && !firstThing && !isWhitespace)
>+ aTextData.mIsBreakable = PR_TRUE;
>+ }
Why not just
if (!firstThing && !isWhitespace) {
aTextData.mIsBreakable = PR_TRUE;
}
? That's just a no-op if aTextData.mIsBreakable is already true, so...
You're missing a '{' after that |if|'s condition. Have you compiled and tested
this? ;)
sr=bzbarsky otherwise, but someone who really understands this code needs to do
the r=...
Attachment #94204 -
Flags: superreview+
Assignee | ||
Comment 11•22 years ago
|
||
> Why not just
> if (!firstThing && !isWhitespace) {
> aTextData.mIsBreakable = PR_TRUE;
> }
!firstThing is almost always true except the first time, and isWhitespace varies.
But aTextData.mIsBreakable only only be false the for the first word. To test
aTextData.mIsBreakable first will save 2 testing for most situations. That's a
performance consideration.
> You're missing a '{' after that |if|'s condition. Have you compiled and tested
> this? ;)
I found out that when I compiled the patch, but forgot to update the patch
before I posted it.
Comment 12•22 years ago
|
||
OK. Sounds good, then.
Assignee | ||
Comment 13•22 years ago
|
||
frank raised a question regarding to this bug and 161139. He asked if we did
something inconsistent in wordbreaking when measuring text for cell length and
when measuring text for rendering? In my understanding, maxWordLength is used to
determine cell length, but we failed to break 2 words. That leads to a long word
longer than the cell width calculated from maxwordlength.
Assignee | ||
Comment 14•22 years ago
|
||
rbs, could you r=?
Comment 15•22 years ago
|
||
From comment 2, comment 5, and bug 160547 comment 8, it is now apparent that the
name "aIsBreakable" is the cause of much confusion. So it can be changed to
avoid further confusion in the future. What about using aTextData.mIsFirstWord
(i.e., rename and negate the clauses throughout). With the rename, it will be
clear why you still needed the |else|, and it might help to avoid other
regressions.
Wasn't that possible for the word-breaker to break after the punctuations, i.e.,
after "..", "...", as generally the case in typography? People often do
"<a>link</a>." or "<a>link</a>,"
Assignee | ||
Comment 16•22 years ago
|
||
A second thought about this,and I think the existing name is OK. In the context,
we really don't care if it is the first word or not, but only care if it is
breakable. A more accurate description might be "aIsBreakableBefore", but that
sounds a little bit clumsy. After we make it clear that word is processed as a
unit, and word is unbreakable, the name stands well. "mIsFirstWord" has its own
problem. Is whitespace a word? Non-skipping whitespace make next word breakable,
and skipping whitespace does not.
> Wasn't that possible for the word-breaker to break after the punctuations, i.e.,
> after "..", "...", as generally the case in typography? People often do
> "<a>link</a>." or "<a>link</a>,"
that's a separate issue. We considered about this when design the word breaker.
In order to handle smily face sequence, like :-), we can't break just because it
is after the punc.
Comment 17•22 years ago
|
||
Since there is now ample evidence that the other name is confusing, I like
"aIsBreakableBefore". r=rbs with that.
(LXR:breakbefore shows that the name isn't as clumsy as it seems.)
Assignee | ||
Comment 18•22 years ago
|
||
I just got another idea. Howabout "aIsWrappable"?
Comment 19•22 years ago
|
||
Not as interesting. If you don't like very much aIsBreakableBefore, what about
aCanBreakBefore?
My line of thought was for a name such that when you do
if (aCanBreakBefore)
... do this
else // ---> nobody wonders why there is still an 'else'...
... do that
Assignee | ||
Comment 20•22 years ago
|
||
Attachment #94204 -
Attachment is obsolete: true
Assignee | ||
Comment 21•22 years ago
|
||
Comment on attachment 94933 [details] [diff] [review]
new patch (IsBreakable replaced CanBreakBefore)
carry over r/sr
Attachment #94933 -
Flags: superreview+
Attachment #94933 -
Flags: review+
Assignee | ||
Comment 22•22 years ago
|
||
fix checked in to trunk.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Comment 23•22 years ago
|
||
This caused bug 162670, which is a smoketest blocker. (The browser is hanging
reliably on bonsai query results and intermittently on various bugzilla pages.)
Assignee | ||
Comment 24•22 years ago
|
||
backed out my patch, reopen this bug. I need to figure out why it cause the
regression.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 25•22 years ago
|
||
Please say "Chinese Japenese Korean" in the comments you check in. It took me a
while to figure out what it means, not everyone knows that. Thanks :)
Assignee | ||
Comment 26•22 years ago
|
||
Attachment #94933 -
Attachment is obsolete: true
Assignee | ||
Comment 27•22 years ago
|
||
My guess what's happening:
When the currect frame only contains part of the word, mCanBreakBefore is set to
true in 2nd loop. Since there is no more text in current frame, we will continue
to check next frame to check the possibility of joined word. Problem will happen
if the joined word is too long. Since mCanBreakBefore is set to true, we back up
the part, and thus make no progress in the whole call. That will lead to endless
loop.
Follow this theory, I made a patch and it fix the regression.
boris, rbs, r/sr?
Status: REOPENED → ASSIGNED
Comment 28•22 years ago
|
||
Comment on attachment 95472 [details] [diff] [review]
new patch to address the regression
sr=bzbarsky
Attachment #95472 -
Flags: superreview+
Comment 29•22 years ago
|
||
Comment on attachment 95472 [details] [diff] [review]
new patch to address the regression
r=rbs
Attachment #95472 -
Flags: review+
Assignee | ||
Comment 30•22 years ago
|
||
Assignee | ||
Comment 31•22 years ago
|
||
Just as rbs pointed out in bug 160547, the regression is because we failed to do
the "RevertSpacesToNBSP" before line break. To check if the 2 connecting piece
is breakable in between is absolutely correct. Since this is a linebreaker call,
"RevertSpacesToNBSP" should also be called in this case. As a result,
"RevertSpacesToNBSP" should be called regardless of "aCanBreakBefore".
Assignee | ||
Updated•22 years ago
|
Attachment #95472 -
Attachment is obsolete: true
Assignee | ||
Comment 32•22 years ago
|
||
boris, rbs, r/sr?
Comment 33•22 years ago
|
||
Comment on attachment 96368 [details] [diff] [review]
revised patch to address regression 163034
r/sr=rbs
bz isn't there -- but I think you can carry his review in this instance (known
regression if the patch isn't updated).
Attachment #96368 -
Flags: superreview+
Attachment #96368 -
Flags: review+
Comment 34•22 years ago
|
||
Cc:ing jrgm so that he can keep this in his pageload radar when the patch lands.
Assignee | ||
Comment 35•22 years ago
|
||
checked in to trunk.
We need to verify this ASAP so that we are confident to land it one branch.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago → 22 years ago
Resolution: --- → FIXED
Comment 36•22 years ago
|
||
I verified this in 2002-08-24-04-trunk Win32 and Linux builds. However, we have
not received new Mac trunk build, yet. I cannot verify this on Mac build.
Keywords: intl
Comment 37•22 years ago
|
||
posthumous sr=bzbarsky for attachment 96368 [details] [diff] [review]
Comment 38•22 years ago
|
||
Verified fixed in 8-26-08 Mac trunk build.
Comment 39•22 years ago
|
||
petersen: Chris, is it ok that we mark this as verified per Comment #38 From
Teruko Kobayashi? Thanks!
This is needed by a major embeddor. Marking as Topembed+ per Valeski.
Comment 40•22 years ago
|
||
Verified on Windows ME (2002-09-10-08 ) and OS X (2002-09-09-08) Trunk.
Status: RESOLVED → VERIFIED
Comment 41•22 years ago
|
||
Is this still wanted for 1.0.2? Is the final patch there one that will apply
against branch cleanly?
Comment 42•22 years ago
|
||
Thanks for asking Randell. The answer is *YES*, we still need this checked into
the 1.0 branch, prior to Mozilla1.0.2 being tagged, to satisfy the need of a
majpr gecko embedding customer.
Assignee | ||
Comment 43•22 years ago
|
||
Assignee | ||
Comment 44•22 years ago
|
||
Comment on attachment 98676 [details] [diff] [review]
the patch updated for branch
This patch is a little different from the trunk one because bug 160547 was
checked into the branch but not trunk. So the trunk patch in fact includes the
patch for 160547.
Carry over r/sr
Attachment #98676 -
Flags: superreview+
Attachment #98676 -
Flags: review+
Assignee | ||
Comment 45•22 years ago
|
||
I verified the patch on branch with test case in bug 160547, 161328, and
162034.
Comment 46•22 years ago
|
||
Comment on attachment 98676 [details] [diff] [review]
the patch updated for branch
a=chofmann for 1.0.2
Attachment #98676 -
Flags: approval+
Comment 47•22 years ago
|
||
Marking as mozilla1.0.2+ per Comment #46 From chris hofmann. pls check this in
asap, then replace "mozilla1.0.2+" with "fixed1.0.2". thanks!
Whiteboard: [adt2] → [adt2] [ETA 09/12]
You need to log in
before you can comment on or make changes to this bug.
Description
•