Closed
Bug 36322
Opened 25 years ago
Closed 20 years ago
Japanese text justification
Categories
(Core :: Layout: Text and Fonts, defect, P5)
Core
Layout: Text and Fonts
Tracking
()
RESOLVED
FIXED
Future
People
(Reporter: emk, Assigned: masayuki)
References
()
Details
(Keywords: intl)
Attachments
(3 files, 8 obsolete files)
(deleted),
image/png
|
Details | |
(deleted),
text/html
|
Details | |
(deleted),
patch
|
Details | Diff | Splinter Review |
Steps to reproduce:
1. Launch Mozilla.
2. Navigate to the following URL.
http://www.asahi-net.or.jp/%7Esd5a-ucd/css1/test/ja/sec546.htm
Actual result:
The fourth paragraph is not justified at all.
Expected result:
We should justify Japanese text using inter-letter spacing since Japanese text
may have no white space.
IE4 works as expected.
IE5 works as expected if we also assign 'text-justify: inter-ideograph'.
here's a fun for you buster. You probably thought you were done with
justification, huh?
Assignee: troy → buster
Robert: is this a task you'd be interested in doing as well?
Erik, Frank: could you guys comment on what the requirements are for non-latin
justification?
Status: NEW → ASSIGNED
Summary: Japanese text justifycation → Japanese text justification
Comment 3•25 years ago
|
||
Japanese text does not have spaces between words, so justification is performed
by inserting space between characters. Here is a link to the spec that MSIE is
using:
http://www.w3.org/TR/1999/WD-i18n-format-19990910/#a5-1
First of all, bug #36069 addresses some cleanup of the justification code that
should probably be taken care of before we worry about this.
I actually just noticed this issue myself at
http://www.pghchinesechurch.org/en/2000EasterCrusade/EasterCrusade.htm
It demonstrates how justified mixed Chinese and Latin text (mis?)behaves.
It probably would not be difficult to implement the modes "inter-word",
"inter-ideograph", "distribute" and "distribute-all-lines". (The "newspaper"
mode looks nightmarish.) For each of these modes, it seems that we can just
count "number of sites to insert justification space" (generalizing "number of
spaces"), and distribute justification space evenly among the sites.
The actually difficult work would be to correctly identify the sites in each
case. The "distribute" and "distribute-all-lines" modes are easy in that
respect, giving one site between each letter. (I assume that the output of
nsTextTransformer gives one Unicode char per letter; if not, there would be more
work to do here and letter-spacing would already be broken.)
For "inter-ideograph" mode, I suspect that nsILineBreaker would give the correct
breaks --- can someone confirm that?
I'm not sure what the intent of "inter-word" mode is. If the boundaries between
"words" are meant to be whitespace only, then we already implement it :-), but
it's not useful for CJK :-(. Otherwise, I'm not sure where the boundaries are
supposed to be...
The most rational approach might be to add a new "nsIJustificationBreaker" to
LWBrk, which takes the justification mode as a parameter. If someone provides
that, I can do the rest :-), but probably not for a couple of weeks.
Another issue is what mode we should actually use. Are we just going to pick one
mode and implement that, or are we actually going to acquire and use the style
information from this W3C draft? If we just pick one mode, which one?
By the way, I wonder if it's really a good idea for a W3C draft to have its
editor and contributors all owned by the same company, particularly when that
company is one where the phrase "putting the competition on a treadmill" has
wide currency.
Reporter | ||
Comment 5•25 years ago
|
||
>For "inter-ideograph" mode, I suspect that nsILineBreaker would give the
>correct breaks --- can someone confirm that?
I think so, too (at least Japanese and Latin script).
>Another issue is what mode we should actually use. Are we just going to pick
>one mode and implement that, or are we actually going to acquire and use the
>style information from this W3C draft? If we just pick one mode, which one?
The "inter-ideograph" (like) mode seems to be the most preferable.
The "inter-word" is useless for CJK (as you said). The "distribute"
and "distribute-all-lines" will be not desiable for Latin script.
>By the way, I wonder if it's really a good idea for a W3C draft to have its
>editor and contributors all owned by the same company, particularly when that
>company is one where the phrase "putting the competition on a treadmill" has
>wide currency.
I think we do not have to persist a Working-Draft. CSS2 spec only says:
|Note. The actual justification algorithm used is user-agent and written
|language dependent.
Robert: are you interested in owning this bug? I'd be grateful.
Assignee: buster → roc+moz
Status: ASSIGNED → NEW
Sure, no problem.
Status: NEW → ASSIGNED
OS: Windows 2000 → All
Priority: P3 → --
Hardware: PC → All
Target Milestone: --- → Future
Comment 8•22 years ago
|
||
Mozilla's (2003040708/win98) rendering seem reasonable to me. Reporter,
does the problem still persist?
Keywords: intl
Priority: -- → P5
Comment 9•21 years ago
|
||
re comment #4 and #5 on CJK and interword vs inter-ideograph spacing:
When it comes to linebreaking and justification, C and J go together (most of
time), but K has its own way (words are delimetered by spaces in modern Korean
text).
re comment #8: I guess we still need to implement 'inter-ideograph'-equivalent
(I believe a new CSS draft uses a different term) and some other justification
methods.
Updated•21 years ago
|
Component: Layout → Layout: Fonts and Text
Comment 10•21 years ago
|
||
Comment 11•21 years ago
|
||
Still repro on 1.7 RC1.
Assignee | ||
Comment 12•20 years ago
|
||
I created test patch. But this patch doesn't work fine.
In |nsTextFrame::ComputeExtraJustificationSpacing|,
the extra space is solved by |mRect.width - trueDimensions.width|.
This |mRect.width| is good when English test case is tested.
http://www.w3.org/Style/CSS/Test/CSS1/current/sec546.htm
i.e., if the justifying paragraph has 3 lines, first line's |mRect.width| and
second line's one are same value.
However, the value is not good when Japanese test case is tested.
http://www.asahi-net.or.jp/%7Esd5a-ucd/css1/test/ja/sec546.htm
If the justifying paragraph has 3 lines, first line's one and second line's one
are different value.
I cannot solve the problem.
Please tell me.
Assignee | ||
Comment 13•20 years ago
|
||
e.g.,
In English test case,
mRect.width: 9510
trueDimensions.width: 8970
extraSpace: 540
mRect.width: 9510
trueDimensions.width: 8985
extraSpace: 525
mRect.width: 7590
trueDimensions.width: 7590
extraSpace: 0
In Japanese test case,
mRect.width: 9675
trueDimensions.width: 9675
extraSpace: 0
mRect.width: 9525
trueDimensions.width: 9525
extraSpace: 0
mRect.width: 7650
trueDimensions.width: 7650
extraSpace: 0
Assignee | ||
Comment 14•20 years ago
|
||
> In Japanese test case,
>
> mRect.width: 9675 <- A
> trueDimensions.width: 9675
> extraSpace: 0
>
> mRect.width: 9525 <- B
> trueDimensions.width: 9525
> extraSpace: 0
>
> mRect.width: 7650
> trueDimensions.width: 7650
> extraSpace: 0
A and B should be same value.
Assignee | ||
Comment 15•20 years ago
|
||
This patch works fine.
I will create clean patch soon.
Attachment #163004 -
Attachment is obsolete: true
Assignee | ||
Comment 16•20 years ago
|
||
Attachment #163119 -
Attachment is obsolete: true
Assignee | ||
Comment 17•20 years ago
|
||
Comment on attachment 163128 [details] [diff] [review]
Patch rv1
Test site(written in Japanese)
http://www.d-toybox.com/studio/weblog/show.php
Attachment #163128 -
Flags: review?(roc)
Assignee | ||
Comment 18•20 years ago
|
||
I created test build here.
http://www.d-toybox.com/mozilla/testbuilds/bug2964.zip
Assignee | ||
Comment 19•20 years ago
|
||
Oops.. I forgot to talk it.
This patch makes same as 'inter-ideograph'.
I think that this selection is best solution.
Because it is beautiful :-)
Assignee | ||
Comment 20•20 years ago
|
||
Jungshik Shin:
I exclude Hangul in this patch.
Because Hangul uses SPACE between words.
Is it OK?
Comment 21•20 years ago
|
||
I'm not sure if it's a good idea to change the behavior of justification based
on Unicode character ranges.
In addition, I need more explanation as to what this patch does before answering
your question.
Assignee | ||
Comment 22•20 years ago
|
||
The patch increase space after IS_HAVING_SPACE_FOR_JUSTIFY characters.
# I think that 'text-justify' will be able to be implemented.
Attachment #163128 -
Flags: superreview?(roc)
Attachment #163128 -
Flags: review?(roc)
Attachment #163128 -
Flags: review?(jshin)
Comment 23•20 years ago
|
||
Comment on attachment 163128 [details] [diff] [review]
Patch rv1
I'm not the best person to review this. However, I don't believe this patch can
be checked in as it is for the following reason:
I'm not so fond of excluding Hangul syllables from characters that get extra
space when justifying because a. it will make Korean documents with both Hangul
and Chinese characters look strange. b. I don't think 'char-based' criterion is
such a good idea.
Attachment #163128 -
Flags: review?(jshin) → review?(dbaron)
Jungshik, I'll do the review, I just want you to sign off on it first.
Comment 25•20 years ago
|
||
At the minimum, 'langGroup' has to be checked and what's done in the current
patch('inter-ideograph' justification) has to be applied only when 'langGroup'
is 'ja' or 'zh-*'. (
http://www.w3.org/TR/2003/CR-css3-text-20030514/#justification-prop)
Assignee | ||
Comment 26•20 years ago
|
||
Jungshik:
Can you show the test case of Hanglu + Chinese characters?
Assignee | ||
Comment 27•20 years ago
|
||
Umm...
I think that Jungsik's way is better way.
But I don't know how get lang group in nsTextFrame.
Please help me.
Assignee | ||
Comment 28•20 years ago
|
||
I think that there are two ways.
1. Include Hanglu to IS_HAVING_SPACE_FOR_JUSTIFY
This way is according to CSS3 'inter-ideograph' justification.
But I don't know it is better way for Hanglu.
2. Implement 'text-justify' property and author should use
'text-justify: inter-ideograph' for Japanese and Chinese.
Comment 29•20 years ago
|
||
CSS3 Text module (http://www.w3.org/TR/2003/CR-css3-text-20030514/) has the
following:
inter-ideograph
In this mode, letter-spacing modification only occurs for the CJK group.
Others only use inter-word expansion. No kashida effect takes place. This is the
preferred justification in the context of the Japanese writing system, but not
Latin nor Korean.
I more or less agree with the above (i.e. inter-ideograph is not suitable for
Korean text *unless* explicitly requested in a specific context). Note that the
suitability of a particular justification mode is talked in terms of language
and/or writing system (not at the character level).
Falling short of implementing 'text-justify' fully (comment #4) yet, we have to
come up with a reasonable default. The current behavior is not suitable for
Japanese and Chinese text (which is what this bug is about) so that we have to
do what I suggested in comment #25.
How to get 'langGroup' in nsTextFrame? roc, can you help?
what is 'langGroup'? Is that what is set by the HTML/XML lang="kr" attribute?
I don't know off the top of my head how to get that, but we certainly maintain
it because the Seamonkey element properties window shows it (from the context menu).
Comment 31•20 years ago
|
||
Can't you just do |GetPresContext()->GetLangGroup()| ?
Assignee | ||
Comment 32•20 years ago
|
||
> Can't you just do |GetPresContext()->GetLangGroup()| ?
No, |GetLangGroup()| is language of charset of the document.
I need a value of (xml:)lang attribute.
Assignee | ||
Comment 33•20 years ago
|
||
I think that if we can get the value of lang attribute, but the author has not
necessarily set so.
I have an idea.
We implement 'text-justify' propery.
And adding following style to default stylesheet.
:lang(ja), :lang(zh), :lang(zh-TW),
:lang(zh-HK), :lang(zh-CN) {
text-justify: inter-ideograph;
}
Assignee | ||
Comment 34•20 years ago
|
||
Oops...
Sorry. It is not good.
The stylesheet need |* { text-justify: auto; }|.
But it should not be in default stylesheet.
Assignee | ||
Comment 35•20 years ago
|
||
On IE6, Japanese text justification is not occured by 'text-align: justify;'
only. But if 'text-align: justify; text-justify: inter-ideograph;', Japanese
text justification is occured.
Comment 36•20 years ago
|
||
> How to get 'langGroup' in nsTextFrame?
It is |nsStyleVisibility| on the frame style context, see e.g.
|SetFontFromStyle| in nsFrame.cpp:
void SetFontFromStyle(nsIRenderingContext* aRC, nsStyleContext* aSC)
{
const nsStyleFont* font = aSC->GetStyleFont();
const nsStyleVisibility* visibility = aSC->GetStyleVisibility();
aRC->SetFont(font->mFont, visibility->mLangGroup);
}
Assignee | ||
Comment 37•20 years ago
|
||
O.K. Thank you.
This patch refers to langGroup.
Attachment #163128 -
Attachment is obsolete: true
Assignee | ||
Updated•20 years ago
|
Attachment #164579 -
Flags: superreview?(roc)
Attachment #164579 -
Flags: review?(dbaron)
Assignee | ||
Updated•20 years ago
|
Attachment #163128 -
Flags: superreview?(roc)
Attachment #163128 -
Flags: review?(dbaron)
Assignee | ||
Comment 38•20 years ago
|
||
Assignee | ||
Comment 39•20 years ago
|
||
Comment on attachment 164579 [details] [diff] [review]
Patch rv2
Sorry, I have some mistake.
Attachment #164579 -
Attachment is obsolete: true
Attachment #164579 -
Flags: superreview?(roc)
Attachment #164579 -
Flags: review?(dbaron)
Comment 40•20 years ago
|
||
You can add layout atoms for zh*'s in content/shared/public/nsLayoutAtomList.h.
Assignee | ||
Comment 41•20 years ago
|
||
Assignee | ||
Updated•20 years ago
|
Attachment #164599 -
Flags: superreview?(roc)
Attachment #164599 -
Flags: review?(dbaron)
Assignee | ||
Comment 42•20 years ago
|
||
Comment on attachment 164599 [details] [diff] [review]
Patch rv3
I sent requesting mail to dbaron. But he does not review and reply.
I'm changing the reviewer to default module owner.
Attachment #164599 -
Flags: review?(dbaron) → review?(roc)
This is basically looking good but I think we need a few changes.
In PrepareUnicodeText, I would like to only count the 'spaces' for justification
when we're actually doing justification. Otherwise I think we might see a slight
performance hit. You should also avoid filling textBuffer in this case, so you'd
need to restore the "if (textBuffer)" checks.
+ PRBool IsCJLangGroup();
I think you should avoid the abbreviation; just say "IsChineseJapaneseLangGroup()".
+#define IS_HAVING_SPACE_FOR_JUSTIFY(u, langIsCJ) \
I think this should be an inline function. It needs a better name: I suggest
IsJustificationCharacter. I also suggest that you make it test (langIsCJ && u >
0xff) so that ASCII characters drop out early.
A lot of comments and variable names refer to "spaces" but now can include many
non-space characters. I think you should change all references to "spaces" to
say "justification characters". It's a bigger change but if we get the words
right now, it will save confusion later.
+/*
+ * The having space characters for justification are only lower than 0xffff on
current code.
+ * If upper 0x10000 characters need the spacing, you have to take care when you
hack.
+ * Therefor, this macro must not include 0xd800 to 0xdbff range.
+ * Because these characters are High surrogate.
+ */
The English here needs some cleanup. (I know it's hard, I once studied Japanese
and gave up!) I suggest this:
/*
* Currently only Unicode characters below 0x10000 have their spacing modified
* by justification. If characters above 0x10000 turn out to need
* justification spacing, that will require extra work. Currently,
* this function must not include 0xd800 to 0xdbff because these characters
* are surrogates.
*/
Comment 44•20 years ago
|
||
Another question to ask is exactly what characters to put them into the
'justifiable' characters. I'm not sure if the current classification scheme is
'right'. It might be better just to use IS_CJ_CHAR defined here (or a slight
modification thereof):
http://lxr.mozilla.org/seamonkey/source/intl/unicharutil/util/nsUnicharUtils.h#91
As for the comment about non-BMP characters not being taken care of, you'd
better use 'U+yyyy' in place of '0xyyyy'. Or, you can just drop the comment
altogether because it's not just your code but also virtually all other code in
layout that has that problem.
Assignee | ||
Comment 45•20 years ago
|
||
> I'm not sure if the current classification scheme is
> 'right'. It might be better just to use IS_CJ_CHAR defined here (or a slight
> modification thereof):
I implement to the function.
Because we will implement 'text-justify' property, the code may complicate.
Comment on attachment 164599 [details] [diff] [review]
Patch rv3
minusing because we need a new patch
Attachment #164599 -
Flags: superreview?(roc)
Attachment #164599 -
Flags: superreview-
Attachment #164599 -
Flags: review?(roc)
Attachment #164599 -
Flags: review-
Assignee | ||
Comment 47•20 years ago
|
||
Attachment #164599 -
Attachment is obsolete: true
Assignee | ||
Comment 48•20 years ago
|
||
Attachment #166296 -
Attachment is obsolete: true
Assignee | ||
Updated•20 years ago
|
Attachment #166297 -
Flags: superreview?(roc)
Attachment #166297 -
Flags: review?(roc)
Comment on attachment 166297 [details] [diff] [review]
Patch rv3.1
You missed this comment:
A lot of comments and variable names refer to "spaces" but now can include many
non-space characters. I think you should change all references to "spaces" to
say "justification characters".
but I'll not worry about it. Thanks, this is good code.
Attachment #166297 -
Flags: superreview?(roc)
Attachment #166297 -
Flags: superreview+
Attachment #166297 -
Flags: review?(roc)
Attachment #166297 -
Flags: review+
Comment 50•20 years ago
|
||
(In reply to comment #45)
> > I'm not sure if the current classification scheme is
> > 'right'. It might be better just to use IS_CJ_CHAR defined here (or a slight
> > modification thereof):
>
> I implement to the function.
> Because we will implement 'text-justify' property, the code may complicate.
You missed my point. I was questioning the wisdom of including characters like
these:
Supplemental Arrows-A, Braille Patterns, Supplemental Arrows-B,
Miscellaneous Mathematical Symbols-B, Supplemental Mathematical Operators,
Miscellaneous Symbols and Arrows
IMHO, it's better to just use IS_CJ_CHAR defined in nsUnicharUtils.h
Assignee | ||
Comment 51•20 years ago
|
||
(In reply to comment #50)
> You missed my point. I was questioning the wisdom of including characters like
> these:
>
> Supplemental Arrows-A, Braille Patterns, Supplemental Arrows-B,
> Miscellaneous Mathematical Symbols-B, Supplemental Mathematical Operators,
> Miscellaneous Symbols and Arrows
>
> IMHO, it's better to just use IS_CJ_CHAR defined in nsUnicharUtils.h
>
Sorry.
I think these characters should be included justificable charachter.
Because these characters have FULL WIDTH in Japanese font.
I think that the symbols of full width should be justification in |lang="ja"|.
Assignee | ||
Comment 52•20 years ago
|
||
I change "space" to "justificable charachter".
Attachment #166297 -
Attachment is obsolete: true
Assignee | ||
Updated•20 years ago
|
Attachment #166313 -
Flags: superreview?(roc)
Attachment #166313 -
Flags: review?(roc)
Assignee | ||
Comment 53•20 years ago
|
||
> I think that the symbols of full width should be justification in |lang="ja"|.
But exclude the |Box Drawing|.
Because these characters should adhere on all language.
Comment on attachment 166313 [details] [diff] [review]
Patch rv3.2
great!
Attachment #166313 -
Flags: superreview?(roc)
Attachment #166313 -
Flags: superreview+
Attachment #166313 -
Flags: review?(roc)
Attachment #166313 -
Flags: review+
Assignee | ||
Comment 55•20 years ago
|
||
Jungshik:
Are you having questions?
If you don't have question, would you check-in the patch?
Comment 56•20 years ago
|
||
Ok. I'll check that in with a minimal change in comments. However, the following
question is still open and we have to revisit them. I'm adding a few people to
seek their opinions.
(In reply to comment #51)
> > IMHO, it's better to just use IS_CJ_CHAR defined in nsUnicharUtils.h
> I think these characters should be included justificable charachter.
> Because these characters have FULL WIDTH in Japanese font.
> I think that the symbols of full width should be justification in |lang="ja"|.
I'm afraid I'm not fully convinced. I'm well aware that some of them (not all of
them because only a subset of them is covered by pre-Unicode Japanese character
sets) are twice as wide as Latin letters in **fixed-width** Japanese fonts.
However, I doubt being 'full-width' in Japanese fixed width fonts necessarily
means having to be justified the same way as Kanjis/Hiraganas/Katakanas are? If
it does, Greek and Cyrillic letters have to be included as well, but obviously
nobody would do that (unless explicitly requested)
Also note that 'full width vs half-width' are concepts only applicable to
text-cell-based rendering (e.g. text terminal). Do they have the same width as
Hiragana, Katakana and Kanjis in **variable** width fonts?
The tree is still frozen:
"The tree is FROZEN for the 1.8 Alpha 5 release. The sooner we can get fixes for
the blockers, the sooner the tree will open for 1.8a6 so please help. All
checkins during the freeze require drivers' approval."
Assignee | ||
Comment 58•20 years ago
|
||
Yes, I wrote that the justificable character should be 'Full-Width' and *'Symbol'*.
I.e., Greek and Cyrillic are full-width, but these character should not be
justificable character.
In Japanese paragraph, the full-width symbols are used just like Japanese character.
Therfore, the full width symbols should have space for justify.
Assignee | ||
Comment 59•20 years ago
|
||
Nobady reply to comment 54 in this 20 days.
If the patch has the problem, we will get new bug report.
Robert, please check-in the patch.
Assignee | ||
Comment 60•20 years ago
|
||
> Nobady reply to comment 54 in this 20 days.
sorry, it is comment 56.
Comment 61•20 years ago
|
||
updated patch v3.2 due to the layout file reorganization
Attachment #166313 -
Attachment is obsolete: true
Updated•20 years ago
|
Assignee: roc → masayuki
Status: ASSIGNED → NEW
Comment 62•20 years ago
|
||
I checked this patch in now, as-is. I'm not marking this fixed yet because I'm
unsure whether there is any more work to do. please mark fixed if it is.
Checking in base/nsLayoutAtomList.h;
/cvsroot/mozilla/layout/base/nsLayoutAtomList.h,v <-- nsLayoutAtomList.h
new revision: 1.92; previous revision: 1.91
done
Checking in generic/nsLineLayout.cpp;
/cvsroot/mozilla/layout/generic/nsLineLayout.cpp,v <-- nsLineLayout.cpp
new revision: 3.215; previous revision: 3.214
done
Checking in generic/nsTextFrame.cpp;
/cvsroot/mozilla/layout/generic/nsTextFrame.cpp,v <-- nsTextFrame.cpp
new revision: 1.490; previous revision: 1.489
done
Assignee | ||
Comment 63•20 years ago
|
||
Christian Biesinger (:bi):
Thank you for check-in.
I mark to fixed.
If we have some problems(e.g., the definition of justificable charachters),
I will report new bug, if bugzilla-jp will be reported.
Jungshik Shin:
If you think that you are unpleasant, I'm sorry.
But I think that we should get feed back by Nightly Build testers.
-> FIXED
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•