Closed Bug 36322 Opened 25 years ago Closed 20 years ago

Japanese text justification

Categories

(Core :: Layout: Text and Fonts, defect, P5)

defect

Tracking

()

RESOLVED FIXED
Future

People

(Reporter: emk, Assigned: masayuki)

References

()

Details

(Keywords: intl)

Attachments

(3 files, 8 obsolete files)

Steps to reproduce: 1. Launch Mozilla. 2. Navigate to the following URL. http://www.asahi-net.or.jp/%7Esd5a-ucd/css1/test/ja/sec546.htm Actual result: The fourth paragraph is not justified at all. Expected result: We should justify Japanese text using inter-letter spacing since Japanese text may have no white space. IE4 works as expected. IE5 works as expected if we also assign 'text-justify: inter-ideograph'.
here's a fun for you buster. You probably thought you were done with justification, huh?
Assignee: troy → buster
Robert: is this a task you'd be interested in doing as well? Erik, Frank: could you guys comment on what the requirements are for non-latin justification?
Status: NEW → ASSIGNED
Summary: Japanese text justifycation → Japanese text justification
Japanese text does not have spaces between words, so justification is performed by inserting space between characters. Here is a link to the spec that MSIE is using: http://www.w3.org/TR/1999/WD-i18n-format-19990910/#a5-1
First of all, bug #36069 addresses some cleanup of the justification code that should probably be taken care of before we worry about this. I actually just noticed this issue myself at http://www.pghchinesechurch.org/en/2000EasterCrusade/EasterCrusade.htm It demonstrates how justified mixed Chinese and Latin text (mis?)behaves. It probably would not be difficult to implement the modes "inter-word", "inter-ideograph", "distribute" and "distribute-all-lines". (The "newspaper" mode looks nightmarish.) For each of these modes, it seems that we can just count "number of sites to insert justification space" (generalizing "number of spaces"), and distribute justification space evenly among the sites. The actually difficult work would be to correctly identify the sites in each case. The "distribute" and "distribute-all-lines" modes are easy in that respect, giving one site between each letter. (I assume that the output of nsTextTransformer gives one Unicode char per letter; if not, there would be more work to do here and letter-spacing would already be broken.) For "inter-ideograph" mode, I suspect that nsILineBreaker would give the correct breaks --- can someone confirm that? I'm not sure what the intent of "inter-word" mode is. If the boundaries between "words" are meant to be whitespace only, then we already implement it :-), but it's not useful for CJK :-(. Otherwise, I'm not sure where the boundaries are supposed to be... The most rational approach might be to add a new "nsIJustificationBreaker" to LWBrk, which takes the justification mode as a parameter. If someone provides that, I can do the rest :-), but probably not for a couple of weeks. Another issue is what mode we should actually use. Are we just going to pick one mode and implement that, or are we actually going to acquire and use the style information from this W3C draft? If we just pick one mode, which one? By the way, I wonder if it's really a good idea for a W3C draft to have its editor and contributors all owned by the same company, particularly when that company is one where the phrase "putting the competition on a treadmill" has wide currency.
>For "inter-ideograph" mode, I suspect that nsILineBreaker would give the >correct breaks --- can someone confirm that? I think so, too (at least Japanese and Latin script). >Another issue is what mode we should actually use. Are we just going to pick >one mode and implement that, or are we actually going to acquire and use the >style information from this W3C draft? If we just pick one mode, which one? The "inter-ideograph" (like) mode seems to be the most preferable. The "inter-word" is useless for CJK (as you said). The "distribute" and "distribute-all-lines" will be not desiable for Latin script. >By the way, I wonder if it's really a good idea for a W3C draft to have its >editor and contributors all owned by the same company, particularly when that >company is one where the phrase "putting the competition on a treadmill" has >wide currency. I think we do not have to persist a Working-Draft. CSS2 spec only says: |Note. The actual justification algorithm used is user-agent and written |language dependent.
Robert: are you interested in owning this bug? I'd be grateful.
Assignee: buster → roc+moz
Status: ASSIGNED → NEW
OS: Windows 2000 → All
Priority: P3 → --
Hardware: PC → All
Target Milestone: --- → Future
Mozilla's (2003040708/win98) rendering seem reasonable to me. Reporter, does the problem still persist?
Keywords: intl
re comment #4 and #5 on CJK and interword vs inter-ideograph spacing: When it comes to linebreaking and justification, C and J go together (most of time), but K has its own way (words are delimetered by spaces in modern Korean text). re comment #8: I guess we still need to implement 'inter-ideograph'-equivalent (I believe a new CSS draft uses a different term) and some other justification methods.
Component: Layout → Layout: Fonts and Text
Still repro on 1.7 RC1.
Attached patch test patch (obsolete) (deleted) — Splinter Review
I created test patch. But this patch doesn't work fine. In |nsTextFrame::ComputeExtraJustificationSpacing|, the extra space is solved by |mRect.width - trueDimensions.width|. This |mRect.width| is good when English test case is tested. http://www.w3.org/Style/CSS/Test/CSS1/current/sec546.htm i.e., if the justifying paragraph has 3 lines, first line's |mRect.width| and second line's one are same value. However, the value is not good when Japanese test case is tested. http://www.asahi-net.or.jp/%7Esd5a-ucd/css1/test/ja/sec546.htm If the justifying paragraph has 3 lines, first line's one and second line's one are different value. I cannot solve the problem. Please tell me.
e.g., In English test case, mRect.width: 9510 trueDimensions.width: 8970 extraSpace: 540 mRect.width: 9510 trueDimensions.width: 8985 extraSpace: 525 mRect.width: 7590 trueDimensions.width: 7590 extraSpace: 0 In Japanese test case, mRect.width: 9675 trueDimensions.width: 9675 extraSpace: 0 mRect.width: 9525 trueDimensions.width: 9525 extraSpace: 0 mRect.width: 7650 trueDimensions.width: 7650 extraSpace: 0
> In Japanese test case, > > mRect.width: 9675 <- A > trueDimensions.width: 9675 > extraSpace: 0 > > mRect.width: 9525 <- B > trueDimensions.width: 9525 > extraSpace: 0 > > mRect.width: 7650 > trueDimensions.width: 7650 > extraSpace: 0 A and B should be same value.
Attached patch patch(ad hoc) (obsolete) (deleted) — Splinter Review
This patch works fine. I will create clean patch soon.
Attachment #163004 - Attachment is obsolete: true
Attached patch Patch rv1 (obsolete) (deleted) — Splinter Review
Attachment #163119 - Attachment is obsolete: true
Oops.. I forgot to talk it. This patch makes same as 'inter-ideograph'. I think that this selection is best solution. Because it is beautiful :-)
Jungshik Shin: I exclude Hangul in this patch. Because Hangul uses SPACE between words. Is it OK?
I'm not sure if it's a good idea to change the behavior of justification based on Unicode character ranges. In addition, I need more explanation as to what this patch does before answering your question.
The patch increase space after IS_HAVING_SPACE_FOR_JUSTIFY characters. # I think that 'text-justify' will be able to be implemented.
Attachment #163128 - Flags: superreview?(roc)
Attachment #163128 - Flags: review?(roc)
Attachment #163128 - Flags: review?(jshin)
Comment on attachment 163128 [details] [diff] [review] Patch rv1 I'm not the best person to review this. However, I don't believe this patch can be checked in as it is for the following reason: I'm not so fond of excluding Hangul syllables from characters that get extra space when justifying because a. it will make Korean documents with both Hangul and Chinese characters look strange. b. I don't think 'char-based' criterion is such a good idea.
Attachment #163128 - Flags: review?(jshin) → review?(dbaron)
Jungshik, I'll do the review, I just want you to sign off on it first.
At the minimum, 'langGroup' has to be checked and what's done in the current patch('inter-ideograph' justification) has to be applied only when 'langGroup' is 'ja' or 'zh-*'. ( http://www.w3.org/TR/2003/CR-css3-text-20030514/#justification-prop)
Jungshik: Can you show the test case of Hanglu + Chinese characters?
Umm... I think that Jungsik's way is better way. But I don't know how get lang group in nsTextFrame. Please help me.
I think that there are two ways. 1. Include Hanglu to IS_HAVING_SPACE_FOR_JUSTIFY This way is according to CSS3 'inter-ideograph' justification. But I don't know it is better way for Hanglu. 2. Implement 'text-justify' property and author should use 'text-justify: inter-ideograph' for Japanese and Chinese.
CSS3 Text module (http://www.w3.org/TR/2003/CR-css3-text-20030514/) has the following: inter-ideograph In this mode, letter-spacing modification only occurs for the CJK group. Others only use inter-word expansion. No kashida effect takes place. This is the preferred justification in the context of the Japanese writing system, but not Latin nor Korean. I more or less agree with the above (i.e. inter-ideograph is not suitable for Korean text *unless* explicitly requested in a specific context). Note that the suitability of a particular justification mode is talked in terms of language and/or writing system (not at the character level). Falling short of implementing 'text-justify' fully (comment #4) yet, we have to come up with a reasonable default. The current behavior is not suitable for Japanese and Chinese text (which is what this bug is about) so that we have to do what I suggested in comment #25. How to get 'langGroup' in nsTextFrame? roc, can you help?
what is 'langGroup'? Is that what is set by the HTML/XML lang="kr" attribute? I don't know off the top of my head how to get that, but we certainly maintain it because the Seamonkey element properties window shows it (from the context menu).
Can't you just do |GetPresContext()->GetLangGroup()| ?
> Can't you just do |GetPresContext()->GetLangGroup()| ? No, |GetLangGroup()| is language of charset of the document. I need a value of (xml:)lang attribute.
I think that if we can get the value of lang attribute, but the author has not necessarily set so. I have an idea. We implement 'text-justify' propery. And adding following style to default stylesheet. :lang(ja), :lang(zh), :lang(zh-TW), :lang(zh-HK), :lang(zh-CN) { text-justify: inter-ideograph; }
Oops... Sorry. It is not good. The stylesheet need |* { text-justify: auto; }|. But it should not be in default stylesheet.
On IE6, Japanese text justification is not occured by 'text-align: justify;' only. But if 'text-align: justify; text-justify: inter-ideograph;', Japanese text justification is occured.
> How to get 'langGroup' in nsTextFrame? It is |nsStyleVisibility| on the frame style context, see e.g. |SetFontFromStyle| in nsFrame.cpp: void SetFontFromStyle(nsIRenderingContext* aRC, nsStyleContext* aSC) { const nsStyleFont* font = aSC->GetStyleFont(); const nsStyleVisibility* visibility = aSC->GetStyleVisibility(); aRC->SetFont(font->mFont, visibility->mLangGroup); }
Attached patch Patch rv2 (obsolete) (deleted) — Splinter Review
O.K. Thank you. This patch refers to langGroup.
Attachment #163128 - Attachment is obsolete: true
Attachment #164579 - Flags: superreview?(roc)
Attachment #164579 - Flags: review?(dbaron)
Attachment #163128 - Flags: superreview?(roc)
Attachment #163128 - Flags: review?(dbaron)
Attached file Testcase (deleted) —
Comment on attachment 164579 [details] [diff] [review] Patch rv2 Sorry, I have some mistake.
Attachment #164579 - Attachment is obsolete: true
Attachment #164579 - Flags: superreview?(roc)
Attachment #164579 - Flags: review?(dbaron)
You can add layout atoms for zh*'s in content/shared/public/nsLayoutAtomList.h.
Attached patch Patch rv3 (obsolete) (deleted) — Splinter Review
Attachment #164599 - Flags: superreview?(roc)
Attachment #164599 - Flags: review?(dbaron)
Comment on attachment 164599 [details] [diff] [review] Patch rv3 I sent requesting mail to dbaron. But he does not review and reply. I'm changing the reviewer to default module owner.
Attachment #164599 - Flags: review?(dbaron) → review?(roc)
This is basically looking good but I think we need a few changes. In PrepareUnicodeText, I would like to only count the 'spaces' for justification when we're actually doing justification. Otherwise I think we might see a slight performance hit. You should also avoid filling textBuffer in this case, so you'd need to restore the "if (textBuffer)" checks. + PRBool IsCJLangGroup(); I think you should avoid the abbreviation; just say "IsChineseJapaneseLangGroup()". +#define IS_HAVING_SPACE_FOR_JUSTIFY(u, langIsCJ) \ I think this should be an inline function. It needs a better name: I suggest IsJustificationCharacter. I also suggest that you make it test (langIsCJ && u > 0xff) so that ASCII characters drop out early. A lot of comments and variable names refer to "spaces" but now can include many non-space characters. I think you should change all references to "spaces" to say "justification characters". It's a bigger change but if we get the words right now, it will save confusion later. +/* + * The having space characters for justification are only lower than 0xffff on current code. + * If upper 0x10000 characters need the spacing, you have to take care when you hack. + * Therefor, this macro must not include 0xd800 to 0xdbff range. + * Because these characters are High surrogate. + */ The English here needs some cleanup. (I know it's hard, I once studied Japanese and gave up!) I suggest this: /* * Currently only Unicode characters below 0x10000 have their spacing modified * by justification. If characters above 0x10000 turn out to need * justification spacing, that will require extra work. Currently, * this function must not include 0xd800 to 0xdbff because these characters * are surrogates. */
Another question to ask is exactly what characters to put them into the 'justifiable' characters. I'm not sure if the current classification scheme is 'right'. It might be better just to use IS_CJ_CHAR defined here (or a slight modification thereof): http://lxr.mozilla.org/seamonkey/source/intl/unicharutil/util/nsUnicharUtils.h#91 As for the comment about non-BMP characters not being taken care of, you'd better use 'U+yyyy' in place of '0xyyyy'. Or, you can just drop the comment altogether because it's not just your code but also virtually all other code in layout that has that problem.
> I'm not sure if the current classification scheme is > 'right'. It might be better just to use IS_CJ_CHAR defined here (or a slight > modification thereof): I implement to the function. Because we will implement 'text-justify' property, the code may complicate.
Comment on attachment 164599 [details] [diff] [review] Patch rv3 minusing because we need a new patch
Attachment #164599 - Flags: superreview?(roc)
Attachment #164599 - Flags: superreview-
Attachment #164599 - Flags: review?(roc)
Attachment #164599 - Flags: review-
Attached patch Patch rv3.1 (obsolete) (deleted) — Splinter Review
Attachment #164599 - Attachment is obsolete: true
Attached patch Patch rv3.1 (obsolete) (deleted) — Splinter Review
Attachment #166296 - Attachment is obsolete: true
Attachment #166297 - Flags: superreview?(roc)
Attachment #166297 - Flags: review?(roc)
Comment on attachment 166297 [details] [diff] [review] Patch rv3.1 You missed this comment: A lot of comments and variable names refer to "spaces" but now can include many non-space characters. I think you should change all references to "spaces" to say "justification characters". but I'll not worry about it. Thanks, this is good code.
Attachment #166297 - Flags: superreview?(roc)
Attachment #166297 - Flags: superreview+
Attachment #166297 - Flags: review?(roc)
Attachment #166297 - Flags: review+
(In reply to comment #45) > > I'm not sure if the current classification scheme is > > 'right'. It might be better just to use IS_CJ_CHAR defined here (or a slight > > modification thereof): > > I implement to the function. > Because we will implement 'text-justify' property, the code may complicate. You missed my point. I was questioning the wisdom of including characters like these: Supplemental Arrows-A, Braille Patterns, Supplemental Arrows-B, Miscellaneous Mathematical Symbols-B, Supplemental Mathematical Operators, Miscellaneous Symbols and Arrows IMHO, it's better to just use IS_CJ_CHAR defined in nsUnicharUtils.h
(In reply to comment #50) > You missed my point. I was questioning the wisdom of including characters like > these: > > Supplemental Arrows-A, Braille Patterns, Supplemental Arrows-B, > Miscellaneous Mathematical Symbols-B, Supplemental Mathematical Operators, > Miscellaneous Symbols and Arrows > > IMHO, it's better to just use IS_CJ_CHAR defined in nsUnicharUtils.h > Sorry. I think these characters should be included justificable charachter. Because these characters have FULL WIDTH in Japanese font. I think that the symbols of full width should be justification in |lang="ja"|.
Attached patch Patch rv3.2 (obsolete) (deleted) — Splinter Review
I change "space" to "justificable charachter".
Attachment #166297 - Attachment is obsolete: true
Attachment #166313 - Flags: superreview?(roc)
Attachment #166313 - Flags: review?(roc)
> I think that the symbols of full width should be justification in |lang="ja"|. But exclude the |Box Drawing|. Because these characters should adhere on all language.
Comment on attachment 166313 [details] [diff] [review] Patch rv3.2 great!
Attachment #166313 - Flags: superreview?(roc)
Attachment #166313 - Flags: superreview+
Attachment #166313 - Flags: review?(roc)
Attachment #166313 - Flags: review+
Jungshik: Are you having questions? If you don't have question, would you check-in the patch?
Ok. I'll check that in with a minimal change in comments. However, the following question is still open and we have to revisit them. I'm adding a few people to seek their opinions. (In reply to comment #51) > > IMHO, it's better to just use IS_CJ_CHAR defined in nsUnicharUtils.h > I think these characters should be included justificable charachter. > Because these characters have FULL WIDTH in Japanese font. > I think that the symbols of full width should be justification in |lang="ja"|. I'm afraid I'm not fully convinced. I'm well aware that some of them (not all of them because only a subset of them is covered by pre-Unicode Japanese character sets) are twice as wide as Latin letters in **fixed-width** Japanese fonts. However, I doubt being 'full-width' in Japanese fixed width fonts necessarily means having to be justified the same way as Kanjis/Hiraganas/Katakanas are? If it does, Greek and Cyrillic letters have to be included as well, but obviously nobody would do that (unless explicitly requested) Also note that 'full width vs half-width' are concepts only applicable to text-cell-based rendering (e.g. text terminal). Do they have the same width as Hiragana, Katakana and Kanjis in **variable** width fonts?
The tree is still frozen: "The tree is FROZEN for the 1.8 Alpha 5 release. The sooner we can get fixes for the blockers, the sooner the tree will open for 1.8a6 so please help. All checkins during the freeze require drivers' approval."
Yes, I wrote that the justificable character should be 'Full-Width' and *'Symbol'*. I.e., Greek and Cyrillic are full-width, but these character should not be justificable character. In Japanese paragraph, the full-width symbols are used just like Japanese character. Therfore, the full width symbols should have space for justify.
Nobady reply to comment 54 in this 20 days. If the patch has the problem, we will get new bug report. Robert, please check-in the patch.
> Nobady reply to comment 54 in this 20 days. sorry, it is comment 56.
Attached patch patch merged to trunk (deleted) — Splinter Review
updated patch v3.2 due to the layout file reorganization
Attachment #166313 - Attachment is obsolete: true
Assignee: roc → masayuki
Status: ASSIGNED → NEW
I checked this patch in now, as-is. I'm not marking this fixed yet because I'm unsure whether there is any more work to do. please mark fixed if it is. Checking in base/nsLayoutAtomList.h; /cvsroot/mozilla/layout/base/nsLayoutAtomList.h,v <-- nsLayoutAtomList.h new revision: 1.92; previous revision: 1.91 done Checking in generic/nsLineLayout.cpp; /cvsroot/mozilla/layout/generic/nsLineLayout.cpp,v <-- nsLineLayout.cpp new revision: 3.215; previous revision: 3.214 done Checking in generic/nsTextFrame.cpp; /cvsroot/mozilla/layout/generic/nsTextFrame.cpp,v <-- nsTextFrame.cpp new revision: 1.490; previous revision: 1.489 done
Christian Biesinger (:bi): Thank you for check-in. I mark to fixed. If we have some problems(e.g., the definition of justificable charachters), I will report new bug, if bugzilla-jp will be reported. Jungshik Shin: If you think that you are unpleasant, I'm sorry. But I think that we should get feed back by Nightly Build testers. -> FIXED
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: