Closed Bug 135323 Opened 23 years ago Closed 22 years ago

A return code between the two CJK characters is converted to a space code

Tracking

()

Status:

VERIFIED FIXED

People

(Reporter: hsaito54, Assigned: shanjian)

References

Details

(Keywords: intl, Whiteboard: [adt2 rtm])

Attachments

(4 files, 10 obsolete files)

a patch to remove a space between two kanji caracters 23 years ago Hideo Saito (deleted), patch		Details \| Diff \| Splinter Review
testcase 23 years ago Hideo Saito (deleted), text/html		Details
additional patch #1 23 years ago Hideo Saito (deleted), patch		Details \| Diff \| Splinter Review
new patch to remove a space between two kanji caracters 23 years ago Hideo Saito (deleted), patch		Details \| Diff \| Splinter Review
patch to remove a space between two CJK caracters 23 years ago Hideo Saito (deleted), patch		Details \| Diff \| Splinter Review
patch v4 to remove a space between two CJK characters 23 years ago Hideo Saito (deleted), patch		Details \| Diff \| Splinter Review
patch v5, 23 years ago Shanjian Li (deleted), patch		Details \| Diff \| Splinter Review
patch v6 23 years ago Shanjian Li (deleted), patch		Details \| Diff \| Splinter Review
patch v7 23 years ago Shanjian Li (deleted), patch		Details \| Diff \| Splinter Review
patch v8 22 years ago Shanjian Li (deleted), patch		Details \| Diff \| Splinter Review
screen shot of the doc in comment 53 22 years ago Takayuki Tei (deleted), image/jpeg		Details
patch v9 22 years ago Takayuki Tei (deleted), patch	shanjian : review+	Details \| Diff \| Splinter Review
patch v10 22 years ago Takayuki Tei (deleted), patch	shanjian : review+ waterson : superreview+	Details \| Diff \| Splinter Review
a screen shot after fixed 22 years ago Yuying Long (deleted), image/jpeg		Details

Hideo Saito

Reporter

Description

•

23 years ago

A space code between the two kanji characters is desirable to remove. For example, this Japanese sentence "¤¢ ¤¤" is displayed "¤¢ ¤¤" but I expect to display "¤¢¤¤" without a space.

Matthias Versen [:Matti]

Comment 1

•

23 years ago

-> Int

Assignee: Matti → yokoyama

Component: Browser-General → Internationalization

QA Contact: imajes-qa → ruixu

Hideo Saito

Reporter

Comment 2

•

23 years ago

Attached patch a patch to remove a space between two kanji caracters (obsolete) (deleted) — Details — Splinter Review

Please apply this patch next to apply a patch "patch to wrap long word by key-characters" at "http://bugzilla.mozilla.org/attachment.cgi?id=75233&action=view" in "http://bugzilla.mozilla.org/show_bug.cgi?id=95067".

Hideo Saito

Reporter

Comment 3

•

23 years ago

Attached file testcase (deleted) — Details

Koike Kazuhiko

Comment 4

•

23 years ago

This bug is related to bug 95067. I believe most Japanese people prefer that a line-break isn't rendered as space.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Hardware: Other → All

Hideo Saito

Reporter

Comment 5

•

23 years ago

Attached patch additional patch #1 (obsolete) (deleted) — Details — Splinter Review

I made a error in my patch carelessly. I correct errors with this patch program.

Rui Xu

Updated

•

23 years ago

Keywords: intl

QA Contact: ruixu → ylong

Roy Yokoyama

Comment 6

•

23 years ago

saito-san: Your patch looks simple; but I think the patch is rotten. Can you update the patch for the current trunk? I'd like to apply the patch to verify the fix. Thanks

Hideo Saito

Reporter

Comment 7

•

23 years ago

Attached patch new patch to remove a space between two kanji caracters (obsolete) (deleted) — Details — Splinter Review

The patch was remade.

Roy Yokoyama

Comment 8

•

23 years ago

assign to shanjian. Shanjian: can you review his code? It's in the layout. Thanks

Assignee: yokoyama → shanjian

Shanjian Li

Assignee

Comment 9

•

23 years ago

saito san, I think '\n' should be removed only between CJK characters. That's say inside function "ScanNormalWhiteSpace_F", you need to check character before and after '\n' before setting the flag. If '\n' is the last character in this fragment, it is OK to only check its precedence. (This will happen when text span multiple fragment.) I am also wondering about this change: - *aContentLenResult = offset - mOffset; + *aContentLenResult = offset - save_offset; Why did you use save_offset instead of mOffset, which is more updated? I assumed you have tested the former one and it didn't work well. But I would like to know why.

Status: NEW → ASSIGNED

Hideo Saito

Reporter

Comment 10

•

23 years ago

To be sure, my patch program is wrong. Testing the following pages, it found two errors. http://www.asahi-net.or.jp/~wq6k-yn/para.html Every time Reload, a vertical scrill bar moves the first error. But returning a program to a basis, *aContentLenResult = offset - mOffset; the error was corrected. A patch needs to be added to correct the next error. The next sentence is not sometimes displayed accurately. 用途を詳しく分析した上で DTDを設計すれば、そのようなことはあまり起きない (かもしれない)。しかし、 <===== error HTMLのように一つの <===== error DTDを様々なタイプの文書に用いるのでは、この種の不都合は避 <===== error け難い。 A variable contentLength is in the function nsTextFrame::MeasureText() at nsTextFrame.cpp. When a space between Kanji characters is deleted, 1 must be subtracted from this variable.

Hideo Saito

Reporter

Comment 11

•

23 years ago

Can't I write the document of Japanese? 用途を詳しく分析した上で DTDを設計すれば、そのようなことはあまり起きない (かもしれない)。しかし、 HTMLのように一つの DTDを様々なタイプの文書に用いるのでは、この種の不都合は避け難い。

Shanjian Li

Assignee

Comment 12

•

23 years ago

>>Can't I write the document of Japanese? What do you mean? As a test case, that's fine. But if you plan to use japanese for communication, I am incapable to read it.

Hideo Saito

Reporter

Comment 13

•

23 years ago

I made a mistake. I wrote in an Shift-JIS code and displayed it in an EUC-JP code. > I think '\n' should be removed only between CJK characters. I appreciate your suggestion. I will try to investigate whether a patch can act only to the CJK characters.

Hideo Saito

Reporter

Comment 14

•

23 years ago

Attached patch patch to remove a space between two CJK caracters (obsolete) (deleted) — Details — Splinter Review

The patch was remade. It is happy if you can evaluate.

Shanjian Li

Assignee

Comment 15

•

23 years ago

saito san, I really appreciated your help one this issue. This part of code is complicated and very sensitive, so please be patient with me. Please also excuse me if my question does not make sense to you. I am learning this part of code as well. Several suggestions: 1, It is probably not a good idea to add Probe method to word breaker. You can add the macro to local file directly. In different places, we may have different definition of CJK char, so it is fine for me to define it multiply. Of cause, if we can add widely used CJK macro to XPCOM module to avoid this, it will be nice. But let's leave that for later. 2. Can we avoid the use of SpaceInfo? In nsTextTransformer::GetNextWord, we have some code dealing with German Szlig character, we need to transform this one to "SS". I assume you can do it the same way. But I am not very sure. I hope DeleteOneChar can also be avoided.

Hideo Saito

Reporter

Comment 16

•

23 years ago

I am sorry, there was a spelling mistake. > Every time Reload, a vertical scrill bar moves the first error. Every time Reload, a vertical scroll bar moves the first error. > patch to remove a space between two CJK caracters patch to remove a space between two CJK characters > DeleteOneChar can also be avoided. If I can also do, I do not want to use this function. But a space will remain by the copy & paste using the mouse. The place of correction was not able to be found.

Summary: A return code between the two kanji caracters is converted to a space code → A return code between the two CJK characters is converted to a space code

Hideo Saito

Reporter

Comment 17

•

23 years ago

please let me know. Although the character is stored to the buffer "mTransformBuf.mBuffer", I recognize that this buffer is not used by the copy & paste using the mouse. Is it right? If right, when copying & pasting by the mouse, where is the processing which transposes a return code to a space or the continuous space to one space?

Shanjian Li

Assignee

Comment 18

•

23 years ago

I can't answer your question without looking into the code. But I guess that we do not do any transformation for copy-n-paste. If we do collapse multiple spaces to one in copy-n-paste, we need to do this transformation (remove space between kanji) in the same way.

Hideo Saito

Reporter

Comment 19

•

23 years ago

Attached patch patch v4 to remove a space between two CJK characters (obsolete) (deleted) — Details — Splinter Review

The patch was remade again. Please give me a review.

Shanjian Li

Assignee

Comment 20

•

23 years ago

This patch is acceptable to me, but the changes in nsTextFrame.cpp does not seem necessary. So could you take one more look?

Hideo Saito

Reporter

Comment 21

•

23 years ago

The changes in nsTextFrame.cpp is required in order that a variable "totLen" in nsPlainTextSerializer.cpp may acquire the right length of a character sequence. Is it better to look for other methods for a its variable acquiring the length of a character sequence? content/base/src/nsPlainTextSerializer.cpp: 1513 void 1514 nsPlainTextSerializer::Write(const nsAString& aString) 1515 { 1524 PRInt32 totLen = aString.Length(); An example is shown below, "xx" shows the Kanji character of one character. "xx yy" The value of a variable "totLen" is 2, when a patch is not applied to nsTextFrame.cpp. However, the value should be 3.

Shanjian Li

Assignee

Comment 22

•

23 years ago

Saito san, Sorry for not being able to response promptly. I was busy in fixing some urgent issues. I did spend some time today and check your patch carefully. I still could not understand why the change in nsTextFrame.cpp is necessary. Display and copy-n-paste works fine without that change. So could you post a test case to show the problem when that part is not there? thanks.

Hideo Saito

Reporter

Comment 23

•

23 years ago

I appreciate for your preview and suggestions. Please test a testcase "Created an attachment (id=77572)". Can you copy "¤¢¤¤" using the mouse and paste it? When a patch was not applied to nsTextFrame.cpp , "¤¢ " was pasted. I tested by the version 0.9.8. The latest version cannot immediately be made up, but should I do so? P.S. "¤¢¤¤" is "xx" and "yy" that indicates two Kanji characters. "¤¢ " is "xx" and a white space.

Hideo Saito

Reporter

Comment 24

•

23 years ago

I am sorry, it mistook again. > I appreciate for your preview and suggestions. I appreciate for your review and suggestions.

Hideo Saito

Reporter

Comment 25

•

23 years ago

The previous message containing kanji characters was written in the code of EUC-JP.

Shanjian Li

Assignee

Comment 26

•

23 years ago

Attached patch patch v5, (obsolete) (deleted) — Details — Splinter Review

Shanjian Li

Assignee

Comment 27

•

23 years ago

Saito, I finally got enough time to understand all your changes. I changed the patch to make it more local. (I hope you don't mind my modification.) The idea is the same. Please review my patch to see if I missed any of your points. I appreciate you contribution, this is a complicate area and you did a good job.

Hideo Saito

Reporter

Comment 28

•

23 years ago

I agree with your changes, but the next program needs to change. nsTextTransformer.cpp: old + if (firstChar == '\n' && + offset - mOffset == 1 && + fragLen > offset) nsTextTransformer.cpp: new + if (firstChar == '\n' && + offset - mOffset == 1 && mOffset > 0 && + fragLen > offset) This change is added and a test is O.K.

Shanjian Li

Assignee

Comment 29

•

23 years ago

Attached patch patch v6 (obsolete) (deleted) — Details — Splinter Review

Shanjian Li

Assignee

Updated

•

23 years ago

Attachment #77569 - Attachment is obsolete: true

Attachment #77597 - Attachment is obsolete: true

Attachment #80550 - Attachment is obsolete: true

Attachment #82001 - Attachment is obsolete: true

Attachment #82597 - Attachment is obsolete: true

Attachment #82989 - Attachment is obsolete: true

Shanjian Li

Assignee

Comment 30

•

23 years ago

good point. I updated the patch. Now let's ask for r/sr. rbs/attinasi, could you give r/sr?

rbs

Comment 31

•

23 years ago

Comment on attachment 83046 [details] [diff] [review] patch v6 - i = contentLen; + if (1 == wordLen && contentLen == 2 && IS_CJK_CHAR(*bp)) { + i = contentLen; + } else { + i = contentLen; } No need to duplicate the statement. What happens when there are _multiple_ '\n' between the CJK chars? I am not a big fan of macros defined in several places. You can define IS_CJK_CHAR in one place (next to IS_HIGH/LOW_SURROGATE) in intl/unicharutil/util/nsUnicharUtils.h

Shanjian Li

Assignee

Comment 32

•

23 years ago

Attached patch patch v7 (obsolete) (deleted) — Details — Splinter Review

Moved macro to nsUnicharUtils.h.

Attachment #83046 - Attachment is obsolete: true

Hideo Saito

Reporter

Comment 33

•

23 years ago

> What happens when there are _multiple_ '\n' between the CJK chars? One space is replaced, when two or more '\n' are between CJK characters, or when a space is included. It is the conventional processing.

Hideo Saito

Reporter

Comment 34

•

22 years ago

When changed into patch-v6 from patch-v5, a mistake occurs. Please check. patch-v5: + if (firstChar == '\n' && + offset - mOffset == 1 && + fragLen > offset) + { patch-v6: + if (firstChar == '\n' && + offset - mOffset == 1 && + mOffset > 0) + { new: + if (firstChar == '\n' && + offset - mOffset == 1 && + mOffset > 0 && + fragLen > offset) + {

Shanjian Li

Assignee

Comment 35

•

22 years ago

Attached patch patch v8 (obsolete) (deleted) — Details — Splinter Review

Attachment #83587 - Attachment is obsolete: true

Shanjian Li

Assignee

Comment 36

•

22 years ago

rbs/waterson, could you r/sr?

rbs

Comment 37

•

22 years ago

Comment on attachment 85265 [details] [diff] [review] patch v8 I didn't understand the reply to my question about what happens when there are multiple '\n' and/or tabs and/or space chars in-between the CJK chars?

Hideo Saito

Reporter

Comment 38

•

22 years ago

Two or more '\n' between CJK characters replaces one space. Although one or more TAB and spaces between CJK characters also replace one space, the above both processing is the conventional processing and this patch does not bar the processing.

Shanjian Li

Assignee

Comment 39

•

22 years ago

rbs, let me explain hideo's statement. We only remove a single return between 2 CJK characters. We do nothing to multiple return, tab, space, whatsoever. Why is that? For tab, space, and other non-return space equivalent characters, we believe user has a reason to put it there when it is there, so we don't want to do anything. In case of multiple return, that's arguable. When user put a single return between 2 CJK char, they just want to shorten line length and they expect text flow to be continuous. We decide to only handle this situation and nothing else. I personally believe hideo's approach is reasonable. To remove multiple return between CJK char also make sense to me, but I didn't find it really necessary.

rbs

Comment 40

•

22 years ago

Is there a real-life page with the problem? Something that has been bugging me is why there wasn't a duplicate/precedent to this bug. (If there is one in bugzilla, care to point it?). What is conventional elsewhere (e.g., telegraphic text -- just a funny example) might be irrelevant in HTML. Surely, there should be a real-life page, in this massive web, with this bug? Is the patch not going to cause a suprise and regress pages that have been using '\n' as end-of-word *and* end-of-line -- not just end-of-text-line? Also, I wonder about the rendering of <pre>formatted text, the patch might mess all that. I applied the patch and it seemed to "fix" the testcase. I had some doubts about the necessity of the plain-text part, but again, it might be the same doubt as above (BTW, double-check/confirm that mInWhitespace is in sync).

rbs

Comment 41

•

22 years ago

Again, as a reminder, this isn't an easy area, and tip-toeing to arrive safely is better -- although it might take longer. So bear with the picky r/sr.

Hideo Saito

Reporter

Comment 42

•

22 years ago

By the text of usual Japanese, a space is not placed between kanji characters. In case a long document is edited, starting a new line, and continuing and describing a text in the suitable place of immediately after a punctuation and kanji characters other than a punctuation is performed ordinarily. Please see the following examples (was written in the code of EUC-JP). If HTML describes the text of 1, like 2, although it is visible like 3, we do not desire a text of 3. 1, ºé¤¤¤¿¡¢ºé¤¤¤¿¡¢ºù¤¬ºé¤¤¤¿¡£ 2, ºé¤¤¤¿¡¢ ºé¤¤¤¿¡¢ ºù¤¬ ºé¤¤¤¿¡£ 3, ºé¤¤¤¿¡¢ ºé¤¤¤¿¡¢ ºù¤¬ ºé¤¤¤¿¡£ It is possible to continue writing a Japanese text without a new-line for a long time. However, it is pain for us. It is checking that this patch does not influence the <pre> element.

Hideo Saito

Reporter

Comment 43

•

22 years ago

> Is there a real-life page with the problem? It shows one example, but not a good example, since CSS is used and the space between kanji characters spreads, http://www.asahi-net.or.jp/~wq6 k-yn/para.html > Something that has been bugging me is why there wasn't a duplicate/precedent to this bug. The argument is made even now at Bugzilla-jp (http://www.mozilla.gr.jp/), and there is a contrary opinion about this affair. That is, it says that it is the specification of HTML and the specification should be followed, and has been considered so until now.

Hideo Saito

Reporter

Comment 44

•

22 years ago

Sorry, it is the following URL. http://www.asahi-net.or.jp/~wq6k-yn/para.html

rbs

Comment 45

•

22 years ago

> It is possible to continue writing a Japanese text without a new-line for a > long time. Do you mean that there are words that excessively long and cannot be broken by users as they type them? Such a word would be weird to read... > The argument is made even now at Bugzilla-jp (http://www.mozilla.gr.jp/), and > there is a contrary opinion about this affair. > That is, it says that it is the specification of HTML and the specification > should be followed, and has been considered so until now. I am leaning towards the opinion of leaving things as they are myself (i.e., retain the HTML-spec way). This will avoid undue uncertainties with <pre>, not to mention the (still unknown) ramifications when typing in Composer. Plus, IE6 does what Mozilla is currently doing, and a change will not only be non-spec compliant, but it will also deviate from what other popular browsers are doing.

Koike Kazuhiko

Comment 46

•

22 years ago

> Do you mean that there are words that excessively long and cannot be broken by > users as they type them? Such a word would be weird to read... CJK text doesn't have a space between words. So, if somebody hates current Mozilla behaviour, he/she have to write a paragraph in a single line. IE6/Win doesn't display a line-break as a space.

rbs

Comment 47

•

22 years ago

> CJK text doesn't have a space between words. So, if somebody hates > current Mozilla behaviour, he/she have to write a paragraph in a single line. But if you use the Mozilla's editor, |nsJISx4501LineBreaker| does line-breaking nicely as you type, right?

Hideo Saito

Reporter

Comment 48

•

22 years ago

It is different to everybody what is used for a text editor. Can't it be interpreted as deletion being possible about the space between words by HTML 4.01 specifications depending on a each languege?

Koike Kazuhiko

Comment 49

•

22 years ago

HTML spec: <http://www.w3.org/TR/REC-html40/struct/text.html#h-9.1> > This layout may involve putting space between words (called inter-word space), > but conventions for inter-word space vary from script to script. For example, > in Latin scripts, inter-word space is typically rendered as an ASCII space > ( ), while in Thai it is a zero-width word separator (). > In Japanese and Chinese, inter-word space is not typically rendered at all. CSS spec: http://www.w3.org/TR/2002/WD-css3-text-20020515/#linefeed-treatment

Hideo Saito

Reporter

Comment 50

•

22 years ago

> Plus, IE6 does what Mozilla is currently doing, and a change will not only be > non-spec compliant, but it will also deviate from what other popular browsers > are doing. The specifications described by "9.1 White space" of HTML is right. > In Japanese and Chinese and inter-word space is not typically rendered at all. I believe that it is left to the browser how inter-word space is rendered. I consider that those browsers force the custom of a Latin scripts on us.

Takayuki Tei

Comment 51

•

22 years ago

Another real life example of this problem. http://alltheweb.com/

Takayuki Tei

Comment 52

•

22 years ago

Oh, this one too. http://www.google.com/ To see the problem, have Japanese at the top of your preferred langage list, and search something in Japanese.

nhottanscp

Comment 53

•

22 years ago

http://www.asahi-net.or.jp/~wq6k-yn/para.html Looking at that page, it looks bad. Although that page is intentionally creating the problem, it can happen in real pages like machine generated output (e.g. google). 　 I think we want to fix this on the trunk.

Takayuki Tei

Comment 54

•

22 years ago

I propose to fix this in branch so that it makes to Netscape 7 release. We shouldn't ignore 2nd and 3rd mostly used language on the internet.

nhottanscp

Comment 55

•

22 years ago

Taka, you can nominate this as nsbeta1 if you think this is important. Please provide some info. 1) How frequent does the user encounter the page with the problem. 2) How bad the problem can be? Please provide a screen shot.

Takayuki Tei

Comment 56

•

22 years ago

Attached image screen shot of the doc in comment 53 (deleted) — Details

Naoki, It's not a matter of how often users encounter this. As far as we state that Mozilla is HTML 4.0 compliant browser and it supports Chinese and Japanese, the document in comment 53, which is error free HTML 4.0 Strict document, must be rendered as native language speakers expect. Please see the screenshot and judge how bad we are doing with error free HTML document. This issue has been around since Netscape 4.0 and discussed on various ML, BBS, Newsgroup, etc. I don't want to see people blaming on Mozilla/Netscape about this issue anymore. Nominating to nsbeta1.

Yuying Long

Updated

•

22 years ago

Keywords: nsbeta1

Takayuki Tei

Comment 57

•

22 years ago

Attached patch patch v9 (obsolete) (deleted) — Details — Splinter Review

Exclude Korean from patch v8.

Attachment #85265 - Attachment is obsolete: true

Shanjian Li

Assignee

Comment 58

•

22 years ago

Comment on attachment 88737 [details] [diff] [review] patch v9 r=shanjian

Attachment #88737 - Flags: review+

Shanjian Li

Assignee

Comment 59

•

22 years ago

I just put r=shanjian since this patch is originally proposed by saito, and new one is touched by taka. chris waterson, could you please sr? rbs, feel free to express your objection if you still have some.

Frank Tang

Comment 60

•

22 years ago

can we reduce this patch by take out the non essential change in nsJISx4501LineBreaker.cpp and +#define IS_CJK_CHAR(u) in nsUnicharUtils.h [adt2] since this a strong recommendation/complains from Japanese mozilla community.

Keywords: nsbeta1 → nsbeta1+

Whiteboard: [adt2]

Frank Tang

Updated

•

22 years ago

Blocks: 141008

Takayuki Tei

Comment 61

•

22 years ago

Attached patch patch v10 (deleted) — Details — Splinter Review

don't touch nsJISx4501LineBreaker.cpp.

Attachment #88737 - Attachment is obsolete: true

Shanjian Li

Assignee

Comment 62

•

22 years ago

After we added the macro to nsUnicharUtils.h, it does not make sense to keep another copy in nsJISx4501LineBreaker.cpp. Since those are identical macro, there should be no risk. I would suggest to use v9 instead of v10.

Takayuki Tei

Comment 63

•

22 years ago

Actually, IS_CJK_CHAR(u) is not used in this patch anymore because we have to exclude Korean from special case handling. I added IS_CJ_CHAR() in nsUnicharUtils.h and that's the macro we are using.

Shanjian Li

Assignee

Comment 64

•

22 years ago

Comment on attachment 89113 [details] [diff] [review] patch v10 I am Ok with v10.

Attachment #89113 - Flags: review+

Chris Waterson

Comment 65

•

22 years ago

Comment on attachment 89113 [details] [diff] [review] patch v10 sr=waterson

Attachment #89113 - Flags: superreview+

Takayuki Tei

Comment 66

•

22 years ago

checked into trunk. /cvsroot/mozilla/layout/html/base/src/nsTextFrame.cpp,v <-- nsTextFrame.cpp new revision: 1.374; previous revision: 1.373 /cvsroot/mozilla/layout/html/base/src/nsTextTransformer.cpp,v <-- nsTextTransf ormer.cpp new revision: 1.69; previous revision: 1.68 /cvsroot/mozilla/content/base/src/nsPlainTextSerializer.cpp,v <-- nsPlainTextS erializer.cpp new revision: 1.60; previous revision: 1.59 /cvsroot/mozilla/intl/unicharutil/util/nsUnicharUtils.h,v <-- nsUnicharUtils.h new revision: 1.15; previous revision: 1.14

Takayuki Tei

Comment 67

•

22 years ago

Two more real life examples. http://www.php.net/manual/ja/function.setcookie.php http://www.kondara.org/about.html.ja

Takayuki Tei

Comment 68

•

22 years ago

One more. http://www.vinelinux.org/

Shanjian Li

Assignee

Comment 69

•

22 years ago

I talked to naoki and we double checked the code. The patch will only affect chinese and japanese characters. For characters of other languages, the behavior should be the same. For verification testing, we can also use any non-chinese and non-japanese website to verify this. Since '\n' is used in many long peice of text. If there is something, it shouldn't be difficult to identify. For chinese/japanese verification, see previous comment for testcases. Be sure to test copy-n-paste.

Yuying Long

Comment 70

•

22 years ago

On 07-02 trunk build: 1. It works fine when a return code between 2 Japanese text characters. However, when there is a return code HTML tag mixed together with character(s), there still like "an extra space" shows in browser window. http://www.asahi-net.or.jp/~wq6k-yn/para.html 2. When the HTML source has space(s) between return code, like page: http://www.php.net/manual/ja/function.setcookie.php or: http://www.vinelinux.org/ There is a single byte space between 2 characters. I don't if it is OK, but on Netscape I can highlight the space between characters while on IE I can not. 3. No Copy/Paste regression was found right now. 4. I don't find any regression with English or other languages so far.

Yuying Long

Comment 71

•

22 years ago

Attached image a screen shot after fixed (deleted) — Details

Notice: For the display in browser, there is no different with this case before or after the fix. When I copy/paste the string from Broswer window to Composer, the space will be shorter.

nhottanscp

Comment 72

•

22 years ago

Taka, the first two issues in Comment #70 were they intended to be fixed by the last patch? If not, we can verify this as fixed.

nhottanscp

Comment 73

•

22 years ago

>Taka, the first two issues in Comment #70 were they intended to be fixed by the >last patch? If not, we can verify this as fixed. Let me ask this to the original author of the patch. Saito san, could you answer my last comment? Is it okay for mozilla community in Japan about the current level of the support (i.e. remaining problems are not so important)?

Hideo Saito

Reporter

Comment 74

•

22 years ago

There are about three future examination matters. A new-line replaces a space, without being deleted in the following examples. (1) a new-line is between a Kanji character and an English character. KK HTML KK (2) the Kanji character is inserted into HTML tag KK <a>KK</a> KK (3) there is a space following a new-line KK KK KK is the Kanji character of one character. The user group of Bugzilla-jp examines whether deletion of a space is required.

nhottanscp

Comment 75

•

22 years ago

Okay, please file a separate bug for those problems. This bug can be marked as fixed. I think the current fix mostly covers exisiting problems. Nominate for adt1.0.1

Keywords: adt1.0.1

Whiteboard: [adt2] → [adt2 rtm]

scottputterman

Comment 76

•

22 years ago

adding adt1.0.1- per ADT. Let's get this in the next release.

rbs

Comment 77

•

22 years ago

> re: comment #59 > > rbs, feel free to express your objection if you still have some. I have been travelling, and was out of bugzilla. I don't like the fix very much (especially that it doesn't cover everything and it is short-sighted for the remaining parts), but I don't know what else to suggest -- except perhaps the following: would that possible to transform the space (when in the CJ context) into a zero-width space (ZWSP) to make it disappear in a natural/visual way? Motivation: avoid complications in nsTextFrame, if possible.

Shanjian Li

Assignee

Comment 78

•

22 years ago

What rbs suggested is a interesting idea. But I am not sure how much it can do to simplify the code. I will mark this bug as fixed. File another bug anybody like to explore that idea.

Status: ASSIGNED → RESOLVED

Closed: 22 years ago

Resolution: --- → FIXED

Yuying Long

Comment 79

•

22 years ago

Mark as verified for this one.

Status: RESOLVED → VERIFIED

Yuying Long

Comment 80

•

22 years ago

Bug 156369 is filed for remaining problems.

Paul Wyskoczka

Comment 81

•

22 years ago

Adding adt1.0.1- per the adt. Let's try to get it into the next release.

Keywords: adt1.0.1 → adt1.0.1-

hippytrail

Comment 82

•

22 years ago

I'm not very familiar with the Mozilla source but if the macro IS_CJK_CHAR or IS_CJ_CHAR in intl/unicharutil/util/nsUnicharUtils.h works like IS_HIGH/LOW_SURROGATE, won't Chinese and Japanese characters composed of two surrogates be missed and still cause this bug?

You need to log in before you can comment on or make changes to this bug.