200582 - No line wrap for text without spaces in nsHTMLContentSerializer.cpp

Reporter

Description

•

22 years ago

In Mozilla mail (HTML), long lines that do not include spaces (e.g. Japanese text) are not wrapped when sent. E.g. the next line has 100 characters and it does not wrap.
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789

nhottanscp

Reporter

Comment 1

•

22 years ago

http://lxr.mozilla.org/seamonkey/source/content/base/src/nsHTMLContentSerializer.cpp#740

The code is trying to find a white space for a line break. Note that mMaxColumn is not checked here.

740       // if we're past the wrapping width with no place to wrap at,
741       // find the next whitespace and wrap there
742       while (pos != end && *pos != ' ' && *pos != '\n') {
743         ++pos;
744       }
745

mAddSpace is false, so execution goes to the else branch.

746       if (mAddSpace) {
747         // whitespace was needed before the next segment, so we can put
748         // a newline instead of a space, and avoid getting a lone line
749         aOutputStr.Append(mLineBreak);
750         addLineBreak = PR_FALSE;
751
752         mColPos = pos - segStart;
753
754         // if the string doesn't end in whitespace, set mAddSpace to false
755         if (pos == end) {
756           mAddSpace = PR_FALSE;
757         }
758       }

The else branch sets addLineBreak to true. But at that point pos has already reached end, so no line break is added before returning at line 766. And because the loop at line 742 does not do a column check, the line could still be very long even if we changed the code to add one line break before returning.

759       else {
760         // no choice but to write a long line and wrap immediately after it
761         addLineBreak = PR_TRUE;
762       }
763       aOutputStr.Append(segStart, pos - segStart);
764
765       if (pos == end) {
766         return;
767       }
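To make the behaviour of the scan at lines 742-744 concrete, here is a minimal standalone sketch. It is not the serializer code: ScanToWhitespace is a hypothetical helper, and std::string stands in for the Mozilla string types. It only illustrates that, without a column check, the scan runs to the end of text that contains no whitespace.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the loop at lines 742-744: advance from `start`
// until a space or newline is found, or the end of the text is reached.
// No column limit is consulted, mirroring the loop quoted above.
std::size_t ScanToWhitespace(const std::string& text, std::size_t start) {
  assert(start <= text.size());
  std::size_t pos = start;
  while (pos != text.size() && text[pos] != ' ' && text[pos] != '\n') {
    ++pos;
  }
  return pos;
}
```

For the 100-digit line from the description, the scan stops only at the end of the text, so the whole line is emitted as a single segment.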

nhottanscp

Reporter

Comment 2

•

22 years ago

In the loop at line 742, nsILineBreaker could be used to find a line break in text without spaces, such as Japanese text; it is already used in nsPlainTextSerializer. The loop could also check against a limit such as (mMaxColumn * 2) to prevent the line from becoming very long.
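The (mMaxColumn * 2) idea could look roughly like the sketch below. This is an illustration under assumptions, not the attached patch: ScanToWhitespaceCapped is a hypothetical helper, the cap is measured from the start of the segment, and positions are counted in bytes of a std::string, so a real implementation would still have to avoid splitting a multi-byte character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical variant of the whitespace scan with a hard cap: stop at the
// first space or newline, at the end of the text, or once the segment
// reaches maxColumn * 2 characters, whichever comes first.
std::size_t ScanToWhitespaceCapped(const std::string& text, std::size_t start,
                                   std::size_t maxColumn) {
  assert(start <= text.size());
  const std::size_t hardLimit = maxColumn * 2;  // assumed cap, as suggested above
  std::size_t pos = start;
  while (pos != text.size() && text[pos] != ' ' && text[pos] != '\n' &&
         pos - start < hardLimit) {
    ++pos;
  }
  return pos;
}
```

With a wrap column of 72, a run of text without whitespace is cut after 144 characters instead of being emitted in full.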

Keywords: embed

Akkana Peck

Comment 3

•

22 years ago

See also bug 56921, which covers a bug in this code which makes ascii lines too long.

nhottanscp

Reporter

Comment 4

•

22 years ago

Attached patch: When looping for white space, also check for max line. (obsolete) (deleted)

The patch is to prevent very long lines which may cause data corruption for encoded Japanese text (e.g. ISO-2022-JP) when the server forces line breaks. I think the integration of the line breaker for better formatting can be done separately by someone more familiar with the code.

nhottanscp

Reporter

Updated

•

22 years ago

Attachment #119697 - Flags: review?(akkana)

Akkana Peck

Comment 5

•

22 years ago

I'm not sure I'm understanding this correctly. Why doesn't the while (mColPos < mMaxColumn) take care of this -- why isn't the new check redundant with that one? By the time we get to the place where the patch adds the new check, isn't mColPos >= mMaxColumn already, without seeing a space?

So the change made by the patch is that if there wasn't a space in the line, the line will be forcibly broken in the middle of a word, whether or not a space was seen? Am I understanding that correctly, and if so, is that really a change we want to make? Will it cause a line of 79 ascii dashes to be broken into long-short pairs (with the default wrapcol of 72)?

I don't have a tree that builds right now, and we're having unexplained network slowness today so it may be a while before I can update and test it.

I wish we could just do wrapping right, using nsILineBreaker like in the plaintext serializer and searching backwards as described in bug 56921, instead of applying more band-aids on top of the existing band-aid layers.

Burpmaster, do you have time to glance at the patch and see if I'm misinterpreting? You're probably the most familiar with this code.

nhottanscp

Reporter

Comment 6

•

22 years ago

>the patch is that if there wasn't a space in the line, the line will be forcibly
>broken in the middle of a word, whether or not a space was seen? Am I

Yes. For Japanese text that is okay (and this affects only the HTML source, not the actual display). For English it could be a problem, but it only happens if a word is very long (more than 72 characters), so I assume it is okay in practice.

>I wish we could just do wrapping right, using nsILineBreaker like in the
>plaintext serializer and searching backwards as described in bug 56921, instead
>of applying more band-aids on top of the existing band-aid layers.

I agree: if the right implementation is available soon, the current patch would not be needed. If the wrapping has to happen within the column, then the search has to go backwards. The line breaker will take care of Japanese text, but a long word without a space, like the one in my original report, would not be broken even with the line breaker.
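A backward search of the kind mentioned here might be sketched as follows. FindLastBreakWithin is a hypothetical helper working on a std::string, and it only looks for ASCII space and newline; it does not model nsILineBreaker, which would be needed to find break opportunities in Japanese text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical backward search: return the index of the last space or
// newline in text[start, start + maxColumn), or std::string::npos if the
// window contains no whitespace at all.
std::size_t FindLastBreakWithin(const std::string& text, std::size_t start,
                                std::size_t maxColumn) {
  assert(start <= text.size());
  // Clamp the window to the end of the text without overflowing.
  const std::size_t limit =
      maxColumn < text.size() - start ? start + maxColumn : text.size();
  for (std::size_t pos = limit; pos > start; --pos) {
    const char c = text[pos - 1];
    if (c == ' ' || c == '\n') {
      return pos - 1;
    }
  }
  return std::string::npos;
}
```

When the result is std::string::npos, as it would be for the 100-digit line in the original report, the caller still has to decide between writing a long line and forcing a break, which is the disagreement in this bug.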

nhottanscp

Reporter

Comment 7

•

22 years ago

nsbeta1 The data corruption I mentioned in comment #4 is generic and can happen with Mozilla mail.

Keywords: nsbeta1

Akkana Peck

Comment 8

•

22 years ago

Comment on attachment 119697
When looping for white space, also check for max line.

Sorry, but no, I can't approve breaking wrapping latin text just because nobody wants to spend the time writing code that calls nsILineBreaker. Lots of people send plaintext mail with long separator lines of "*" or "=" or whatever, as well as ascii art, but more important, people also send long urls in plaintext mail, and breaking them means that the url can't be pasted into a browser.

Attachment #119697 - Flags: review?(akkana) → review-

nhottanscp

Reporter

Comment 9

•

22 years ago

As I mentioned before, the line breaker can handle the Japanese text case but cannot necessarily break a very long ASCII word like the one in my original report (see my comment #4).

Akkana Peck

Comment 10

•

22 years ago

Leaving long words unbroken is intentional: breaking them at the normal wrapcol of 72 is bad, especially on urls. See, for example, bug 137253. In fact, we did that at one time, and got bugs reported on it. I could see an argument for having a separate limit for forcibly breaking text (e.g. at 512 bytes or whatever mail and news servers need), but that's a separate issue, and probably harder than just calling the line breaker.
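As a rough illustration of a separate forced-break limit, the sketch below wraps at a soft column but splits a word only when it exceeds a much larger hard limit. Everything in it is an assumption made for illustration: WrapWithHardLimit is a hypothetical function, the 72 and 512 values are just the figures mentioned in this comment, whitespace runs are collapsed, and lengths are counted in bytes of a std::string rather than in encoded mail bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical greedy wrapper: lines are filled up to wrapCol, a word longer
// than wrapCol is kept whole on its own line, and only a word longer than
// hardLimit is forcibly split into hardLimit-sized pieces.
std::vector<std::string> WrapWithHardLimit(const std::string& text,
                                           std::size_t wrapCol,
                                           std::size_t hardLimit) {
  assert(wrapCol > 0 && hardLimit >= wrapCol);
  std::vector<std::string> lines;
  std::string line;
  auto flush = [&lines, &line] {
    if (!line.empty()) {
      lines.push_back(line);
      line.clear();
    }
  };

  std::istringstream in(text);
  std::string word;
  while (in >> word) {
    // Forced breaks apply only beyond the hard limit.
    while (word.size() > hardLimit) {
      flush();
      lines.push_back(word.substr(0, hardLimit));
      word.erase(0, hardLimit);
    }
    // Start a new line if this word would pass the soft wrap column.
    if (!line.empty() && line.size() + 1 + word.size() > wrapCol) {
      flush();
    }
    if (!line.empty()) {
      line += ' ';
    }
    line += word;
  }
  flush();
  return lines;
}
```

With wrapCol 72 and hardLimit 512, an 80-character URL stays on one line by itself, while a 1000-character run is split into pieces of 512 and 488 characters.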

nhottanscp

Reporter

Comment 11

•

22 years ago

I see. Then hooking up the line breaker is the way to go; it will at least take care of the Japanese text case.

nhottanscp

Reporter

Comment 12

•

22 years ago

I looked at this again and found that Japanese text does not go through the function (AppendToStringWrapped) we have looked at so far. There is a function nsHTMLContentSerializer::AppendText which calls either AppendToStringConvertLF or AppendToStringWrapped. For Japanese text, AppendToStringConvertLF is called, and that function does not do line wrapping.

http://lxr.mozilla.org/seamonkey/source/content/base/src/nsHTMLContentSerializer.cpp#162

So, in addition to changing AppendToStringWrapped, the caller (or some other place) also has to be changed so that AppendToStringWrapped is called for long Japanese text. This really has to be looked at by someone who is familiar with that area. Adding 'topembed' keyword.

Keywords: topembed

When looping for white space, also check for max line. | 22 years ago | nhottanscp | (deleted), patch | akkzilla: review-
Long Japanese text for testing. | 22 years ago | nhottanscp | (deleted), patch
Same as the last attachment but attach as HTML. | 22 years ago | nhottanscp | (deleted), text/html
Patch v2 | 22 years ago | Kai Engert (:KaiE:) | (deleted), patch
Patch v2b | 22 years ago | Kai Engert (:KaiE:) | (deleted), patch | jst: review-
Patch v3 | 22 years ago | Kai Engert (:KaiE:) | (deleted), patch | jst: review+
Patch v4 | 22 years ago | Kai Engert (:KaiE:) | (deleted), patch | KaiE: review+
Patch v5 | 22 years ago | Kai Engert (:KaiE:) | (deleted), patch | KaiE: review+, peterv: superreview+
Patch v6 | 22 years ago | Kai Engert (:KaiE:) | (deleted), patch
Patch v7 | 22 years ago | Kai Engert (:KaiE:) | (deleted), patch | KaiE: review+, KaiE: superreview+, asa: approval1.4+