Closed Bug 85184 Opened 23 years ago Closed 23 years ago

[serializer]Composer breaks lines at inappropriate positions

Categories

(Core :: DOM: Serializers, defect, P3)

x86
Windows 2000
defect

Tracking

()

VERIFIED FIXED
mozilla1.2alpha

People

(Reporter: biro.arpad, Assigned: t_mutreja)

References

Details

(Whiteboard: [C][patch needs a=])

Attachments

(2 files)

From Bugzilla Helper: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.1) Gecko/20010607 BuildID: 2001060703 1. Occasionally Composer introduces an unwanted space by breaking line after a tag (see example file). 2. In certain cases, Mozilla displays an extra "<" character (again, see the example file). (these are two bugs, but can be reproduced in the same way) Reproducible: Always Steps to Reproduce: 1. start Composer (Mozilla 0.9.1) 2. open the sample HTML file that is included in this bug report (it's 7607 bytes long with Windows EOLs) 3. save the document under a different name 4. exit Composer and open both files with Mozilla 5. scroll to the end of both documents Actual Results: Now, compare the last paragraphs of the two documents visually. You should see two differences in the last paragraph. In the new document there's an unwanted space between "classic-852-16.psf.gz" and the closing parenthesis (which the original document does not have). The second difference: in the new document there's a "<" sign before "iso02_cp852.trans", which is not there in the original document. Expected Results: 1. Composer: do not add extra space 2. Mozilla: do not display that "<" sign <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <html lang="hu"> <head> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-2"> <meta http-equiv="Content-Language" content="hu"> <title>Magyarul-HOGYAN</title> </head> <body bgcolor="#ffffff"> <div align=center> <img src="Magyarul-HOGYAN.gif" alt="Magyarul-HOGYAN"> </div> <!-- h1 align=center>Magyarul-HOGYAN</h1 --> <h3 align=center> avagy tippek és trükkök magyarul beszél? Linux-felhasználóknak </h3> <p> <center><div align=center> Verzió: <tt>0.5.6</tt><br> 1999. július 19. </div></center> <p> <table border=0 bgcolor="#e0e0e0" cellspacing=8 cellpadding=9> <tr><td> Ha valaki olyan böngész?vel rendelkezik, amelyikben ez a lap nem jól olvasható, kérem írja meg. 
</td></tr> </table> <p> Ez a lap a Linux operációs rendszercsalád magyarosításával foglalkozik. F?bb témái: magyar bet?k, billenty?zet használata; magyar nyelv? dokumentumok az Interneten, különféle programok beállítása. A felsorolt megoldások, beállítások általában RedHat változat használatát feltételezik, ez azonban csak azt jelenti, hogy a szerz? ilyet használ, és nem azt hogy más Linux változatok (pl. Debian, SuSe, Slackware, stb.) ne lennének ugyanolyan jók a magyar felhasználók számára, és a közölt megoldások némi változtatással ne lennének használhatóak az utóbb említett változatokban. A szerz? örömmel venné a nem RedHat Linux-hoz készült leírásokat, megoldásokat melyeket közzé is tenne ezen a helyen. <p> <i>vbzoli<i>@</i>vbzo.li</i> <p> <h2>TARTALOM</h2> <ul> <li><a href="#hol">0. Hol érhet? el a Magyarul-HOGYAN?</a> <li><a href="#bl2">1. Magyar billenty?kiosztás és latin-2 (ISO-8859-2) bet?k használata</a> <ul> <li><a href="#bl2-lat2">1.1. A latin-1 és latin-2 kódkiosztás</a> <li><a href="#bl2-lin">1.2. A Linuxon használt kódkiosztások</a> <li><a href="#bl2-kl2">1.3. Latin-2 kiosztás használata konzolon</a> &nbsp; <b>&lt;FRISSÍTETT&gt;</b> <li><a href="#bl2-kl2-r">1.4. Latin-2 kiosztás használata konzolon (régi megoldás, nem ajánlott)</a> <li><a href="#bl2-kbill">1.5. Magyar billenty?kiosztás használata konzolon</a> <li><a href="#bl2-kdeb">1.6. Konzol-beállítás Debian 1.2 alatt</a> <li><a href="#bl2-krh6">1.7. Konzol-beállítás RedHat 6.0 alatt</a> &nbsp; <b>&lt;ÚJ&gt;</b> <li><a href="#bl2-krh5">1.8. Konzol-beállítás RedHat 5.x alatt</a> <li><a href="#bl2-kslak">1.9. Magyar Slackware csomag (régi megoldás, nem ajánlott)</a> <li><a href="#bl2-l2x">1.10. Latin-2 kiosztás használata X-Window felületen</a> <li><a href="#bl2-kx">1.11. Magyar billenty?zet használata X-Window felületen</a> </ul> <li><a href="#p">2. Egyes programok beállításai</a> <ul> <li><a href="#p-shell">2.1. bash, tcsh</a> <li><a href="#p-less">2.2. less</a> <li><a href="#p-tex">2.3. 
TeX, LaTeX</a> <li><a href="#p-lyx">2.4. LyX</a> <li><a href="#p-joe">2.5. joe</a> <li><a href="#p-emacs">2.6. emacs</a> <li><a href="#p-netscape">2.7. netscape</a> <li><a href="#p-nedit">2.8. nedit</a> <li><a href="#p-lynx">2.9. lynx</a> <li><a href="#p-xterm">2.10. xterm</a> <li><a href="#p-ls">2.11. ls</a> <li><a href="#p-pgsql">2.12. postgresql</a> </ul> <li><a href="#tipp">3. Tippek</a> <ul> <li><a href="#tipp-pstxt">3.1. Hogyan nyomtassunk latin-2 kódolású szövegállományokat bármilyen - Ghostscript által támogatott, vagy postscript - nyomtatón?</a> <li><a href="#tipp-l2html">3.2. Hogyan készítsünk magyar (latin-2-ben kódolt) WWW (HTML) oldalakat?</a> </ul> <li><a href="#rotsuveg">4. R?t Süveg, azaz magyar Linux</a> <li><a href="#inet">5. Magyar vonatkozású Linuxos INTERNET források</a> <ul> <li><a href="#inet-ftp">5.1. ftp-helyek</a> <li><a href="#inet-www">5.2. www-helyek</a> <li><a href="#inet-lev">5.3. Levelez?listák</a> <li><a href="#inet-news">5.4. Újság</a> </ul> <li><a href="#doksi">6. Magyar nyelv? Linux (UNIX) leírások</a> <li><a href="#tanf">7. Linux tanfolyamok</a> <li><a href="#kozre">8. Közrem?köd?k</a> </ul> <p><hr><p> <h2><a name="hol"> 0. Hol érhet? el a Magyarul-HOGYAN? </a></h2> <ul> <li><a href="http://vbzo.li/linux/Magyarul-HOGYAN.html"> http://vbzo.li/linux/Magyarul-HOGYAN.html</a> <li><i>Ha valaki tükrözné ezt a lapot, jelezze nekem, hogy ide felvehessem!</i> </ul> <p><hr><p> <h2><a name="bl2"> 1. Magyar billenty?kiosztás és latin-2 (ISO-8859-2) bet?k használata </a></h2> <p><h4><a name="bl2-lat2"> 1.1. A latin-1 és latin-2 kódkiosztás </a></h4> A <code>UNIX</code> világában elterjedt 8-bites kódkiosztás amely tartalmazza a magyar ékezetes bet?ket is az <code>ISO-8859-2</code>, azaz a latin-2 kiosztás. Ez a kódkiosztás tartalmazza a latin bet?s szláv nyelvek (horvát, szlovén, szlovák, cseh, lengyel), és a magyar, román, német nyelv ékezetes bet?it. 
<p> A Nyugat-Európai országok az <code>ISO-8859-1</code> kódkiosztást használják (latin-1). A latin-1 kiosztás tartalmazza a magyar ékezetes bet?ket is (ugyanazon kóddal) az ? (o") és ? (u") kivételével. Az ? (o") és ? (u") bet?k helyén a latin-1-es kiosztásban az o~ és az u^ szerepel (o tetején hullámvonal, u tetején kalap); így a latin-2-ben kódolt magyar szövegek olvashatóak latin-1-es kiosztás használatával is (jobb híján). <p><h4><a name="bl2-lin"> 1.2. A Linuxon használt kódkiosztások </a></h4> A Linux két legjobban elterjedt felhasználói felülete a konzol és az X-Window grafikus rendszer. Konzol alatt a szöveges üzemmódú képerny?t (általában VGA-monitor) és billety?zetet értjük. Az esetleges küls? terminálok beállításaival (egyel?re még) nem foglalkozunk. <p> A linux rendszermag alapesetben a konzolon a latin-1-es kiosztást használja úgy, hogy a latin-1-es kódokat leképezi a PC-s <tt>437</tt>-es kódlapra. (A 437-es kódlapot egyébként a monitorvezérl?-kártya tartalmazza PC-n.) Ezzel a módszerrel csak olyan latin-1 bet?ket tud megjeleníteni, melyek szerepelnek a <tt>437</tt>-es lapon. <p> Az X-Window rendszer alapesetben a latin-1 (<code>ISO-8859-1</code>) kódolást használja, de rendelkezésre állnak latin-2 és egyéb kódú bet?készletek is már szép számban. Pl. az 5.2-es RedHat-ben már az alap-telepít?készlet része néhány latin-2-es bet?csomag. <p><h4><a name="bl2-kl2"> 1.3. Latin-2 kiosztás használata konzolon </a></h4> Latin-2-es bet?k használatát legcélszer?bben (hasonlóan az alapeset <tt>437</tt>-es kódlapjához) 852-es kódlap szerint kódolt bet?készlettel valósíthatjuk meg. A 852-es kódlapot csak a képrny?fontok kódolására használjuk, és a latin-2 (ISO8859-2) kódokat leképezzük a 852-es kódokra. <p> Két ok miatt célszer? a 852 kódlap használata a latin-2-ben kódolt bet?készlethez képest: <ul> <li>Nem kell átírni a <tt>termcap</tt>, <tt>terminfo</tt> bejegyzéseket; <li>A VGA kártyák a 9. bit kiegészítését csak a megfelel? helyen lev? 
vonal(keret)rajzoló karaktereknél támogatják. <li>A fentiek miatt pl. az <code>mc</code> és egyéb konzolon futó programok rendes vonalrajzoló karaktereket írnak ki. </ul> <p> <b>Újabb <tt>console-tools</tt> csomag (konzol-eszközök) használatánál</b> <p> A megfelel? bet?kiosztást tartalmazó állományt (pl. <tt>classic-852-16.psf.gz</tt>) a <tt>/usr/lib/kbd/consolefonts</tt> könyvtárba, míg a képerny?-leképezést (pl. <tt>iso02_cp852.trans</tt>) tartalmazó állományt a <tt>/usr/lib/kbd/consoletrans</tt> könyvtárba másoljuk. </html>
I see the problem -- there is a <p> just before the table toward the top of the file. After the document has gone through the parser, the file is normalized and the </p> is inserted after the table. When I moved the </p> to before the table, it all worked fine.
Went from this (snippet):
<center><div align=center> Verzió: <tt>0.5.6</tt><br> 1999. július 19. </div></center> <p> <table border=0 bgcolor="#e0e0e0" cellspacing=8 cellpadding=9> <tr><td> Ha valaki olyan böngész?vel rendelkezik, amelyikben ez a lap nem jól olvasható, kérem írja meg. </td></tr> </table>
to this:
<center><div align=center> Verzió: <tt>0.5.6</tt><br> 1999. július 19. </div></center> <p> </p> !!!!!!!!!!!!!!NOTE THE END </P> <table border=0 bgcolor="#e0e0e0" cellspacing=8 cellpadding=9> <tr><td> Ha valaki olyan böngész?vel rendelkezik, amelyikben ez a lap nem jól olvasható, kérem írja meg. </td></tr> </table>
Actually this is a parser issue, reassigning to parser.
Assignee: beppe → harishd
Status: UNCONFIRMED → NEW
Component: Editor → Parser
Ever confirmed: true
QA Contact: sujay → bsharma
This could be related to bug 77145 ( fix is in ) and bug 82971 ( fix in hand ).
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.3
Priority: -- → P1
Ok, I'm not able to reproduce the problem. Marking WFM.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME
One of the two bugs has been fixed, but the "extra space" bug remains (I have also seen it in the 2001071104 Win32 build). Try with this small sample (162 bytes with Windows EOLs):
<html> <head> <title>test</title> </head> <body> <p> 1 123456789 1234567890123 1234567890 123456789 (123 <tt>123456789012345678901</tt>) 123456. </html>
Composer saves this as:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <title>test</title> </head> <body> <p> 1 123456789 1234567890123 1234567890 123456789 (123 <tt>123456789012345678901</tt> ) 123456. </p> </body> </html>
So a space is introduced before the ")".
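The difference between the two files can also be checked mechanically. Below is a minimal Python sketch, assuming that collapsing whitespace runs in the text content is a fair stand-in for how a browser lays out this paragraph; `TextExtractor` and `rendered_text` are hypothetical helper names, and the newline in `saved` is assumed from the description above rather than copied from the saved file.

```python
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects character data and ignores tags (a rough stand-in for rendering)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def rendered_text(html):
    """Return the text content with every whitespace run collapsed to one space."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at end of input
    return re.sub(r"\s+", " ", "".join(parser.chunks)).strip()


# The paragraph from the small sample, before and after saving.
original = ("<p> 1 123456789 1234567890123 1234567890 123456789 "
            "(123 <tt>123456789012345678901</tt>) 123456.")
# Assumption: the saved file has a line break between </tt> and ")".
saved = ("<p> 1 123456789 1234567890123 1234567890 123456789 "
         "(123 <tt>123456789012345678901</tt>\n) 123456. </p>")

print(rendered_text(original))
print(rendered_text(saved))
print(rendered_text(original) == rendered_text(saved))
```

The two normalized strings differ only in the space before ")", which is the visible regression reported here.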
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Content model for the above testcase ( Ref. comment 2001-07-12 02:22 ):
***********************************************************************
docshell=0135E310
html@02D02B30 refcount=8<
  head@02D02A40 refcount=2<
    title@02D23140 refcount=2<
      Text@02D23420 refcount=2<test>
    >
  >
  Text@02D1FBB0 refcount=3<\n>
  body@02D233C0 refcount=3<
    Text@01394A70 refcount=3<\n>
    p@013949A0 refcount=3<
      Text@01394940 refcount=3<\n1 123456789 1234567890123 1234567890 123456789\n(123 >
      tt@013947E0 refcount=3<
        Text@01394780 refcount=3<123456789012345678901>
      >
      Text@01394640 refcount=4<) 123456.> <<<<<<<< NO NEW LINE <<<<<<<<
    >
    Text@0137A650 refcount=3<\n>
  >
>
Content model after saving the testcase through Composer:
*******************************************************
docshell=0135E310
html@0251CA90 refcount=8<
  head@0251B200 refcount=2<
    title@0245E6E0 refcount=2<
      Text@0245AA90 refcount=2<test>
    >
  >
  Text@02458C70 refcount=3<\n>
  body@0245AA30 refcount=3<
    Text@0250A7E0 refcount=3<\n \n>
    p@0250A730 refcount=3<
      Text@0250A6D0 refcount=3< 1 123456789 1234567890123 1234567890 123456789 ( 123 >
      tt@0250A570 refcount=3<
        Text@0250A510 refcount=3<123456789012345678901>
      >
      Text@0250A3D0 refcount=4<\n) 123456.> <<<<< THERE IS THE NEW LINE <<<<
    >
    Text@0250A090 refcount=3<\n \n>
  >
>
Looks like Composer has added the extra new line. Back to Beppe.
Assignee: harishd → beppe
Status: REOPENED → NEW
Not sure why we are inserting a space. I would normally give this to Joe, but handing over to akkana and cc kin. Reducing from P1 to P3.
Assignee: beppe → akkana
Priority: P1 → P3
This extra space/newline is coming from nsHTMLContentSerializer::AppendToStringWrapped() because the particular line containing the </tt> exceeds 72 characters. Akk, are we supposed to be enforcing 72 col hard wraps in composer output, or just MsgCompose? Perhaps we should be setting some flag in composer to avoid this Serializer behavior. On a side note, the extra new line after <body> is happening, because "body" is listed in nsHTMLContentSerializer::LineBreakAfterClose().
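A toy Python model of the behaviour described above, for illustration only. It is not the Mozilla code: it assumes that tags are written out as-is and that only text goes through a wrap check, which starts a new line whenever the text begins at or past the wrap column. `serialize_naive` and the `(kind, value)` piece format are hypothetical.

```python
WRAP_COLUMN = 72  # the wrap width mentioned in the comment above


def serialize_naive(pieces, wrap_column=WRAP_COLUMN):
    """Concatenate (kind, value) pieces, wrapping text that starts past wrap_column.

    Simplified model of the reported behaviour, not the real serializer.
    """
    out = []
    column = 0
    for kind, value in pieces:
        # Assumption: only text is wrap-checked; tags are appended unchanged.
        if kind == "text" and column >= wrap_column:
            out.append("\n")  # break inserted even though no whitespace was here
            column = 0
        out.append(value)
        column += len(value)
    return "".join(out)


pieces = [
    ("tag", "<p>"),
    ("text", " 1 123456789 1234567890123 1234567890 123456789 (123 "),
    ("tag", "<tt>"),
    ("text", "123456789012345678901"),
    ("tag", "</tt>"),
    ("text", ") 123456."),
]

result = serialize_naive(pieces)
print(result)
```

In this model the line has reached column 86 by the time the ")" text arrives, so the newline lands between `</tt>` and ")", where the source had no whitespace at all.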
We definitely want to have wrapping of Composer output, not just mail; otherwise the source of Composer-generated documents will be very difficult to read and edit. But apparently we need to be smarter about wrapping just before or just after a tag when there's no adjacent whitespace.
Status: NEW → ASSIGNED
This is basically the same as bug 56921: the nsHTMLContentSinkStream had wrapping code that worked, and for some reason it was replaced with new wrapping code which doesn't work. Handing over to Anthonyd for when the wrapping code is rewritten (or the code that was previously there is plugged back in). These bugs should go together, and there's no point in fixing it in the current code if it's going to be replaced. I note that we're no longer calling the nsLineBreaker interfaces needed for I18n, either. For the record, the offending newline is coming from the last AppendToString call in nsHTMLContentSerializer::AppendToStringWrapped, currently line 560.
Assignee: akkana → anthonyd
Status: ASSIGNED → NEW
moving to 0.9.4
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.3 → mozilla0.9.4
After mucho discussion with kin: this bug is not going to be easy to fix, if it should even be fixed at all. More investigation with the mail news team needs to be done to figure out a solution to this. Part of the problem is that when we are feeding out the document from the stream, by the time we get a line that is longer than 72 characters, we can't go back to break at an earlier spot. Not sure how we are going to get around that small technical problem. anthonyd
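One possible way around the "can't go back" problem is to hold the current line in a buffer until a break point has been chosen. The Python sketch below is only an illustration under that assumption; `BufferedWrapper` is a hypothetical name, and the sketch ignores `<pre>` content and spaces inside attribute values, which a real serializer would have to protect.

```python
class BufferedWrapper:
    """Holds back the current line so a break can still be placed at an earlier space."""

    def __init__(self, wrap_column=72):
        self.wrap_column = wrap_column
        self.line = ""    # current line, not yet emitted
        self.output = []  # completed lines

    def write(self, chunk):
        self.line += chunk
        while len(self.line) > self.wrap_column:
            # Prefer the last existing space that keeps the line within the limit.
            cut = self.line.rfind(" ", 0, self.wrap_column + 1)
            if cut <= 0:
                # No usable space in range: fall back to the first one after it.
                cut = self.line.find(" ", self.wrap_column + 1)
            if cut <= 0:
                break  # no space at all yet; keep buffering
            self.output.append(self.line[:cut])
            self.line = self.line[cut + 1:]  # the space itself becomes the newline

    def finish(self):
        self.output.append(self.line)
        return "\n".join(self.output)


chunks = [
    "<p>",
    " 1 123456789 1234567890123 1234567890 123456789 (123 ",
    "<tt>", "123456789012345678901", "</tt>",
    ") 123456.",
]
wrapper = BufferedWrapper(72)
for chunk in chunks:
    wrapper.write(chunk)
result = wrapper.finish()
print(result)
```

With the sample from this bug, the break moves back to the space before `<tt>`, so `</tt>)` stays together and no line exceeds 72 characters. The cost is that output is delayed by up to one line.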
Whiteboard: [C]
--> kin
Assignee: anthonyd → kin
Status: ASSIGNED → NEW
-->DOM to Text Conversion module owner
Component: Parser → DOM to Text Conversion
Summary: Composer breaks lines at inappropriate positions → [serializer]Composer breaks lines at inappropriate positions
-->module owner
Assignee: kin → anthonyd
QA Contact: bsharma → sujay
According to syd, harishd is the new module owner of serializer. -->harishd
Assignee: anthonyd → harishd
And when did that happen! I wasn't aware of it until I spoke to heikki. FYI: The decision is not final. I need to talk to people before accepting ownership. For now I'm not the owner. In any case this is not going to get fixed for 0.9.4. Moving to 0.9.5
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Reassigning plaintext serializer bugs to Peter ;-)
Assignee: harishd → peterv
All these missed the bus/train/plane/boat/whatever. Sad.
Target Milestone: mozilla0.9.5 → mozilla0.9.6
Target Milestone: mozilla0.9.6 → mozilla0.9.8
Assigning to myself after a discussion with Nisheeth.
-> Tanu
Assignee: peterv → tmutreja
Attached patch Patch fot fixing it... (deleted) — Splinter Review
In the existing serializer code (nsHTMLContentSerializer.cpp), whenever a new node starts at or after column 72, we insert a line break. This makes the source easier to read, but at times it also changes how the HTML renders. Since runs of newlines and white space are collapsed to a single space when the HTML is displayed, the patch inserts a break only at places that already have a newline or white space. The 'view source' output is still formatted to some extent, but these additional breaks do not affect how the original HTML looks. Additionally, since we also add line breaks before/after certain tags like <p>, breaking at such a point should work in most cases.
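The idea of breaking only where whitespace already exists can be sketched in a few lines of Python. This is not the attached patch, just an illustration of the invariant it aims for; `wrap_words` is a hypothetical name, and the sketch assumes ordinary (non-`<pre>`) content where whitespace runs collapse to one space.

```python
def wrap_words(text, wrap_column=72):
    """Re-wrap text by turning existing whitespace into newlines.

    No whitespace is ever added between two characters that were adjacent
    in the input, so the whitespace-normalized content is unchanged.
    """
    lines = []
    current = ""
    for word in text.split():  # split on existing whitespace only
        if current and len(current) + 1 + len(word) > wrap_column:
            lines.append(current)
            current = word
        else:
            current = f"{current} {word}" if current else word
    if current:
        lines.append(current)
    return "\n".join(lines)


source = ("<p> 1 123456789 1234567890123 1234567890 123456789 "
          "(123 <tt>123456789012345678901</tt>) 123456.")
wrapped = wrap_words(source)
print(wrapped)
# The sequence of whitespace-separated tokens is the same before and after.
print(wrapped.split() == source.split())
```

A token longer than the wrap column is left on a line of its own rather than split, which trades an occasional long line for output that renders the same as the input.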
Nisheeth and I just went through the patch. It looks like it should fix the problem. While understanding the logic, we also found the cause of bug 56921: the "else" clause (line 581 in the patched file) starts at the wrap column and then searches forward to the next space, hence always makes lines longer than the wrapcol. That's an old bug, not made any worse by this patch, so it doesn't block acceptance of this patch. I'll add more comments in that bug. The only thing I worry about with the present fix is what happens if we create a file in the editor with a long block of open/close tags with no spaces between them, only newlines (which might happen in a table, for example). I'm going to apply the patch in my tree and do some testing, but if you've already tested this sort of case and are confident that it works, please say so.
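The effect of searching forward from the wrap column, compared with searching backward, can be seen in a small Python sketch. It assumes the "else" clause behaves as described in this comment; `split_forward` and `split_backward` are hypothetical names, and a short wrap column is used to keep the example readable.

```python
def split_forward(line, wrap_column):
    """Break at the first space at or after wrap_column (the behaviour described above)."""
    cut = line.find(" ", wrap_column)
    if cut == -1:
        return line, ""
    return line[:cut], line[cut + 1:]


def split_backward(line, wrap_column):
    """Break at the last space at or before wrap_column; search forward only as a fallback."""
    cut = line.rfind(" ", 0, wrap_column + 1)
    if cut <= 0:
        return split_forward(line, wrap_column)
    return line[:cut], line[cut + 1:]


sample = "aaaa bbbb cccc dddd"
print(split_forward(sample, 12))
print(split_backward(sample, 12))
```

With a wrap column of 12, the forward search yields a 14-character first line, while the backward search yields a 9-character one. A forward search can only ever produce lines at least as long as the wrap column, which matches the symptom in bug 56921.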
Oh, one other issue: you should probably run this by someone in the intl group to see if you should be using nsLineBreaker instead of explicitly searching for a space and nothing else. Naoki, can you take a look at this patch, or pass this bug along to someone else who can look and tell us if this is okay?
I am not familiar with this bug, so I'll just comment with general info. Some languages can break without a space. The line breaker interface can return possible breakable positions. Shanjian is the owner of the code, cc to him. I think the patch, which looks for spaces only, will not do anything wrong for languages which do not require a space for line breaking. But using the line breaker would make the code benefit more languages.
Tried the 0.9.8 final build (mozilla-win32-0.9.8-talkback.zip). The extra space bug remains. Just try it with the small sample previously (2001-07-12) posted.
When testing, please be sure to have the Reformat ("pretty print") HTML source option selected (Preferences/Composer).
Past 0.9.8, moving forward.
Target Milestone: mozilla0.9.8 → mozilla0.9.9
*** Bug 101755 has been marked as a duplicate of this bug. ***
*** Bug 104144 has been marked as a duplicate of this bug. ***
Biro, the patch is not yet checked in. Would you please verify it once the bug is closed?
Whiteboard: [C] → [C][patch needs r/sr=]
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.9 → mozilla1.0
Moving Netscape owned 0.9.9 and 1.0 bugs that don't have an nsbeta1, nsbeta1+, topembed, topembed+, Mozilla0.9.9+ or Mozilla1.0+ keyword. Please send any questions or feedback about this to adt@netscape.com. You can search for "Moving bugs not scheduled for a project" to quickly delete this bugmail.
Target Milestone: mozilla1.0 → mozilla1.2
Comment on attachment 64804 [details] [diff] [review] Patch fot fixing it... sr=jst. Akkana should r=.
Attachment #64804 - Flags: superreview+
Whiteboard: [C][patch needs r/sr=] → [C][patch needs r=/a=]
Attachment #64804 - Flags: review+
Whiteboard: [C][patch needs r=/a=] → [C][patch needs a=]
Comment on attachment 64804 [details] [diff] [review] Patch fot fixing it... I checked the review box but bugzilla didn't list a comment that I'd done it. Trying again: r=akkana imeanitthistime.
Comment on attachment 64804 [details] [diff] [review] Patch fot fixing it... a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #64804 - Flags: approval+
Checking in for tmutreja. Fixed with checkin:
D:\mozilla\content\base\src>cvs commit
cvs commit: Examining .
Checking in nsHTMLContentSerializer.cpp;
/cvsroot/mozilla/content/base/src/nsHTMLContentSerializer.cpp,v <-- nsHTMLContentSerializer.cpp
new revision: 1.41; previous revision: 1.40
done
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Using build 04-01, I am unable to reproduce the original problem, or the problem discussed in comment #6. Marking VERIFIED. If anyone is still able to reproduce this problem, feel free to reopen this bug.
Status: RESOLVED → VERIFIED
*** Bug 149200 has been marked as a duplicate of this bug. ***