Closed Bug 107927 Opened 23 years ago Closed 21 years ago

Composer added <cr> to html code, leading to unwanted spaces

Categories

(Core :: Layout, defect, P2)

defect

Tracking

()

RESOLVED DUPLICATE of bug 2750
Future

People

(Reporter: cowwoc2020, Assigned: kmcclusk)

References

()

Details

(Whiteboard: [needs r/sr=][needs-work/testing][ADT3])

Attachments

(4 files)

From Bugzilla Helper: User-Agent: Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:0.9.5) Gecko/20011016 BuildID: 2001101616 Mozilla interprets a carriage return as a space. When opening http://bbs.darktech.org/scp.html and saving it, Composer inserts <cr> characters at the end of each line, leading to unwanted spaces in the title (fading from one color to the next). Composer shouldn't add <cr> characters. Reproducible: Always Steps to Reproduce: 1. Open http://bbs.darktech.org/scp.html in composer 2. Save file to disk using composer 3. Hit saved (local) file using Mozilla browser and you will notice unwanted spaces in the title Actual Results: Unwanted spaces appear all over the file (most noticably in title) Expected Results: Don't insert <cr> characters, thereby avoiding unwanted spaces If http://bbs.darktech.org/scp.html is down, try again later. It's up most of the time..
Moving to All. not an Os/2 only problem.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: OS/2 → All
Hardware: PC → All
Feels like a data loss bug, we should not be causing change merely by opening up a file and saving it immediately. Joe, is this worth EDITORBASE? Do we make some transformation on the content (the DOM) when we load a file into composer?
Assignee: syd → jfrancis
Component: Editor: Composer → Editor: Core
This is not actually the editor, perse. It is the serializer, after you are done editing and when the doc is written out (saved). Over to serializer...
Assignee: jfrancis → harishd
Component: Editor: Core → DOM to Text Conversion
--> peterv
Assignee: harishd → peterv
Priority: -- → P3
Target Milestone: --- → mozilla0.9.9
This bug also introduces unwanted spaces in link text. For example, if there are two adjacent images that are used as links: [A HREF="foo.html"][IMG SRC="foo.jpg"][/A] [A HREF="bar.html"][IMG SRC="bar.jpg"][/A] The images are displayed next to each other with a space inbetween them due to the carriage return after the first line, but this is acceptable since the space isn't really visible. However, when Composer "pretty prints" the HTML, it reformats the HTML and adds a carriage return like so: [A HREF="foo.html"][IMG SRC="foo.jpg"] [/A][A HREF="bar.html"][IMG SRC="bar.jpg"] [/A] The carriage return at the end of the line is interpretted as a space and because it is part of the link, it becomes and underlined gap between the images. Oops. Not cool. This could be a potential bug for any tag that displays text between the start and end delimiter. The pretty print feature should be smart enough determine which tags it is okay to add a carriage return after and which it is not.
Bug seems to be resolved with the same patch that I attached for Bug#85184. Second part of the bug which 'Brahm' mentioned above, looks different from the original bug.
Removing "(aName == nsHTMLAtoms::img)" from nsHTMLContentSerializer::LineBreakAfterOpen() [line# 830] solves the second part of the bug but is there any specification based on which we add a line break after these tags? In general, I feel that we should not add the line breaks just before/after the tag unless we reach to the next white space or new line, as these added characters are interpreted as 'space' by HTML. Though when added with an existing space/new line character, there is no effect in the original HTML looks.
--> myself.
Assignee: peterv → tmutreja
Instead of removing the "img" tag from the list of tags that need line break after the opening tag, patch modifies the code in the following manner: LineBreakAfterOpen() will still return TRUE for "img" but instead of actually adding a linebreak based on the true value of LineBreakAfterOpen(), we are setting a flag "isLineBreakDue" to be True. This flag indicates that a line break is due and we need to add it as soon as we get a right place. Right place here is the immediate occurrence of a white space or a new line character. So we move ahead to deal with coming tag(s) and try to find the right place and only then we add a line break. Considering the fact that chances of not getting a white space/ new line for quite a long (good enough to make the view source real ugly) are really rare, I feel that we should break the things at right place to retain the correctness rather than a pretty view source. Additionally, when combined this patch with the one for 85184 and 103207, hope we would be having a nice enough view source too.
> PRBool isLineBreakDue = PR_FALSE; This should be a new member variable, PRPackedBool mIsLineBreakDue in nsHTMLContentSerializer.h. What if two files are being serialized at once, and one sets this flag and then the other checks it? I also notice that somehow a PRInt32 got inserted in between the list of PRPackedBools in the .h file (not your fault, it happened some time earlier). PRPackedBools should all be declared together so that the size of the object ends up being smaller. Could you add your new flag before or after the other PRPackedBool members in the .h file, and move mPreLevel to either before or after the list of packed bools? Thanks! > spaceIndex = tempdata.FindChar(PRUnichar(' '), 0); > newLineIndex = tempdata.FindChar(PRUnichar('\n'), 0); nsString has FindCharInSet, which would look for more than one character at the same time, which would be a little more efficient (you would only have to search as far as the first of the two characters). At line 178 it looks like you're assuming that PR_MIN will do the right thing if they're both kNotFound. Could that cause problems? (Though that logic will probably go away if you use FindCharInSet, so it may not matter.) I don't understand why you're only doing this for img tags. The new logic seems like the right thing to do -- shouldn't we be doing it everywhere that we consider inserting a line break? In fact, shouldn't it be setting isLineBreakDue everywhere that it's currently calling AppendLineBreak?
Patch still deals with "img" tag only. I think we can extend the same logic in following cases: 1. LineBreakAfterOpen : link , map and area tags. 2. LineBreakBeforeOpen : script, link, style and select tags. 3. LineBreakAfterClose: td, p, select, map and div tags. Once we have consensus about it, I'll incorporate those changes too.
Comment on attachment 67569 [details] [diff] [review] Incorporating Akkana's comments... r=akkana pending replacing the special img tag code with something more general. I think we should always do the same thing if LineBreakAfterOpen, etc ... no need for special-casing for certain tags.
Attachment #67569 - Flags: review+
*** Bug 90257 has been marked as a duplicate of this bug. ***
Keywords: patch
Whiteboard: [needs r/sr=]
Comment on attachment 68320 [details] [diff] [review] Incorporating Akkana's suggestions for applying the same logic to all tags. >+ if (mIsLineBreakDue) { >+ PRInt32 indx = 0; >+ PRInt32 length = 0; >+ indx = tempdata.FindCharInSet(" \t\n\r", 0); >+ >+ if (indx != kNotFound) { >+ length = tempdata.Length(); >+ tempdata.Left(data, indx); >+ AppendToString(data, aStr); >+ AppendLineBreak(aStr); >+ tempdata.Right(data, length - (indx + 1)); >+ } >+ else >+ data.Append(tempdata); >+ } >+ else >+ data.Append(tempdata); >+ Be consistent with the indentation here, use 2 spaces, not 4. sr=jst if you fix that.
Attachment #68320 - Flags: superreview+
Attachment #68320 - Flags: review+
Attached patch fixed indentation... (deleted) — Splinter Review
Ship it!
Comment on attachment 69450 [details] [diff] [review] fixed indentation... I like what you're doing in the code here ... but unfortunately when I try it on a real page, it's dropping a lot of linebreaks that it should be putting out. All the linebreak after close tags are being dropped, possibly others as well. To see the problem, try visiting a fairly simple page like http://shallowsky.com/index.html in the editor, and click on the source formatting tab; try both with and without the patch, and note the difference.
Attachment #69450 - Flags: needs-work+
nsbeta1+, let's get this in.
Keywords: nsbeta1+
Whiteboard: [needs r/sr=] → [needs r/sr=][needs-work/testing]
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.9 → mozilla1.0
*** Bug 128695 has been marked as a duplicate of this bug. ***
Akkana, it looks tough to find a solution which may resolve this bug without ruining nice view source and indentation. Basically the bug looks like in Parser. Isn't that the parser is supposed to ignore CRLF immediately after an opening tag and before a closing tag. Only thing I can think to make the patch "a bit" better is "not" to apply the patch for all tags e.g. script, style, select, table, tbody may have line break after opening tag. Though it would append a line break(replaced to a space later) yet the chances of getting a wrong view is quite less in fact nil for some tags like script and style.
Marking it as depending on 15738. I still feel that if we can follow the HTML specification http://www.w3.org/TR/html4/appendix/notes.html#notes-line-breaks that "The following two HTML examples must be rendered identically: <P>Thomas is watching TV.</P> <P> Thomas is watching TV. </P>", This bug will automatically get resolved with no side effects on "View Source" or "Indentation", that means all the new lines that we add here before/after tags (opening/closing) essentially those are going to get neglected.
Depends on: 15378
ADT3 per ADT triage.
Whiteboard: [needs r/sr=][needs-work/testing] → [needs r/sr=][needs-work/testing][ADT3]
Tanu is right. According to the W3C spec, we shouldn't be showing those line breaks immediately after open tags or immediately before close tags as spaces. I don't think this is actually a parser problem; the parser needs to preserve that information, so that we'll have it at output time if we're trying to preserve the original source formatting of the document. So I think it has to be up to layout to follow that W3C specification. Cc'ing Kin and Joe for layout advice, and Simon because he might have useful advice.
I've asked around on #mozilla and nobody is sure whether we're deliberately ignoring the W3C spec on newlines before/after tags or not. Cc Kevin, the expert on what our intent is for layout. Kevin, are we doing this deliberately, or is it a bug that we might fix? Also cc Beth, who knows everything about W3C specs and may have been privy to earlier discussions regarding our handling of this part of the spec.
Timeless thinks there's already a bug on this, suggests choess may know about it.
Sounds like bug 2750.
Yes Akkana is right, Tanu is right, Hixie is right, even rickg was right. It's a bug, but in order to preserve round-tripping of the source, we have to fix it in layout rather than parser.
Depends on: 2750
Moving to next milestone-> mozilla1.0.1
Target Milestone: mozilla1.0 → mozilla1.0.1
I've written up a little guide to whitespace handling and how it (supposedly) interacts among the different standards and specifications we follow; see <URL:http://www.stwing.upenn.edu/~choess/mozilla/whitespace.html>.
Component->layout, reassigned->Nisheeth.
Assignee: tmutreja → nisheeth
Status: ASSIGNED → NEW
Component: DOM to Text Conversion → Layout
-> Layout team's manager, Kevin Mccluskey.
Assignee: nisheeth → kmcclusk
clearing nsbeta1+, nominating for EDITORBASE. We need to discuss this in the EDITORBASE triage.
Keywords: nsbeta1+nsbeta1
Whiteboard: [needs r/sr=][needs-work/testing][ADT3] → [needs r/sr=][needs-work/testing][ADT3][EDITORBASE]
Minusing. EDITORBASE triage
Keywords: nsbeta1nsbeta1-
Whiteboard: [needs r/sr=][needs-work/testing][ADT3][EDITORBASE] → [needs r/sr=][needs-work/testing][ADT3][EDITORBASE-]
In Mozilla 1.1a+ Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.1a+) Gecko/20020707, it still happens in my past 1.01 composer, but only in the <style> tag in the <head> and the <body> looks fine except for the <script> tag where it happens too.
I don't know if this is the same bug. But empty line spaces are added when editing and saving a document in the <HTML>Source tab. Editing and saving in the Normal tab preserves the document. But once you start editing from the Source tab, everytime you switch to Normal or save the document an empty line will be added in the head and on different parts of the document.
That's bug 145196 and you don't need to use the HTML source view, only reload and save again.
Target Milestone: mozilla1.0.1 → ---
Priority: P3 → P2
Target Milestone: --- → Future
editorbase?
QA Contact: sujay → beppe
Whiteboard: [needs r/sr=][needs-work/testing][ADT3][EDITORBASE-] → [needs r/sr=][needs-work/testing][ADT3][EDITORBASE?]
removing editorbase
Whiteboard: [needs r/sr=][needs-work/testing][ADT3][EDITORBASE?] → [needs r/sr=][needs-work/testing][ADT3]
Could someone attach a testcase for this "layout bug"? We should most certainly not be ignoring linefeeds in cases like: <a href="whatever"> text </a> The problem, most likely, is that the HTML spec can't make up its little mind on whitespace handling, and all it says is pretty much incompatible with CSS. So being a CSS browser, we go with the CSS way....
From 9.1 of the spec: In order to avoid problems with SGML line break rules and inconsistencies among extant implementations, authors should not rely on user agents to render white space immediately after a start tag or immediately before an end tag. Thus, authors, and in particular authoring tools, should write: <P>We offer free <A>technical support</A> for subscribers.</P> and not: <P>We offer free<A> technical support </A>for subscribers.</P> The SGML part refers to Appendix B of the spec, which says: SGML (see [ISO8879], section 7.6.1) specifies that a line break immediately following a start tag must be ignored, as must a line break immediately before an end tag. This applies to all HTML elements without exception. The following two HTML examples must be rendered identically: <P>Thomas is watching TV.</P> <P> Thomas is watching TV. </P> So must the following two examples: <A>My favorite Website</A> <A> My favorite Website </A> Seems pretty clear to me.
> authors should not rely All other browsers render said whitespace, and authors do rely on it. Life is tough. Furthermore, what happens when white-space:pre is set? The point is, what the HTML spec says is not what any actual tag-soup UA does. A and since we are a CSS browser at heart, what the CSS spec says about layout supercedes what the HTML spec says.
CSS white-space properties do not bear on this issue. "white-space: pre" operates on whatever whitespace exists in the "document tree" after parsing. A proper SGML parse will remove these spaces before the document tree is created. If bug 2750 is deemed WONTFIX, so be it, but there is no intrinsic conflict between the requirements of HTML/SGML and CSS here.
See comment 28. The argument being made is that the parser has to keep the whitespace "to preserve round-tripping" on parse and editor wants to produce the whitespace on output to make the source look pretty and that we should hack layout to somehow deal. But at that point the whitespace is in the DOM and CSS applies all the way.
bz: can you point me at the conflicting part of css? Are you just referring to the pre-formatted case?
For whitespace? Not only that, no. As far as CSS is concerned all whitespace between inline boxes is the same and should be rendered as whitespace even when white-space is not pre (but collapsed to a single whitespace, if white-space is not pre).
Ok, so if people are arguing that the HTML spec has anything to say here, then this is just a dup of bug 2750. Note that we already skip the first newline inside some tags (eg <pre>). See http://lxr.mozilla.org/seamonkey/source/htmlparser/src/CNavDTD.cpp#1077. It should not be difficult to adapt that code to skip the first newline in any HTML tag, if desired. Probably regress some pages, though. Doing something with the last newline is much harder. *** This bug has been marked as a duplicate of 2750 ***
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: