Closed Bug 145196 Opened 23 years ago Closed 22 years ago

Blank lines before tags in head (meta, link, etc...) created after editing source

Categories

(Core :: DOM: Serializers, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.2beta

People

(Reporter: brian, Assigned: brian)

References

Details

Attachments

(6 files, 8 obsolete files)

(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
akkzilla
: review+
Details | Diff | Splinter Review
(deleted), patch
akkzilla
: review+
jag+mozilla
: superreview+
jesup
: approval+
Details | Diff | Splinter Review
(deleted), patch
akkzilla
: review+
Details | Diff | Splinter Review
(deleted), patch
brian
: review+
kinmoz
: superreview+
Details | Diff | Splinter Review
A remaining issue from bug 46227. After making changes in html source view then switching to normal view and back, there are blank lines before each tag in the <head> after title. These lines contain spaces which accumulate each time you do this. Akkana: CC'ing you so you can look at this. I have a patch.
Attached patch patch (not final) (obsolete) (deleted) — Splinter Review
Here is the patch. I first started by attacking the issue of the accumulating spaces, then I was going to remove the meta tag from LineBreakBeforeOpen like you said. So I wrote some code to remove all spaces at the beginning of each line, and tried it out, and I found that the newline went away completely, without touching LineBreakBeforeOpen. This also fixes the problem where when you edit a file that has more indenting than Mozilla would give, the extra spaces of the indents can be kept as is, but rewrapped so they aren't at the beginning of a line. Unfortunately, there is a problem with this patch. There are cases where the space at the beginning of the string does mean something: test<b> case</b> " case" becomes "case", and this becomes one word.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: patch
Burpmaster: are you sure that these lines are coming for every tag, not just tags that are listed in LineBreakBeforeOpen? I tried putting some random tags, like <b>, in the head, and they didn't get blank lines prepended; but meta and script did. (LineBreakBeforeOpen will return true for any block node, as well as the tags listed explicitly.) I'm not sure the extra line is bad in the case of script; I agree it looks funny for the meta tags; what other tags cause this behavior?
Attached patch new patch (obsolete) (deleted) — Splinter Review
I don't think listing an element in LineBreakBeforeOpen was ever intended to put two lines breaks in. I can see that it's making a check if the current line is blank (if mColPos is zero) before inserting a newline. As long as an element (meta or link) should start on its own line, it should be listed there. We don't want <meta> on the same line as <title>. The real problem is that there were always a few spaces on the line at the time LineBreakBeforeOpen was called, so LineBreakBeforeOpen decided there needed to be a line break. I also noticed the same thing was happening before </tr> in tables. This patch removes any unnecessary spaces at the beginning of each line, which fixes both the <meta> and </tr> problem. I also found a bug introduced by my fix to bug 46227. I had assumed the first and last newline served no purpose and could be removed without being converted to spaces. This wasn't always the case. For example: a<b> b </b>c would have the newlines removed and no spaces would take their place. It now checks if mColPos==0 to determine if the first newline should be removed, and I moved addSpace outside of AppendToStringWrapped and renamed it to mAddSpace. To deal with the ending newline, I moved addSpace out of AppendToStringWrapped and put it in the header file (named mAddSpace now). This is to wait and see if a newline is added for the next element appended before adding the space.
Attached patch newer patch (obsolete) (deleted) — Splinter Review
Removes all leading and trailing spaces in the old lines when pretty-printing, but makes sure that at least one space is preserved when it makes a difference, such as "a<b> b </b>c" (also handles the situation correctly if you replace those spaces with newlines) This fixes all the current newline/space problems I'm aware of, including the <meta> and <links> issues, as well as extra spaces after </tr> in tables, and extra spaces after the <td> before a table nested within that <td>. It also a problem introduced in the fix for bug 46227. With this patch, you can now open a file, make a change, save it, open again, change, and save again, and not have it grow each time!
"It also a problem introduced in the fix for bug 46227." That should be "It also fixes a problem introduced in the fix for bug 46227." I'm talking about the issue with a<b> b </b>c where the newlines are eliminated and and "a b c" becomes "abc".
The logic looks good (I haven't tested it yet -- I've been away at offsites for the last few days). Two issues: 1. Please initialize mAddSpace (to PR_FALSE, I assume) in nsHTMLContentSerializer::nsHTMLContentSerializer. Most of the time it may get initialized in time by hitting tags with break before, but let's not count on it. 2. if (Substring(aStr, strOffset, 1) == NS_LITERAL_STRING(" ")) I'm worried about this possibly having performance problems. Cc'ing jag, the string expert: jag, can you comment on whether this needs to be rewritten to use string iterators? Do you think this patch might also cure bug 141983? I asked Joe to cc both of us on that bug when he mentioned it in a meeting today; it sounds similar.
Comment on attachment 85413 [details] [diff] [review] newer patch >+ while (strOffset < length) { >+ >+ // Cut all leading spaces >+ if (Substring(aStr, strOffset, 1) == NS_LITERAL_STRING(" ")) { >+ strOffset++; >+ while ((Substring(aStr, strOffset, 1) == NS_LITERAL_STRING(" ")) && (strOffset < length)) { >+ strOffset++; >+ } This is pretty expensive since you'll be creating two temporary objects per space you need to skip. A cheaper alternative is to use iterators (not quite unlike using const char* to iterate over C strings): nsAString::const_iterator begin, end, iter, endIter; aStr.BeginReading(begin); aStr.EndReading(end); iter = begin; while (iter != end) { if (*iter == ' ') { ++iter; while (iter != end && *iter == ' ') ++iter; strOffset = Distance(begin, iter); if (mColPos > 0) mAddSpace = PR_TRUE; } // This is how much is needed to fill up the new line ... } Or you could get rid of strOffset and completely switch to iterators if you want. For skipping the spaces at the end: // this logic is kinda odd looking and would go away if one were to use // endIter = iter; // start searching from iter // FindCharInReadable(PRUnichar(' '), endIter, end); // in the code above this, since endIter will be set to |end| if // no space was found. if (indx != kNotFound) { endIter = begin; endIter.advance(indx); } else { endIter = end; } // skip all trailing spaces if (endIter != iter) { --endIter; // endIter points after the end of the string, move it one back if (*endIter == ' ') { while (endIter != iter) { --endIter; if (*endIter != ' ') { ++endIter; // make endIter point after the last non-space break; } } // Add back one space. This will get cancelled when necessary. mAddSpace = PR_TRUE; } else { ++endIter; // make endIter point after the last non-space } lineLength = Distance(iter, endIter); if (lineLength > 0) { line = Substring(aStr, strOffset, lineLength); // alternatively: // if (iter != endIter) { // line = Substring(iter, endIter); It's up to the module owner, reviewer and super reviewer to decide whether to take the patch as is or rewrite this code using iterators, the above is merely some guidance on how they can be used. If you do decide to use iterators and have questions, feel free to ask me.
Oh, I had one formatting issue with the patch that I forgot to mention before: please split the two long while lines, to keep line length less than 80 columns, so that it displays everywhere and can be printed on all printers. My guess is that jst will want to see the iterators used rather than Substring when it's time to ask for his sr, and I would prefer that too (but wouldn't block it for that if it's okay with jst -- cc'ing).
Attached patch Rewrite of AppendToStringWrapped, v. 1 (obsolete) (deleted) — Splinter Review
Here's a rewrite using string iterators. I keep the position of the last space encountered in order to always wrap before column 72, not always after. This patch fixes the bug, but it currently only removes leading spaces, not trailing spaces. I think consecutive spaces should be reduced to one, since they wouldn't be preserved in a rewrap anyway. I'm currently figuring out the best way to do that. I haven't added code to remove trailing spaces yet because the code to reduce consecutive whitespace chars will handle that. Any comments on this patch?
Comment on attachment 87082 [details] [diff] [review] Rewrite of AppendToStringWrapped, v. 1 Two comments after a quick look: + pos++; + while ((*pos == ' ' || *pos == '\n') && (pos != end)) { + pos++; + } Never ever use the post-increment (or decrement) operators on iterators (or any other non-trivial types for that matter), they're *way* more expensive than using the pre-increment operators (i.e. do ++pos, not pos++). And also, you need to move the (pos != end) check to the beginning of the condition, accessing an iterator that points past the end of the string is not safe.
Attachment #87082 - Flags: needs-work+
I'll probably do another rewrite. I saw nsDependentSubstring in nsHTMLContentSerializer::AppendToStringConvertLF. I couldn't find any documentation, but I could tell it was being used to append a substring to another string, and it sounded like it would be pretty fast. I changed it to use nsDependentSubstring, and changed to pre-increments. But before I attach the new version, I need to ask if by telling me to move the (pos != end) check to the beginning of the condition, you mean I should change while ((*pos == ' ' || *pos == '\n') && (pos != end)) to while ((pos != end) && (*pos == ' ' || *pos == '\n')) I was under the impression that there is no guarantee across all compiler settings/platforms that the check will end early if I get a false. But to get to the reason I wanted to do another rewrite, I figured the biggest performance problem was that an intermediate string was created every time we needed to append, so I coded it like the original code, to append an entire string of words seperated by spaces from the original string instead of appending one word at a time. But I wonder if by using both string iterators and nsDependentSubstring, I could still get good performance appending one word at a time. The code would be a whole lot simpler, that's for sure. How would I measure performance here?
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.1beta
> I was under the impression that there is no guarantee across all compiler > settings/platforms that the check will end early if I get a false. The check will end early on all compilers we care about (since the language requires them to end it early).
Attached patch Rewrite of AppendToStringWrapped, v. 2 (obsolete) (deleted) — Splinter Review
Fixing those two problems and using nsDependentSubstring.
Much cleaner way of doing things. I need to figure out how to do some performance testing.
Attached patch Rewrite of AppendToStringWrapped, v. 3 (obsolete) (deleted) — Splinter Review
Oops, had a little last-minute typo in v2.
Comment on attachment 89000 [details] [diff] [review] Rewrite of AppendToStringWrapped, v. 3 >Index: nsHTMLContentSerializer.cpp >=================================================================== [ ... ] >+ // cut leading spaces >+ if (*pos == ' ' || *pos == '\n') { >+ ++pos; This sometimes cuts off spaces that need to be there. For example, type: Here is some <b>bold</b> text (followed by some gibberish to make it wrap to a couple of lines) Go into source mode and you'll see it correctly; but go back to normal and do OutputHTML, and you'll see that the space between bold and text disappears. Also, the first line is really long -- maybe it's not actually going through the wrapping code at that point? When I save it, the space is there again; but I don't trust it always to be there if it's not there in OutputHTML. I'm not quite sure why it's happening. This is also kind of ironic: you've completely rewritten wrapping, which is fine, it needed to be rewritten; but the reason it needed to be rewritten is that it doesn't use the I18n-approved nsILineBreaker like nsPlainTextSerializer uses. That's okay, you're not making it any less internationalizable (and I wouldn't let that block a review if you get it working); but this latest patch is actually farther from being able to use the line breaker, so if anyone ever does decide to go in and make it I18n clean, they'll probably have to tear out all your new code (unless you can find a line breaker interface that will take a character at a time from string iterators, which maybe it can.) Sorry, I probably should have noticed that and mentioned it earlier, but I didn't realize what a complete rewrite it was. The actual liklihood of anyone rewriting the code soon to use the line breaker is fairly small anyway, and your algorithm is better than the code you're replacing (it made lines too long, yours doesn't), so I have no problem taking this if you can fix the minor issue with spaces after tags, and add a few more comments for people like me who find wrapping algorithms hard to follow. I was initially not clear on the difference between mAddSpace and spaceInLine. Could you add a short comment somewhere explaining early on what spaceInLine will be used for. Basically it's to guard against lines with abnormally long words, right? But you can describe it in whatever way seems clearest to you. >+ else if (c == '\n') { >+ >+ if (mAddSpace) { Minor nit: is the blank line here intentional? I was confused about what the clause beginning here was doing, but I think I have it now -- if we get an embedded newline, then we save what we have so far, plus a space if one is called for, and re-start the segment -- but why is mAddSpace true since we might have just written a space to the output string? For the rest: I like the break if we get to max col, then break at the saved last space. That will fix the "lines too long" bug in the current code, and spaceInLine handling the long-word case; I don't see any problems with that part. I think this is really close and will be better than what is there now; just put my mind at ease about the one dropped space example I wish we had a good tough wrapping test in the output tests (both plaintext and html could use it). I should try to write one sometime.
I set mAddSpace to true if we are not at the beginning of the line, so if some more text continues on the current line, a space is added first. I was unable to reproduce the missing space from the Output HTML command in the debug menu of Composer. I wanted to time the two versions of AppendToStringWrapped that I wrote, so the first one isn't finished yet. I still need to add back the cutting of trailing spaces.
Blocks: 147494
Attached patch Rewrite of AppendToStringWrapped, v. 4 (obsolete) (deleted) — Splinter Review
Did a little more work on it, and added some comments. Hopefully this is clearer.
Attachment #84032 - Attachment is obsolete: true
Attachment #84198 - Attachment is obsolete: true
Attachment #85413 - Attachment is obsolete: true
Attachment #87082 - Attachment is obsolete: true
Attachment #88918 - Attachment is obsolete: true
Attachment #89000 - Attachment is obsolete: true
Now that I'm cutting trailing spaces as well, I'm making sure to set mAddSpace to true when the string ends with trailing spaces, and false when it doesn't. I think this patch is just about ready...
Attachment #91624 - Attachment is obsolete: true
Comment on attachment 91815 [details] [diff] [review] Rewrite of AppendToStringWrapped, v. 5 >+ segStart = pos; >+ spaceSeen = PR_FALSE; >+ ++segStart; Can these two seqStart lines be collapsed into one, e.g. seqStart = pos+1 ? > if (c == ' ') { > lastSpace = pos; > spaceSeen = PR_TRUE; > } > else if (c == '\n') { I'm not really clear on why spaces and newlines in the input stream aren't equivalent. Can you explain? It does work (at least for the cases I've thought of to test, and doesn't break the serializer tests) but I'd like to understand it better. I'll go ahead and stamp it r=akkana now on the assumption that there's a good reason for it -- the patch certainly makes things better than before.
Attachment #91815 - Flags: review+
Newlines and spaces in the input aren't treated the same because I'm doing the same thing the old code did, trying to speed things up by appending in larger chunks, rather than a simple cycle of append word, append space, append word, and so on. Several words can be taken and appended together as long as they are separated only by spaces and not newlines. If a newline is encountered where one shouldn't be, the only option is to append what we have so far, add a space, and then start reading again from after the newline. This was obviously important for performance with the old code, where a temporary string was created for each append, but now that I'm using string iterators and have found nsDependentSubstring, I wonder if appending one word at a time could now have good performance. I actually did write a patch to append one word at a time, which is far simpler. It is attachment 88919 [details] [diff] [review]. Perhaps you missed that one. It writes long lines like the old code, but I can easily fix that if we decide to go the simple route, and I know of no other problems with that patch.
What you have is fine -- let's stick with it (unless you want to go back to the other patch and fix the long-lines problem).
In that case, I think it's ready. No, those two lines can't be condensed into one, as the behavoir for adding a number to an iterator doesn't seem to be defined. Out of curiosity, I figured out how to time things, and I measured the time spent in nsDocumentEncoder::EncodeToString serializing about 1 meg of random characters and spaces. Current CVS code: 357ms The complex code: 450ms The simpler code: 571ms
jst or jag, could you sr this, please?
Is there anything we can do about the significant slowdown that this patch introduces? I guess this code is fast enough for what we use it for, but still, if we can speed it up we should...
I suppose this is why: http://www.mozilla.org/projects/xpcom/string-guide.html#Looping_Iterators Yes, it is fast enough for what we use it for. As far as I know it's only used in Composer, when pretty print is on, and probably always with HTML mail. Even half a meg is unusually large to edit in Composer or send as mail, but I'll go ahead and try out what is suggested at that URL.
Are we simply doing more work (looking at it on a higher level), or is this due to the chosen implementation, or a combination of both? I think your code will become quite a bit more complex if you use an outer loop (fragments) and inner loop (characters within fragment) approach. Hmmm, after looking at this code some more, does |aStr| really need to be of type |nsAString| or could it be |nsASingleFragmentString|? It's only called from two places, both of which pass in an nsAutoString. burpmaster, could you try the following and see how it impacts your time measurements: Change |aStr|'s type to |const nsASingleFragmentString&| and change the iterator types to |nsASingleFragmentString::const_char_iterator|. That should be enough to get your code working with |const PRUnichar*| iterators instead of the multi-fragment ones, and should give you an idea of the impact of the multi-fragment iterators on the times you measured.
Attached patch patch using nsASingleFragmentString (obsolete) (deleted) — Splinter Review
373ms
So 20ms slower than "current" CVS. Could anything else (e.g. updating to the trunk) explain those 20ms? What does this patch do better than the old code?
When I posted that last patch, I did another test of the current code in CVS, and got the same result, so the 20ms difference is because of the patch. The patch avoids writing spaces at the beginning of a line to prevent indentation from accumulating on each pretty-printing. When the string ends in a space, it isn't written immediately so that it won't be written if a newline follows (such as in the case of tags in LineBreakBeforeOpen). Wrapping now occurs before the column limit, instead of always after it. So this stops the blank lines in the <head> tag and fixes bug 147494.
Target Milestone: mozilla1.1beta → mozilla1.2alpha
Well, fixing a couple of bugs, cleaning up code, and it only costs 20ms. I'm pretty happy with that. Akkana, if you could r= this last patch, I'll sr it if/when you're happy.
Comment on attachment 93401 [details] [diff] [review] patch using nsASingleFragmentString >+ aOutputStr.Append(segStart, lastChar - segStart + 1); In the previous patch you did a lastChar++ here (instead of just adding 1 to the argument), and you make that change in one other place. You also changed if (pos != lastChar) to if (pos != lastChar+1) (that's a diff between the two patches) which is consistent with that. But the handling of lastChar didn't change in the other couple of places where it's referenced. The code looks right and works for my tests, but please reassure me that this was intentional. >+ // write up to the last space encountered >+ //nsDependentSubstring dataSubstring(segStart, lastSpace); >+ //aOutputStr.Append(dataSubstring); >+ aOutputStr.Append(segStart, lastSpace - segStart); Should probably remove the commented-out lines. Otherwise, r=akkana.
Attachment #93401 - Flags: review+
Attached patch commented out code removed (deleted) — Splinter Review
Yeah, the + 1 stuff was intentional, because I want to include that char in the string, where with lastSpace I don't. This is just taking into account that Substring(start, end) actually refers to the characters from start to end-1.
Attachment #93401 - Attachment is obsolete: true
Comment on attachment 96175 [details] [diff] [review] commented out code removed Transferring r=akkana from previous patch.
Attachment #96175 - Flags: review+
Er, is this waiting for something? Jag? Akkana?
Target Milestone: mozilla1.2alpha → mozilla1.2beta
Jag? You offered in comment 32 to sr this ... does the offer still hold? Maybe we can get it in in time for the 1.2beta freeze tomorrow night.
Comment on attachment 96175 [details] [diff] [review] commented out code removed sr=jag Sorry, too many bugs to keep track of, it helps if you ping me in e-mail (not bugmail).
Attachment #96175 - Flags: superreview+
Comment on attachment 96175 [details] [diff] [review] commented out code removed a=rjesup@wgate.com for trunk
Attachment #96175 - Flags: approval+
I've checked in the fix for Burpmaster; taking the liberty of marking the bug fixed (reopen if there are any issues remaining).
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
This commit have added a "may be used unitialized" warning on brad TBox: +content/base/src/nsHTMLContentSerializer.cpp:658 + `const PRUnichar * lastSpace' might be used uninitialized in this function
This fix corrected problems when "Reformat HTML source" is selected in Composer Preferences. There are still many similar problems existing when "Retain original source formatting" is selected. They have existed since at least August, 2002. I was waiting to see if this bug fix corrected them. Still exist in build 2002102204. See Bug 159615 Comment #6 and attached test case for these problems.
Reopening the bug to deal with the warning fix. Burpmaster? Should lasSpace be initialized to end, or what?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch Warning fix (deleted) — Splinter Review
lastSpace is never used unless spaceSeen is true, and spaceSeen is only set to true after lastSpace is set. A null value could probably be used here instead. So there's no real problem here, but to fix the warning I'll initialize lastSpace to the beginning of the string. That makes more sense to me as lastSpace refers to the last space encountered, and it would always be at or before the current position.
What is the state of this bug? Is it still broken or only in some cases?
*** Bug 190443 has been marked as a duplicate of this bug. ***
*** Bug 188616 has been marked as a duplicate of this bug. ***
The only thing I see in 1.2.1 is that the tags inside the <head> tag, are wrapped if they are too long. For example this tag, get wrapped: <meta name="keywords" content="very long, very long, very long, very long, very long, very long, very long, very long"> When switching from normal view to html source, nothing happend for me anymore. I have: Mozilla 1.2.1 WinXP SP1 The bug #159615 deals with the <style> tags inside the <head> tag.
This bug is fixed, where "this bug" refers to issues when HTML reformatting is turned on. The only remaining issue I'm aware of is bug 176866, which is very similar, but only occurs when HTML source reformatting is off. If you're wondering about the reopened status, see comment 43. This bug was reopened to fix a compiler warning.
*** Bug 194579 has been marked as a duplicate of this bug. ***
Comment on attachment 104262 [details] [diff] [review] Warning fix I guess I should get this patch reviewed.
Attachment #104262 - Flags: review?(akkana)
Comment on attachment 104262 [details] [diff] [review] Warning fix The fix looks fine, r=akkana. But as long as we're in there, can we fix the remaining warning in the file? I'll attach a patch that does that (and also fixes a too-long line).
Attachment #104262 - Flags: review?(akkana) → review+
Comment on attachment 118303 [details] [diff] [review] Fix the last remaining warning (sign comparison error) Burpmaster, can you review my additional warning fix? Thanks!
Attachment #118303 - Flags: review?(burpmaster)
Comment on attachment 118303 [details] [diff] [review] Fix the last remaining warning (sign comparison error) Yeah, looks good. r=burpmaster
Attachment #118303 - Flags: review?(burpmaster) → review+
Comment on attachment 118303 [details] [diff] [review] Fix the last remaining warning (sign comparison error) Kin: could you please sr this 2-line warning/formatting fix? Thanks!
Attachment #118303 - Flags: superreview?(kin)
Comment on attachment 118303 [details] [diff] [review] Fix the last remaining warning (sign comparison error) sr=kin@netscape.com
Attachment #118303 - Flags: superreview?(kin) → superreview+
Fix is in. Thanks, burpmaster! That's the last issue, and it's safe to mark this fixed now, right?
Status: REOPENED → RESOLVED
Closed: 22 years ago22 years ago
Resolution: --- → FIXED
Yeah, since bug 176866 covers the case when "Retain original source formatting" is turned off.
Note that Bug 176866 has been marked as a dupe of Bug 97278. See Bug 97278 for "Retain original source formatting" mode where this behavior still exists.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: