Closed
Bug 549295
Opened 15 years ago
Closed 15 years ago
new lines are inserted in downloaded html.
Categories
(Core :: DOM: Serializers, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: michel, Assigned: laurent)
References
()
Details
(Keywords: regression)
Attachments
(4 files, 1 obsolete file)
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
smaug
:
review+
bzbarsky
:
superreview+
dveditz
:
approval1.9.2.7+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (.NET CLR 3.5.30729)
Moin Moin.
From Firefox 3 something on Firefox started to manimuplate the downloaded html-text by inserting new-lines.
As this effectively hinders one from removing unwanted html-parts by using search and replace, I do have to use the ie now to download the html text.
Is there a possibility to save the html-text verbatim.
The aforementioned hyperling points to an msd-page, where ff 3.6 inserts new-lines even in the title of a paragraph or in simple sentences:
Search for:
_________________________________________________________________
Language Identifier
Constants and Strings
_________________________________________________________________
which should be one line.
_Tschuess,
__Michael.
Reproducible: Always
Steps to Reproduce:
1.Open "http://msdn.microsoft.com/en-us/library/dd318693%28VS.85%29.aspx"
2.Press File/Save Page as
3.View the written html file.
Actual Results:
Unsolicited new-lines breaking the searchability.
Expected Results:
The original html text.
Comment 1•15 years ago
|
||
This seems to have been a deliberate change from 3.5 to 3.6 -- it's trying to word-wrap the HTML so that it is more readable, but in this case it makes things worse. I agree that there should be a way to force it to save exactly the bits that came over the network, at least.
Status: UNCONFIRMED → NEW
Component: General → File Handling
Ever confirmed: true
Keywords: regressionwindow-wanted
Product: Firefox → Core
QA Contact: general → file-handling
Comment 2•15 years ago
|
||
Reporter, are you saving as "Web page, complete" (as oppose to "Web page, HTML only", which does not change the source in any way) in both cases?
Zack, there is such a way; save as "HTML only" as opposed to "complete".
Comment 3•15 years ago
|
||
I just tried, and Fx 3.5 and 3.6 seem to have the same behavior for me here when saving as "complete" (and in neither case do I see an obvious newline being inserted anywhere; certainly not in the "Language Identifier Constants and Strings" heading, as comment 0 seems to imply).
Comment 4•15 years ago
|
||
I tested it again and indeed 3.6 doesn't modify the file when saving as "HTML only" - but it does seem to reformat it when saving as "complete".
Comment 5•15 years ago
|
||
Comment 6•15 years ago
|
||
Yes, when saving as "complete" what's saved is a serialization of the DOM. This includes some reformatting in some cases, depends on what scripts did to the page, rewrites a bunch of URI attribute values to point to the saved copies of some subresources, etc, etc.
So what exactly is the problem? I can't tell at this point...
Comment 7•15 years ago
|
||
Reformatting is definitely happening when the page is saved as "complete" - it's most obvious if you compare the block of "javascript:__doPostBack(..." table entries near the head, in this and the other attachment.
I didn't do a 3.5 vs 3.6 comparison.
Updated•15 years ago
|
Attachment #429761 -
Attachment mime type: application/octet-stream → text/plain
Comment 8•15 years ago
|
||
Well, the OP wants no reformatting, so I guess the live question is whether he's happy with using "HTML only" instead of "complete".
Comment 9•15 years ago
|
||
Wait. Is this all about reformatting in the source as opposed to visual?
With "complete" there's no guarantee of what you get looking anything like the original source. The point is to preserve the DOM, not the source. So if that's all this is about, the bug is invalid.
Comment 1 says there's a change from 3.5 to 3.6. Is there?
Comment 10•15 years ago
|
||
It looks to me like 3.6 is much more aggressive about inserting line breaks in reserialized DOM than 3.5 was. Search for "Ethiopia" in both this attachment and attachment 429765 [details] - 3.6 broke the line immediately after that, 3.5 didn't. 3.6 also will insert a line break in between "<a" and "href", and in the middle of style="..." attribute values.
Whether that's a *bug* is disputable, I think.
Comment 11•15 years ago
|
||
Laurent, did we make purposeful changes of some sort here?
Reporter | ||
Comment 12•15 years ago
|
||
Moin Moin.
> Wait. Is this all about reformatting in the source as opposed to visual?
Yes, as sometimes You need to edit the text in an editor, which is verrrrry tedious if it has been reformatted, as the former versions of FF did not.
> With "complete" there's no guarantee of what you get looking anything like the
> original source. The point is to preserve the DOM, not the source. So if
> that's all this is about, the bug is invalid.
Well it used to be, as the name implies, so please, forgive me if I sticked to the description "Web page, complete" instead of an internal definition.
However, I cant save the document twice (as complete page and then the original text.).
Which solves the bug for me.
But as I assume, that this behaviour might irritate other users as well (who just do not know bugzilla).
Hence I suggest to add a checkbox in the files save dialog, upon choosing complete allowing to save the verbatim text using the same routine as in the html-only case (maybe saving it twice and overwriting the firstly saved file.).
Or just addin a new entry like "Web page, complete, with verbatim HTML preservation" .
_Tschuess,
__Michael.
Reporter | ||
Comment 13•15 years ago
|
||
P.S.: I hope I am not too pushy here (or even obnoxious). If You do think, that this topic is a waste of time, then drop it please.
P^2.S.: "However, I cant" should be "However, I can", of course ("http://www.inanime.com/imgx/ryoohki_eaten_carrot.jpg").
Comment 14•15 years ago
|
||
There's no way to preserve the original HTML when saving "complete" because you have to rewrite various attributes to point to local versions of the files.
That said, if this changed from Firefox 3.5, would you be willing to narrow down the date range when this changed using nightly builds?
Comment 15•15 years ago
|
||
Actually, nevermind. I can reproduce in 3.6 locally; I'll hunt down the range.
Comment 16•15 years ago
|
||
Ok, this changed in the range in which bug 422403 landed.
Assignee: nobody → laurent
Blocks: 422403
blocking1.9.2: --- → ?
blocking2.0: --- → ?
Keywords: regressionwindow-wanted → regression
Comment 17•15 years ago
|
||
Laurent, did that patch change the behavior of the serializer when neither OutputRaw nor OutputFormatted is set?
Should webbrowserpersist now be setting OutputRaw to get the old behavior?
Comment 18•15 years ago
|
||
Uh, yeah. This is totally a regression from bug 422403. Old code flow in AppendText for HTML:
if (inPre) {
// appendconvertlf
} else if (raw) {
// append
} else if (!format) {
// if long lines, wrap, else append
} else {
// wrap
}
"long" in this context is 128 chars of text on the line.
New code flow:
if (inPre || raw) {
// appendconvertlf
} else if (format) {
// append formatted wrapped
} else if (wrap) {
// wrap
} else {
// if has long lines, wrap, else appendconvertlf
}
But the new init code for the XHTML content serializer always sets OutputWrap if OutputRaw is not set. So for XHTML that last |else| is never hit. That's wrong, wrong, and again wrong. Especially since the claim is that this behavior is "to keep compatible behaviors for old callers of this interface", since it does nothing of the sort.
Comment 19•15 years ago
|
||
Now maybe the firefox ui should still pass OutputRaw here. But there's still a serializer bug.
Component: File Handling → Serializers
QA Contact: file-handling → dom-to-text
Comment 20•15 years ago
|
||
Not blocking on the branch, but please do request branch approval when you have a patch.
blocking1.9.2: ? → ---
status1.9.2:
--- → wanted
Assignee | ||
Comment 21•15 years ago
|
||
Ok, I'm going to work on it.
Assignee | ||
Comment 22•15 years ago
|
||
This patch just removed an assignement of nsIDocumentEncoder::OutputWrap to default flag. I also updated tests. Results are more consistent with the expected behavior. In fact, expected results are same results before the patch of bug 422403. I must admit I made a mistake :-)
I still have to check some things before asking a review.
Reporter | ||
Comment 23•15 years ago
|
||
P.S.: It is warming to see, that others make errors similar to the ones I do make.
Thank You for hunting it down. "http://www.welt.de/multimedia/archive/00262/Baer5_DW_Reise_Khut_262781g.jpg".
Assignee | ||
Comment 24•15 years ago
|
||
a new patch, with more tests fixed, ready for a review.
Attachment #430827 -
Attachment is obsolete: true
Attachment #434372 -
Flags: superreview?(bzbarsky)
Attachment #434372 -
Flags: review?(Olli.Pettay)
Comment 25•15 years ago
|
||
Comment on attachment 434372 [details] [diff] [review]
patch v1.1
sr=bzbarsky.
Don't forget to backport to 1.9.2 as needed!
Attachment #434372 -
Flags: superreview?(bzbarsky) → superreview+
Assignee | ||
Comment 26•15 years ago
|
||
Thank you Boris.
As we just discussed here https://bugzilla.mozilla.org/show_bug.cgi?id=524975#c37 I will modify a bit the patch for the trunk, in order to remove the flag I added in the attachment 409324 [details] [diff] [review] (http://hg.mozilla.org/mozilla-central/rev/4ae32b827a7f) for an attempt to fix bug 524975.
Comment 27•15 years ago
|
||
Comment on attachment 434372 [details] [diff] [review]
patch v1.1
Sorry for the delay.
Updated•15 years ago
|
Attachment #434372 -
Flags: review?(Olli.Pettay) → review+
Assignee | ||
Updated•15 years ago
|
Attachment #434372 -
Flags: approval1.9.2.3?
Comment 28•15 years ago
|
||
I have no problems taking this for 1.9.3, but I wouldn't hold the release for it.
blocking2.0: ? → -
Assignee | ||
Comment 29•15 years ago
|
||
landed http://hg.mozilla.org/mozilla-central/rev/16d33b6f570e (sorry Olli, I made a mistake in your name in the message of the commit :-( )
Assignee | ||
Updated•15 years ago
|
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 31•15 years ago
|
||
Comment on attachment 434372 [details] [diff] [review]
patch v1.1
Since the bug broke the old behavior of the html serializer, and causes issue for copy paste etc, The patch should be applied in the 1.9.2 branch
Attachment #434372 -
Flags: approval1.9.2.4? → approval1.9.2.5?
Updated•15 years ago
|
blocking1.9.2: --- → needed
Comment 32•15 years ago
|
||
Comment on attachment 434372 [details] [diff] [review]
patch v1.1
a=beltzner for mozilla-1.9.2 default
Thanks for writing tests!
Updated•14 years ago
|
Attachment #434372 -
Flags: approval1.9.2.5? → approval1.9.2.5+
Updated•14 years ago
|
Attachment #434372 -
Flags: approval1.9.2.5+ → approval1.9.2.6+
Assignee | ||
Comment 33•14 years ago
|
||
patch checked in mozilla-1.9.2 http://hg.mozilla.org/releases/mozilla-1.9.2/rev/4c49ce3c8275 (gecko 1.9.2.6)
Assignee | ||
Updated•14 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•