Closed Bug 126669 Opened 23 years ago Closed 23 years ago

Save As XML adds traling slashes to start tags when saving XHTML

Categories

(Core :: DOM: Serializers, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla1.0

People

(Reporter: brentmh, Assigned: hjtoi-bugzilla)

References

()

Details

(Keywords: dataloss, regression, xhtml, Whiteboard: [Hixie-1.0][Hixie-P1][ADT2])

Attachments

(1 file, 2 obsolete files)

From Bugzilla Helper: User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.8+) Gecko/20020220 BuildID: 2002022003 If you go to an XML page in the browser and then Save As an xml file, the saved file does not match what was browsed. Mozilla is adding a trailing slash to some (but not all) of the start tags. It doesn't remove the content of the tag or the end tag. It just adds a slash, making the start tag into an empty tag. This causes the saved page to be not well-formed XML. Reproducible: Always Steps to Reproduce: 1. Go to the above URL 2. Select "Save As", either under the file menu or the context menu 3. Save the page as XML 4. View the page in a text editor Actual Results: The <body> start tag is now an empty tag: <body/> Expected Results: <body> should be preserved as <body>
Confirmed. Over to Adam -- this is a webbrowserpersist problem.
Assignee: law → adamlock
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: dataloss, mozilla1.0
Not persist but an encoder problem. Who actually "owns" the document encoder since I have 2 or 3 bugs which that component is causing?
There are also some private -moz attributes that show up. I use a lot of these for various annotations to modulate MathML rendering. (Such attributes can only be set programmatically since Expat would flag an XML parsing error if an attribute starts with '-'). I would like the SaveAs to skip such attributes too. They turn the saved document into an invalid document anyway.
Oh how I hate our -moz attribute crap, it's wrong, wrong, wrong. And the DOM API's shouldn't allow those being set either, one day we will validate attribute names when attributes are set too, like the DOM spec says we should... (just wenting...)
Perhaps the DOM WG could consider formalizing anonymous attributes in the same way as there are anonymous nodes. What I find very useful with the -moz attributes is that they also allow to automatically match desired style rules with the Style System without further fuss.
All that comes to my mind when I see those -moz attributes is "hack", I don't see any good uses of those attributes, and I don't think advocating their use is a good idea, at all. Poking attributes into the DOM tree just for the purpose of making something happen simply because it will due to other surrounding systems that happen to be in place is just wrong. We should be taking out those attributes from our code, not adding more.
It's off-topic for this bug, but the -moz- attribute stuff could be rewritten to use the architecture suggested in bug 115462. Regarding this bug, it's rather serious dataloss. (To the unexperienced user, it makes their documents unusable.)
Whiteboard: [Hixie-1.0][Hixie-P1]
-> DOM to text conversion
Assignee: adamlock → harishd
Component: File Handling → DOM to Text Conversion
QA Contact: sairuh → sujay
*** Bug 131620 has been marked as a duplicate of this bug. ***
Keywords: nsbeta1
A workaround: Do a 'view source' and then 'save as'.
*** Bug 132263 has been marked as a duplicate of this bug. ***
Tanu, could you have a look at this? Tentatively setting milestones and priorities and reassigning. Reassign to me if you can't take this. This might potentially be a dupe of bug 127300 or bug 120556.
Assignee: harishd → tmutreja
Priority: -- → P1
Target Milestone: --- → mozilla1.0
Attached patch Patch to fix it... (obsolete) (deleted) — Splinter Review
It does not look like an exact dupe of the other bugs. Patch solves this one...
Status: NEW → ASSIGNED
Keywords: patch
Whiteboard: [Hixie-1.0][Hixie-P1] → [Hixie-1.0][Hixie-P1][needs r=]
This won't do it, there are several XML mime types. I will look into this as well. Thanks for finding where this happens.
Checking this based on the mSerializer was a good thing to do, as this is the one created on the basis of mime types but I could not do that. So just assumed that other xml mime type which are like "Application/..." or "img/..." may not need the serilaizer to come in picture. Please see if somehow we can check if the mSerilaizer is pointing to nsXMLContentSerializer.
Looking at the code I think FixupNode() clones some tags but does not clone their children, which means that nsXMLContentSerializer::AppendElementStart() will think the element is empty and end the starting element with "/>". So it looks like FixupNode() is screwing up the XML serializer. Also, we need FixupNode() for XHTML elements so we can not just bybass it. I guess the easiest fix is to add one more parameter to AppendElementStart() called aEmptyElement or something. HTML will just ignore it, but XML will end empty elements with "/>". The has-children check from nsXMLContentSerializer.cpp around line 514 needs to move into nsDocumentEncoder.cpp's SerialializeNodeStart(), line 310 or so.
This does not affect most people, but for those that work with XML (like me) this is so bad that I would not use a product with this kind of bug in it. I have a fix, will attach. Taking.
Assignee: tmutreja → heikki
Status: ASSIGNED → NEW
Summary: Save As XML adds traling slashes to start tags → Save As XML adds traling slashes to start tags when saving XHTML
Whiteboard: [Hixie-1.0][Hixie-P1][needs r=] → [Hixie-1.0][Hixie-P1][ADT3]
Attached patch Proposed fix (obsolete) (deleted) — Splinter Review
Attachment #76378 - Attachment is obsolete: true
What about the private [-|_]moz* attributes? Would be best to filter them too.
File a separate bug please, with a testcase, and I can do that.
Attached patch Forgot to diff public (deleted) — Splinter Review
Attachment #76793 - Attachment is obsolete: true
Follow-up: bug 134298 - SaveAs leaves private [-|_]moz attributes in the output
Comment on attachment 76799 [details] [diff] [review] Forgot to diff public The patch looks fine to me, with one concern, that of changing the serializer API (I'm not sure what the status is of making that a w3c standard). r=akkana, provided that jst or vidur or someone involved with the API also r's or sr's it. Also, please run the TestOutput test (which will turn Tinderbox orange if it breaks) before checking in. If Tanu sees this in time, I hope she can also take a look and make sure she has no objections to the API change.
Attachment #76799 - Flags: review+
Comment on attachment 76799 [details] [diff] [review] Forgot to diff public sr=jst
Attachment #76799 - Flags: superreview+
jst is the editor of the DOM Level 3 Load and Save module, he gave sr= Besides, this is internal Mozilla interface and has nothing to do with that DOM draft.
Status: NEW → ASSIGNED
Whiteboard: [Hixie-1.0][Hixie-P1][ADT3] → [Hixie-1.0][Hixie-P1][ADT3][fixinhand]
Also, I will run the tests before checkin in, although I don't see how this could break HTML or plaintext serializers since they don't care about the API change.
As Hiekki mentioned, I also find this internal API change to be fine with serializer. Patch works fine and succesfully passes all the test cases too.
adt1.0.0+ approval for checkin. Chaning to ADT2 because this negatively affects Web Developers using our product.
Keywords: adt1.0.0adt1.0.0+
Whiteboard: [Hixie-1.0][Hixie-P1][ADT3][fixinhand] → [Hixie-1.0][Hixie-P1][ADT2][fixinhand]
Comment on attachment 76799 [details] [diff] [review] Forgot to diff public a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #76799 - Flags: approval+
Checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Whiteboard: [Hixie-1.0][Hixie-P1][ADT2][fixinhand] → [Hixie-1.0][Hixie-P1][ADT2]
verified.
Status: RESOLVED → VERIFIED
Kathy thinks that this has regressed composer -- on empty documents, composer is now outputting a single br tag, which comes from composer's bogus node, which should look like: <br _moz_editor_bogus_node>. I don't understand why this would have caused that, but I'm cc'ing Kathy and Joe for discussion.
I think you mean bug 134298. It had fixes for XML and HTML side (I did XML, rbs did HTML). rbs checked in the HTML side fixes on 4/13 on trunk and 4/15 on 1.0 branch.
From further looking, the bug/regression has been there before my checkin.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: