Closed Bug 146062 Opened 22 years ago Closed 22 years ago

nsWebBrowserPersist::SetDocumentBase() does not work with XHTML/XML

Categories

(Core Graveyard :: File Handling, defect, P2)

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.3alpha

People

(Reporter: hjtoi-bugzilla, Assigned: hjtoi-bugzilla)

References

Details

(Keywords: xhtml)

Attachments

(1 file, 2 obsolete files)

nsWebBrowserPersist::SetDocumentBase() uses GetElementsByTagName() method which most likely will work unexpectedly in XHTML/XML documents. Look at bug 133654 how to recognize when it is plain HTML and when it is XHTML, so that you'd need to use GetElementsByTagNameNS(). In case it is not HTML and not XHTML, you might want to use XML Base (http://www.w3.org/TR/xmlbase/)
QA Contact: sairuh → petersen
Target Milestone: --- → Future
I have a patch, taking...
Assignee: adamlock → heikki
Priority: -- → P2
Whiteboard: [fixinhand]
Target Milestone: Future → mozilla1.3alpha
Attached patch Fix v1.0 (obsolete) (deleted) — Splinter Review
The beginning of this patch contains the fix for bug 136718. The rest of the patch is for this bug. This works even for XML documents that have nothing but |<img xmlns="http://www.w3.org/1999/xhtml"/>| as content, in which case no head or base tags will be inserted. There is something weird going on here, though. SetDocumentBase() is called twice: first without |aBaseURI|, then with it. Regardless, the base tag does never appear to be serialized, at least if none existed to begin with. This code seems to create and append one so that is weird. This happens with HTML and XML alike, and has nothing to do with my changes.
Comment on attachment 105664 [details] [diff] [review] Fix v1.0 r=adamlock with some minor nits: NeedXHTMLBaseTag should run a loop over an array of elements to reduce bloat. xhtml_ns, head, base should probably be called kXHTMLNS, kHead etc. for consistency with other consts. EmptyList should be renamed IsEmptyList for readability, or deleted altogether if NeedXHTMLBaseTag is rewritten to use a loop.
Attachment #105664 - Flags: review?(adamlock) → review+
Thanks Adam, good points, will fix. But what about the SetDocumentbase() weirdness, is my observed behavior expected or not? Because right now the output that is produced for a document (say, |<doc><img/></doc>| will output an empty head tag, but no base.
Status: NEW → ASSIGNED
The base tag is removed during persistence to ensure that the relative links are not fouled up when the document is saved to disk. It is then restored afterwards to ensure the DOM document still works properly when control returns to the browser. So actually thinking about it, perhaps there should be no base tag there at all - our code should strip all base tags (XML or otherwise) before saving and restore them afterwards. Local files require all links to external documents need to be made absolute, all links to inline content, stylesheets etc need to be made relative.
Attached patch Fix v1.1 (obsolete) (deleted) — Splinter Review
Fixed problems Adam pointed out.
Attachment #105664 - Attachment is obsolete: true
I logged the weird behavior I saw as bug 179990.
Attachment #106127 - Flags: superreview?(kin) → superreview?(alecf)
As I commented when I realised before, the removal of base tags is actually deliberate during persistence which is why I think this patch needs more consideration. The base tags are removed because it hoses relative links in the file. For example, if the HTML had a base tag pointing to http://www.mozilla.org then an img tag fixed up to point to "mozilla.org_files/image.gif" would resolve to "http://www.mozilla.org/mozilla.org_files/image.gif" which is wrong. Therefore the base tag has to be removed in the local copy to ensure relative links correctly resolve. Anchors and other links are made absolute to ensure they also work properly. As the DOM document was supplied by the caller, the base tag must be reinstated before returning. As an alternative to this remove/reinstate behaviour it may be possible to strip out the base tags during the encoding phase during the call to FixupNode. If the node is a base element, we replace it with an empty comment node as it is saved to disk. We would still have to make a note of what the base tag actually was for fixup, but it would mean the original document would not have to be modified in this weird way.
Comment on attachment 106127 [details] [diff] [review] Fix v1.1 Well.. I gotta say that the use of GetElementsByTagNameNS seems a little ugly to me. I mean, isn't this going to do a search of the whole tree? The especially scary one is of course NeedXHTMLBaseTag() which calls it in a loop. As for the code itself, I think you meant to put kXMLNS into the loop in NeedXHTMLBaseTag(), rather than that NS_LITERAL_STRING()? I guess before I sr= this, I want to hear some justification as to why we can't do something like a depth first, short-circuiting search for the first occurance of a given tag, rather than walking the whole tree.
Attached patch Fix v1.2 (deleted) — Splinter Review
Address Alec's issues, including depth-first-search. The reason I had not done it earlier was because use of XML is still pretty limited, and it could be argued we should only check the content type, and not the contents. But this is almost as fast for typical XHTML-only documents as checking the content type, and this will also work for stuff served without that content type so this should be more robust.
Attachment #106127 - Attachment is obsolete: true
Attachment #106421 - Flags: superreview?(alecf)
Attachment #106421 - Flags: review?(adamlock)
Attachment #106127 - Flags: superreview?(alecf) → superreview-
Comment on attachment 106421 [details] [diff] [review] Fix v1.2 this is better - I might have made abstracted the depth-first searching method a bit so that I could replace GetElementsByTagName[NS] but since GetElementsByTagName is probably only called once or twice per call to SetDocumentBase, Can you just add a comment where you're calling GetElementsByTagName[NS] saying that this is a one-time operation since you're doing base tag fixup? with the comment sr=alecf
Attachment #106421 - Flags: superreview?(alecf) → superreview+
Comment on attachment 106421 [details] [diff] [review] Fix v1.2 I think the patch for bug 179990 should go in first or be incorporated into this one. The issue of having to remove the base tag and then restore it should go away then. This patch does look okay otherwise, except for the way the tag array is defined. Is there a better way to define a PRUnichar string list?
Do I need to wait long for the other bug? I am asking because this patch here also contains the fix to the xml-stylesheet issue which is what is holding me from landing the fix in 136717. The good thing about the string list is that it is done statically. I don't know any better way to do this. I could make it cleaner-looking by making it use dynamic memory, but we don't want that. I guess I could make this code use atoms, but that would add new dependencies I think. I don't think this code is so performance sensitive to warrant that.
Comment on attachment 106421 [details] [diff] [review] Fix v1.2 r=adamlock, though the other bug will have to be updated to remove some of the stuff in SetDocumentBase and the calls to it.
Attachment #106421 - Flags: review?(adamlock) → review+
Checked in.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Whiteboard: [fixinhand]
Verified per last comments.
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: