Closed Bug 103255 Opened 23 years ago Closed 20 years ago

We should throw an error on non-namespace-conforming XML

Categories

(Core :: XML, defect, P3)

Other
Other
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: sicking, Assigned: sicking)

Details

Attachments

(3 files)

We should treat non-namespace-conforming xml like we treat non-valid xml. Basically there are two things we should look out for: 1. Prefixes without a corresponding xmlns attributes. For example the following two documents uses the undeclared 'foo' prefix <doc><foo:bar/></doc> and <doc foo:attr=""/> This is easy to detect since nsXMLContentSink::GetNameSpaceId returns kNameSpaceID_Unknown when the prefixlookup fails. 2. More then one attribute with the same namespaceURI and localName. For example The following element has two attributes with localName 'a' and namespaceURI 'ns' <elem p1:a="foo" p2:a="bar" xmlns:p1="ns" xmlns:p1="ns"/> This should be fairly easy to detect and throw and error in the xml- contentsink. The idea I had to do this without performance hit is to have ::SetAttr return some other successvalue then NS_OK if there allready existed an attribute with that localName+nsURI (Such a value could probobly also be usefull for other clients). However I need some hints on how to report the errors the way that xml- validness errors are reported. I also need to abort documentloading done through document.load() and friends.
Status: NEW → ASSIGNED
Summary: We should thrown an error on non-namespace-conforming XML → We should throw an error on non-namespace-conforming XML
I don't think this is as easy as you think, given our current architecture. Mozilla code does not detect the error, it is done by Expat. You should look into htmlparser/src and how we display the well-formedness errors there. The cleanest solution in my opinion would be to make Expat report namespace errors to Mozilla (if you can figure out how Expat works).
Priority: -- → P3
Target Milestone: --- → mozilla0.9.8
Target Milestone: mozilla0.9.8 → mozilla0.9.9
i doubt i'll have time with this before 1.0 :(
Target Milestone: mozilla0.9.9 → mozilla1.1
With the new architecture this has become easier. We now report errors in the XML content sink, so the sink can also do namespace checking. But I agree, we can fix this later...
don't know what to target this for. I'll try to get to it once things settle down a bit
Target Milestone: mozilla1.1alpha → ---
Note -- s/valid/wellformed/ above. Namespace correctness is a wellformedness issue not a validity issue.
QA Contact: petersen → ian
marking new since i'm not working on these
Status: ASSIGNED → NEW
sicking: Any ETA on this? This is something we really should fix ASAP, before people start relying on our lack of error handling, because then we would not be able to fix it at all without breaking sites.
Attached file Unknown namespace on node (deleted) —
So.. detecting those first two errors and calling ReportError is easy, indeed. The problem is that we also need to stop the parse. Perhaps an error code that will cause nsExpatDriver can to call Terminate() or something along those lines? Also, will transformiix need its own changes to deal? It seems to use expat directly....
Yes other places then the contentsinks will most likly need to be changed. This includes the XSLT sink in transformiix and the RDF sink, and there might be others too, like XBL and/or XUL
Woah, woah, can't all those go through the same layer? We don't want to be replicating core well-formedness checking all over the tree! Ideally this should in fact be in expat itself.
absolutly! I'm not sure that we want to use expats namespace support, it might be a performance hit compared to how things are done now. However, it'd really be nice to have the nsExpatDriver or whatever it's called to do namespace resolution which would give us a single point of namespace-error-checking as well as remove some codeduplication that already exists.
(In reply to comment #14) > absolutly! I'm not sure that we want to use expats namespace support We don't, as far as I can tell. I just hacked nsExpatDriver to create the parser using XML_ParserCreateNS instead of XML_ParserCreate for the XHTML MIME type, and its error handler wasn't called for any of the three testcases in this bug, when loaded from a .xhtml file. So expat doesn't report those errors, it would seem (as the documentation at http://lxr.mozilla.org/seamonkey/source/parser/expat/lib/xmlparse.h#102 suggests...) > it might be a performance hit compared to how things are done now It would be, almost certinly, since it'd just convert the prefixes to namespace strings; we'd still need to handle the nsINameSpace stuff, just with longer strings. > However, it'd really be nice to have the nsExpatDriver or whatever it's called > to do namespace resolution which would give us a single point of > namespace-error-checking as well as remove some codeduplication that already > exists. Agreed. The namespace manager is per-document, right? We'd need to have a way for the driver to do the following: 1) Register a new xmlns on the document (ideally that would return to it the nsINameSpace it wants to use from that point on?). 2) Get the namespace id for a prefix (if it has an nsINameSpace, this is a given). Then it can enforce the validity constraints we want, right? That's about it, no? We'd need to change HandleStartElement/HandleEndElement to pass in namespace IDs / prefixes / localnames instead of the wstrings it uses now... One other issue -- nsIExpatDriver is idl/scriptable, while nsINameSpace is _so_ not. How do we want to handle that? Do we hack together yet another interface (inheriting from nsIExpatDriver) or something?
It's even simpler, there is only one namespace manager so the current code can more or less be moved directly to nsExpatDriver and nsIExpatSink::HandleStartElement can be made to take atoms and namespace ids. The scriptable part is trickier though...
Frankly, I have no clue why nsIExpatSink (not driver, my apologies) is scriptable. Do we have any impls that can be created from script? I don't think we do... Anyway, nsIAtom is scriptable (which is bullshit, of course), namespace ids are just longs, so there is no reason not to add methods using those to this api. As long as nsINameSpace is not in the api (and it wouldn't be, since that would be handled inside the driver), it's not an issue.
Ok, so it looks like the CSS3 Selectors suite has exactly the sort of invalid markup this bug is about... ;) With that in mind, I was looking into fixing this. The first question I ran into is how to pass the strings/atoms/namespaces efficiently... I guess I can stack-allocate an nsIAtom[] and fill it with the atoms and pass it in; same for namespaces. For strings, it sounds like copying all over. :( Ideally, we'd just pass an array of nodeinfos for all the attribute names, I think. But nsINodeInfo is most definitely not scriptable.... Thoughts?
I wouldn't mind making nsIExpatSink non-scriptable.
If we do, I'd probably go with passing nsCOMArray<nsINodeInfo> for the attr names _and_ passing through the original PRUnichar* array with notes in the api that the array should be walked by 2s to get the attr values... Seem reasonable? (Also, this will mean moving nsIExpatSink out of idl or tossing in %{C++ stuff...)
Er, scratch that last part. We should be able to get by with [native] if we go the non-scriptable route.
I think so. Note that I just discovered that newer versions of Expat seem to throw errors for this.
Hmm.. So maybe we want to just upgrade expat and use that? Note that I just discovered that I can't use NS_GetNameSpaceManager in nsExpatSink, since it's not part of gklayout... :(
You can still get it as a service, no?
Ah, yes, indeed. Good catch.
Does the new expat really catch this when used in non-namespace-mode? See comment 14 and comment 15.
Actually, I suspect perf may not be that bad... but would we be able to preseve namespace prefixes like we do now? (And do we care?)
Of course it needs to be in namespace-mode but we can remove the nsINameSpace stuff, Expat passes you the prefixes if you ask it to. The only thing I had to do is make it pass the xmlns attributes, it removes them from the attributes array in namespace-mode. I have a build where all of this works, but putting Expat in namespace-mode increases codesize even more :-/ and I haven't yet tested whether Expat is much slower in namespace-mode.
This was fixed by the fix for bug 274964.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: