Open Bug 349969 Opened 18 years ago Updated 6 years ago

RDFContentSink deals with parseTypeResourcePropertyElt nodes wrong

Categories

(Core Graveyard :: RDF, defect)

defect
Not set
normal

Tracking

(Not tracked)

People

(Reporter: andrew, Unassigned)

References

(Blocks 3 open bugs)

Details

Attachments

(3 files, 4 obsolete files)

Reading in: <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about="#CoupledPendulum_version01"> <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Coupled Pendulum Model</dc:title> <bqs:WebResource xmlns:bqs="http://www.cellml.org/bqs/1.0#" rdf:parseType="Resource"> <bqs:url>http://www.cs.wisc.edu/~hasti/cs310/notes/ODE1/ODE1Notes.html</bqs:url> </bqs:WebResource> <cs:simulation xmlns:cs="http://www.cellml.org/metadata/simulation/1.0#" rdf:parseType="Resource"> <cs:simulationName>SwingFor100s</cs:simulationName> <cs:multistepMethod>implicit-runge-kutta-2</cs:multistepMethod> <cs:linearSolver>direct</cs:linearSolver> <cs:variablesImportantInSimulation rdf:parseType="Collection"> <rdf:Description rdf:about="#time"/> <rdf:Description rdf:about="#a_angle"/> <rdf:Description rdf:about="#b_angle"/> </cs:variablesImportantInSimulation> <cs:boundIntervals rdf:parseType="Resource"> <rdf:Description> <cs:boundVariable><rdf:Description rdf:about="#time"/></cs:boundVariable> <cs:maximumStepSize rdf:datatype="http://www.w3.org/2001/XMLSchema#double">1</cs:maximumStepSize> <cs:tabulationStepSize rdf:datatype="http://www.w3.org/2001/XMLSchema#double">0.1</cs:tabulationStepSize> <cs:startingValue rdf:datatype="http://www.w3.org/2001/XMLSchema#double">0</cs:startingValue> <cs:endingValue rdf:datatype="http://www.w3.org/2001/XMLSchema#double">100</cs:endingValue> </rdf:Description> </cs:boundIntervals> </cs:simulation> </rdf:Description> </rdf:RDF> and re-serialising results in: <?xml version="1.0"?> <RDF:RDF xmlns:NS3="http://www.cellml.org/bqs/1.0#" xmlns:NS2="http://purl.org/dc/elements/1.1/" xmlns:NS1="http://www.cellml.org/metadata/simulation/1.0#" xmlns:NC="http://home.netscape.com/NC-rdf#" xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <NS1:multistepMethod RDF:about="rdf:#$AaQgT2" /> <NS1:maximumStepSize RDF:about="rdf:#$GaQgT2" RDF:datatype="http://www.w3.org/2001/XMLSchema#double" /> <RDF:Description RDF:about="file:///people/amil082/code/mozCellML/content/mozCellML/testsuite/coupled_pendulum_model.xml#CoupledPendulum_version01" NS2:title="Coupled Pendulum Model"> <NS3:WebResource RDF:resource="rdf:#$xaQgT2"/> <NS1:simulation RDF:resource="rdf:#$zaQgT2"/> <NS1:simulation RDF:resource="rdf:#$AaQgT2"/> <NS1:simulation RDF:resource="rdf:#$BaQgT2"/> <NS1:simulation RDF:resource="rdf:#$CaQgT2"/> <NS1:simulation RDF:resource="rdf:#$DaQgT2"/> </RDF:Description> <NS1:boundIntervals RDF:about="rdf:#$DaQgT2"> <RDF:Description RDF:resource="rdf:#$FaQgT2"/> <RDF:Description RDF:resource="rdf:#$GaQgT2"/> <RDF:Description RDF:resource="rdf:#$HaQgT2"/> <RDF:Description RDF:resource="rdf:#$IaQgT2"/> <RDF:Description RDF:resource="rdf:#$JaQgT2"/> </NS1:boundIntervals> <NS1:simulationName RDF:about="rdf:#$zaQgT2" /> <NS1:boundVariable RDF:about="rdf:#$FaQgT2"> <RDF:Description RDF:resource="file:///people/amil082/code/mozCellML/content/mozCellML/testsuite/coupled_pendulum_model.xml#time"/> </NS1:boundVariable> <NS1:tabulationStepSize RDF:about="rdf:#$HaQgT2" RDF:datatype="http://www.w3.org/2001/XMLSchema#double" /> <NS1:variablesImportantInSimulation RDF:about="rdf:#$CaQgT2"> <RDF:Description RDF:resource="file:///people/amil082/code/mozCellML/content/mozCellML/testsuite/coupled_pendulum_model.xml#time"/> <RDF:Description RDF:resource="file:///people/amil082/code/mozCellML/content/mozCellML/testsuite/coupled_pendulum_model.xml#a_angle"/> <RDF:Description RDF:resource="file:///people/amil082/code/mozCellML/content/mozCellML/testsuite/coupled_pendulum_model.xml#b_angle"/> </NS1:variablesImportantInSimulation> <NS1:linearSolver RDF:about="rdf:#$BaQgT2" /> <NS1:startingValue RDF:about="rdf:#$IaQgT2" RDF:datatype="http://www.w3.org/2001/XMLSchema#double" /> <NS1:endingValue RDF:about="rdf:#$JaQgT2" RDF:datatype="http://www.w3.org/2001/XMLSchema#double" /> <NS3:url RDF:about="rdf:#$xaQgT2" /> </RDF:RDF>
Could you produce a minimal testcase? And describe what's wrong in your oppinion? A quote to the spec backing that up couldn't hurt either.
There are several things wrong with the output, but for the purposes of this bug, the problem is that the graph has changed from a triple: file:///people/amil082/code/mozCellML/content/mozCellML/testsuite/coupled_pendulum_model.xml#CoupledPendulum_version01 http://www.cellml.org/metadata/simulation/1.0# anonymousNode1 and a series of triples from anonymousNode1 into a series of triples with predicate http://www.cellml.org/metadata/simulation/1.0#. This corresponds to handling production 7.2.18 in the specification incorrectly. Looking at the code, it seems like a major rewrite of the parser is required, because the states the parser can go through don't reflect the final W3C specification at all (they just happen to get the right graph for the very limited set of RDF that the serializer produces). I'm accepting the bug for now while I assess how much work is required and whether I will be able to make a patch, I will assign back to nobody@mozilla.org if it turns out I don't have time.
Status: NEW → ASSIGNED
Summary: RDFContentSink deals with children of anonymous nodes wrong → RDFContentSink deals with parseTypeResourcePropertyElt nodes wrong
This patch rewrites much of the nsRDFContentSink implementation. The parser now follows the general grammar from http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210 (to the extent possible with the current RDF node support). The new parser sets RDF:type triples for Sequences, Bags, and so on, and so I have updated the container code to support this. For now, I have kludged the new parser to set nextVal triples (a non-standard kludge which the container code uses) whenever the RDF:li element is used. When the containers are fixed, this can go away.
Attachment #235861 - Flags: review?(axel)
Comment on attachment 235861 [details] [diff] [review] Rewrite of RDFContentSink to be much more RDF/XML compliant No way this is going to land like this. Given that we only use those parts of RDF that work in our current implementation, the targeted changeset is just too large. Last time I rewrote the parser, I attempted to change much less and regressed, the risk is just too high. There are several subtasks in this bug that could be landed up front, I guess, making this less risky, and offering a better place to cut. Unsetting the review flag, as that's what I'm going to do, I'm not going to try to find the bugs here. There are probably code problems, the whitespace handling doesn't seem to match unicode, for example. Reification and XML literals on the other side are things that I hardly consider worth supporting, shouldn't there be a sudden change in the way the earth revolves around the sun. Executive summary, risk is high, practical impact is low, work to do is lots. Sorry for that.
Attachment #235861 - Flags: review?(axel)
Attachment #235861 - Attachment is obsolete: true
(In reply to comment #4) > (From update of attachment 235861 [details] [diff] [review] [edit]) > No way this is going to land like this. Given that we only use those parts of > RDF that work in our current implementation, the targeted changeset is just too > large. Last time I rewrote the parser, I attempted to change much less and > regressed, the risk is just too high. Do you have any regression tests from them? Do you have a list of regression bugs from this change? This would be useful in testing my patch to make sure we don't get the same regressions (it is likely that it would break in the same, or at least similar places). Aside from that, I am happy to go through the smoketest with my patch on Firefox and Thunderbird. Can you suggest anything else to catch regressions? Just to avoid doubt, I am not proposing we ever land this on a branch, so I don't really see what the regression issue is. > > There are several subtasks in this bug that could be landed up front, I guess, > making this less risky, and offering a better place to cut. The current parser structure is hard to fix, because the states it goes through don't match what is required at all, so that would require a rewrite of most of the parser just there (so to fix that you might as well go with my internal abstraction). Once we have done that, I can't see how going part way will save us from any regressions. Also, if you feel happier about this, perhaps we can put this on a branch for a little while so more people can test it, before merging into trunk? > > Unsetting the review flag, as that's what I'm going to do, I'm not going to try > to find the bugs here. > > There are probably code problems, the whitespace handling doesn't seem to match > unicode, for example. There is whitespace handling in several places, however I will presume you are referring to the handling of the ws production. From the RDF/XML specification: (at http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/#ws): "7.2.12 Production ws A text event matching white space defined by [XML] definition White Space Rule [3] S in section Common Syntactic Constructs" The referenced section, at http://www.w3.org/TR/2000/REC-xml-20001006#NT-S says: " 2.3 Common Syntactic Constructs This section defines some symbols used widely in the grammar. S (white space) consists of one or more space (#x20) characters, carriage returns, line feeds, or tabs. White Space [3] S ::= (#x20 | #x9 | #xD | #xA)+ " which seems to match the definition of whitespace in the code. > Reification and XML literals on the other side are things > that I hardly consider worth supporting, shouldn't there be a sudden change in > the way the earth revolves around the sun. I disagree. Reification is an important part of RDF, and without support for it in the RDF/XML parser, it becomes very tedious to write reified statements. I also would like to see the RDF library more accessible to scripts and so used more outside Mozilla (I need to use it in my app, and I have met other people who are using RDF externally on #developers, and if there was a way for unprivileged scripts to get at it, it would probably be widely used). Therefore, by cutting corners like this on the implementation, we are holding up the potential of RDF in the Mozilla platform (not to mention that reification would probably be useful internally). The same concerns about external users applies when we consider XML Literals, and again, this is something which could easily be put to good use inside Mozilla. > > Executive summary, risk is high, practical impact is low, work to do is lots. You have made a good case not to land it on the branch, but no one has ever asked for that. I don't think that risk is a consideration we should make when landing on the trunk, because the trunk is the correct place to make progress. I know that Firefox/Thunderbird at least read in all the RDF files that it previously serialised out, as well as things like updates and the extension RDF I happened to have sitting around, and starts up normally. I have also checked the RDF/XML testcases from the RDF testsuite available at w3c.org, and it generates the right triples (short of things which I was expecting, like getting types of literals wrong for unsupported types). I guess the only remaining risks are: 1) Someone might have an unusual RDF graph that doesn't get roundtripped correctly. However, there already are cases like that, so the new code reduces the risk of this at the same time. 2) Some extensions might rely on some non-standard quirk we no longer support, with either an unparseable EM RDF file, or by code which calls the RDF serialiser. However, there are lots of much greater risks out there, and extensions are versioned to specific ranges of Firefox/Thunderbird anyway. Regarding how much time you have, perhaps I could try another reviewer? I realise that many of the RDF peers aren't that interested in fixing RDF, and just see it as an internal datastore, but perhaps you will be happy with another reviewer?
I personally don't have any testcases beyond what the w3c offers. I tried to do some regression testing based on those for the current spec, and didn't find that to be too useful. Merely because we pass to few of them right now. I previously testing roundtripping stuff like localstore.rdf etc, and that didn't cover all codepaths, either. And stuff broke :-(. Regarding risk, no, it's not ok to break the trunk. And if you can chunk work into smaller pieces to reduce risk, it's usually a good thing to do so. Generally, I don't see a consensus inside the development community to support RDF, there may be somewhat of a consensus to not do so, with which I'd disagree. I tried to get a constructive discussion on this going, and failed. My personal attitude is to reduce the invested effort and be prepared to see RDF die whenever someone dares to make mailnews not use it. If you feel like trying to make that atmosphere swing to the other side, you'd be welcome to do so, I sadly don't have powers to burn left in that area. For example, I poked about js support for an API idea I have at http://groups.google.com/group/mozilla.dev.tech.js-engine/browse_thread/thread/fed6e3c87f54d60f/7796dfec950c3a60?lnk=raot#7796dfec950c3a60, no answer there, too.
To use the test runner attached: 1) Load the page with universal XPConnect privilege. 2) Get the W3C testcases from http://www.w3.org/2000/10/rdf-tests/rdfcore/latest_Approved.zip and extract somewhere. 3) Enter the path to the toplevel test directory into the text box on the page, and press the "Run tests" button. The test results will then appear in the page once all tests have run. All tests currently run synchronously, but don't take long enough for this to be a problem (for the W3C approved test set anyway).
With my latest code (patch coming shortly), the following mismatches happen from running the official W3C testcases: /people/amil082/docs/rdftests/datatypes/test002: Fail: Triple (http://example.org/foo, http://example.org/bar, 0) in datasource but not in triple file. /people/amil082/docs/rdftests/datatypes/test002: Fail: Triple (<http://example.org/foo>, <http://example.org/bar>, "flargh") in triple file but not in datasource This happens because we parse integers early, rather than just storing the type and value as a string, so isn't a parser issue (in other words, this test case doesn't matter to us). /people/amil082/docs/rdftests/rdf-charmod-uris/test001: Fail: Triple (http://example.org/#André, http://example.org/#owes, 2000) in datasource but not in triple file. /people/amil082/docs/rdftests/rdf-charmod-uris/test001: Fail: Triple (<http://example.org/#André>, <http://example.org/#owes>, "2000") in triple file but not in datasource Expat doesn't recognise that sequence in its NFC normalisation. This is an XML parser issue, not an RDF/XML parser issue, if anyone cares it can be fixed in Expat. /people/amil082/docs/rdftests/rdf-containers-syntax-vs-schema/test001: Fail: Triple (rdf:#$TYQUI1, http://www.w3.org/1999/02/22-rdf-syntax-ns#nextVal, 3) in datasource but not in triple file. /people/amil082/docs/rdftests/rdf-containers-syntax-vs-schema/test002: Fail: Triple (rdf:#$UYQUI1, http://www.w3.org/1999/02/22-rdf-syntax-ns#nextVal, 3) in datasource but not in triple file. /people/amil082/docs/rdftests/rdf-containers-syntax-vs-schema/test003: Fail: Triple (rdf:#$VYQUI1, http://www.w3.org/1999/02/22-rdf-syntax-ns#nextVal, 3) in datasource but not in triple file. /people/amil082/docs/rdftests/rdf-containers-syntax-vs-schema/test004: Fail: Triple (rdf:#$WYQUI1, http://www.w3.org/1999/02/22-rdf-syntax-ns#nextVal, 5) in datasource but not in triple file. /people/amil082/docs/rdftests/rdf-containers-syntax-vs-schema/test004: Fail: Triple (rdf:#$WYQUI1, http://www.w3.org/1999/02/22-rdf-syntax-ns#nextVal, 5) in datasource but not in triple file. /people/amil082/docs/rdftests/rdf-containers-syntax-vs-schema/test007: Fail: Triple (rdf:#$.YQUI1, http://www.w3.org/1999/02/22-rdf-syntax-ns#nextVal, 3) in datasource but not in triple file. /people/amil082/docs/rdftests/rdf-containers-syntax-vs-schema/test007: Fail: Triple (rdf:#$+YQUI1, http://www.w3.org/1999/02/22-rdf-syntax-ns#nextVal, 3) in datasource but not in triple file. /people/amil082/docs/rdftests/rdf-containers-syntax-vs-schema/test008: Fail: Triple (http://desc, http://www.w3.org/1999/02/22-rdf-syntax-ns#nextVal, 2) in datasource but not in triple file. /people/amil082/docs/rdftests/rdfms-rdf-names-use/test-031: Fail: Triple (http://example.org/node1, http://www.w3.org/1999/02/22-rdf-syntax-ns#nextVal, 2) in datasource but not in triple file. /people/amil082/docs/rdftests/rdf-ns-prefix-confusion/test0011: Fail: Triple (file:///people/amil082/docs/rdftests/rdf-ns-prefix-confusion/test0011.rdf#container, http://www.w3.org/1999/02/22-rdf-syntax-ns#nextVal, 2) in datasource but not in triple file. /people/amil082/docs/rdftests/rdf-ns-prefix-confusion/test0012: Fail: Triple (file:///people/amil082/docs/rdftests/rdf-ns-prefix-confusion/test0012.rdf#container, http://www.w3.org/1999/02/22-rdf-syntax-ns#nextVal, 2) in datasource but not in triple file. /people/amil082/docs/rdftests/rdf-ns-prefix-confusion/test0013: Fail: Triple (file:///people/amil082/docs/rdftests/rdf-ns-prefix-confusion/test0013.rdf#container, http://www.w3.org/1999/02/22-rdf-syntax-ns#nextVal, 2) in datasource but not in triple file. /people/amil082/docs/rdftests/rdf-ns-prefix-confusion/test0014: Fail: Triple (file:///people/amil082/docs/rdftests/rdf-ns-prefix-confusion/test0014.rdf#container, http://www.w3.org/1999/02/22-rdf-syntax-ns#nextVal, 2) in datasource but not in triple file. /people/amil082/docs/rdftests/rdfms-identity-anon-resources/test004: Fail: Triple (rdf:#$pZQUI1, http://www.w3.org/1999/02/22-rdf-syntax-ns#nextVal, 2) in datasource but not in triple file. All these failures are because we add the non-standard nextVal triples to be compatible with the old code. /people/amil082/docs/rdftests/rdfms-xml-literal-namespaces/test001: Fail: Triple (file:///people/amil082/docs/rdftests/rdfms-xml-literal-namespaces/test001.rdf#John_Smith, http://my.example.org/Name, <html:h1> <b>John</b> </html:h1> ) in datasource but not in triple file. /people/amil082/docs/rdftests/rdfms-xml-literal-namespaces/test001: Fail: Triple (</rdfms-xml-literal-namespaces/test001.rdf#John_Smith>, <http://my.example.org/Name>, " <html:h1 xmlns:html=\"http://NoHTML.example.org\"> <b xmlns=\"http://www.w3.org/1999/xhtml\">John</b> </html:h1> ") in triple file but not in datasource /people/amil082/docs/rdftests/rdfms-xml-literal-namespaces/test002: Fail: Triple (http://mycorp.example.com/papers/NobelPaper1, http://purl.org/metadata/dublin_core#Title, Ramifications of <apply> <power></power> <apply> <plus></plus> <ci>a</ci> <ci>b</ci> </apply> <cn>2</cn> </apply> to World Peace ) in datasource but not in triple file. /people/amil082/docs/rdftests/rdfms-xml-literal-namespaces/test002: Fail: Triple (<http://mycorp.example.com/papers/NobelPaper1>, <http://purl.org/metadata/dublin_core#Title>, " Ramifications of <apply xmlns=\"http://www.w3.org/TR/REC-mathml\"> <power></power> <apply> <plus></plus> <ci>a</ci> <ci>b</ci> </apply> <cn>2</cn> </apply> to World Peace ") in triple file but not in datasource Tests completed. The above are because we don't put xmlns on XML Literals properly. This is an area for future improvement, but we don't need to worry about it breaking current code, because we didn't support XML Literals at all before.
Attachment #236913 - Attachment is obsolete: true
(In reply to comment #7) > I personally don't have any testcases beyond what the w3c offers. I tried to do > some regression testing based on those for the current spec, and didn't find > that to be too useful. Merely because we pass to few of them right now. See my comment above, it shows that with my patch, we pass virtually all of the tests (the exception is the non-standard nextVal, which is in for backwards compatibility, an NFC related issue in Expat, and issues resulting from our incomplete XML Literal support, and the way we store typed literal). I think that going from passing almost no tests, to passing almost all tests (only the XML literal issue is a problem in the parser, the others are more general issues) is a significant improvement. > I previously testing roundtripping stuff like localstore.rdf etc, and that > didn't cover all codepaths, either. And stuff broke :-(. I have tried to test quite a bit more than that this time. However, in case I missed something, telling me what broke would be useful. I have done the following tests: 1) W3C testcases (see above). 2) Test all the functionality of the download manager. Then exit the browser, make sure everything is still there, and do a few more downloads. I also tested the 'Clean Up' feature. With my latest patch, this works as expected. 3) Install an extension, check status looks right, close EM, check status again, exit browser, restart, and check EM status is correct. Also test the localstore by maximising the window, and checking that after exit and reloading the EM, it is still maximised. 4) Add a bookmark, and make sure it is preserved across load. 5) Add a LiveBookmark, and as for bookmark. 6) Most of the Smoketest steps in Litmus (I skipped the ones forInstaller and Talkback, since I don't have my environment set up to make those properly), for Firefox and Thunderbird. > > Regarding risk, no, it's not ok to break the trunk. And if you can chunk work > into smaller pieces to reduce risk, it's usually a good thing to do so. Its not okay to knowingly break the trunk, or to commit stuff which has not been properly tested. However, unlike the 1.8.* branches, fear of things (not anticipated by the pre-commit tests) breaking is not a good reason to hold off a commit. Chunking doesn't reduce risk, because if there is something which breaks code in an unanticipated way, it will still break the trunk at some point. It makes it easier to figure out what broke, because the change is likely split across several regression windows. In this case, breaking up the patch will actually increase risk, because the internal abstractions and state model in the old code were not good for implementing a W3C spec compliant parser (hence why I rewrote them), and obviously, you cannot keep most of the old code but also get the improvements to the abstraction (which means that any partial solution would have to be implemented as a risky and complex hack on the existing model). > > Generally, I don't see a consensus inside the development community to support > RDF, there may be somewhat of a consensus to not do so, with which I'd > disagree. > I tried to get a constructive discussion on this going, and failed. My personal > attitude is to reduce the invested effort and be prepared to see RDF die > whenever someone dares to make mailnews not use it. If the code gets moved out of the core, it could go into an extension. I am using Mozilla's RDF library for PCEnv (http://www.cellml.org/tools/pcenv), because CellML uses RDF for its metadata. The more users of Mozilla's RDF API there are, the more likely it is to stay. The fact that the RDF parser currently falls so far short of what the RDF specification is probably a major reason why not many people use it now.
For reference, RDF/XML roundtripping perf is easier to be tested with this patch.
Blocks: 240877
Blocks: 306320
Attached patch Patch merged to current trunk (obsolete) (deleted) — Splinter Review
Attachment #237265 - Attachment is obsolete: true
Attachment #277028 - Flags: review?
Attachment #277028 - Flags: review? → review?(axel)
Brendan, this is the bug we just chatted about.
Attachment #277028 - Attachment is obsolete: true
Attachment #280558 - Flags: review?
Attachment #277028 - Flags: review?(axel)
Attachment #280558 - Flags: review? → review?(axel)
This is a mass change. Every comment has "assigned-to-new" in it. I didn't look through the bugs, so I'm sorry if I change a bug which shouldn't be changed. But I guess these bugs are just bugs that were once assigned and people forgot to change the Status back when unassigning.
Status: ASSIGNED → NEW
Comment on attachment 280558 [details] [diff] [review] As before, but don't use NS_REINTERPRET_CAST Making this an r-. The story in this bug didn't really evolve into a world where we need a RDF implementation based on specs or needs that are not in our existing code base. This patch is just too large to take, and perhaps too "right", i.e., regressing our badnesses. Not a satisfying answer for those of use liking RDF, but that battle is long fought and lost.
Attachment #280558 - Flags: review?(axel) → review-
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: