Closed Bug 369394 Opened 18 years ago Closed 18 years ago

Invalid whitespace chomping of XML literals

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED INVALID

People

(Reporter: Waldo, Assigned: Waldo)

Details

print(<a> b<c/></a> == <a>b<c/></a>); // == XML.ignoreWhitespace, not false

This is probably (not tested yet) due to:

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/js/src/jsxml.c&rev=3.140&mark=1572#1567

A similar effect might happen at:

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/js/src/jsxml.c&rev=3.140&mark=3357#3350

except that the only way it can happen is for DeepCopy to be called with the appropriate flags, and the only way that happens is in js_CloneXMLObject, which is called only within jsinterp.c in a case for JSOP_XMLOBJECT, but the only case where that opcode is ever set is within an #if 0.

The testcase e4x/Types/9.1.1.1.js is apparently buggy (more wag-the-dog, most likely), as discovered in bug 368435 comment 3.  The exact extent isn't clear yet, as I haven't had time to check exactly which tests are correct and which tests are incorrect.
There are several more beyond 9.1.1.1 if we accept this is really a bug.  I agree it's not what the spec *says* ... but several other tests make the assumption that we trim whitespace around text nodes when XML.ignoreWhitespace = true.

e4x/Types/9.1.1.3.js
e4x/Types/9.2.1.1.js
e4x/XML/13.4.4.12.js

This may or may not be an exhaustive list; these are the tests that change from success to failure in Rhino when this bug is "fixed."  (There may be other tests that fail for other reasons so are not flipped from success to failure.)

(In reply to comment #1)
> There are several more beyond 9.1.1.1 if we accept this is really a bug.  I
> agree it's not what the spec *says* ... but several other tests make the
> assumption that we trim whitespace around text nodes when XML.ignoreWhitespace
> = true.

E4X has had (and still has, undiscovered) way more than its fair share of errata.  I can believe this is one, although because it's not extremely clear that it is one, I'm not sure I'd feel comfortable leaving things as they are without confirming it with the people currently responsible for E4X.  I'll try and follow up sometime this week.

By the way, I counted 8 different testcases that failed when I fixed it in a custom SpiderMonkey build -- this change *will* impact E4X users if it's made, unlike some, and it's pretty likely to be a far-reaching impact.  The tests were:

e4x/Regress/regress-263936.js
e4x/Types/9.1.1.1.js
e4x/Types/9.1.1.3.js
e4x/Types/9.2.1.1.js
e4x/XML/13.4.3.js
e4x/XML/13.4.4.12.js
e4x/XML/13.4.4.38.js
e4x/XML/13.4.4.39.js
Status: NEW → ASSIGNED
I think given comment 2 we are in complete agreement for the moment.  Let's see what you hear back.
I think we are stuck with the spec based on the BEA code, and with the test-driven AS3 and JS1.6+ implementations. Jeff may have an opinion.

/be
(In reply to comment #4)
> I think we are stuck with the spec based on the BEA code, and with the
> test-driven AS3 and JS1.6+ implementations. Jeff may have an opinion.

I'm having trouble parsing this -- as someone who doesn't know what "the BEA code" is and doesn't know anything about AS3, I can't tell what phrases above modify what other phrases and what it all means.  In other words, I can't figure out exactly what set of things we're "stuck with." :)  Can someone help me out?

I'm looking at this from two perspectives -- primarily in terms of implementing E4X for Rhino, secondarily in maximizing the value of the test suite.  And to that end, I should mention that mozilla/js/tests/e4x/Expressions/11.1.4-04.js disagrees with the 8 tests above and assumes the other interpretation, but to my knowledge it's the only one.
Just parse "stuck with the spec".

The ECMA-357 spec was based on buggy Java code written (IIRC) by BEA. The spec pseudo-code grew its own bugs, of course. The testsuite then (mostly) matched the Java code. This is not how to design languages, but hey -- JS was in the same mold and I am not throwing stones, just saying we probably shouldn't change the "just so" chomping of whitespace behavior. IMHO.

/be
(In reply to comment #6)
> Just parse "stuck with the spec".

> we probably shouldn't
> change the "just so" chomping of whitespace behavior. IMHO.

This is where I'm confused.  We're "stuck with the spec" I understand to mean that we *should* change the behavior (because our behavior does not match it), but the second part I understand to mean that we shouldn't.  Or maybe you mean we're stuck with both -- stuck with the spec and stuck with our non-conformance to same?

My vote (FWIW) is not to change the behavior, as the choices made by the spec lead to odd things.  For example, according to the spec (with ignoreWhitespace, ignoreComments), this throws an exception:

var x = <x>
  <!-- Explanation of foo -->foo
</x>

var y = <x>
  foo
</x>

if (x.toXMLString() != y.toXMLString()) throw "Wow, that's weird; I thought we were ignoring comments but the insertion of a comment makes these different."
Sorry, I was trying to say that the code is the spec. Just like the case for perl (at least up to perl5). The BEA code contributed to Rhino (based on two, now three? Java DOM-like libraries over time) is the spec, even if we don't know it.

We should try to make ECMA-357 match the code-as-spec, if Jeff agrees.

/be
Understood.  Just as a note-to-self (or whomever), we'll want to re-open bug 368435 if we declare this one invalid.
(In reply to comment #8)
> We should try to make ECMA-357 match the code-as-spec, if Jeff agrees.

I agree! (am I the Jeff you are asking?)
(In reply to comment #10)
> (In reply to comment #8)
> > We should try to make ECMA-357 match the code-as-spec, if Jeff agrees.
> 
> I agree! (am I the Jeff you are asking?)

The one and only!

We need to fix that one test too (see comment 5); perhaps it's also in the Tamarin suite? Here's the SpiderMonkey testsuite lxr link:

mozilla/js/tests/e4x/Expressions/11.1.4-04.js

Is this a version of same?

http://lxr.mozilla.org/mozilla/source/js/tamarin/test/e4x/Expressions/e11_1_4.as

/be
Just circling back ... we should mark this as INVALID, right?
Yeah, I think so.  I added a comment to bug 246441 describing this erratum, since that's where we seem to have been keeping track of them.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.