Open Bug 641214 Opened 14 years ago Updated 1 year ago

Use test262 code coverage to identify spec-level gaps in test262 tests

Categories

(Core :: JavaScript Engine, defect)

People

(Reporter: bruant.d, Unassigned)

References

(Blocks 1 open bug)

Details

User-Agent: Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.9.2.14) Gecko/20110221 Ubuntu/10.10 (maverick) Firefox/3.6.14
Build Identifier:

Test262 (http://test262.ecmascript.org/) is an ECMAScript effort to create an ECMAScript test suite (5.1 for the moment) that helps ensure ECMAScript interoperability. One of their concerns (https://bugs.ecmascript.org/show_bug.cgi?id=56) is to have an "academic-like review of existing test coverage versus ES5.1". One idea I've had is to measure implementation coverage rather than spec coverage (please read the comments on that bug for more on this).

How hard would it be to test test262 coverage of SpiderMonkey?

I am already aware that measuring this will reveal non-covered parts of the engine, because the suite aims at "spec-level" coverage, not "implementation-level" coverage. The point would be to list the non-covered parts and spot places where a "spec-level" test could be added to the test suite.

From the SpiderMonkey point of view, one advantage of providing such feedback is that the spec-level (interoperability) coverage effort could be "delegated" to test262, and fixing bug 496923 would bring this advantage.

Reproducible: Always
OS: Linux → All
Hardware: x86 → All
Blocks: 636940
(In reply to comment #0)
> How hard would it be to test test262 coverage of SpiderMonkey?

Are you talking about something like running test262 through SpiderMonkey with a code coverage tool like gcov? I don't have much experience with tools like that but people have found interesting gaps in test coverage that way before.

One question is, what defines coverage? It looks like gcov does basic block and branch coverage. That's something, but the state space of SpiderMonkey seems to be huge: it would be great to measure path coverage or data coverage (e.g., do nullable variables see nulls and non-nulls), but I don't know if that's feasible.
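To make the gap between branch coverage and path coverage concrete, here is a minimal Python sketch. The function `classify` and its inputs are hypothetical and unrelated to SpiderMonkey's code; they only show that two tests can exercise every branch outcome while still missing half of the possible paths.

```python
from itertools import product


def classify(a, b):
    """Hypothetical function with two independent branches."""
    first = "a-true" if a else "a-false"
    second = "b-true" if b else "b-false"
    return first, second


def branch_outcomes(inputs):
    """Set of individual branch outcomes exercised by the given inputs."""
    seen = set()
    for a, b in inputs:
        seen.update(classify(a, b))
    return seen


def paths(inputs):
    """Set of complete paths (combinations of outcomes) exercised."""
    return {classify(a, b) for a, b in inputs}


two_tests = [(True, True), (False, False)]
all_inputs = list(product([True, False], repeat=2))

print(len(branch_outcomes(two_tests)), "of 4 branch outcomes covered")
print(len(paths(two_tests)), "of", len(paths(all_inputs)), "paths covered")
```

With n independent branches the number of paths grows as 2^n, which is one reason path coverage of a whole engine may not be feasible.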
(In reply to comment #1)
> (In reply to comment #0)
> > How hard would it be to test test262 coverage of SpiderMonkey?
>
> Are you talking about something like running test262 through SpiderMonkey with
> a code coverage tool like gcov? I don't have much experience with tools like
> that but people have found interesting gaps in test coverage that way before.

Yes, running test262 through SpiderMonkey with a code coverage tool like gcov.

> One question is, what defines coverage? It looks like gcov does basic block and
> branch coverage.

And line-level coverage. As for the definition, in https://bugs.ecmascript.org/show_bug.cgi?id=56#c0 the definition of spec coverage is apparently "for each step of each spec algorithm, there exists a test that covers/'executes' this step/'meta-instruction'". I agree that it's a first important milestone to reach.

> That's something, but the state space of SpiderMonkey seems to
> be huge: it would be great to measure path coverage or data coverage (e.g., do
> nullable variables see nulls and non-nulls), but I don't know if that's
> feasible.

The state space of SpiderMonkey and the spec state space are both infinite, aren't they? According to the definition I have provided, it is not required to cover this space; a partial line-coverage (path-coverage) would be fine.

One "interesting" restriction here is that there is no need to test coverage of the entire JS engine. For instance, nano-jit engine coverage isn't necessary for this test suite, as it only tests spec conformance and not implementation details. I am not familiar with SpiderMonkey internals, but there are certainly other engine modules/functions that won't need to be covered.

How hard would it be to answer "This line in the engine isn't covered. Is it spec-related or implementation-specific?"?

How hard would it be to answer "This line in the engine isn't covered and it's spec-related. What test case would cover it?"?
(In reply to comment #2)
> > That's something, but the state space of SpiderMonkey seems to
> > be huge: it would be great to measure path coverage or data coverage (e.g., do
> > nullable variables see nulls and non-nulls), but I don't know if that's
> > feasible.
>
> The state space of SpiderMonkey and the spec state space are both infinite,
> aren't they? According to the definition I have provided, it is not required to
> cover this space; a partial line-coverage (path-coverage) would be fine.

OK. What I was getting at is that many of the nastier bugs we find are related to interactions between different parts of code and would not be found with direct coverage. But that's OK--testing basic coverage still has benefits.

> One "interesting" restriction here is that there is no need to test coverage of
> the entire JS engine. For instance, nano-jit engine coverage isn't necessary
> for this test suite as it only tests spec conformity and not implementation
> details. I am not familiar with SpiderMonkey internals, but there are certainly
> other engine modules/functions that won't need to be covered.
>
> How hard would it be to answer "This line in the engine isn't covered. Is it
> spec-related or implementation-specific?" ?

I'm not sure exactly how one can clearly distinguish those. Are you saying that nanojit doesn't directly implement JS, but rather provides non-JS-specific compilation services, and so you don't want to include it in this check? I guess that seems sensible. In that case it is probably not too hard to tell them apart.

> How hard would it be to answer "This line in the engine isn't covered and it's
> spec-related. What test case would cover it?" ?

That is sometimes hard. If the code in question is "purely spec-related", i.e., the branches guarding the uncovered code all relate to spec/JS-language-visible properties of the program, then it probably isn't too hard. But if they relate to different options the implementation has for implementing a certain bit of spec, then it can require a lot of insight into the implementation.

I wonder if it would help to look at the tests that hit the covered lines "closest" to the uncovered ones, and try to modify them, possibly even randomly.
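One possible reading of the "closest covered lines" idea, sketched in Python. Everything here is an assumption for illustration: the per-test coverage map, the test names, and the use of plain line-number distance within one source file as the measure of "closeness" (a crude proxy; distance in the control-flow graph would likely be more meaningful).

```python
def nearest_tests(coverage, uncovered_line, limit=3):
    """Rank tests by how close their covered lines come to an uncovered line.

    coverage: dict mapping a test name to the set of line numbers it
              covers in a single source file (hypothetical data shape).
    """
    ranked = []
    for test, lines in coverage.items():
        if not lines:
            continue  # a test that never touches this file cannot be ranked
        distance = min(abs(line - uncovered_line) for line in lines)
        ranked.append((distance, test))
    ranked.sort()
    return [test for _, test in ranked[:limit]]


# Hypothetical coverage data for one engine source file.
coverage = {
    "a.js": {10, 11},
    "b.js": {40, 41},
    "c.js": {100},
}

# Line 42 is uncovered; b.js reaches line 41, so it is the best candidate
# to modify (by hand or randomly) in the hope of reaching line 42.
print(nearest_tests(coverage, 42, limit=2))
```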
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to comment #3)
> (In reply to comment #2)
> > > That's something, but the state space of SpiderMonkey seems to
> > > be huge: it would be great to measure path coverage or data coverage (e.g., do
> > > nullable variables see nulls and non-nulls), but I don't know if that's
> > > feasible.
> >
> > The state space of SpiderMonkey and the spec state space are both infinite,
> > aren't they? According to the definition I have provided, it is not required to
> > cover this space; a partial line-coverage (path-coverage) would be fine.
>
> OK. What I was getting at is that many of the nastier bugs we find are related
> to interactions between different parts of code and would not be found with
> direct coverage. But that's OK--testing basic coverage still has benefits.

Yes. There are other bugs to track testing of other aspects. My goal with this bug is to measure the official test suite coverage (as described in https://bugs.ecmascript.org/show_bug.cgi?id=56).

> > One "interesting" restriction here is that there is no need to test coverage of
> > the entire JS engine. For instance, nano-jit engine coverage isn't necessary
> > for this test suite as it only tests spec conformity and not implementation
> > details. I am not familiar with SpiderMonkey internals, but there are certainly
> > other engine modules/functions that won't need to be covered.
> >
> > How hard would it be to answer "This line in the engine isn't covered. Is it
> > spec-related or implementation-specific?" ?
>
> I'm not sure exactly how one can clearly distinguish those. Are you saying that
> nanojit doesn't directly implement JS, but rather provides non-JS-specific
> compilation services, and so you don't want to include it in this check?

Exactly.

> I guess that seems sensible. In that case it is probably not too hard to tell
> them apart.

The same goes for any other modules dedicated to performance or to other non-spec-related aspects of the implementation.

> > How hard would it be to answer "This line in the engine isn't covered and it's
> > spec-related. What test case would cover it?" ?
>
> That is sometimes hard. If the code in question is "purely spec-related", i.e.,
> the branches guarding the uncovered code all relate to spec/JS-language-visible
> properties of the program, then it probably isn't too hard. But if they relate
> to different options the implementation has for implementing a certain bit of
> spec, then it can require a lot of insight into the implementation.

Your response sounds encouraging to me. Afterward, it also occurred to me that if, for instance, a branch after an "if(isStrictMode)" isn't covered by the test suite, there are two possible conclusions: either the test suite needs a test to cover it, or the test suite is complete and the branch is actually dead code.

> I wonder if it would help to look the tests that hit the covered lines
> "closest" to the uncovered ones, and try to modify them, possibly even
> randomly.

Interesting idea. It reminds me of "mutation testing", but with the opposite approach. In mutation testing, you change your code and make sure a test in your test suite detects the change (although sometimes a change in the code doesn't change its semantics, so no test can detect it, and that's a complicated topic). In your idea, you change a test to see whether it changes code coverage. However, with the current resources that ECMA puts into the test suite effort, I think it would require too much effort and they won't do it.
Jeff, is this bug a duplicate of bug 496923?
Flags: needinfo?(jwalden+bmo)
Reading carefully, I don't think it is. Rather, it's asking for us to 1) run test262 with SpiderMonkey/Firefox; 2) get SpiderMonkey code coverage data results for that run, e.g. line 1234 of vm/Object.cpp is covered, but line 8765 of jit/IonCaches.cpp is not; and 3) use the not-covered line information to suggest or write new test262 tests. This could be an interesting project, although as comment 3 notes such low-level information isn't always easy to transform into an effective test.
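Step 2 in the comment above could be approached roughly as follows. This is a sketch, not an existing tool: it assumes gcov's plain-text report format, in which each source line is rendered as `count:lineno:source` and never-executed lines carry `#####` (or `=====`) in the count field. The sample report is invented, and the `nanojit/` prefix used to skip non-spec code is a hypothetical directory layout.

```python
# Hypothetical path prefixes treated as implementation-specific.
EXCLUDED_PREFIXES = ("nanojit/",)


def uncovered_lines(gcov_text):
    """Return (source_path, [line numbers never executed]) for one report."""
    source = None
    missing = []
    for raw in gcov_text.splitlines():
        parts = raw.split(":", 2)
        if len(parts) < 3:
            continue  # e.g. "branch 0 taken ..." lines have no such fields
        count, lineno = parts[0].strip(), parts[1].strip()
        if not lineno.isdigit():
            continue
        if lineno == "0":
            # Preamble lines such as "-:    0:Source:<path>"
            if parts[2].startswith("Source:"):
                source = parts[2][len("Source:"):]
            continue
        if count in ("#####", "====="):
            missing.append(int(lineno))
    return source, missing


def is_spec_candidate(path):
    """True if the file is not under an excluded (assumed non-spec) prefix."""
    return path is not None and not path.startswith(EXCLUDED_PREFIXES)


# Invented sample report, using the file name mentioned in the comment.
SAMPLE = """\
        -:    0:Source:vm/Object.cpp
        -:    1:// comment
        3:    2:if (strict) {
    #####:    3:    reportError();
        -:    4:}
"""

path, missing = uncovered_lines(SAMPLE)
if is_spec_candidate(path):
    print(path, missing)
```

Deciding whether each reported line is spec-related, and which test would reach it, remains the manual part described in comment 3.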
Flags: needinfo?(jwalden+bmo)
Assignee: general → nobody
Severity: normal → S3
Summary: Official ECMAScript test suite coverage → Use test262 code coverage to identify spec-level gaps in test262 tests