Closed Bug 750364 Opened 13 years ago Closed 9 years ago

Code coverage of xpcshell tests

Categories

(Testing :: XPCShell Harness, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1204554

People

(Reporter: gps, Unassigned)

References

(Depends on 1 open bug)

Details

Attachments

(2 files, 2 obsolete files)

Attached patch Initial hack (obsolete) (deleted) β€” β€” Splinter Review
This is a tracking bug for JavaScript code coverage of xpcshell tests.

Attached is an initial proof-of-concept accomplishing this via the JS Debugger interface. Some unrelated changes from bug 748490 slipped in there. Ignore anything related to testing modules.

According to jorendorff, using the JS Debugger API is not the best way to measure code coverage. Instead, there is a JSAPI that records counts of opcodes. From opcodes, you can get source locations. From that, you can get code coverage. Unfortunately, that API is not available to JavaScript or xpcshell. However, I got a verbal commitment from jorendorff that it could be made available.

That being said, I'm not convinced we should wait for it. We will eventually want to hook up the new JS debugger with the xpcshell test harness. This will require some core changes to how xpcshell tests are executed (I think). A lot of my patch is making these changes. So, even if we swap out to a more efficient code coverage method later, the work to support the JS Debugger will not be lost.

So, about the patch. It is hacky. It fundamentally changes how xpcshell tests are executed. In the new world, we create a new sandbox for test execution and a new sandbox for the debugger. The reasons are complicated. The JS Debugger must exist in a separate compartment from the code it is debugging. Furthermore, the JS Debugger cannot start debugging code that is active on the stack. This means that you at least need to create a sandbox for the debugger. And, I was unable to figure out a way to start debugging from the same file that was being debugged because that file will always be on the stack.

My solution was to create a new entry point for xpcshell tests, main.js. This is the new top-level/main JS compartment. head.js is then loaded in a sandbox and executed.

There is a huge problem with executing xpcshell tests in a sandbox: the new sandbox doesn't have access to all of the xpcshell global functions. I hacked around this by manually defining globals that were used by the xpcshell tests I was testing with, mainly load(). I'm using the subscript loader for load(), which is probably wrong. I can't just sandbox.importFunction(load) because the function is associated with the parent compartment and when called, code gets loaded into the parent compartment! What I really need is a function that populates a new sandbox with a fresh xpcshell environment. Does such an API exist? Until then, manual definition of every xpcshell-provided function (which is far from complete) is needed (missing functions mean test failures).

Also, when evaluating code in the sandbox, "this" is undefined from within functions. wut? That's why there is a bunch of "let that = this" foo. Why it works that way, I don't know.

The patch is missing the code to actually do something with the measurements. I'm planning to write that in Python. I would prefer to use Python's coverage package (http://pypi.python.org/pypi/coverage/) so I don't have to reinvent the wheel for producing Cobertura XML files, HTML reports, etc. I was planning on parsing the JS-produced output files in Python and piping these into coverage.data.CoverageData. From there the Python package can perform all the transforms it knows how. I /think/ I can munge the JS coverage data into this type: the data format seems generic enough. If not, I guess I'll have to reinvent the wheel. If I use this Python package, we should probably import it into the tree. It is under a BSD license, so I don't anticipate an issue here.

Of course, I plan to refactor the code for generating, parsing, and reporting on JS code coverage to be test runner agnostic (i.e. as a library/module). So, once this is implemented for xpcshell, it should be relatively painless to consume the APIs in mochitests, etc.

Anyway, I could use some feedback on JSAPI/XPCOM aspects of the patch. This is voodoo beyond my knowledge. I'm not even sure how I got this patch to even work!
Attached patch Initial hack, cleaned (obsolete) (deleted) β€” β€” Splinter Review
Removed the chunks related to another patch in my queue. This should make it easier to do a drive-by.

The big open question is whether it is easier to:

1) Grant a sandbox a fully functional xpcshell "environment" that isn't tied to the parent compartment.
2) Trick the main compartment into being debugged by the Debugger (by somehow not being on the stack when it is added to the Debugger).

Or, maybe the Debugger folks have some other tricks up their sleeves.
Attachment #619618 - Attachment is obsolete: true
Python's coverage package tries to parse the source files as Python. This obviously fails with JS, so I can't use that package in its current form. I have thus implemented Cobertura XML generation in Python. And, it seems to work: I'm able to get other tools to convert the XML into an HTML report!

Will hopefully upload a polished patch by EOD.
Depends on: 740547, 740546
Attached patch Initial hack with XML generation (deleted) β€” β€” Splinter Review
Cobertura XML generation added. runxpcshelltests.py now aggregates code coverage across multiple tests. Paths are hardcoded, which is hacky. I'm missing modules loaded via Cu.import(). So, I only get code coverage on the test files themselves. Will need bug 740546 before I can proceed with actual code coverage since I need to catch module loads as they happen so I can register loaded modules on the debugger.
Attachment #619812 - Attachment is obsolete: true
Tracked down :njn and :luke in MV today and they helped me unlock the next level \o/. By launching xpcshell with -d, that puts the global compartment in debug mode. That allows me to attach the global compartment to a Debugger instance without running into the "can't add value that is on the stack" issue. This means the whole "how do you make a new sandbox behave like an xpcshell environment" problem goes away! They also informed me that "this" is undefined on global functions when in strict mode. So, removing strict mode made "this" magically appear. I'd like to use strict mode, so I'll probably refactor debugger.js so things are available as a type rather than global functions.

So, all I need is bug 740547, possibly bug 740546, some polish, and this will be ready to land!
(In reply to Gregory Szorc [:gps] from comment #4)
> Tracked down :njn and :luke in MV today and they helped me unlock the next
> level \o/.

I don't think I'm the njn you're thinking of... I certainly wasn't in MV today!
This looks like a really super-invasive patch. I guess I'd have to see what your new approach looks like, but I'm worried about making huge fundamental changes to the harness solely to get code coverage results.
(In reply to Ted Mielczarek [:ted] from comment #6)
> This looks like a really super-invasive patch. I guess I'd have to see what
> your new approach looks like, but I'm worried about making huge fundamental
> changes to the harness solely to get code coverage results.

The new version is much cleaner since you don't need to hack around the issue of attaching while on the stack.

Also, I would argue that having code coverage information is valuable to ensure adequate testing. I personally find it appalling that we don't currently have a means to inspect this. I'm not saying code coverage is perfect, but we should at least have that tool in our bug-fighting arsenal.
It has been suggested that we should also support lcov's output format: http://ltp.sourceforge.net/coverage/lcov/geninfo.1.php

I'm not opposed to this. I think more choices for output formats is a good thing. I only sided with Cobertura's XML format because a lot of existing tools know how to speak it and it is easy to transform into other formats. My patch loosely couples formatters/reporters from the data representation, so adding additional output formats is trivial.
FWIW, getting code coverage of the C++ and JS code in the rest of mozilla-central whilst running tests is also helpful, although I'm not sure if this bug is aiming to do that too.
Another approach, which might be more performant at the cost of more memory up front, would be to register an onNewScript handler that used Debugger.Script.prototype.getAllOffsets to set a breakpoint on every line in that script that has code associated with it, and then remove the breakpoints once they're hit. That way, code will run at full speed after it's been executed once, instead of single-stepping all the way.
This is an alternative approach I've been working on that uses the pccounts work to track code coverage instead of relying on debugger hacks. This just dumps out the data; I have several postprocessing scripts to do other things like "gather info for scripts we never see."

I'm not sure which approach is better. This one required some modification of JS code to work properly (without it, you'll get an assertion failure somewhere), but I've been told it's part of the set of things that will get fixed when cpg actually lands and stays.

Benefits:
+Only requires small changes to xpconnect (read env variable at startup and shutdown, and, if set, dump data out big time)
+Doesn't hit any of the current limitations of Debugger.jsm (e.g., not being able to debug the current global object)

Downsides:
-Requires a lot of post-processing scripts
-Doesn't capture every JSRuntime that could get run (although I think xpconnect accounts for most of them anyways)
-I probably need some sort of proper file-locking service to handle multiprocess races.
-Produces a lot of output, since the pccount summary helpfully dumps a decompilation of the entire file.
Oh, I forgot the other big downside (kind of part of #1):
- the pccount doesn't return non-executed scripts (including functions). This means we need to have a subsequent pass to find all things that are *not* run.
(In reply to Joshua Cranmer [:jcranmer] from comment #11)
> Downsides:
> -I probably need some sort of proper file-locking service to handle
> multiprocess races.

Stick a pid in the filename?

> -Produces a lot of output, since the pccount summary helpfully dumps a
> decompilation of the entire file.

It would be totally reasonable to implement a no-decompilation mode.

Any idea what the slowdown is when running with this enabled?

Why do you build up a giant string and then write it out? Why not write things out directly as you generate them?
Foremost, I don't care which approach "wins:" I just want a way to easily get code coverage for our tests.

Defending my approach, there were concerns it is a bit invasive. I agree. This was necessary to get the Debugger API working with the xpcshell driver. I believe if planned Debugger API and other JS patches land, these changes will be *much less* invasive. That being said, it is highly likely someone will want Debugger API support in the xpcshell test runner as a basic option, so we're likely to implement this eventually. You'll notice from my implementation that the patch primary integrates the Debugger API with the test driver and then adds code coverage support on top of it. i.e. we get the Debugger API for free!

Joshua pointed out on IRC that it is difficult to get branch coverage from the Debugger API. This is an obvious drawback.

I believe it was mentioned on IRC that a drawback of the pccount approach is it needs multiple passes: 1 to obtain the data then a second pass to extract metadata from the parsed source code. I'm not sure how Joshua is doing this since the parsing scripts aren't on this bug. But, if it involves the Debugger API, we could certainly do a hybrid approach where the C++ is dumping pccounts and the xpcshell test driver dumps the script offsets from the Debugger API - all from one execution. Then, we let the (Python) driver of the xpcshell process sort out the output. Of course, any such system should be designed as a library such that any test harness could be hooked up to whatever method we choose.
(In reply to Gregory Szorc [:gps] from comment #14)
> I'm not sure how Joshua is doing this
> since the parsing scripts aren't on this bug.

They are here: <https://github.com/jcranmer/mozilla-coverage/blob/master/pcc-js-coverage.js>.

I can't use the Debugger.jsm API because what I basically need is access to bytecode data; the extremely kludgy hack I have in that file appears to be necessary to get this kind of data in any form.

(In reply to Steve Fink [:sfink] from comment #13)
> Any idea what the slowdown is when running with this enabled?

Running xpcshell tests in one directory cause a ~28% slowdown in user time; on a single test in another directory which is pretty much running only JS code (the code being tested is a .jsm module that does parsing in pure JS) had 10-20% slowdown. The time it takes to dump all the data out at the end is probably a large variable independent factor, but the slowdown in terms of most JS code is probably on the order of 10-30% slowdown.

[Note: these tests were run with n=1 iterations. Don't read too much into exact figures]

> Why do you build up a giant string and then write it out? Why not write
> things out directly as you generate them?

I'm.... not sure. It may have been for something I originally planned to do but gave up on instead.
(In reply to Nicholas Nethercote [:njn] from comment #5)
> (In reply to Gregory Szorc [:gps] from comment #4)
> > Tracked down :njn and :luke in MV today and they helped me unlock the next
> > level \o/.
> 
> I don't think I'm the njn you're thinking of... I certainly wasn't in MV
> today!

Apparently I do a pretty good njn imitation, I guess. (gps, you were talking to Luke and me.)

(In reply to Gregory Szorc [:gps] from comment #14)
> This was necessary to get the Debugger API working with the xpcshell driver.

I'm a little confused at this point how much is still necessary after updating to take advantage of the -d flag. Are you still providing a new capability to utilize Debugger that we wouldn't have otherwise? If so, then that seems totally worthwhile on its own.

> I believe if planned Debugger API and other JS patches land, these changes
> will be *much less* invasive. That being said, it is highly likely someone
> will want Debugger API support in the xpcshell test runner as a basic
> option, so we're likely to implement this eventually.

I totally agree, assuming the integration is still needed (or still will be once chrome debugging is implemented, as I expect it will be before too long.)

> Joshua pointed out on IRC that it is difficult to get branch coverage from
> the Debugger API. This is an obvious drawback.

AIUI, neither truly gives you branch coverage. For that, you need to analyze opcodes to determine all possible jump destinations. David Flanagan did exactly this in CoverMonkey.

> I believe it was mentioned on IRC that a drawback of the pccount approach is
> it needs multiple passes: 1 to obtain the data then a second pass to extract
> metadata from the parsed source code.

By "pccount approach", I guess you're referring to the current control flow setup with the available JSAPI, where you have to turn on the counts, but then collect them at shutdown? I don't think that's particularly fundamental to pccounts, nor what the Debugger-based alternative would look like.

> But, if it involves the
> Debugger API, we could certainly do a hybrid approach where the C++ is
> dumping pccounts and the xpcshell test driver dumps the script offsets from
> the Debugger API - all from one execution. Then, we let the (Python) driver
> of the xpcshell process sort out the output. Of course, any such system
> should be designed as a library such that any test harness could be hooked
> up to whatever method we choose.

So, my naive and uninformed outsider's opinion of how this should all work:

1. I think pccounts + opcode target information is the right underyling data to use for coverage.

The only alternative I know of is what jimb suggested above, which is to break on every line. And for branch coverage, you'd have to break on every opcode. There are various hybrid approaches, eg use pccounts the first time or few times through, then turn them off for that script and switch to breakpoints on all unhit header opcodes after that. But if it's only ~30% overhead to start with, why bother?

2. As much as possible should go through the Debugger API, and I suspect everything can. It might be necessary to implement something like addDebuggeeButOnlyForPCCountsPlease that doesn't provide the full functionality of Debugger, but I'm not sure if it's even necessary. (For completely accurate data, the pccounts stuff has the same activation restrictions as the full Debugger interface, because JIT code needs to be recompiled or you'll miss some counts, though it would be easier to relax those restrictions for pccounts than it would for full Debuggerability.)

3. For extra credit, Cedric Vivier's code coverage UI should be implementable with whatever we end up with, though it's somewhat different so I won't go into it here. (Think of it as a visual display of "what code what recently executed". See https://github.com/neonux/CodeInspector and the demo at http://neonux.com/tmp/live-coverage-allhands-demo.ogv )

I don't think the question is "pccounts vs Debugger". "Explicitly added globals vs automatically getting everything" is more meaningful, but that just means Debugger should grow whatever ability it needs to automatically handle what you want.

In terms of path, it seems like we want to (a) "land" jcranmer's pccount stuff (including the fixes within the JS engine), (b) independently land whatever of gps's changes are needed to enable the use of Debugger in tests, (c) merge jcranmer's scripts with djf's CoverMonkey scripts to get proper branch coverage of the full application, and (d) augment the Debugger API in whatever ways are needed in order to (e) rebase everything on top of it. I think the other part of gps's code may already be starting on (e), or at least the portion that's currently possible.

Feel free to disagree!

Would people be interested in having a meeting to hash this out? I doubt this use case will itself be high priority for Debugger atm, but I'd guess that it has a nonempty intersection with the current high priority stuff, and jimb and jorendorff like to know in advance where the icebergs lie so they can steer towards them.

Er, wait, that didn't come out quite right...

(Jason actually put together a tentative patch to expose pccount info via Debugger a while back, btw, though it is probably well and truly bitrotted by now.)
(In reply to Steve Fink [:sfink] from comment #16)
> > I believe it was mentioned on IRC that a drawback of the pccount approach is
> > it needs multiple passes: 1 to obtain the data then a second pass to extract
> > metadata from the parsed source code.
> 
> By "pccount approach", I guess you're referring to the current control flow
> setup with the available JSAPI, where you have to turn on the counts, but
> then collect them at shutdown? I don't think that's particularly fundamental
> to pccounts, nor what the Debugger-based alternative would look like.

The specific issue at hand is that the pccounts are only triggered for scripts whose run counts are not 0, which means I need to have another pass to find all possible scripts which weren't executed. To some degree, this may be necessary anyways (we'd probably desire some pass that at least figures out which jsms are never called in the first place, etc.).

> 2. As much as possible should go through the Debugger API, and I suspect
> everything can.

The debugger API is probably more useful in the long time, but it would need some exposure of bytecode-level introspection to be useful.

> I don't think the question is "pccounts vs Debugger". "Explicitly added
> globals vs automatically getting everything" is more meaningful, but that
> just means Debugger should grow whatever ability it needs to automatically
> handle what you want.

What I really want is a hook that attaches these things before any other JS execution and keeps on top of all new scripts, which would probably be necessary for a decent chrome debugger anyways. I'm using an environment variable right now because JS code can load command-line arguments, so a bog-standard nsICommandLineHandler is too late. Unfortunately, environment variables are less useful than command line arguments...
(In reply to Joshua Cranmer [:jcranmer] from comment #17)
> What I really want is a hook that attaches these things before any other JS
> execution and keeps on top of all new scripts, which would probably be
> necessary for a decent chrome debugger anyways.

See bug 740547.

As far as the direction of this bug, it looks like pccounts is "winning." I don't really care too much that I "lost" as my work on this bug was just me fooling around with the Debugger API when I learned about it during the Fx team work week. It just so happens I was able to get close enough to something actually usable.

I'll probably take a back seat on this bug and let things play out.

Joshua: if you need help with anything, let me know.
No longer blocks: 750347
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: