Closed Bug 887054 (parxpc) Opened 11 years ago Closed 11 years ago

Refactor xpcshell test harness to support parallel test runs, disabled by default in automation

Categories

(Testing :: XPCShell Harness, enhancement)

Hardware: x86_64
OS: All
Type: enhancement
Priority: Not set
Severity: normal

Tracking

(Not tracked)

Status: RESOLVED FIXED
Target Milestone: mozilla26

People

(Reporter: mihneadb, Assigned: mihneadb)

References

Details

Attachments

(4 files, 60 obsolete files)

(deleted), patch
mihneadb: review+
(deleted), patch
mihneadb: review+
(deleted), patch
mihneadb: review+
(deleted), patch
ahal: review+
No description provided.
Attached patch first try (obsolete) (deleted) — Splinter Review
This is the first shot at it. As we talked, this is doing a Python thread for every test. I'm working on getting all tests to pass, and then I'll improve the threading model (if necessary, since the desired speedup seems to be here already).
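For context, a minimal sketch of the thread-per-test model described above; the names are illustrative and it assumes an xpcshell binary on PATH, so this is not the actual patch code:

    import subprocess
    import threading

    def run_single_test(test_path):
        # Each test gets its own xpcshell process; the thread just waits
        # on it and collects the output.
        proc = subprocess.Popen(["xpcshell", test_path],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        stdout, _ = proc.communicate()
        print("%s exited with code %d" % (test_path, proc.returncode))

    tests = ["test_a.js", "test_b.js"]  # hypothetical test files
    threads = [threading.Thread(target=run_single_test, args=(t,)) for t in tests]
    for t in threads:
        t.start()
    for t in threads:
        t.join()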
Assignee: nobody → mihneadb
Attachment #767499 - Flags: feedback?(ahalberstadt)
Depends on: 886980
How is this different from bug 660788?
As we discussed, this moves the test-running part out of the XPCShellTests class, which now basically does only the test discovery and result aggregation.
Depends on: 887064
Blocks: 845748
Comment on attachment 767499 [details] [diff] [review]
first try

Review of attachment 767499 [details] [diff] [review]:
-----------------------------------------------------------------

I think the approach looks good! I'm not crazy about calling things "runners" though. Runner could refer to the harness as a whole (i.e. runxpcshelltests.py) or mozrunner or ...? Maybe just dropping the Runner and calling the class XPCShellTest would be better.
Attachment #767499 - Flags: feedback?(ahalberstadt) → feedback+
Blocks: 660788
No longer depends on: 886980
Attached patch run at most NUM_CPUS * 1.5 threads/tests (obsolete) (deleted) — Splinter Review
Attachment #767499 - Attachment is obsolete: true
Attachment #770525 - Flags: feedback?(ahalberstadt)
Attachment #770525 - Attachment is obsolete: true
Attachment #770525 - Flags: feedback?(ahalberstadt)
Attachment #770567 - Flags: feedback?(ahalberstadt)
Attached patch mach command update to handle --sequential (obsolete) (deleted) — Splinter Review
Attachment #770568 - Flags: review?(ahalberstadt)
Comment on attachment 770567 [details] [diff] [review]
handle run-sequentially flag in ini and --sequential as argv

Review of attachment 770567 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, I think this is looking really good so far. f+ because the approach is good, but there are still some things that need fixing, though nothing major.

::: testing/xpcshell/runxpcshelltests.py
@@ +11,2 @@
> from glob import glob
> +from multiprocessing import cpu_count

We'll need to do something about this. This SO answer is pretty detailed, http://stackoverflow.com/a/1006301/794460, though I don't know if going to that extreme is worth it (maybe we could file a separate bug to get a subset of that method into mozsystemresourcemonitor since it already uses psutil to some degree). For now though I think we can do something like:

    try:
        from multiprocessing import cpu_count
    except ImportError:
        NUM_THREADS = 1

@@ +22,5 @@
> from automationutils import *
>
> HARNESS_TIMEOUT = 5 * 60
>
> +NUM_THREADS = int(cpu_count() * 1.5)

Is 1.5 the magic number that gives the best times? Or is it just a placeholder until we figure out what that number is? Either way we should document what the number is and why spawning more threads than there are cores speeds up the tests.

@@ +52,5 @@
> def markGotSIGINT(signum, stackFrame):
>     global gotSIGINT
>     gotSIGINT = True
>
> +class XPCShellTest(Thread):

It took me a while to figure out that there were two separate classes here, one called XPCShellTest and one called XPCShellTests. I know I asked you to remove the Runner part (sorry for being so picky)... maybe XPCShellThread?

@@ +76,5 @@
> +        self.env = copy.deepcopy(self.harness.env)
> +        self.symbolsPath = self.harness.symbolsPath
> +        self.logfiles = self.harness.logfiles
> +        self.xpcshell = self.harness.xpcshell
> +        self.xpcsRunArgs = self.harness.xpcsRunArgs

Passing the whole harness class seems awkward. In XPCShellTests.run_tests() you could do something like:

    kwargs = locals()  # before defining any other local vars
    thread = XPCShellTest(stuff, **kwargs)

Then here you could do kwargs.get('appPath'), etc. You might need to build kwargs manually a bit still, but at least this way the two classes are less coupled.

@@ +1072,5 @@
> +        # create a queue of all tests that will run
> +        tests_queue = deque()
> +        # also a list for the tests that need to be run sequentially
> +        sequential_tests = []
> +        for test_file in self.alltests:

This is just an object right? test_obj might be better.
Attachment #770567 - Flags: feedback?(ahalberstadt) → feedback+
Comment on attachment 770568 [details] [diff] [review]
mach command update to handle --sequential

Review of attachment 770568 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me... though fair warning, I'm not super familiar with mach targets. This looks simple enough that I'll r+ it, but might want to get someone else to review if it gets more complicated than this.
Attachment #770568 - Flags: review?(ahalberstadt) → review+
Feel free to tag me for final review on this, I am very interested. I'll try to make time to take an initial look at it soon. For mach commands any build peer is fine for a review, I'm not sure exactly what module they fall into honestly.
re: multiprocessing.cpu_count, see bug 813742 comment 39, this should be fine to use.
Attachment #770567 - Flags: feedback?(ted)
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #11)
> re: multiprocessing.cpu_count, see bug 813742 comment 39, this should be
> fine to use.

We could also use psutil, which is already in-tree. I think it uses multiprocessing under the hood if available and if not does something else. The --sequential argument and whatever we use for reftests in bug 813742 should also be consistent.
Comment on attachment 770568 [details] [diff] [review]
mach command update to handle --sequential

Review of attachment 770568 [details] [diff] [review]:
-----------------------------------------------------------------

This change to mach is sane.
Attachment #770568 - Flags: review+
Attached patch mach command update to handle --sequential (obsolete) (deleted) — Splinter Review
Corrected the help message for --sequential. Keeping previous r+
Attachment #770568 - Attachment is obsolete: true
Attachment #770946 - Flags: review+
Attached patch parallel xpcshelltest harness (obsolete) (deleted) — Splinter Review
Updated as requested by :ahal. Kept the cpu_count approach as discussed on IRC; cpus * 1.5 makes the most sense for my machine, at least. Will keep it at that for now, and we can tweak it further when we start doing try runs.
Attachment #770567 - Attachment is obsolete: true
Attachment #770567 - Flags: feedback?(ted)
Attachment #771015 - Flags: feedback?(ted)
Attachment #771015 - Flags: feedback?(ahalberstadt)
No longer depends on: 887064
Blocks: 887404
Blocks: 887706
Blocks: 888350
Attached patch parallel xpcshelltest harness (obsolete) (deleted) — Splinter Review
I previously introduced a bug by performing the copy of the env dict before spawning the thread. Fixed that.
Attachment #771015 - Attachment is obsolete: true
Attachment #771015 - Flags: feedback?(ted)
Attachment #771015 - Flags: feedback?(ahalberstadt)
Attachment #771072 - Flags: feedback?(ted)
Attachment #771072 - Flags: feedback?(ahalberstadt)
Comment on attachment 771072 [details] [diff] [review]
parallel xpcshelltest harness

Review of attachment 771072 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks! :)

::: testing/xpcshell/runxpcshelltests.py
@@ +22,5 @@
> from automationutils import *
>
> HARNESS_TIMEOUT = 5 * 60
>
> +NUM_THREADS = int(cpu_count() * 1.5)

Would still like a comment explaining why NUM_THREADS = cpu_count * 1.5 is faster than NUM_THREADS = cpu_count.
Attachment #771072 - Flags: feedback?(ahalberstadt) → feedback+
No longer blocks: 887404
Blocks: 889183
Blocks: 889182
Prior to this patch, if I have the following in an xpcshell.ini file:

[include:xpcshell_updater.ini]
skip-if = os == 'win'

the tests in xpcshell_updater.ini will not be executed on Windows. After this patch is applied, with the contents of xpcshell_updater.ini as follows:

[test_0110_general.js]
[test_0111_general.js]
[test_0112_general.js]
[test_0113_general.js]
skip-if = os == "mac" || toolkit == "gonk"
reason = bug 820380
[test_0114_general.js]
skip-if = os == "mac"
[test_0115_general.js]
[test_0200_app_launch_apply_update.js]
skip-if = toolkit == "gonk"
reason = bug 820380
[test_0201_app_launch_apply_update.js]
skip-if = toolkit == "gonk"
reason = bug 820380

the following occurs:

TEST-INFO | skipping c:\moz\_1_mozilla-central\ff-x86-opt\_tests\xpcshell\toolkit\mozapps\update\test\unit\test_0110_general.js | skip-if: os == 'win'
TEST-INFO | skipping c:\moz\_1_mozilla-central\ff-x86-opt\_tests\xpcshell\toolkit\mozapps\update\test\unit\test_0111_general.js | skip-if: os == 'win'
TEST-INFO | skipping c:\moz\_1_mozilla-central\ff-x86-opt\_tests\xpcshell\toolkit\mozapps\update\test\unit\test_0112_general.js | skip-if: os == 'win'
TEST-INFO | skipping c:\moz\_1_mozilla-central\ff-x86-opt\_tests\xpcshell\toolkit\mozapps\update\test\unit\test_0115_general.js | skip-if: os == 'win'
TEST-INFO | c:\moz\_1_mozilla-central\ff-x86-opt\_tests\xpcshell\toolkit\mozapps\update\test\unit\test_0114_general.js | running test ...
TEST-INFO | c:\moz\_1_mozilla-central\ff-x86-opt\_tests\xpcshell\toolkit\mozapps\update\test\unit\test_0113_general.js | running test ...
TEST-INFO | c:\moz\_1_mozilla-central\ff-x86-opt\_tests\xpcshell\toolkit\mozapps\update\test\unit\test_0200_app_launch_apply_update.js | running test ...
TEST-INFO | c:\moz\_1_mozilla-central\ff-x86-opt\_tests\xpcshell\toolkit\mozapps\update\test\unit\test_0201_app_launch_apply_update.js | running test ...

The expected result is that these tests should not have been executed.
(In reply to Robert Strong [:rstrong] (do not email) from comment #18)

This appears to be bug 676876.
Blocks: 887480
No longer blocks: 887480
No longer blocks: 887706, 888350, 889182, 889183
Depends on: 887064
No longer blocks: 845748
:chmanchester spotted an issue with selftest.py not passing with the patch applied. The main issue is with the testMissingHeadFile and testMissingTailFile tests, which try to catch exceptions that are now thrown in the child threads, so they never reach the selftest.py main thread. Not sure how to address this. I also encountered a failure with the xunit test in selftest.py.
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
Improved so that:
* no more busy loop, using a semaphore
* handling exceptions now as well
Attachment #771072 - Attachment is obsolete: true
Attachment #771072 - Flags: feedback?(ted)
Attachment #776766 - Flags: feedback?(ted)
Comment on attachment 771072 [details] [diff] [review]
parallel xpcshelltest harness

Review of attachment 771072 [details] [diff] [review]:
-----------------------------------------------------------------

I had a few comments sitting in draft here, so I might as well hit "Publish" since this patch is already out of date.

::: testing/xpcshell/runxpcshelltests.py
@@ +1016,4 @@
>
>         # If we have an interactive debugger, disable ctrl-c.
> +        #if self.debuggerInfo and self.debuggerInfo["interactive"]:
> +            #signal.signal(signal.SIGINT, lambda signum, frame: None)

I assume this is just a hack, make sure it gets removed before you get a final patch ready.

@@ +1107,5 @@
> +        # keep a set of NUM_THREADS running tests and start running the
> +        # tests in the queue at most NUM_THREADS at a time
> +        running_tests = set()
> +        keep_going = True
> +        while len(tests_queue) or len(running_tests):

You don't need the len(); "while tests_queue or running_tests:" works fine.
Attachment #771072 - Attachment is obsolete: false
Comment on attachment 776766 [details] [diff] [review]
Parallelize xpcshell

Review of attachment 776766 [details] [diff] [review]:
-----------------------------------------------------------------

Wasn't asked for a review, but mihneadb and I talked about this yesterday, so offering my two cents...

::: testing/xpcshell/runxpcshelltests.py
@@ +1122,5 @@
> +                test = tests_queue.popleft()
> +                running_tests.add(test)
> +                test.start()
> +                if len(tests_queue):
> +                    continue  # try to keep as many running threads as possible

So this "if" block ends by repeating the loop if there are still entries in tests_queue. You may as well just change the block from an "if" to a "while", since it will terminate if there is nothing in tests_queue.

@@ +1130,5 @@
> +
> +            # wait for at least one of the tests to finish
> +            self.sem.acquire()
> +            # add 1 back to the semaphore, will be subtracted in the loop
> +            self.sem.release()

This is a pretty interesting use of a semaphore! It makes sense, but I think a comment explaining exactly what you are doing would be really beneficial down the road, namely, that the threads are releasing semaphores without acquiring them in order to raise the semaphore's count.

@@ +1141,5 @@
> +                    done_tests.add(test)
> +                    test.join()
> +                    # did the test encounter any exception?
> +                    if test.exception:
> +                        raise test.exception

I guess this aborts all the other tests? Would it make more sense to collect all exceptions, log them, and then fail after they are all done?

@@ +1151,3 @@
>
>         if not keep_going:
>             break

Again, is it okay to just suddenly abort all other running tests? Shouldn't this be used to bail out after all threads are done?

@@ +1151,5 @@
>
>         if not keep_going:
>             break
>
> +        time.sleep(0)  # tricking the scheduler to yield before other threads

Don't think you need this anymore.
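For readers following along, here is a minimal, self-contained sketch of the semaphore pattern being discussed (illustrative names, not the harness code): the count starts at 0, each worker releases on completion without ever acquiring, and the coordinator acquires to block until at least one worker has finished.

    import threading
    import time

    done_sem = threading.Semaphore(0)  # count starts at 0, so acquire() blocks

    def worker(seconds):
        time.sleep(seconds)   # stand-in for running a test
        done_sem.release()    # raise the count to signal completion

    threads = [threading.Thread(target=worker, args=(n,)) for n in (1, 2)]
    for t in threads:
        t.start()

    done_sem.acquire()  # blocks until at least one worker has released
    done_sem.release()  # put the count back; the real loop re-checks all threads
    print("at least one test has finished")
    for t in threads:
        t.join()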
Comment on attachment 776766 [details] [diff] [review]
Parallelize xpcshell

Obsoleting this so I can improve it per :mcote's comments.
Attachment #776766 - Attachment is obsolete: true
Attachment #776766 - Flags: feedback?(ted)
Depends on: 892021
Depends on: 892121
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
Rebased patch on changes made in bug 892021 and in bug 892121
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
Forgot a stray print_stdout
Attachment #777422 - Attachment is obsolete: true
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
* raise the exception in the case of sequential tests as well
* run the tests in selftest.py sequentially (because they rely on ordering)
Attachment #777428 - Attachment is obsolete: true
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
Specifically request a sequential run for the remote harness.
Attachment #777496 - Attachment is obsolete: true
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
Specifically pass sequential=True for b2g as well.
Attachment #777895 - Attachment is obsolete: true
Depends on: 895542
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
Properly override the sequential option for mobile tests.
Attachment #777897 - Attachment is obsolete: true
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
* log max number of threads used
* take out signal handling (does not work in a non-main thread)
* take out unnecessary len(queue) calls
Attachment #778136 - Attachment is obsolete: true
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
* log traceback of caught exception
Attachment #778160 - Attachment is obsolete: true
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
Rebased patch on latest m-c
Attachment #778173 - Attachment is obsolete: true
I think the number of threads should be max(4, int(cpu_count * 2)) (or 1.5). The reason is that tests tend to use very little CPU, and the current thread count doesn't seem to work well on machines with few cores.
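A quick sketch of the suggested heuristic, assuming multiprocessing is available (the constant name mirrors the patch; the value that actually landed may differ):

    from multiprocessing import cpu_count

    # At least 4 workers even on small machines, and oversubscribe the cores,
    # since xpcshell tests are mostly I/O-bound and leave the CPUs idle.
    NUM_THREADS = max(4, int(cpu_count() * 2))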
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
* changed num threads to :gps's suggestion
* when an exception is raised, the harness will wait for the currently running threads to end so we don't end up with orphans/zombies
* all caught exceptions (tracebacks) are logged, and the first encountered exception is raised
Attachment #778540 - Attachment is obsolete: true
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
Changed "keep the running pool full" if to a while as suggested by :mcote.
Attachment #778566 - Attachment is obsolete: true
Right now this breaks the mobile (Android + B2G) xpcshell harnesses. The Android and B2G harnesses extend the desktop XPCShellTests class and override some methods that they *know* will be called during runTests. The problem is that my patch moves the "run a test" part into a separate class, so the mobile harnesses end up never overriding those methods. Not sure how to address this.
Flags: needinfo?(gbrown)
Personally, I've found the "remote inherits from desktop, except not really" model to be a huge pain in general. If it's not a simple matter to get the remote tests also using this new class, I'd be ok with duplicating a bit of code. Alternatively, you could put just the shared bits into a mixin that your new desktop class and the remote classes inherit from.
(In reply to Andrew Halberstadt [:ahal] from comment #38)
> Personally, I've found the "remote inherits from desktop, except not really"
> model to be a huge pain in general. If it's not a simple matter to get the
> remote tests also using this new class, I'd be ok with duplicating a bit of
> code.
>
> Alternatively, you could put just the shared bits into a mixin that your new
> desktop class and the remote classes inherit from.

I don't see how the mixin could help. What I *think* we have to do is make a RemoteXPCShellTestThread class that extends the new one I introduced. Then we could make the runTests method use that. This is the way I'd do it, since it would not create code duplication. I mean, we can add a kwarg to the runTests method with the TestClass to be used. Does that make sense? (See the sketch below.)
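A minimal sketch of the subclass-plus-kwarg idea proposed here; the class names mirror the discussion, but the bodies are illustrative:

    import threading

    class XPCShellTestThread(threading.Thread):
        """Runs a single test; desktop implementation."""
        def __init__(self, test_object, **kwargs):
            threading.Thread.__init__(self)
            self.test_object = test_object

        def run(self):
            print("running %s on desktop" % self.test_object)

    class RemoteXPCShellTestThread(XPCShellTestThread):
        """Device-specific overrides (pushing files, device setup) go here."""
        def run(self):
            print("running %s on a device" % self.test_object)

    class XPCShellTests(object):
        def runTests(self, tests, testClass=XPCShellTestThread, **kwargs):
            # The mobile harnesses pass testClass=RemoteXPCShellTestThread.
            threads = [testClass(t, **kwargs) for t in tests]
            for t in threads:
                t.start()
            for t in threads:
                t.join()

    XPCShellTests().runTests(["test_a.js"], testClass=RemoteXPCShellTestThread)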
Blocks: 896087
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
* use psutil if available (for easy profiling)
* define stdout and stderr outside the try block so we won't get the 'undefined stdout' error anymore
* call shutdownNode first and then raise any exceptions
Attachment #778569 - Attachment is obsolete: true
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
Marking test threads as daemons to make sure the harness does not hang.
Attachment #778757 - Attachment is obsolete: true
(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #39)
> Then we could make the runTests method use that. This is the way I'd do it,
> since it would not create code duplication. I mean, we can add a kwarg to
> the runTests method with the TestClass to be used.
>
> Does that make sense?

Yep, makes sense.
(In reply to Andrew Halberstadt [:ahal] from comment #42)
> (In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #39)
> > Does that make sense?
>
> Yep, makes sense.

+1. Sounds like you are back on track.
Flags: needinfo?(gbrown)
Depends on: 896093
Attached patch adapt the mobile harnesses (obsolete) (deleted) — Splinter Review
[This applies over the parxpc patch.] This adapts the mobile harnesses by replicating the harness/test-thread pattern. I couldn't find a *nicer* way to get the relevant data from the harness to the mobile test threads than the mobileArgs trick. This try run seems to be coming back green (only B2G done so far): https://tbpl.mozilla.org/?tree=Try&rev=457c36eb497b
Attachment #780188 - Flags: feedback?(ahalberstadt)
Comment on attachment 780188 [details] [diff] [review]
adapt the mobile harnesses

Review of attachment 780188 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me! It's a little hard to tell what changed due to the large refactors, but it seems like it's just moving code around anyway.

::: testing/xpcshell/remotexpcshelltests.py
@@ +17,5 @@
>
> +def remoteJoin(path1, path2):
> +    joined = os.path.join(path1, path2)
> +    joined = joined.replace('\\', '/')
> +    return joined

I know this was here before, but you can do:

    import posixpath
    posixpath.join(...)
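posixpath joins with forward slashes regardless of the host OS, which is exactly what a device-side path needs. A quick illustration (the path is made up):

    import posixpath

    # os.path.join would use backslashes on Windows; posixpath never does.
    print(posixpath.join('/data/local/tests', 'xpcshell'))
    # -> /data/local/tests/xpcshell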
Attachment #780188 - Flags: feedback?(ahalberstadt) → feedback+
Attached patch adapt mobile harness (obsolete) (deleted) — Splinter Review
Changed remoteJoin to use posixpath
Attachment #780188 - Attachment is obsolete: true
Depends on: 898658
Did you ever perform a try push with shuffle enabled by default? I have a theory that I/O load is concentrated around related tests in specific directories and that shuffling test order will help spread it out (at least in the common case). Also, since test order is no longer deterministic in a parallel-executing world, I'd like to toss out the idea of having shuffling enabled by default. We should still give people the option to run in order, of course. I'm interested in what others think.
I remember suggesting that and the general idea was that we want automation to be as predictable/reproducible as possible. I can/will schedule such a try push, and you can also check it out locally by passing the --shuffle parameter to mach xpcshell-test.
Alias: parxpc
Summary: Parallelize xpcshell → Parallelize xpcshell test harness
Blocks: 898816
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
Rebase over changes from bug 890098
Attachment #778778 - Attachment is obsolete: true
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
rebase mobile harness changes as well
Attachment #781376 - Attachment is obsolete: true
Attached patch adapt mobile harnesses (obsolete) (deleted) — Splinter Review
Took out leak logging from the remote harness
Attachment #783361 - Attachment is obsolete: true
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
Took out a comment about leak logging.
Attachment #783339 - Attachment is obsolete: true
Comment on attachment 783376 [details] [diff] [review]
Parallelize xpcshell

Don't mind the mobile harness changes; there's a separate patch that addresses those.
Attachment #783376 - Flags: feedback?(ted)
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
So... I had some fun with the Windows test slave. It turns out that if we increase the waiting time to 20s (I experimented with the value) we won't encounter those file removal hangs. From what I saw, this happens only for ~30ish tests out of ~1800, so I'd say this workaround is worth a shot.
Attachment #783376 - Attachment is obsolete: true
Attachment #783376 - Flags: feedback?(ted)
Attached patch adapt mobile harness (obsolete) (deleted) — Splinter Review
Added remoteAPK to the mobileArgs. (Android was not passing on tbpl)
Attachment #783367 - Attachment is obsolete: true
Fix for self.mobileArgs
Attachment #784809 - Attachment is obsolete: true
(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #54)
> Created attachment 784774 [details] [diff] [review]
> Parallelize xpcshell
>
> So... I had some fun with the Windows test slave. It turns out that if we
> increase the waiting time to 20s (I experimented with the value) we won't
> encounter those file removal hangs. From what I saw, this happens only for
> ~30ish tests out of ~1800, so I'd say this workaround is worth a shot.

Just wanted to confirm that I did multiple runs on the test slave and encountered no more "file is locked" issues.
I don't think adding a 20s sleep is an acceptable solution. 20s * 30 affected tests / 4 cores = 150 seconds of wasted time per job. Maybe we could spawn a thread that immediately goes to sleep and wakes up every second or so to see if the plugin process has terminated, and then performs the cleanup. Or if we could figure out how to "sudo rm -rf" those files, that would also work.
(In reply to Andrew Halberstadt [:ahal] from comment #58)
> I don't think adding a 20s sleep is an acceptable solution. 20s * 30
> affected tests / 4 cores = 150 seconds of wasted time per job.

Well, if we don't do that, we can hope that running those tests sequentially will fix the issue (which will probably cost us more time), and if that does not fix it, then we cannot land the parallel harness at all, which means even more wasted time. It's also just on Windows, and just when it's needed. If you prefer, we can do a busy loop with a timeout.

> Maybe we could spawn a thread that immediately goes to sleep and wakes up
> every second or so to see if the plugin process has terminated, and then
> performs the cleanup. Or if we could figure out how to "sudo rm -rf" those
> files, that would also work.

Two things :) :
1. I looked this up for quite a while; there seems to be no way of unlocking that file if it were indeed in use (there are some tools online that would inject some code and make the process crash).
2. It seems that the process is done, just that the Windows filesystem has a delay until figuring out that "hey, it's OK, we can unlock this file".

So... should I try a busy loop with a 20s timeout? (Rough sketch below.)
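A rough sketch of the bounded retry loop under discussion, using shutil for the removal (the cleanupDir logic that eventually landed differs in its details):

    import shutil
    import time

    def remove_dir_with_retries(directory, retries=20, delay=1.0):
        """Retry removal to ride out Windows' delayed release of file locks."""
        for _ in range(retries):
            try:
                shutil.rmtree(directory)
                return True
            except OSError:
                # Still locked; give the filesystem a moment and try again.
                time.sleep(delay)
        return False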
Attached patch Parallelize xpcshell test harness (obsolete) (deleted) — Splinter Review
Added locking for print_stdout
Attachment #784774 - Attachment is obsolete: true
Attached patch Parallelize xpcshell test harness (obsolete) (deleted) — Splinter Review
Acquiring the stdout lock earlier in case of errors, to group all output together.
Attachment #785251 - Attachment is obsolete: true
Attached patch Parallelize xpcshell test harness (obsolete) (deleted) — Splinter Review
Was acquiring too early in one case.
Attachment #785257 - Attachment is obsolete: true
:mihneadb and I chatted a bit about this - the lock doesn't actually isolate buffered failure output, because other threads calling log for routine output don't acquire the lock. I'm assuming acquiring the lock for each and every call to log would introduce significant overhead. The issue is that this will result in failure output interleaved with routine output - this could be a real problem for debugging failures. We could reconstruct failure output at the end of a test run or at other synchronization points, but it seems whatever we do there will be a compromise between getting output as it's generated and getting output that isn't interleaved.
Attachment #784774 - Attachment is obsolete: false
(In reply to Chris Manchester [:chmanchester] from comment #63)
> :mihneadb and I chatted a bit about this - the lock doesn't actually isolate
> buffered failure output, because other threads calling log for routine
> output don't acquire the lock. I'm assuming acquiring the lock for each and
> every call to log would introduce significant overhead.

I'm not convinced the overhead would be that significant. I believe it will be necessary to acquire the lock on stdout. Otherwise, you will run into issues like bug 886498. You may not see these problems, but trust me, others will.
I'm glad to have been wrong about my assumption - I ran some xpcshell time tests locally for dom, network, and devtools directories and didn't observe any performance penalty for acquiring a lock for each call to the logger. In fact, for devtools, which produces a lot of failure output on my machine, sys time _decreased_ significantly with this patch applied.
Assignee: mihneadb → cmanchester
Assignee: cmanchester → mihneadb
Attachment #786001 - Attachment description: Synchronize blocks of output when running xpcshell tests in parallel; f?mihneadb → Synchronize blocks of output when running xpcshell tests in parallel;
Attachment #786001 - Flags: feedback?(mihneadb)
Comment on attachment 786001 [details] [diff] [review]
Synchronize blocks of output when running xpcshell tests in parallel;

Review of attachment 786001 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/xpcshell/runxpcshelltests.py
@@ +576,5 @@
>     def __init__(self, log=None):
>         """ Init logging and node status """
>         if log:
>             resetGlobalLog(log)
> +

Let's add a comment here saying what we are doing and why.
Attachment #786001 - Flags: feedback?(mihneadb) → feedback+
Found an issue with the updated Android harness. This should fix it. https://tbpl.mozilla.org/?tree=Try&rev=cec9735a998c
Attachment #784846 - Attachment is obsolete: true
Added comments and exhaustive enumeration of log attributes.
This patch fixes an error in the previous patch by removing 'log' from the list of wrapped methods.
Ok, the mobile harness is working as expected now: https://tbpl.mozilla.org/?tree=Try&rev=3747be01371d
Attachment #786017 - Attachment is obsolete: true
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
Pro tip: don't name Python vars "try".
Attachment #784774 - Attachment is obsolete: true
Attachment #786534 - Flags: review?(ted)
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
Had a bug in the recursive cleanupDir retry logic.
Attachment #787027 - Flags: review?(ted)
Attachment #786534 - Attachment is obsolete: true
Attachment #786534 - Flags: review?(ted)
Attached patch Add support for --sequential to mach (obsolete) (deleted) — Splinter Review
Added support for this when no dir is specified.
Attachment #770946 - Attachment is obsolete: true
Comment on attachment 787071 [details] [diff] [review]
Add support for --sequential to mach

keeping r+
Attachment #787071 - Flags: review+
Attached patch Parallelize xpcshell (obsolete) (deleted) — Splinter Review
Changed the check to 1 second and increased the timeout, as discussed IRL.
Attachment #787128 - Flags: review?(ted)
Attachment #787027 - Attachment is obsolete: true
Attachment #787027 - Flags: review?(ted)
Comment on attachment 786485 [details] [diff] [review]
Adapt mobile xpcshell harnesses to the changes from parxpc

Can I get your final OK on this? It's the same as when you last reviewed it; I just moved some code around to make sure everything works.
Attachment #786485 - Flags: review?(ahalberstadt)
Attachment #786485 - Flags: review?(ahalberstadt) → review+
As discussed on IRC... miserable, but we kind of have to.
Attachment #787754 - Flags: review?(ted)
Attached patch Parallelize xpcshell harness (obsolete) (deleted) — Splinter Review
So now that we have two consecutive green try runs [1][2], I'd love to get this reviewed, fold the patches, and try landing it on inbound as soon as possible.

[1] https://tbpl.mozilla.org/?tree=Try&rev=7104ec2d86e5
[2] https://tbpl.mozilla.org/?tree=Try&rev=0a913c0860a5
Attachment #787970 - Flags: review?(ted)
Attachment #787128 - Attachment is obsolete: true
Attachment #787128 - Flags: review?(ted)
Attached patch Parallelize xpcshell harness (obsolete) (deleted) — Splinter Review
There's no need for such a big timeout now that we have the "cleanup slacker dirs" patch.
Attachment #788226 - Flags: review?(ted)
Attachment #787970 - Attachment is obsolete: true
Attachment #787970 - Flags: review?(ted)
Did a try run with Chris's patch applied as well (had to fix some conflicts):
https://tbpl.mozilla.org/?tree=Try&rev=aaadd274c58c
Forgot to add self.cleanup_dir_list to the test thread class.
Attachment #788236 - Flags: review?(ted)
Attachment #787754 - Attachment is obsolete: true
Attachment #787754 - Flags: review?(ted)
Comment on attachment 788226 [details] [diff] [review]
Parallelize xpcshell harness

Review of attachment 788226 [details] [diff] [review]:
-----------------------------------------------------------------

This is r-, but only for the SIGINT changes. Otherwise it looks pretty good and there are just a few cleanup nits.

::: testing/xpcshell/runxpcshelltests.py
@@ +84,5 @@
> +        self.env = copy.deepcopy(kwargs.get('env'))
> +        self.symbolsPath = kwargs.get('symbolsPath')
> +        self.logfiles = kwargs.get('logfiles')
> +        self.xpcshell = kwargs.get('xpcshell')
> +        self.xpcsRunArgs = kwargs.get('xpcsRunArgs')

This pattern is a little odd. I'm not sure what ahal is trying to future-proof from here.

@@ +147,5 @@
> +                    f.write(stdout)
> +
> +        finally:
> +            if f:
> +                f.close()

Can you switch this to use a with statement while you're moving the code?

@@ +168,5 @@
> +        """
> +        Simple wrapper to launch a process.
> +        On a remote system, this is more complex and we need to overload this function.
> +        """
> +        cmd = wrapCommand(cmd)

I think we can drop this now; this was for OS X 10.5 compat.

@@ +340,5 @@
> +        # filesystem is slow to react to the changes
> +        try:
> +            self.removeDir(directory)
> +            return True
> +        except Exception:

I'd like to see this switched to catching specific exceptions (probably OSError, WindowsError) instead of a catch-all.

@@ +348,5 @@
> +            # little bit and try again.
> +            time.sleep(1)
> +
> +            try:
> +                self.removeDir(directory)

This is a little awkward. You could hoist the try_count < TRY_LIMIT block from below up to after the sleep here and get rid of the second try block. Recursion is not how I would have written this, but it's a legitimate way to implement this, so that's fine.

@@ +1169,5 @@
> +            # wait for at least one of the tests to finish
> +            # (every test releases the semaphore (+1) when it's done)
> +            self.sem.acquire()
> +            # add 1 back to the semaphore, will be subtracted in the loop
> +            self.sem.release()

This is pretty confusing, but it does solve your use case. I might want input from someone who's more familiar with synchronization primitives on whether this is a legit way to use a semaphore.

@@ -970,5 @@
> -        # - don't move this line above launchProcess, or child will inherit the SIG_IGN
> -        signal.signal(signal.SIGINT, markGotSIGINT)
> -        # |stderr == None| as |pStderr| was either |None| or redirected to |stdout|.
> -        stdout, stderr = self.communicate(proc)
> -        signal.signal(signal.SIGINT, signal.SIG_DFL)

So this code got mostly removed in this patch, which is going to regress the "Ctrl-C to kill just xpcshell" feature. I realize this is basically impossible in the parallel case, but I would like to preserve this behavior in the sequential case (or for running a single test). It should be fine to only do this if you're running sequentially.
Attachment #788226 - Flags: review?(ted) → review-
Comment on attachment 788236 [details] [diff] [review]
Clean up slacker directories after the test run ends

Review of attachment 788236 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/xpcshell/runxpcshelltests.py
@@ +1247,5 @@
> +        for directory in self.cleanup_dir_list:
> +            try:
> +                shutil.rmtree(directory)
> +            except:
> +                pass

This could use a comment and a log.info. We agreed to just let this slide instead of causing a failure, since $TEMP should be cleaned up on reboot on test slaves now, and we're doing a best effort to clean things up here, but I'd like it noted.
Attachment #788236 - Flags: review?(ted) → review+
Comment on attachment 786370 [details] [diff] [review]
Synchronize blocks of output when running xpcshell tests in parallel

Review of attachment 786370 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/xpcshell/runxpcshelltests.py
@@ +361,5 @@
>             message = "TEST-UNEXPECTED-FAIL | %s | Failed to clean up directory: %s" % (name, sys.exc_info()[1])
>             self.log.error(message)
>             self.print_stdout(stdout)
>             self.print_stdout(traceback.format_exc())
> +            LOG_MUTEX.release()

You can write this as:

    with LOG_MUTEX:
        ...

http://docs.python.org/2/library/threading.html#using-locks-conditions-and-semaphores-in-the-with-statement

@@ +591,5 @@
> +            def wrapped(*args, **kwargs):
> +                LOG_MUTEX.acquire()
> +                unwrapped(*args, **kwargs)
> +                LOG_MUTEX.release()
> +            setattr(self.log, fun_name, wrapped)

I don't think this will actually do what you think it will do. You're only actually defining one copy of wrapped, so all your log functions are going to get overwritten... You need to do something like:

    def wrap(fn):
        def wrapped(*args, **kwargs):
            ...
            fn(*args, **kwargs)
            ...
        return wrapped
    setattr(self.log, fun_name, wrap(getattr(self.log, fun_name)))
Attachment #786370 - Flags: review?(ted) → review-
Comment on attachment 788569 [details] [diff] [review]
Synchronize blocks of output when running xpcshell tests in parallel; r?ted

Review of attachment 788569 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/xpcshell/runxpcshelltests.py
@@ +596,5 @@
> +            def wrapped(*args, **kwargs):
> +                with LOG_MUTEX:
> +                    fn(*args, **kwargs)
> +            return wrapped
> +        setattr(self.log, fun_name, wrap(unwrapped))

Wish there was a less-verbose way to write this, but this will have to do.
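The bug ted caught above is Python's late binding of closed-over loop variables: with the original code, every wrapped method would end up calling whatever the loop variable last pointed at. Factoring the closure into a helper binds each method at wrap time. A standalone illustration of the fixed pattern (the logger here is a stand-in, not the harness's actual log object):

    import threading

    LOG_MUTEX = threading.Lock()

    class Logger(object):
        def info(self, msg):
            print("INFO: %s" % msg)

        def error(self, msg):
            print("ERROR: %s" % msg)

    log = Logger()

    def wrap(fn):
        # fn is bound once per wrapped method, avoiding the late-binding trap.
        def wrapped(*args, **kwargs):
            with LOG_MUTEX:
                fn(*args, **kwargs)
        return wrapped

    for fun_name in ('info', 'error'):
        setattr(log, fun_name, wrap(getattr(log, fun_name)))

    log.info("this call holds LOG_MUTEX")
    log.error("and so does this one")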
Attachment #788569 - Flags: review?(ted) → review+
Attached patch Parallelize xpcshell harness (obsolete) (deleted) — Splinter Review
Folded the [updated according to review] cleanup dir patch into this.
Attachment #788236 - Attachment is obsolete: true
Attachment #788226 - Attachment is obsolete: true
Rebased patch on top of the current parxpc iteration.
Attachment #786485 - Attachment is obsolete: true
Unbitrot. This applies on top of the base and mobile harness patches.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #85)
> Comment on attachment 788226 [details] [diff] [review]
> Parallelize xpcshell harness
>
> Review of attachment 788226 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: testing/xpcshell/runxpcshelltests.py
> @@ +84,5 @@
> > +        self.env = copy.deepcopy(kwargs.get('env'))
> > +        self.symbolsPath = kwargs.get('symbolsPath')
> > +        self.logfiles = kwargs.get('logfiles')
> > +        self.xpcshell = kwargs.get('xpcshell')
> > +        self.xpcsRunArgs = kwargs.get('xpcsRunArgs')
>
> This pattern is a little odd. I'm not sure what ahal is trying to
> future-proof from here.

http://pastebin.mozilla.org/2841780 Here's Andrew's comment.

I feel it makes it a bit easier for the classes that extend this (mobile). I'm fine either way.
Flags: needinfo?(ted)
Attached patch Parallelize desktop xpcshell harness (obsolete) (deleted) — Splinter Review
Updated patch according to the review comments. Still waiting on the "OK" for my semaphore approach.
Attachment #789088 - Attachment is obsolete: true
Attachment #789100 - Attachment is obsolete: true
Attachment #789186 - Flags: review?(ted)
Comment on attachment 789186 [details] [diff] [review]
Parallelize desktop xpcshell harness

Review of attachment 789186 [details] [diff] [review]:
-----------------------------------------------------------------

Using a semaphore as a guard variable seems wrong. For the existing usage, I think a threading.Condition is the better primitive. Just have the tests notify the condition variable on completion and have the results "poller" wait/acquire the condition. Upon notification, just iterate over completed tests and reap them, just as you do now.

If you have your heart set on using semaphores, you can initialize a semaphore with NUM_THREADS and have each thread attempt to acquire the semaphore as a means of starting execution. That is traditionally how semaphores are used.
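A minimal sketch of the Condition-based alternative described above (illustrative, not the harness code): workers notify on completion, and the poller waits, then reaps whatever has finished.

    import threading
    import time

    cond = threading.Condition()
    finished = []  # names of completed tests, guarded by cond

    def worker(name):
        time.sleep(0.1)  # stand-in for running a test
        with cond:
            finished.append(name)
            cond.notify()

    threads = [threading.Thread(target=worker, args=("test-%d" % i,))
               for i in range(3)]
    for t in threads:
        t.start()

    reaped = 0
    with cond:
        while reaped < len(threads):
            while not finished:
                cond.wait()  # lock is released while waiting
            print("reaping: %s" % finished)
            reaped += len(finished)
            del finished[:]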
I ran this patch on my MacBook Pro and got the following:

:45.97 TEST-PASS | /Users/gps/src/firefox/obj-firefox.noindex/_tests/xpcshell/toolkit/mozapps/extensions/test/xpcshell-unpack/test_blocklistchange.js | test passed (time: 13655.589ms)
1:51.87 TEST-PASS | /Users/gps/src/firefox/obj-firefox.noindex/_tests/xpcshell/toolkit/components/url-classifier/tests/unit/test_partial.js | test passed (time: 27970.437ms)
1:51.87 INFO | Following exceptions were raised:
1:51.87 Traceback (most recent call last):
  File "/Users/gps/src/firefox/testing/xpcshell/runxpcshelltests.py", line 112, in run
    self.run_test()
  File "/Users/gps/src/firefox/testing/xpcshell/runxpcshelltests.py", line 538, in run_test
    self.cleanupDir(self.profileDir, name, stdout, self.xunit_result)
  File "/Users/gps/src/firefox/testing/xpcshell/runxpcshelltests.py", line 341, in cleanupDir
    except (OSError, WindowsError):
NameError: global name 'WindowsError' is not defined

Error running mach: ['xpcshell-test']

The error occurred in code that was called by the mach command. This is either a bug in the called code itself or in the way that mach is calling it. You should consider filing a bug for this issue. If filing a bug, please include the full output of mach, including this error message. The details of the failure are as follows:

NameError: global name 'WindowsError' is not defined

  File "/Users/gps/src/firefox/testing/xpcshell/mach_commands.py", line 179, in run_xpcshell_test
    return xpcshell.run_test(**params)
  File "/Users/gps/src/firefox/testing/xpcshell/mach_commands.py", line 57, in run_test
    keep_going=keep_going, shuffle=shuffle)
  File "/Users/gps/src/firefox/testing/xpcshell/mach_commands.py", line 45, in run_suite
    return self._run_xpcshell_harness(manifest=manifest, **kwargs)
  File "/Users/gps/src/firefox/testing/xpcshell/mach_commands.py", line 148, in _run_xpcshell_harness
    result = xpcshell.runTests(**filtered_args)
  File "/Users/gps/src/firefox/testing/xpcshell/runxpcshelltests.py", line 1229, in runTests
    raise exceptions[0]

issubclass(WindowsError, OSError) -> True, so you can omit WindowsError from the except statement.
Attached patch Parallelize desktop xpcshell harness (obsolete) (deleted) — Splinter Review
After the IRL talk with :gps we agreed to stick to the semaphore. (Can easily change on request, I have a patch that switches to a Cond in my queue.) Fixed the WindowsError thing. Also, set options.sequential to True in the __main__ function. Let's land it this way, so we have all the code there, and then we can just "turn it on" with a small patch. Developers who use mach will get parallel tests, which is fine, because the intermittents only seem to happen on TBPL. (They can pass --sequential if needed.) Here's a try run: https://tbpl.mozilla.org/?tree=Try&rev=1b8407f88d2f
Attachment #789270 - Flags: review?(ted)
Attachment #789186 - Attachment is obsolete: true
Attachment #789186 - Flags: review?(ted)
I meant to say you should use http://docs.python.org/2/library/threading.html#event-objects. But, I suppose Semaphore will work. It was certainly simpler than Condition variables.
Attached patch Parallelize desktop xpcshell harness (obsolete) (deleted) — Splinter Review
Changed this to using an Event instead of the Semaphore.
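For reference, a sketch of the Event-based version (again illustrative, not the patch itself): finished tests record their result and set the event, and the coordinator waits on it, clears it, then reaps whatever completed.

    import threading
    import time

    test_ended = threading.Event()
    results = []  # names of completed tests
    results_lock = threading.Lock()

    def worker(name, seconds):
        time.sleep(seconds)      # stand-in for running a test
        with results_lock:
            results.append(name)
        test_ended.set()         # wake the coordinator

    threads = [threading.Thread(target=worker, args=("test-%d" % n, n * 0.1))
               for n in (1, 2)]
    for t in threads:
        t.start()

    reaped = 0
    while reaped < len(threads):
        test_ended.wait()        # block until at least one test finished
        test_ended.clear()       # re-arm before reaping
        with results_lock:
            reaped += len(results)
            print("reaped: %s" % results)
            del results[:]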
Attachment #789290 - Flags: review?(ted)
Attachment #789270 - Attachment is obsolete: true
Attachment #789270 - Flags: review?(ted)
Attachment #789192 - Attachment is obsolete: true
Attached patch Add support for --sequential to mach, (obsolete) (deleted) — Splinter Review
Added a warning about possible local failures.
Attachment #789301 - Flags: review?(gps)
Attachment #787071 - Attachment is obsolete: true
Attachment #789301 - Flags: review?(gps) → review+
Rebased chmanchester's patch as well.
Attachment #789197 - Attachment is obsolete: true
Comment on attachment 789307 [details] [diff] [review]
Synchronize blocks of output when running xpcshell tests in parallel;

keeping r+
Attachment #789307 - Flags: review+
Comment on attachment 789290 [details] [diff] [review] Parallelize desktop xpcshell harness Review of attachment 789290 [details] [diff] [review]: ----------------------------------------------------------------- Just a few little things to fix, looks good otherwise. ::: testing/xpcshell/runxpcshelltests.py @@ -759,5 @@ > self.pluginsPath = pluginsPath > - > - # If we have an interactive debugger, disable ctrl-c. > - if self.debuggerInfo and self.debuggerInfo["interactive"]: > - signal.signal(signal.SIGINT, lambda signum, frame: None) We might need to preserve this behavior as well, but I'm not 100% sure how it overlaps with the other SIGINT handling. If you want to be thorough and test with --debugger and see how that works that'd be cool, but I won't hold your patch up for it. @@ -970,5 @@ > - # - don't move this line above launchProcess, or child will inherit the SIG_IGN > - signal.signal(signal.SIGINT, markGotSIGINT) > - # |stderr == None| as |pStderr| was either |None| or redirected to |stdout|. > - stdout, stderr = self.communicate(proc) > - signal.signal(signal.SIGINT, signal.SIG_DFL) You lost this line, this wants to get moved to after you finish running tests to restore the default signal handler.
Attachment #789290 - Flags: review?(ted) → review+
(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #93) > http://pastebin.mozilla.org/2841780 Here's Andrew's comment. > > I feel it makes it a bit easier for the classes that extend this (mobile). > I'm fine either way. I'm still not wild about it, but since it was ahal's suggestion and it's not terribly important let's just go with it. It's not a big deal to change later if we want to.
Flags: needinfo?(ted)
Attachment #789290 - Attachment is obsolete: true
Attachment #789301 - Attachment is obsolete: true
Attachment #789291 - Attachment is obsolete: true
Comment on attachment 790316 [details] [diff] [review]
Part 1 - Refactor xpcshell test harness to support parallel test runs, disabled in automation;

keeping previous r+'s from :ahal and :ted.
Attachment #790316 - Flags: review+
Comment on attachment 790317 [details] [diff] [review]
Part 2 - Add parallel warning and support for --sequential to mach xpcshell-test,

keep previous r+ from :gps
Attachment #790317 - Flags: review+
Added a 1s timeout to the wait call to make sure no deadlocks occur.
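The defensive variant being described, applied to the Event sketch above (the 1s value comes from this comment; the surrounding names are the sketch's, not the patch's): wake up once a second even if nothing signaled, so a missed wakeup can never hang the run.

    # Instead of blocking indefinitely on the event:
    test_ended.wait(timeout=1)  # returns after 1s even if no test called set()
    test_ended.clear()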
Attachment #790316 - Attachment is obsolete: true
Attachment #789307 - Attachment is obsolete: true
Comment on attachment 790327 [details] [diff] [review]
Part 1 - Refactor xpcshell test harness to support parallel test runs, disabled in automation;

keeping r+'s
Attachment #790327 - Flags: review+
Comment on attachment 790330 [details] [diff] [review]
Part 3 - Synchronize blocks of output when running xpcshell tests in parallel;

keep r+
Attachment #790330 - Flags: review+
Attachment #790354 - Flags: review?(ahalberstadt) → review+
No longer depends on: 898658
Summary: Parallelize xpcshell test harness → Refactor xpcshell test harness to support parallel test runs. disabled by default in automation
Keywords: checkin-needed
\o/ Awesome work everyone - especially Mihnea!