Closed Bug 811130 Opened 12 years ago Closed 12 years ago

Running mochitest-robotium locally doesn't terminate at the right time

Categories

(Testing :: Mochitest, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kats, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

I'm running the following command locally: TEST_PATH=testVkbOverlap make mochitest-robotium where testVkbOverlap is the test I'm writing for bug 810171. It runs fine and then after it's done it just hangs. The last output is this: 30 INFO Failed: 2 31 INFO Todo: 0 32 INFO SimpleTest FINISHED If I ctrl-c to exit, I get this stack dump: ^CTraceback (most recent call last): File "_tests/testing/mochitest/runtestsremote.py", line 547, in <module> main() File "_tests/testing/mochitest/runtestsremote.py", line 538, in main logcat = dm.getLogcat() File "/Users/kats/zspace/mozilla-git/obj-android-debug/_tests/testing/mochitest/devicemanager.py", line 504, in getLogcat root=self.logcatNeedsRoot).split('\r') File "/Users/kats/zspace/mozilla-git/obj-android-debug/_tests/testing/mochitest/devicemanager.py", line 61, in shellCheckOutput retval = self.shell(cmd, buf, env=env, cwd=cwd, timeout=timeout, root=root) File "/Users/kats/zspace/mozilla-git/obj-android-debug/_tests/testing/mochitest/devicemanagerADB.py", line 136, in shell time.sleep(self.pollingInterval) KeyboardInterrupt make: *** [mochitest-robotium] Error 1 Some logging indicates that the adb shell command that is hanging is this: ['adb', 'shell', '/system/bin/logcat -v time -d dalvikvm:S ConnectivityService:S WifiMonitor:S WifiStateTracker:S wpa_supplicant:S NetworkStateTracker:S; echo $?'] which makes sense since logcat never really exits. So the sleep(self.pollingInterval) loop in deviceManagerADB.py just keeps looping until the timeout is hit. In the case of my test, which takes about a minute to run, waiting for the full 5 minute timeout to expire is unnecessary. IMO there should be a way to distinguish between the "primary" process (i.e. the test that is running) and "daemon" processes (logging, etc) so that we don't block on daemon processes to consider the test completed.
(In reply to Kartikaya Gupta (:kats) from comment #0) > Some logging indicates that the adb shell command that is hanging is this: > > ['adb', 'shell', '/system/bin/logcat -v time -d dalvikvm:S > ConnectivityService:S WifiMonitor:S WifiStateTracker:S wpa_supplicant:S > NetworkStateTracker:S; echo $?'] > > which makes sense since logcat never really exits. So the > sleep(self.pollingInterval) loop in deviceManagerADB.py just keeps looping > until the timeout is hit. In the case of my test, which takes about a minute > to run, waiting for the full 5 minute timeout to expire is unnecessary. Actually the "-d" parameter to logcat should cause logcat to exit after dumping all the logs. So either you have an absolutely gigantic log for some reason (not supposed to happen because we try to clear the logcat every time before a test) or there's something else going wrong.
You're right, and I'm not sure what's going on. When I do "adb shell ps | grep log" to see if the logcat process is running on the device, I get no results. And if I run "ps | grep log" I do get a "adb shell /system/bin/logcat -v time -d dalvikvm:S ConnectivityService:S WifiMonitor:S WifiStateTracker:S wpa_supplicant:S NetworkStateTracker:S; echo $?" process running on my host machine. So this seems to indicate that the python shell call has started on my host machine, but for some reason the corresponding "logcat" process isn't running on the device. I don't know if it started and ended or never started at all. Running that adb command normally from a terminal on my host machine behaves as expected.
Looks like you were right in that there's too much output. I was able to reproduce this is in the python repl and I simplified the test case to catting a file: >>> proc = subprocess.Popen(['cat', 'afile'], stdout=subprocess.PIPE, stderr=subprocess.PIPE) >>> print proc.poll() None The above proc.poll() returns "0" if "afile" is 16384 bytes long and returns None if "afile" is 16385 bytes long. So on my machine at least, the pipe buffers are 16 KiB and if the output exceeds that we'll need to pump the buffers (i.e. call proc.communicate()) occasionally for this to work. Don't know if it's worth doing this since so far it seems to only affect me.
(In reply to Kartikaya Gupta (:kats) from comment #3) > Looks like you were right in that there's too much output. I was able to > reproduce this is in the python repl and I simplified the test case to > catting a file: > > >>> proc = subprocess.Popen(['cat', 'afile'], stdout=subprocess.PIPE, stderr=subprocess.PIPE) > >>> print proc.poll() > None > > The above proc.poll() returns "0" if "afile" is 16384 bytes long and returns > None if "afile" is 16385 bytes long. So on my machine at least, the pipe > buffers are 16 KiB and if the output exceeds that we'll need to pump the > buffers (i.e. call proc.communicate()) occasionally for this to work. Don't > know if it's worth doing this since so far it seems to only affect me. This seems worth fixing to me. No doubt someone else is going to have the same problem in the future if we don't. If you're going to use communicate(), we'll need to do something to preserve the timeout. There's some ideas on that here: http://stackoverflow.com/questions/1191374/subprocess-with-timeout
Attached patch Patch (deleted) — Splinter Review
Stole a solution from the stackoverflow post you linked.
Attachment #681175 - Flags: review?(wlachance)
Comment on attachment 681175 [details] [diff] [review] Patch LGTM as long as it's thoroughly tested.
Attachment #681175 - Flags: review?(wlachance) → review+
I tested it but I wouldn't say "thoroughly" - I don't think I know this code well enough to know which code paths will exercise this code in different ways.
(In reply to Kartikaya Gupta (:kats) from comment #7) > I tested it but I wouldn't say "thoroughly" - I don't think I know this code > well enough to know which code paths will exercise this code in different > ways. Well if you tested the common case (small logcat) and the edge case (large logcat like you had) that's probably good enough. I just wanted to make sure we covered at least those.
I tested the large logcat case, but I don't know how to get a "small" logcat. I run a single test which does no output of its own; the output from fennec starting up alone will exceed 16k. However I tested the bug in isolation using the python repl and it worked.
Should I land this on m-i? Or is it supposed to go through github?
(In reply to Kartikaya Gupta (:kats) from comment #10) > Should I land this on m-i? Or is it supposed to go through github? It should go through github, and then be merged into m-c. I can land this one for you, but for future reference: https://wiki.mozilla.org/Auto-tools/Projects/MozBase#Check-in_policy
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: