811130 - Running mochitest-robotium locally doesn't terminate at the right time

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Description

•

12 years ago

I'm running the following command locally: TEST_PATH=testVkbOverlap make mochitest-robotium where testVkbOverlap is the test I'm writing for bug 810171. It runs fine and then after it's done it just hangs. The last output is this: 30 INFO Failed: 2 31 INFO Todo: 0 32 INFO SimpleTest FINISHED If I ctrl-c to exit, I get this stack dump: ^CTraceback (most recent call last): File "_tests/testing/mochitest/runtestsremote.py", line 547, in <module> main() File "_tests/testing/mochitest/runtestsremote.py", line 538, in main logcat = dm.getLogcat() File "/Users/kats/zspace/mozilla-git/obj-android-debug/_tests/testing/mochitest/devicemanager.py", line 504, in getLogcat root=self.logcatNeedsRoot).split('\r') File "/Users/kats/zspace/mozilla-git/obj-android-debug/_tests/testing/mochitest/devicemanager.py", line 61, in shellCheckOutput retval = self.shell(cmd, buf, env=env, cwd=cwd, timeout=timeout, root=root) File "/Users/kats/zspace/mozilla-git/obj-android-debug/_tests/testing/mochitest/devicemanagerADB.py", line 136, in shell time.sleep(self.pollingInterval) KeyboardInterrupt make: *** [mochitest-robotium] Error 1 Some logging indicates that the adb shell command that is hanging is this: ['adb', 'shell', '/system/bin/logcat -v time -d dalvikvm:S ConnectivityService:S WifiMonitor:S WifiStateTracker:S wpa_supplicant:S NetworkStateTracker:S; echo $?'] which makes sense since logcat never really exits. So the sleep(self.pollingInterval) loop in deviceManagerADB.py just keeps looping until the timeout is hit. In the case of my test, which takes about a minute to run, waiting for the full 5 minute timeout to expire is unnecessary. IMO there should be a way to distinguish between the "primary" process (i.e. the test that is running) and "daemon" processes (logging, etc) so that we don't block on daemon processes to consider the test completed.

William Lachance (:wlach)

Comment 1

•

12 years ago

(In reply to Kartikaya Gupta (:kats) from comment #0) > Some logging indicates that the adb shell command that is hanging is this: > > ['adb', 'shell', '/system/bin/logcat -v time -d dalvikvm:S > ConnectivityService:S WifiMonitor:S WifiStateTracker:S wpa_supplicant:S > NetworkStateTracker:S; echo $?'] > > which makes sense since logcat never really exits. So the > sleep(self.pollingInterval) loop in deviceManagerADB.py just keeps looping > until the timeout is hit. In the case of my test, which takes about a minute > to run, waiting for the full 5 minute timeout to expire is unnecessary. Actually the "-d" parameter to logcat should cause logcat to exit after dumping all the logs. So either you have an absolutely gigantic log for some reason (not supposed to happen because we try to clear the logcat every time before a test) or there's something else going wrong.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 2

•

12 years ago

You're right, and I'm not sure what's going on. When I do "adb shell ps | grep log" to see if the logcat process is running on the device, I get no results. And if I run "ps | grep log" I do get a "adb shell /system/bin/logcat -v time -d dalvikvm:S ConnectivityService:S WifiMonitor:S WifiStateTracker:S wpa_supplicant:S NetworkStateTracker:S; echo $?" process running on my host machine. So this seems to indicate that the python shell call has started on my host machine, but for some reason the corresponding "logcat" process isn't running on the device. I don't know if it started and ended or never started at all. Running that adb command normally from a terminal on my host machine behaves as expected.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 3

•

12 years ago

Looks like you were right in that there's too much output. I was able to reproduce this is in the python repl and I simplified the test case to catting a file: >>> proc = subprocess.Popen(['cat', 'afile'], stdout=subprocess.PIPE, stderr=subprocess.PIPE) >>> print proc.poll() None The above proc.poll() returns "0" if "afile" is 16384 bytes long and returns None if "afile" is 16385 bytes long. So on my machine at least, the pipe buffers are 16 KiB and if the output exceeds that we'll need to pump the buffers (i.e. call proc.communicate()) occasionally for this to work. Don't know if it's worth doing this since so far it seems to only affect me.

William Lachance (:wlach)

Comment 4

•

12 years ago

(In reply to Kartikaya Gupta (:kats) from comment #3) > Looks like you were right in that there's too much output. I was able to > reproduce this is in the python repl and I simplified the test case to > catting a file: > > >>> proc = subprocess.Popen(['cat', 'afile'], stdout=subprocess.PIPE, stderr=subprocess.PIPE) > >>> print proc.poll() > None > > The above proc.poll() returns "0" if "afile" is 16384 bytes long and returns > None if "afile" is 16385 bytes long. So on my machine at least, the pipe > buffers are 16 KiB and if the output exceeds that we'll need to pump the > buffers (i.e. call proc.communicate()) occasionally for this to work. Don't > know if it's worth doing this since so far it seems to only affect me. This seems worth fixing to me. No doubt someone else is going to have the same problem in the future if we don't. If you're going to use communicate(), we'll need to do something to preserve the timeout. There's some ideas on that here: http://stackoverflow.com/questions/1191374/subprocess-with-timeout

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 5

•

12 years ago

Attached patch Patch (deleted) — Details — Splinter Review

Stole a solution from the stackoverflow post you linked.

Attachment #681175 - Flags: review?(wlachance)

William Lachance (:wlach)

Comment 6

•

12 years ago

Comment on attachment 681175 [details] [diff] [review] Patch LGTM as long as it's thoroughly tested.

Attachment #681175 - Flags: review?(wlachance) → review+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 7

•

12 years ago

I tested it but I wouldn't say "thoroughly" - I don't think I know this code well enough to know which code paths will exercise this code in different ways.

William Lachance (:wlach)

Comment 8

•

12 years ago

(In reply to Kartikaya Gupta (:kats) from comment #7) > I tested it but I wouldn't say "thoroughly" - I don't think I know this code > well enough to know which code paths will exercise this code in different > ways. Well if you tested the common case (small logcat) and the edge case (large logcat like you had) that's probably good enough. I just wanted to make sure we covered at least those.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 9

•

12 years ago

I tested the large logcat case, but I don't know how to get a "small" logcat. I run a single test which does no output of its own; the output from fennec starting up alone will exceed 16k. However I tested the bug in isolation using the python repl and it worked.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Reporter

Comment 10

•

12 years ago

Should I land this on m-i? Or is it supposed to go through github?

William Lachance (:wlach)

Comment 11

•

12 years ago

(In reply to Kartikaya Gupta (:kats) from comment #10) > Should I land this on m-i? Or is it supposed to go through github? It should go through github, and then be merged into m-c. I can land this one for you, but for future reference: https://wiki.mozilla.org/Auto-tools/Projects/MozBase#Check-in_policy

William Lachance (:wlach)

Comment 12

•

12 years ago

Pushed: https://github.com/mozilla/mozbase/commit/c2fba62dbcad9dcb775dc3cdb334bc9acf4bdd33

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Bugzilla

Running mochitest-robotium locally doesn't terminate at the right time

Categories

(Testing :: Mochitest, defect)

Tracking

(Not tracked)

People

(Reporter: kats, Unassigned)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Attachment

General

Description

File Name

Content Type