Bug 779261 (Closed): opened 12 years ago, closed 12 years ago

"Automation error: Error receiving data from socket (possible reboot)" during Robocop

Categories

(Testing :: General, defect)

Platform: x86 / Android
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 771626

People

(Reporter: gbrown, Assigned: gbrown)

Details

Robocop tests are currently hidden due to high failure rates. A major source of failure in Robocop is that devicemanagerSUT encounters socket error 54 (ECONNRESET) during a test. This very often happens on a "ps" command issued after the browser is launched. The following message is reported and turns the run orange:

Automation error: Error receiving data from socket (possible reboot). cmd={'cmd': 'ps'}; err=[Errno 54] Connection reset by peer

In most cases, the socket can be reconnected and tests continue to run, sometimes without any loss of data. In these cases, a reboot seems unlikely.
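
The reconnect-and-continue behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not devicemanagerSUT's actual code: `send_with_reconnect`, `send`, and `reconnect` are hypothetical names. It matches on `errno.ECONNRESET` rather than the literal 54, because ECONNRESET is 54 on macOS/BSD hosts but 104 on Linux.

```python
import errno


def send_with_reconnect(send, reconnect, cmd, max_attempts=3):
    """Hypothetical helper: run send(cmd), reconnecting and retrying on ECONNRESET.

    send and reconnect are caller-supplied callables (assumptions, not the
    devicemanagerSUT API). Any other socket error is re-raised immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(cmd)
        except OSError as e:
            # Compare against the symbolic constant: ECONNRESET is 54 on
            # macOS/BSD (as in the log above) but 104 on Linux.
            if e.errno != errno.ECONNRESET:
                raise
            last_error = e
            reconnect()
    # Every attempt was reset by the peer; surface the last error.
    raise last_error
```

A successful reconnect only shows that the agent is reachable again; it does not by itself rule out a reboot, which is better judged from the process list.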

Investigation shows that there is often a long delay (about 17 minutes!) between the launch of the process and the ECONNRESET, which occurs on the first 'ps' command after the launch. Presumably, execution is blocked in runCmds('exec ... <fennec>').

Some test runs with additional logging:
https://tbpl.mozilla.org/php/getParsedLog.php?id=14007518&tree=Try&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=13997101&tree=Try&full=1


Speculation: Fennec hangs (bug 772672?) and, after a very long time, some kind of timeout closes the socket. That completes runCmds('exec') and allows the 'ps' to be sent, but the closed socket is then noticed and reported.
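
If this speculation is right, the harness sits in a blocking read for as long as Fennec hangs. One way to surface such a hang sooner is a read deadline. The sketch below assumes a plain Python socket and a hypothetical helper, `recv_with_deadline`; it is not known to be how devicemanagerSUT behaves, and the timeout value is left to the caller.

```python
import socket


def recv_with_deadline(sock, timeout_s, bufsize=4096):
    """Hypothetical helper: return bytes from sock, or None if nothing arrives in time.

    An empty bytes object (b"") still means the peer closed the connection.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recv(bufsize)
    except socket.timeout:
        # No reply within timeout_s: report a probable hang instead of blocking.
        return None
```

Returning None for "no data yet" keeps a hang distinguishable from a closed connection (b""), which is the distinction this bug turns on.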

Blocks: 778956

Tests do continue to run, but the affected test's run terminates. For Robocop, we run the test harness function in a loop, once for each test in the robocop.ini file, so a test can fail completely and the harness still moves on to the next test.
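
The per-test loop described above can be sketched like this, assuming for illustration that each section header in robocop.ini names one test (as in manifestparser-style ini files). `run_all_tests` and `run_test` are hypothetical names; the point is that a failure or exception in one test is recorded and the loop moves on.

```python
import configparser


def run_all_tests(manifest_text, run_test):
    """Hypothetical harness loop: run each test named in an ini manifest.

    Assumes every [section] in the manifest names one test. A test that
    fails or raises is recorded as False; the loop continues with the next.
    """
    parser = configparser.ConfigParser()
    parser.read_string(manifest_text)
    results = {}
    for name in parser.sections():
        try:
            results[name] = bool(run_test(name))
        except Exception:
            # e.g. a lost device connection: mark this test failed, keep going.
            results[name] = False
    return results
```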

The root cause is this bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=771626

We think we have a solution and are waiting for it to be deployed.

I was going to argue that a reboot is only one possibility, and a questionable one, but I could not find good evidence against reboots. Then I found conclusive evidence in the logs I referenced that reboots did happen: a process list is shown before each test, and the PIDs of all processes change after each "possible reboot" message.
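
The PID check described above can be made mechanical. The sketch below assumes the process lists have already been parsed into hypothetical `{process_name: pid}` dicts (duplicate names are ignored for simplicity) and flags a reboot when most shared processes have new PIDs. The 0.75 threshold is an arbitrary assumption; it is below 1.0 because `init` is always PID 1, even after a reboot.

```python
def pid_change_ratio(before, after):
    """Fraction of processes present in both snapshots whose PID changed.

    Returns None when the snapshots share no process names.
    """
    shared = set(before) & set(after)
    if not shared:
        return None
    changed = sum(1 for name in shared if before[name] != after[name])
    return changed / len(shared)


def looks_like_reboot(before, after, threshold=0.75):
    """Hypothetical heuristic: most long-lived processes got new PIDs."""
    ratio = pid_change_ratio(before, after)
    return ratio is not None and ratio >= threshold
```

A Fennec restart alone changes only the browser's PID, so it stays far below the threshold, while a device reboot renumbers nearly everything.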

I'll try to wait patiently for the watcher and sutagent deployments...
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE