Closed Bug 967826 Opened 11 years ago Closed 10 years ago

gaiatest fails to restart b2g between tests if it can't connect to Marionette during teardown

Categories

(Firefox OS Graveyard :: Gaia::UI Tests, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1040742

People

(Reporter: jgriffin, Unassigned)

References

Details

Sometimes a test will hang, and then tearDown dies because it can't connect to Marionette.  When this happens, we don't honor --restart and restart the B2G process, instead, every test after that dies with a socket.timeout.

Ex:

https://tbpl.mozilla.org/php/getParsedLog.php?id=33929844&tree=B2g-Inbound&full=1#error0

The first failure is real, and is probably caused by B2G.  The second failure (in the same test) is an error during tearDown:

21:17:21     INFO -  Traceback (most recent call last):
21:17:21     INFO -    File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/marionette_test.py", line 173, in run
21:17:21     INFO -      self.tearDown()
21:17:21     INFO -    File "/builds/slave/test/gaia/tests/python/gaia-ui-tests/gaiatest/gaia_test.py", line 1125, in tearDown
21:17:21     INFO -      MarionetteTestCase.tearDown(self)
21:17:21     INFO -    File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/marionette_test.py", line 318, in tearDown
21:17:21     INFO -      self.marionette.set_context("content")
21:17:21     INFO -    File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/marionette.py", line 854, in set_context
21:17:21     INFO -      return self._send_message('setContext', 'ok', value=context)
21:17:21     INFO -    File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/marionette.py", line 588, in _send_message
21:17:21     INFO -      raise MarionetteException(message="Please start a session")
21:17:21     INFO -  MarionetteException: MarionetteException: Please start a session
21:17:21     INFO -  TEST-UNEXPECTED-FAIL | test_clock_set_alarm.py test_clock_set_alarm.TestClockSetAlarm.test_clock_set_alarm |

The error at this point appears to prevent us from restarting B2G, and so every subsequent test fails.

We do want to report these errors, but we should make it so that the restart logic can still function.
The problem is that we use GaiaDevice to manage the restart, but GaiaDevice requires a Marionette instance:

https://github.com/mozilla-b2g/gaia/blob/master/tests/python/gaia-ui-tests/gaiatest/gaia_test.py#L875

We both generate the Marionette instance and call new_session on it in setUp:

http://mxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/marionette_test.py#245

...and we're dying in new_session (since we're apparently connected to a hung B2G).

We need to untangle this somehow so that calling new_session isn't prerequisite for restarting B2G.
This doesn't look like a trivial fix, at least not to do it well.  We're currently conflating the lifecycles of several different things at least to a degree:  a device, a gecko binary, a Marionette class instance, a Marionette session, and a test case.  We should be managing at least some of these things independently.
Since it causes socket.timeout errors and prevents tests from running, I'm elevating this to major so it can be dealt with this quarter.
Severity: normal → major
I've made a small step towards reducing some of GaiaDevice's dependencies in:
https://bugzilla.mozilla.org/show_bug.cgi?id=985553

In another unrelated pull I'm capturing the `is_desktop_b2g` and writing it to testvars so we don't need to rely on self.marionette.instance to decide what type of b2g it is during the setUp.

Not sure what the next step is after this, we can try and take it in small iterations.
Depends on: 985553
oops
https://bugzilla.mozilla.org/show_bug.cgi?id=997679
Depends on: 997679
No longer depends on: 985553
I've also experimented with a test for the crash handling in:
https://bugzilla.mozilla.org/show_bug.cgi?id=982158 and https://github.com/mozilla-b2g/gaia/pull/17167
This includes catching an exception from the tearDown which may help with comment #0.

Alternatively could relying more heavily on `wait_for_port` give us a safe point to start a marionette session?
Experimented with wait_for_port. In theory I think we could put wait_for_port before trying to do MarionetteTestCase.setUp as that would let us know that Gecko has started up. then we can start the marionette session and use that to wait for Gaia to be ready.

however wait_for_port requires MarionetteTestCase for the host/port but we can't get MarionetteTestCase without setUp which tries to create a session.

if wait_for_port were higher up in the MarionetteTestCase tree and did not need a marionette session it could be quite useful for us in crash handling.
Sorry I forgot to clarify that once you fix the problem in comment #0 then you hit a problem with the startup in the following test, which is why my last 3 comments have been focussed upon the setUp recovery steps.
Update this one with dependencies
Flags: needinfo?(zcampbell)
Depends on: 1019210, 1001322
Flags: needinfo?(zcampbell)
(In reply to Zac C (:zac) from comment #7)
> if wait_for_port were higher up in the MarionetteTestCase tree and did not
> need a marionette session it could be quite useful for us in crash handling.

You can call wait_for_port without a session:

m=Marionette()
m.wait_for_port()
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.