Closed
Bug 1058158
Opened 10 years ago
Closed 10 years ago
Marionette WebAPI tests should run in either a test container or a clean environment
Categories
(Remote Protocol :: Marionette, defect)
Tracking
(blocking-b2g:2.1+, firefox33 wontfix, firefox34 fixed, firefox35 fixed, b2g-v2.1 fixed, b2g-v2.2 fixed)
People
(Reporter: qdot, Assigned: mdas)
References
Details
Attachments
(3 files, 9 obsolete files)
(deleted),
patch
|
jgriffin
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
text/x-log
|
Details |
[Blocking Requested - why for this release]: Blocks settings API (bug 900551)
In bug 968709, a patch was landed to reset the gecko environment to the system app so we can make sure we're running in something similar to B2G, since prior unit tests can put the system into odd states due to navigating away from the system app. The problem is that, while this makes the tests work, this leaves the process in an undefined state.
To fix this, we need to either run webapi tests in a test container, or restart the b2g process before running webapi tests.
Comment 1•10 years ago
|
||
WebAPI tests get run in two ways: with and without OOP. With OOP, they're run in a gaia test-container app, but without OOP, they're run in place of the system app.
Another way to fix this, then, would be to run the non-OOP tests in a different test-container that doesn't get run OOP.
Assignee | ||
Comment 2•10 years ago
|
||
Try is closed, and I have to head out for the day. I've only tested this locally against my phone and browser, but no chance to test on emulator. qDot, if you have time, can you test this? It's a patch on top of m-c (should work on m-i too).
If you're running the test by calling runtests.py directly, please make sure that 'b2g' is somewhere in your --type list.
Flags: needinfo?(kyle)
Reporter | ||
Comment 3•10 years ago
|
||
Unfortunately, on the emulator this makes test_import_script crash the test container app, at least locally. I'm patching and pushing to try since it's open again, will post results URL here.
Flags: needinfo?(kyle)
Reporter | ||
Comment 4•10 years ago
|
||
Updated•10 years ago
|
blocking-b2g: 2.1? → 2.1+
Assignee | ||
Comment 6•10 years ago
|
||
testing out the latest patch on try https://tbpl.mozilla.org/?tree=Try&rev=6ca42ba28455, will r? if it passes
Attachment #8479304 -
Attachment is obsolete: true
Assignee | ||
Comment 7•10 years ago
|
||
This is more up to date (https://tbpl.mozilla.org/?tree=Try&rev=abc3a219fe29)
Attachment #8480860 -
Attachment is obsolete: true
Assignee | ||
Comment 8•10 years ago
|
||
So these try runs are passing up until they all hit this timeout at some point:
command timed out: 7200 seconds elapsed running ['/tools/buildbot/bin/python', 'scripts/scripts/marionette.py', '--cfg', 'marionette/automation_emulator_config.py', '--blob-upload-branch', 'try', '--download-symbols', 'ondemand'], attempting to kill
It's unclear to me why this is happening, and I'm heading out for a week's PTO today. Has anyone seen this before?
Comment 9•10 years ago
|
||
Basically, the tests have become much slower...and they're hitting the buildbot 2-hr timeout. Without this change, the tests take ~90 min, for some reason this change makes them take > 2 hr. :(
Comment 10•10 years ago
|
||
A couple of potential solutions: we could try running them on faster VM's (but that typically boosts speed by only about 10%), or we could chunk the tests.
Assignee | ||
Comment 11•10 years ago
|
||
Thanks for the response. The original patch would start and kill a test container per test case, this will instead preserve it until it sees a case that doesn't require the test container. It will then look for the test container app and kill it if it exists.
Try: https://tbpl.mozilla.org/?tree=Try&rev=5dd399f37ce0
If this approach is still too slow, how about we separate the Marionette unit tests from the webapi tests, and enable the test container app using --test-container or using a separate manifest with test_container = true? That way, we keep the system app untouched when running the Mn tests, so we keep the test environment as pristine as possible.
https://bugzilla.mozilla.org/show_bug.cgi?id=1022862 is the separation effort but not much as been done since June.
Assignee | ||
Updated•10 years ago
|
Attachment #8480867 -
Attachment is obsolete: true
Assignee | ||
Comment 12•10 years ago
|
||
"manager" wasn't defined in gaia (let manager = window.wrappedJSObject.AppWindowManager || window.wrappedJSObject.WindowManager;). Maybe from dom bluetooth's test_navigate_to_default_url.py (https://mxr.mozilla.org/mozilla-central/source/dom/bluetooth/tests/marionette/test_navigate_to_default_url.py#16) doing weird stuff to gaia state? In any case, here's a retry that worked well locally:
https://tbpl.mozilla.org/?tree=Try&rev=4947fdcd616f
Attachment #8481613 -
Attachment is obsolete: true
Assignee | ||
Comment 13•10 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=a49620d09800
Finally green with this patch! Now I can stop thinking about this and go enjoy my vacation :P
Attachment #8481798 -
Attachment is obsolete: true
Attachment #8482260 -
Flags: review?(jgriffin)
Comment 14•10 years ago
|
||
Comment on attachment 8482260 [details] [diff] [review]
run in test-container
Review of attachment 8482260 [details] [diff] [review]:
-----------------------------------------------------------------
This patch basically removes the 'oop' feature of the runner, but leaves a bunch of related code in place. It causes all of our unit tests to be run OOP, regardless of what you specify in the manifest, but allows webapi tests to run in whatever state the phone happens to be in.
I think removing this feature is ok, as I looked and nothing outside of our unit tests currently uses it, but I think we should be more explicit about tearing it out.
We could land this if this patch is urgent and clean it up after mdas returns, or we could just wait until then.
Attachment #8482260 -
Flags: review?(jgriffin) → review-
Assignee | ||
Comment 15•10 years ago
|
||
Removed the oop code from the runner.
Ran in try: https://tbpl.mozilla.org/?tree=Try&rev=db38e9ce01c5
The red, failed build is because I aborted it. The orange is an intermittent.
Attachment #8482260 -
Attachment is obsolete: true
Attachment #8486493 -
Flags: review?(jgriffin)
Comment 16•10 years ago
|
||
Comment on attachment 8486493 [details] [diff] [review]
run in test-container, remove oop option
Review of attachment 8486493 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks, looks good.
Attachment #8486493 -
Flags: review?(jgriffin) → review+
Assignee | ||
Comment 17•10 years ago
|
||
Comment 18•10 years ago
|
||
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/bebc8e9a6da6 - b2g's Mnw hasn't finished yet to know how it'll fare, but Mn on all non-b2g says "| KeyError: 'b2g'" for every test.
Assignee | ||
Comment 19•10 years ago
|
||
Thanks, I forgot to test that patch locally on browser! Retrying with a new patch:
Mn:
https://tbpl.mozilla.org/?tree=Try&rev=84a07e0dd1f8
Mnw:
https://tbpl.mozilla.org/?tree=Try&rev=1e9544ee90f2
Assignee | ||
Comment 20•10 years ago
|
||
the only change from the previous patch is line 388 of marionette_test.py. This patch ran green on Mn and Mnw
Attachment #8486493 -
Attachment is obsolete: true
Attachment #8487489 -
Flags: review?(jgriffin)
Comment 21•10 years ago
|
||
Comment on attachment 8487489 [details] [diff] [review]
run in test-container, remove oop option
Review of attachment 8487489 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks for pointing out the difference with the previous patch, makes it easy to review!
Attachment #8487489 -
Flags: review?(jgriffin) → review+
Assignee | ||
Comment 22•10 years ago
|
||
Thanks, pushed to m-i
https://hg.mozilla.org/integration/mozilla-inbound/rev/d96632436134
Comment 23•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Comment 24•10 years ago
|
||
status-b2g-v2.1:
--- → fixed
status-b2g-v2.2:
--- → fixed
status-firefox33:
--- → wontfix
status-firefox34:
--- → fixed
status-firefox35:
--- → fixed
Comment 25•10 years ago
|
||
Backed out from Aurora for Mnw permafail.
https://hg.mozilla.org/releases/mozilla-aurora/rev/549846c956b8
https://tbpl.mozilla.org/php/getParsedLog.php?id=48211120&tree=Mozilla-Aurora
Flags: needinfo?(mdas)
Keywords: branch-patch-needed
Reporter | ||
Comment 26•10 years ago
|
||
It's looking like this may break the test_switch_frame.py unit tests on Aurora. I'm not sure how the other two patches would affect that functionality, but I'm not ruling that out either. Will run another try with just the marionette patches.
https://tbpl.mozilla.org/?tree=Try&rev=7da5f9669516
Good news is, this doesn't seem to break the bluetooth tests anymore. So we've at least fixed one thing.
Reporter | ||
Comment 27•10 years ago
|
||
Here's with only the marionette patches and navigate removal.
https://tbpl.mozilla.org/?tree=Try&rev=0197d4f2975b
Reporter | ||
Comment 28•10 years ago
|
||
So everything is green on b2g-i/m-i/m-c, so there's gotta be something else we need to backport to aurora to fix this. Since the frame switch stuff is marionette, is there possibly some marionette patch we should backport? It doesn't look like there's many between the aurora branch and now.
Assignee | ||
Comment 29•10 years ago
|
||
So test_switch_frame.py isn't supposed to be run. Bug 977893 is responsible for adding the 'b2g=false' line to the manifest, which shuts off this test. Adding this patch wholesale is a bit much for this. We can either uplift that patch to m-a, or this modified version of this patch that adds that line.
I'm running the modified version of this on aurora with your navigate patch applied: https://tbpl.mozilla.org/?tree=Try&rev=2b58b78c6fcd
Flags: needinfo?(mdas)
Reporter | ||
Comment 30•10 years ago
|
||
It looks like the patch for test_navigation.py is screwed up between the m-c and aurora patches? We're not doing a skip_if_b2g on test_navigate_frame, which makes the test app crash. That's what's causing the reds, and it's failing locally too.
Here's a new try, with the test skip moved.
https://tbpl.mozilla.org/?tree=Try&rev=609b9f1311fd
Reporter | ||
Comment 31•10 years ago
|
||
Attachment #8490809 -
Attachment is obsolete: true
Reporter | ||
Comment 32•10 years ago
|
||
So along with the failures in navigation and frame, I was noticing landing this up'd our intermittents in bluetooth. I found that I had forgotten to change some settings functions in the webapi tests to use transaction events when landing bug 900551. There's a good chance that if we landed without these fixes, we'd just get backed right out again due to obviousness in orange factor. I got them fixed and have a new try running with those patches on top. I really, really hope this is it.
https://tbpl.mozilla.org/?tree=Try&rev=fe009c7736f2
Reporter | ||
Comment 33•10 years ago
|
||
Oook, well locally bug 1069115 is causing everything to fail, so maybe I stopped that prior build to soon. Here's a try with just the fixed navigation tests again. >.>
https://tbpl.mozilla.org/?tree=Try&rev=90437d10da50
Reporter | ||
Comment 34•10 years ago
|
||
Ok, back to running 1058158 and 1068077 locally against the emulator. I'm getting intermittent failures in telephony tests and wifi. This is the error that shows up for telephony:
09-18 23:35:52.042 E/GeckoConsole( 44): [JavaScript Error: "uncaught exception: RadioNotAvailable"]
For wifi, I get a lot of promise failures due to missing fields.
Reporter | ||
Comment 35•10 years ago
|
||
ni?'ing a couple of people on the RIL team, got names from blaming telephony ril code.
To get the ni?'ers up to date: in this bug, we're trying to run some tests in a container app to make sure each webapi test run is done in a clean environment. While we've landed this patch to m-c, unfortunately this is causing intermittents on Aurora. What I'm wondering about is how we can get to a state where a Promise is getting back RadioNotAvailable error as shown in this attachment. This is running a reduced test set of a few system tests, then the bluetooth test, then the telephony tests. About 80% of the time I can get the crash_emulator and a lot of the outbound telephony tests to fail on this message. Inbound seems to work ok, oddly enough. Just need to figure out where to start tracking this.
Flags: needinfo?(szchen)
Flags: needinfo?(htsai)
Comment 36•10 years ago
|
||
(In reply to Kyle Machulis [:kmachulis] [:qdot] (USE NEEDINFO?) from comment #35)
> Created attachment 8491934 [details]
> Emulator log showing test_emulator_crash failure
>
> ni?'ing a couple of people on the RIL team, got names from blaming telephony
> ril code.
>
> To get the ni?'ers up to date: in this bug, we're trying to run some tests
> in a container app to make sure each webapi test run is done in a clean
> environment. While we've landed this patch to m-c, unfortunately this is
> causing intermittents on Aurora. What I'm wondering about is how we can get
> to a state where a Promise is getting back RadioNotAvailable error as shown
> in this attachment. This is running a reduced test set of a few system
> tests, then the bluetooth test, then the telephony tests. About 80% of the
> time I can get the crash_emulator and a lot of the outbound telephony tests
> to fail on this message. Inbound seems to work ok, oddly enough. Just need
> to figure out where to start tracking this.
Kyle,
How do you run the test? I tried to apply the patch locally on aurora and then execute:
./mach marionette-webapi <telephony-test-suit>
The result is good. How could I reproduce your result? Do I miss some options?
The error "RadioNotAvailable" means that the radio on emulator is not yet turned on.
Does system app runs on your test environment? Ideally, when the emulator power up, system app (gaia/apps/system/js/radio.js) will sent a request to gecko to turn on the radio. All the telephony tests assume that the radio is default on.
Flags: needinfo?(szchen)
Reporter | ||
Comment 37•10 years ago
|
||
I've been running the tests using runtests.py, but marionette-webapi does basically the same thing. If you run /just/ the telephony tests, it will usually pass. The problem comes when you try to run the tbpl bank of tests (in testing/marionette/client/marionette/tests/unit-test.ini).
The way I've been running these is to run 1 or 2 tests out of testing/marionette/client/marionette/tests/unit/unit-tests.ini (just comment most of them out, leave a couple uncommented), just to make sure we run something in the test container. After this, bluetooth and telephony should run, and that's usually how telephony fails. That's the quickest way to repro this, as running all of the tests takes well over an hour.
Flags: needinfo?(szchen)
Reporter | ||
Comment 38•10 years ago
|
||
Ok. Well. Stacked ALL of the patches I need to land to aurora to get settings stablized EXCEPT for bug 1068077 (just because I forgot to move it to before the try in the mq) and ran a try:
https://tbpl.mozilla.org/?tree=Try&rev=e3c5b655b26a
Mnw, as well as everything else, passes.
Ignore the M-e10s, those don't run on m-a. So with this, it looks like we could land everything? I'm retriggering to find out.
The weird part is that this works while still leaving in the navigate test (which is what bug 1068077 removed), which is what this was supposed to remove. I guess we can leave that for later if this works though.
Reporter | ||
Comment 39•10 years ago
|
||
Updated•10 years ago
|
Keywords: branch-patch-needed
Reporter | ||
Comment 40•10 years ago
|
||
Ok, this landed to aurora with no problems, so I'm betting the issues I was seeing with telephony and other tests were due to bug 1068077, since we didn't re-establish the system app before starting the webapi tests. I guess I'll leave bug 1068077 open for right now, but I'll probably close it as a WONTFIX once we divide the webapi tests out from the unit test manifest in bug 1022862.
:aknow, thanks for the help thus far. :)
Flags: needinfo?(szchen)
Flags: needinfo?(htsai)
Updated•2 years ago
|
Product: Testing → Remote Protocol
You need to log in
before you can comment on or make changes to this bug.
Description
•