Bug 1489264
Opened 6 years ago
Closed 6 years ago
Investigate running packet.net-based Android emulator unit tests on GCP
Categories
(Testing :: General, enhancement, P1)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: gbrown, Assigned: gbrown)
References
Details
(Whiteboard: [geckoview:p2])
Building on wcosta's foundation, let's see how well Android 4.2/4.3 tests run on GCP.
Assignee
Comment 1•6 years ago
Oops, cloned that too well.
s/Android 4.2\/4.3/Android 7.0/
Assignee
Comment 2•6 years ago
My initial attempt:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a0b4e7a111a7a879a141ef45b9e7225516690fc0&filter-tier=1&filter-tier=2&filter-tier=3
https://queue.taskcluster.net/v1/task/enqpmpaER9mhXgCfjatrSA/runs/1/artifacts/public/logs/live.log
The android x86 emulator does not start because kvm is not available.
https://taskcluster-artifacts.net/enqpmpaER9mhXgCfjatrSA/1/public/test_info//emulator-qMHGOf.log
emulator: CPU Acceleration: DISABLED
emulator: CPU Acceleration status: KVM requires a CPU that supports vmx or svm
emulator: ERROR: x86_64 emulation currently requires hardware acceleration!
Please ensure KVM is properly installed and usable.
CPU acceleration status: KVM requires a CPU that supports vmx or svm
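For anyone reproducing this, the "vmx or svm" message refers to the CPU flags that Linux lists in /proc/cpuinfo. A minimal sketch of checking for them from Python follows; the function name `virtualization_flags` is hypothetical and is not part of any harness mentioned in this bug.

```python
from pathlib import Path


def virtualization_flags(cpuinfo_text: str) -> set:
    """Return the hardware-virtualization flags (vmx and/or svm) found in cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        # On x86 Linux each logical CPU has a line like "flags : fpu vme ... vmx ...".
        if line.startswith("flags"):
            _, _, flags = line.partition(":")
            found |= {"vmx", "svm"} & set(flags.split())
    return found


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    flags = virtualization_flags(cpuinfo.read_text()) if cpuinfo.exists() else set()
    if flags:
        print("virtualization flags exposed:", ", ".join(sorted(flags)))
    else:
        print("no vmx/svm flag exposed; KVM acceleration is likely unavailable")
```

On a guest without nested virtualization the set would be expected to come back empty, which matches the emulator error above.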
Comment 3•6 years ago
(In reply to Geoff Brown [:gbrown] from comment #2)
> The android x86 emulator does not start because kvm is not available.
Paging Wander
Flags: needinfo?(wcosta)
Assignee
Comment 4•6 years ago
Comment 5•6 years ago
(In reply to Chris Cooper [:coop] pronoun: he from comment #3)
> (In reply to Geoff Brown [:gbrown] from comment #2)
> > The android x86 emulator does not start because kvm is not available.
>
> Paging Wander
If that's urgent I can look now, if not I will postpone for when I recover.
Flags: needinfo?(wcosta) → needinfo?(coop)
Comment 6•6 years ago
(In reply to Wander Lairson Costa [:wcosta] from comment #5)
> If that's urgent I can look now, if not I will postpone for when I recover.
Redirecting NI to gbrown to see whether this is a blocker.
Flags: needinfo?(coop) → needinfo?(gbrown)
Assignee
Comment 7•6 years ago
The geckoview tests currently running on mozilla-central, tier 3:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=android%20x86%207.0&filter-tier=1&filter-tier=2&filter-tier=3
could run on integration branches as tier 1 today, on packet.net, if a flexible provisioning solution were available (or, I suppose, if we just committed to a big pool of packet.net instances). Currently we are holding off on committing to packet.net while we investigate GCP; in that sense, this bug is a blocker for getting geckoview tests to tier 1. The geckoview team has been very patient to date, but let's check in with them...
:davidb - Can you comment on how important and urgent it is to get these geckoview tests running in continuous integration?
Flags: needinfo?(gbrown) → needinfo?(dbolter)
Assignee
Updated•6 years ago
Priority: -- → P1
Comment 8•6 years ago
Discussed with Jim and Snorp. This is not super urgent while we have arm coverage - thanks for the ping!
Flags: needinfo?(dbolter)
Comment 9•6 years ago
The nested-vm feature is enabled.
Assignee
Comment 10•6 years ago
(In reply to Wander Lairson Costa [:wcosta] from comment #9)
> The nested-vm feature is enabled.
Yes, that helps!
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f7ff463dcfe8063d4c335a6c4f9da378dc7ae320&filter-tier=1&filter-tier=2&filter-tier=3
https://treeherder.mozilla.org/logviewer.html#?job_id=198554836&repo=try&lineNumber=849
[task 2018-09-11T01:43:36.807Z] 01:43:36 INFO - Running command: ['ls', '-l', '/dev/kvm']
[task 2018-09-11T01:43:36.807Z] 01:43:36 INFO - Copy/paste: ls -l /dev/kvm
[task 2018-09-11T01:43:36.812Z] 01:43:36 INFO - crw-rw-rw- 1 root root 10, 232 Sep 11 01:42 /dev/kvm
[task 2018-09-11T01:43:36.812Z] 01:43:36 INFO - Return code: 0
[task 2018-09-11T01:43:36.812Z] 01:43:36 INFO - Running command: ['kvm-ok']
[task 2018-09-11T01:43:36.813Z] 01:43:36 INFO - Copy/paste: kvm-ok
[task 2018-09-11T01:43:36.820Z] 01:43:36 INFO - INFO: /dev/kvm exists
[task 2018-09-11T01:43:36.820Z] 01:43:36 INFO - KVM acceleration can be used
[task 2018-09-11T01:43:36.821Z] 01:43:36 INFO - Return code: 0
[task 2018-09-11T01:43:36.821Z] 01:43:36 INFO - Running command: ['emulator', '-accel-check']
[task 2018-09-11T01:43:36.821Z] 01:43:36 INFO - Copy/paste: emulator -accel-check
[task 2018-09-11T01:43:36.837Z] 01:43:36 INFO - accel:
[task 2018-09-11T01:43:36.837Z] 01:43:36 INFO - 0
[task 2018-09-11T01:43:36.837Z] 01:43:36 INFO - KVM (version 12) is installed and usable.
[task 2018-09-11T01:43:36.837Z] 01:43:36 INFO - accel
[task 2018-09-11T01:43:36.838Z] 01:43:36 INFO - Return code: 0
Now the x86 emulator starts and uses kvm - great!!
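The three checks in that log can be bundled into one pre-flight step. This is only a sketch: it assumes, as in the log above, that each command reports success with return code 0, and the names `KVM_CHECKS`, `run_command` and `kvm_ready` are hypothetical rather than anything in the test harness.

```python
import subprocess
from typing import Callable, Sequence

# The three commands that appear in the log above.
KVM_CHECKS = (
    ["ls", "-l", "/dev/kvm"],
    ["kvm-ok"],
    ["emulator", "-accel-check"],
)


def run_command(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code (127 if the binary is missing)."""
    try:
        return subprocess.run(list(cmd), check=False).returncode
    except FileNotFoundError:
        return 127


def kvm_ready(runner: Callable[[Sequence[str]], int] = run_command) -> bool:
    """Return True only if every check exits with return code 0."""
    return all(runner(cmd) == 0 for cmd in KVM_CHECKS)


if __name__ == "__main__":
    print("KVM usable" if kvm_ready() else "KVM not usable")
```

Passing the runner in as a parameter keeps the check testable on machines that have neither kvm-ok nor the emulator installed.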
Assignee
Comment 11•6 years ago
*But*...when I try to run a full set of Android x86 tests, I find most of them time out and the tasks retry or fail:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=63e04a2b1573409c9c011ae224b24d0b24a80cc6&filter-tier=1&filter-tier=2&filter-tier=3
It seems that each test task will run successfully if run alone (one task at a time), but fails when ~3 or more such test tasks are running at once. The difference may only be performance: each task runs significantly slower, sufficient to trigger timeouts.
Comment 12•6 years ago
(In reply to Geoff Brown [:gbrown] (less available Sept 10-14) from comment #11)
> It seems that each test task will run successfully if run alone (one task at
> a time), but fails when ~3 or more such test tasks are running at once. The
> difference may only be performance: each task runs significantly slower,
> sufficient to trigger timeouts.
Geoff: in the meeting yesterday, we briefly discussed setting up a matrix of instance configs, like we did for packet.net, to home in on the best combination of price/performance. Given your comment, it sounds like you're ready for this, and that we should aim higher spec-wise until we can match the packet.net performance. We can then compare results from GCP vs packet.net directly.
Wander: can you get the matrix set up for Geoff? We can rope in other people (Brian, John, ...) as required.
Flags: needinfo?(wcosta)
Comment 13•6 years ago
I set up a grid with n1-standard-{2,4,8,16} machine types. Each machine type has 4 instances. The worker-types are gce/n1-std-{2,4,8,16}.
Flags: needinfo?(wcosta)
Comment 14•6 years ago
Update: the worker types were renamed to gecko-t-linux-{2,4,8,16}.
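For reference, the grid can be written out explicitly. The sketch below assumes each renamed worker type maps one-to-one onto the n1-standard machine type with the same numeric suffix, which comments 13 and 14 imply but do not state outright; `worker_grid` is a hypothetical helper name.

```python
# vCPU counts from the n1-standard-{2,4,8,16} grid described in comment 13.
VCPU_COUNTS = (2, 4, 8, 16)


def worker_grid(vcpu_counts=VCPU_COUNTS):
    """Map each worker type to the machine type assumed to back it."""
    return {f"gecko-t-linux-{n}": f"n1-standard-{n}" for n in vcpu_counts}


if __name__ == "__main__":
    for worker_type, machine_type in worker_grid().items():
        print(f"{worker_type} -> {machine_type}")
```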
Assignee
Comment 15•6 years ago
First attempt with gecko-t-linux-2 is not working:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=f4793fdf946e8c2d47395c449979d1f14579b422
Flags: needinfo?(wcosta)
Assignee
Comment 16•6 years ago
I suppose that might have been affected by bug 1491948. Will re-test when trees re-open.
Flags: needinfo?(wcosta)
Assignee
Comment 17•6 years ago
Bug 1492553 is a complication -- some tests are currently perma-fail on mozilla-central. I hadn't realized that before...but I don't think it affects comment 15.
Assignee
Comment 18•6 years ago
gecko-t-linux-4 is not fast enough and we see frequent task retries when the emulator fails to start:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&searchStr=android-em-7&revision=0ce5b4e132d2c19c6963919ac65e031399cf2318
gecko-t-linux-8 eliminates most task retries and the tests complete, but they take more than twice as long as on packet.net:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&searchStr=android-em-7&revision=dfcaf35b6d2d3b2e4d3125a81fe31daab36f2326
gecko-t-linux-16 shows no significant improvement over gecko-t-linux-8:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&searchStr=android-em-7&revision=a6fed1047e0eee01b4cb3192c171bba377486833
gecko-t-linux-32 shows no significant improvement over gecko-t-linux-8:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&searchStr=android-em-7&revision=443111c064cba983cd9e9c1dadbd8154a9456716
For these configurations, it looks like gecko-t-linux-8 is the best we can do; tests pass, but run 2x to 3x as long as they currently do on packet.net -- disappointing.
(I'm only comparing a couple of mochitests and geckoview-junit here, due to bug 1492553.)
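The "2x to 3x" figure is a ratio of task durations for the same suite on the two providers. A sketch of that arithmetic follows; the durations in the example are placeholders chosen for illustration, not measurements from these try pushes, and `slowdown` is a hypothetical helper.

```python
def slowdown(gcp_seconds: float, packet_seconds: float) -> float:
    """Return how many times longer a task ran on GCP than on packet.net."""
    if packet_seconds <= 0:
        raise ValueError("packet.net duration must be positive")
    return gcp_seconds / packet_seconds


if __name__ == "__main__":
    # Placeholder durations in seconds (NOT real data): (GCP, packet.net).
    example = {"suite-a": (3000.0, 1200.0), "suite-b": (2700.0, 900.0)}
    for suite, (gcp, packet) in example.items():
        print(f"{suite}: {slowdown(gcp, packet):.1f}x")
```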
Updated•6 years ago
Whiteboard: [geckoview:p2]
Comment 19•6 years ago
To follow up here: are we no longer interested in using GCP for geckoview x86-based tests because of the longer runtime compared to packet.net, or are we OK with that?
Assignee
Comment 20•6 years ago
The plan is to continue to use packet.net for geckoview x86 tests unless we can improve performance on gcp. :wcosta is investigating to see if performance improvements on gcp are possible.
Comment 21•6 years ago
(In reply to Geoff Brown [:gbrown] from comment #20)
> The plan is to continue to use packet.net for geckoview x86 tests unless we
> can improve performance on gcp. :wcosta is investigating to see if
> performance improvements on gcp are possible.
Wander: can you comment on what GCP permutations you've tried so far? (standard vs highmem vs highcpu vs custom vs ...)
From our discussion this morning, you haven't found a winning combination of instance types/specs yet. If we're going to come back to this in the future, we should keep a record of what we've already tried.
Updated•6 years ago
Flags: needinfo?(wcosta)
Comment 22•6 years ago
GCP takes a different approach: the machine-type families (standard, highmem, highcpu) only change the CPU-to-memory ratio, that is, how much memory you get on average per core. For the purposes of the speed test, then, "standard" was all we needed. Given that, n1-standard-8 seems to be the winner; I didn't check in terms of cost.
Also, I spotted that we were using slow disks. Changing them to SSD improved IO, but had only limited impact on general performance.
The tests I performed focused on the Android x86 emulator.
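To make the memory-per-core point concrete, here is a small sketch. The per-vCPU figures are GCP's published values for the predefined N1 families as I recall them (3.75, 6.5 and 0.9 GB); they are not from this bug, so treat them as assumptions and check the GCP machine-type documentation before relying on them.

```python
# Assumed memory per vCPU, in GB, for the predefined N1 families.
N1_GB_PER_VCPU = {"standard": 3.75, "highmem": 6.5, "highcpu": 0.9}


def n1_memory_gb(family: str, vcpus: int) -> float:
    """Return the total memory implied for an n1-<family>-<vcpus> machine type."""
    return N1_GB_PER_VCPU[family] * vcpus


if __name__ == "__main__":
    for family in N1_GB_PER_VCPU:
        print(f"n1-{family}-8: 8 vCPUs, {n1_memory_gb(family, 8):.1f} GB")
```

The vCPU count is the same across families for a given suffix, which is why a CPU-bound emulator workload would not be expected to benefit from highmem or highcpu variants.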
Flags: needinfo?(wcosta)
Assignee
Comment 23•6 years ago
We will keep using packet.net for Android x86 emulator tests. gcp is an option, but doesn't provide the same performance.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME