1489264 - Investigate running packet.net-based Android emulator unit tests on GCP

(In reply to Chris Cooper [:coop] pronoun: he from comment #3)
> (In reply to Geoff Brown [:gbrown] from comment #2)
> > The android x86 emulator does not start because kvm is not available.
>
> Paging Wander

If that's urgent I can look now, if not I will postpone for when I recover.

Flags: needinfo?(wcosta) → needinfo?(coop)

Chris Cooper [:coop] (he/him)

Comment 6

•

6 years ago

(In reply to Wander Lairson Costa [:wcosta] from comment #5)
> If that's urgent I can look now, if not I will postpone for when I recover.

Redirecting NI to gbrown to see whether this is a blocker.

Flags: needinfo?(coop) → needinfo?(gbrown)

Geoff Brown [:gbrown]

Assignee

Comment 7

•

6 years ago

The geckoview tests currently running on mozilla-central as tier 3:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=android%20x86%207.0&filter-tier=1&filter-tier=2&filter-tier=3
could run on integration branches as tier 1 today, on packet.net, if there were a flexible provisioning solution available (or, I suppose, if we just committed to a big pool of packet.net instances). We are currently paused on committing to packet.net while we investigate gcp; in that sense, this bug is a blocker for getting geckoview tests to tier 1.

The geckoview team has been very patient to date, but let's check in with them...

:davidb - Can you comment on how important and urgent it is to get these geckoview tests running in continuous integration?

Flags: needinfo?(gbrown) → needinfo?(dbolter)

Geoff Brown [:gbrown]

Assignee

Updated

•

6 years ago

Priority: -- → P1

David Bolter [:davidb] (NeedInfo me for attention)

Comment 8

•

6 years ago

Discussed with Jim and Snorp. This is not super urgent while we have arm coverage - thanks for the ping!

Flags: needinfo?(dbolter)

Chris Cooper [:coop] (he/him)

Updated

•

6 years ago

Depends on: 1490040

Wander Lairson Costa

Comment 9

•

6 years ago

The nested-vm feature is enabled.

Geoff Brown [:gbrown]

Assignee

Comment 10

•

6 years ago

(In reply to Wander Lairson Costa [:wcosta] from comment #9)
> The nested-vm feature is enabled.

Yes, that helps!

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f7ff463dcfe8063d4c335a6c4f9da378dc7ae320&filter-tier=1&filter-tier=2&filter-tier=3
https://treeherder.mozilla.org/logviewer.html#?job_id=198554836&repo=try&lineNumber=849

[task 2018-09-11T01:43:36.807Z] 01:43:36 INFO - Running command: ['ls', '-l', '/dev/kvm']
[task 2018-09-11T01:43:36.807Z] 01:43:36 INFO - Copy/paste: ls -l /dev/kvm
[task 2018-09-11T01:43:36.812Z] 01:43:36 INFO - crw-rw-rw- 1 root root 10, 232 Sep 11 01:42 /dev/kvm
[task 2018-09-11T01:43:36.812Z] 01:43:36 INFO - Return code: 0
[task 2018-09-11T01:43:36.812Z] 01:43:36 INFO - Running command: ['kvm-ok']
[task 2018-09-11T01:43:36.813Z] 01:43:36 INFO - Copy/paste: kvm-ok
[task 2018-09-11T01:43:36.820Z] 01:43:36 INFO - INFO: /dev/kvm exists
[task 2018-09-11T01:43:36.820Z] 01:43:36 INFO - KVM acceleration can be used
[task 2018-09-11T01:43:36.821Z] 01:43:36 INFO - Return code: 0
[task 2018-09-11T01:43:36.821Z] 01:43:36 INFO - Running command: ['emulator', '-accel-check']
[task 2018-09-11T01:43:36.821Z] 01:43:36 INFO - Copy/paste: emulator -accel-check
[task 2018-09-11T01:43:36.837Z] 01:43:36 INFO - accel:
[task 2018-09-11T01:43:36.837Z] 01:43:36 INFO - 0
[task 2018-09-11T01:43:36.837Z] 01:43:36 INFO - KVM (version 12) is installed and usable.
[task 2018-09-11T01:43:36.837Z] 01:43:36 INFO - accel
[task 2018-09-11T01:43:36.838Z] 01:43:36 INFO - Return code: 0

Now the x86 emulator starts and uses kvm - great!!
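For anyone who wants a quick pre-flight check on a worker before launching the emulator, here is a minimal Python sketch of the first check in the log above (the `ls -l /dev/kvm` step). The function names `kvm_device_status` and `kvm_usable` are hypothetical and are not part of the test harness. It only inspects the device node; it is not a substitute for `kvm-ok` or `emulator -accel-check`, which do more thorough checks.

```python
import os
import stat


def kvm_device_status(path="/dev/kvm"):
    """Report basic facts about a KVM device node (hypothetical helper).

    Mirrors what `ls -l /dev/kvm` shows in the log: the node exists, it is a
    character device, and the current user can read and write it.
    """
    try:
        st = os.stat(path)
    except OSError:
        # Missing node (or no permission to stat it): nothing is usable.
        return {"exists": False, "char_device": False, "read_write": False}
    return {
        "exists": True,
        "char_device": stat.S_ISCHR(st.st_mode),
        "read_write": os.access(path, os.R_OK | os.W_OK),
    }


def kvm_usable(path="/dev/kvm"):
    """True only if every check in kvm_device_status() passes."""
    return all(kvm_device_status(path).values())


if __name__ == "__main__":
    print(kvm_device_status())
    print("KVM device looks usable" if kvm_usable() else "KVM device not usable")
```

On an instance without nested virtualization, the expectation is that the node is simply missing, so `kvm_usable()` would return False.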

Geoff Brown [:gbrown]

Assignee

Comment 11

•

6 years ago

*But*...when I try to run a full set of Android x86 tests, I find most of them time out and the tasks retry or fail:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=63e04a2b1573409c9c011ae224b24d0b24a80cc6&filter-tier=1&filter-tier=2&filter-tier=3

It seems that each test task will run successfully if run alone (one task at a time), but fails when ~3 or more such test tasks are running at once. The difference may only be performance: each task runs significantly slower, sufficient to trigger timeouts.

Chris Cooper [:coop] (he/him)

Comment 12

•

6 years ago

(In reply to Geoff Brown [:gbrown] (less available Sept 10-14) from comment #11)
> It seems that each test task will run successfully if run alone (one task at
> a time), but fails when ~3 or more such test tasks are running at once. The
> difference may only be performance: each task runs significantly slower,
> sufficient to trigger timeouts.

Geoff: in the meeting yesterday, we briefly discussed setting up a matrix of instance configs, like we did for packet.net, to home in on the best combination of price/performance. Given your comment, it sounds like you're ready for this, and that we should aim higher spec-wise until we can match the packet.net performance. We can then compare results from GCP vs packet.net directly.

Wander: can you get the matrix set up for Geoff? We can rope in other people (Brian, John, ...) as required.

Flags: needinfo?(wcosta)

Wander Lairson Costa

Comment 13

•

6 years ago

I set up a grid with n1-standard-{2,4,8,16} machine types. Each machine type has 4 instances. The worker-types are gce/n1-std-{2,4,8,16}.
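For reference, the grid described above can be written out explicitly. This is only an illustrative Python sketch: `build_grid` and `total_vcpus` are hypothetical names, the worker-type strings follow the gce/n1-std-{N} pattern from this comment, and the only outside fact used is that an n1-standard-N machine type has N vCPUs.

```python
VCPU_SIZES = (2, 4, 8, 16)   # the n1-standard-{2,4,8,16} machine types
INSTANCES_PER_TYPE = 4       # each machine type has 4 instances


def build_grid(sizes=VCPU_SIZES, instances=INSTANCES_PER_TYPE):
    """Expand the brace notation into one entry per machine type."""
    return [
        {
            "machine_type": f"n1-standard-{n}",
            "worker_type": f"gce/n1-std-{n}",
            "vcpus": n,  # n1-standard-N provides N vCPUs
            "instances": instances,
        }
        for n in sizes
    ]


def total_vcpus(grid):
    """Total vCPUs provisioned across every instance in the grid."""
    return sum(entry["vcpus"] * entry["instances"] for entry in grid)


if __name__ == "__main__":
    grid = build_grid()
    for entry in grid:
        print(entry["worker_type"], entry["machine_type"], "x", entry["instances"])
    print("total vCPUs:", total_vcpus(grid))
```

With four instances per type, the whole grid comes to 16 instances.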

Flags: needinfo?(wcosta)

Wander Lairson Costa

Updated

•

6 years ago

Depends on: 1490962

Wander Lairson Costa

Comment 14

•

6 years ago

Update: the worker types were renamed to gecko-t-linux-{2,4,8,16}.

Geoff Brown [:gbrown]

Assignee

Comment 15

•

6 years ago

First attempt with gecko-t-linux-2 is not working: https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=f4793fdf946e8c2d47395c449979d1f14579b422

Flags: needinfo?(wcosta)

Geoff Brown [:gbrown]

Assignee

Comment 16

•

6 years ago

I suppose that might have been affected by bug 1491948. Will re-test when trees re-open.

Flags: needinfo?(wcosta)

Geoff Brown [:gbrown]

Assignee

Comment 17

•

6 years ago

Bug 1492553 is a complication -- some tests are currently perma-fail on mozilla-central. I hadn't realized that before...but I don't think it affects comment 15.

Geoff Brown [:gbrown]

Assignee

Comment 18

•

6 years ago

gecko-t-linux-4 is not fast enough and we see frequent task retries when the emulator fails to start:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&searchStr=android-em-7&revision=0ce5b4e132d2c19c6963919ac65e031399cf2318

gecko-t-linux-8 eliminates most task retries and tests complete, but take more than twice as long to complete as on packet.net:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&searchStr=android-em-7&revision=dfcaf35b6d2d3b2e4d3125a81fe31daab36f2326

gecko-t-linux-16 shows no significant improvement over gecko-t-linux-8:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&searchStr=android-em-7&revision=a6fed1047e0eee01b4cb3192c171bba377486833

gecko-t-linux-32 shows no significant improvement over gecko-t-linux-8:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&searchStr=android-em-7&revision=443111c064cba983cd9e9c1dadbd8154a9456716

For these configurations, it looks like gecko-t-linux-8 is the best we can do; tests pass, but run 2x to 3x as long as they currently do on packet.net -- disappointing.

(I'm only comparing a couple of mochitests and geckoview-junit here, due to bug 1492553.)

Chris Peterson [:cpeterson]

Updated

•

6 years ago

Whiteboard: [geckoview:p2]

Joel Maher ( :jmaher ) (UTC -8)

Comment 19

•

6 years ago

To follow up here: are we not interested in using GCP for geckoview x86-based tests due to the longer runtime compared to packet.net, or are we OK with that?

Geoff Brown [:gbrown]

Assignee

Comment 20

•

6 years ago

The plan is to continue to use packet.net for geckoview x86 tests unless we can improve performance on gcp. :wcosta is investigating to see if performance improvements on gcp are possible.

Chris Cooper [:coop] (he/him)

Comment 21

•

6 years ago

(In reply to Geoff Brown [:gbrown] from comment #20)
> The plan is to continue to use packet.net for geckoview x86 tests unless we
> can improve performance on gcp. :wcosta is investigating to see if
> performance improvements on gcp are possible.

Wander: can you comment on what GCP permutations you've tried so far? (standard vs highmem vs highcpu vs custom vs ...)

From our discussion this morning, you haven't found a winning combination of instance types/specs yet. If we're going to come back to this in the future, we should keep a record of what we've already tried.

Chris Cooper [:coop] (he/him)

Updated

•

6 years ago

Flags: needinfo?(wcosta)

Wander Lairson Costa

Comment 22

•

6 years ago

GCP has a different approach: all these different machine types only change the CPU/memory ratio; they differ only in how much memory you have, on average, per core. Therefore, for the purposes of the speed test, "standard" was all we needed. Given that, n1-standard-8 seems to be the winner, although I didn't check in terms of cost.

Also, I spotted that we were using slow disks; changing to SSD improved IO, but had only limited impact on general performance.

The tests I performed were focused on the Android x86 emulator.
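To make the memory-per-core point concrete, here is a small Python sketch. The per-vCPU figures are taken from Google's published N1 machine-type specs as I understand them (3.75 GB for standard, 6.5 GB for highmem, 0.9 GB for highcpu) and should be checked against the current GCP documentation; the function name `n1_memory_gb` is hypothetical.

```python
# Approximate memory per vCPU for the N1 predefined families, in GB.
# Assumption: values from GCP's published N1 specs; verify before relying on them.
N1_GB_PER_VCPU = {
    "standard": 3.75,
    "highmem": 6.5,
    "highcpu": 0.9,
}


def n1_memory_gb(family, vcpus):
    """Memory for an n1-<family>-<vcpus> machine type under the table above."""
    return N1_GB_PER_VCPU[family] * vcpus


if __name__ == "__main__":
    # Same core count in every row; only the memory column changes.
    for family in N1_GB_PER_VCPU:
        print(f"n1-{family}-8: 8 vCPUs, {n1_memory_gb(family, 8):g} GB")
```

If the emulator workload is CPU-bound rather than memory-bound, varying the family at a fixed vCPU count would not be expected to change test speed much, which is consistent with only testing "standard".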

Flags: needinfo?(wcosta)

Geoff Brown [:gbrown]

Assignee

Updated

•

6 years ago

Blocks: 1498298

Geoff Brown [:gbrown]

Assignee

Updated

•

6 years ago

No longer blocks: 1498298

Geoff Brown [:gbrown]

Assignee

Updated

•

6 years ago

Blocks: 1425322

Geoff Brown [:gbrown]

Assignee

Comment 23

•

6 years ago

We will keep using packet.net for Android x86 emulator tests. gcp is an option, but doesn't provide the same performance.

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → WORKSFORME