Closed Bug 1489263 Opened 6 years ago Closed 6 years ago

Investigate running aws-based Android emulator unit tests on GCP

Categories

(Testing :: General, enhancement, P1)

Version 3
enhancement

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: gbrown, Assigned: gbrown)

Details

(Whiteboard: [geckoview:p2])

Building on wcosta's foundation, let's see how well Android 4.2/4.3 tests run on GCP.
As seen in comment 1, aws-based Android emulator tests ("Android 4.2 x86 opt" and "Android 4.3 API16+ opt") generally work on GCP, but there are some minor problems. The emulator starts without trouble and most tests run fine. There are a few extra test failures, which might be avoided with test manifest updates, or by increasing timeouts. There are also some task timeouts, and it appears that almost all of the tasks run slower than on aws. Some tasks appear to require 2x the time we normally see on aws. On aws, Android emulator tests normally run on gecko-t-linux-xlarge (c3.xlarge and m3.xlarge). The gce/gecko-t-linux tasks seem to find lots of memory but slower cpu; is there an option on gcp for a bit more cpu speed?
Flags: needinfo?(wcosta)
Priority: -- → P1
I have enabled nested-vm in the instances, could you please run the tests again and see how it goes?
Flags: needinfo?(wcosta) → needinfo?(gbrown)
The nested-vm change made no apparent difference for the aws-based Android emulator (armv7) tests (but was great for x86 - bug 1489264). I also re-tested with gecko-t-linux-{2,4,8,16} -- https://bugzilla.mozilla.org/show_bug.cgi?id=1489264#c14 -- but these did not provide sufficient performance for the armv7 tests: Everything takes at least 2x as long, tests and tasks time out. https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=fc7eff5c917b5150e47a7feb143ab81c7adcd7ee
Flags: needinfo?(gbrown)
(In reply to Geoff Brown [:gbrown] from comment #4)
> I also re-tested with gecko-t-linux-{2,4,8,16} --
> https://bugzilla.mozilla.org/show_bug.cgi?id=1489264#c14 -- but these did
> not provide sufficient performance for the armv7 tests: Everything takes at
> least 2x as long, tests and tasks time out.
Sorry, that's not accurate.
gecko-t-linux-2 and gecko-t-linux-4 are insufficient. Preliminary results suggest gecko-t-linux-8 and gecko-t-linux-16 provide similar performance to our existing aws configuration. I'm waiting for tests to complete...will update tomorrow.
(In reply to Geoff Brown [:gbrown] from comment #5)
> (In reply to Geoff Brown [:gbrown] from comment #4)
> > I also re-tested with gecko-t-linux-{2,4,8,16} --
> > https://bugzilla.mozilla.org/show_bug.cgi?id=1489264#c14 -- but these did
> > not provide sufficient performance for the armv7 tests: Everything takes at
> > least 2x as long, tests and tasks time out.
>
> Sorry, that's not accurate.
>
> gecko-t-linux-2 and gecko-t-linux-4 are insufficient. Preliminary results
> suggest gecko-t-linux-8 and gecko-t-linux-16 provide similar performance to
> our existing aws configuration. I'm waiting for tests to complete...will
> update tomorrow.
I added the gecko-t-linux-{32, 64} worker types, as we still didn't saturate performance regarding the number of CPUs. The arm performance is expected, as the native machines run on x86 hardware. For performance boost on that, I believe we will have to stick with packet.
I believe :gbrown is comparing the performance of arm emulation on AWS vs arm emulation on GCP and noting that GCP is 2x slower than AWS. All we run in packet.net is x86 emulators to get coverage on geckoview which luckily enough is so fast it is like running on a desktop.
(In reply to Wander Lairson Costa [:wcosta] from comment #6)
> I added the gecko-t-linux-{32, 64} worker types, as we still didn't saturate
> performance regarding the number of CPUs. The arm performance is expected,
> as the native machines run on x86 hardware. For performance boost on that, I
> believe we will have to stick with packet.
Are there other variables we can tweak here? Do we understand why packet.net is so fast comparatively?
:coop, this bug is about comparing armv7 emulators, which currently run on AWS, vs running them on GCP.
I have more comprehensive test results now and things are making sense.
As reported earlier, on gecko-t-linux-2 and gecko-t-linux-4, Android armv7 firefox tests run significantly slower than on the current aws xlarge instances, resulting in test and task failures.
On gecko-t-linux-8, Android armv7 firefox tests run in about the same time and with very similar test results to aws xlarge instances. https://treeherder.mozilla.org/#/jobs?repo=try&revision=dfcaf35b6d2d3b2e4d3125a81fe31daab36f2326
On gecko-t-linux-16, Android armv7 firefox tests run in about the same time and with very similar test results to aws xlarge instances. There is very little (possibly no) improvement with gecko-t-linux-16 over gecko-t-linux-8: It looks like ***gecko-t-linux-8*** is the sensible choice. https://treeherder.mozilla.org/#/jobs?repo=try&revision=a6fed1047e0eee01b4cb3192c171bba377486833
Since gecko-t-linux-16 showed no improvement, I won't try gecko-t-linux-{32, 64} (very sorry for the misunderstanding Wander -- I certainly didn't mean to waste your time!)
Can someone comment on the expected cost comparison of gcp/gecko-t-linux-8 vs aws/xlarge? Are we on the right track here?
wcosta, can you verify: are all of these configurations running 4 containers per instance?
I will comment on the x86 emulator/geckoview/packet.net comparison in bug 1489264.
(In reply to Geoff Brown [:gbrown] from comment #10)
> Since gecko-t-linux-16 showed no improvement, I won't try gecko-t-linux-{32,
> 64} (very sorry for the misunderstanding Wander -- I certainly didn't mean
> to waste your time!)
No problem, it didn't take more than a couple of minutes to set it up.
> Can someone comment on the expected cost comparison of gcp/gecko-t-linux-8
> vs aws/xlarge? Are we on the right track here?
>
> wcosta, can you verify: are all of these configurations running 4 containers
> per instance?
Yep.
> I will comment on the x86 emulator/geckoview/packet.net comparison in bug
> 1489264.
Whiteboard: [geckoview:p2]
Is there any more work to do here? The bug title says "investigate", and it looks as if that is done. Possibly there is another round discussed outside of the bug?
I'm not actively working on this. I think it is basically complete. One last issue: I would like to understand the cost comparison, at least in approximate terms. :coop - Do you know, if we run 4 containers per gecko-t-linux-8 on gcp and those tests run in about the same time as they do on aws gecko-t-linux-xlarge (c3.xlarge or m3.xlarge), how do the $ costs compare?
Flags: needinfo?(coop)
We don't have much/any experience with GCP billing yet. I'll dig up the baseline numbers for AWS instance types so we can do the comparison as the billing numbers from GCP become available.
Flags: needinfo?(coop)
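For reference, the cost question above reduces to simple arithmetic once hourly prices are known. The following is only a rough sketch, not anything from the bug: the function names are hypothetical, the prices in the usage example are placeholders (no real GCP or AWS prices were available in this discussion), and it assumes on-demand hourly billing with all 4 containers kept busy, ignoring idle time, storage, and network charges.

```python
def cost_per_task(hourly_price_usd, task_minutes, containers_per_instance=4):
    """Approximate $ cost of one task on an instance shared by several containers.

    Assumes every container on the instance is busy for the whole period,
    so the instance price is split evenly between them.
    """
    if containers_per_instance <= 0:
        raise ValueError("containers_per_instance must be positive")
    instance_cost = hourly_price_usd * (task_minutes / 60.0)
    return instance_cost / containers_per_instance


def cost_ratio(gcp_hourly, aws_hourly, gcp_minutes, aws_minutes, containers=4):
    """Ratio of GCP to AWS cost per task; values above 1.0 mean GCP costs more."""
    gcp = cost_per_task(gcp_hourly, gcp_minutes, containers)
    aws = cost_per_task(aws_hourly, aws_minutes, containers)
    return gcp / aws


if __name__ == "__main__":
    # PLACEHOLDER prices: substitute real billing numbers when available.
    gcp_hourly = 1.00   # hypothetical $/hour for gecko-t-linux-8
    aws_hourly = 1.00   # hypothetical $/hour for c3.xlarge or m3.xlarge
    print(cost_ratio(gcp_hourly, aws_hourly, gcp_minutes=30, aws_minutes=30))
```

If the tests really do run in about the same time on both platforms with the same container count, the ratio collapses to the ratio of the two hourly prices, so the instance price is the only number still needed.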
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME