Closed Bug 1549555 (task-efficiency-test-overhead) Opened 6 years ago Closed 3 years ago

[meta] Test overhead reduction

Categories: Testing :: General, task, P3
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: egao; Assignee: Unassigned
References: Depends on 3 open bugs, Blocks 1 open bug
Keywords: meta

This meta-bug deals specifically with the test overhead side of the test/task efficiencies project. For more information on the overall project, please refer to the parent bug.

3. Reduce task overhead
 - Can some tasks be combined?
 - Improve setup/teardown efficiency

All bugs that block this bug will be related to the overhead work.

Ideas that have not been fully formulated will be saved in this top-level comment.

Ideas

virtualenv
what if we pre-package a virtualenv? Lots of time is spent creating the virtualenv, installing dependencies, and downloading necessary files.

quick check with https://taskcluster-artifacts.net/L9mCyDr5RI2tnohPimuVGA/0/public/logs/live_backing.log

  • begin time: 17:32:31
  • end time: 17:33:57
  • time spent creating virtualenv: 0:01:26
  1. what if most (if not all) of the pip packages were preinstalled, stored somewhere, and downloaded to the machine? It would have to be faster than downloading and installing each package individually. (a sketch follows)
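
One way to approximate this with stock pip (a sketch, not the in-tree tooling) is to build wheels once, archive them, and have each task install offline from the unpacked archive:

  # one-time packaging step: build wheels for everything in requirements.txt
  pip wheel -r requirements.txt -w wheels/
  tar -czf wheels.tar.gz wheels/

  # per-task step: unpack the archive, then install with no network resolution
  tar -xzf wheels.tar.gz
  pip install --no-index --find-links=wheels -r requirements.txt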

pip
what can we do to speed up the pip installation process? This route may be more feasible than re-packaging the virtualenv, as it would require only incremental changes to the current system (not an overhaul).

quick check with https://taskcluster-artifacts.net/L9mCyDr5RI2tnohPimuVGA/0/public/logs/live_backing.log

  • each pip install takes anywhere from 1 to 15 seconds (the longer ones install multiple dependencies)
  • there are numerous pip commands
  1. what if we parallelize the pip installation process? is this feasible?
  • yes, this is feasible. https://stackoverflow.com/questions/11021130/parallel-pip-install takes the approach of downloading each package in parallel, then installing sequentially. This avoids concurrent writes to the environment while still being faster than sequential download and installation. I have done a comparison:
method                                     real       user       sys
normal method                              0m6.392s   0m3.151s   0m1.971s
concurrent download, sequential install    0m3.831s   0m7.389s   0m2.357s
concurrent download, concurrent install    0m5.679s   0m3.080s   0m1.213s

concurrent download, sequential install PoC: time /bin/sh -c 'cat requirements.txt | xargs -t -n1 -P9 pip download -q -d dist && pip install --no-index --find-links=dist -r requirements.txt'
concurrent download, concurrent install PoC: time xargs -n1 -P8 pip install --no-cache-dir < requirements.txt
normal method: time pip install -r requirements.txt

Tests were conducted on a MacBook Pro (3.5 GHz Core i7, 16 GB DDR3).

Contents of example requirements.txt:

six
pytest
pyyaml
simplejson
websocket-client
pexpect
nose
requests
  1. what if we install some of the common dependencies of the various packages first? Perhaps this will cut down time later on. (see the sketch below)
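
Mechanically, that idea looks like the following (package choice illustrative); whether pip's repeated resolution work on the second pass eats the savings would need measuring:

  # hedged sketch: warm the environment with widely shared dependencies first,
  # so later installs find them already satisfied
  pip install six requests
  pip install -r requirements.txt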

android reftest startup
something in the Android reftest startup is consuming almost a minute, for unclear reasons:

[task 2019-04-29T18:14:51.602Z] 18:14:51     INFO -  REFTEST INFO | Running with e10s: False
[task 2019-04-29T18:14:52.112Z] 18:14:52     INFO -  launch_application: am start -W -n org.mozilla.fennec_aurora/org.mozilla.gecko.BrowserApp -a android.intent.action.VIEW --es env8 R_LOG_LEVEL=6 --es args "-no-remote -profile /sdcard/tests/reftest/profile/" --es env3 MOZ_DISABLE_NONLOCAL_CONNECTIONS=1 --es env2 R_LOG_VERBOSE=1 --es env1 XPCOM_DEBUG_BREAK=stack --es env0 MOZ_CRASHREPORTER=1 --es env7 MOZ_CRASHREPORTER_NO_REPORT=1 --es env6 R_LOG_DESTINATION=stderr --es env5 MOZ_CRASHREPORTER_SHUTDOWN=1 --es env4 MOZ_IN_AUTOMATION=1
[task 2019-04-29T18:15:03.542Z] 18:15:03     INFO -  remoteautomation.py | Application pid: 821
[task 2019-04-29T18:15:36.900Z] 18:15:36     INFO -  REFTEST INFO | Reading manifest http://10.0.2.2:8854/tests/testing/crashtest/crashtests.list
[task 2019-04-29T18:15:36.901Z] 18:15:36     INFO -  REFTEST INFO | Dumping JSON representation of sandbox
[task 2019-04-29T18:15:36.901Z] 18:15:36     INFO -  REFTEST INFO | {"isDebugBuild":false,"xulRuntime":{"widgetToolkit":"android","OS":"Android","XPCOMABI":"arm-eabi-gcc3"},"smallScreen":false,"d2d":false,"dwrite":false,"gpuProcess":false,"azureCairo":false,"azureSkia":true,"skiaContent":true,"azureSkiaGL":false,"contentSameGfxBackendAsCanvas":true,"layersGPUAccelerated":true,"d3d11":false,"d3d9":false,"layersOpenGL":true,"webrender":false,"layersOMTC":true,"advancedLayers":false,"layerChecksEnabled":true,"retainedDisplayList":true,"Android":true,"cocoaWidget":false,"gtkWidget":false,"qtWidget":false,"winWidget":false,"is64Bit":false,"transparentScrollbars":false,"AndroidVersion":18,"AddressSanitizer":false,"webrtc":true,"retainedDisplayLists":true,"compareRetainedDisplayLists":false,"skiaPdf":false,"release_or_beta":false,"http":{"userAgent":"Mozilla/5.0 (Android 4.3.1; Mobile; rv:68.0) Gecko/68.0 Firefox/68.0","appName":"Mozilla","appVersion":"5.0","platform":"Android 4.3.1","oscpu":"Linux armv7l","misc":"rv:68.0"},"haveTestPlugin":false,"windowsDefaultTheme":false,"nativeThemePref":false,"gpuProcessForceEnabled":false,"prefs":{},"browserIsRemote":false,"asyncPan":true,"usesRepeatResampling":false,"verify":false,"serviceWorkerE10s":false}
[task 2019-04-29T18:16:08.250Z] 18:16:08     INFO -  REFTEST INFO | Running chunk 1 out of 4 chunks.  tests 1-941/3725
  1. why is remoteautomation.py pid detection taking 30 seconds?
  2. why is dumping of the sandbox configuration also taking approximately 30 seconds?

android mozharness download
something is causing the mozharness download on Android tests to take a long time.

quick check with https://taskcluster-artifacts.net/dRr86h91TmqDtd_NFVu-kQ/0/public/logs/live_backing.log

[task 2019-04-29T18:10:01.624Z] + curl --fail -o mozharness.zip --retry 10 -L https://queue.taskcluster.net/v1/task/ZRyfMlZRQ0mGVPbVaLnA6w/artifacts/public/build/mozharness.zip
[task 2019-04-29T18:10:01.630Z]   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
[task 2019-04-29T18:10:01.630Z]                                  Dload  Upload   Total   Spent    Left  Speed
[task 2019-04-29T18:10:01.630Z] 
[task 2019-04-29T18:10:02.454Z]   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
[task 2019-04-29T18:10:02.454Z] 100   222  100   222    0     0    269      0 --:--:-- --:--:-- --:--:--   269
[task 2019-04-29T18:10:02.454Z] 100   222  100   222    0     0    269      0 --:--:-- --:--:-- --:--:--   269
[task 2019-04-29T18:10:03.815Z] 
[task 2019-04-29T18:10:04.816Z]   0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
[task 2019-04-29T18:10:05.817Z]   0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0
[task 2019-04-29T18:10:06.819Z]   0     0    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0
[task 2019-04-29T18:10:07.820Z]   0     0    0     0    0     0      0      0 --:--:--  0:00:05 --:--:--     0
[task 2019-04-29T18:10:08.821Z]   0     0    0     0    0     0      0      0 --:--:--  0:00:06 --:--:--     0
[task 2019-04-29T18:10:09.822Z]   0     0    0     0    0     0      0      0 --:--:--  0:00:07 --:--:--     0
[task 2019-04-29T18:10:10.823Z]   0     0    0     0    0     0      0      0 --:--:--  0:00:08 --:--:--     0
[task 2019-04-29T18:10:11.824Z]   0     0    0     0    0     0      0      0 --:--:--  0:00:09 --:--:--     0
[task 2019-04-29T18:10:12.826Z]   0     0    0     0    0     0      0      0 --:--:--  0:00:10 --:--:--     0
[task 2019-04-29T18:10:13.827Z]   0     0    0     0    0     0      0      0 --:--:--  0:00:11 --:--:--     0
[task 2019-04-29T18:10:14.828Z]   0     0    0     0    0     0      0      0 --:--:--  0:00:12 --:--:--     0
[task 2019-04-29T18:10:15.829Z]   0     0    0     0    0     0      0      0 --:--:--  0:00:13 --:--:--     0
[task 2019-04-29T18:10:16.830Z]   0     0    0     0    0     0      0      0 --:--:--  0:00:14 --:--:--     0
[task 2019-04-29T18:10:17.018Z]   0     0    0     0    0     0      0      0 --:--:--  0:00:15 --:--:--     0
[task 2019-04-29T18:10:17.018Z] 100   241  100   241    0     0     15      0  0:00:16  0:00:15  0:00:01    57
[task 2019-04-29T18:10:17.138Z] 
[task 2019-04-29T18:10:17.138Z] 100 2281k  100 2281k    0     0   147k      0  0:00:15  0:00:15 --:--:--  147k

The download speed appears to be very slow at ~147 kB/s.

test-linux compiz issue
an issue with compiz complaining about restacking windows that it does not own consumes a lot of time.

quick analysis of https://taskcluster-artifacts.net/LDowPchDTJCNcHNUkp8kxw/0/public/logs/live_backing.log

[task 2019-04-29T17:56:07.297Z] 17:56:07     INFO - Fetch https://queue.taskcluster.net/v1/task/Hr49YPvqT82IbHAV--2CYw/artifacts/public/build/target.common.tests.tar.gz into memory
[task 2019-04-29T17:56:08.573Z] 17:56:08     INFO - Content-Length response header: 39769688
[task 2019-04-29T17:56:08.574Z] 17:56:08     INFO - Bytes received: 39769688
[task 2019-04-29T17:56:09.627Z] compiz (core) - Warn: Attempted to restack relative to 0x1400006 which is not a child of the root window or a window compiz owns
[task 2019-04-29T17:56:21.743Z] 17:56:21     INFO - Downloading and extracting to /builds/worker/workspace/build/tests these dirs * from https://queue.taskcluster.net/v1/task/Hr49YPvqT82IbHAV--2CYw/artifacts/public/build/target.mochitest.tests.tar.gz
[task 2019-04-29T17:56:21.743Z] 17:56:21     INFO - retry: Calling fetch_url_into_memory with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/Hr49YPvqT82IbHAV--2CYw/artifacts/public/build/target.mochitest.tests.tar.gz'}, attempt #1
[task 2019-04-29T17:56:21.744Z] 17:56:21     INFO - Fetch https://queue.taskcluster.net/v1/task/Hr49YPvqT82IbHAV--2CYw/artifacts/public/build/target.mochitest.tests.tar.gz into memory

The amount of time not spent executing tests appears to vary from platform to platform.

further chunk reductions
other tests that have not yet been targeted for a chunk revisit are:

  • reftests
  • mochitest-chrome

extraction of mozinfo.json on windows7-32 platforms
for some reason, the download and extraction of mozinfo.json from the web-platform-tests archive takes minutes on windows7-32:

18:00:52     INFO - Reading from file Z:\task_1556559288\build\target.test_packages.json
18:00:52     INFO - Downloading packages: [u'target.web-platform.tests.tar.gz', u'target.common.tests.tar.gz'] for test suite categories: ['web-platform']
18:00:52     INFO - Downloading and extracting to Z:\task_1556559288\build\tests these dirs mach, bin/*, config/*, mozbase/*, marionette/*, tools/*, web-platform/*, mozpack/*, mozbuild/* from https://queue.taskcluster.net/v1/task/PSe05NHCSmSUzDB0cuSHsQ/artifacts/public/build/target.web-platform.tests.tar.gz
18:00:52     INFO - retry: Calling fetch_url_into_memory with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/PSe05NHCSmSUzDB0cuSHsQ/artifacts/public/build/target.web-platform.tests.tar.gz'}, attempt #1
18:00:52     INFO - Fetch https://queue.taskcluster.net/v1/task/PSe05NHCSmSUzDB0cuSHsQ/artifacts/public/build/target.web-platform.tests.tar.gz into memory
18:01:24     INFO - Content-Length response header: 54812554
18:01:24     INFO - Bytes received: 54812554
18:06:22     INFO - Adding 'mozinfo.json' for extraction from common.tests archive
18:06:22     INFO - Downloading and extracting to Z:\task_1556559288\build\tests these dirs mach, bin/*, config/*, mozbase/*, marionette/*, tools/*, web-platform/*, mozpack/*, mozbuild/*, mozinfo.json from https://queue.taskcluster.net/v1/task/PSe05NHCSmSUzDB0cuSHsQ/artifacts/public/build/target.common.tests.tar.gz

A similar issue is not observed on windows10-64 platforms. However, the extraction time is consistent on each platform:

  • windows7: approx. 5 minutes
  • windows10: approx. 30 seconds

from comment 0:

virtualenv - seems hard, but will yield wins on ALL test jobs; I think it's worth investigating
pip - similar to virtualenv; maybe we don't need to do this if we can download a pre-packaged virtualenv

android-reftest-startup - when we are running in fewer chunks on android 7.0, this will have less impact; I vote for holding off on this unless we see similar issues on other platforms

android mozharness download - I suspect mozharness.zip doesn't change much. Is this a download at bitbar or on the emulators? Either way we should ensure the artifact download is the same speed on all platforms; maybe we need file caching at bitbar or packet.net. Probably too hard to work on, but more investigation would help quantify the problem.

compiz window stacking - this could be another 15-second win for 50%+ of the jobs we run; I vote for working on this as well.

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #1)

android-reftest-startup - when we are running in fewer chunks on android 7.0, this will have less impact; I vote for holding off on this unless we see similar issues on other platforms

More generally, I wouldn't worry about any optimizations for Android 4.3, since that platform has a limited life span now.

Also, be wary of timestamps in Android test logs: keep in mind that the test harness periodically polls the device and dumps results. Cross-reference with the logcat timestamps to see what is happening in real-time on the device.

Simon did some digging into the Android startup, and discovered that the manifest parsing/chunking happens inside the emulator. This is particularly slow in debug builds.

Flags: needinfo?(sfraser)

re: pip

in some cases we're running pip twice for the same requirements file: the first time with dependency resolution disabled, and the second time with it enabled. We also have multiple requirements.txt files that each get processed.

I wonder if we could combine the requirements files and do a single pip invocation that installs all of our requirements?
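
As a minimal sketch (file names illustrative), pip accepts multiple -r flags, so the separate files could be resolved and installed in one invocation:

  # resolve and install several requirements files in a single pip run
  pip install -r mozbase_requirements.txt -r marionette_requirements.txt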

one other idea we had was to move away from .tar.bz2 to .tar.zst or .tar.xz. Either is significantly faster than .tar.bz2 to decompress.
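
A rough way to quantify that locally (archive names illustrative; --zstd needs GNU tar 1.31 or newer):

  time tar -xjf target.common.tests.tar.bz2       # bzip2: slowest to decompress
  time tar -xJf target.common.tests.tar.xz        # xz: decompresses faster than bzip2
  time tar --zstd -xf target.common.tests.tar.zst # zstd: typically fastest by a wide margin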

This needs more testing, but it may be possible to avoid the functools install:

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=1ed90dd67e94ca57378cfbda129110253e3e5581

Lots of great ideas - thanks!

(In reply to Chris AtLee [:catlee] from comment #4)

Simon did some digging into the Android startup, and discovered that the manifest parsing/chunking happens inside the emulator. This is particularly slow in debug builds.

The main culprits here, and their average time spent generating their chunk:

Task Name                                        Task Count  Mean Delay
test-android-em-4.3-arm7-api-16/debug-jsreftest  100         0:18:07
test-android-em-4.3-arm7-api-16/debug-reftest    56          0:09:17
test-android-em-4.3-arm7-api-16/opt-jsreftest    40          0:05:32
test-android-em-4.3-arm7-api-16/opt-reftest      28          0:02:49
Flags: needinfo?(sfraser)

(In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #9)

(In reply to Chris AtLee [:catlee] from comment #4)

Simon did some digging into the Android startup, and discovered that the manifest parsing/chunking happens inside the emulator. This is particularly slow in debug builds.

The main culprits here, and their average time spent generating their chunk:

Task Name                                        Task Count  Mean Delay
test-android-em-4.3-arm7-api-16/debug-jsreftest  100         0:18:07
test-android-em-4.3-arm7-api-16/debug-reftest    56          0:09:17
test-android-em-4.3-arm7-api-16/opt-jsreftest    40          0:05:32
test-android-em-4.3-arm7-api-16/opt-reftest      28         0:02:49

Would these no longer be a concern once we remove the Android 4.3 emulator tests? Or does the delay in manifest parsing also occur on the newer, hardware-based tests?

Right, they would no longer be a concern. On Android x86 7.0, reftest manifest parsing is fast (5 to 10 seconds). And we don't usually run reftests on hardware.

unrelated to "overhead", web-platform-tests have some tests that are |expected: TIMEOUT|. I often see these as 3 minute timeouts- could we not skip these or put them in a different job that runs once/day on limited platforms? I see 460 .ini files with TIMEOUT in it, that would be a lot of runtime saved given that we run wpt on just about every config.

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #12)

unrelated to "overhead", web-platform-tests have some tests that are |expected: TIMEOUT|. I often see these as 3 minute timeouts- could we not skip these or put them in a different job that runs once/day on limited platforms? I see 460 .ini files with TIMEOUT in it, that would be a lot of runtime saved given that we run wpt on just about every config.

This would be addressed under the "Reduce task timeouts" item for this project. I'll go ahead and create a meta bug for that now.

one random idea for pip/virtualenv might be to use virtualenv-clone or look for similar tools.

Some proof of concept work for virtualenv can be seen in this push: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=db724848d1564eeafb1b1b07e1bcd2310c376fd9

I have two avenues to explore.

Approach 1

When the docker images that run the tests (e.g. desktop1604-test) are generated, add a step to:

  • create the virtualenv
  • install every possible dependency required by all suites of tests

I would envision it slotting in after node is installed, here: https://searchfox.org/mozilla-central/source/taskcluster/docker/desktop1604-test/Dockerfile#35

The idea is that when a test task pulls down the docker image and does further setup, the virtualenv is already present and merely needs to be invoked.
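
A minimal sketch of what that image-build step might run, assuming the worker home these images already use; the requirements file name is hypothetical:

  # would become RUN lines in the Dockerfile: bake a virtualenv with the
  # union of all test-suite dependencies into the image
  virtualenv /builds/worker/venv
  /builds/worker/venv/bin/pip install -r all_test_requirements.txt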

Pros

  • eliminate the approximately 1:30 spent setting up the virtual environment per chunk (linux64/opt figures, typical test)
  • reduce the frequency of virtualenv generation (it currently runs per chunk; this would eliminate that)

Cons

  • if the generated docker image grows much larger, the increased bandwidth requirement will reduce the savings
  • depending on how often the docker test images are regenerated, this may not offer tangible savings

Approach 2

Run a separate task at a specified frequency to:

  • create a virtualenv
  • archive into a tarball
  • upload to storage

This would run at a weekly frequency (for example), and the test harness would be rewritten to simply download and extract the virtualenv.
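
A sketch of the two halves under stated assumptions ($VENV_ARCHIVE_URL and the upload step are hypothetical placeholders for whatever storage gets chosen):

  # producer task (e.g. weekly): build and archive the virtualenv
  virtualenv /builds/worker/venv
  /builds/worker/venv/bin/pip install -r requirements.txt
  tar -C /builds/worker --zstd -cf venv.tar.zst venv
  # ...then upload venv.tar.zst to the chosen storage

  # consumer side (per chunk): download and unpack instead of rebuilding
  curl -L --retry 10 -o venv.tar.zst "$VENV_ARCHIVE_URL"
  tar -C /builds/worker --zstd -xf venv.tar.zst

One caveat worth noting: a virtualenv embeds absolute paths in its scripts, so the archive would have to be unpacked at the same path it was built at (or be relocated after extraction).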

Pros

  • can customize the virtualenv for the specific test suite, which reduces transfer size and time
  • can be run even less frequently than docker image regeneration, at much lower overhead

Cons

  • would involve modifying a critical portion of mozharness and/or the test environment setup code
  • need to work out a storage solution for the archived virtualenv
  • each test will require a network transfer, which (depending on the size of the compressed virtualenv) may reduce the savings
Alias: task-efficiency-test-overhead

Could we do something like your "Approach 2", except make them like toolchain tasks? The decision task would look at the relevant files in-tree to figure out if the virtualenv archives need to be rebuilt.
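
A hypothetical sketch of that decision logic (the glob and helper names are made up for illustration):

  # derive a cache key from the in-tree files that define the virtualenv,
  # and rebuild the archive only when that key changes
  key=$(cat testing/config/*requirements*.txt | sha256sum | cut -c1-16)
  if ! artifact_exists "venv-linux64-$key.tar.zst"; then  # artifact_exists: hypothetical helper
    schedule_virtualenv_rebuild "$key"                    # hypothetical helper
  fi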

(In reply to Chris AtLee [:catlee] from comment #16)

Could we do something like your "Approach 2", except make them like toolchain tasks? The decision task would look at the relevant files in-tree to figure out if the virtualenv archives need to be rebuilt.

:catlee - thanks for the feedback - interesting that you prefer the second approach. I was personally leaning towards the first approach; I rationalized it thinking it would eliminate the bandwidth and time required for each chunk to download the generated virtualenv. Not to mention that it would be much easier to implement.

To implement the second approach, though, my experience with tinkering and adding new tasks is limited. Is it possible to run an arbitrary script (e.g. one that creates a virtualenv) using an existing docker image, and how would I interface with whatever is used to store the compressed archives? Those are the two areas of challenge for me in implementing the change.

Flags: needinfo?(catlee)

I think the first approach is nice, but it works only for tests that run in docker? That wouldn't help us with the macOS or Windows tests.

Flags: needinfo?(catlee)

Right - it escaped my attention that macosx and windows tests do not run in a container.

I will create a new bug to continue investigation of this task. It would be nice to have this done and save ourselves quite a bit of money and time.

Assignee: nobody → egao
Depends on: 1644875

The bug assignee didn't log in to Bugzilla in the last 7 months.
:ahal, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: egao → nobody
Flags: needinfo?(ahal)

Lots of stuff got fixed here; going to call it done. No one's looking at this specifically anymore, so a tracking bug probably isn't necessary.

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(ahal)
Resolution: --- → FIXED