[meta] Test overhead reduction
Categories
(Testing :: General, task, P3)
Tracking
(Not tracked)
People
(Reporter: egao, Unassigned)
References
(Depends on 3 open bugs, Blocks 1 open bug)
Details
(Keywords: meta)
This meta-bug deals specifically with the test overhead side of the test/task efficiencies project. For more information on the overall project, please refer to the parent bug.
3. Reduce task overhead
- Can some tasks be combined?
- Improve setup/teardown efficiency
All bugs that block this bug will be related to the overhead work.
Ideas that have not been fully formulated will be saved in this top-level comment.
Ideas
virtualenv
what if we pre-package a virtualenv? Lots of time is spent creating the virtualenv, installing dependencies, and downloading necessary files.
quick check with https://taskcluster-artifacts.net/L9mCyDr5RI2tnohPimuVGA/0/public/logs/live_backing.log
- begin time: 17:32:31
- end time: 17:33:57
- time spent creating virtualenv: 0:01:26
- what if most (if not all) of the pip packages were preinstalled, stored somewhere, and downloaded to the machine? It would have to be faster than downloading and installing the individual packages.
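A rough sketch of the pre-packaging idea (the paths and archive URL are illustrative; note that a plain virtualenv hard-codes its creation path, so it must be restored to the same location or relocated):

```
# Packaging step (hypothetical; run once whenever the archive is refreshed):
python -m virtualenv /builds/worker/venv
/builds/worker/venv/bin/pip install -r requirements.txt
tar -C /builds/worker -czf venv.tar.gz venv
# ...upload venv.tar.gz to whatever storage is chosen...

# Restore step (hypothetical; run per test task instead of building the venv):
curl -L -o venv.tar.gz "$VENV_ARCHIVE_URL"   # $VENV_ARCHIVE_URL is a placeholder
tar -C /builds/worker -xzf venv.tar.gz
. /builds/worker/venv/bin/activate
```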
pip
what can we do to speed up the pip installation process? This route may be more feasible than re-packaging the virtualenv, as it would require only incremental changes to the current system (not an overhaul).
quick check with https://taskcluster-artifacts.net/L9mCyDr5RI2tnohPimuVGA/0/public/logs/live_backing.log
- each pip install takes anywhere between 1 and 15 seconds (longer ones install multiple dependencies)
- there are numerous pip commands
- what if we parallelized the pip installation process? Is this feasible?
- yes, this is feasible. The approach in https://stackoverflow.com/questions/11021130/parallel-pip-install downloads each package in parallel, then installs sequentially. This avoids read-write concurrency while still being faster than a sequential download-and-install. I have done a comparison:
| normal method | concurrent download, sequential install | concurrent download, concurrent install |
|---|---|---|
| real 0m6.392s | real 0m3.831s | real 0m5.679s |
| user 0m3.151s | user 0m7.389s | user 0m3.080s |
| sys 0m1.971s | sys 0m2.357s | sys 0m1.213s |
concurrent download, sequential install PoC: time /bin/sh -c 'cat requirements.txt | xargs -t -n1 -P9 pip download -q -d dist && pip install --no-index --find-links=dist -r requirements.txt'
concurrent download, concurrent install PoC: time xargs -n1 -P8 pip install --no-cache-dir < requirements.txt
normal method: time pip install -r requirements.txt
Tests were conducted on a MacBook Pro (3.5 GHz Core i7, 16 GB DDR3).
Contents of the example requirements.txt:
six
pytest
pyyaml
simplejson
websocket-client
pexpect
nose
requests
- what if we installed some of the common dependencies of the various packages first? Perhaps this would cut down time later on.
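Purely as a sketch of that last idea (the actual shared-dependency set would need to be measured; the packages named here are assumptions):

```
# Hypothetical: install build tooling and widely shared dependencies first,
# so that later installs find them already satisfied.
pip install --upgrade pip setuptools wheel
pip install six requests
pip install -r requirements.txt   # the remaining packages
```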
android reftest startup
something with the Android reftest startup is consuming almost a minute, for uncertain reasons:
[task 2019-04-29T18:14:51.602Z] 18:14:51 INFO - REFTEST INFO | Running with e10s: False
[task 2019-04-29T18:14:52.112Z] 18:14:52 INFO - launch_application: am start -W -n org.mozilla.fennec_aurora/org.mozilla.gecko.BrowserApp -a android.intent.action.VIEW --es env8 R_LOG_LEVEL=6 --es args "-no-remote -profile /sdcard/tests/reftest/profile/" --es env3 MOZ_DISABLE_NONLOCAL_CONNECTIONS=1 --es env2 R_LOG_VERBOSE=1 --es env1 XPCOM_DEBUG_BREAK=stack --es env0 MOZ_CRASHREPORTER=1 --es env7 MOZ_CRASHREPORTER_NO_REPORT=1 --es env6 R_LOG_DESTINATION=stderr --es env5 MOZ_CRASHREPORTER_SHUTDOWN=1 --es env4 MOZ_IN_AUTOMATION=1
[task 2019-04-29T18:15:03.542Z] 18:15:03 INFO - remoteautomation.py | Application pid: 821
[task 2019-04-29T18:15:36.900Z] 18:15:36 INFO - REFTEST INFO | Reading manifest http://10.0.2.2:8854/tests/testing/crashtest/crashtests.list
[task 2019-04-29T18:15:36.901Z] 18:15:36 INFO - REFTEST INFO | Dumping JSON representation of sandbox
[task 2019-04-29T18:15:36.901Z] 18:15:36 INFO - REFTEST INFO | {"isDebugBuild":false,"xulRuntime":{"widgetToolkit":"android","OS":"Android","XPCOMABI":"arm-eabi-gcc3"},"smallScreen":false,"d2d":false,"dwrite":false,"gpuProcess":false,"azureCairo":false,"azureSkia":true,"skiaContent":true,"azureSkiaGL":false,"contentSameGfxBackendAsCanvas":true,"layersGPUAccelerated":true,"d3d11":false,"d3d9":false,"layersOpenGL":true,"webrender":false,"layersOMTC":true,"advancedLayers":false,"layerChecksEnabled":true,"retainedDisplayList":true,"Android":true,"cocoaWidget":false,"gtkWidget":false,"qtWidget":false,"winWidget":false,"is64Bit":false,"transparentScrollbars":false,"AndroidVersion":18,"AddressSanitizer":false,"webrtc":true,"retainedDisplayLists":true,"compareRetainedDisplayLists":false,"skiaPdf":false,"release_or_beta":false,"http":{"userAgent":"Mozilla/5.0 (Android 4.3.1; Mobile; rv:68.0) Gecko/68.0 Firefox/68.0","appName":"Mozilla","appVersion":"5.0","platform":"Android 4.3.1","oscpu":"Linux armv7l","misc":"rv:68.0"},"haveTestPlugin":false,"windowsDefaultTheme":false,"nativeThemePref":false,"gpuProcessForceEnabled":false,"prefs":{},"browserIsRemote":false,"asyncPan":true,"usesRepeatResampling":false,"verify":false,"serviceWorkerE10s":false}
[task 2019-04-29T18:16:08.250Z] 18:16:08 INFO - REFTEST INFO | Running chunk 1 out of 4 chunks. tests 1-941/3725
- why is remoteautomation.py pid detection taking 30 seconds?
- why is dumping the sandbox configuration also taking approximately 30 seconds?
android mozharness download
something is causing mozharness on Android tests to take a long time to download.
quick check with https://taskcluster-artifacts.net/dRr86h91TmqDtd_NFVu-kQ/0/public/logs/live_backing.log
[task 2019-04-29T18:10:01.624Z] + curl --fail -o mozharness.zip --retry 10 -L https://queue.taskcluster.net/v1/task/ZRyfMlZRQ0mGVPbVaLnA6w/artifacts/public/build/mozharness.zip
[task 2019-04-29T18:10:01.630Z] % Total % Received % Xferd Average Speed Time Time Time Current
[task 2019-04-29T18:10:01.630Z] Dload Upload Total Spent Left Speed
[task 2019-04-29T18:10:01.630Z]
[task 2019-04-29T18:10:02.454Z] 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
[task 2019-04-29T18:10:02.454Z] 100 222 100 222 0 0 269 0 --:--:-- --:--:-- --:--:-- 269
[task 2019-04-29T18:10:02.454Z] 100 222 100 222 0 0 269 0 --:--:-- --:--:-- --:--:-- 269
[task 2019-04-29T18:10:03.815Z]
[task 2019-04-29T18:10:04.816Z] 0 0 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0
[task 2019-04-29T18:10:05.817Z] 0 0 0 0 0 0 0 0 --:--:-- 0:00:03 --:--:-- 0
[task 2019-04-29T18:10:06.819Z] 0 0 0 0 0 0 0 0 --:--:-- 0:00:04 --:--:-- 0
[task 2019-04-29T18:10:07.820Z] 0 0 0 0 0 0 0 0 --:--:-- 0:00:05 --:--:-- 0
[task 2019-04-29T18:10:08.821Z] 0 0 0 0 0 0 0 0 --:--:-- 0:00:06 --:--:-- 0
[task 2019-04-29T18:10:09.822Z] 0 0 0 0 0 0 0 0 --:--:-- 0:00:07 --:--:-- 0
[task 2019-04-29T18:10:10.823Z] 0 0 0 0 0 0 0 0 --:--:-- 0:00:08 --:--:-- 0
[task 2019-04-29T18:10:11.824Z] 0 0 0 0 0 0 0 0 --:--:-- 0:00:09 --:--:-- 0
[task 2019-04-29T18:10:12.826Z] 0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
[task 2019-04-29T18:10:13.827Z] 0 0 0 0 0 0 0 0 --:--:-- 0:00:11 --:--:-- 0
[task 2019-04-29T18:10:14.828Z] 0 0 0 0 0 0 0 0 --:--:-- 0:00:12 --:--:-- 0
[task 2019-04-29T18:10:15.829Z] 0 0 0 0 0 0 0 0 --:--:-- 0:00:13 --:--:-- 0
[task 2019-04-29T18:10:16.830Z] 0 0 0 0 0 0 0 0 --:--:-- 0:00:14 --:--:-- 0
[task 2019-04-29T18:10:17.018Z] 0 0 0 0 0 0 0 0 --:--:-- 0:00:15 --:--:-- 0
[task 2019-04-29T18:10:17.018Z] 100 241 100 241 0 0 15 0 0:00:16 0:00:15 0:00:01 57
[task 2019-04-29T18:10:17.138Z]
[task 2019-04-29T18:10:17.138Z] 100 2281k 100 2281k 0 0 147k 0 0:00:15 0:00:15 --:--:-- 147k
Download speed appears to be very slow at ~147 KB/s.
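One mitigation worth investigating (a sketch using standard curl flags, not what automation currently does) is to abort stalled transfers early so that --retry can re-attempt rather than letting a slow connection crawl to completion:

```
# Hypothetical: give up if the transfer averages under 10 KB/s for 30 seconds.
curl --fail -L --retry 10 \
     --speed-limit 10240 --speed-time 30 \
     -o mozharness.zip "$MOZHARNESS_URL"   # $MOZHARNESS_URL is a placeholder
```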
test-linux compiz issue
an issue with compiz complaining about stacking windows that it does not own consumes a lot of precious time.
quick analysis of https://taskcluster-artifacts.net/LDowPchDTJCNcHNUkp8kxw/0/public/logs/live_backing.log
[task 2019-04-29T17:56:07.297Z] 17:56:07 INFO - Fetch https://queue.taskcluster.net/v1/task/Hr49YPvqT82IbHAV--2CYw/artifacts/public/build/target.common.tests.tar.gz into memory
[task 2019-04-29T17:56:08.573Z] 17:56:08 INFO - Content-Length response header: 39769688
[task 2019-04-29T17:56:08.574Z] 17:56:08 INFO - Bytes received: 39769688
[task 2019-04-29T17:56:09.627Z] compiz (core) - Warn: Attempted to restack relative to 0x1400006 which is not a child of the root window or a window compiz owns
[task 2019-04-29T17:56:21.743Z] 17:56:21 INFO - Downloading and extracting to /builds/worker/workspace/build/tests these dirs * from https://queue.taskcluster.net/v1/task/Hr49YPvqT82IbHAV--2CYw/artifacts/public/build/target.mochitest.tests.tar.gz
[task 2019-04-29T17:56:21.743Z] 17:56:21 INFO - retry: Calling fetch_url_into_memory with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/Hr49YPvqT82IbHAV--2CYw/artifacts/public/build/target.mochitest.tests.tar.gz'}, attempt #1
[task 2019-04-29T17:56:21.744Z] 17:56:21 INFO - Fetch https://queue.taskcluster.net/v1/task/Hr49YPvqT82IbHAV--2CYw/artifacts/public/build/target.mochitest.tests.tar.gz into memory
The amount of time not spent executing tests appears to vary from platform to platform.
further chunk reductions
other tests that have not yet been targeted for a chunk revisit are:
- reftests
- mochitest-chrome
extraction of mozinfo.json on windows7-32 platforms
for some reason, the download and extraction of mozinfo.json from the web-platform-tests archive takes minutes on windows7-32:
18:00:52 INFO - Reading from file Z:\task_1556559288\build\target.test_packages.json
18:00:52 INFO - Downloading packages: [u'target.web-platform.tests.tar.gz', u'target.common.tests.tar.gz'] for test suite categories: ['web-platform']
18:00:52 INFO - Downloading and extracting to Z:\task_1556559288\build\tests these dirs mach, bin/*, config/*, mozbase/*, marionette/*, tools/*, web-platform/*, mozpack/*, mozbuild/* from https://queue.taskcluster.net/v1/task/PSe05NHCSmSUzDB0cuSHsQ/artifacts/public/build/target.web-platform.tests.tar.gz
18:00:52 INFO - retry: Calling fetch_url_into_memory with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/PSe05NHCSmSUzDB0cuSHsQ/artifacts/public/build/target.web-platform.tests.tar.gz'}, attempt #1
18:00:52 INFO - Fetch https://queue.taskcluster.net/v1/task/PSe05NHCSmSUzDB0cuSHsQ/artifacts/public/build/target.web-platform.tests.tar.gz into memory
18:01:24 INFO - Content-Length response header: 54812554
18:01:24 INFO - Bytes received: 54812554
18:06:22 INFO - Adding 'mozinfo.json' for extraction from common.tests archive
18:06:22 INFO - Downloading and extracting to Z:\task_1556559288\build\tests these dirs mach, bin/*, config/*, mozbase/*, marionette/*, tools/*, web-platform/*, mozpack/*, mozbuild/*, mozinfo.json from https://queue.taskcluster.net/v1/task/PSe05NHCSmSUzDB0cuSHsQ/artifacts/public/build/target.common.tests.tar.gz
A similar issue is not observed on windows10-64 platforms. However, on each platform the extraction time is consistent across runs:
- windows7: approx. 5 minutes
- windows10: approx. 30 seconds
Comment 1•6 years ago
from comment 0:
virtualenv - seems hard, but will yield wins on ALL test jobs; I think worth investigating
pip - similar to virtualenv; maybe we don't need to do this if we can download a pre-packaged virtualenv
android-reftest-startup - when we are running in fewer chunks on android 7.0, this will have less impact; I vote for holding off on this unless we see similar issues on other platforms
android mozharness download - I suspect mozharness.zip doesn't change much. Is this a download at bitbar or on emulators? Either way we should ensure the artifact download is the same speed on all platforms; maybe we need file caching at bitbar or packet.net. Probably too hard to work on, but more investigation would help quantify the problem.
compiz window stacking - this could be another 15-second win for 50%+ of the jobs we run; I vote for working on this as well.
Comment 2•6 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #1)
android-reftest-startup- when we are running in fewer chunks on android 7.0, this will have less impact, I vote for holding off on this unless we see similar issues on other platforms
More generally, I wouldn't worry about any optimizations for Android 4.3, since that platform has a limited life span now.
Comment 3•6 years ago
Also, be wary of timestamps in Android test logs: keep in mind that the test harness periodically polls the device and dumps results. Cross-reference with the logcat timestamps to see what is happening in real-time on the device.
Comment 4•6 years ago
Simon did some digging into the Android startup, and discovered that the manifest parsing/chunking happens inside the emulator. This is particularly slow in debug builds.
Comment 5•6 years ago
re: pip
In some cases we're running pip twice for the same requirements file: the first time with dependency resolution disabled, and the second time with it enabled. We also have multiple requirements.txt files that each get processed.
I wonder if we could combine the requirements file and do a single pip invocation that installs all of our requirements?
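A rough sketch of that idea (the file names are illustrative, and the naive de-duplication assumes plain one-package-per-line files with no pip directives):

```
# Hypothetical: merge the per-harness requirements files and install once.
cat mozbase_requirements.txt marionette_requirements.txt \
  | sort -u > combined_requirements.txt
pip install -r combined_requirements.txt
```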
Comment 6•6 years ago
one other idea we had was to move away from .tar.bz2 to .tar.zst or .tar.xz. Either is significantly faster than .tar.bz2 to decompress.
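A quick way to compare on an existing archive (a sketch; assumes zstd is available on the worker):

```
# Recompress a bzip2 test archive with zstd, then time decompression of each.
bzcat target.common.tests.tar.bz2 | zstd -o target.common.tests.tar.zst
time tar -xjf target.common.tests.tar.bz2             # bzip2 path (slow)
time zstd -dc target.common.tests.tar.zst | tar -x    # zstd path (much faster)
```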
Comment 7•6 years ago
This needs more testing, but it may be possible to avoid the functools install:
Reporter
Comment 8•6 years ago
Lots of great ideas - thanks!
Comment 9•6 years ago
(In reply to Chris AtLee [:catlee] from comment #4)
Simon did some digging into the Android startup, and discovered that the manifest parsing/chunking happens inside the emulator. This is particularly slow in debug builds.
The main culprits here, and their average time spent generating their chunk:

| Task Name | Task Count | Mean Delay |
|---|---|---|
| test-android-em-4.3-arm7-api-16/debug-jsreftest | 100 | 00:18:07 |
| test-android-em-4.3-arm7-api-16/debug-reftest | 56 | 00:09:17 |
| test-android-em-4.3-arm7-api-16/opt-jsreftest | 40 | 00:05:32 |
| test-android-em-4.3-arm7-api-16/opt-reftest | 28 | 00:02:49 |
Reporter
Comment 10•6 years ago
(In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #9)
(In reply to Chris AtLee [:catlee] from comment #4)
Simon did some digging into the Android startup, and discovered that the manifest parsing/chunking happens inside the emulator. This is particularly slow in debug builds.
The main culprits here, and their average time spent generating their chunk (see the table in comment 9)
Would these no longer be a concern once we remove the Android 4.3 emulator tests? Or does the delay in manifest parsing also occur on newer, hardware-based tests?
Comment 11•6 years ago
Right, they would no longer be a concern. On Android x86 7.0, reftest manifest parsing is fast (5 to 10 seconds). And we don't usually run reftests on hardware.
Comment 12•6 years ago
unrelated to "overhead", web-platform-tests have some tests that are |expected: TIMEOUT|. I often see these as 3-minute timeouts - could we not skip these, or put them in a different job that runs once/day on limited platforms? I see 460 .ini files with TIMEOUT in them; that would be a lot of runtime saved, given that we run wpt on just about every config.
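For reference, a count like that can be reproduced with a grep over the wpt metadata (a sketch; the metadata path is assumed):

```
# Count wpt expectation files that contain an expected TIMEOUT.
grep -rl --include="*.ini" "expected: TIMEOUT" testing/web-platform/meta | wc -l
```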
Reporter
Comment 13•6 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #12)
unrelated to "overhead", web-platform-tests have some tests that are |expected: TIMEOUT|. I often see these as 3 minute timeouts- could we not skip these or put them in a different job that runs once/day on limited platforms? I see 460 .ini files with TIMEOUT in it, that would be a lot of runtime saved given that we run wpt on just about every config.
This would be addressed under the item Reduce task timeouts for this project. I'll go ahead and create a meta bug for that now.
Comment 14•6 years ago
one random idea for pip/virtualenv might be to use virtualenv-clone or look for similar tools.
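virtualenv-clone copies an existing environment and rewrites the absolute paths baked into it; a minimal sketch of how it might be used here (the paths are illustrative):

```
pip install virtualenv-clone
# Clone a pre-built template venv into the per-task workspace, fixing up the
# embedded interpreter and activation paths along the way.
virtualenv-clone /builds/worker/template-venv /builds/worker/workspace/venv
```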
Reporter
Comment 15•6 years ago
Some proof-of-concept work for virtualenv can be seen in this push: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=db724848d1564eeafb1b1b07e1bcd2310c376fd9
I have two avenues to explore.
Approach 1
When the docker container that runs the tests (eg. desktop1604-test) is generated, have a step to:
- create the virtualenv
- install every possible dependency required by all suites of tests
I would envision it slotting in after node is installed, here: https://searchfox.org/mozilla-central/source/taskcluster/docker/desktop1604-test/Dockerfile#35
The idea is that when the tests pull down the docker container and do further setup for the test, the virtualenv is already present and merely needs to be invoked.
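A minimal sketch of that image-build step (the requirements file and paths are assumptions), expressed as the shell a Dockerfile RUN instruction would execute:

```
# Hypothetical image-build step: pre-create the test virtualenv so that test
# tasks only need to activate it.
python -m virtualenv /builds/worker/venv
/builds/worker/venv/bin/pip install -r /builds/worker/test_requirements.txt
```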
Pros
- eliminate the approximately 1:30 spent setting up the virtual environment per chunk (linux64/opt figures, typical test)
- reduce the frequency of virtualenv generation (it currently runs per chunk; this would eliminate that)
Cons
- if the generated docker image grows much larger, the increased bandwidth requirement will reduce the savings
- depending on how often the docker test images are regenerated, this may not offer tangible savings
Approach 2
Run a separate task at specified frequency to:
- create a virtualenv
- archive into a tarball
- upload to storage
This would be run at a weekly frequency (for example), and the test harness would be rewritten to simply download and extract the virtualenv.
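A sketch of what the periodic task might run (the suite name, requirements file, and storage endpoint are all placeholders):

```
# Hypothetical weekly packaging task for a suite-specific virtualenv.
SUITE=mochitest
python -m virtualenv "venv-$SUITE"
"venv-$SUITE/bin/pip" install -r "${SUITE}_requirements.txt"
tar -czf "venv-$SUITE.tar.gz" "venv-$SUITE"
# Upload to whatever storage is chosen ($STORAGE_URL is a placeholder):
curl -X PUT --data-binary "@venv-$SUITE.tar.gz" "$STORAGE_URL/venv-$SUITE.tar.gz"
```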
Pros
- can customize the virtualenv to the specific test suite, which reduces transfer size and time
- can be run even less frequently than docker image regeneration, at much less overhead
Cons
- would involve modifying a critical portion of mozharness and/or test environment setup code
- need to work out a storage solution for the archived virtualenv
- each test will require network transfer, which (depending on the size of compressed virtualenv) may reduce savings
Comment 16•6 years ago
Could we do something like your "Approach 2", except make them like toolchain tasks? The decision task would look at the relevant files in-tree to figure out if the virtualenv archives need to be rebuilt.
Reporter
Comment 17•6 years ago
(In reply to Chris AtLee [:catlee] from comment #16)
Could we do something like your "Approach 2", except make them like toolchain tasks? The decision task would look at the relevant files in-tree to figure out if the virtualenv archives need to be rebuilt.
:catlee - thanks for the feedback - interesting that you prefer the second approach. I was personally leaning towards the first approach; I rationalized that it would eliminate the bandwidth and time required for each chunk to download the generated virtualenv, not to mention that it would be much easier to implement.
As for implementing the second approach, my experience tinkering with and adding new tasks is limited. Is it possible to run an arbitrary script (eg. create a virtualenv) using an existing docker image, and how would I interface with whatever is used to store the compressed archives? Those are the two challenging areas for me.
Comment 18•6 years ago
I think the first approach is nice, but it works only for tests that run in docker? That wouldn't help us with macos or windows tests.
Reporter
Comment 19•6 years ago
Right - it escaped my attention that macosx and windows tests do not run in a container.
I will create a new bug to continue investigation of this task. It would be nice to have this done and save ourselves quite a bit of money and time.
Comment 20•3 years ago
The bug assignee didn't log in to Bugzilla in the last 7 months.
:ahal, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 21•3 years ago
Lots of stuff got fixed here; going to call it done. No one's looking at this specifically anymore, so the tracking bug probably isn't necessary.