Intermittent [taskcluster:error] Task timeout after 3600 seconds. Force killing container. / [taskcluster:error] Task timeout after 5400 seconds. Force killing container. / [taskcluster:error] Task timeout after 7200 seconds. Force killing container.
Categories
(Testing :: General, defect, P3)
Tracking
(Not tracked)
People
(Reporter: gbrown, Unassigned, NeedInfo)
References
(Depends on 5 open bugs)
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell infra] [see summary at comment 278] [relops-android])
+++ This bug was initially created as a clone of Bug #1204281 +++ There is considerable history in bug 1204281; I have cloned and closed that bug because it was simply too big -- too many comments, too much ancient history. I expect these failures to continue. Task timeouts are the last resort whenever a task is hung or otherwise runs for too long. The immediate goal is to address timeouts which recur in the same task frequently.
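The "last resort" role of a task timeout can be illustrated with a small watchdog. This is a minimal sketch of the idea only, not Taskcluster's implementation: the helper name `run_with_max_run_time` is hypothetical, and returning -1 on timeout is an assumption chosen to mirror the "exit code: -1" lines seen in the logs in this bug.

```python
import subprocess
import sys


def run_with_max_run_time(cmd, max_run_time):
    """Run cmd and return (exit_code, timed_out).

    Hypothetical helper for illustration; max_run_time is in seconds.
    """
    try:
        completed = subprocess.run(cmd, timeout=max_run_time)
        return completed.returncode, False
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising,
        # which plays the role of "Force killing container" here.
        return -1, True


if __name__ == "__main__":
    # A command that would sleep for 5 seconds, capped at half a second.
    sleeper = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_max_run_time(sleeper, max_run_time=0.5))
```

Raising the limit, as later pushes in this bug do for some Android tasks, only helps tasks that are slow; a task that is genuinely hung will hit any limit.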
Reporter | ||
Updated•7 years ago
Reporter | ||
Comment 2•7 years ago
Bug 1339568 is the single biggest cause of these failures.
Comment hidden (Intermittent Failures Robot) |
Comment 4•7 years ago
Waiting for updates on bug 1339568, which depends on bug 1357466, which has seen no progress in recent months.
Reporter | ||
Comment 7•7 years ago
Bug 1339568 remains the biggest contributor to the failures reported in this bug.
Comment 9•7 years ago
Update: there have been 65 failures in the last 7 days.
Reporter | ||
Comment 12•7 years ago
Some tests have been disabled in bug 1339568, reducing mochitest-media-e10s-2 shutdown hangs considerably. I expect to still see some intermittent failures reported here from other causes.
Comment 14•7 years ago
In the last week there have been 31 total failures. The failures occur on debug and opt build types and on the following platforms:
- android-4-3-armv7-api16: 14
- android-api-16-gradle: 5
- Android 4.2 x86: 2
- Linux x64: 2
- linux64-stylo-disabled: 2
- android-4-0-armv7-api16: 1
- Linux: 1
- linux32-stylo-disabled: 1
- linux64-ccov: 1
- linux64-qr: 1
- windows-mingw32-32: 1

Here is a recent log file: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=150857059&lineNumber=9563

And a relevant snippet from it:
[task 2017-12-08T21:22:31.523Z] 21:22:31 INFO - make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/xpcom/build'
[task 2017-12-08T21:22:31.524Z] 21:22:31 INFO - make[4]: Entering directory '/builds/worker/workspace/build/src/obj-firefox/xpcom/build'
[task 2017-12-08T21:22:31.557Z] 21:22:31 INFO - make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/xpcom/build'

[taskcluster:error] Task timeout after 7200 seconds. Force killing container.
[taskcluster 2017-12-08 21:22:56.375Z] === Task Finished ===
[taskcluster 2017-12-08 21:22:56.383Z] Unsuccessful task run with exit code: -1 completed in 7596.094 seconds

:gbrown, any updates on this?
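When triaging logs like the one in this comment, it can help to pull out the configured limit and the reported total run time together. The following is a rough sketch: the function name `timeout_overshoot` is made up for illustration, and the regular expressions assume the exact wording of the two log lines quoted in this bug. How much of the gap is setup time spent outside the timeout window is not something the log excerpt tells us.

```python
import re

# Patterns assume the exact wording of the log lines quoted in this bug.
TIMEOUT_RE = re.compile(r"\[taskcluster:error\] Task timeout after (\d+) seconds")
DURATION_RE = re.compile(r"completed in ([\d.]+) seconds")


def timeout_overshoot(log_text):
    """Return (limit, total, total - limit) in seconds, or None if not found."""
    timeout_match = TIMEOUT_RE.search(log_text)
    duration_match = DURATION_RE.search(log_text)
    if not timeout_match or not duration_match:
        return None
    limit = int(timeout_match.group(1))
    total = float(duration_match.group(1))
    return limit, total, round(total - limit, 3)


# Log tail copied from the comment above.
sample = """[taskcluster:error] Task timeout after 7200 seconds. Force killing container.
[taskcluster 2017-12-08 21:22:56.375Z] === Task Finished ===
[taskcluster 2017-12-08 21:22:56.383Z] Unsuccessful task run with exit code: -1 completed in 7596.094 seconds"""

if __name__ == "__main__":
    print(timeout_overshoot(sample))
```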
Reporter | ||
Comment 15•7 years ago
The latest increase in frequency is due to failing Android builds. I cannot tell whether that is an ongoing problem or a temporary glitch, but I will continue to monitor.
Reporter | ||
Comment 18•7 years ago
Most recent failures have been Android tests, for a variety of reasons. Some show intermittent download failures/retries -- hopefully a temporary condition. I don't see enough consistency to warrant further investigation right now.
Reporter | ||
Comment 23•6 years ago
Last week there were a variety of tasks failing here. The biggest group is linux32/debug jsreftests -- bug 1430668.
Reporter | ||
Comment 25•6 years ago
Some recent android opt mochitest failures here - bug 1433560 should help. Also some test-verify failures - bug 1431125.
Reporter | ||
Updated•6 years ago
Reporter | ||
Comment 30•6 years ago
Recent failures seem to be mostly bug 1321605 -- probably nothing more I can do there. There may also be some fallout from android mochitest run-by-manifest -- keeping an eye on that.
Comment 35•6 years ago
There are 83 failures in the past 7 days. Platforms: most of them are on android-4-3-armv7-api16 debug/opt and linux64-jsdcov opt, with a few occurrences on Linux/Linux x64 debug, linux32-stylo-disabled debug, linux64-qr debug and windows10-64-ccov debug.

Recent failure log example: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=165586428&lineNumber=5170877
Reporter | ||
Comment 38•6 years ago
(In reply to OrangeFactor Robot from comment #37)
> Platform breakdown:
> * linux64-jsdcov: 91
> * android-4-3-armv7-api16: 11

The linux64-jsdcov problem is apparent here -- working on that in bug 1442823.
Reporter | ||
Updated•6 years ago
Comment 42•6 years ago
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c291e7dfe010
Increase taskcluster max-run-time for Android mochitest-chrome and Android/opt mochitest; r=me,a=test-only
Comment 44•6 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/c291e7dfe010
Comment 48•6 years ago
There are 105 failures in the past week. Platforms:
- Android 4.2 x86 opt
- android-4-0-armv7-api16 opt and debug
- android-4-3-armv7-api16 opt and debug
- android-5-0-aarch64 opt
- Linux opt and debug (many occurrences)
- Linux x64 opt, debug and asan (many occurrences)
- linux32-stylo-disabled debug
- linux64-ccov opt
- linux64-jsdcov opt (many occurrences)
- linux64-noopt debug
- linux64-qr opt and debug
- linux64-stylo-disabled opt and debug
- windows10-64-ccov debug

Recent log failure: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=168753923&lineNumber=1537

Relevant part of the log:
[task 2018-03-17T23:01:53.204Z] 23:01:53 INFO - Successfully installed browsermob-proxy-0.6.0 firefox-puppeteer-52.1.0 marionette-driver-2.5.0 marionette-harness-4.3.0 wptserve-1.4.0
[task 2018-03-17T23:01:53.253Z] 23:01:53 INFO - Return code: 0
[task 2018-03-17T23:01:53.255Z] 23:01:53 INFO - Installing None into virtualenv /builds/worker/workspace/build/venv
[task 2018-03-17T23:01:53.256Z] 23:01:53 INFO - error resolving pypi.pvt.build.mozilla.org (ignoring):
[taskcluster:error] Task timeout after 5400 seconds. Force killing container.
[task 2018-03-17T23:01:53.260Z] 23:01:53 INFO - retry: Calling run_command with args: [['/builds/worker/workspace/build/venv/bin/pip', 'install', '--timeout', '120', '-r', '/builds/worker/workspace/build/tests/config/marionette_requirements.txt', '--no-index', '--find-links', 'http://pypi.pub.build.mozilla.org/pub', '--trusted-host', 'pypi.pub.build.mozilla.org']], kwargs: {'error_level': 'warning', 'error_list': [{'substr': 'not found or a compiler error:', 'level': 'warning'}, {'regex': <_sre.SRE_Pattern object at 0x7f7cc33d4490>, 'level': 'error'}, {'regex': <_sre.SRE_Pattern object at 0x7f7cc3490300>, 'level': 'warning'}, {'regex': <_sre.SRE_Pattern object at 0x158aa80>, 'level': 'debug'}, {'substr': 'command not found', 'level': 'error'}, {'regex': <_sre.SRE_Pattern object at 0x7f7cc488fbb8>, 'level': 'warning'}, {'substr': 'Traceback (most recent call last)', 'level': 'error'}, {'substr': 'SyntaxError: ', 'level': 'error'}, {'substr': 'TypeError: ', 'level': 'error'}, {'substr': 'NameError: ', 'level': 'error'}, {'substr': 'ZeroDivisionError: ', 'level': 'error'}, {'regex': <_sre.SRE_Pattern object at 0x7f7cc3493188>, 'level': 'critical'}, {'regex': <_sre.SRE_Pattern object at 0x7f7cc48a7b28>, 'level': 'critical'}], 'cwd': '/builds/worker/workspace/build/tests/config', 'env': {'TASKCLUSTER_PORT_80_TCP_ADDR': '172.17.0.2', 'TASKCLUSTER_INSTANCE_TYPE': 'm3.large', 'TASKCLUSTER_WORKER_TYPE': 'gecko-t-linux-large', 'MOZ_AUTOMATION': '1', 'MOZ_SOURCE_CHANGESET': 'efce78e62b6dac195e7cb4898684da54155d1661', 'MOCHITEST_FLAVOR': 'plain', 'LOGNAME': 'worker', 'USER': 'worker', 'HOME': '/builds/worker', 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/builds/worker/bin', 'DISPLAY': ':0', 'LANG': 'en_US.UTF-8', 'TERM': 'xterm', 'SHELL': '/bin/bash', 'MOZ_NODE_PATH': '/usr/local/bin/node', 'RUN_ID': '0', 'HG_STORE_PATH': '/builds/worker/checkouts/hg-store', 'MOZILLA_BUILD_URL': 
'https://queue.taskcluster.net/v1/task/cjvNZxdBSKKfmKHtz6MnGQ/artifacts/public/build/target.tar.bz2', 'TASKCLUSTER_PORT': 'tcp://172.17.0.2:80', 'MOZHARNESS_SCRIPT': 'desktop_unittest.py', 'TASKCLUSTER_NAME': '/laughing_khorana/taskcluster', 'GECKO_HEAD_REPOSITORY': 'https://hg.mozilla.org/mozilla-central', 'SCCACHE_DISABLE': '1', 'TASKCLUSTER_PORT_80_TCP_PORT': '80', 'MOZ_SOURCE_REPO': 'https://hg.mozilla.org/mozilla-central', 'GECKO_HEAD_REV': 'efce78e62b6dac195e7cb4898684da54155d1661', 'MOZHARNESS_URL': 'https://queue.taskcluster.net/v1/task/cjvNZxdBSKKfmKHtz6MnGQ/artifacts/public/build/mozharness.zip', 'GECKO_BASE_REPOSITORY': 'https://hg.mozilla.org/mozilla-unified', 'LC_ALL': 'en_US.UTF-8', 'TASKCLUSTER_PORT_80_TCP_PROTO': 'tcp', 'TASK_ID': 'OgTQKS-PS-udVgG1SWL7lA', 'TASKCLUSTER_PORT_80_TCP': 'tcp://172.17.0.2:80', 'OLDPWD': '/builds/worker/workspace', 'HOSTNAME': 'taskcluster-worker', 'SHLVL': '1', 'PWD': '/builds/worker/workspace', 'NEED_PULSEAUDIO': 'true', 'ENABLE_E10S': 'false', 'NEED_WINDOW_MANAGER': 'true', 'MOZHARNESS_CONFIG': 'unittests/linux_unittest.py remove_executables.py', 'TASKCLUSTER_WORKER_GROUP': 'us-west-1', 'TASKCLUSTER_PUBLIC_IP': '54.183.87.60'}}, attempt #1 [task 2018-03-17T23:01:53.261Z] 23:01:53 INFO - Running command: ['/builds/worker/workspace/build/venv/bin/pip', 'install', '--timeout', '120', '-r', '/builds/worker/workspace/build/tests/config/marionette_requirements.txt', '--no-index', '--find-links', 'http://pypi.pub.build.mozilla.org/pub', '--trusted-host', 'pypi.pub.build.mozilla.org'] in /builds/worker/workspace/build/tests/config [task 2018-03-17T23:01:53.261Z] 23:01:53 INFO - Copy/paste: /builds/worker/workspace/build/venv/bin/pip install --timeout 120 -r /builds/worker/workspace/build/tests/config/marionette_requirements.txt --no-index --find-links http://pypi.pub.build.mozilla.org/pub --trusted-host pypi.pub.build.mozilla.org [task 2018-03-17T23:01:53.261Z] 23:01:53 INFO - Using env: {'DISPLAY': ':0',
Reporter | ||
Comment 49•6 years ago
Some failures are from 'try -p all -u all' runs which have triggered jsdcov/dev-tools failures -- carry over from bug 1442823.
Reporter | ||
Updated•6 years ago
Reporter | ||
Comment 51•6 years ago
Many of the recent logs on linux and android show pypi failures, similar to bug 1445580.
Comment 53•6 years ago
Over the last 7 days there are 81 failures on this bug. These happen on Android 4.2 x86, android-4-0-armv7-api16, android-4-3-armv7-api16, android-5-0-aarch64, Linux, Linux x64 and linux64-qr.

Here is a recent log example: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=170655917&lineNumber=12538

[taskcluster:error] Task timeout after 7200 seconds. Force killing container.
Reporter | ||
Comment 55•6 years ago
See comment 0 and bug dependencies. Investigation and mitigation are ongoing.
Reporter | ||
Comment 60•6 years ago
Recent failures on Android and Linux are frequent and irregular: Not associated with particular tests or test tasks. Some are related to network failures or delays during 'pip install':

[task 2018-04-03T18:13:55.354Z] 18:13:55 INFO - Collecting jsonschema==2.5.1
[task 2018-04-03T18:15:55.610Z] 18:15:55 INFO - Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7fba9e73e3d0>, 'Connection to pypi.pub.build.mozilla.org timed out. (connect timeout=120.0)')': /pub
[task 2018-04-03T18:17:56.643Z] 18:17:56 INFO - Retrying (Retry(total=3, connect=None, read=None, redirect=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7fba9e73e550>, 'Connection to pypi.pub.build.mozilla.org timed out. (connect timeout=120.0)')': /pub
[task 2018-04-03T18:19:58.106Z] 18:19:58 INFO - Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7fba9e73e6d0>, 'Connection to pypi.pub.build.mozilla.org timed out. (connect timeout=120.0)')': /pub
[task 2018-04-03T18:22:00.205Z] 18:22:00 INFO - Retrying (Retry(total=1, connect=None, read=None, redirect=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7fba9e73e850>, 'Connection to pypi.pub.build.mozilla.org timed out. (connect timeout=120.0)')': /pub
[task 2018-04-03T18:24:04.445Z] 18:24:04 INFO - Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7fba9e73e9d0>, 'Connection to pypi.pub.build.mozilla.org timed out. (connect timeout=120.0)')': /pub
[task 2018-04-03T18:26:04.821Z] 18:26:04 INFO - Could not find a version that satisfies the requirement jsonschema==2.5.1 (from versions: )
[task 2018-04-03T18:26:04.822Z] 18:26:04 INFO - No matching distribution found for jsonschema==2.5.1
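To see how much of a task's time budget one unreachable package index can consume, the timestamps in the log excerpt above can be subtracted directly. This is only a sketch: the timestamps and the 120-second connect timeout are copied from the excerpt, while the helper names and the reading of the excerpt as six connection attempts (one initial try plus the five logged retries) are assumptions.

```python
from datetime import datetime

# Timestamps copied from the log excerpt above.
STAMPS = [
    "2018-04-03T18:13:55.354Z",  # Collecting jsonschema==2.5.1
    "2018-04-03T18:15:55.610Z",  # Retry(total=4)
    "2018-04-03T18:17:56.643Z",  # Retry(total=3)
    "2018-04-03T18:19:58.106Z",  # Retry(total=2)
    "2018-04-03T18:22:00.205Z",  # Retry(total=1)
    "2018-04-03T18:24:04.445Z",  # Retry(total=0)
    "2018-04-03T18:26:04.821Z",  # Could not find a version ...
]
CONNECT_TIMEOUT = 120.0  # seconds, from "connect timeout=120.0" in the log


def parse_stamp(stamp):
    """Parse a task-log timestamp such as 2018-04-03T18:13:55.354Z."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def gaps(stamps):
    """Seconds between each pair of consecutive log lines."""
    times = [parse_stamp(s) for s in stamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


def elapsed_seconds(stamps):
    """Seconds between the first and last log line."""
    return (parse_stamp(stamps[-1]) - parse_stamp(stamps[0])).total_seconds()


if __name__ == "__main__":
    attempts = len(STAMPS) - 1  # assumed: one gap per connection attempt
    print("attempts:", attempts)
    print("lower bound from timeouts:", attempts * CONNECT_TIMEOUT)
    print("observed:", elapsed_seconds(STAMPS))
```

Under these assumptions a single missing requirement costs a little over 12 minutes, so a handful of such stalls can account for a large share of a 3600-second limit.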
Comment 70•6 years ago
Updates: There have been 141 failures in the last 7 days.

Summary: Intermittent [taskcluster:error] Task timeout after 3600 seconds. Force killing container. / [taskcluster:error] Task timeout after 5400 seconds. Force killing container. / [taskcluster:error] Task timeout after 7200 seconds. Force killing container.

The most affected build types are debug and opt, but there are also over 10 failures on asan & pgo.

Failures per platform:
- Linux: 32
- Linux x64: 32
- android-4-3-armv7-api16: 31
- linux64-qr: 19
- linux64-ccov: 18
- android-4-0-armv7-api16: 5
- android-5-0-aarch64: 2
- linux32-nightly: 1
- Android 4.2 x86: 1
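The per-platform figures in this comment can be cross-checked against the stated weekly total, and the share carried by the busiest platforms computed, with a few lines of Python. The counts below are copied from the comment; the names `COUNTS` and `top_n_share` are illustrative only.

```python
# Per-platform failure counts copied from the comment above.
COUNTS = {
    "Linux": 32,
    "Linux x64": 32,
    "android-4-3-armv7-api16": 31,
    "linux64-qr": 19,
    "linux64-ccov": 18,
    "android-4-0-armv7-api16": 5,
    "android-5-0-aarch64": 2,
    "linux32-nightly": 1,
    "Android 4.2 x86": 1,
}


def top_n_share(counts, n):
    """Percentage of all failures that fall on the n busiest platforms."""
    total = sum(counts.values())
    busiest = sorted(counts.values(), reverse=True)[:n]
    return round(100 * sum(busiest) / total, 1)


if __name__ == "__main__":
    print("total:", sum(COUNTS.values()))
    print("share of top 3 platforms (%):", top_n_share(COUNTS, 3))
```

The sum matches the 141 failures reported for the week, which suggests the breakdown is complete.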
Reporter | ||
Comment 77•6 years ago
Bug 1451432 remains my biggest current concern here.
Comment 83•6 years ago
Over the last 7 days there are 99 failures present on this bug. These happen on Android 4.2 x86, android-4-3-armv7-api16, android-5-0-aarch64, Linux, Linux x64, linux64-ccov, linux64-qr, Windows 7.

Here is the most recent log example: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=176126037&lineNumber=25270

[task 2018-04-28T15:11:00.833Z] 15:11:00 INFO - TEST-START | /referrer-policy/unsafe-url/meta-referrer/same-origin/http-https/iframe-tag/generic.swap-origin-redirect.http.html
[task 2018-04-28T15:11:00.844Z] 15:11:00 INFO - PID 15175 | --DOMWINDOW == 28 (0x7fa601cc9800) [pid = 15232] [serial = 17] [outer = (nil)] [url = http://web-platform.test:8000/referrer-policy/generic/subresource/document.py?redirection=no-redirect&cache_destroyer=1524928253055]
[task 2018-04-28T15:11:00.849Z] 15:11:00 INFO - PID 15175 | --DOMWINDOW == 27 (0x7fa601508000) [pid = 15232] [serial = 14] [outer = (nil)] [url = about:blank]
[task 2018-04-28T15:11:00.853Z] 15:11:00 INFO - PID 15175 | --DOMWINDOW == 26 (0x7fa602492800) [pid = 15232] [serial = 3] [outer = (nil)] [url = about:blank]
[task 2018-04-28T15:11:00.855Z] 15:11:00 INFO - PID 15175 | --DOMWINDOW == 25 (0x7fa609723000) [pid = 15232] [serial = 2] [outer = (nil)] [url = about:blank]
[task 2018-04-28T15:11:00.858Z] 15:11:00 INFO - PID 15175 | --DOCSHELL 0x7fa601552800 == 8 [pid = 15232] [id = {62097795-cefa-4626-92b1-e2b01134ebcf}]
[task 2018-04-28T15:11:00.862Z] 15:11:00 INFO - PID 15175 | --DOCSHELL 0x7fa5ff39c800 == 7 [pid = 15232] [id = {907aed2f-82ba-484c-a1a7-209937e276ec}]
[task 2018-04-28T15:11:00.864Z] 15:11:00 INFO - PID 15175 | --DOMWINDOW == 24 (0x7fa5ff365000) [pid = 15232] [serial = 12] [outer = (nil)] [url = http://web-platform.test:8000/referrer-policy/generic/subresource/document.py]
[taskcluster:error] Task timeout after 7200 seconds. Force killing container.
[task 2018-04-28T15:11:00.874Z] 15:11:00 INFO - PID 15175 | --DOMWINDOW == 23 (0x7fa609720c00) [pid = 15232] [serial = 7] [outer = (nil)] [url = about:blank]
[task 2018-04-28T15:11:00.875Z] 15:11:00 INFO - PID 15175 | --DOMWINDOW == 22 (0x7fa609721000) [pid = 15232] [serial = 6] [outer = (nil)] [url = about:blank]
[task 2018-04-28T15:11:00.876Z] 15:11:00 INFO - PID 15175 | --DOMWINDOW == 21 (0x7fa6098bf000) [pid = 15232] [serial = 9] [outer = (nil)] [url = about:blank]
[task 2018-04-28T15:11:00.878Z] 15:11:00 INFO - PID 15175 | --DOCSHELL 0x7fa601555000 == 6 [pid = 15232] [id = {f6b879e5-be92-4156-adc9-5489f3b17629}]
[taskcluster 2018-04-28 15:11:01.679Z] === Task Finished ===
[taskcluster 2018-04-28 15:11:01.685Z] Unsuccessful task run with exit code: -1 completed in 7740.784 seconds
Comment 88•6 years ago
Update: There have been 62 failures in the last 7 days, 230 total failures in the last 21 days.

Depends on: 1243080, 1246165, 1321605, 1404190, 1431125, 1448049, 1451432
Blocks: 1306635

Summary: Intermittent [taskcluster:error] Task timeout after 3600 seconds. Force killing container. / [taskcluster:error] Task timeout after 5400 seconds. Force killing container. / [taskcluster:error] Task timeout after 7200 seconds. Force killing container.

Failures per platform and build type:
- Android 4.2 x86 / opt: 2
- android-4-0-armv7-api16 / debug & opt: 3
- android-4-3-armv7-api16 / debug & opt: 17
- android-5-0-aarch64 / debug & opt: 1
- Linux x64 / pgo, debug & asan: 8
- linux64-ccov / opt: 8
- linux64-qr / opt & debug: 9
- linux64-asan-reporter / opt: 1
- Linux / debug: 13

Recent log with the failure: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=177354933&lineNumber=1772
Reporter | ||
Comment 92•6 years ago
I expect these failures to continue and this bug to remain open indefinitely. Task timeouts are the last resort whenever a task is hung or otherwise runs for too long. My immediate goal is to address timeouts which recur in the same task frequently and identify other related issues in dependent bugs. I expect infrequent failures on Android especially due to bug 1321605. There are some infrequent failures on Android that can be addressed with more chunks - bug 1461393. Recent failures on Android and Linux are frequent and irregular: Not associated with particular tests or test tasks. I'm hoping those might be addressed in bug 1457694. (Recent network failures and delays during 'pip install' appear to have been resolved.)
Comment 103•6 years ago
Over the last 7 days there are 124 failures present on this bug. These happen on android-4-0-armv7-api16, android-4-2-x86, android-4-3-armv7-api16, linux32, linux64, linux64-ccov, linux64-nightly, linux64-noopt, linux64-qr, windows7-32.

Here is the most recent log example: https://treeherder.mozilla.org/logviewer.html#?job_id=180438377&repo=mozilla-inbound&lineNumber=1806

[task 2018-05-27T02:38:21.245Z] 02:38:21 INFO - TEST-OK | testSessionOOMRestore | took 335841ms
[task 2018-05-27T02:38:21.246Z] 02:38:21 INFO - TEST-START | Shutdown
[task 2018-05-27T02:38:21.247Z] 02:38:21 INFO - Passed: 97
[task 2018-05-27T02:38:21.249Z] 02:38:21 INFO - Failed: 0
[task 2018-05-27T02:38:21.250Z] 02:38:21 INFO - Todo: 0
[task 2018-05-27T02:38:21.250Z] 02:38:21 INFO - SimpleTest FINISHED
[task 2018-05-27T02:38:42.819Z] 02:38:42 INFO - INFO | automation.py | Application ran for: 0:06:10.045962
[task 2018-05-27T02:38:42.821Z] 02:38:42 INFO - INFO | zombiecheck | Reading PID log: /tmp/tmp_o252qpidlog
[task 2018-05-27T02:38:43.458Z] 02:38:43 INFO - /data/tombstones does not exist; tombstone check skipped
[task 2018-05-27T02:38:50.701Z] 02:38:50 INFO - INFO | automation.py | Application pid: 5599
[task 2018-05-27T02:39:03.345Z] 02:39:03 INFO - SimpleTest START
[task 2018-05-27T02:39:03.347Z] 02:39:03 INFO - TEST-START | testSessionPrivateBrowsing
[taskcluster:error] Task timeout after 3600 seconds. Force killing container.
[taskcluster 2018-05-27 02:39:11.369Z] === Task Finished ===
[taskcluster 2018-05-27 02:39:11.372Z] Unsuccessful task run with exit code: -1 completed in 3601.941 seconds
Reporter | ||
Comment 105•6 years ago
linux32-debug/jittest-4 and android-debug/xpcshell-11 are frequent and might be actionable. Otherwise, see comment 92.
Comment 108•6 years ago
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5989424834d2
Increase Android/debug xpcshell max-run-time; r=me,a=test-only
Reporter | ||
Updated•6 years ago
Comment 109•6 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/5989424834d2
Comment 114•6 years ago
As an update, there have been 76 failures in the last 7 days, as follows:
- 36 on linux64 (asan-reporter, ccov and qr)
- 29 on android, mostly on 4-3-armv7-api16, but there are also 3 on 4-0-armv7-api16 and 2 on 4-2-x86
- 11 on linux32
Comment 123•6 years ago
In the last 7 days, there have been 111 failures on this bug. They occur mostly on linux32, linux64, linux64-ccov and linux64-qr, across all build types.

Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=183726361&repo=mozilla-inbound&lineNumber=24638
Comment 134•6 years ago
In the last 7 days there are 127 failures:
- 51 failures on linux64 / asan, debug, pgo & lto
- 47 failures on android-em-4-3-armv7-api16 / opt & debug
- 11 failures on linux 32 / debug, pgo & opt
- 7 failures on linux64-ccov / debug
- 6 failures on osx-cross / opt & debug
- 2 failures on android-4-0-armv7-api16 / debug
- 2 failures on linux64-qr / debug
- 1 failure on linux64-noopt / debug

As per comment 92, these will keep occurring.

:gbrown is there no way to decrease their rate on the platforms where they occur most often, linux64 and android?
Reporter | ||
Comment 137•6 years ago
(In reply to Andreea Pavel [:apavel] from comment #134)
> In the last 7 days there are 127 failures:
>
> - 51 failures on linux64 / asan, debug, pgo & lto
> - 47 failures on android-em-4-3-armv7-api16 / opt & debug
> - 11 failures on linux 32 / debug, pgo & opt
> - 7 failures on linux64-ccov / debug
> - 6 failures on osx-cross / opt & debug
> - 2 failures on android-4-0-armv7-api16 / debug
> - 2 failures on linux64-qr / debug
> - 1 failure on linux64-noopt / debug
>
> As per comment 92, these will keep occurring.
>
> :gbrown is there no way to decrease their rate on the most common platforms
> they occur: linux64 and android?

These are certainly too frequent, but I don't know what to do about it. There are multiple causes, even on one platform.

Bug 1470177 eliminated a common cause of android timeouts, but the weekly stats hardly changed.

Bug 1457694 was an attempt to address a common scenario I (may) see on linux and android, but that hasn't made any progress and I'm not sure how to make progress on it. :(

I will keep monitoring failures and acting on those cases that I can.
Comment 140•6 years ago
:gbrown thanks for looking into this!
Reporter | ||
Comment 150•6 years ago
(In reply to Geoff Brown [:gbrown] from comment #137)
> Bug 1457694 was an attempt to address a common scenario I (may) see on linux
> and android, but that hasn't made any progress and I'm not sure how to make
> progress on it. :(

Bug 1457694 has been confirmed as a real issue and is progressing, I think!
Comment 155•6 years ago
Update: There has been a total of 119 failures in the last 7 days (and 339 failures in the last 21 days).
- 2 failures on android-4-0-armv7-api16 debug
- 1 failure on android-4-2-x86 opt
- 1 failure on android-em-4-2-x86 opt
- 19 failures on android-em-4-3-armv7-api16 debug/opt
- 17 failures on linux32 debug/opt
- 51 failures on linux64 asan/debug/lto/opt/pgo
- 10 failures on linux64-ccov debug
- 10 failures on linux64-qr debug/opt
- 1 failure on linux64-noopt debug
- 1 failure on osx-10-10 opt
- 3 failures on osx-cross opt
- 1 failure on windows10-64 asan
- 2 failures on windows7-32 debug

Different types of tests are running, but all of them have this error in common:
[taskcluster:error] Task timeout after 3600 seconds. Force killing container.
[taskcluster 2018-07-18 12:00:30.241Z] Unsuccessful task run with exit code: -1 completed in 3924.674 seconds
Reporter | ||
Comment 164•6 years ago
(In reply to OrangeFactor Robot from comment #163)
> 135 failures in 138 pushes (0.978 failures/push) were associated with this
> bug yesterday.
>
> Platform breakdown:
> * android-em-4-3-armv7-api16: 105

robocop is perma-failing - https://bugzilla.mozilla.org/show_bug.cgi?id=1451513#c19.
Reporter | ||
Comment 167•6 years ago
(In reply to Geoff Brown [:gbrown] from comment #164)
> robocop is perma-failing -
> https://bugzilla.mozilla.org/show_bug.cgi?id=1451513#c19.

That was corrected with a backout and subsequent re-landing - great!
Reporter | ||
Comment 168•6 years ago
(In reply to Geoff Brown [:gbrown] from comment #150)
> Bug 1457694 has been confirmed as a real issue and is progressing, I think!

That bug hasn't been marked as resolved, but some changes have landed in related bugs and I am seeing far fewer of those issues. Failure rates in this bug have recently been significantly lower than we've seen in months.
Updated•6 years ago
Reporter | ||
Comment 175•6 years ago
Thanks :aryx! Yes, it looks like bug 1480494 is the biggest contributor to current failures here.
Comment 188•6 years ago
Update: There have been 98 failures in the last week.

Failures per platform and build type:
- linux64 / debug, asan, opt, lto: 39
- osx-10-10 / opt & debug: 12
- android-em-4-3-armv7-api16 / debug & opt: 11
- linux64-ccov / debug: 11
- linux32 / debug & opt: 7
- osx-cross / opt & asan: 7
- windows2012-32 / debug & pgo: 2
- windows7-32 / opt & pgo: 3
- android-em-4-2-x86 / debug: 2
- linux64-noopt / debug: 1
- linux64-qr / debug: 1
- android-4-0-armv7-api16 / opt: 1
- android-5-0-aarch64 / opt: 1

Summary: Intermittent [taskcluster:error] Task timeout after 3600 seconds. Force killing container. / [taskcluster:error] Task timeout after 5400 seconds. Force killing container. / [taskcluster:error] Task timeout after 7200 seconds. Force killing container.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 194•6 years ago
|
||
There have been 118 failures in the last week. The most affected platform / build: linux64-ccov / debug.
Recent relevant log file: https://treeherder.mozilla.org/logviewer.html#?job_id=197122988&repo=mozilla-central&lineNumber=180429
Blocks: 1306635
Depends on: 1115253, 1243080, 1321605, 1404190, 1431125, 1457694, 1480348, 1487877, 1487886
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 199•6 years ago
|
||
linux64/ccov devtool and android/debug mochitest failures are being addressed with more test chunks.
linux64 mochitest-media failures are reportedly bug 1480348.
There are also some build task timeouts, which seem quite random -- I don't know what to do about those.
Comment hidden (Intermittent Failures Robot) |
Comment 201•6 years ago
|
||
Update: There have been 37 failures in the last week.
Failures per platform and build type:
- linux64 / debug, opt, asan, pgo, lto: 16
- linux32 / debug & opt: 10
- android-em-4-3-armv7-api16 / debug & opt: 5
- linux64-qr / debug: 2
- linux64-ccov / debug: 1
- osx-10-10-dmd / opt: 1
- osx-cross / opt: 1
- osx-cross-noopt / debug: 1
Recent relevant log file: https://treeherder.mozilla.org/logviewer.html#?job_id=199174505&repo=mozilla-inbound&lineNumber=40467
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 203•6 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #199)
> linux64/ccov devtool and android/debug mochitest failure are being addressed
> with more test chunks.
>
> linux64 mochitest-media failures are reportedly 1480348.
>
> There are also some build task timeouts, which seem quite random -- I don't
> know what to do about those.
Many of those (apparent) build task timeouts are in rusttests; I haven't investigated, but note bug 1489277 might be tangentially related.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 208•6 years ago
|
||
There was a recent spike in linux64-ccov/asan build timeouts after 60 minutes; these are "build-linux64-asan-fuzzing-ccov/opt (bocf)" builds, on mozilla-central only. The most recent builds have run close to 50 minutes, so I won't take action until/unless this issue returns.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 211•6 years ago
|
||
Some failures last week happened during robustcheckout - fallout from bug 1490703 possibly - should be resolved now.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 218•6 years ago
|
||
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=212394686&repo=autoland&lineNumber=45090
[task 2018-11-17T15:05:43.073Z] 15:05:43 INFO - TEST-START | /referrer-policy/same-origin/attr-referrer/cross-origin/http-https/script-tag/cross-origin.keep-origin-redirect.http.html
[task 2018-11-17T15:05:43.296Z] 15:05:43 INFO - PID 22748 | ++DOCSHELL 0x7fdd2decb800 == 2 [pid = 22854] [id = {f599e7eb-3bd6-4194-91f2-ee9bbeb55f76}]
[task 2018-11-17T15:05:43.298Z] 15:05:43 INFO - PID 22748 | ++DOMWINDOW == 5 (0x7fdd2ca0e800) [pid = 22854] [serial = 5] [outer = (nil)]
[task 2018-11-17T15:05:43.397Z] 15:05:43 INFO - PID 22748 | ++DOMWINDOW == 6 (0x7fdd2dd62400) [pid = 22854] [serial = 6] [outer = 0x7fdd2ca0e800]
[task 2018-11-17T15:05:43.679Z] 15:05:43 INFO - PID 22748 | ++DOMWINDOW == 7 (0x7fdd2dd63400) [pid = 22854] [serial = 7] [outer = 0x7fdd2ca0e800]
[task 2018-11-17T15:05:44.143Z] 15:05:44 INFO - PID 22748 | Couldn't convert chrome URL: chrome://branding/locale/brand.properties
[taskcluster:error] Task timeout after 7200 seconds. Force killing container.
[taskcluster 2018-11-17 15:05:45.094Z] === Task Finished ===
[taskcluster 2018-11-17 15:05:45.096Z] Unsuccessful task run with exit code: -1 completed in 7692.491 seconds
Details in Comment 92.
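For anyone triaging many of these logs, a minimal sketch of pulling the configured timeout and the actual run time out of a log follows. It assumes only the two taskcluster line formats quoted in this bug; the function name `summarize_timeout` is hypothetical and is not part of any Mozilla tool.

```python
import re

# Patterns match the taskcluster lines quoted in this bug.
TIMEOUT_RE = re.compile(r"\[taskcluster:error\] Task timeout after (\d+) seconds")
FINISHED_RE = re.compile(r"completed in ([\d.]+) seconds")


def summarize_timeout(log_text):
    """Return (max_run_time, actual_run_time) in seconds, or None if no timeout line is found."""
    timeout = TIMEOUT_RE.search(log_text)
    if timeout is None:
        return None
    finished = FINISHED_RE.search(log_text)
    actual = float(finished.group(1)) if finished else None
    return int(timeout.group(1)), actual


# Tail of the log excerpt quoted in comment 218.
sample = """\
[taskcluster:error] Task timeout after 7200 seconds. Force killing container.
[taskcluster 2018-11-17 15:05:45.094Z] === Task Finished ===
[taskcluster 2018-11-17 15:05:45.096Z] Unsuccessful task run with exit code: -1 completed in 7692.491 seconds
"""
print(summarize_timeout(sample))
```

The gap between the two numbers (here, several hundred seconds) presumably covers time spent outside the task command itself, so it should not be read as test run time.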
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 221•6 years ago
|
||
Tests are in good shape -- almost no test failures here recently. However, there are a bunch of build failures here. I don't see clear correlations to specific build types, but...needs more investigation.
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 223•6 years ago
|
||
Recent Android 7.0 failures are on machine-16 -- a ticket has been opened on packet.net to have this machine-specific performance problem addressed.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 227•6 years ago
|
||
This bug failed 45 times in the last 7 days. These occur on linux32 and linux64, on opt, asan and pgo. The following platforms have only 1 failure each: linux64-ccov, linux64-qr, android-em-4-3-armv7-api16.
Recent log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=217017974&repo=mozilla-inbound&lineNumber=87534
15:52:26 INFO - ============================= test session starts ==============================
[task 2018-12-14T15:52:26.563Z] 15:52:26 INFO - platform linux2 -- Python 2.7.9, pytest-3.6.2, py-1.5.4, pluggy-0.6.0 -- /builds/worker/workspace/build/src/obj-firefox/_virtualenvs/obj-firefox-8yIyzR8r-2.7/bin/python
[task 2018-12-14T15:52:26.563Z] 15:52:26 INFO - rootdir: /builds/worker/workspace/build/src, inifile: /builds/worker/workspace/build/src/config/mozunit/mozunit/pytest.ini
[task 2018-12-14T15:52:26.563Z] 15:52:26 INFO - collecting ... collected 1 item
[task 2018-12-14T15:52:26.563Z] 15:52:26 INFO - ../config/tests/unit-mozunit.py::TestMozUnit::test_mocked_open PASSED
[task 2018-12-14T15:52:26.564Z] 15:52:26 INFO - =========================== 1 passed in 0.01 seconds ===========================
[task 2018-12-14T15:52:26.564Z] 15:52:26 INFO - /builds/worker/workspace/build/src/config/tests/test_mozbuild_reading.py
[task 2018-12-14T15:52:26.564Z] 15:52:26 INFO - ============================= test session starts ==============================
[task 2018-12-14T15:52:26.564Z] 15:52:26 INFO - platform linux2 -- Python 2.7.9, pytest-3.6.2, py-1.5.4, pluggy-0.6.0 -- /builds/worker/workspace/build/src/obj-firefox/_virtualenvs/obj-firefox-8yIyzR8r-2.7/bin/python
[task 2018-12-14T15:52:26.565Z] 15:52:26 INFO - rootdir: /builds/worker/workspace/build/src, inifile: /builds/worker/workspace/build/src/config/mozunit/mozunit/pytest.ini
[task 2018-12-14T15:52:26.565Z] 15:52:26 INFO - collecting ... collected 3 items
[task 2018-12-14T15:52:26.565Z] 15:52:26 INFO - ../config/tests/test_mozbuild_reading.py::TestMozbuildReading::test_filesystem_traversal_no_config PASSED
[task 2018-12-14T15:52:26.565Z] 15:52:26 INFO - ../config/tests/test_mozbuild_reading.py::TestMozbuildReading::test_filesystem_traversal_reading <- ../../../../../usr/lib/python2.7/unittest/case.py SKIPPED
[task 2018-12-14T15:52:26.566Z] 15:52:26 INFO - ../config/tests/test_mozbuild_reading.py::TestMozbuildReading::test_orphan_file_patterns PASSED
[task 2018-12-14T15:52:26.566Z] 15:52:26 INFO - ===================== 2 passed, 1 skipped in 79.39 seconds =====================
[taskcluster:error] Task timeout after 7200 seconds. Force killing container.
[taskcluster 2018-12-14 15:53:03.117Z] === Task Finished ===
[taskcluster 2018-12-14 15:53:03.118Z] Unsuccessful task run with exit code: -1 completed in 7252.08 seconds
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 233•5 years ago
|
||
There are 29 total failures in the last 7 days and 162 total failures in the last 30 days (some of them might be misclassifications), majority on linux.
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=219384257&repo=autoland&lineNumber=194352
[task 2018-12-31T13:12:56.848Z] 13:12:56 INFO - TEST-START | dom/base/test/test_bug715041.xul
[task 2018-12-31T13:13:08.337Z] 13:13:08 INFO - GECKO(5728) | JS CALLED OBSERVE FUNCTION!!!
[task 2018-12-31T13:13:08.338Z] 13:13:08 INFO - GECKO(5728) | msg 1 Count: 1
[task 2018-12-31T13:13:09.378Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 28 (0xdb22d000) [pid = 5728] [serial = 132] [outer = (nil)] [url = about:blank]
[task 2018-12-31T13:13:09.382Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 27 (0xdb238800) [pid = 5728] [serial = 129] [outer = (nil)] [url = chrome://mochikit/content/tests/SimpleTest/iframe-between-tests.html]
[task 2018-12-31T13:13:09.385Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 26 (0xdb22b800) [pid = 5728] [serial = 124] [outer = (nil)] [url = chrome://mochitests/content/chrome/dom/base/test/test_blocking_image.html]
[task 2018-12-31T13:13:09.387Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 25 (0xdb231000) [pid = 5728] [serial = 127] [outer = (nil)] [url = chrome://mochikit/content/tests/SimpleTest/iframe-between-tests.html]
[task 2018-12-31T13:13:09.389Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 24 (0xdb223000) [pid = 5728] [serial = 123] [outer = (nil)] [url = chrome://mochikit/content/tests/SimpleTest/iframe-between-tests.html]
[task 2018-12-31T13:13:09.391Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 23 (0xdc944c00) [pid = 5728] [serial = 110] [outer = (nil)] [url = chrome://mochitests/content/chrome/dom/base/test/test_blockParsing.html]
[task 2018-12-31T13:13:09.396Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 22 (0xdb237c00) [pid = 5728] [serial = 128] [outer = (nil)] [url = chrome://mochitests/content/chrome/dom/base/test/test_bug1008126.html]
[task 2018-12-31T13:13:09.398Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 21 (0xdb225000) [pid = 5728] [serial = 130] [outer = (nil)] [url = chrome://mochitests/content/chrome/dom/base/test/test_bug1016960.html]
[task 2018-12-31T13:13:09.400Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 20 (0xdb2b9000) [pid = 5728] [serial = 133] [outer = (nil)] [url = chrome://mochikit/content/tests/SimpleTest/iframe-between-tests.html]
[task 2018-12-31T13:13:10.312Z] 13:13:10 INFO - GECKO(5728) | msg 3 Count: 1
[task 2018-12-31T13:13:13.318Z] 13:13:13 INFO - GECKO(5728) | msg 4 Count: 1
[task 2018-12-31T13:13:13.319Z] 13:13:13 INFO - GECKO(5728) | msg 2 Count: 1
[task 2018-12-31T13:13:13.322Z] 13:13:13 INFO - GECKO(5728) | TESTING CASE AddShiftLocalCleanUp
[task 2018-12-31T13:13:13.325Z] 13:13:13 INFO - GECKO(5728) | ==============
[task 2018-12-31T13:13:13.327Z] 13:13:13 INFO - GECKO(5728) | JS REMOVE IDLE OBSERVER () time to be removed: 1
[task 2018-12-31T13:13:13.332Z] 13:13:13 INFO - GECKO(5728) | JS removeIdleObserver() observer time: 0
[task 2018-12-31T13:13:13.334Z] 13:13:13 INFO - GECKO(5728) | JS removeIdleObserver() observer time: 1
[task 2018-12-31T13:13:13.339Z] 13:13:13 INFO - GECKO(5728) | MOCK IDLE SERVICE REMOVING idle observer with time 1
[task 2018-12-31T13:13:13.341Z] 13:13:13 INFO - GECKO(5728) | MOCK IDLE SERVICE numIdleObserversRemoved: 1 numIdleObserversAdded: 0
[task 2018-12-31T13:13:13.342Z] 13:13:13 INFO - GECKO(5728) | JS FAKE IDLE SERVICE end of remove idle observer
[task 2018-12-31T13:13:13.350Z] 13:13:13 INFO - GECKO(5728) | JS NUM OBSERVERS: 1
[task 2018-12-31T13:13:13.351Z] 13:13:13 INFO - GECKO(5728) | AddShiftLocalCleanUp() done clean up
[task 2018-12-31T13:13:13.353Z] 13:13:13 INFO - GECKO(5728) | TESTING CASE AddNewLocalWhileAllIdle
[task 2018-12-31T13:13:13.354Z] 13:13:13 INFO - GECKO(5728) | ==============
[task 2018-12-31T13:13:13.356Z] 13:13:13 INFO - GECKO(5728) | JS FAKE IDLE SERVICE add idle observer before
[task 2018-12-31T13:13:13.358Z] 13:13:13 INFO - GECKO(5728) | JS NUM OBSERVERS: 1
[task 2018-12-31T13:13:13.360Z] 13:13:13 INFO - GECKO(5728) | window is: [object Window]
[task 2018-12-31T13:13:13.361Z] 13:13:13 INFO - GECKO(5728) | MOCK IDLE SERVICE ADDING idle observer with time: 1
[task 2018-12-31T13:13:13.363Z] 13:13:13 INFO - GECKO(5728) | MOCK IDLE SERVICE: num idle observers added: 1
[task 2018-12-31T13:13:13.364Z] 13:13:13 INFO - GECKO(5728) | JS FAKE IDLE SERVICE end of add idle observer
[task 2018-12-31T13:13:13.366Z] 13:13:13 INFO - GECKO(5728) | JS NUM OBSERVERS: 2
[task 2018-12-31T13:13:13.368Z] 13:13:13 INFO - GECKO(5728) | JS FAKE IDLE SERVICE
[task 2018-12-31T13:13:13.369Z] 13:13:13 INFO - GECKO(5728) | JS NUM OBSERVERS: 2
[task 2018-12-31T13:13:13.373Z] 13:13:13 INFO - GECKO(5728) | JS CALLED OBSERVE FUNCTION!!!
[task 2018-12-31T13:13:13.375Z] 13:13:13 INFO - GECKO(5728) | msg 1 Count: 1
[task 2018-12-31T13:13:14.322Z] 13:13:14 INFO - GECKO(5728) | msg 2 Count: 1
[task 2018-12-31T13:13:15.321Z] 13:13:15 INFO - GECKO(5728) | msg 2 Count: 2
[task 2018-12-31T13:13:16.821Z] 13:13:16 INFO - GECKO(5728) | msg 5 Count: 1
[task 2018-12-31T13:13:16.825Z] 13:13:16 INFO - GECKO(5728) | function performNextTest()
[task 2018-12-31T13:13:16.828Z] 13:13:16 INFO - GECKO(5728) | currTestCaseNum: 4
[task 2018-12-31T13:13:16.831Z] 13:13:16 INFO - GECKO(5728) | cleanUp: false
[task 2018-12-31T13:13:16.834Z] 13:13:16 INFO - GECKO(5728) | passed: true
[task 2018-12-31T13:13:16.838Z] 13:13:16 INFO - GECKO(5728) | numIdleObserversRemoved: 0
[task 2018-12-31T13:13:16.840Z] 13:13:16 INFO - GECKO(5728) | numIdleObservesAdded: 1
[task 2018-12-31T13:13:16.844Z] 13:13:16 INFO - GECKO(5728) | TESTING CASE AddNewLocalWhileAllIdleCleanUp
[task 2018-12-31T13:13:16.846Z] 13:13:16 INFO - GECKO(5728) | ==============
[task 2018-12-31T13:13:16.850Z] 13:13:16 INFO - GECKO(5728) | JS REMOVE IDLE OBSERVER () time to be removed: 1
[task 2018-12-31T13:13:16.853Z] 13:13:16 INFO - GECKO(5728) | JS removeIdleObserver() observer time: 0
[task 2018-12-31T13:13:16.854Z] 13:13:16 INFO - GECKO(5728) | JS removeIdleObserver() observer time: 1
[task 2018-12-31T13:13:16.856Z] 13:13:16 INFO - GECKO(5728) | MOCK IDLE SERVICE REMOVING idle observer with time 1
[task 2018-12-31T13:13:16.857Z] 13:13:16 INFO - GECKO(5728) | MOCK IDLE SERVICE numIdleObserversRemoved: 1 numIdleObserversAdded: 0
[task 2018-12-31T13:13:16.858Z] 13:13:16 INFO - GECKO(5728) | JS FAKE IDLE SERVICE end of remove idle observer
[task 2018-12-31T13:13:16.859Z] 13:13:16 INFO - GECKO(5728) | JS NUM OBSERVERS: 1
[task 2018-12-31T13:13:16.861Z] 13:13:16 INFO - GECKO(5728) | TESTING CASE ShiftLocalTimerBack()
[task 2018-12-31T13:13:16.863Z] 13:13:16 INFO - GECKO(5728) | ==============
[task 2018-12-31T13:13:16.866Z] 13:13:16 INFO - GECKO(5728) | JS FAKE IDLE SERVICE add idle observer before
[task 2018-12-31T13:13:16.867Z] 13:13:16 INFO - GECKO(5728) | JS NUM OBSERVERS: 1
[task 2018-12-31T13:13:16.869Z] 13:13:16 INFO - GECKO(5728) | window is: [object Window]
[task 2018-12-31T13:13:16.871Z] 13:13:16 INFO - GECKO(5728) | MOCK IDLE SERVICE ADDING idle observer with time: 1
[task 2018-12-31T13:13:16.873Z] 13:13:16 INFO - GECKO(5728) | MOCK IDLE SERVICE: num idle observers added: 1
[task 2018-12-31T13:13:16.875Z] 13:13:16 INFO - GECKO(5728) | JS FAKE IDLE SERVICE end of add idle observer
[task 2018-12-31T13:13:16.877Z] 13:13:16 INFO - GECKO(5728) | JS NUM OBSERVERS: 2
[task 2018-12-31T13:13:16.879Z] 13:13:16 INFO - GECKO(5728) | JS FAKE IDLE SERVICE
[task 2018-12-31T13:13:16.881Z] 13:13:16 INFO - GECKO(5728) | JS NUM OBSERVERS: 2
[task 2018-12-31T13:13:16.883Z] 13:13:16 INFO - GECKO(5728) | JS CALLED OBSERVE FUNCTION!!!
[task 2018-12-31T13:13:16.885Z] 13:13:16 INFO - GECKO(5728) | msg 2 Count: 1
[task 2018-12-31T13:13:18.751Z] 13:13:18 INFO - GECKO(5728) | msg 4 Count: 1
[taskcluster:error] Task timeout after 3600 seconds. Force killing container.
[taskcluster 2018-12-31 13:13:22.095Z] === Task Finished ===
[taskcluster 2018-12-31 13:13:22.096Z] Unsuccessful task run with exit code: -1 completed in 3603.519 seconds
Reporter | ||
Comment 234•5 years ago
|
||
(In reply to Intermittent Failures Robot from comment #232)
> 29 failures in 1008 pushes (0.029 failures/push) were associated with this
> bug in the last 7 days.
>
> Platform breakdown:
> * linux64: 11
> * linux32: 8
> * linux64-qr: 10
I reviewed these linux timeouts but couldn't find a common cause. Existing max-run-times seem appropriate. aws slow-down? Seems better recently -- monitoring.
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 236•5 years ago
|
||
Most recent failures are on linux/debug, but vary by test suite. Average task durations are << max-run-time. I wonder if these are related to logspam, like bug 1437991, bug 1515833, and bug 1515827.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 239•5 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #236)
Most recent failures are on linux/debug, but vary by test suite. Average task durations are << max-run-time. I wonder if these are related to logspam, like bug 1437991, bug 1515833, and bug 1515827.
Failure frequency did seem to decrease with resolution of bug 1437991, but linux/debug continues to dominate ongoing failures.
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 241•5 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #236)
Most recent failures are on linux/debug, but vary by test suite. Average task durations are << max-run-time. I wonder if these are related to logspam, like bug 1437991, bug 1515833, and bug 1515827.
Those logspam bugs are all resolved now - great! - and did seem to help the random linux/debug failures here.
Recently we are seeing more linux/debug jsreftest failures specifically -- investigating...
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 243•5 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #241)
Recently we are seeing more linux/debug jsreftest failures specifically -- investigating...
And linux/debug reftest also.
linux64/debug jsreftest might benefit from more chunks.
Other platforms are more puzzling: linux64/debug reftests run in about 30 minutes generally...then suddenly a chunk runs in 60+ minutes. Affected chunks appear "random"...could that be because tests are being shuffled?
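One way to check whether the slow chunks really are "random" is to flag runs that are far slower than the same chunk's typical run time and see whether the flagged chunk changes from push to push. The following is a minimal sketch only: the chunk names and durations are invented for illustration, and the 2x-median threshold is an arbitrary assumption, not a value used by any Mozilla tooling.

```python
from statistics import median


def flag_slow_runs(durations_by_chunk, factor=2.0):
    """Return {chunk: [durations]} for runs slower than factor * that chunk's median."""
    flagged = {}
    for chunk, durations in durations_by_chunk.items():
        typical = median(durations)
        slow = [d for d in durations if d > factor * typical]
        if slow:
            flagged[chunk] = slow
    return flagged


# Hypothetical run times in minutes, invented for illustration only.
runs = {
    "reftest-1": [29, 31, 30, 62],
    "reftest-2": [27, 33, 30, 31],
    "reftest-3": [30, 65, 28, 32],
}
print(flag_slow_runs(runs))
```

If the flagged runs are spread across many chunks rather than concentrated in one, that would be consistent with a cause outside the tests themselves, though it would not prove one.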
Reporter | ||
Comment 244•5 years ago
|
||
Some failures on Feb 4/5 appear to be related to certificate expiry (temporary infrastructure condition caused mass failures, hangs).
Comment hidden (Intermittent Failures Robot) |
Comment 246•5 years ago
|
||
Over the last 7 days this bug has 30 failures. These happen on windows7-32, osx-cross-noopt, osx-cross, linux64-qr, linux64, linux32.
Here is the latest failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=228854900&repo=mozilla-inbound&lineNumber=30960
[task 2019-02-16T23:10:38.226Z] 23:10:38 INFO - REFTEST TEST-START | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-004.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-001-ref.html
[task 2019-02-16T23:10:38.231Z] 23:10:38 INFO - REFTEST TEST-LOAD | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-004.html | 1167 / 1195 (97%)
[task 2019-02-16T23:10:38.256Z] 23:10:38 INFO - ++DOMWINDOW == 334 (0x7f0529d71c00) [pid = 8577] [serial = 2751] [outer = 0x7f052d531800]
[task 2019-02-16T23:10:38.337Z] 23:10:38 INFO - [Child 8577, Main Thread] WARNING: HTMLEditRules::BeforeEdit() failed to handle something: 'NS_SUCCEEDED(rv)', file /builds/worker/workspace/build/src/editor/libeditor/HTMLEditor.cpp, line 3576
[task 2019-02-16T23:10:38.344Z] 23:10:38 INFO - [Child 8577, Main Thread] WARNING: '!aSelection.RangeCount()', file /builds/worker/workspace/build/src/editor/libeditor/EditorBase.cpp, line 3569
[task 2019-02-16T23:10:38.351Z] 23:10:38 INFO - [Child 8577, Main Thread] WARNING: '!selectionStartPoint.IsSet()', file /builds/worker/workspace/build/src/editor/libeditor/HTMLEditRules.cpp, line 9806
[task 2019-02-16T23:10:38.355Z] 23:10:38 INFO - [Child 8577, Main Thread] WARNING: Failed to normalize Selection: 'NS_SUCCEEDED(rv)', file /builds/worker/workspace/build/src/editor/libeditor/HTMLEditRules.cpp, line 453
[task 2019-02-16T23:10:41.103Z] 23:10:41 INFO - REFTEST TEST-PASS | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-004.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-001-ref.html | image comparison, max difference: 0, number of differing pixels: 0
[task 2019-02-16T23:10:41.106Z] 23:10:41 INFO - REFTEST TEST-END | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-004.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-001-ref.html
[task 2019-02-16T23:10:41.136Z] 23:10:41 INFO - ++DOMWINDOW == 335 (0x7f0528b65000) [pid = 8577] [serial = 2752] [outer = 0x7f052d531800]
[task 2019-02-16T23:10:41.196Z] 23:10:41 INFO - REFTEST TEST-START | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-005.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-001-ref.html
[task 2019-02-16T23:10:41.203Z] 23:10:41 INFO - REFTEST TEST-LOAD | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-005.html | 1168 / 1195 (97%)
[task 2019-02-16T23:10:41.243Z] 23:10:41 INFO - ++DOMWINDOW == 336 (0x7f0528d1ac00) [pid = 8577] [serial = 2753] [outer = 0x7f052d531800]
[task 2019-02-16T23:10:41.628Z] 23:10:41 INFO - REFTEST TEST-PASS | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-005.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-001-ref.html | image comparison, max difference: 0, number of differing pixels: 0
[task 2019-02-16T23:10:41.634Z] 23:10:41 INFO - REFTEST TEST-END | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-005.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-001-ref.html
[task 2019-02-16T23:10:41.659Z] 23:10:41 INFO - ++DOMWINDOW == 337 (0x7f0529084c00) [pid = 8577] [serial = 2754] [outer = 0x7f052d531800]
[taskcluster:error] Task timeout after 3600 seconds. Force killing container.
[task 2019-02-16T23:10:41.723Z] 23:10:41 INFO - REFTEST TEST-START | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-006.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-006-ref.html
[task 2019-02-16T23:10:41.746Z] 23:10:41 INFO - REFTEST TEST-LOAD | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-006.html | 1169 / 1195 (97%)
[taskcluster 2019-02-16 23:10:42.612Z] === Task Finished ===
[taskcluster 2019-02-16 23:10:42.613Z] Unsuccessful task run with exit code: -1 completed in 4378.344 seconds
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 250•5 years ago
|
||
Over the last 7 days there are 39 failures on this bug. These happen on osx-cross, linux64-qr, linux64, linux32, android-em-4-3-armv7-api16, android-5-0-x86_64.
Here is the latest failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=231041560&repo=autoland&lineNumber=35054
Comment hidden (Intermittent Failures Robot) |
Comment 252•5 years ago
|
||
This started to permafail from this merge https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&resultStatus=testfailed%2Cbusted%2Cexception&revision=ecbfad744a66ee47428dff7992aa6604391844e6&selectedJob=232377959&searchStr=os%2Cx%2Ccross%2Ccompiled%2Cccov%2Cdebug%2Cbuild-macosx64-ccov%2Fdebug%2C%28b%29
gbrown can you please take a look?
Reporter | ||
Comment 253•5 years ago
|
||
Thanks Arthur. I noticed and hope my patch in bug 1533565 will resolve this.
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 255•5 years ago
|
||
There were 4 timeouts on android builds here:
(On retry, they were all fine.)
It looks like hg robustcheckout was taking a very long time.
[task 2019-03-07T10:38:21.362Z] 10:38:21 INFO - Running main action method: multi_l10n
...
[task 2019-03-07T12:05:06.758Z] 12:05:06 INFO - 12:05:06 INFO - Running command: ['hg', '--config', 'ui.merge=internal:merge', '--config', 'extensions.robustcheckout=/builds/worker/workspace/build/src/testing/mozharness/external_tools/robustcheckout.py', 'robustcheckout', u'https://hg.mozilla.org/l10n-central/mr', u'mr', '--sharebase', u'/builds/hg-shared', '--branch', u'default']
[task 2019-03-07T12:05:06.758Z] 12:05:06 INFO - 12:05:06 INFO - Copy/paste: hg --config ui.merge=internal:merge --config extensions.robustcheckout=/builds/worker/workspace/build/src/testing/mozharness/external_tools/robustcheckout.py robustcheckout https://hg.mozilla.org/l10n-central/mr mr --sharebase /builds/hg-shared --branch default
[task 2019-03-07T12:05:06.809Z] 12:05:06 INFO - 12:05:06 INFO - (using Mercurial 4.8.1)
[task 2019-03-07T12:05:06.810Z] 12:05:06 INFO - 12:05:06 INFO - ensuring https://hg.mozilla.org/l10n-central/mr@default is available at mr
[task 2019-03-07T12:05:07.025Z] 12:05:07 INFO - 12:05:07 INFO - (sharing from new pooled repository cf74f930fde306218d72c790d69ad9ba7c8b8b51)
[task 2019-03-07T12:05:07.216Z] 12:05:07 INFO - 12:05:07 INFO - requesting all changes
[taskcluster:error] Task timeout after 7200 seconds. Force killing container.
vs a successful retry:
[task 2019-03-07T12:43:25.400Z] 12:43:25 INFO - Running main action method: multi_l10n
...
[task 2019-03-07T12:51:28.535Z] 12:51:28 INFO - 12:51:28 INFO - [mozharness: 2019-03-07 12:51:28.535244Z] Finished pull-locale-source step (success)
...
[task 2019-03-07T12:56:32.376Z] 12:56:32 INFO - [mozharness: 2019-03-07 12:56:32.376373Z] Finished multi-l10n step (success)
[task 2019-03-07T12:56:32.376Z] 12:56:32 INFO - [mozharness: 2019-03-07 12:56:32.376448Z] Skipping package-source step.
[task 2019-03-07T12:56:32.376Z] 12:56:32 INFO - Running post-run listener: _parse_build_tests_ccov
[task 2019-03-07T12:56:32.376Z] 12:56:32 INFO - Running post-run listener: _shutdown_sccache
[task 2019-03-07T12:56:32.376Z] 12:56:32 INFO - Running post-run listener: _summarize
[task 2019-03-07T12:56:32.376Z] 12:56:32 INFO - # TBPL SUCCESS #
[task 2019-03-07T12:56:32.376Z] 12:56:32 INFO - [mozharness: 2019-03-07 12:56:32.376769Z] FxDesktopBuild summary:
[task 2019-03-07T12:56:32.376Z] 12:56:32 INFO - # TBPL SUCCESS #
[taskcluster 2019-03-07 12:56:32.926Z] === Task Finished ===
[taskcluster 2019-03-07 12:58:24.583Z] Successful task run with exit code: 0 completed in 3083.283 seconds
I won't follow up unless there are more problems like this.
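The comparison above can be made concrete by measuring the time between the first and last `[task ...]` timestamps in each excerpt. This is a minimal sketch that assumes only the timestamp format visible in the quoted logs; `elapsed_seconds` is a hypothetical helper, and the sample lines are copied from the two excerpts in this comment.

```python
import re
from datetime import datetime

# Matches the "[task 2019-03-07T10:38:21.362Z]" prefix seen in the quoted logs.
TASK_TS_RE = re.compile(r"^\[task (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\]")


def elapsed_seconds(log_lines):
    """Seconds between the first and last timestamped [task ...] lines, or None."""
    stamps = []
    for line in log_lines:
        match = TASK_TS_RE.match(line)
        if match:  # skip "..." and untimestamped [taskcluster:error] lines
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f"))
    if len(stamps) < 2:
        return None
    return (stamps[-1] - stamps[0]).total_seconds()


timed_out = [
    "[task 2019-03-07T10:38:21.362Z] 10:38:21 INFO - Running main action method: multi_l10n",
    "...",
    "[task 2019-03-07T12:05:07.216Z] 12:05:07 INFO - 12:05:07 INFO - requesting all changes",
    "[taskcluster:error] Task timeout after 7200 seconds. Force killing container.",
]
retry = [
    "[task 2019-03-07T12:43:25.400Z] 12:43:25 INFO - Running main action method: multi_l10n",
    "...",
    "[task 2019-03-07T12:51:28.535Z] 12:51:28 INFO - 12:51:28 INFO - [mozharness: 2019-03-07 12:51:28.535244Z] Finished pull-locale-source step (success)",
]
print(elapsed_seconds(timed_out), elapsed_seconds(retry))
```

On these excerpts the timed-out run had spent about 87 minutes in multi_l10n and was still pulling locales, while the retry finished pull-locale-source in about 8 minutes.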
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 258•5 years ago
|
||
There are still some linux64/debug devtools timeouts after bug 1534822. Chunks are very unbalanced. Filed bug 1536253 as a first step to improvement.
Reporter | ||
Comment 259•5 years ago
|
||
There are some infrequent linux64-qr/debug reftest 3600 s timeouts, but chunks normally run in 20 to 40 minutes so I am reluctant to act; watching...
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 262•5 years ago
|
||
source-test-clang-tidy is perma-fail, causing a spike here. There is some existing activity on that test, like bug 1539779 -- waiting to see if that helps. (Otherwise, it might be worth noting that there was a sharp spike in run time around here: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=tidy&tochange=98452610cfcc81cba0d4478797fe1e83a51172e8&fromchange=cf6cfe33476622c202fcbc34ef12f8ca1c039b8a. )
Comment 263•5 years ago
|
||
Geoff, I've filed bug 1540325 for the clang tidy timeouts.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 267•5 years ago
|
||
Over the last 7 days there are 31 failures present on this bug. These happen on: android-em-7-0-x86_64, linux32-shippable, linux64, linux64-qr, osx-cross, osx-cross-ccov
Here is the most recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=239147679&repo=autoland&lineNumber=78131
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 271•5 years ago
|
||
Over the last 7 days there are 38 failures present on this bug. These happen on android-4-0-armv7-api16, android-5-0-aarch64, android-em-7-0-x86, android-em-7-0-x86_64, linux32-shippable, linux64, linux64-shippable, osx-cross, osx-cross-noopt, osx-shippable, windows-mingw32, windows2012-32-shippable.
Here is the most recent log example: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=240827727&repo=mozilla-central&lineNumber=5178
[task 2019-04-17T00:24:53.121Z] 00:24:53 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-283.xht
[task 2019-04-17T00:24:55.017Z] 00:24:55 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-283.xht | took 1905ms
[task 2019-04-17T00:24:55.019Z] 00:24:55 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-284.xht
[task 2019-04-17T00:24:57.803Z] 00:24:57 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-284.xht | took 2782ms
[task 2019-04-17T00:24:57.805Z] 00:24:57 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-285.xht
[task 2019-04-17T00:24:59.685Z] 00:24:59 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-285.xht | took 1879ms
[task 2019-04-17T00:24:59.687Z] 00:24:59 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-286.xht
[task 2019-04-17T00:25:01.657Z] 00:25:01 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-286.xht | took 1971ms
[task 2019-04-17T00:25:01.657Z] 00:25:01 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-287.xht
[task 2019-04-17T00:25:03.591Z] 00:25:03 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-287.xht | took 1935ms
[task 2019-04-17T00:25:03.593Z] 00:25:03 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-288.xht
[task 2019-04-17T00:25:05.387Z] 00:25:05 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-288.xht | took 1796ms
[task 2019-04-17T00:25:05.388Z] 00:25:05 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-289.xht
[task 2019-04-17T00:25:07.057Z] 00:25:07 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-289.xht | took 1667ms
[task 2019-04-17T00:25:07.058Z] 00:25:07 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-290.xht
[task 2019-04-17T00:25:08.942Z] 00:25:08 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-290.xht | took 1880ms
[task 2019-04-17T00:25:08.944Z] 00:25:08 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-291.xht
[task 2019-04-17T00:25:10.795Z] 00:25:10 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-291.xht | took 1857ms
[task 2019-04-17T00:25:10.796Z] 00:25:10 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-292.xht
[task 2019-04-17T00:25:12.671Z] 00:25:12 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-292.xht | took 1873ms
[task 2019-04-17T00:25:12.671Z] 00:25:12 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-293.xht
[taskcluster:error] Task timeout after 5400 seconds. Force killing container.
[taskcluster 2019-04-17 00:25:15.346Z] === Task Finished ===
[taskcluster 2019-04-17 00:25:15.347Z] Unsuccessful task run with exit code: -1 completed in 5402.945 seconds
Comment 272•5 years ago
|
||
Looking at a couple of the linux builds it seems sometimes it takes 1/2 to 1 hour to clone and update the repo. I do see '(warning: large working directory being used without fsmonitor enabled; enable fsmonitor to improve performance; see "hg help -e fsmonitor")' in the log as well.
Perhaps we could a) adjust the max run time to account for the slow hg performance, b) improve the hg performance, perhaps with fsmonitor, or c) do both or take some other approach. It is too bad that we have to clone and update > 3G of data each time as well.
Perhaps we can just try bumping the max run time for the problematic jobs, not just builds, to see whether that prevents or reduces these errors. Then, for the jobs which complete when given enough time, we can try to improve their run times. For the ones that won't complete regardless of the run time, we can reduce it again and investigate that as a distinct issue.
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Comment 273•5 years ago
|
||
(In reply to Bob Clary [:bc:] from comment #272)
Thanks :bc.
Looking at a couple of the linux builds it seems sometimes it takes 1/2 to 1 hour to clone and update the repo. I do see '(warning: large working directory being used without fsmonitor enabled; enable fsmonitor to improve performance; see "hg help -e fsmonitor")' in the log as well.
I also see the fsmonitor warning in other logs, where the cloning completes quickly -- maybe it is normal / expected?
Comment hidden (Intermittent Failures Robot) |
Comment 275•5 years ago
|
||
There are 35 total failures in the last 7 days on multiple platforms: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-04-19&endday=2019-04-26&tree=trunk&bug=1411358
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=242811869&repo=mozilla-inbound&lineNumber=17137
[task 2019-04-26T09:12:42.250Z] 09:12:42 INFO - REFTEST TEST-START | http://10.0.2.2:8854/jsreftest/tests/jsreftest.html?test=test262/language/expressions/equals/S11.9.1_A3.3.js
[task 2019-04-26T09:12:42.251Z] 09:12:42 INFO - REFTEST TEST-LOAD | http://10.0.2.2:8854/jsreftest/tests/jsreftest.html?test=test262/language/expressions/equals/S11.9.1_A3.3.js | 4550 / 6609 (68%)
[task 2019-04-26T09:12:42.255Z] 09:12:42 INFO - wait for org.mozilla.geckoview.test complete; top activity=org.mozilla.geckoview.test
[task 2019-04-26T09:12:42.466Z] 09:12:42 INFO - org.mozilla.geckoview.test unexpectedly found running. Killing...
[task 2019-04-26T09:12:42.467Z] 09:12:42 INFO - REFTEST TEST-INFO | started process screentopng
[task 2019-04-26T09:12:44.529Z] 09:12:44 INFO - REFTEST TEST-INFO | screentopng: exit 0
[task 2019-04-26T09:13:00.704Z] 09:13:00 WARNING - TEST-UNEXPECTED-FAIL | http://10.0.2.2:8854/jsreftest/tests/jsreftest.html?test=test262/language/expressions/equals/S11.9.1_A3.3.js | application ran for longer than allowed maximum time
[task 2019-04-26T09:13:00.705Z] 09:13:00 INFO - remoteautomation.py | Application ran for: 1:50:49.448078
[task 2019-04-26T09:13:02.391Z] 09:13:02 INFO - REFTEST INFO | Copy/paste: /builds/worker/workspace/build/linux64-minidump_stackwalk /tmp/tmpPTQpi_/4c2ef0dc-18bd-c459-f2b2-5b20c3643e3e.dmp /builds/worker/workspace/build/symbols
[taskcluster:error] Task timeout after 7200 seconds. Force killing container.
[taskcluster 2019-04-26 09:13:20.899Z] === Task Finished ===
[taskcluster 2019-04-26 09:13:20.899Z] Unsuccessful task run with exit code: -1 completed in 7369.418 seconds
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 277•5 years ago
|
||
Most failures are android-em-7-0*. I have checked that these are not hung (recent entries in log at timeout) and that the same tasks normally run in much less time on other pushes. Although android-performance.log artifacts are not available in this bug, I strongly suspect bug 1545308 and hope to see that bug re-opened soon.
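The "not hung" check mentioned above (recent entries in the log at timeout) can be approximated by comparing the timestamp of the last `[task ...]` line with the timestamp of the `=== Task Finished ===` line. This is only a minimal sketch, assuming the log line formats pasted in this bug; the function names and the 60-second threshold are hypothetical and are not part of any existing tool.

```python
import re
from datetime import datetime

# Timestamp formats as they appear in the log snippets pasted in this bug.
TASK_RE = re.compile(r"^\[task (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\]")
FINISHED_RE = re.compile(
    r"^\[taskcluster (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})Z\] === Task Finished ==="
)


def seconds_silent_before_kill(log_lines):
    """Seconds between the last [task] line and the Task Finished line.

    Returns None if either marker is missing from the log.
    """
    last_task = None
    finished = None
    for line in log_lines:
        m = TASK_RE.match(line)
        if m:
            last_task = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
            continue
        m = FINISHED_RE.match(line)
        if m:
            finished = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
    if last_task is None or finished is None:
        return None
    return (finished - last_task).total_seconds()


def looks_hung(log_lines, threshold_seconds=60.0):
    """Hypothetical heuristic: a long silence before the kill suggests a hang."""
    gap = seconds_silent_before_kill(log_lines)
    return gap is not None and gap > threshold_seconds
```

A task that was still logging a few seconds before the kill is more likely slow than hung; a task that was silent for many minutes is worth investigating as a hang.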
Reporter | ||
Comment 278•5 years ago
|
||
I expect these failures to continue and this bug to remain open indefinitely. Task timeouts are the last resort whenever a task is hung or otherwise runs for too long. My goal is to address timeouts which recur in the same task frequently and identify other related issues in dependent bugs.
I try to review failures here at least once each week. There is no point in pasting recent failure logs or failure counts in this bug. If a spike in frequency is noted, or you have insight into a failure, or I haven't commented on an on-going issue for more than a week, need-info is appreciated.
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 280•5 years ago
|
||
(In reply to Intermittent Failures Robot from comment #279)
19 failures in 933 pushes (0.02 failures/push) were associated with this bug yesterday.
Repository breakdown:
- try: 11
Mass mis-classification.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 283•5 years ago
|
||
Most recent failures are on android-em-7-0-x86. These are mostly wpt-reftests which normally complete in 15 to 30 minutes but sometimes exceed 90 minutes. Evidence is not conclusive, but I hope these will be resolved by bug 1545308.
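To separate "normally fast, occasionally very slow" tasks from tasks that are consistently near their limit, one option is to compare each run against the median duration of the same task on other pushes. The sketch below is illustrative only: `slow_runs` and the factor of 3 are hypothetical, and the durations would have to come from wherever task run times are recorded.

```python
from statistics import median


def slow_runs(durations_minutes, factor=3.0):
    """Return the runs whose duration exceeds `factor` times the median duration."""
    typical = median(durations_minutes)
    return [d for d in durations_minutes if d > factor * typical]
```

If only a few runs are flagged, raising max-run-time is unlikely to be the right fix, since the typical run already finishes with plenty of headroom.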
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 285•5 years ago
|
||
Excellent improvement in Android 7.0 since workaround landed in bug 1552334.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 291•5 years ago
|
||
There are 28 total failures in the last 7 days: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-06-18&endday=2019-06-25&tree=trunk&bug=1411358
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=253283740&repo=mozilla-central&lineNumber=118657
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 294•5 years ago
|
||
There are 41 total failures in the last 7 days on android-em-7-0-x86_64 debug, linux64 asan, linux64-qr debug, 1 on osx-cross and 1 on osx-shippable.
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=254527369&repo=autoland&lineNumber=7420
[task 2019-07-03T03:16:10.735Z] 03:16:10 INFO - PID 18139 | -----------------------------------------------------
[task 2019-07-03T03:16:10.964Z] 03:16:10 INFO - Browser exited with return code 0
[task 2019-07-03T03:16:10.964Z] 03:16:10 INFO - PROCESS LEAKS None
[task 2019-07-03T03:16:10.965Z] 03:16:10 INFO - Closing logging queue
[task 2019-07-03T03:16:10.966Z] 03:16:10 INFO - queue closed
[task 2019-07-03T03:16:10.974Z] 03:16:10 INFO - INFO | runtests.py | ASan using symbolizer at /builds/worker/workspace/build/application/firefox/llvm-symbolizer
[task 2019-07-03T03:16:10.989Z] 03:16:10 INFO - LSan enabled.
[task 2019-07-03T03:16:10.990Z] 03:16:10 INFO - LSan using suppression file /builds/worker/workspace/build/tests/web-platform/prefs/lsan_suppressions.txt
[task 2019-07-03T03:16:10.991Z] 03:16:10 INFO - INFO | runtests.py | ASan running in default memory configuration
[task 2019-07-03T03:16:11.006Z] 03:16:11 INFO - Setting up ssl
[task 2019-07-03T03:16:11.063Z] 03:16:11 INFO - certutil |
[task 2019-07-03T03:16:11.123Z] 03:16:11 INFO - certutil |
[task 2019-07-03T03:16:11.164Z] 03:16:11 INFO - certutil |
[task 2019-07-03T03:16:11.164Z] 03:16:11 INFO - Certificate Nickname Trust Attributes
[task 2019-07-03T03:16:11.164Z] 03:16:11 INFO - SSL,S/MIME,JAR/XPI
[task 2019-07-03T03:16:11.165Z] 03:16:11 INFO -
[task 2019-07-03T03:16:11.165Z] 03:16:11 INFO - web-platform-tests CT,,
[task 2019-07-03T03:16:11.165Z] 03:16:11 INFO -
[task 2019-07-03T03:16:11.180Z] 03:16:11 INFO - Application command: /builds/worker/workspace/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpENo3lD.mozrunner
[task 2019-07-03T03:16:11.196Z] 03:16:11 INFO - Starting runner
[task 2019-07-03T03:16:13.153Z] 03:16:13 INFO - PID 18469 | 1562123773138 addons.webextension.screenshots@mozilla.org WARN Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: mozillaAddons
[task 2019-07-03T03:16:13.154Z] 03:16:13 INFO - PID 18469 | 1562123773139 addons.webextension.screenshots@mozilla.org WARN Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: telemetry
[task 2019-07-03T03:16:13.155Z] 03:16:13 INFO - PID 18469 | 1562123773140 addons.webextension.screenshots@mozilla.org WARN Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: resource://pdf.js/
[task 2019-07-03T03:16:13.156Z] 03:16:13 INFO - PID 18469 | 1562123773141 addons.webextension.screenshots@mozilla.org WARN Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: about:reader*
[task 2019-07-03T03:16:21.769Z] 03:16:21 INFO - PID 18469 | 1562123781762 Marionette INFO Listening on port 42034
[task 2019-07-03T03:16:22.525Z] 03:16:22 INFO - TEST-START | /websockets/keeping-connection-open/001.html
[taskcluster:error] Task timeout after 7200 seconds. Force killing container.
[taskcluster 2019-07-03 03:17:10.911Z] === Task Finished ===
[taskcluster 2019-07-03 03:17:10.911Z] Unsuccessful task run with exit code: -1 completed in 7202.153 seconds
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 297•5 years ago
|
||
There are 62 total failures in the last 7 days: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-07-06&endday=2019-07-13&tree=trunk&bug=1411358
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=256280521&repo=mozilla-central&lineNumber=9212
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 299•5 years ago
|
||
Most recent failures are bug 1562078, which should be resolved soon.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 302•5 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #299)
Most recent failures are bug 1562078, which should be resolved soon.
Unfortunately, those continue, even after the wpt emulator upgrade.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 306•5 years ago
|
||
There have been several static-analysis-auto-st-autotest 3600 second timeouts lately, but the most recent tasks are completing in just 13 minutes.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 311•5 years ago
|
||
Bug 1574254 temporarily introduced some android xpcshell task timeouts.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 316•5 years ago
|
||
In the last 7 days there have been 27 occurrences on Linux 64, Linux 64 ccov, OS X cross noopt, build types debug and opt.
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=266479367&repo=mozilla-central&lineNumber=39024
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 318•5 years ago
|
||
More than half the current failures here are timeouts in build tasks on various osx and linux platforms. Recently build peers have resisted efforts to increase max-run-time to allow for performance variation; I don't know what else can be done.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Comment 321•5 years ago
|
||
++(comment 318)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 324•5 years ago
|
||
++(comment 318)
Reporter | ||
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 329•5 years ago
|
||
++(comment 318)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 333•5 years ago
|
||
There are 26 total failures in the last 7 days on android-em-7-0-x86_64 debug and some failures on linux, windows and osx.
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=278373433&repo=autoland&lineNumber=7194
[vcs 2019-11-27T10:30:04.128Z] clone [==========================================> ] 3203918168/3346631639 43s
[vcs 2019-11-27T10:30:05.130Z] clone [==========================================> ] 3206277464/3346631639 42s
[vcs 2019-11-27T10:30:06.133Z] clone [==========================================> ] 3209816408/3346631639 41s
[taskcluster:error] Task timeout after 7200 seconds. Force killing container.
[vcs 2019-11-27T10:30:07.150Z] clone [==========================================> ] 3213093208/3346631639 40s
[taskcluster 2019-11-27 10:30:16.425Z] === Task Finished ===
[taskcluster 2019-11-27 10:30:16.460Z] Unsuccessful task run with exit code: -1 completed in 9144.754 seconds
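When a task times out during the clone, as in the log above, the `[vcs ...] clone` progress lines give enough information to estimate the transfer rate. A rough sketch, assuming the progress line format shown in this log; `clone_rate` is a hypothetical helper, not an existing tool.

```python
import re
from datetime import datetime

# Matches lines such as:
# [vcs 2019-11-27T10:30:04.128Z] clone [=====> ] 3203918168/3346631639 43s
VCS_RE = re.compile(
    r"^\[vcs (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\] clone \[.*\]\s+(\d+)/(\d+)"
)


def clone_rate(log_lines):
    """Return (bytes_per_second, fraction_done) from the first and last progress lines.

    Returns None if fewer than two progress lines are found.
    """
    samples = []
    for line in log_lines:
        m = VCS_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
            samples.append((ts, int(m.group(2)), int(m.group(3))))
    if len(samples) < 2:
        return None
    (t0, done0, _), (t1, done1, total) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0:
        return None
    return (done1 - done0) / elapsed, done1 / total
```

Over the four progress lines quoted above this works out to roughly 3 MB/s, with the clone about 96% complete when the task was killed.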
Comment hidden (Intermittent Failures Robot) |
Comment 335•5 years ago
|
||
In the last 7 days there have been 21 occurrences on android (1 on android 4, 3 on android 5), linux 64 and osx-cross, mostly on build types debug and opt.
Recent failure: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=279969270&repo=autoland&lineNumber=3218
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 338•5 years ago
|
||
In the last 7 days there have been 24 occurrences on linux 64 and osx-cross-noopt, build types asan and debug.
Recent failure: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=280333995&repo=autoland&lineNumber=48848
[task 2019-12-09T19:34:01.024Z] 19:34:01 INFO - make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/layout/tools/recording'
[task 2019-12-09T19:34:01.024Z] 19:34:01 INFO - make[4]: Entering directory '/builds/worker/workspace/build/src/obj-firefox/layout/tools/reftest'
[task 2019-12-09T19:34:01.024Z] 19:34:01 INFO - mkdir -p '../../../dist/xpi-stage/reftest/chrome/'
[task 2019-12-09T19:34:01.024Z] 19:34:01 INFO - make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/layout/tools/reftest'
[taskcluster:error] Task timeout after 3600 seconds. Force killing container.
[taskcluster 2019-12-09 19:34:42.469Z] === Task Finished ===
[taskcluster 2019-12-09 19:34:42.469Z] Unsuccessful task run with exit code: -1 completed in 3652.916 seconds
Comment hidden (Intermittent Failures Robot) |
Comment 347•4 years ago
|
||
In the last 7 days there have been 34 occurrences on linux 64 and osx-cross-noopt, build types asan and debug.
Recent log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=284270373&repo=autoland&lineNumber=21893
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 351•4 years ago
|
||
Most recent failures are from a variety of build tasks. I am hoping the discussion in bug 1610998 may help to resolve these.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 355•4 years ago
|
||
The recent spike here is caused by Bug 1614852.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 360•4 years ago
|
||
There are 33 failures in the last 7 days, most of the occurrences can be seen on linux1804-64 (debug), linux1804-64-asan (opt), linux64-tsan (opt): https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2020-02-14&endday=2020-02-21&tree=trunk&bug=1411358
A recent log example: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=289769127&repo=mozilla-central&lineNumber=22442
Comment 361•4 years ago
|
||
With Geoff's help, I will work on getting Ubuntu 18.04 in better shape.
Comment 362•4 years ago
|
||
In the last week we saw for linux1804
count | label | project | platform | buildtype |
---|---|---|---|---|
1 | mochitest-browser-chrome-e10s-13 | autoland | linux1804-64 | debug |
1 | mochitest-browser-chrome-fis-e10s-9 | autoland | linux1804-64 | debug |
1 | mochitest-webgl2-ext-fis-e10s-4 | autoland | linux1804-64-qr | debug |
1 | reftest-no-accel-e10s-3 | autoland | linux1804-64 | debug |
1 | reftest-no-accel-e10s-7 | autoland | linux1804-64 | debug |
6 | web-platform-tests-e10s-2 | autoland | linux1804-64-asan | opt |
2 | web-platform-tests-e10s-2 | mozilla-central | linux1804-64-asan | opt |
3 | web-platform-tests-e10s-5 | autoland | linux1804-64-asan | opt |
1 | web-platform-tests-sw-e10s-11 | mozilla-central | linux1804-64 | debug |
Edwin has already changed linux64 to linux.*64, and just today Geoff changed the chunking for web-platform-tests asan from 24 to 28 in Bug 1617026, so most of this is resolved.
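Both changes can be illustrated with a short sketch. It assumes that `linux64` and `linux.*64` are regular expressions matched against platform names (the thread does not say where the pattern lives) and that tests are spread evenly across chunks, which is a simplification.

```python
import re

# Platform names taken from the table above, plus plain linux64.
platforms = ["linux64", "linux1804-64", "linux1804-64-asan", "linux1804-64-qr"]


def matching(pattern, names):
    """Return the names that the pattern matches from the start of the string."""
    return [n for n in names if re.match(pattern, n)]


# The old pattern misses the Ubuntu 18.04 platforms; the new one covers them.
old = matching(r"linux64", platforms)
new = matching(r"linux.*64", platforms)


def per_chunk_fraction(old_chunks, new_chunks):
    """Relative per-chunk workload after re-chunking, assuming an even spread."""
    return old_chunks / new_chunks
```

Going from 24 to 28 chunks leaves each chunk with 24/28 (about 86%) of its previous workload, so a chunk that was just over its limit should now fit if the tests really are evenly distributed.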
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 367•4 years ago
|
||
There are a total of 35 failures on this bug over the last 7 days. These failures take place on android-em-7-0-x86_64, linux1804-64, linux1804-64-asan, linux1804-64-qr, linux32-shippable, linux64, linux64-asan-reporter, linux64-shippable, linux64-tsan, osx-cross, osx-shippable
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=291938632&repo=autoland&lineNumber=5082
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 370•4 years ago
|
||
Bugbug thinks this bug is a regression, but please revert this change in case of error.
Reporter | ||
Updated•4 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 375•4 years ago
|
||
There may be a problem with android-em machine-49 - temporary? Watching...
Various linux/asan tasks timed out - temporary? Watching...
This bug is in fairly good shape overall.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 380•4 years ago
|
||
In the last 7 days there have been 71 occurrences on android-em-7-0-x86_64 and linux1804-64 build types debug and opt.
Recent failure: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=300435464&repo=mozilla-central&lineNumber=10564
Comment hidden (Intermittent Failures Robot) |
Comment 382•4 years ago
|
||
Joel, I believe a couple of backlog jobs started permafailing by timing out:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&resultStatus=testfailed%2Cbusted%2Cexception%2Crunnable&searchStr=backlog&tochange=8e4a846c1c5bdb6f5c1d871b4d0e826a2a09b79e&fromchange=262c8adb52655e2028595d9838903b8b5a0a87da&selectedTaskRun=YKhUJO18Tr-R7MIGdoJ5TQ-0
And I believe the cause would be https://bugzilla.mozilla.org/show_bug.cgi?id=1632086#c8
Could you please take a look?
Comment 383•4 years ago
|
||
Yes, I am working on this; I will do it as part of Bug 1634230.
Reporter | ||
Updated•4 years ago
|
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 395•4 years ago
|
||
We are suddenly seeing a spike of failures on the nss branch -- I'm not sure what to do about those.
Comment 396•4 years ago
|
||
Why are the nss branch failures getting annotated? Maybe for Bugzilla and the failure robot we should only focus on specific branches.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 401•4 years ago
|
||
Still dominated by failures on nss.
Comment 402•4 years ago
|
||
I don't know why ./mach try auto runs test-android-em-7.0-x86_64/debug-geckoview-xpcshell-e10s while m-c runs test-android-em-7.0-x86_64/debug-geckoview-xpcshell-e10s-1/2/3/4, but perhaps that could lead to more failures on try.
Reporter | ||
Comment 403•4 years ago
|
||
:ahal - interested in comment 402? It looks like 'try auto' tried to run way too many android xpcshell test manifests in a single chunk.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 410•4 years ago
|
||
In the last 7 days there have been 118 occurrences on android-em-7-0-x86_64, linux1804-64-qr and linux1804-64-tsan, build types debug and opt.
Recent failure: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=311694659&repo=autoland&lineNumber=12480
Reporter | ||
Comment 411•4 years ago
|
||
Spike in android-em xpcshell test failures was caused by manifest scheduling running all tests in one chunk: now corrected.
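To show why running every manifest in a single chunk leads to task timeouts, here is a small sketch of runtime-balanced chunking. It is not the actual scheduling code; `chunk_manifests`, the manifest names and the runtimes are all hypothetical.

```python
import heapq


def chunk_manifests(runtimes, total_chunks):
    """Greedily assign manifests to chunks, longest first, to balance runtime.

    runtimes: dict mapping manifest path -> estimated seconds (hypothetical data).
    Returns a list of (estimated_seconds, [manifests]), one entry per chunk.
    """
    # Heap entries are (load, chunk_index, members); the unique index breaks ties.
    heap = [(0.0, i, []) for i in range(total_chunks)]
    heapq.heapify(heap)
    for manifest, secs in sorted(runtimes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx, members = heapq.heappop(heap)
        members.append(manifest)
        heapq.heappush(heap, (load + secs, idx, members))
    return [(load, members) for load, idx, members in sorted(heap, key=lambda c: c[1])]
```

With one chunk, the estimated runtime is the sum over all manifests; with four chunks, as on m-c, the longest chunk is only about a quarter of that when the manifests balance well.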
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Updated•4 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 437•3 years ago
|
||
The recent spike here was caused by a landing in Bug 1683797 that was backed out: https://hg.mozilla.org/integration/autoland/rev/385a17fb41242fe88e1ec94dea355310952b3abb
Comment hidden (Intermittent Failures Robot) |
Comment 470•3 years ago
|
||
Update:
There have been 75 failures within the last 7 days:
• 5 failures on OS X Cross Compiled debug
• 1 failure on Linux 18.04 x64 CCov WebRender opt
• 29 failures on Android 7.0 x86-64 WebRender debug
• 26 failures on android-em-7-0-x86_64-lite-qr debug
• 3 failures on Android 5.0 x86-64 debug
• 3 failures on Android 5.0 x86-64 opt
• 3 failures on android-5-0-x86 opt
• 3 failures on android-5-0-armv7 opt
• 2 failures on Android 5.0 AArch64 opt
Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=347728355&repo=autoland&lineNumber=40865
[task 2021-08-07T11:01:57.171Z] 11:01:57 INFO - Running: /builds/worker/workspace/obj-build/_virtualenvs/common/bin/python /builds/worker/checkouts/gecko/toolkit/crashreporter/tools/symbolstore.py -c --vcs-info --install-manifest=/builds/worker/workspace/obj-build/_build_manifests/install/dist_include,/builds/worker/workspace/obj-build/dist/include -s /builds/worker/checkouts/gecko /builds/worker/fetches/dump_syms/dump_syms /builds/worker/workspace/obj-build/dist/crashreporter-symbols /builds/worker/workspace/obj-build/mfbt/tests/TestXorShift128PlusRNG
[task 2021-08-07T11:01:57.171Z] 11:01:57 INFO - Beginning work for file: /builds/worker/workspace/obj-build/mfbt/tests/TestXorShift128PlusRNG
[task 2021-08-07T11:01:57.171Z] 11:01:57 INFO - Processing file: /builds/worker/workspace/obj-build/mfbt/tests/TestXorShift128PlusRNG
[task 2021-08-07T11:01:57.171Z] 11:01:57 INFO - /builds/worker/fetches/dump_syms/dump_syms /builds/worker/workspace/obj-build/mfbt/tests/TestXorShift128PlusRNG
[task 2021-08-07T11:01:57.171Z] 11:01:57 INFO - Finished processing /builds/worker/workspace/obj-build/mfbt/tests/TestXorShift128PlusRNG in 0.01s
[task 2021-08-07T11:01:57.172Z] 11:01:57 INFO - make[4]: Leaving directory '/builds/worker/workspace/obj-build/mfbt/tests'
[task 2021-08-07T11:01:58.034Z] 11:01:58 INFO - make[4]: Entering directory '/builds/worker/workspace/obj-build/mfbt/tests'
[task 2021-08-07T11:01:58.034Z] 11:01:58 INFO - /builds/worker/workspace/obj-build/_virtualenvs/common/bin/python -m mozbuild.action.dumpsymbols /builds/worker/workspace/obj-build/mfbt/tests/TestSPSCQueue /builds/worker/workspace/obj-build/mfbt/tests/TestSPSCQueue_syms.track
[task 2021-08-07T11:01:58.034Z] 11:01:58 INFO - Running: /builds/worker/workspace/obj-build/_virtualenvs/common/bin/python /builds/worker/checkouts/gecko/toolkit/crashreporter/tools/symbolstore.py -c --vcs-info --install-manifest=/builds/worker/workspace/obj-build/_build_manifests/install/dist_include,/builds/worker/workspace/obj-build/dist/include -s /builds/worker/checkouts/gecko /builds/worker/fetches/dump_syms/dump_syms /builds/worker/workspace/obj-build/dist/crashreporter-symbols /builds/worker/workspace/obj-build/mfbt/tests/TestSPSCQueue
[task 2021-08-07T11:01:58.034Z] 11:01:58 INFO - Beginning work for file: /builds/worker/workspace/obj-build/mfbt/tests/TestSPSCQueue
[task 2021-08-07T11:01:58.035Z] 11:01:58 INFO - Processing file: /builds/worker/workspace/obj-build/mfbt/tests/TestSPSCQueue
[task 2021-08-07T11:01:58.035Z] 11:01:58 INFO - /builds/worker/fetches/dump_syms/dump_syms /builds/worker/workspace/obj-build/mfbt/tests/TestSPSCQueue
[task 2021-08-07T11:01:58.035Z] 11:01:58 INFO - Finished processing /builds/worker/workspace/obj-build/mfbt/tests/TestSPSCQueue in 0.02s
[task 2021-08-07T11:01:58.035Z] 11:01:58 INFO - make[4]: Leaving directory '/builds/worker/workspace/obj-build/mfbt/tests'
[task 2021-08-07T11:01:58.882Z] 11:01:58 INFO - make[4]: Entering directory '/builds/worker/workspace/obj-build/mfbt/tests'
[task 2021-08-07T11:01:58.883Z] 11:01:58 INFO - /builds/worker/workspace/obj-build/_virtualenvs/common/bin/python -m mozbuild.action.dumpsymbols /builds/worker/workspace/obj-build/mfbt/tests/TestThreadSafeWeakPtr /builds/worker/workspace/obj-build/mfbt/tests/TestThreadSafeWeakPtr_syms.track
[task 2021-08-07T11:01:58.883Z] 11:01:58 INFO - Running: /builds/worker/workspace/obj-build/_virtualenvs/common/bin/python /builds/worker/checkouts/gecko/toolkit/crashreporter/tools/symbolstore.py -c --vcs-info --install-manifest=/builds/worker/workspace/obj-build/_build_manifests/install/dist_include,/builds/worker/workspace/obj-build/dist/include -s /builds/worker/checkouts/gecko /builds/worker/fetches/dump_syms/dump_syms /builds/worker/workspace/obj-build/dist/crashreporter-symbols /builds/worker/workspace/obj-build/mfbt/tests/TestThreadSafeWeakPtr
[task 2021-08-07T11:01:58.883Z] 11:01:58 INFO - Beginning work for file: /builds/worker/workspace/obj-build/mfbt/tests/TestThreadSafeWeakPtr
[task 2021-08-07T11:01:58.883Z] 11:01:58 INFO - Processing file: /builds/worker/workspace/obj-build/mfbt/tests/TestThreadSafeWeakPtr
[task 2021-08-07T11:01:58.883Z] 11:01:58 INFO - /builds/worker/fetches/dump_syms/dump_syms /builds/worker/workspace/obj-build/mfbt/tests/TestThreadSafeWeakPtr
[task 2021-08-07T11:01:58.884Z] 11:01:58 INFO - Finished processing /builds/worker/workspace/obj-build/mfbt/tests/TestThreadSafeWeakPtr in 0.01s
[task 2021-08-07T11:01:58.884Z] 11:01:58 INFO - make[4]: Leaving directory '/builds/worker/workspace/obj-build/mfbt/tests'
[task 2021-08-07T11:01:59.731Z] 11:01:59 INFO - make[4]: Entering directory '/builds/worker/workspace/obj-build/mfbt/tests'
[task 2021-08-07T11:01:59.732Z] 11:01:59 INFO - /builds/worker/workspace/obj-build/_virtualenvs/common/bin/python -m mozbuild.action.dumpsymbols /builds/worker/workspace/obj-build/mfbt/tests/TestUtf8 /builds/worker/workspace/obj-build/mfbt/tests/TestUtf8_syms.track
[task 2021-08-07T11:01:59.732Z] 11:01:59 INFO - Running: /builds/worker/workspace/obj-build/_virtualenvs/common/bin/python /builds/worker/checkouts/gecko/toolkit/crashreporter/tools/symbolstore.py -c --vcs-info --install-manifest=/builds/worker/workspace/obj-build/_build_manifests/install/dist_include,/builds/worker/workspace/obj-build/dist/include -s /builds/worker/checkouts/gecko /builds/worker/fetches/dump_syms/dump_syms /builds/worker/workspace/obj-build/dist/crashreporter-symbols /builds/worker/workspace/obj-build/mfbt/tests/TestUtf8
[task 2021-08-07T11:01:59.732Z] 11:01:59 INFO - Beginning work for file: /builds/worker/workspace/obj-build/mfbt/tests/TestUtf8
[task 2021-08-07T11:01:59.732Z] 11:01:59 INFO - Processing file: /builds/worker/workspace/obj-build/mfbt/tests/TestUtf8
[task 2021-08-07T11:01:59.732Z] 11:01:59 INFO - /builds/worker/fetches/dump_syms/dump_syms /builds/worker/workspace/obj-build/mfbt/tests/TestUtf8
[task 2021-08-07T11:01:59.732Z] 11:01:59 INFO - Finished processing /builds/worker/workspace/obj-build/mfbt/tests/TestUtf8 in 0.01s
[task 2021-08-07T11:01:59.732Z] 11:01:59 INFO - make[4]: Leaving directory '/builds/worker/workspace/obj-build/mfbt/tests'
[task 2021-08-07T11:02:00.590Z] 11:02:00 INFO - make[4]: Entering directory '/builds/worker/workspace/obj-build/mfbt/tests'
[task 2021-08-07T11:02:00.590Z] 11:02:00 INFO - /builds/worker/workspace/obj-build/_virtualenvs/common/bin/python -m mozbuild.action.dumpsymbols /builds/worker/workspace/obj-build/mfbt/tests/TestPoisonArea /builds/worker/workspace/obj-build/mfbt/tests/TestPoisonArea_syms.track
[task 2021-08-07T11:02:00.590Z] 11:02:00 INFO - Running: /builds/worker/workspace/obj-build/_virtualenvs/common/bin/python /builds/worker/checkouts/gecko/toolkit/crashreporter/tools/symbolstore.py -c --vcs-info --install-manifest=/builds/worker/workspace/obj-build/_build_manifests/install/dist_include,/builds/worker/workspace/obj-build/dist/include -s /builds/worker/checkouts/gecko /builds/worker/fetches/dump_syms/dump_syms /builds/worker/workspace/obj-build/dist/crashreporter-symbols /builds/worker/workspace/obj-build/mfbt/tests/TestPoisonArea
[task 2021-08-07T11:02:00.591Z] 11:02:00 INFO - Beginning work for file: /builds/worker/workspace/obj-build/mfbt/tests/TestPoisonArea
[task 2021-08-07T11:02:00.591Z] 11:02:00 INFO - Processing file: /builds/worker/workspace/obj-build/mfbt/tests/TestPoisonArea
[task 2021-08-07T11:02:00.591Z] 11:02:00 INFO - /builds/worker/fetches/dump_syms/dump_syms /builds/worker/workspace/obj-build/mfbt/tests/TestPoisonArea
[task 2021-08-07T11:02:00.591Z] 11:02:00 INFO - Finished processing /builds/worker/workspace/obj-build/mfbt/tests/TestPoisonArea in 0.01s
[task 2021-08-07T11:02:00.591Z] 11:02:00 INFO - make[4]: Leaving directory '/builds/worker/workspace/obj-build/mfbt/tests'
[taskcluster:error] Task timeout after 7200 seconds. Force killing container.
[taskcluster 2021-08-07 12:50:50.155Z] === Task Finished ===
[taskcluster 2021-08-07 12:50:50.155Z] Unsuccessful task run with exit code: -1 completed in 7220.117 seconds
Comment 475•3 years ago
|
||
Geoff, there have been two instances of webdriver jobs failing with a large failure log and many failure lines like this one:
Connection refused (os error 111)
Should they be tracked in a separate bug?
Reporter | ||
Comment 476•3 years ago
|
||
If it's just two instances, I wouldn't bother. In general, I usually ignore any failure in this bug that happens less than 10 times per week.
Comment 547•1 year ago
|
||
There have been 53 total failures in the last 7 days; see the recent failure log.
Affected platforms are:
- linux1804-64-asan-qr
- linux1804-64-qr
- symbols
[task 2022-12-11T00:23:33.303Z] 00:23:33 INFO - TEST-START | services/fxaccounts/tests/xpcshell/test_commands.js
[taskcluster:error] Task timeout after 7200 seconds. Force killing container.
[taskcluster 2022-12-11 00:26:54.857Z] === Task Finished ===
[taskcluster 2022-12-11 00:26:54.858Z] Unsuccessful task run with exit code: -1 completed in 7201.887 seconds
Comment 591•11 months ago
|
||
Latest spike on Android xpcshell timeout runs is from https://bugzilla.mozilla.org/show_bug.cgi?id=1842167#c5.
Comment 616•9 months ago
|
||
175 of the 225 failures in the last 30 days are on xpcshell android-em-7-0-x86_64, on both debug and opt runs.
https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2023-08-01&endday=2023-08-31&tree=trunk&failurehash=all&bug=1411358
Joel, got any time to look over these ones? Thank you.
Comment 617•9 months ago
|
||
Thanks for the needinfo. I see a common pattern on the Android emulator:
[task 2023-08-31T12:02:35.112Z] 12:02:35 INFO - SUITE-END | took 2349s
[task 2023-08-31T12:02:35.112Z] 12:02:35 INFO - Node moz-http2 server shutting down ...
[task 2023-08-31T12:02:35.112Z] 12:02:35 INFO - Process stdout
[task 2023-08-31T12:02:35.112Z] 12:02:35 INFO - forked process without handler sent: {"error":"","errorStack":""}
[task 2023-08-31T12:02:35.112Z] 12:02:35 INFO - error: Command failed: /builds/worker/fetches/android-sdk-linux/platform-tools/adb reverse tcp:39137 tcp:39137
[task 2023-08-31T12:02:35.112Z] 12:02:35 INFO - adb: error: cannot bind listener: Address already in use
[taskcluster:error] Task timeout after 5400 seconds. Force killing container.
[taskcluster 2023-08-31 12:51:08.587Z] === Task Finished ===
[taskcluster 2023-08-31 12:51:08.588Z] Unsuccessful task run with exit code: -1 completed in 5471.759 seconds
The line
[task 2023-08-31T12:02:35.112Z] 12:02:35 INFO - forked process without handler sent: {"error":"","errorStack":""}
is the last one I can find. I assume that we fork a process which then tries to do adb reverse, and we get "Address already in use". A few thoughts:
- we run many adb sessions at a time for a single host, so we could really be out of addresses
- do we free the ports, or do they "timeout" over time?
- I am not sure where or how we are calling adb reverse; I do not see a lot of references for reverse in-tree.
The error I see comes from here:
console.log(
  `forked process without handler sent: ${JSON.stringify(msg)}`
);
:aerickson, can you check the host machines to see if we are running out of addresses regularly? maybe a lot of stale connections via netstat or something like that?
:valentin, I see that you added the code for the forking process. Do you have thoughts on how this might be calling adb reverse?
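One way to look for stale mappings is to ask adb which reverse mappings it still holds. The following is a minimal sketch, not in-tree code: `parseReverseList`, `listReverseMappings` and `isDevicePortMapped` are hypothetical helper names, `adbPath` is whatever adb binary the harness uses, and the parser assumes that `adb reverse --list` prints one mapping per line with the device-side and host-side specs as the last two whitespace-separated fields.

```javascript
// Hypothetical helpers (not in-tree) for inspecting `adb reverse` mappings.
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");
const execFileAsync = promisify(execFile);

// Parse the text printed by `adb reverse --list`.
// Assumption: one mapping per line, whitespace-separated, with the
// device-side spec and then the host-side spec as the last two fields.
function parseReverseList(text) {
  const mappings = [];
  for (const line of text.split("\n")) {
    const fields = line.trim().split(/\s+/);
    if (fields.length < 3) {
      continue; // blank or unexpected line
    }
    const [device, host] = fields.slice(-2);
    mappings.push({ device, host });
  }
  return mappings;
}

// Run adb and return the parsed mappings. Rejects if adb itself fails.
async function listReverseMappings(adbPath) {
  const { stdout } = await execFileAsync(adbPath, ["reverse", "--list"]);
  return parseReverseList(stdout);
}

// True if some mapping already listens on the given device-side TCP port.
function isDevicePortMapped(mappings, port) {
  return mappings.some(m => m.device === `tcp:${port}`);
}
```

Logging the result of `listReverseMappings` just before a failing `adb reverse` call might show whether the port from the error message is still held by an earlier mapping, though that is only a guess at the cause.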
Comment 618•9 months ago
|
||
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #617)
> - I am not sure where or how we are calling adb reverse; I do not see a lot of references for reverse in-tree.
We are calling adb reverse here:
let command = `${adb_path} reverse tcp:${port} tcp:${port}`;
if (remove) {
  command = `${adb_path} reverse --remove tcp:${port}`;
  return true;
}
I see Kershaw tried to fix a similar issue a few months ago by implementing a retry mechanism in bug 1842651, but I'm not sure it was entirely successful.
> I assume that we fork a process which then tries to do adb reverse, and we get "Address already in use". A few thoughts:
> - we run many adb sessions at a time for a single host, so we could really be out of addresses
Only a small number of tests actually exercise this code. I suppose test-verify could cause it to run more times than expected. We do clean up after the server, but to be honest I'm not sure whether that actually closes the socket. (Worth trying to find out.)
> - do we free the ports, or do they "timeout" over time?
We do clean up in the code I linked above, but while looking into this I found two places (1, 2) that don't clean up after themselves.
I filed bug 1851038 to fix them.
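For illustration only, a retry that also drops a possibly stale mapping between attempts could look like the sketch below. This is not the mechanism from bug 1842651 or the fix in bug 1851038. `reverseWithRetry`, `run`, `attempts` and `delayMs` are hypothetical names, and `run` stands for any async function that executes a command string and rejects on failure. The command strings follow the same shape as the snippet quoted above.

```javascript
// Hypothetical sketch (not the in-tree implementation): retry `adb reverse`
// and remove the mapping for the same port before each new attempt.
async function reverseWithRetry(
  run,
  adbPath,
  port,
  { attempts = 3, delayMs = 500 } = {}
) {
  const spec = `tcp:${port}`;
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      await run(`${adbPath} reverse ${spec} ${spec}`);
      return attempt; // number of attempts that were needed
    } catch (err) {
      lastError = err;
      // Best effort: a failure to remove (e.g. nothing to remove) is ignored.
      await run(`${adbPath} reverse --remove ${spec}`).catch(() => {});
      if (attempt < attempts) {
        await new Promise(resolve => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

Removing a mapping before retrying is only safe if no other concurrently running test owns the same port; if ports can be shared, the removal step should be dropped and only the delay kept.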
Comment 620•9 months ago
|
||
Only 1 failure after bug 1851038 landed, but the counts are way down.