Closed Bug 1440995 Opened 7 years ago Closed 7 years ago

Intermittent talos OS X Aborting task - max run time exceeded! followed by Exit Code: -1

Categories

(Testing :: Talos, defect)

Version 3
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1445580

People

(Reporter: apavel, Unassigned)

References

Details

(Whiteboard: [stockwell infra])

Revision: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=bfe62272d2a21f9d10d45e5aaa680f5c735604ae Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=164190404&repo=mozilla-central 02:59:47 INFO - TEST-INFO | started process 2511 (/Users/cltbld/tasks/task_1519524388/build/application/Firefox Nightly.app/Contents/MacOS/firefox -foreground -profile /var/folders/9g/1jk_l9qn7pn68yzgn2443qzc00000w/T/tmpQmZUGh/profile http://localhost:49225/startup_test/tresize/addon/content/tresize-test.html?gecko_profile_entries=1000000&gecko_profile_threads=GeckoMain%2CCompositor&gecko_profile_dir=%2Fvar%2Ffolders%2F9g%2F1jk_l9qn7pn68yzgn2443qzc00000w%2FT%2Ftmp1WLxYm&gecko_profile_interval=2 http://localhost:49225/startup_test/tresize/addon/content/tresize-test.html?gecko_profile_entries=1000000&gecko_profile_threads=GeckoMain%2CCompositor&gecko_profile_dir=%2Fvar%2Ffolders%2F9g%2F1jk_l9qn7pn68yzgn2443qzc00000w%2FT%2Ftmp1WLxYm&gecko_profile_interval=2) 02:59:47 INFO - PID 2511 | Unable to read VR Path Registry from /Users/cltbld/Library/Application Support/OpenVR/.openvr/openvrpaths.vrpath 02:59:48 INFO - PID 2511 | Unable to read VR Path Registry from /Users/cltbld/Library/Application Support/OpenVR/.openvr/openvrpaths.vrpath 02:59:49 INFO - PID 2511 | Unable to read VR Path Registry from /Users/cltbld/Library/Application Support/OpenVR/.openvr/openvrpaths.vrpath 02:59:49 INFO - PID 2511 | 2018-02-25 02:59:49.392 plugin-container[2512:33809] *** CFMessagePort: bootstrap_register(): failed 1100 (0x44c) 'Permission denied', port = 0x9d3b, name = 'com.apple.tsm.portname' 02:59:49 INFO - PID 2511 | See /usr/include/servers/bootstrap_defs.h for the error codes. 02:59:49 INFO - PID 2511 | 2018-02-25 02:59:49.421 plugin-container[2513:33831] *** CFMessagePort: bootstrap_register(): failed 1100 (0x44c) 'Permission denied', port = 0x924b, name = 'com.apple.tsm.portname' 02:59:49 INFO - PID 2511 | See /usr/include/servers/bootstrap_defs.h for the error codes. 02:59:50 INFO - PID 2511 | Unable to read VR Path Registry from /Users/cltbld/Library/Application Support/OpenVR/.openvr/openvrpaths.vrpath 02:59:50 INFO - PID 2511 | 2018-02-25 02:59:50.758 plugin-container[2514:33941] *** CFMessagePort: bootstrap_register(): failed 1100 (0x44c) 'Permission denied', port = 0x963f, name = 'com.apple.tsm.portname' 02:59:50 INFO - PID 2511 | See /usr/include/servers/bootstrap_defs.h for the error codes. 03:00:03 INFO - PID 2511 | 2018-02-25 03:00:02.999 firefox[2511:33804] Persistent UI failed to open file file:///Users/cltbld/Library/Saved%20Application%20State/org.mozilla.nightly.savedState/window_1.data: No such file or directory (2) 03:00:05 INFO - PID 2511 | __start_report28.62__end_report 03:00:05 INFO - PID 2511 | __startTimestamp1519556405340__endTimestamp 03:00:06 INFO - PID 2511 | *** UTM:SVC TimerManager:registerTimer called after profile-before-change notification. Ignoring timer registration for id: telemetry_modules_ping 03:00:06 INFO - TEST-INFO | 2511: exit 0 [taskcluster 2018-02-25T11:00:17.346Z] Aborting task - max run time exceeded! [taskcluster 2018-02-25T11:00:17.350Z] Exit Code: -1 [taskcluster 2018-02-25T11:00:17.350Z] === Task Finished === [taskcluster 2018-02-25T11:00:17.350Z] Task Duration: 14m58.809909888s [taskcluster 2018-02-25T11:00:17.980Z] Uploading artifact public/logs/localconfig.json from file logs/localconfig.json with content encoding "gzip", mime type "application/json" and expiry 2019-02-25T10:03:27.189Z [taskcluster 2018-02-25T11:00:18.439Z] Uploading artifact public/logs/talos_critical.log from file logs/talos_critical.log with content encoding "gzip", mime type "text/plain" and expiry 2019-02-25T10:03:27.189Z [taskcluster 2018-02-25T11:00:18.797Z] Uploading artifact public/logs/talos_error.log from file logs/talos_error.log with content encoding "gzip", mime type "text/plain" and expiry 2019-02-25T10:03:27.189Z [taskcluster 2018-02-25T11:00:19.176Z] Uploading artifact public/logs/talos_fatal.log from file logs/talos_fatal.log with content encoding "gzip", mime type "text/plain" and expiry 2019-02-25T10:03:27.189Z [taskcluster 2018-02-25T11:00:19.569Z] Uploading artifact public/logs/talos_info.log from file logs/talos_info.log with content encoding "gzip", mime type "text/plain" and expiry 2019-02-25T10:03:27.189Z [taskcluster 2018-02-25T11:00:19.993Z] Uploading artifact public/logs/talos_raw.log from file logs/talos_raw.log with content encoding "gzip", mime type "text/plain" and expiry 2019-02-25T10:03:27.189Z [taskcluster 2018-02-25T11:00:20.539Z] Uploading artifact public/logs/talos_warning.log from file logs/talos_warning.log with content encoding "gzip", mime type "text/plain" and expiry 2019-02-25T10:03:27.189Z [taskcluster 2018-02-25T11:00:20.908Z] Uploading artifact public/test_info/chromez-e10s_errorsummary.log from file build/blobber_upload_dir/chromez-e10s_errorsummary.log with content encoding "gzip", mime type "text/plain" and expiry 2019-02-25T10:03:27.189Z [taskcluster 2018-02-25T11:00:21.295Z] Uploading artifact public/test_info/chromez-e10s_raw.log from file build/blobber_upload_dir/chromez-e10s_raw.log with content encoding "gzip", mime type "text/plain" and expiry 2019-02-25T10:03:27.189Z [taskcluster 2018-02-25T11:00:21.697Z] Uploading artifact public/test_info/missingsymbols.zip from file build/blobber_upload_dir/missingsymbols.zip with content encoding "", mime type "application/zip" and expiry 2019-02-25T10:03:27.189Z [taskcluster 2018-02-25T11:00:22.578Z] Uploading artifact public/test_info/profile_about_preferences_basic.zip from file build/blobber_upload_dir/profile_about_preferences_basic.zip with content encoding "", mime type "application/zip" and expiry 2019-02-25T10:03:27.189Z [taskcluster 2018-02-25T11:00:23.241Z] Uploading artifact public/test_info/profile_tresize.zip from file build/blobber_upload_dir/profile_tresize.zip with content encoding "", mime type "application/zip" and expiry 2019-02-25T10:03:27.189Z [taskcluster 2018-02-25T11:00:23.574Z] Task not successful due to following exception(s): [taskcluster 2018-02-25T11:00:23.574Z] Exception 1) [taskcluster 2018-02-25T11:00:23.574Z] signal: killed
Over the last 7 days this bug has 30 failures. These happen on OS X 10.10 and macosx64-nightly Here is the most recent log example: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=165946894
Flags: needinfo?(rwood)
Whiteboard: [stockwell needswork]
these are all on profiling jobs- I suspect there is something broken there- possibly a quick fix.
Since 11.03 this fail spikes up and now there are 75 failures in the last 3 days. https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1440995&startday=2018-03-11&endday=2018-03-13&tree=all Most of them are on OS X 10.10 and only 6 on macosx64-nightly. There are 2 on Linux but those are starred wrong :rwood, can you please take a look?
this seems to be mostly related to the time to download artifact [1]: 3:14:41 INFO - u'xpcshell': [u'target.common.tests.zip', u'target.xpcshell.tests.zip']} 23:14:41 INFO - Downloading packages: [u'target.common.tests.zip', u'target.mochitest.tests.zip'] for test suite categories: ['mochitest'] 23:14:41 INFO - Downloading and extracting to /Users/cltbld/tasks/task_1520920450/build/tests these dirs * from https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.common.tests.zip 23:14:41 INFO - retry: Calling fetch_url_into_memory with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.common.tests.zip'}, attempt #1 23:14:41 INFO - Fetch https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.common.tests.zip into memory 23:15:01 INFO - Content-Length response header: 36551587 23:15:01 INFO - Bytes received: 36551587 23:15:18 INFO - Downloading and extracting to /Users/cltbld/tasks/task_1520920450/build/tests these dirs * from https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.mochitest.tests.zip 23:15:18 INFO - retry: Calling fetch_url_into_memory with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.mochitest.tests.zip'}, attempt #1 23:15:18 INFO - Fetch https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.mochitest.tests.zip into memory 23:52:34 INFO - Content-Length response header: 78551031 23:52:34 INFO - Bytes received: 78551031 23:52:46 INFO - Downloading https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.dmg to /Users/cltbld/tasks/task_1520920450/installer.dmg 23:52:46 INFO - retry: Calling _download_file with args: (), kwargs: {'url': 'https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.dmg', 'file_name': '/Users/cltbld/tasks/task_1520920450/installer.dmg'}, attempt #1 23:53:00 INFO - Downloaded 73040179 bytes. 23:53:00 INFO - Setting buildbot property build_url to https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.dmg 23:53:00 INFO - mkdir: /Users/cltbld/tasks/task_1520920450/properties 23:53:00 INFO - Writing buildbot properties ['build_url'] to /Users/cltbld/tasks/task_1520920450/properties/build_url 23:53:00 INFO - Writing to file /Users/cltbld/tasks/task_1520920450/properties/build_url that is almost 30 minutes to download target.common.tests.zip which is 78MB. In other cases [2], [3] we time out before finishing the download of a .zip or python package. Here is a snippet from [4]: 15:04:59 INFO - Collecting simplejson 15:06:15 INFO - Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x1098fbd10>: Failed to establish a new connection: [Errno 60] Operation timed out',)': /pub 15:07:30 INFO - Retrying (Retry(total=3, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x1098fb1d0>: Failed to establish a new connection: [Errno 60] Operation timed out',)': /pub 15:08:46 INFO - Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x109a1d090>: Failed to establish a new connection: [Errno 60] Operation timed out',)': /pub I am tempted to make this a duplicate of bug 1443130, needinfo :dragrom as that is the next step on bug 1443130 as well. [1] https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=167596727 [2] https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=167596731 [3] https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=167793330 [4] https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=167775711
Flags: needinfo?(rwood) → needinfo?(dcrisan)
Tested on t-yosemite-r7-423.test.releng.mdc1.mozilla.com: [root@t-yosemite-r7-423.test.releng.mdc1.mozilla.com ~]# wget https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.common.tests.zip; rm target.common.tests.zip --2018-03-14 05:24:24-- https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.common.tests.zip Resolving queue.taskcluster.net... 50.16.228.78, 107.22.197.53, 50.17.218.87 Connecting to queue.taskcluster.net|50.16.228.78|:443... connected. HTTP request sent, awaiting response... 303 See Other Location: https://taskcluster-artifacts.net/aLf3dLO8T2eepBczSorNug/0/public/build/target.common.tests.zip [following] --2018-03-14 05:24:25-- https://taskcluster-artifacts.net/aLf3dLO8T2eepBczSorNug/0/public/build/target.common.tests.zip Resolving taskcluster-artifacts.net... 54.192.212.56 Connecting to taskcluster-artifacts.net|54.192.212.56|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 36551587 (35M) [application/zip] Saving to: 'target.common.tests.zip' 100%[==================================================================================================================================================================>] 36,551,587 6.30MB/s in 11s 2018-03-14 05:24:37 (3.26 MB/s) - 'target.common.tests.zip' saved [36551587/36551587] [root@t-yosemite-r7-423.test.releng.mdc1.mozilla.com ~]# wget https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.common.tests.zip; rm target.common.tests.zip --2018-03-14 05:24:43-- https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.common.tests.zip Resolving queue.taskcluster.net... 50.16.228.78, 107.22.197.53, 50.17.218.87 Connecting to queue.taskcluster.net|50.16.228.78|:443... connected. HTTP request sent, awaiting response... 303 See Other Location: https://taskcluster-artifacts.net/aLf3dLO8T2eepBczSorNug/0/public/build/target.common.tests.zip [following] --2018-03-14 05:24:44-- https://taskcluster-artifacts.net/aLf3dLO8T2eepBczSorNug/0/public/build/target.common.tests.zip Resolving taskcluster-artifacts.net... 54.192.212.56 Connecting to taskcluster-artifacts.net|54.192.212.56|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 36551587 (35M) [application/zip] Saving to: 'target.common.tests.zip' 100%[==================================================================================================================================================================>] 36,551,587 5.32MB/s in 7.7s 2018-03-14 05:24:52 (4.53 MB/s) - 'target.common.tests.zip' saved [36551587/36551587] [root@t-yosemite-r7-423.test.releng.mdc1.mozilla.com ~]# wget https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.mochitest.tests.zip; rm target.mochitest.tests.zip --2018-03-14 05:26:14-- https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.mochitest.tests.zip Resolving queue.taskcluster.net... 50.16.228.78, 107.22.197.53, 50.17.218.87 Connecting to queue.taskcluster.net|50.16.228.78|:443... connected. HTTP request sent, awaiting response... 303 See Other Location: https://taskcluster-artifacts.net/aLf3dLO8T2eepBczSorNug/0/public/build/target.mochitest.tests.zip [following] --2018-03-14 05:26:15-- https://taskcluster-artifacts.net/aLf3dLO8T2eepBczSorNug/0/public/build/target.mochitest.tests.zip Resolving taskcluster-artifacts.net... 54.192.212.56 Connecting to taskcluster-artifacts.net|54.192.212.56|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 78551031 (75M) [application/zip] Saving to: 'target.mochitest.tests.zip' 100%[==================================================================================================================================================================>] 78,551,031 5.71MB/s in 18s 2018-03-14 05:26:35 (4.16 MB/s) - 'target.mochitest.tests.zip' saved [78551031/78551031] I also used curl instead of wget: [root@t-yosemite-r7-423.test.releng.mdc1.mozilla.com ~]# curl -L -O https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.mochitest.tests.zip; rm target.mochitest.tests.zip % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 124 100 124 0 0 104 0 0:00:01 0:00:01 --:--:-- 104 100 74.9M 100 74.9M 0 0 4813k 0 0:00:15 0:00:15 --:--:-- 6135k looks like the average speed is around 4,5 MB/s If I run wget or curl on individual hosts or on the max 3 hosts in the same time, the download speed looks OK.(I tested 3 downloads in the same time from 3 different hosts). Will be interesting to see the network load if more workers will try to download the artifacts in the same time..... :dividehex Who can give us a network load for the last 24 hours? I suspect a network bottleneck, when many worker try to download the artifacts in the same time....
Flags: needinfo?(dcrisan) → needinfo?(jwatkins)
Blocks: 1446784
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(jwatkins)
Resolution: --- → DUPLICATE
Whiteboard: [stockwell disable-recommended] → [stockwell infra]
You need to log in before you can comment on or make changes to this bug.