Closed Bug 1443130 Opened 7 years ago Closed 7 years ago

Intermittent [taskcluster:error] Task killed because maxRunTime was exceeded

Categories

(Testing :: Talos, defect, P5)

Version 3
defect

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1453007

People

(Reporter: aryx, Assigned: rwood)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])

Attachments

(1 obsolete file)

+++ This bug was initially created as a clone of Bug #1442736 +++

central-as-beta simulation hit this: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a7c4bd5b1fb1caefb97d75dec60bfa3e78a61c03&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable&selectedJob=165959208

03:24:35 INFO - TEST-INFO | started process 31919 (/home/cltbld/workspace/build/application/firefox/firefox -profile /tmp/tmplXSJYV/profile)
03:24:35 INFO - PID 31919 | MOZ_EVENT_TRACE start 1520249075864
03:24:35 INFO - PID 31919 | MOZ_EVENT_TRACE sample 1520249075931 35.656119
03:24:36 INFO - PID 31919 | MOZ_EVENT_TRACE sample 1520249076146 166.923099
03:24:36 INFO - PID 31919 | 1520249076146 addons.webextension.talos@mozilla.org WARN Please specify whether you want browser_style or not in your page_action options.
03:24:36 INFO - PID 31919 | 1520249076148 addons.webextension.talos@mozilla.org WARN Please specify whether you want browser_style or not in your browser_action options.
03:24:36 INFO - PID 31919 | MOZ_EVENT_TRACE sample 1520249076205 37.322715
03:24:36 INFO - PID 31919 | MOZ_EVENT_TRACE sample 1520249076226 20.881010
03:24:36 INFO - PID 31919 | MOZ_EVENT_TRACE sample 1520249076307 46.899740
03:24:36 INFO - PID 31919 |
03:24:36 INFO - PID 31919 | ###!!! [Child][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
03:24:36 INFO - PID 31919 |
03:24:36 INFO - PID 31919 | MOZ_EVENT_TRACE sample 1520249076957 35.354026
03:24:37 INFO - PID 31919 |
03:24:37 INFO - PID 31919 | ###!!! [Child][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
03:24:37 INFO - PID 31919 |
03:24:37 INFO - PID 31919 |
03:24:37 INFO - PID 31919 | ###!!! [Child][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
03:24:37 INFO - PID 31919 |
there are many instances of this failure on linux over the weekend and today- the reason why- it takes 20 minutes to download target.tar.bz2:

https://taskcluster-artifacts.net/V7M8ep8JQQG1X2u8k3zOpA/0/public/logs/live_backing.log:
18:32:06 INFO - retry: Calling fetch_url_into_memory with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/IqoodGkwR7ut6F-eCup1PQ/artifacts/public/build/target.talos.tests.zip'}, attempt #1
18:32:06 INFO - Fetch https://queue.taskcluster.net/v1/task/IqoodGkwR7ut6F-eCup1PQ/artifacts/public/build/target.talos.tests.zip into memory
18:32:25 INFO - Content-Length response header: 14031842
18:32:25 INFO - Bytes received: 14031842
18:32:26 INFO - Downloading https://queue.taskcluster.net/v1/task/IqoodGkwR7ut6F-eCup1PQ/artifacts/public/build/target.tar.bz2 to /home/cltbld/workspace/build/target.tar.bz2
18:32:26 INFO - retry: Calling _download_file with args: (), kwargs: {'url': 'https://queue.taskcluster.net/v1/task/IqoodGkwR7ut6F-eCup1PQ/artifacts/public/build/target.tar.bz2', 'file_name': '/home/cltbld/workspace/build/target.tar.bz2'}, attempt #1
18:55:13 INFO - Downloaded 62092291 bytes.
18:55:13 INFO - Setting buildbot property build_url to https://queue.taskcluster.net/v1/task/IqoodGkwR7ut6F-eCup1PQ/artifacts/public/build/target.tar.bz2
18:55:13 INFO - Writing buildbot properties ['build_url'] to /home/cltbld/workspace/properties/build_url
18:55:13 INFO - Writing to file /home/cltbld/workspace/properties/build_url

but in a normal passing scenario (https://taskcluster-artifacts.net/PR_Wu-3oRiiABz3oPovA-w/0/public/logs/live_backing.log):
09:42:59 INFO - Fetch https://queue.taskcluster.net/v1/task/IqoodGkwR7ut6F-eCup1PQ/artifacts/public/build/target.talos.tests.zip into memory
09:43:02 INFO - Content-Length response header: 14031842
09:43:02 INFO - Bytes received: 14031842
09:43:02 INFO - Downloading https://queue.taskcluster.net/v1/task/IqoodGkwR7ut6F-eCup1PQ/artifacts/public/build/target.tar.bz2 to /home/cltbld/workspace/build/target.tar.bz2
09:43:02 INFO - retry: Calling _download_file with args: (), kwargs: {'url': 'https://queue.taskcluster.net/v1/task/IqoodGkwR7ut6F-eCup1PQ/artifacts/public/build/target.tar.bz2', 'file_name': '/home/cltbld/workspace/build/target.tar.bz2'}, attempt #1
09:43:07 INFO - Downloaded 62092291 bytes.
09:43:07 INFO - Setting buildbot property build_url to https://queue.taskcluster.net/v1/task/IqoodGkwR7ut6F-eCup1PQ/artifacts/public/build/target.tar.bz2
09:43:07 INFO - Writing buildbot properties ['build_url'] to /home/cltbld/workspace/properties/build_url

I cannot think of any way we would need 20 minutes to download a build- could something be wrong with our network or the machines? These are physical linux machines in the datacenter.
Flags: needinfo?(gps)
Flags: needinfo?(dustin)
Flags: needinfo?(ddurst)
That's about 50KB/s :( I don't know why that would happen.
Flags: needinfo?(dustin)
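As a rough check of that figure, the throughput can be recomputed from the mozharness timestamps and byte count quoted in comment 2. This is only a sketch: the helper name `throughput_kb_per_s` is hypothetical, it assumes both timestamps fall on the same day, and it takes the mozharness timestamps at face value.

```python
from datetime import datetime


def throughput_kb_per_s(start: str, end: str, num_bytes: int) -> float:
    """Average throughput in KB/s (1 KB = 1000 bytes) between two HH:MM:SS stamps."""
    fmt = "%H:%M:%S"
    elapsed = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    if elapsed <= 0:
        raise ValueError("end must be later than start (same-day timestamps assumed)")
    return num_bytes / elapsed / 1000.0


# Slow log from comment 2: 18:32:26 -> 18:55:13, 62092291 bytes.
slow = throughput_kb_per_s("18:32:26", "18:55:13", 62092291)
# Passing log from comment 2: 09:43:02 -> 09:43:07, same artifact size.
fast = throughput_kb_per_s("09:43:02", "09:43:07", 62092291)
print(f"slow: {slow:.1f} KB/s, fast: {fast:.1f} KB/s")
```

The slow case works out to roughly 45 KB/s over 1367 seconds, which is in line with the "about 50KB/s" estimate.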
Note that these are mozharness timestamps and mozharness timestamps need to be taken with a grain of salt. The reason is that mozharness adds the current wall time when it processes a log event or line of output from an invoked process. That's all fine. However, mozharness frequently runs processes with buffered output. So, output from an invoked process could get buffered for seconds or minutes before mozharness consumes it. Then mozharness will consume several lines at once and attribute them to the same time. Whether that is happening here, I'm not sure. I /think/ the logged events are coming directly from mozharness ("[mozharness: 2018-03-12 01:31:48.721404Z] Running download-and-extract step."), which means there shouldn't be an event buffering problem. Anyway, several minutes to download these files is a bit concerning. It is likely the downloading part that is slow. But you can't rule out local filesystem I/O being borked as well. I'm not sure I can recommend any specific steps. Maybe we should start collecting better metrics about downloads so we know how prevalent problems like this are?
Flags: needinfo?(gps)
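One way to collect download metrics that do not depend on when log output gets consumed is to time the transfer inside the downloading process itself. The sketch below is an assumption about how that could look, not mozharness code: `timed_copy` and `TransferStats` are hypothetical names, and the source is any file-like object (an HTTP response body in practice, an in-memory buffer here).

```python
import io
import time
from dataclasses import dataclass


@dataclass
class TransferStats:
    bytes_read: int
    elapsed_s: float
    longest_gap_s: float  # longest wait for a single read() to return


def timed_copy(src, dst, chunk_size=64 * 1024, clock=time.monotonic):
    """Copy src to dst in chunks, timing the transfer with a monotonic clock."""
    start = last = clock()
    total = 0
    longest_gap = 0.0
    while True:
        chunk = src.read(chunk_size)
        now = clock()
        longest_gap = max(longest_gap, now - last)
        last = now
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return TransferStats(total, last - start, longest_gap)


# Usage with in-memory streams; a real caller would pass a response body.
stats = timed_copy(io.BytesIO(b"x" * 200_000), io.BytesIO())
print(stats)
```

Recording the longest single-read gap alongside the total time would help tell a uniformly slow link apart from a transfer that stalls and then resumes.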
Fubar, can you check whether the downloads of those files are slow and, if so, whether other downloads at the same time are also slow?

from IRC:
jmaher: Aryx: pmoore: I worked with cosmin yesterday and we noticed a 20 minute lapse in the logs at the time of downloading the build
Aryx: for one os x debug build we had download times of 40 minutes, but else it seems to be linux talos
Flags: needinfo?(klibby)
It might also be the DC proxies. We're taking a look.
Flags: needinfo?(klibby)
[root@t-linux64-ms-183 ~]# wget https://queue.taskcluster.net/v1/task/IqoodGkwR7ut6F-eCup1PQ/artifacts/public/build/target.tar.bz2; rm target.tar.bz2
--2018-03-13 08:52:23-- https://queue.taskcluster.net/v1/task/IqoodGkwR7ut6F-eCup1PQ/artifacts/public/build/target.tar.bz2
Resolving queue.taskcluster.net (queue.taskcluster.net)... 50.17.218.87, 50.16.228.78, 107.22.197.53
Connecting to queue.taskcluster.net (queue.taskcluster.net)|50.17.218.87|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://taskcluster-artifacts.net/IqoodGkwR7ut6F-eCup1PQ/0/public/build/target.tar.bz2 [following]
--2018-03-13 08:52:23-- https://taskcluster-artifacts.net/IqoodGkwR7ut6F-eCup1PQ/0/public/build/target.tar.bz2
Resolving taskcluster-artifacts.net (taskcluster-artifacts.net)... 54.192.212.56
Connecting to taskcluster-artifacts.net (taskcluster-artifacts.net)|54.192.212.56|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 62092291 (59M) [application/x-bzip2]
Saving to: 'target.tar.bz2'
target.tar.bz2 100%[==============================>] 59.22M 18.1MB/s in 4.0s

Same host as in #c2, re-run several times. It looks like we're NOT using the DC proxies; I'm not sure if that's by design or accident, as I could have sworn that we were, but enabling them gets a '403 Forbidden' error from them. The only change we've made recently (Mar 6) to the ubuntu16.04 config on the moonshots was to switch syslog back to using 514/tcp instead of udp.
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #2)
> there are many instances of this failure on linux over the weekend and
> today- the reason why- it takes 20 minutes to download target.tar.bz2:

I'm going to disagree with your assertion. There ARE cases where it looks to take 20 minutes, but there are also cases where we exceed max run time and this download either doesn't happen or happens very fast (based on the failures listed at https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1443130&endday=2018-03-13&startday=2018-03-10&tree=trunk)

https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=167684633&lineNumber=345-347
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=167684702&lineNumber=344-346
20 seconds
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=167684622&lineNumber=40700
non-existent?
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=167682697&lineNumber=337
I don't even; talos test runs for 2 seconds before maxRunTime?!

I'm not saying we DON'T have a problem on linux/moonshots, but if we do I don't think it's clear what it is.
actually the first link you have [1] has a 20 minute download of common.tests.zip:
07:56:16 INFO - Downloading and extracting to /home/cltbld/workspace/build/tests these dirs * from https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.common.tests.zip
07:56:16 INFO - retry: Calling fetch_url_into_memory with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.common.tests.zip'}, attempt #1
07:56:16 INFO - Fetch https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.common.tests.zip into memory
08:08:54 INFO - Content-Length response header: 56969757
08:08:54 INFO - Bytes received: 56969757

and the 2nd link [2] has 20 minutes for a couple downloads:
07:57:49 INFO - u'xpcshell': [u'target.common.tests.zip', u'target.xpcshell.tests.zip']}
07:57:49 INFO - Downloading packages: [u'target.common.tests.zip', u'target.talos.tests.zip'] for test suite categories: ['common', 'talos']
07:57:49 INFO - Downloading and extracting to /home/cltbld/workspace/build/tests these dirs * from https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.common.tests.zip
07:57:49 INFO - retry: Calling fetch_url_into_memory with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.common.tests.zip'}, attempt #1
07:57:49 INFO - Fetch https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.common.tests.zip into memory
08:08:53 INFO - Content-Length response header: 56969757
08:08:53 INFO - Bytes received: 56969757
08:08:58 INFO - Downloading and extracting to /home/cltbld/workspace/build/tests these dirs * from https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.talos.tests.zip
08:08:58 INFO - retry: Calling fetch_url_into_memory with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.talos.tests.zip'}, attempt #1
08:08:58 INFO - Fetch https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.talos.tests.zip into memory
08:09:07 INFO - Content-Length response header: 14052878
08:09:07 INFO - Bytes received: 14052878
08:09:07 INFO - Downloading https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.tar.bz2 to /home/cltbld/workspace/build/target.tar.bz2
08:09:07 INFO - retry: Calling _download_file with args: (), kwargs: {'url': 'https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.tar.bz2', 'file_name': '/home/cltbld/workspace/build/target.tar.bz2'}, attempt #1
08:09:26 INFO - Downloaded 62046856 bytes.

#3 has a 10 minute download [3]:
07:56:33 INFO - Downloading packages: [u'target.common.tests.zip', u'target.talos.tests.zip'] for test suite categories: ['common', 'talos']
07:56:33 INFO - Downloading and extracting to /home/cltbld/workspace/build/tests these dirs * from https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.common.tests.zip
07:56:33 INFO - retry: Calling fetch_url_into_memory with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.common.tests.zip'}, attempt #1
07:56:33 INFO - Fetch https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.common.tests.zip into memory
08:08:54 INFO - Content-Length response header: 56969757
08:08:54 INFO - Bytes received: 56969757
08:08:58 INFO - Downloading and extracting to /home/cltbld/workspace/build/tests these dirs * from https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.talos.tests.zip
08:08:58 INFO - retry: Calling fetch_url_into_memory with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.talos.tests.zip'}, attempt #1
08:08:58 INFO - Fetch https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.talos.tests.zip into memory
08:09:07 INFO - Content-Length response header: 140

the 4th link [4], has a maxRunTime of 15 minutes: https://searchfox.org/mozilla-central/source/taskcluster/ci/test/talos.yml#275
typically this takes 6 minutes to complete, I think 15 minutes is enough overhead for retrying a download or two. this specific log might indicate there is a longer lag in the bootstrapping of linux to get up and running. It is odd there is <2 minutes of mozharness runtime before the 15 minutes expire.

In addition to the above, we do see maxRunTime hit in many cases where a test hangs and we have to kill it- while that might be hundreds of times/week across all OS, the current rate of failures on linux seem to be related to longer download times.

[1] https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=167684633&lineNumber=345-347
[2] https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=167684702&lineNumber=344-346
[3] https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=167684622&lineNumber=40700
[4] https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=167682697&lineNumber=337
I accidentally put ddurst and not dhouse
Flags: needinfo?(ddurst)
It might be worth checking for firewall state table overflow. IIRC we had issues with that years ago, where a session would get evicted from the firewall and thus the firewall would drop all traffic on that TCP/IP quad without any RSTs or anything. The end of the connection waiting for data (the HTTP client, in this case) can then sit for a long time waiting for data that is never coming -- there's no TCP packet to say "hey, did you have more data for me or are you dead?" That said, it looks like the download does eventually complete with the right number of bytes, so this guess doesn't fit all of the facts..
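The failure mode described here, a client blocked on a connection that will never deliver more data, is usually bounded with a read timeout. The following is a minimal sketch under stated assumptions: `recv_or_stall` is a hypothetical helper, and a local socket pair stands in for the real HTTP connection so the silent-peer case can be shown without a firewall.

```python
import socket


def recv_or_stall(sock, timeout_s, bufsize=4096):
    """Return received bytes, or None if no data arrives within timeout_s."""
    sock.settimeout(timeout_s)
    try:
        return sock.recv(bufsize)
    except socket.timeout:
        return None


# The peer stays silent at first, which is how a session evicted from a
# firewall state table would look to the waiting client: no data, no RST.
client, peer = socket.socketpair()
try:
    silent = recv_or_stall(client, timeout_s=0.2)   # peer has sent nothing
    peer.sendall(b"data")
    resumed = recv_or_stall(client, timeout_s=0.2)  # data is now available
finally:
    client.close()
    peer.close()
print(silent, resumed)
```

A timeout only turns an indefinite hang into a detectable event; it does not explain a transfer that is slow but eventually completes, which is the caveat already noted above.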
There's not much context I can give on this bug. I suspect the problem is in the platform/network and I am far from an expert in these areas.
Flags: needinfo?(gps)
Joel, thanks for more info! I tried looking for long gaps in logging for those sorts of things, but clearly missed some. Setting a NI on :dragrom to take a look.
Flags: needinfo?(dcrisan)
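Looking for long gaps between consecutive log timestamps can be scripted rather than done by eye. This is a sketch, assuming log lines that start with `HH:MM:SS INFO -` as in the excerpts quoted in this bug; `find_gaps` and the 60-second threshold are hypothetical choices, and the sample lines are copied from the first log in the previous comment.

```python
import re
from datetime import datetime, timedelta

STAMP = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+INFO\s+-")


def find_gaps(lines, threshold_s=60):
    """Return (seconds, previous_line, next_line) for consecutive stamped
    lines whose timestamps are more than threshold_s apart."""
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # skip output without a leading timestamp
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        if prev_time is not None:
            delta = t - prev_time
            if delta < timedelta(0):
                delta += timedelta(days=1)  # assume the log crossed midnight once
            if delta.total_seconds() > threshold_s:
                gaps.append((int(delta.total_seconds()), prev_line, line))
        prev_time, prev_line = t, line
    return gaps


sample = [
    "07:56:16 INFO - Fetch https://queue.taskcluster.net/v1/task/XulAjx9oQHuF-0lQG5pJyA/artifacts/public/build/target.common.tests.zip into memory",
    "08:08:54 INFO - Content-Length response header: 56969757",
    "08:08:54 INFO - Bytes received: 56969757",
]
for seconds, before, after in find_gaps(sample):
    print(f"{seconds}s gap after: {before[:60]}")
```

As noted earlier in this bug, mozharness timestamps can be skewed by buffered output, so a gap found this way is a lead to investigate rather than a measurement.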
Tested several times on the t-linux64-ms-059 and t-linux64-ms-183 servers:

On t-linux64-ms-183 server:
[root@t-linux64-ms-183 ~]# wget https://queue.taskcluster.net/v1/task/RCCR58sJSHSdiqz39nA12Q/artifacts/public/build/target.tar.bz2; rm target.tar.bz2
--2018-03-14 05:10:21-- https://queue.taskcluster.net/v1/task/RCCR58sJSHSdiqz39nA12Q/artifacts/public/build/target.tar.bz2
Resolving queue.taskcluster.net (queue.taskcluster.net)... 50.17.218.87, 107.22.197.53, 50.16.228.78
Connecting to queue.taskcluster.net (queue.taskcluster.net)|50.17.218.87|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://taskcluster-artifacts.net/RCCR58sJSHSdiqz39nA12Q/0/public/build/target.tar.bz2 [following]
--2018-03-14 05:10:21-- https://taskcluster-artifacts.net/RCCR58sJSHSdiqz39nA12Q/0/public/build/target.tar.bz2
Resolving taskcluster-artifacts.net (taskcluster-artifacts.net)... 54.192.212.56
Connecting to taskcluster-artifacts.net (taskcluster-artifacts.net)|54.192.212.56|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 61989054 (59M) [application/x-bzip2]
Saving to: 'target.tar.bz2'
target.tar.bz2 100%[================================================================================================================>] 59.12M 9.50MB/s in 11s
2018-03-14 05:10:35 (5.36 MB/s) - 'target.tar.bz2' saved [61989054/61989054]

On next try:
[root@t-linux64-ms-183 ~]# wget https://queue.taskcluster.net/v1/task/RCCR58sJSHSdiqz39nA12Q/artifacts/public/build/target.tar.bz2; rm target.tar.bz2
--2018-03-14 05:12:47-- https://queue.taskcluster.net/v1/task/RCCR58sJSHSdiqz39nA12Q/artifacts/public/build/target.tar.bz2
Resolving queue.taskcluster.net (queue.taskcluster.net)... 50.16.228.78, 107.22.197.53, 50.17.218.87
Connecting to queue.taskcluster.net (queue.taskcluster.net)|50.16.228.78|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://taskcluster-artifacts.net/RCCR58sJSHSdiqz39nA12Q/0/public/build/target.tar.bz2 [following]
--2018-03-14 05:12:47-- https://taskcluster-artifacts.net/RCCR58sJSHSdiqz39nA12Q/0/public/build/target.tar.bz2
Resolving taskcluster-artifacts.net (taskcluster-artifacts.net)... 54.192.212.56
Connecting to taskcluster-artifacts.net (taskcluster-artifacts.net)|54.192.212.56|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 61989054 (59M) [application/x-bzip2]
Saving to: 'target.tar.bz2'
target.tar.bz2 100%[================================================================================================================>] 59.12M 18.5MB/s in 3.9s
2018-03-14 05:12:52 (15.1 MB/s) - 'target.tar.bz2' saved [61989054/61989054]

On t-linux64-ms-059 server:
[root@t-linux64-ms-059 ~]# wget https://queue.taskcluster.net/v1/task/RCCR58sJSHSdiqz39nA12Q/artifacts/public/build/target.tar.bz2; rm target.tar.bz2
--2018-03-14 05:11:18-- https://queue.taskcluster.net/v1/task/RCCR58sJSHSdiqz39nA12Q/artifacts/public/build/target.tar.bz2
Resolving queue.taskcluster.net (queue.taskcluster.net)... 107.22.197.53, 50.16.228.78, 50.17.218.87
Connecting to queue.taskcluster.net (queue.taskcluster.net)|107.22.197.53|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://taskcluster-artifacts.net/RCCR58sJSHSdiqz39nA12Q/0/public/build/target.tar.bz2 [following]
--2018-03-14 05:11:18-- https://taskcluster-artifacts.net/RCCR58sJSHSdiqz39nA12Q/0/public/build/target.tar.bz2
Resolving taskcluster-artifacts.net (taskcluster-artifacts.net)... 54.192.212.56
Connecting to taskcluster-artifacts.net (taskcluster-artifacts.net)|54.192.212.56|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 61989054 (59M) [application/x-bzip2]
Saving to: 'target.tar.bz2'
target.tar.bz2 100%[================================================================================================================>] 59.12M 18.4MB/s in 3.9s
2018-03-14 05:11:23 (15.1 MB/s) - 'target.tar.bz2' saved [61989054/61989054]

[root@t-linux64-ms-059 ~]# wget https://queue.taskcluster.net/v1/task/RCCR58sJSHSdiqz39nA12Q/artifacts/public/build/target.tar.bz2; rm target.tar.bz2
--2018-03-14 05:14:28-- https://queue.taskcluster.net/v1/task/RCCR58sJSHSdiqz39nA12Q/artifacts/public/build/target.tar.bz2
Resolving queue.taskcluster.net (queue.taskcluster.net)... 107.22.197.53, 50.16.228.78, 50.17.218.87
Connecting to queue.taskcluster.net (queue.taskcluster.net)|107.22.197.53|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://taskcluster-artifacts.net/RCCR58sJSHSdiqz39nA12Q/0/public/build/target.tar.bz2 [following]
--2018-03-14 05:14:28-- https://taskcluster-artifacts.net/RCCR58sJSHSdiqz39nA12Q/0/public/build/target.tar.bz2
Resolving taskcluster-artifacts.net (taskcluster-artifacts.net)... 54.192.212.56
Connecting to taskcluster-artifacts.net (taskcluster-artifacts.net)|54.192.212.56|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 61989054 (59M) [application/x-bzip2]
Saving to: 'target.tar.bz2'
target.tar.bz2 100%[================================================================================================================>] 59.12M 18.0MB/s in 4.0s
2018-03-14 05:14:33 (14.8 MB/s) - 'target.tar.bz2' saved [61989054/61989054]
Flags: needinfo?(dcrisan)
Hello! Since last night this issue seems to have increased again: there were 505 failures from the 16th to the 17th, the majority on Linux x64: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1443130&startday=2018-03-16&endday=2018-03-17&tree=all (the 2 on OS X, 1 on Linux 32 and 1 Linux are misclassified)

:fubar can you take a look, or do you have an update here? Thank you!
Flags: needinfo?(klibby)
Someone commented elsewhere that some of these issues may be related to an intermittent issue with AWS and geoip, where traffic gets routed in very inconvenient ways. I just did a very quick test on t-linux64-ms-183, tracerouting to queue.taskcluster.net, and traffic went from MDC1 (Sacramento), to Miami, and back to San Jose:

[root@t-linux64-ms-183 ~]# traceroute -T queue.taskcluster.net
traceroute to queue.taskcluster.net (50.16.222.244), 30 hops max, 60 byte packets
1 ae1-256.fw1.test.releng.mdc1.mozilla.net (10.49.56.1) 0.502 ms 0.489 ms 0.473 ms
2 63.245.208.17 (63.245.208.17) 0.624 ms 0.713 ms 0.624 ms
3 65.74.145.154 (65.74.145.154) 0.988 ms 1.110 ms 1.290 ms
4 173.225.175.145 (173.225.175.145) 1.084 ms 2.459 ms 2.538 ms
5 ip65-46-225-53.z225-46-65.customer.algx.net (65.46.225.53) 1.333 ms 1.232 ms 1.324 ms
6 216.156.16.58.ptr.us.xo.net (216.156.16.58) 43.155 ms 43.204 ms 43.194 ms
7 te-4-1-0.rar3.miami-fl.us.xo.net (207.88.12.161) 69.219 ms 58.565 ms 58.509 ms
8 207.88.12.144.ptr.us.xo.net (207.88.12.144) 43.336 ms 43.272 ms 43.246 ms
9 207.88.12.190.ptr.us.xo.net (207.88.12.190) 46.365 ms 46.360 ms 50.060 ms
10 te0-12-0-0.rar3.sanjose-ca.us.xo.net (207.88.12.189) 44.742 ms 44.756 ms 44.728 ms
11 207.88.12.194.ptr.us.xo.net (207.88.12.194) 43.074 ms 43.606 ms 43.512 ms
12 207.88.14.199.ptr.us.xo.net (207.88.14.199) 43.529 ms 43.528 ms 42.993 ms
13 * * *
14 * * *
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 54.239.108.190 (54.239.108.190) 73.449 ms 54.239.108.54 (54.239.108.54) 71.447 ms 54.239.108.182 (54.239.108.182) 72.531 ms
21 54.239.110.190 (54.239.110.190) 70.751 ms * 54.239.110.140 (54.239.110.140) 69.767 ms
22 54.239.110.167 (54.239.110.167) 91.087 ms 54.239.110.247 (54.239.110.247) 86.938 ms 54.239.110.183 (54.239.110.183) 77.929 ms
23 54.239.111.95 (54.239.111.95) 74.550 ms 54.239.111.87 (54.239.111.87) 73.561 ms 54.239.111.89 (54.239.111.89) 73.345 ms
24 * * *
25 205.251.244.95 (205.251.244.95) 73.378 ms * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *
Flags: needinfo?(klibby)
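To compare routes across hosts or over time, the per-hop latencies in traceroute output like the above can be extracted and the largest jump between responding hops located. This is a sketch only: `hop_latencies` and `largest_jump` are hypothetical helpers, the parsing assumes the Linux traceroute line format shown in this comment, and the sample is a subset of the hops quoted above.

```python
import re

HOP = re.compile(r"^\s*(\d+)\s+(.*)$")
RTT = re.compile(r"(\d+(?:\.\d+)?) ms")


def hop_latencies(lines):
    """Map hop number -> minimum RTT in ms, skipping hops that only printed '*'."""
    result = {}
    for line in lines:
        m = HOP.match(line)
        if not m:
            continue  # not a hop line (for example the traceroute header)
        rtts = [float(x) for x in RTT.findall(m.group(2))]
        if rtts:
            result[int(m.group(1))] = min(rtts)
    return result


def largest_jump(latencies):
    """Return (from_hop, to_hop, delta_ms) for the biggest RTT increase
    between consecutive responding hops, or None with fewer than two hops."""
    hops = sorted(latencies)
    best = None
    for a, b in zip(hops, hops[1:]):
        jump = latencies[b] - latencies[a]
        if best is None or jump > best[2]:
            best = (a, b, jump)
    return best


sample = [
    "5 ip65-46-225-53.z225-46-65.customer.algx.net (65.46.225.53) 1.333 ms 1.232 ms 1.324 ms",
    "6 216.156.16.58.ptr.us.xo.net (216.156.16.58) 43.155 ms 43.204 ms 43.194 ms",
    "7 te-4-1-0.rar3.miami-fl.us.xo.net (207.88.12.161) 69.219 ms 58.565 ms 58.509 ms",
    "13 * * *",
]
best = largest_jump(hop_latencies(sample))
print(best)
```

A large jump only shows where latency is added along the path. Per-hop RTTs do not by themselves prove a geographic detour, so the router hostnames still need to be read alongside them.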
Depends on: 1447092
I didn't realize the artifact downloads were coming from cloudfront. Erroneous GeoIP data could be at least one factor here.

<dividehex> dustin: where does taskcluster-artifacts.net reside?
<dividehex> is that a CDN?
<dustin> dividehex: yes
<dustin> cloudfront
<dividehex> ahh ok good!
<dividehex> that means GeoIP would have an effect
Starting on the 2nd of April this began to increase again: 36 failures. In the last 7 days we have had 41 failures. They occur on Linux x64, and the affected build types are opt and pgo.

Recent failure log: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=171782517&lineNumber=13678
Assignee: nobody → rwood
Status: NEW → ASSIGNED
Comment on attachment 8966674
Bug 1443130 - Allow more time for talos g2/g2 profiling to fix intermittent maxRunTime exceeded;

Nope, this doesn't solve the intermittent.
Attachment #8966674 - Attachment is obsolete: true
Attachment #8966674 - Flags: review?(jmaher)
we disabled tps
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Whiteboard: [stockwell disable-recommended] → [stockwell disabled]