Closed Bug 711725 Opened 13 years ago Closed 10 years ago

Tegras and Pandas disconnect with "remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion." or similar at any given step

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P3)

ARM
Android

Tracking

(firefox16 affected, firefox17 affected, firefox18 affected, firefox19 affected)

RESOLVED WORKSFORME

People

(Reporter: philor, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [purple][android_tier_1])

Starting whichever night it was last week that we both got the buildfarm network fixed up and added a bunch more foopies, we've been getting clumps of Android tests that successfully complete the run but then disconnect ("Connection to the other side was lost in a non-clean fashion.") during the reboot device step, like:
https://tbpl.mozilla.org/php/getParsedLog.php?id=8000862&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=8000842&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=8000855&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=8000865&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=8000850&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=8000861&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=8000894&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=8000834&tree=Mozilla-Inbound
Time of day (3:12 - 3:14 for those)? Everything attached to one foopy that's running at a particular time? Dunno.
https://tbpl.mozilla.org/php/getParsedLog.php?id=8037092&tree=Mozilla-Aurora Could just be that the 3:12 clumps are easier to spot as a clump, since there's probably less running on fewer trees then.
Or it could be that we've always had some disconnects during rebooting, which might be what I used to star with bits from stories explaining death to children ("Sometimes, honey, when a Tegra gets really really tired, it needs to rest for a very long time..."), and now that the clumps are making us notice this flavor of failure, those are sticking out more.
Related to a comment in bug 713047 - dustin thinks it may be the reboot step timing out, or being slower than the buildslave thinks it should be. Anyway - read that bug Monday and figure it out, bear.
Assignee: nobody → bear
https://tbpl.mozilla.org/php/getParsedLog.php?id=8188464&tree=Mozilla-Aurora Should be interesting, having old tegras moved over, to see whether they start exploding in new ways in new places in the run.
Summary: Intermittent clumps of Tegras disconnecting during the reboot step after successful runs → Intermittent clumps of Tegras attached to bm-19 and bm-20 disconnecting like a honey badger
Per discussions with philor in #build: while the original burst of these may have been due to bring-up issues with the new buildmasters and the move of tegras, these tegras continue to display a new failure signature. Several pieces are moving at the same time here, so we need some data analysis to see if there is any correlation:
- bm19 & bm20 originally brought up with tegras connected to new foopies on 2011-12-20 (bug 704597) (new foopies are faster hardware running OS 10.7 (Lion))
- all older tegras/foopies moved to bm19 & bm20 on 2011-12-31 (bug 713170) (old bm was always bordering on swapping, new ones still have plenty of headroom)

Example of "new signature":
07:53 < philor> hwine: https://tbpl.mozilla.org/php/getParsedLog.php?id=8441984&tree=Mozilla-Inbound is traditional purple, it was just sitting there and then there's a "process killed by signal 15"

From that log:
========= Started Cleanup Device failed (results: 2, elapsed: 19 secs) ==========
python /builds/sut_tools/cleanup.py 10.250.50.87
 in dir /builds/tegra-177/test/. (timeout 1200 secs)
 watching logfiles {}
 argv: ['python', '/builds/sut_tools/cleanup.py', '10.250.50.87']
 environment:
  PATH=/opt/local/bin:/opt/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
  PWD=/builds/tegra-177/test
  SUT_IP=10.250.50.87
  SUT_NAME=tegra-177
  __CF_USER_TEXT_ENCODING=0x1F5:0:0
 closing stdin
 using PTY: False
process killed by signal 15
program finished with exit code -1
elapsedTime=19.277863
======== Finished Cleanup Device failed (results: 2, elapsed: 19 secs) ========
========= Started (results: not started, elapsed: not started) ==========
======== Finished (results: not started, elapsed: not started) ========
========= Started (results: not started, elapsed: not started) ==========
======== Finished (results: not started, elapsed: not started) ========
========= Started (results: not started, elapsed: not started) ==========
======== Finished (results: not started, elapsed: not started) ========
========= Started (results: not started, elapsed: not started) ==========
======== Finished (results: not started, elapsed: not started) ========
========= Started (results: not started, elapsed: not started) ==========
======== Finished (results: not started, elapsed: not started) ========
========= Started Reboot Device interrupted (results: 4, elapsed: 2 secs) ==========
remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
]
[Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
]
======== Finished Reboot Device interrupted (results: 4, elapsed: 2 secs) ========
Summary: Intermittent clumps of Tegras attached to bm-19 and bm-20 disconnecting like a honey badger → Tegras attached to bm-19 and bm-20 exhibiting a never-seen-before failure signature
Sorry, I misled you - that "traditional purple" is the well-known failure, that's bug 660480 (hmm, or is it? I don't remember whether they've always come with a disconnect in the reboot step, because I didn't have any need to look below the "process killed by signal 15").

The non-traditional, new purple is https://tbpl.mozilla.org/php/getParsedLog.php?id=8442422&tree=Mozilla-Inbound, where the whole run went perfectly up until it's running `python /builds/sut_tools/reboot.py 10.250.49.70`, the reboot step, and somewhere in the middle of that - like that one, where it got caught while dumping a logcat line - there's a sudden

remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
]
[Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
]

There are (at least) three possibilities for comment 14:
* that ain't it, because of things like https://tbpl.mozilla.org/php/getParsedLog.php?id=8212921&tree=Mozilla-Aurora where the disconnect was in the clobber build tools step
* there are actually two (or more) new failure modes, one that's due to timing in the reboot step and one or more others that explain sudden disconnects in other steps
* disconnecting in random steps isn't really new, I just blew it off before, and then tried to lump it in with this
Or https://tbpl.mozilla.org/php/getParsedLog.php?id=8457729&tree=Mozilla-Beta where the disconnect is in the tiresome `rm -rfv build` step, from which I would forcibly disconnect if I were a foopy.
I don't really pay attention to individual failures, because there are just too many of them, but it could well be that rather than which master or which foopy, it's a particular set of tegras (which can also mean a particular foopy) - glancing at the numbers, it feels like they're always over 250.
And given the timing of them this morning, one of the things that I file under this bug - the lost connection during the reboot step - may be associated with reconfigs; there was a large batch right around the 11:40 reconfig.
removing from my queue so this can be triaged
Assignee: bear → nobody
Priority: -- → P3
https://tbpl.mozilla.org/php/getParsedLog.php?id=9897894&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=9897922&tree=Mozilla-Inbound ("Tegras sometimes disconnect prematurely during the reboot step, singly or in small or large groups, and sometimes philor notices it because they didn't also have another failure" would probably be a more accurate summary.)
Unless I'm mistaken, *all* production tegras are attached to bm-19 or bm-20 atm. Shortening the summary.
Summary: Tegras attached to bm-19 and bm-20 exhibiting a never-seen-before failure signature → Tegras exhibiting a never-seen-before failure signature
https://tbpl.mozilla.org/php/getParsedLog.php?id=10128958&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10129019&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10128963&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10128668&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10129006&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10128386&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10128176&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10129131&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10129081&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10129811&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10129012&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10128960&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10129015&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10128959&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10129795&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10129079&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10129102&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10129132&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10129083&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10128999&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10128476&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10129046&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10129016&tree=Mozilla-Inbound
In general, we lose the connection. Hopefully, distributing them among three masters will help with this (bug 734393).
Depends on: 734393
Summary: Tegras exhibiting a never-seen-before failure signature → Tegras disconnect with "Connection to the other side was lost in a non-clean fashion." or similar at any given step
There are actually quite a few things in here, and I think one of them - the one I notice the most - is actually "the reboot step takes way too long, and there is something that releng does which causes a large number of tegras to disconnect, and since the reboot step takes way too long, a large percentage of the disconnecting tegras are doing so in the reboot step."
https://tbpl.mozilla.org/php/getParsedLog.php?id=10244506&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10246035&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10244877&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10245057&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10243784&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10244866&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10243778&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10244796&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10243786&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10243768&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10244869&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10244513&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10243704&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10244482&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10244547&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10245078&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10243852&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10243830&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10243968&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10243915&tree=Mozilla-Inbound
Blocks: 438871
Whiteboard: [android_tier_1] → [orange][purple][android_tier_1]
We're hitting this quite a lot :-( What are your thoughts on RETRYing this in the meantime?
Actually, we aren't hitting this; we're hitting bug 793358 (a lot, because of bug 799896) but calling it this instead, because tbpl now recognizes the remoteFailed, which is only an after-effect of the prior "process killed by signal 15" that tbpl doesn't recognize.
And now that bug 799896 apparently actually is fixed, the gods are amusing themselves by sending lots of these (relative to their usual frequency, not relative to the frequency of the things bug 799896 was causing) along, to make me paste them right below me saying we aren't hitting this. https://tbpl.mozilla.org/php/getParsedLog.php?id=16141637&tree=Mozilla-Inbound
Summary: Tegras disconnect with "Connection to the other side was lost in a non-clean fashion." or similar at any given step → Tegras disconnect with "remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion." or similar at any given step
Whiteboard: [orange][purple][android_tier_1] → [purple][android_tier_1]
Blocks: 817024
Callek, this is one of our most frequent Android non-code failures at the moment, I've tried skimming the comments in this bug but can't see what the current understanding of the problem is. Could you take a look?
Flags: needinfo?(bugspam.Callek)
(In reply to Ed Morley [UTC+0; email:edmorley@moco] from comment #758)
> Callek, this is one of our most frequent Android non-code failures at the
> moment, I've tried skimming the comments in this bug but can't see what the
> current understanding of the problem is. Could you take a look?

From my understanding, this is a symptom of practically any error that sets an error.flg (and sometimes it escapes onto the next job :( ). Anything that ends up killing buildbot on the foopy side for a given tegra causes this message to appear. We don't have a good solution for hiding this message and exposing only the error.flg message [yet].
Flags: needinfo?(bugspam.Callek)
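To make the error.flg behaviour described above concrete, here is a minimal, hypothetical Python sketch of the foopy-side reaction: a per-device watcher finds an error.flg left behind by a failed step and terminates that device's buildslave. The paths, file names, and function name are assumptions for illustration, not the actual watch_devices.sh / sut_tools code; the point is just that a foopy-side SIGTERM is what surfaces in the job log as "process killed by signal 15", with the master then reporting only the twisted ConnectionLost.

import os
import signal

# Hypothetical sketch only: paths and names are assumptions, not the real
# foopy watcher (watch_devices.sh / sut_tools).
SLAVE_DIR = "/builds/tegra-177"                    # per-device slave directory
ERROR_FLAG = os.path.join(SLAVE_DIR, "error.flg")  # written by a failed step
PID_FILE = os.path.join(SLAVE_DIR, "twistd.pid")   # buildslave's twistd pid

def kill_buildslave_if_error_flagged():
    """If a prior step left an error.flg, stop this device's buildslave.

    The SIGTERM is what appears in the job log as "process killed by
    signal 15"; the master only sees the connection drop and reports the
    twisted ConnectionLost ("non-clean fashion") quoted in this bug.
    """
    if not os.path.exists(ERROR_FLAG):
        return False
    try:
        with open(PID_FILE) as f:
            pid = int(f.read().strip())
        os.kill(pid, signal.SIGTERM)               # buildslave dies mid-step
        return True
    except (IOError, ValueError, OSError):
        return False                               # already gone / no pid file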
Happening on Pandas too.
Blocks: android_4.0_testing
No longer blocks: 438871
Summary: Tegras disconnect with "remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion." or similar at any given step → Tegras and Pandas disconnect with "remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion." or similar at any given step
Justin, we're seeing a lot more disconnects (particularly on the Pandas), starting around the 5th Feb: http://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=711725&startday=2013-01-25&endday=2013-02-08&tree=trunk Any ideas?
Flags: needinfo?(bugspam.Callek)
(In reply to Ed Morley [:edmorley UTC+0] from comment #1277)
> Justin, we're seeing a lot more disconnects (particularly on the Pandas),
> starting around the 5th Feb:
> http://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=711725&startday=2013-01-25&endday=2013-02-08&tree=trunk
>
> Any ideas?

Callek? You've been ignoring the needinfo on this for 3-4 weeks! :P
Information from etherpad regarding Android releng crashes:
* mostly on pandas, but plenty of tegras
* 80% of these are before we even start to run the test; I doubt this is related to the product
* has been a problem for >2 years
* 900 failures in the last 2 months!
* a lot of failures on clobber, verify, etc.
* a few failures occur during tests or while connecting to the device
* perhaps cp is killing the buildslave instance?
** Quite likely (though it's no longer cp, just our watcher script)
** if we can get slave initiated shutdown working, so that we can *always* pass back into the foopy-side verify.py after a job, we can do away with the error.flg needs (see the sketch after this list)
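For context on the "slave initiated shutdown" idea in the last bullet, the sketch below assumes the buildslave watches a shutdown.stamp file (a later comment's twistd log shows "Watching /builds/tegra-357/shutdown.stamp's mtime to initiate shutdown"). Instead of killing the slave mid-step, the foopy would touch the stamp so the slave finishes its current job and exits cleanly, letting verify.py run after every job. The directory and function name here are illustrative assumptions.

import os
import time

# Illustrative sketch, assuming the buildslave is started with the
# shutdown-stamp watching shown in the twistd log later in this bug.
SLAVE_DIR = "/builds/tegra-357"                    # per-device directory (assumed)
STAMP = os.path.join(SLAVE_DIR, "shutdown.stamp")

def request_graceful_shutdown():
    """Ask the buildslave to stop after its current job instead of killing it.

    Bumping the stamp's mtime lets the slave disconnect cleanly, so the foopy
    can always run verify.py between jobs rather than relying on error.flg.
    """
    with open(STAMP, "a"):
        pass                                       # create the file if absent
    now = time.time()
    os.utime(STAMP, (now, now))                    # mtime change triggers shutdown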
Depends on: 881466
(In reply to TinderboxPushlog Robot from comment #2812)
> RyanVM
> https://tbpl.mozilla.org/php/getParsedLog.php?id=24010283&tree=Mozilla-Inbound
> Android Tegra 250 mozilla-inbound talos remote-tp4m_nochrome on 2013-06-11 08:42:16
> slave: tegra-357
>
> remoteFailed: [Failure instance: Traceback (failure with no frames): <class
> 'twisted.internet.error.ConnectionLost'>: Connection to the other side was
> lost in a non-clean fashion.

This one happened after my deploy today. I looked at the logs and saw the following (attached at the end of this comment), which roughly shows (in order of the log):
* at 08:35 we ran our watch script, which saw buildbot still up, happily (no error.flg or anything)
* at 08:40 we ran our watch script, which saw buildbot not running
* at 08:40:02 we ran verify.py, whose output follows and all is well
* at 08:42:09 we decide that we should start buildslave back up, which spits out a tail of the buildbot twistd.log (into our log due to stdout redir)
* twistd.log reports the buildslave lost network to the buildbot master at 08:37:32
* at 08:42:09 we're up and connected to the master again

----
2013-06-11 08:35:02 -- #################################################################
2013-06-11 08:35:02 -- # Starting cycle for our device (tegra-357 = 10.250.51.197) now #
2013-06-11 08:35:02 -- #################################################################
2013-06-11 08:35:02 -- buildbot pid is 26357
2013-06-11 08:35:02 -- (heartbeat) buildbot is running
2013-06-11 08:35:02 -- Cycle for our device (tegra-357) complete
2013-06-11 08:35:02 -- Cycle for our device (tegra-357) complete
2013-06-11 08:40:02 -- #################################################################
2013-06-11 08:40:02 -- # Starting cycle for our device (tegra-357 = 10.250.51.197) now #
2013-06-11 08:40:02 -- #################################################################
2013-06-11 08:40:02 -- Buildbot is not running
06/11/2013 08:40:02: DEBUG: updateSUT: Using device 'tegra-357' found in env variable
06/11/2013 08:40:02: DEBUG: calling [nslookup tegra-357]
06/11/2013 08:40:02: DEBUG: calling [ps -U cltbld]
06/11/2013 08:40:02: DEBUG: PID TTY TIME CMD
06/11/2013 08:40:02: DEBUG: 28194 ? 00:00:01 python2.7
06/11/2013 08:40:02: DEBUG: 28941 ? 00:00:01 python2.7
06/11/2013 08:40:02: DEBUG: 35725 ? 00:00:00 python
06/11/2013 08:40:02: DEBUG: 36167 ? 00:00:00 crond
06/11/2013 08:40:02: DEBUG: 36171 ? 00:00:00 sh
06/11/2013 08:40:02: DEBUG: 36174 ? 00:00:00 watch_devices.s
06/11/2013 08:40:02: DEBUG: 36180 ? 00:00:00 tee
06/11/2013 08:40:02: DEBUG: 36223 ? 00:00:00 watch_devices.s
06/11/2013 08:40:02: DEBUG: 36359 ? 00:00:00 python
06/11/2013 08:40:02: DEBUG: 36493 ? 00:00:00 crond
06/11/2013 08:40:02: DEBUG: 36496 ? 00:00:00 sh
06/11/2013 08:40:02: DEBUG: 36499 ? 00:00:00 watch_devices.s
06/11/2013 08:40:02: DEBUG: 36505 ? 00:00:00 tee
06/11/2013 08:40:02: DEBUG: 36556 ? 00:00:00 watch_devices.s
06/11/2013 08:40:02: DEBUG: 36812 ? 00:00:01 python2.7
06/11/2013 08:40:02: DEBUG: 36816 ? 00:00:00 sleep
06/11/2013 08:40:02: DEBUG: 36842 ? 00:00:00 python
06/11/2013 08:40:02: DEBUG: 36845 ? 00:00:00 python
06/11/2013 08:40:02: DEBUG: 36847 ? 00:00:00 xpcshell
06/11/2013 08:40:02: DEBUG: 36856 ? 00:00:00 crond
06/11/2013 08:40:02: DEBUG: 36857 ? 00:00:00 crond
06/11/2013 08:40:02: DEBUG: 36860 ? 00:00:00 sh
06/11/2013 08:40:02: DEBUG: 36861 ? 00:00:00 sh
06/11/2013 08:40:02: DEBUG: 36863 ? 00:00:00 watch_devices.s
06/11/2013 08:40:02: DEBUG: 36864 ? 00:00:00 tegra_stats.sh
06/11/2013 08:40:02: DEBUG: 36869 ? 00:00:00 tee
06/11/2013 08:40:02: DEBUG: 36899 ? 00:00:00 watch_devices.s
06/11/2013 08:40:02: DEBUG: 36923 ? 00:00:00 python
06/11/2013 08:40:02: DEBUG: 37007 ? 00:00:00 python
06/11/2013 08:40:02: DEBUG: 37056 ? 00:00:00 ping
06/11/2013 08:40:02: DEBUG: 37085 ? 00:00:00 ps
06/11/2013 08:40:02: INFO: INFO: attempting to ping device
06/11/2013 08:40:02: DEBUG: calling [ping -c 5 tegra-357]
06/11/2013 08:40:06: INFO: updateSUT.py: Connecting to: tegra-357
06/11/2013 08:40:06: INFO: INFO: attempting to create file /mnt/sdcard/writetest
06/11/2013 08:40:06: INFO: INFO: updateSUT.py: We're running SUTAgentAndroid Version 1.17
06/11/2013 08:40:06: INFO: INFO: Got expected SUTAgent version '1.17'
06/11/2013 08:40:07: INFO: Uninstalling org.mozilla.fennec...
06/11/2013 08:40:07: INFO: Waiting for device to come back...
06/11/2013 08:42:07: INFO: Try 1
06/11/2013 08:42:07: INFO: devroot /mnt/sdcard/tests
06/11/2013 08:42:07: INFO: devroot /mnt/sdcard/tests
06/11/2013 08:42:09: INFO: removeDir() returned [Deleting file(s) from /mnt/sdcard/tests
 Deleting file(s) from /mnt/sdcard/tests/webapps
 Deleted webapps.json
 Deleting directory /mnt/sdcard/tests/webapps
 Deleted robotium.config
 Deleting file(s) from /mnt/sdcard/tests/logs
 <empty>
 Deleting directory /mnt/sdcard/tests/logs
 Deleted fennec-24.0a1.en-US.android-arm.apk
 Deleting directory /mnt/sdcard/tests]
06/11/2013 08:42:09: INFO: removeDir(/data/local/xpcb) returned [Deleting file(s) from /data/local/xpcb
 <empty>
 Unable to delete directory /data/local/xpcb]
reconnecting socket
removing file: /mnt/sdcard/writetest
reconnecting socket
2013-06-11 08:42:09 -- starting buildbot slave
We want to always start buildbot through twistd
We will run with the twistd command instead of calling buildslave
/builds/tegra-357 ~
2013-06-11 08:37:32-0700 [Broker,client] Lost connection to buildbot-master10.build.mozilla.org:9201
2013-06-11 08:37:32-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x14ca830>
2013-06-11 08:37:32-0700 [-] Main loop terminated.
2013-06-11 08:37:32-0700 [-] Server Shut Down.
2013-06-11 08:42:09-0700 [-] Log opened.
2013-06-11 08:42:09-0700 [-] twistd 10.2.0 (/tools/buildbot/bin/python2.7 2.7.3) starting up.
2013-06-11 08:42:09-0700 [-] reactor class: twisted.internet.selectreactor.SelectReactor.
2013-06-11 08:42:09-0700 [-] Starting factory <buildslave.bot.BotFactory instance at 0x127e830>
2013-06-11 08:42:09-0700 [-] Connecting to buildbot-master10.build.mozilla.org:9201
2013-06-11 08:42:09-0700 [-] Watching /builds/tegra-357/shutdown.stamp's mtime to initiate shutdown
~
2013-06-11 08:42:10 -- Sleeping for 200 sec after startup, to prevent premature flag killing
Flags: needinfo?(bugspam.Callek)
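As a rough reconstruction of the cycle visible in that log, the per-device watcher loop looks something like the sketch below. This is an assumption-laden paraphrase (the real logic lives in watch_devices.sh and the sut_tools scripts); the device name, script arguments, and twistd invocation are illustrative only.

import os
import subprocess
import time

# Hypothetical paraphrase of the ~5-minute watcher cycle in the log above;
# names, arguments, and commands are assumptions, not the real watch_devices.sh.
DEVICE = "tegra-357"
SLAVE_DIR = "/builds/" + DEVICE
PID_FILE = os.path.join(SLAVE_DIR, "twistd.pid")

def buildbot_running():
    """Heartbeat check: is this device's buildslave process still alive?"""
    try:
        with open(PID_FILE) as f:
            os.kill(int(f.read().strip()), 0)      # signal 0: existence probe only
        return True
    except (IOError, ValueError, OSError):
        return False

def one_cycle():
    if buildbot_running():
        return                                     # "(heartbeat) buildbot is running"
    # Buildbot is down: verify the device first (ping, SUT agent check, cleanup)...
    subprocess.call(["python", "/builds/sut_tools/verify.py", DEVICE])
    # ...then restart the slave through twistd and back off so a freshly written
    # error.flg isn't acted on prematurely ("Sleeping for 200 sec after startup").
    subprocess.call(["twistd", "--no_save", "--python=buildbot.tac"], cwd=SLAVE_DIR)
    time.sleep(200)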
Found in triage. As this seems to impact an entire class of machines, moving to Platform Support component.
Component: Release Engineering → Release Engineering: Platform Support
QA Contact: coop
Product: mozilla.org → Release Engineering
Depends on: 925285
Depends on: 867593
These should now be retries, thanks to bug 925285. Question is whether it's still worth leaving this bug open, or perhaps closing and leaving bug 918677 to handle making AWS connections more resilient/having in-house masters etc.
Closing bugs where TBPLbot has previously commented, but have now not been modified for >3 months & do not contain the whiteboard strings for disabled/annotated tests or use the keyword leave-open. Filter on: mass-intermittent-bug-closure-2014-07
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard