Closed
Bug 734458
(tegra-084)
Opened 13 years ago
Closed 12 years ago
tegra-084 problem tracking
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bear, Unassigned)
References
()
Details
(Whiteboard: [buildduty][buildslave][tegra])
tegra-084 is not loading to sutagent after multiple PDU resets
Reporter | ||
Updated•13 years ago
Alias: tegra-084
Reporter | ||
Updated•13 years ago
Whiteboard: [buildduty][buildslave][tegra]
Comment 1•13 years ago
This appears to be taking jobs atm.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 2•12 years ago
So this needs to be reimaged; it's failing to verify the updated watcher.ini (the hash is returning None/file-not-found). Not sure what's wrong here, but every other tegra works fine.
Comment 3•12 years ago
I don't know what to do here, maybe Callek does?
Assignee: nobody → bugspam.Callek
Comment 4•12 years ago
Back in production.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 12 years ago
Resolution: --- → FIXED
Comment 5•12 years ago
A power cycle is not sufficient to get him back.
tegra-084 INACTIVE active OFFLINE SUTAgent not present; 0.0 0.0 foopy11 2 oooooooooooooooooo
09:35 nagios-scl1: [89] tegra-084.build.mtv1:tegra agent check is CRITICAL: Connection refused
Armens-MacBook-Air:tools armenzg$ python sut_tools/tegra_powercycle.py tegra-084
10/18/2012 10:02:34: DEBUG: rebooting tegra-084 at pdu2.df201-4.build.mtv1.mozilla.com .AB13
SNMPv2-SMI::enterprises.1718.3.2.3.1.11.1.2.13 = INTEGER: 3
Armens-MacBook-Air:~ armenzg$ telnet tegra-084 20701
Trying 10.250.49.72...
telnet: connect to address 10.250.49.72: Connection refused
telnet: Unable to connect to remote host
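The telnet check above can be scripted for repeated use. This is only a minimal sketch: the function name `sut_agent_reachable` is hypothetical, port 20701 is taken from the telnet command above, and a successful TCP connect only shows that something is listening, not that the agent is healthy.

```python
import socket


def sut_agent_reachable(host, port=20701, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper; it mirrors `telnet <host> 20701` and nothing more.
    """
    try:
        # create_connection resolves the name and connects, honouring timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

For example, `sut_agent_reachable("tegra-084")` would be expected to return False while the device refuses connections as in the transcript above.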
Comment 6•12 years ago
Had to manually reformat its SD card today, reboot it via the PDU, and clear the error flag. It's back in production though.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 7•12 years ago
It's still busted, though, the same way as tegra-048. Between the two of them, they are very nearly all of bug 686085, where the tegra stays alive long enough to get through verify.py but then dies before the browser starts up in the talos test run, along with various manifestations of the same thing in non-talos, like https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=689856
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 8•12 years ago
Even more so: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=663657 (I wondered why it suddenly came back to life when it had been nice and quiet for quite a while, maybe since verify.py landed).
Comment 9•12 years ago
Ran stop_cp.sh. I have no idea what to do with this one, since it already went to recovery. Callek?
Comment 10•12 years ago
(In reply to Ben Hearsum [:bhearsum] from comment #9)
> ran stop_cp.sh. I have no idea what to do with this one, since it already
> went to recovery. Callek?
Let's find a nearby garbage chute. Any objections?
Comment 12•12 years ago
(In reply to Justin Wood (:Callek) from comment #10)
> Let's find a nearby garbage chute. Any objections?
Yes, let's decommission it.
Comment 13•12 years ago
Alas, no - bug 808437, so we don't yet know whether or not it's busted, because the reimage in October busted it.
Updated•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 14•12 years ago
So this has a bad SDCard. Need reimage+sdcard
Updated•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 15•12 years ago
Hasn't taken a job for 50 days, 4:27:36. Manually powercycling. Will check back on it tomorrow.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 16•12 years ago
(mass change: filter on tegraCallek02reboot2013)
I just rebooted this device, hoping that many of the ones I'm doing tonight come back automatically. I'll check back tomorrow to see whether it did; if it did not, I'll triage the next step manually on a per-device basis.
---
Command I used (with a manual patch to the fabric script to allow this command):
(fabric)[jwood@dev-master01 fabric]$ python manage_foopies.py -j15 -f devices.json `for i in 021 032 036 039 046 048 061 064 066 067 071 074 079 081 082 083 084 088 093 104 106 108 115 116 118 129 152 154 164 168 169 174 179 182 184 187 189 200 207 217 223 228 234 248 255 264 270 277 285 290 294 295 297 298 300 302 304 305 306 307 308 309 310 311 312 314 315 316 319 320 321 322 323 324 325 326 328 329 330 331 332 333 335 336 337 338 339 340 341 342 343 345 346 347 348 349 350 354 355 356 358 359 360 361 362 363 364 365 367 368 369; do echo '-D' tegra-$i; done` reboot_tegra
The command reboots the devices one at a time from the foopy each device is connected to, with one ssh connection per foopy.
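The shell loop in the command above only expands device numbers into repeated `-D tegra-NNN` arguments. A rough Python equivalent of that expansion follows; the helper names `device_args` and `reboot_command` are hypothetical, and the flags (`-j15`, `-f devices.json`, `-D`, `reboot_tegra`) are copied from the command above rather than from the script's documentation. The sketch builds the argument list only and does not run anything.

```python
def device_args(ids):
    """Expand numeric device ids into repeated '-D tegra-NNN' arguments."""
    args = []
    for n in ids:
        # Zero-pad to three digits, matching names such as tegra-021.
        args += ["-D", f"tegra-{n:03d}"]
    return args


def reboot_command(ids, jobs=15, devices_file="devices.json"):
    """Build the argv list for the mass reboot; hypothetical helper."""
    return [
        "python", "manage_foopies.py", f"-j{jobs}", "-f", devices_file,
        *device_args(ids), "reboot_tegra",
    ]
```

Building a list rather than a string means the result can be handed to `subprocess.run` without shell quoting, should someone want to run it that way.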
Comment 17•12 years ago
Had to cycle clientproxy to bring this back.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 18•12 years ago
7 days without taking a job.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 19•12 years ago
Back in production.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•11 years ago
Product: mozilla.org → Release Engineering
Updated•10 years ago
Assignee: bugspam.Callek → nobody
QA Contact: armenzg → bugspam.Callek
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard