Closed Bug 614821 Opened 14 years ago Closed 14 years ago

reboots 20101125

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86
macOS
task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: zandr)

References

Details

(Whiteboard: [needs SCL visit])

talos-r3-w7-012.build - unreachable
talos-r3-w7-040.build - host unknown
Blocks: 562459
linux-ix-slave12 -- down and unreachable via IPMI.
mw32-ix-slave22 - down and unreachable by ipmi
talos-r3-fed-012 talos-r3-fed-027 talos-r3-fed64-030 talos-r3-fed64-049 talos-r3-xp-012
talos-r3-fed64-053 talos-r3-fed64-047.build talos-r3-fed64-043.build talos-r3-fed64-013.build talos-r3-fed64-001.build talos-r3-fed-040.build talos-r3-fed-036.build talos-r3-fed-027.build talos-r3-fed-024.build talos-r3-fed-022.build
talos-r3-fed-030.build
talos-r3-xp-052.build
talos-r3-w7-009.build
Flags: colo-trip+
Whiteboard: [needs SCL visit]
talos-r3-fed-009.build talos-r3-w7-036.build
jlaz/jabba? Phong's outta town. jlaz, punt around as needed. Thanks guys!
Assignee: server-ops → jlazaro
talos-r3-w7-011.build
talos-r3-fed-044.build
talos-r3-fed-037.build
Bumping severity because we've lost 20% of our Fedora 32-bit pool.
Severity: normal → critical
talos-r3-fed64-016.build
Any ETA on these?
Copying from previous reboot bug. Needs re-image (mount errors): fed-012 fed-022 fed-024 fed-036 fed-040 fed64-53
Is there any way we can get these done today? The 32-bit Fedora wait times are getting really bad.
Severity: critical → blocker
Assignee: jlazaro → server-ops
It would be really nice to figure out the issue (hw clock issue) that causes the Fedora minis to need to be re-imaged on a regular basis. Re-imaging isn't as quick as rebooting.
Assignee: server-ops → zandr
What happened to cause so many Fedora hosts to need manual reboots? What happened to require so many reimages?
w7-11: rebooted
fed-044: rebooted
fed-037: rebooted
fed-009: rebooted
w7-009: rebooted
xp-052: offline for power reasons
fed-012: rebooted
fed-027: rebooted
fed64-030: on my desk in MV
fed64-049: offline for power reasons
xp-012: offline for power reasons
fed64-053: offline for power reasons
fed64-016: rebooted
fed64-047: MIA, possibly in MV
fed64-043: offline for power reasons
fed64-013: rebooted
fed64-001: rebooted
fed-040: rebooted
fed-036: rebooted
fed-027: rebooted
fed-024: rebooted
fed-022: rebooted
fed-030: rebooted
w7-036: rebooted
fed64-053: offline for power reasons
(In reply to comment #16)
Still can't ping:
> fed-044: rebooted
> fed-012: rebooted
> fed64-016: rebooted
> fed64-001: rebooted
> fed-040: rebooted
> fed-036: rebooted
> fed-024: rebooted
> fed-022: rebooted
(In reply to comment #16)
These seem to be online and connected:
> w7-11: rebooted
> fed-037: rebooted
> fed-009: rebooted
> fed64-013: rebooted
> fed-027: rebooted
Online, needs puppet cleanup:
> fed-027: rebooted
Also not pingable:
> w7-036: rebooted
> w7-009: rebooted
> fed-030: rebooted
fed-012: reimaged fed-022: reimaged
fed-024: reimaged
fed-036: reimaged
fed-040: pulled. Has a CD stuck in the drive that it won't boot from.
> linux-ix-slave12 -- down and unreachable via IPMI.
> mw32-ix-slave22 - down and unreachable by ipmi
Bounced around 18:00 PDT.
Can we reboot these?
talos-r3-w7-052.build 7d 7h 25m 50s
talos-r3-w7-036.build 6d 22h 13m 44s
talos-r3-w7-032.build 0d 18h 6m 56s
talos-r3-w7-012.build 16d 14h 55m 21s
talos-r3-w7-009.build 11d 15h 34m 56s
talos-r3-w7-008.build 1d 14h 57m 51s
That's ~10% of our win7 capacity.
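For anyone triaging a list like the one above, here is a minimal sketch that parses the "host duration" lines and orders the hosts by the listed duration, longest first. The helper names (`parse_duration`, `parse_report`, `longest_first`) are hypothetical and not part of any existing releng tooling. The sketch assumes one host per line and the exact `Nd Nh Nm Ns` layout shown in this comment, and it makes no claim about what the duration measures.

```python
import re
from datetime import timedelta

# Assumed layout, taken from the comment above: "16d 14h 55m 21s".
_DURATION = re.compile(r"(\d+)d (\d+)h (\d+)m (\d+)s")

# The six hosts and durations exactly as listed in the comment.
REPORT = """
talos-r3-w7-052.build 7d 7h 25m 50s
talos-r3-w7-036.build 6d 22h 13m 44s
talos-r3-w7-032.build 0d 18h 6m 56s
talos-r3-w7-012.build 16d 14h 55m 21s
talos-r3-w7-009.build 11d 15h 34m 56s
talos-r3-w7-008.build 1d 14h 57m 51s
"""


def parse_duration(text):
    """Convert a string such as '7d 7h 25m 50s' into a timedelta."""
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    days, hours, minutes, seconds = (int(group) for group in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def parse_report(report):
    """Map each host name to its parsed duration (one host per line)."""
    hosts = {}
    for line in report.strip().splitlines():
        host, duration = line.split(maxsplit=1)
        hosts[host] = parse_duration(duration)
    return hosts


def longest_first(hosts):
    """Return host names ordered by duration, longest first."""
    return sorted(hosts, key=hosts.get, reverse=True)


if __name__ == "__main__":
    for host in longest_first(parse_report(REPORT)):
        print(host)
```

Sorting this way puts the hosts with the longest listed durations at the top of the reboot list.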
(In reply to comment #25)
> Can we reboot these?
>
> talos-r3-w7-052.build 7d 7h 25m 50s
> talos-r3-w7-036.build 6d 22h 13m 44s
> talos-r3-w7-032.build 0d 18h 6m 56s
> talos-r3-w7-012.build 16d 14h 55m 21s
> talos-r3-w7-009.build 11d 15h 34m 56s
> talos-r3-w7-008.build 1d 14h 57m 51s
>
> That's ~10% of our win7 capacity.
Will swing by scl1 on the way home tonight.
-Z
It could wait until Monday/Tuesday, as there are no pending jobs and at this time of day people won't be pushing like mad. Your call. Have a good weekend.
Given the all-hands next week, I'm not certain I'll be able to get down there. It's not really out of my way tonight.
(In reply to comment #25)
> Can we reboot these?
>
> talos-r3-w7-052.build 7d 7h 25m 50s
> talos-r3-w7-036.build 6d 22h 13m 44s
> talos-r3-w7-032.build 0d 18h 6m 56s
> talos-r3-w7-012.build 16d 14h 55m 21s
> talos-r3-w7-009.build 11d 15h 34m 56s
> talos-r3-w7-008.build 1d 14h 57m 51s
Rebooted, responding to ping, lots of ports open. Two of these we'd pulled power on, but I think we can get away with it for now.
This looks like the list of machines in Nagios that need rebooting; sorry for any dups.
linux-ix-slave31.build.scl1
linux-ix-slave32.build.scl1
linux-ix-slave33.build.scl1
linux-ix-slave35.build.scl1
linux-ix-slave38.build.scl1
linux-ix-slave42.build.scl1
mv-moz2-linux-ix-slave05.build
talos-r3-fed-012.build
talos-r3-fed-029.build
talos-r3-fed-033.build
talos-r3-fed-036.build
talos-r3-fed-038.build
talos-r3-fed-041.build
talos-r3-fed64-021.build
talos-r3-fed64-027.build
talos-r3-fed64-044.build
talos-r3-fed64-055.build
talos-r3-snow-004.build
All SCL machines got a reboot today, closing this. Filed bug 620041 to track remaining down minis.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Alias: reboots
Product: mozilla.org → mozilla.org Graveyard