Closed Bug 652962 Opened 14 years ago Closed 14 years ago

buildbot-master2 having disk problems

Categories

(Infrastructure & Operations :: RelOps: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 655304

People

(Reporter: dustin, Assigned: zandr)

dmesg is full of all sorts of goodness like:

ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7fff000)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0xfffe000)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x10)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7ff000)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x38)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x38)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x39)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7ff000)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7fffff8)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x31)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0xfc)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x18)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7ffff1ff)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7ffffff3)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x100)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x39)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x80)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x3ffc0)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x1ffc0)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0xf0000)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7fffffc3)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x3ff8)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x40001fff)

Here's the last batch, with times (from /var/log/messages):

Apr 26 11:55:56 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x4)
Apr 26 11:57:56 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
Apr 26 12:02:02 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
Apr 26 12:02:44 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x2)
Apr 26 12:02:46 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
Apr 26 12:07:50 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x4)
Apr 26 12:21:40 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
Apr 26 12:22:16 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
Apr 26 12:22:27 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
Apr 26 12:32:33 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7fff9fff)
Apr 26 12:49:24 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x9)
Apr 26 13:35:16 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7c)
Apr 26 13:41:45 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0xfc)

hdparm gives:

/dev/sda:
 Timing cached reads: 28000 MB in 1.99 seconds = 14064.70 MB/sec
 Timing buffered disk reads: 216 MB in 3.01 seconds = 71.66 MB/sec

We've moved all of the slaves off this master for now. What should we do with it?
Assignee: server-ops-releng → zandr
Blocks: 649734
I'll put this in the next batch going back to iX
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
As a reminder to self, since this bug is the first Google hit for me for this error message:

ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x4)

is more than likely harmless, and is due to the ancient CentOS 5.0 kernel not dealing well with hardware that doesn't do NCQ correctly. I see this error both on perfectly functional iX systems and on formerly broken iX systems.

http://forum.soft32.com/linux/ata1-spurious-interrupt-irq_stat-0x8-active_tag-84148995-sac-ftopict337813.html
Better link for that thread: http://lkml.org/lkml/2006/12/27/174
How many other machines were we declaring dead due to these errors?
It's always been a secondary observation -- the machine is first reported as having slow IO, and then *also* noted to have these errors.
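For anyone wanting to check how often a host logs this message before treating it as a symptom, here is a rough Python sketch that picks the message out of dmesg or /var/log/messages lines and decodes its fields. The function names (parse_spurious, count_spurious) are made up for illustration and are not part of any existing tool. It assumes the message format is exactly as quoted in this bug, and it treats sactive as the SATA SActive bitmask, with one bit per outstanding NCQ tag. The negative active_tag is also shown as an unsigned 32-bit value (0xfafbfcfd), which appears to match libata's poison tag value, though that reading is an assumption on my part.

```python
import re

# Pattern for the kernel message quoted in this bug.
# Works on both raw dmesg lines and syslog lines with a timestamp prefix.
SPURIOUS_RE = re.compile(
    r"(?P<port>ata\d+): spurious interrupt "
    r"\(irq_stat (?P<irq_stat>0x[0-9a-fA-F]+) "
    r"active_tag (?P<active_tag>-?\d+) "
    r"sactive (?P<sactive>0x[0-9a-fA-F]+)\)"
)


def parse_spurious(line):
    """Return decoded fields, or None if the line is not a spurious-interrupt message."""
    m = SPURIOUS_RE.search(line)
    if m is None:
        return None
    sactive = int(m.group("sactive"), 16)
    return {
        "port": m.group("port"),
        "irq_stat": int(m.group("irq_stat"), 16),
        # The kernel prints the tag as a signed int; mask it to unsigned 32-bit.
        "active_tag_u32": int(m.group("active_tag")) & 0xFFFFFFFF,
        "sactive": sactive,
        # Assumption: one bit per outstanding NCQ tag, so list the set bit positions.
        "ncq_tags": [bit for bit in range(32) if sactive & (1 << bit)],
    }


def count_spurious(lines):
    """Count matching messages per ATA port."""
    counts = {}
    for line in lines:
        fields = parse_spurious(line)
        if fields is not None:
            counts[fields["port"]] = counts.get(fields["port"], 0) + 1
    return counts


if __name__ == "__main__":
    sample = (
        "Apr 26 13:41:45 buildbot-master2 kernel: ata1: spurious interrupt "
        "(irq_stat 0x8 active_tag -84148995 sactive 0xfc)"
    )
    print(parse_spurious(sample))
```

Feeding the lines of /var/log/messages to count_spurious gives a per-port count, which could then be compared between hosts with slow IO and hosts without it.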
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations