Closed
Bug 773981
Opened 12 years ago
Closed 12 years ago
BIND crashes with assertion failure
Categories
(Infrastructure & Operations :: Infrastructure: Other, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dumitru, Assigned: dumitru)
Details
(Whiteboard: [buildduty][outage])
named on ns2.private.scl3 hit a bug last night:
14-Jul-2012 02:02:25.131 general: critical: rbtdb.c:1619: INSIST(!((void *)((node)->deadlink.prev) != (void *)(-1))) failed
14-Jul-2012 02:02:25.131 general: critical: exiting (due to assertion failure)
bind version is 9.8.2, release 0.10.rc1.el6, OS RHEL 6.3.
First reported on an ISC mailing list:
https://lists.isc.org/pipermail/bind-users/2012-February/086793.html
ISC patched it in bind 9.8.2rc2, per https://lists.isc.org/pipermail/bind-announce/2012-March/000766.html [RT #27738]
Red Hat hasn't shipped an update yet, although it's been 4 months since ISC patched this:
https://bugzilla.redhat.com/show_bug.cgi?id=837165
Comment 1•12 years ago
named crashed again and was restarted.
Just for clarification, running OS version is RHEL 6.2 (x86_64)
Comment 2•12 years ago
(In reply to Adrian Fernandez [:Aj] from comment #1)
> Just for clarification, running OS version is RHEL 6.2 (x86_64)
Yeah, the OS version doesn't matter much; all RHEL 6 flavors that use that named package are affected.
Comment 3•12 years ago
This has happened again in scl1 and has caused some build jobs to fail.
Whiteboard: [buildduty][outage]
Comment 4•12 years ago
Just happened on admin1a.infra.scl1 too:
17-Jul-2012 15:25:46.262 general: critical: rbtdb.c:1619: INSIST(!((void *)((node)->deadlink.prev) != (void *)(-1))) failed
17-Jul-2012 15:25:46.262 general: critical: exiting (due to assertion failure)
Is this worth building a patched RPM?
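Separately from the RPM question, recurrences like the one quoted above could be spotted by scanning the named log for assertion failures. The following is a minimal sketch, assuming the log keeps the "timestamp category: severity: message" layout seen in the excerpts in this bug; the function name `find_assertion_failures` and the regular expressions are illustrative, not part of BIND.

```python
import re
from datetime import datetime

# Layout assumed from the excerpts in this bug:
#   14-Jul-2012 02:02:25.131 general: critical: <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<category>[\w-]+): (?P<severity>\w+): (?P<message>.*)$"
)

# Message shape assumed from the excerpts: "<file>:<line>: MACRO(<expr>) failed"
ASSERT_RE = re.compile(
    r"^(?P<file>[\w.]+):(?P<line>\d+): (?P<macro>[A-Z]+)\((?P<expr>.*)\) failed$"
)


def find_assertion_failures(lines):
    """Return one dict per critical log line that reports a failed assertion."""
    failures = []
    for raw in lines:
        entry = LINE_RE.match(raw.strip())
        if entry is None or entry.group("severity") != "critical":
            continue
        hit = ASSERT_RE.match(entry.group("message"))
        if hit is None:
            continue  # e.g. the follow-up "exiting (due to assertion failure)" line
        failures.append({
            "when": datetime.strptime(entry.group("ts"), "%d-%b-%Y %H:%M:%S.%f"),
            "file": hit.group("file"),
            "line": int(hit.group("line")),
            "macro": hit.group("macro"),
            "expr": hit.group("expr"),
        })
    return failures
```

Feeding it the lines of the named log (wherever the local logging configuration writes them) yields one record per crash, which makes it easy to count occurrences per host.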
Comment 5•12 years ago
For buildduty's benefit:
The first Nagios alerts fired at 15:28:
[15:28] <nagios-releng-scl1> [09] buildbot-master06.build.scl1:MySQL connectivity is WARNING: Unknown MySQL server host buildbot-rw-vip.db.scl3.mozilla.com (1)
[15:30] <nagios-releng-scl1> [10] buildbot-master15.build.scl1:MySQL connectivity is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
and continued until 15:46:
[15:46] <nagios-releng-scl1> buildbot-master21.build.scl1 is UP: PING OK - Packet loss = 0%, RTA = 0.64 ms
Comment 6•12 years ago
named restarted on ns2.private.scl3 (again).
However, known bug aside, it seems odd that this is only occurring on ns2 and not on ns1 as well.
Comment 7•12 years ago
What do you think about either adding monitoring for named processes to Nagios, or (better) monitoring the named daemon in keepalived so that the VIP fails over when this occurs? Apologies if I'm not being helpful.
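As a rough illustration of the check suggested here, the sketch below sends one DNS query over UDP and reports whether anything answered. It is only an assumption of how such a probe might look: the function names, the default server 127.0.0.1, and the probe name "localhost" are hypothetical. The exit codes follow the usual Nagios plugin convention (0 for OK, 2 for CRITICAL), and a keepalived check script likewise treats a non-zero exit as failure.

```python
import socket
import struct

OK, CRITICAL = 0, 2  # Nagios plugin exit codes; non-zero also means "failed" to keepalived


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet for an A record, with the RD flag set."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = [part for part in name.split(".") if part]
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def is_valid_reply(reply, query_id=0x1234):
    """True if the packet is a DNS response (QR bit set) carrying our query ID."""
    if len(reply) < 12:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    return reply_id == query_id and bool(flags & 0x8000)


def check_named(server="127.0.0.1", name="localhost", port=53, timeout=2.0):
    """Send one UDP probe and return (exit_code, message)."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_query(name), (server, port))
            reply, _ = sock.recvfrom(512)
    except OSError as exc:  # includes timeouts
        return CRITICAL, "CRITICAL: no DNS reply from %s: %s" % (server, exc)
    if is_valid_reply(reply):
        return OK, "OK: %s answered" % server
    return CRITICAL, "CRITICAL: malformed reply from %s" % server


def main(argv):
    """Run one probe; argv is [server, name], both optional."""
    code, message = check_named(*argv[:2])
    print(message)
    return code
```

To use it as a script, end the file with `sys.exit(main(sys.argv[1:]))` after importing `sys`. Any well-formed response counts as alive, including an error response, because the goal is only to detect that the named process has exited. The same script could be run from NRPE or referenced from a keepalived `vrrp_script` block so that the VIP moves when the probe fails.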
Comment 8•12 years ago
I filed a case with Red Hat to address this.
Comment 9•12 years ago
Assignee: server-ops-infra → dgherman
Comment 10•12 years ago
It seems that Puppet upgraded the package across our infra.
I verified a couple of hosts, and they have the new package.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•11 years ago
Component: Server Operations: Infrastructure → Infrastructure: Other
Product: mozilla.org → Infrastructure & Operations