Closed
Bug 1149564
Opened 10 years ago
Closed 10 years ago
Decom the TBPL DBs (tbpl1.db.phx1.mozilla.com, tbpl2.db.phx1.mozilla.com, tbpl2-new.db.phx1.mozilla.com)
Categories
(Infrastructure & Operations :: MOC: Service Requests, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: Usul)
References
Details
(Keywords: spring-cleaning)
TBPL has just been end-of-lifed (all content is being redirected, the data import cron job has been stopped, etc.).
This bug is _just_ for decommissioning the DB nodes used by TBPL. Other bugs (which will be added to bug 1054977's dep tree soon) will handle moving the redirects to Zeus (they are currently served from the app root via generic), deleting the src/www directories from generic, removing flows, etc.
Sheeri, should this be in the DB related components or Infrastructure & Operations? Not sure what DB monitoring pieces need to be turned off by your team prior to hardware decom.
I'm presuming for this bug we'll need to:
1) Disable Nagios/...
2) Remove puppet entries
3) Power down/decom the hardware
The nodes in question are:
tbpl1.db.phx1.mozilla.com
tbpl2.db.phx1.mozilla.com
tbpl2-new.db.phx1.mozilla.com
(https://rpm.newrelic.com/accounts/263620/servers?filterBy=Project:Tbpl&filterBy=Type:Database)
No data needs to be retained.
Assignee: infra → nobody
Component: Infrastructure: Other → MOC: Service Requests
QA Contact: jdow → lypulong
Assignee
Updated•10 years ago
Assignee: nobody → ludovic
Updated•10 years ago
Keywords: spring-cleaning
Reporter
Comment 1•10 years ago
(In reply to Ed Morley (Away 23rd March -> 1st April) [:edmorley] from comment #0)
> I'm presuming for this bug we'll need to:
> 1) Disable Nagios/...
Oh and I guess we'll also need to disable the automated data expiration job added by bug 779290.
Comment 2•10 years ago
Yep! We have a protocol for all of this:
https://mana.mozilla.org/wiki/display/SYSADMIN/Server+Decommissioning+Checklist
This makes me so happy!
Comment 3•10 years ago
We can also remove the db from the shared dev instance, right?
tbpl1.db.phx1.mozilla.com aka 10.8.70.53 (warranty expired; known, on purpose)
tbpl2.db.phx1.mozilla.com aka 10.8.70.54 (warranty expired; known, on purpose)
tbpl2-new.db.phx1.mozilla.com aka 10.8.70.161 (warranty covered until 9/2015)
It is ok to power down these 3 servers!
Comment 4•10 years ago
Removed from Nagios in revision 103068.
Removed from puppet db stuff (includes newrelic, hiera, cron scripts/auto purge, configuration management and backups) in svn revision 103069.
Comment 5•10 years ago
Ran puppet on the newrelic proxy to remove tbpl from active newrelic instances.
Restarted Nagios to ensure all changes made in previous step were kosher. (they were!)
Comment 6•10 years ago
netvault isn't running (/etc/init.d/netvault isn't present)
None of the machines have NFS/external mounts
Puppet has been delayed for a year on all 3 servers.
Comment 7•10 years ago
sudo shutdown -h now
has been performed on all three machines.
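For anyone repeating this on several hosts, the step can be sketched as a loop. This is only a sketch, assuming SSH access with sudo rights on each node; the `shutdown_nodes` helper and the `DRY_RUN` variable are hypothetical and are not what was actually run here.

```shell
#!/bin/sh
# Hypothetical helper: halt each TBPL DB node over SSH.
# Hostnames are the three listed in comment 0; DRY_RUN and the use of ssh
# are assumptions made for illustration.
TBPL_DB_NODES="tbpl1.db.phx1.mozilla.com tbpl2.db.phx1.mozilla.com tbpl2-new.db.phx1.mozilla.com"

shutdown_nodes() {
    for node in $TBPL_DB_NODES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only print the command that would be executed.
            echo "ssh $node sudo shutdown -h now"
        else
            ssh "$node" sudo shutdown -h now
        fi
    done
}
```

Calling `shutdown_nodes` with `DRY_RUN` unset only prints the three commands; setting `DRY_RUN=0` first would actually run them.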
Comment 8•10 years ago
I will continue with the steps tomorrow.
Reporter
Comment 9•10 years ago
(In reply to Sheeri Cabral [:sheeri] from comment #3)
> We can also remove the db from the shared dev instance, right?
Yup - everything TBPL related can go :-)
Comment 10•10 years ago
w00t! tbpl dev db dropped, 40G of data GONE BABY GONE.
Reporter
Comment 11•10 years ago
I've removed the now inactive TBPL DB entries for the MySQL plugin:
https://rpm.newrelic.com/accounts/263620/plugins/2805?utf8=%E2%9C%93&id=2805&search[name]=tbpl
And also on the servers page:
https://rpm.newrelic.com/accounts/263620/servers?filterBy=Project:Tbpl
(In reply to Sheeri Cabral [:sheeri] from comment #2)
> https://mana.mozilla.org/wiki/display/SYSADMIN/Server+Decommissioning+Checklist
Is there any way I can be given access to view that page? It would have saved a bunch of head-scratching on my part trying to figure out which bugs I had to file, and I get a "page not found" for it at the moment. I totally get that some pages need to be private, but IMO the permissions are too strict for many pages under SYSADMIN at the moment :-(
Comment 12•10 years ago
I'm not sure about permissions on that page; it's not really "mine" to change permissions on. I could export it and mail you a PDF of the page, but that doesn't really solve the problem that someone *like* you doesn't know what to do.
In general, if you have questions about IT stuff, #it is a great place to ask, too. I wish I had a better answer :(
Usually with decoms, everyone is in the loop and you could just say "OK now that we no longer need it, how do we decom this service?" Kind of like how you say "we have this new service, how do we get it into production?"
Comment 13•10 years ago
DNS records for tbpl dbs and the VIPs for tbpl dbs are deleted.
Deleted the Zeus LB pool and virtual server entries for tbpl-ro and tbpl-rw virtual IPs.
cc Shyam: I noticed that dm-tbpl01.mozilla.org is still in DNS and resolves to 10.2.74.89 - is that even still valid?
Flags: needinfo?(smani)
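A quick way to confirm which names are really gone from DNS is to try resolving each one. A minimal sketch, assuming the `host` utility is installed; `still_in_dns` is a hypothetical helper, and it only looks for IPv4 answers ("has address" lines).

```shell
#!/bin/sh
# Sketch: print every name passed as an argument that still resolves.
# still_in_dns is a hypothetical helper; it relies on `host` printing
# "<name> has address <ip>" for names with an A record.
still_in_dns() {
    for name in "$@"; do
        if host "$name" 2>/dev/null | grep -q 'has address'; then
            echo "$name"
        fi
    done
}
```

For example, `still_in_dns tbpl1.db.phx1.mozilla.com dm-tbpl01.mozilla.org` would print only the names that still have an A record, and nothing once all of them are deleted.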
Comment 14•10 years ago
Changed status in inventory (2 machines are decom'd, 1 is a spare)
Removed networking from key/value pairs in inventory so DHCP entries are removed.
Deleted RHN profiles for these db machines.
Comment 15•10 years ago
Removed from puppet dashboard.
Removed dbs from puppet.
Comment 16•10 years ago
:fox2mike I can't needinfo you twice, but I'm noticing a lot of tbpl entries still in puppet manifests, like puppet/trunk/manifests/nodes/netops.pp and puppet/trunk/manifests/nodes/elasticsearch.pp
Can your team (webops) double-check and make sure only the puppet entries we *need* are still in there?
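One way to audit for leftover references is a case-insensitive search across the manifests. A minimal sketch, assuming a local checkout laid out like the paths quoted above; `find_tbpl_refs` is a hypothetical helper name, and the default directory is an assumption.

```shell
#!/bin/sh
# Sketch: list Puppet manifest files (*.pp) that still mention "tbpl".
# find_tbpl_refs is a hypothetical helper; the default directory mirrors
# the paths quoted in this comment and should be adjusted to the checkout.
find_tbpl_refs() {
    manifest_dir="${1:-puppet/trunk/manifests}"
    # grep -i: ignore case, -l: print only the names of matching files.
    find "$manifest_dir" -type f -name '*.pp' -exec grep -il 'tbpl' {} +
}
```

Each file it prints still needs a human decision: some entries may be needed (for example, redirects), so the output is a review list rather than a delete list.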
Comment 17•10 years ago
Created bug 1152345 for de-racking.
All work except the 2 items in comment 13 and comment 16 is complete. Once those 2 questions are addressed, this bug can be resolved.
Assignee
Comment 18•10 years ago
I assigned the bug to myself to do it. Though I got sick, so you guys beat me to it.
Comment 19•10 years ago
Ludo - no worries! I jumped on this because tbpl db's have always been a problem, so it gave me SUCH JOY to kill 'em. :D
Glad you're feeling better!
Reporter
Comment 20•10 years ago
(In reply to Sheeri Cabral [:sheeri] from comment #12)
> Usually with decoms, everyone is in the loop and you could just say "OK now
> that we no longer need it, how do we decom this service?" Kind of like how
> you say "we have this new service, how do we get it into production?"
Makes sense, thank you. Just wanted to avoid being lazy :-)
Comment 21•10 years ago
(In reply to Sheeri Cabral [:sheeri] from comment #13)
> cc Shyam: I noticed that dm-tbpl01.mozilla.org is still in DNS and resolves
> to 10.2.74.89 - is that even still valid?
Seems like someone has already deleted this. I've removed the PTR record as well and no, sjc1 is gone and 10.2 isn't valid.
(In reply to Sheeri Cabral [:sheeri] from comment #16)
> :fox2mike I can't needinfo you twice, but I'm noticing a lot of tbpl entries
> still in puppet manifests, like puppet/trunk/manifests/nodes/netops.pp and
> puppet/trunk/manifests/nodes/elasticsearch.pp
A needinfo set just a day ago and your statement above is why there's a disable needinfo option in Bugzilla now :) You need to give people some time to actually read and respond to bugs.
> Can your team (webops) double-check and make sure only the puppet entries we
> *need* are still in there?
Filed Bug 1153091 for netops to remove tbpl from smokeping configs
Filed Bug 1153092 for webops to figure out ES changes
Everything else is fine, I cleaned out the following files (they had the Zeus VIP in there) :
modules/webapp/manifests/admin/genericrhel6.pp
modules/webapp/manifests/genericrhel6/prod.pp
And deleted the following file :
modules/webapp/files/genericrhel6-dev/etc-httpd/domains/tbpl-passwd
in sysadmins r103364.
Reporter
Comment 22•10 years ago
Visiting https://tbpl-dev.allizom.org gives a 403 rather than a DNS resolution error (or equivalent) - expected?
Reporter
Comment 23•10 years ago
(In reply to Ed Morley [:emorley] (formerly :edmorley) from comment #22)
> Visiting https://tbpl-dev.allizom.org gives a 403 rather than a DNS
> resolution error (or equivalent) - expected?
Sorry meant to comment on bug 1152225.
Comment 24•10 years ago
(In reply to Ed Morley [:emorley] (formerly :edmorley) from comment #23)
> (In reply to Ed Morley [:emorley] (formerly :edmorley) from comment #22)
> > Visiting https://tbpl-dev.allizom.org gives a 403 rather than a DNS
> > resolution error (or equivalent) - expected?
>
> Sorry meant to comment on bug 1152225.
Replied there, but for history's sake on this bug: yes, *.allizom.org will resolve to an IP in DNS. For example:
shyam@katniss ~ $ host foobar.allizom.org
foobar.allizom.org has address 63.245.217.83
shyam@katniss ~ $ host testing.allizom.org
testing.allizom.org has address 63.245.217.83
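The same check can be scripted: if a label that almost certainly does not exist still resolves, the zone has a wildcard record, which is why a removed site answers with an HTTP error instead of failing DNS resolution. A minimal sketch, assuming the `host` utility is available; `has_wildcard_dns` and the probe label are hypothetical, and only IPv4 answers are considered.

```shell
#!/bin/sh
# Sketch: detect a wildcard DNS record by resolving a made-up label.
# has_wildcard_dns is a hypothetical helper; it relies on `host` printing
# "<name> has address <ip>" when an A record is returned.
has_wildcard_dns() {
    domain="$1"
    # $$ (the shell's PID) makes the probe label unlikely to be a real host.
    probe="wildcard-probe-$$.${domain}"
    host "$probe" 2>/dev/null | grep -q 'has address'
}
```

Used as `has_wildcard_dns allizom.org && echo "wildcard present"`, the function returns success only when the made-up name resolves.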
Comment 25•10 years ago
Resolving, as there are bugs for the open issues, but the dbs themselves are decom'd.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED