Closed
Bug 675712
Opened 13 years ago
Closed 13 years ago
Set up Elastic Search indexing on two production processors
Categories
(mozilla.org Graveyard :: Server Operations, task)
mozilla.org Graveyard
Server Operations
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: laura, Assigned: jason)
References
Details
adrian/rhelmer can give config instructions, but let's get this underway.
We're going to set two processors to index reports into ES, so we can watch performance of those nodes compared to the others.
We also need the ES VIP set up (bug 673507).
Comment 1•13 years ago
Until the VIP is set up, feel free to point at all 4 nodes in round-robin fashion:
hp-node6[1-4].phx1.mozilla.com:9200
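As a rough illustration of the round-robin scheme suggested above, a processor could rotate through the four nodes like this (a minimal sketch; the helper name is hypothetical, only the hostnames and the default ES HTTP port 9200 come from the comment):

```python
from itertools import cycle

# The four Elastic Search nodes from comment 1, on the default HTTP port.
ES_NODES = cycle("hp-node6%d.phx1.mozilla.com:9200" % n for n in range(1, 5))

def next_es_node():
    """Return the next node to send an indexing request to (round robin)."""
    return next(ES_NODES)
```

Once the VIP from bug 673507 is in place, the rotation can be replaced with the single VIP address.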
Comment 2•13 years ago
The above URLs are all part of one cluster; there shouldn't be any delay in indexing if you hit just one node.
Reporter
Comment 3•13 years ago
Ping?
Comment 4•13 years ago
Jason should have the VIP up soon. I'd also strongly recommend not doing this until we have the new dedicated ElasticSearch hardware. We already have enough problems with Hadoop on the existing hp-nodes, and introducing even more production services on hardware that is not redundant by design feels like a bad idea. Not to mention we are waiting on the other hardware so that we can claim hp-nodes61-70 as stage Hadoop nodes; otherwise this will block the staging environment too.
This is just my opinion.
Assignee: jdow → jthomas
Component: Socorro → Server Operations
Product: Webtools → mozilla.org
QA Contact: socorro → mrz
Version: 1.7 → other
Comment 5•13 years ago
I understand that the hardware is not redundant by design, but ES is a clustered application with redundancy built in, so the loss of a node won't cause any noticeable impact. We also have a few extra nodes available, so we can handle the loss of more than one node if it comes to that. Even if we have some sort of catastrophic failure, it won't have a major impact on Socorro production, since we are not looking to cut over to using ES exclusively right now.
We need some high-volume, real-world use of the cluster to validate the performance testing we have done so far, especially since we want specific criteria for ordering the new hardware rather than just throwing "more than enough" at it. We can't get that detailed performance data without something that looks very much like real-world load to base it on.
Comment 6•13 years ago
So, just 2 hours after I posted comment 4, hp-node64 dropped offline due to a failed /dev/sda disk. Given our SLA on those servers, I expect 2-3 weeks before it is back online, maybe longer since one of our ops guys will be on vacation.
If these SLAs and failure scenarios are ok with Laura and the socorro team, then we can proceed with setting up the VIP and the service. Jason is point on this, and he's also working on the new stage build-out, so we'll have to prioritize accordingly.
Comment 7•13 years ago
They are fine for Metrics. We still have 5 nodes in the cluster, and we could likely bring three more (67, 68, 69) into it, depending on their health.
Immediately upon the loss of a node, the cluster will begin rebalancing any under-replicated shards to the remaining nodes until the health of the cluster is green again.
If we lose so many nodes that we run out of disk space on the remaining ones, we can reconfigure the processors to stop sending data to the ES cluster until the new hardware is ordered and in place.
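The fallback described here can be sketched as a simple capacity check the processors could consult before indexing (an illustrative sketch only; the function name and thresholds are hypothetical, not from this bug):

```python
def should_index_to_es(live_nodes, used_disk_fraction):
    """Keep sending crash reports to ES only while enough nodes and
    disk headroom remain; otherwise fall back to skipping ES until
    the new hardware arrives. Thresholds are illustrative."""
    if live_nodes < 2:
        # Need at least two nodes so shards keep a replica target.
        return False
    if used_disk_fraction > 0.85:
        # Rebalancing under-replicated shards needs free disk space.
        return False
    return True
```

With 5 of the original nodes alive and moderate disk usage, indexing would continue; it would stop once the cluster shrinks or fills up past the thresholds.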
Assignee
Comment 8•13 years ago
@adrian/rhelmer could you please provide configuration instructions?
Whiteboard: allhands
Assignee
Updated•13 years ago
Whiteboard: allhands
Assignee
Comment 10•13 years ago
Please reopen when we have more information.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → INCOMPLETE
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard