Closed Bug 675712 Opened 13 years ago Closed 13 years ago

Set up Elastic Search indexing on two production processors

Categories

(mozilla.org Graveyard :: Server Operations, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: laura, Assigned: jason)

References

Details

adrian/rhelmer can give config instructions, but let's get this underway. We're going to set two processors to index reports into ES, so we can watch performance of those nodes compared to the others. We also need the ES VIP set up (bug 673507).
Blocks: 651279
Until the VIP is set, feel free to point at all four nodes in round-robin fashion: hp-node6[1-4].phx1.mozilla.com:9200
The above URLs are part of a cluster; there shouldn't be any delay in indexing if you hit just one node.
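For illustration, a minimal round-robin sketch in Python (the ES_NODES list and the next_es_url helper are hypothetical names for this example, not part of Socorro's actual processor config):

import itertools

# The four hosts named above; 9200 is the standard Elasticsearch HTTP port.
ES_NODES = ["http://hp-node6%d.phx1.mozilla.com:9200" % n for n in range(1, 5)]

# Cycle through the nodes so index requests are spread evenly
# until the VIP exists.
_node_cycle = itertools.cycle(ES_NODES)

def next_es_url():
    # Each call returns the next node URL, wrapping around forever.
    return next(_node_cycle)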
Ping?
Jason should have the VIP up soon. I'd also really strongly recommend not doing this until we have the new dedicated Elasticsearch hardware. It feels like we have enough problems with Hadoop on the existing hp-nodes, and introducing even more production services on non-redundant-by-design hardware feels like a bad idea. Not to mention we are waiting for the other hardware so that we can claim hp-nodes 61-70 as stage Hadoop nodes; otherwise this will block the staging environment too. This is just my opinion.
Assignee: jdow → jthomas
Component: Socorro → Server Operations
Product: Webtools → mozilla.org
QA Contact: socorro → mrz
Version: 1.7 → other
I understand that the hardware is non-redundant-by-design, but ES is a clustered application with redundancy built into it, so suffering the loss of a node won't cause any noticeable impact. We also have a few extra nodes available, so we can handle the loss of more than one node if it comes to that. If we have some sort of catastrophic failure, it won't have a major impact on Socorro production, as we are not looking to cut over to using ES exclusively right now. We need to get some high-volume, real-world use of the cluster to validate the performance testing we have done so far, especially since we want extremely specific criteria for ordering the new hardware rather than just throwing "more than enough" at it. We can't get that detailed performance data without having something that looks very much like the real world to base it on.
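As a quick sanity check on that redundancy claim, the standard Elasticsearch cluster-health endpoint reports whether every shard still has its replicas allocated. A minimal sketch, querying one node picked arbitrarily (any cluster member can answer):

import json
import urllib.request

# hp-node61 is an arbitrary choice; this is not a special/master node.
HEALTH_URL = "http://hp-node61.phx1.mozilla.com:9200/_cluster/health"

with urllib.request.urlopen(HEALTH_URL) as resp:
    health = json.load(resp)

# "green": every primary and replica allocated; "yellow": all primaries
# up but some replicas unassigned; "red": some data unavailable.
print(health["status"], health["number_of_nodes"], health["unassigned_shards"])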
So, just two hours after I posted comment 4, hp-node64 dropped offline due to a failed /dev/sda disk. Given our SLA on those servers, etc., I expect 2-3 weeks before it is back online, maybe longer since one of our ops guys will be on vacation. If these SLAs and failure scenarios are OK with Laura and the Socorro team, then we can proceed with setting up the VIP and the service. Jason is point on this, and he's also working on the new stage build-out, so we'll have to prioritize accordingly.
They are fine for Metrics. We still have 5 nodes in the cluster, and we could likely bring three more (67, 68, 69) into it, depending on their health. Immediately upon the loss of a node, the cluster begins rebalancing any under-replicated shards to the remaining nodes until the health of the cluster is green again. If we lose so many nodes that we run out of disk space on the remaining ones, then we can reconfigure the processors to stop sending data to the ES cluster until we have the new hardware ordered and in place.
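To watch that rebalancing complete, the same Elasticsearch health endpoint accepts a wait_for_status parameter, which blocks server-side until the cluster reaches the requested status or a timeout expires. A minimal sketch, again against an arbitrary node and with a 60-second timeout chosen only for the example:

import json
import urllib.request

URL = ("http://hp-node61.phx1.mozilla.com:9200/_cluster/health"
       "?wait_for_status=green&timeout=60s")

with urllib.request.urlopen(URL) as resp:
    health = json.load(resp)

# timed_out is set when the requested status wasn't reached in time.
if health["timed_out"]:
    print("still rebalancing; current status:", health["status"])
else:
    print("cluster is back to green")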
@adrian/rhelmer could you please provide configuration instructions?
Whiteboard: allhands
Please reopen when we have more information.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → INCOMPLETE
Product: mozilla.org → mozilla.org Graveyard