Bug 620596 (Open): opened 14 years ago, updated 8 years ago

Need some form of queue for posting of results to graphs server

Categories

(Webtools Graveyard :: Graph Server, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

People

(Reporter: fox2mike, Unassigned)

Details

If talos machines are unable to reach the graphs DB, tests fail (as of now) and the tree starts showing up red. This should be modified to use a queue system that changes the tree colour to flag possible issues with uploading to graphs, and retries every x minutes over y hours before failing and going red. Opinions? Thoughts? I'm CC'ing zandr since he's going to have a talk with joduinn about this in person.
Also, I understand that this might not be the desired behaviour, but I would like some discussion before we decide one way or another.
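A minimal sketch of the retry behaviour being floated above, assuming a hypothetical endpoint URL and made-up interval/deadline values ("every x minutes over y hours" is left unspecified in this bug):

    import time
    import urllib.request

    # Hypothetical endpoint; the real graphs-server URL is not given in this bug.
    GRAPH_SERVER_URL = "http://graphs.example.org/collect"
    RETRY_INTERVAL = 5 * 60        # "every x minutes": retry every 5 minutes
    RETRY_DEADLINE = 2 * 60 * 60   # "over y hours": give up after 2 hours

    def post_results_with_retry(payload: bytes) -> bool:
        """Try to post results, retrying until the deadline expires.

        Returns True on success; False means the caller should turn the
        tree red (the upload genuinely failed, not just a transient blip).
        """
        deadline = time.monotonic() + RETRY_DEADLINE
        while True:
            try:
                req = urllib.request.Request(GRAPH_SERVER_URL, data=payload)
                with urllib.request.urlopen(req, timeout=30) as resp:
                    if resp.status == 200:
                        return True
            except OSError:
                pass  # graphs DB/network unreachable; fall through to retry
            if time.monotonic() >= deadline:
                return False
            time.sleep(RETRY_INTERVAL)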
I think this is a Graph Server bug, not necessarily a RelEng one.
Component: Release Engineering → Graph Server
Product: mozilla.org → Webtools
QA Contact: release → graph.server
Not really. Who handles the part where the talos machines write to the graph server? RelEng I bet :)

19:24:14 < fox2mike> bhearsum: who handles the code that makes the talos machines contact the graphs server?
19:24:23 < bhearsum> releng
19:24:27 < bhearsum> it should be a server side queue, though
Component: Graph Server → Release Engineering
Product: Webtools → mozilla.org
QA Contact: graph.server → release
Fine, this can stay here. I still don't believe that such a queue should be disassociated from the server, though.
If the graphs server team handles the talos code, I'd be happy to pass this to them :) I'm not the one who decides where release engineering related bugs go, so I'll defer to you guys on that :D
We're bikeshedding about the wrong thing here.

1) graphserver as a SPOF is not new. I agree that some form of redundant collector would be good, but only if we can do it without turning graphserver into Son of Socorro.

2) Talos machines don't have any long-term persistence. They reboot and clobber with great frequency. So if the results aren't posted, they're gone. As such, this is desired behavior in the current world. There is a whole different discussion about distinguishing between failed tests and failed testers, but that's Not Trivial.

3) The recent breakage was caused by graphserver posting to the AMO db, and apparently doing that synchronously with the post from the slave. That is insane, unacceptable, and the response to https://bugzilla.mozilla.org/show_bug.cgi?id=620570#c10 is where that conversation will take place.
I think the most basic solution here is a message queue, with the Talos results as producers and the graph server as a consumer. Talos/unittests might need some adjustment if the graph server sends back any data.
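A rough sketch of that producer/consumer shape. Nothing in this bug picks a broker; RabbitMQ via pika is used here purely as an illustration, and the queue name is invented:

    import pika  # illustrative broker client; the bug doesn't mandate a technology

    QUEUE = "talos_results"  # hypothetical queue name

    # Producer side (Talos slave): hand results to the broker and move on,
    # instead of blocking on the graphs DB being reachable.
    def publish_result(payload: bytes) -> None:
        conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
        channel = conn.channel()
        channel.queue_declare(queue=QUEUE, durable=True)  # survive broker restarts
        channel.basic_publish(exchange="", routing_key=QUEUE, body=payload)
        conn.close()

    # Consumer side (graph server): drain the queue into the graphs DB.
    def consume_results(write_to_db) -> None:
        conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
        channel = conn.channel()
        channel.queue_declare(queue=QUEUE, durable=True)

        def on_message(ch, method, properties, body):
            write_to_db(body)  # the slow DB write happens here, not on the slave
            ch.basic_ack(delivery_tag=method.delivery_tag)

        channel.basic_consume(queue=QUEUE, on_message_callback=on_message)
        channel.start_consuming()

Note that a plain queue is one-way, which is why the comment above flags the case where the graph server needs to send data back to Talos.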
Component: Release Engineering → Talos
Product: mozilla.org → Testing
QA Contact: release → talos
Version: other → Trunk
Component: Talos → Webdev
Product: Testing → mozilla.org
Version: Trunk → other
So this is a graphserver issue. The queue needs to be on the graphserver side, not the Talos side (potentially it could live elsewhere in the infrastructure as well, but I'm guessing graphserver makes the most sense). That said, graphserver is going to be replaced with datazilla, which already has such a queuing system in place.
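For comparison, a stdlib-only sketch of the graphserver-side variant: the HTTP handler only enqueues and acknowledges, and a worker thread does the slow DB write off the request path. The port, response code, and the write_to_graphs_db helper are all illustrative assumptions, not anything specified in this bug:

    import queue
    import threading
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # In-process buffer between the HTTP front end and the (slow) DB writer.
    pending = queue.Queue()

    class CollectHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            pending.put(self.rfile.read(length))  # enqueue; don't touch the DB
            self.send_response(202)               # accepted, processed later
            self.end_headers()

    def write_to_graphs_db(payload: bytes) -> None:
        pass  # placeholder for the real graphs DB insert

    def db_writer():
        while True:
            write_to_graphs_db(pending.get())

    if __name__ == "__main__":
        threading.Thread(target=db_writer, daemon=True).start()
        HTTPServer(("", 9000), CollectHandler).serve_forever()

This keeps the slave's POST fast and decouples it from DB outages, which is the failure mode described in comment 0.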
Component: Webdev → Graph Server
Product: mozilla.org → Webtools
Product: Webtools → Webtools Graveyard