Pernosco doesn't notice manually triggered tests that fail
Categories
(Developer Infrastructure :: Try, defect)
Tracking
(Not tracked)
People
(Reporter: whimboo, Unassigned)
Details
As Julien and I have recently seen, the mach command mach try fuzzy --pernosco
doesn't seem to work anymore. A try build that I pushed is:
As Kyle replied on Matrix:
I also don't see any evidence in our logs that you triggered a reproduction the other day
Comment 1•4 years ago
PERNOSCO=1 is set in the try, so as far as mach try fuzzy --pernosco
is concerned, everything that is supposed to happen has happened.
So the question is, what is pernosco expecting from what subsystem?
Comment 2•4 years ago
I've turned on additional logging. Can somebody push another --pernosco push for testing?
Comment 3•4 years ago
I just pushed one: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fc35bf97a856074bc1abd5f1374e4ba87ff6c6b4
Comment 4•4 years ago
julienw verified that this works. He pushed with --pernosco and was able to get an email.
What I think actually happened to whimboo is that we only look at the jobs specified in the task_graph.json for failures, and his failure appears to have come after manually retriggering a job many times. Is that accurate?
Reporter
Comment 5•4 years ago
Oh yes, that's indeed true. So how do I push with a huge number of retries? Might have to find this out on Friday.
Comment 6•4 years ago
The thing is, once Pernosco notices a failure automatically, it's only going to retry it once (up to three times if you manually trigger it through self-service), so if you have a failure that occurs once in twenty tries, fixing this won't really help you. (In other words, with this fixed, you would retrigger a bunch, eventually hit a failure, Pernosco would see it, retry it once under rr, probably not hit the failure, and give up.) I think you're better off manually triggering a bunch of runs through self-service from the beginning.
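To put rough numbers on the retry argument above, here is a minimal sketch. It assumes each run fails independently with the same probability and that running under rr does not change the failure rate; neither assumption is confirmed in this bug, and the function names are made up for illustration.

```python
import math


def chance_of_reproducing(failure_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent runs fails."""
    return 1.0 - (1.0 - failure_rate) ** attempts


def runs_needed(failure_rate: float, target: float) -> int:
    """Smallest number of independent runs that hits the failure with
    probability of at least `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - failure_rate))


# Hypothetical intermittent from the comment: fails once in twenty tries.
rate = 1 / 20

one_retry = chance_of_reproducing(rate, 1)      # automatic retry: one attempt
three_retries = chance_of_reproducing(rate, 3)  # self-service: up to three

print(f"1 retry:   {one_retry:.1%}")
print(f"3 retries: {three_retries:.1%}")
print(f"runs for a 95% chance: {runs_needed(rate, 0.95)}")
```

Under these assumptions a single retry reproduces the failure about 5% of the time and three retries about 14% of the time, which is why a large batch of runs from the start is the more promising route.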
Comment 7•4 years ago
glandium points out on chat that currently self-service requires a failure before you have a chance to reproduce something, so what I described above actually doesn't work.
Reporter
Comment 8•4 years ago
Kyle, so is there any way right now that would allow me to somehow trigger bug 1663533 for a Pernosco session? It's not happening that often and as such will be harder to reproduce.
Comment 9•4 years ago
Do you know a revision it can be reproduced on (even if infrequently)? From the discussion in bug 1663533 it sounds like it's been papered over on trunk, though I can't see bug 1660307 to be sure.
Reporter
Comment 10•4 years ago
I still see a high failure rate for mozilla-central / autoland:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2021-02-09&endday=2021-03-11&tree=trunk&bug=1663533
As such, I don't think anything has landed via bug 1660307 yet.
It's interesting that for the following build we saw the crash twice:
https://treeherder.mozilla.org/jobs?repo=mozilla-central&revision=b332567cbbcaa6e2b70bfe5449410e9cfb8b838f&searchStr=remote%2Cmochi
Maybe that would be a good candidate to try reproducing it.
Note that I'm going to work on bug 1609162 soon, which might also make this crash disappear (without fixing the core issue).
Comment 11•4 years ago
I ran that test by itself 250 times on that build and didn't come up with any failures in our infrastructure. :(
Reporter
Comment 12•4 years ago
Thanks a lot, but that's also sad to hear. So I assume it's most likely related to our Linux docker image for tests (Ubuntu 18.04). And in that case we should just drop attempts to reproduce it. If I can find something while working on bug 1609162 I will let you know.
Comment 13•4 years ago
We use the same docker image so that shouldn't be it.
Reporter
Comment 14•4 years ago
Not sure what else can be done here. Shall we close this bug?
Reporter
Comment 15•3 years ago
Closing as WFM given that re-triggers via the Treeherder UI aren't taken into account.