Closed Bug 1694586 Opened 4 years ago Closed 3 years ago

Pernosco doesn't notice manually triggered tests that fail

Categories

(Developer Infrastructure :: Try, defect)

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: whimboo, Unassigned)

Details

As Julien and I have recently seen, the mach command mach try fuzzy --pernosco doesn't seem to work anymore. A try build that I pushed is:

https://treeherder.mozilla.org/jobs?repo=try&revision=87e87ab9a5012cd96ac4f55a8727ccdf40abb9c7&selectedTaskRun=bRHDZTayTGCT58YXe2kC8g.0

As Kyle replied on Matrix:

I also don't see any evidence in our logs that you triggered a reproduction the other day

PERNOSCO=1 is set in the try, so as far as mach try fuzzy --pernosco is concerned, everything that is supposed to happen has happened.

So the question is: what is Pernosco expecting, and from which subsystem?

Flags: needinfo?(khuey)

I've turned on additional logging. Can somebody do another --pernosco push for testing?

Flags: needinfo?(khuey) → needinfo?(hskupin)

julienw verified that this works. He pushed with --pernosco and was able to get an email.

What I think actually happened to whimboo is that we only look at the jobs specified in the task_graph.json for failures, and his failure appears to have come after manually retriggering a job many times. Is that accurate?

Flags: needinfo?(hskupin)

Oh yes, that's indeed true. So how do I push with a large number of retries? I might have to find this out on Friday.

Flags: needinfo?(hskupin)
Summary: "mach try fuzzy --pernosco" seems to be broken → Pernosco doesn't notice manually triggered tests that fail

The thing is, once Pernosco notices a failure automatically, it's only going to retry it once (up to three times if you manually trigger it through self-service). So if you have a failure that occurs once in twenty tries, fixing this won't really help you. (In other words, with this fixed, you would retrigger a bunch, eventually hit a failure, Pernosco would see it, retry it once under rr, probably not hit the failure, and give up.) I think you're better off manually triggering a bunch of runs through self-service from the beginning.
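To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes every run fails independently with the same probability, and that recording under rr does not change the failure rate (it may well do so in practice). The function name is made up for illustration and is not part of any Pernosco or mach tooling.

```python
def chance_of_reproducing(failure_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent runs fails.

    Assumption: each run fails independently with probability `failure_rate`.
    """
    return 1.0 - (1.0 - failure_rate) ** attempts


# The "once in twenty tries" example from the comment above.
rate = 1 / 20

# 1 = automatic retry, 3 = self-service maximum, larger values = many manual runs.
for attempts in (1, 3, 20, 60):
    print(f"{attempts:>3} run(s): {chance_of_reproducing(rate, attempts):.1%}")
```

Under these assumptions a single retry reproduces the failure about 5% of the time, and three retries about 14% of the time, which is why a larger batch of runs from the start looks like the better bet.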

glandium points out on chat that currently self-service requires a failure before you have a chance to reproduce something, so what I described above actually doesn't work.

Kyle, is there any way right now for me to trigger bug 1663533 for a Pernosco session? It doesn't happen that often, so it will be harder to reproduce.

Flags: needinfo?(khuey)

Do you know a revision it can be reproduced on (even if infrequently)? From the discussion in bug 1663533 it sounds like it's been papered over on trunk, though I can't see bug 1660307 to be sure.

Flags: needinfo?(khuey) → needinfo?(hskupin)

I still see a high failure rate for mozilla-central / autoland:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2021-02-09&endday=2021-03-11&tree=trunk&bug=1663533

As such, I don't think that anything has landed via bug 1660307 yet.

It's interesting that for the following build we saw the crash twice:
https://treeherder.mozilla.org/jobs?repo=mozilla-central&revision=b332567cbbcaa6e2b70bfe5449410e9cfb8b838f&searchStr=remote%2Cmochi

Maybe that would be a good candidate to try reproducing it.

Note that I'm going to work on bug 1609162 soon, which might also make this crash disappear (without fixing the core issue).

Flags: needinfo?(hskupin)

I ran that test by itself 250 times on that build and didn't come up with any failures in our infrastructure. :(
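As a rough guide to what 250 clean runs say about the failure rate in this environment, here is a minimal sketch. It assumes the runs are independent and share one fixed failure rate; the function name and the 95% confidence level are choices made for illustration only.

```python
def max_failure_rate(clean_runs: int, confidence: float = 0.95) -> float:
    """Largest per-run failure rate still consistent with `clean_runs`
    consecutive passes at the given confidence level.

    Solves (1 - p) ** clean_runs == 1 - confidence for p.
    Assumption: runs are independent with a constant failure rate.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / clean_runs)


print(f"{max_failure_rate(250):.2%}")
```

With 250 passes and no failures, that upper bound is a little under 1.2% per run, so a failure that shows up more often than that in CI would suggest something differs between the two environments.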

Thanks a lot, but that's also sad to hear. So I assume it's most likely related to our Linux Docker image for tests (Ubuntu 18.04), and in that case we should just drop attempts to reproduce it. If I find something while working on bug 1609162, I will let you know.

We use the same Docker image, so that shouldn't be it.

Not sure what else can be done here. Shall we close this bug?

Closing as WFM given that re-triggers via the Treeherder UI aren't taken into account.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
Product: Firefox Build System → Developer Infrastructure