tasks on pre-migration pushes should allow retriggers, backfills etc.
Categories
(Tree Management :: Treeherder, defect, P1)
Tracking
(Not tracked)
People
(Reporter: aryx, Unassigned)
References
Details
(Keywords: perf-alert)
Attachments
(1 file)
(deleted),
text/x-github-pull-request
|
Details |
Retriggering a task which ran before the taskcluster migration fails:
Error message: "Wrong version of actions.json, unable to continue"
404 for https://treeherder.mozilla.org/#/jobs?repo=mozilla-esr68&selectedJob=275535527&revision=a43e93dedfe9c6a6f11fb339ac400f23fd606250
Performance and code sheriffs retrigger or backfill on older pushes to identify what caused a performance regression or triggered an intermittent failure to become frequent.
The failed Android nightlies are also because of that (decision task not found): https://treeherder.mozilla.org/#/jobs?repo=mozilla-esr68&selectedJob=275535527&revision=a43e93dedfe9c6a6f11fb339ac400f23fd606250
Comment 1•5 years ago
|
||
A quick, short-term fix to this in TH would be to make a get_repo_root_url utility function which takes a push date and repo and returns repo.tc_root_url if the push date is after november 9, otherwise a literal https://taskcluster.net. That "hack" could be backed out some time later (a month or two).
Comment 2•5 years ago
|
||
Grabbing this, in an attempt to prepare a hotfix patch in advance.
Updated•5 years ago
|
Updated•5 years ago
|
Comment 3•5 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #1)
A quick, short-term fix to this in TH would be to make a get_repo_root_url utility function which takes a push date and repo and returns repo.tc_root_url if the push date is after november 9, otherwise a literal https://taskcluster.net. That "hack" could be backed out some time later (a month or two).
Is this a Javascript-only fix? As I've spotted this line of code from treeherder/ui/models/runnableJob.js.
Reporter | ||
Comment 4•5 years ago
|
||
runnablejob is for adding a job to a push from the push menu ("Add new jobs").
The actions.json for retriggers gets called at https://github.com/mozilla/treeherder/blob/fd19ee836d6c4cfc5b49517341905f0b76e692a6/ui/models/taskcluster.js#L91
The queue
before that changed: https://github.com/mozilla/treeherder/commit/fd19ee836d6c4cfc5b49517341905f0b76e692a6#diff-fe5fd8a5bb1d673a87d37e9cb40f606cL90
Which changed at https://github.com/mozilla/treeherder/commit/fd19ee836d6c4cfc5b49517341905f0b76e692a6#diff-e107b709163f8d9808b192da65a01059L17
Comment 5•5 years ago
|
||
I'm able to reproduce locally, when using yarn start
.
Comment 6•5 years ago
|
||
Yes, the rough idea is to replace
- const queue = taskcluster.getQueue(currentRepo.tc_root_url, testMode);
+ const tcwDate = new Date('2019-11-09');
+ const rootUrl = pushDate < tcwDate ? 'https://taskcluster.net' : currentRepo.tc_root_url;
+ const queue = taskcluster.getQueue(currentRepo.tc_root_url, testMode);
(and remove the redefinition of queue
on line 102).
From my look at the code, the hard part would be finding pushDate, particularly since it seems that TaskclusterModel.load is sometimes called with job=null.
Comment 7•5 years ago
|
||
Couldn't figure out how to prepare the hotfix for this. Cameron, could you take it from here?
Updated•5 years ago
|
Comment 8•5 years ago
|
||
Yep, I'm working on this now. Unfortunately, as Dustin stated, it is a little complicated to get that information in some places. Not terrible, I think. Just takes some needle-threading... It may take me a couple hours to get this patch going.
Updated•5 years ago
|
Comment 9•5 years ago
|
||
Cam, I'd suggest Dustin as a reviewer once you have a patch ready. I can be a "reviewer of last resort," but I trust Dustin's knowledge of this code much more than I trust mine.
Comment 11•5 years ago
|
||
Ping me when you've got something rough -- tom pointed out to me today that most old tasks won't run on the new deployment without some changes landed, so it's not entirely clear that this functionality will even be useful. If the patch is easy, we can land it and see -- but if it's hard, let's reconsider.
Comment 12•5 years ago
|
||
Comment 13•5 years ago
|
||
As far as credentials go, if you pass in 'taskcluster.net' to taskcluster.getCredentials
, its going to look for credentials indexed by that root url so if you want to use firefox-ci credentials instead (if I'm understanding what dustin is saying about running old tasks on the new deployment, not the legacy cluster) you'll need to add some logic there.
Comment 14•5 years ago
|
||
As I mentioned in comment 11, I'm not sure this is ever going to work -- the pushes from before the 9th don't have the necessary patches to work on the new deployment. So no matter what Treeherder does, the resulting retriggered jobs would fail.
I think the fix here may be to re-push those old revisions to try and get the perf data there. It would probably be useful to come up with a list of revs to cherry-pick onto those try pushes.
Let's talk about this tomorrow.
Comment 15•5 years ago
|
||
Hah, it looks like we don't have a good time to talk synchronously. So, bug comments will have to do.
Tom, is the assessment in the previous comment correct? Do you have a list of revs for cherry-picking?
igoldan, dave, aside from waving a magic wand and making this "just work", could we do anything else to make this easier?
Updated•5 years ago
|
Comment 16•5 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #15)
igoldan, dave, aside from waving a magic wand and making this "just work", could we do anything else to make this easier?
I don't know what other alternative I could ask for. Aryx, any thoughts?
Reporter | ||
Comment 17•5 years ago
|
||
Backfills seem to way to do it. Feel free to ping me to try to identify what caused a regression without bisecting pushes.
Patches needed for Try pushes of older revisions:
Those 6 patches:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&resultStatus=usercancel%2Crunnable&revision=803bfbf03582e5e48af743094d51afd31c44957d
And:
641fe2e15d6f::33f64c1ef3e4
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&resultStatus=usercancel%2Crunnable&fromchange=641fe2e15d6f1e046f4064af76d1b7c701baa469&tochange=33f64c1ef3e40dc619c5eb8e43c5967959f57b29
20901bdfa9bd::d609eb2a8fe2
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&resultStatus=usercancel%2Crunnable&revision=d609eb2a8fe2930c6c10dab295eb5106f5ba0bc1
Updated•5 years ago
|
Updated•5 years ago
|
Comment 18•5 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #15)
igoldan, dave, aside from waving a magic wand and making this "just work", could we do anything else to make this easier?
The impact to perf sheriffing has mostly passed as we're able to retrigger and backfill since the migration. I think the best we can do is to consider how Taskcluster infrastructure changes may affect the ability to detect and investigate performance regressions.
Comment 20•5 years ago
|
||
Close this bug as INVALID?
Updated•5 years ago
|
Updated•5 years ago
|
Updated•4 years ago
|
Description
•