Closed Bug 1625168 Opened 5 years ago Closed 5 years ago

Decision task frequently fails with mach try auto

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: sg, Assigned: ahal)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Bug 1625168 - [taskgraph] Raise exception when timing out waiting for bugbug service, r?marco (Andrew Halberstadt [:ahal], 5 years ago; deleted; text/x-phabricator-request)

Simon Giesecke [:sg] [he/him]

Reporter

Description

5 years ago

Over the last few days, I have frequently experienced Decision task failures/timeouts on try pushes. Today, it only worked on the fourth attempt.

https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=294851879&revision=17bdf54eb556907310e1194393a0de6490a2daa4

https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=294855005&revision=beec17c2c8722532518477fc6e1faf3e1646f030

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a381ecb762e2fd51d9b2f38f0ebbcbe650f2b2a0&selectedJob=294858656

all failed, then finally

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1fc665bd3fd471d955dd7c816b077c9165685970

succeeded (I changed the commit message on the last one, that's why it shows a different revision, but the content was exactly the same).

This has cost me a lot of time; I'm not sure whether others are affected as well.

Armen [:armenzg]

Comment 1

5 years ago

I've asked the Taskcluster team to look into this.
Treeherder mainly displays what happens on Taskcluster.

Armen [:armenzg]

Comment 2

5 years ago

Hi marco, ahal,
This seems to be an issue with mach try auto.

Is there a way to make it more obvious which component/repo issues should be filed against?

Flags: needinfo?(mcastelluccio)

Flags: needinfo?(ahal)

Andrew Halberstadt [:ahal]

Assignee

Comment 3

5 years ago

FYI, ./mach try auto is very experimental at the moment (we haven't announced it anywhere yet), so expect issues.

I think what's happening here is that for some reason the bugbug service is failing to compute the results for this push, and then the taskgraph isn't propagating the error properly. It would also help if ./mach try auto enabled verbose logging in the Decision task, so we could see what's going on.

Blocks: smart-scheduling

Component: Treeherder: Infrastructure → Task Configuration

Flags: needinfo?(ahal)

Product: Tree Management → Firefox Build System

Version: --- → unspecified

Simon Giesecke [:sg] [he/him]

Reporter

Comment 4

5 years ago

Oh, interesting. Sorry I didn't mention that these used mach try auto. Since it didn't fail deterministically, I thought it was an infrastructure issue. (mach try auto is incredibly useful, so it would be really great if it worked reliably)

Summary: Decision task frequently fails → Decision task frequently fails with mach try auto

Andrew Halberstadt [:ahal]

Assignee

Comment 5

5 years ago

Yikes... I forgot to increment i in the timeout code:
https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/optimize/bugbug.py#46

So my guess was correct. I'll fix the timeout so that this doesn't wait 30 minutes to fail. Though the underlying cause seems to be that the service just isn't processing this push (it presumably keeps returning 202).
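Comment 5 traces the hang to a polling loop whose counter was never incremented. While the service kept answering 202, the Decision task waited about 30 minutes before failing. The fix pushed in comment 7 raises an exception on timeout instead. The Python sketch below is not the actual taskgraph/optimize/bugbug.py code. It is a minimal illustration that assumes the service answers 202 while it is still computing and 200 once results are ready. poll_until_ready, BugbugTimeoutException, fetch, retry_count and retry_interval are all hypothetical names.

```python
import time
from typing import Callable, Tuple


class BugbugTimeoutException(Exception):
    """Hypothetical error raised when the service never finishes computing."""


def poll_until_ready(
    fetch: Callable[[], Tuple[int, dict]],
    retry_count: int = 6,
    retry_interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a (hypothetical) results endpoint until it stops answering 202.

    `fetch` stands in for the HTTP request and returns (status_code, json_body).
    """
    # range() advances the attempt counter on every pass, so the loop is
    # guaranteed to end after `retry_count` requests.
    for attempt in range(retry_count):
        status, body = fetch()
        if status == 202:
            # Still computing: wait before asking again, except after the
            # final attempt, where we fall through to the timeout error.
            if attempt < retry_count - 1:
                sleep(retry_interval)
            continue
        if status != 200:
            # Any other status is a hard failure; propagate it instead of
            # swallowing it.
            raise RuntimeError(f"bugbug service returned HTTP {status}")
        return body
    raise BugbugTimeoutException(
        f"no result after {retry_count} attempts ({retry_interval:g}s apart)"
    )
```

Iterating over range() instead of maintaining i by hand rules out the missing increment. Injecting fetch and sleep keeps the sketch testable without network access. Raising an exception, rather than returning an empty result, surfaces the failure in the Decision task log instead of letting the task hit its own timeout.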

Andrew Halberstadt [:ahal]

Assignee

Updated

5 years ago

Keywords: leave-open

Andrew Halberstadt [:ahal]

Assignee

Comment 6

5 years ago

Attached file Bug 1625168 - [taskgraph] Raise exception when timing out waiting for bugbug service, r?marco (deleted)

Phabricator Automation

Updated

5 years ago

Assignee: nobody → ahal

Status: NEW → ASSIGNED

Pulsebot

Comment 7

5 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/cd0c3c759c83 [taskgraph] Raise exception when timing out waiting for bugbug service, r=marco

Marco Castelluccio [:marco]

Comment 8

5 years ago

(In reply to Simon Giesecke [:sg] [he/him] from comment #4)

> Oh, interesting. Sorry I didn't mention that these used mach try auto. Since it didn't fail deterministically, I thought it was an infrastructure issue. (mach try auto is incredibly useful, so it would be really great if it worked reliably)

Have you seen failures with specific patches, or generically? I'm going to add more logging in the bugbug service so I can more easily find out what happens when things go wrong.

Flags: needinfo?(mcastelluccio)

Marco Castelluccio [:marco]

Comment 9

5 years ago

Just a suggestion: while this is still experimental, instead of pushing again you could retrigger the decision task.

Noemi Erli[:noemi_erli]

Comment 10

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/cd0c3c759c83

Simon Giesecke [:sg] [he/him]

Reporter

Comment 11

5 years ago

(In reply to Marco Castelluccio [:marco] from comment #8)

> (In reply to Simon Giesecke [:sg] [he/him] from comment #4)

>> Oh, interesting. Sorry I didn't mention that these used mach try auto. Since it didn't fail deterministically, I thought it was an infrastructure issue. (mach try auto is incredibly useful, so it would be really great if it worked reliably)

> Have you seen failures with specific patches, or generically? I'm going to add more logging in the bugbug service so I can more easily find out what happens when things go wrong.

I am not completely sure, but I guess the failed attempts were all changing quite basic things in mfbt or xpcom/ds.

(In reply to Marco Castelluccio [:marco] from comment #9)

> Just a suggestion: while this is still experimental, instead of pushing again you could retrigger the decision task.

Unfortunately, due to an issue with my account, I can't retrigger any tasks at the moment. Hope this will be resolved soon.

Marco Castelluccio [:marco]

Comment 12

5 years ago

I made quite a few improvements in the bugbug HTTP service, so this should be fixed.

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

Resolution: --- → FIXED

BugBot [:suhaib / :marco/ :calixte]

Updated

5 years ago

Keywords: leave-open (removed)

Categories

(Firefox Build System :: Task Configuration, defect)

Security

(public)