Closed Bug 1477097 Opened 6 years ago Closed 4 years ago

task stuck as unscheduled even with all deps resolved, possible issue with queue dependency resolver

Categories

(Taskcluster :: Services, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: jlund, Unassigned)

Attachments

(1 file)

dustin> jlund: from poking around i didn't find something that was "unscheduled" but had all "complete" dependencies, but obviously i didn't look at all of the tasks :)
jlund> dustin: okay, I walked the dep tree back to: https://tools.taskcluster.net/groups/JZ3EEfOLQ2KlDPO2a3RqJw/tasks/ZWRKPgonRayyySrhHfnp7w/details
dustin> ok
15:14:49 the queue dependency resolver listens to a queue
15:14:57 like an Azure message queue
15:15:03 so it's possible that a message got lost in Azure
15:16:30 it's hard to see otherwise why this would happen once but not all the time
15:16:40 so, yeah, I guess just kick it with tc-cli
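For anyone wanting to repeat the check described in the log above (a task that is "unscheduled" although all of its dependencies are "completed"), here is a minimal sketch. It assumes a snapshot of the task group has already been fetched into a plain dict; the `find_stuck_tasks` name, the snapshot shape, and the task IDs are hypothetical and not part of any Taskcluster API.

```python
from typing import Dict, List


def find_stuck_tasks(tasks: Dict[str, dict]) -> List[str]:
    """Return IDs of tasks that are still "unscheduled" although every
    one of their dependencies is "completed".

    `tasks` maps taskId -> {"state": str, "dependencies": [taskId, ...]}.
    This shape is an assumption made for the sketch.
    """
    stuck = []
    for task_id, info in tasks.items():
        if info["state"] != "unscheduled":
            continue
        deps = info["dependencies"]
        # Tasks without dependencies are skipped: the sketch only targets
        # tasks that are waiting on the dependency resolver.
        # A dependency missing from the snapshot is treated as not completed.
        if deps and all(tasks.get(d, {}).get("state") == "completed" for d in deps):
            stuck.append(task_id)
    return sorted(stuck)


# Hypothetical snapshot: C is stuck, D is legitimately waiting on B.
snapshot = {
    "A": {"state": "completed", "dependencies": []},
    "B": {"state": "running", "dependencies": ["A"]},
    "C": {"state": "unscheduled", "dependencies": ["A"]},
    "D": {"state": "unscheduled", "dependencies": ["A", "B"]},
}
stuck_ids = find_stuck_tasks(snapshot)
print(stuck_ids)
```

Any task this reports is a candidate for a manual kick, as suggested at the end of the log.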
I had a peek, and couldn't see anything wrong. The dependencyResolver is chugging along as it has for years.
Jordan, has this happened again?
Flags: needinfo?(jlund)
No, there were two tasks within that group that had this issue. Aside from this release, none of us have seen this before or after. Feel free to triage this (closing even) in whatever way works best for your workflow.
Flags: needinfo?(jlund)
Let's re-open if we see it again.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
Component: Queue → Services
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → DUPLICATE

Based on https://bugzilla.mozilla.org/show_bug.cgi?id=1527583#c13 I suspect dustin didn't mean to dupe this against bug 1527583, or at least now wants them separated.

Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---

I think this was due to a bug in taskcluster-lib-iterate, which has since been rewritten.

Status: REOPENED → RESOLVED
Closed: 6 years ago → 5 years ago
Resolution: --- → WORKSFORME
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---

We've seen another occurrence of this for https://firefox-ci-tc.services.mozilla.com/tasks/CvhMdWSLTwKbT9AyBywwng
All deps were green, yet it didn't start. We force-ran it.

I've seen a few examples like this in my load-testing. I think it has to do with an Azure queue message "disappearing" for a while and then appearing. The service is eventually consistent, so this is within the definition of the service.

It looks like https://firefox-ci-tc.services.mozilla.com/tasks/CvhMdWSLTwKbT9AyBywwng started at 2020-01-17T10:08:36.116Z after its last dependency https://firefox-ci-tc.services.mozilla.com/tasks/f5epaouoSy2-QOxKtbUUaw finished at 2020-01-17T04:53:09.383Z, so about 5 hours' difference -- that's a long time for an Azure queue!
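For reference, the gap between those two timestamps can be worked out with a few lines of Python. This is just a sketch of the arithmetic; the `parse_ts` helper is a made-up name, and the format string assumes the millisecond-precision `...Z` timestamps quoted above.

```python
from datetime import datetime, timezone


def parse_ts(ts: str) -> datetime:
    # Parses timestamps of the form 2020-01-17T10:08:36.116Z (UTC).
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


dep_finished = parse_ts("2020-01-17T04:53:09.383Z")  # last dependency finished
task_started = parse_ts("2020-01-17T10:08:36.116Z")  # stuck task finally started

delay = task_started - dep_finished
print(delay)  # 5:15:26.733000
```

That is roughly 5 hours 15 minutes between the dependency finishing and the task starting.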

Let's see if this is better, or at least more inspectable, with Postgres (bug 1436478).

Depends on: 1436478

Another occurrence of this in Devedition 73.0b7 graph - task https://firefox-ci-tc.services.mozilla.com/tasks/W83g2g8sS5q27wCVb8cskw has all its deps completed and green yet it doesn't start.

Same graph, another task - https://firefox-ci-tc.services.mozilla.com/tasks/S-QF2VWJSJSxz_2hFAvSJQ

Happened again for Devedition 73.0b9 - quite a lot of tasks this time:

  • https://firefox-ci-tc.services.mozilla.com/tasks/aYk7_VS7RlasuamhBDGIig
  • https://firefox-ci-tc.services.mozilla.com/tasks/aaM9JoNoSBmH7miQtgRlvA
  • partials-signing-bs-linux-devedition-nightly/opt RnoV38FqS9mvNv9n_7Ba_w unscheduled
    partials-signing-da-linux64-devedition-nightly/opt KFpAP15TRM-Sha6xCOXIJg unscheduled
    partials-signing-en-GB-win64-aarch64-devedition-nightly/opt Y-HEiwIfQSKrqxHzRvpRbQ unscheduled
    partials-signing-eo-macosx64-devedition-nightly/opt BDHiyRf6TY6a_mgC4M1trQ unscheduled
    partials-signing-gn-win64-aarch64-devedition-nightly/opt CTFs_eRDSgGCORkxcElKLw unscheduled
    partials-signing-pa-IN-macosx64-devedition-nightly/opt LvHjxDEhTImhG0AJh9-kNQ unscheduled
    partials-signing-son-win64-devedition-nightly/opt IotllNVaTHiz63bIcLhijA unscheduled
    partials-signing-th-win64-aarch64-devedition-nightly/opt FAHck9GpT_yyD7fUI4IHFQ unscheduled
    partials-signing-ur-win64-aarch64-devedition-nightly/opt QSxtYn9STGa-dmR56BgTcA unscheduled
    partials-signing-uz-linux-devedition-nightly/opt HGyQc8oNSeuxwNkDJY-Nng unscheduled

As well on Firefox 73.0b9:

Bug 1477097 has some (inconclusive) investigation.

Ignore me, sorry. Different issue.

No sign of this since April 18, when we switched from Azure to Postgres? Reopen if this has been seen in that time range.

Status: REOPENED → RESOLVED
Closed: 5 years ago → 4 years ago
Resolution: --- → WORKSFORME

(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #19)

No sign of this since April 18, when we switched from Azure to Postgres? Reopen if this has been seen in that time range.

Hey, I've been on leave but saw this in my inbox. I'm really glad this bug no longer bites us. Thanks!
