Closed Bug 1443503 Opened 7 years ago Closed 4 years ago

groupResolved notifications are often sent with a considerable delay

Categories

(Taskcluster :: Services, enhancement, P5)

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: marco, Unassigned)

References

Details

I'm often getting groupResolved notifications the day after a build and all associated tests have finished. The groupResolved notifications trigger a code coverage parsing task. This means the coverage data is sometimes lagging behind.
Summary: groupResolved notifcations are often sent with a considerable delay → groupResolved notifications are often sent with a considerable delay
Flags: needinfo?(bstack)
Not sure if bug 1387027 is related, but it might be.
Are we sure that the late taskGroupResolved notifications aren't simply reflecting a task group that genuinely resolved late?
Assignee: nobody → bstack
Status: NEW → ASSIGNED
Flags: needinfo?(bstack)
In IRC today we talked about looking into task group EEiuah8fQmO_vFfGZ5axgw. Indeed, the task group did resolve a day after most of the tasks finished (and, I believe, a day after anything you would see in Treeherder finished). However, the taskGroupResolved notification, received at 2018-03-06T18:15:40.982036+00:00, matches the time the last task in the group (BbmlqYN0SoWRe2vIIU7fPw) was resolved: it resolved with deadline-exceeded at 2018-03-06T18:15:38.842Z. This is most likely because the build this test depended on failed, and it does appear that X06XTQYHRR-BG0wyxv1K9w failed.

So taskGroupResolved seems to be working pretty much as expected, although it is clearly not proving to be a useful tool for you given how it works. We use taskGroupResolved in taskcluster-github, but we also listen for failed tasks simultaneously, so it fits our use case there.

Could the right thing to do here be to listen for taskFailed/taskException on the tasks in the group and use that knowledge to trigger coverage earlier, once you know certain tasks won't end up running? You could also argue that we should change the semantics of the taskGroupResolved event, but that might be harder to do at this point. :jonasfj, any thoughts on this?
Flags: needinfo?(mcastelluccio)
Flags: needinfo?(jopsen)
Maybe we ought to publish an event when all tasks are resolved other than those that depend on failed/exceptioned tasks?
I feel like what we provide right now doesn't satisfy a simple/good use-case for task group events. Sorry to keep adding these in subsequent comments; I just keep thinking of more things to say :p
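The proposed event ("all tasks are resolved other than those that depend on failed/exceptioned tasks") can be sketched as a small graph computation. This is an illustrative sketch only, not Taskcluster code: the function names and the `deps` mapping below are made up, and real task graphs would use task IDs rather than readable names.

```python
# Sketch (not Taskcluster code) of the proposed event's semantics: a group
# is "effectively resolved" once every task has either resolved, or can
# never run because a (transitive) dependency failed or hit an exception.
# Helper names and the example graph are invented for illustration.

def unrunnable_tasks(deps, failed):
    """Return the failed/exception tasks plus everything that
    transitively depends on them, i.e. tasks that will never run."""
    blocked = set(failed)
    changed = True
    while changed:  # propagate failure through the dependency graph
        changed = False
        for task, requires in deps.items():
            if task not in blocked and blocked & set(requires):
                blocked.add(task)
                changed = True
    return blocked

def effectively_resolved(deps, resolved, failed):
    """True once every task is resolved or can never run."""
    blocked = unrunnable_tasks(deps, failed)
    return all(t in resolved or t in blocked for t in deps)

# Example: test2 depends on build2, which failed, so the group becomes
# effectively resolved as soon as build1 and test1 finish.
deps = {"build1": [], "build2": [], "test1": ["build1"], "test2": ["build2"]}
print(effectively_resolved(deps, resolved={"build1", "test1", "build2"},
                           failed={"build2"}))  # -> True
```

For code coverage this would fire as soon as the last runnable test finishes, instead of waiting for blocked tasks to hit their deadline with deadline-exceeded.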
The best option for me would be the additional event, which might be useful in general and not just for code coverage. If that isn't feasible, I can implement what you do for taskcluster-github; it's a bit more complex, as it requires saving state. The other option for me is to define a task that depends on all the code coverage tasks (with dummy tasks to avoid the 100-dependency limit) and then wait for the task-completed notification for that task.
Flags: needinfo?(mcastelluccio)
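The taskcluster-github-style approach mentioned above (listen for failed tasks and keep state) could look roughly like the sketch below. This is hypothetical: the class, method names, and message shapes are invented, not the Taskcluster client API; the real thing would feed it taskCompleted/taskFailed/taskException Pulse messages.

```python
# Hypothetical sketch of the stateful listener described above: feed it
# task resolution events and it fires a callback as soon as no still-
# pending task can ever run. Names are invented; this is not the
# Taskcluster client API.

class GroupTracker:
    def __init__(self, deps, on_done):
        self.deps = deps          # task -> list of its dependencies
        self.resolved = set()     # tasks that reached a final state
        self.failed = set()       # subset that failed / hit an exception
        self.on_done = on_done    # called once, when the group is done
        self.done = False

    def handle(self, task, state):
        """Record one resolution; state is 'completed', 'failed' or
        'exception'."""
        self.resolved.add(task)
        if state in ("failed", "exception"):
            self.failed.add(task)
        self._check()

    def _blocked(self):
        """Failed tasks plus everything transitively depending on them."""
        blocked = set(self.failed)
        changed = True
        while changed:
            changed = False
            for t, reqs in self.deps.items():
                if t not in blocked and blocked & set(reqs):
                    blocked.add(t)
                    changed = True
        return blocked

    def _check(self):
        if self.done:
            return
        blocked = self._blocked()
        if all(t in self.resolved or t in blocked for t in self.deps):
            self.done = True
            self.on_done()

# The tracker fires before the blocked test ever resolves:
fired = []
deps = {"build": [], "test": ["build"]}
tracker = GroupTracker(deps, lambda: fired.append(True))
tracker.handle("build", "failed")  # "test" can now never run...
print(fired)                       # -> [True]  ...so we trigger early
```

The only persistent state per group is the two sets, so this stays cheap even for large task groups; the cost is having to consume every task resolution message for the group.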
> The other option for me is to define a task that depends on all the code coverage tasks (with dummy tasks to avoid the 100 dependency limit) and then wait for the task-completed notification for this task.

In this case you need to set task.requires = 'all-resolved' (instead of the default); see the docs: https://docs.taskcluster.net/reference/platform/taskcluster-queue/references/api#task Still, if some task that the build depends on fails, your task wouldn't get to run either.

@bstack, we could consider in the future resolving all dependent tasks with exception: 'dependency-failed' whenever a dependency fails and task.requires = 'all-completed' (the default). We can't do this anytime soon, as it would impact the remaining users of `queue.rerunTask`.
Flags: needinfo?(jopsen)
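The dummy-task workaround for the dependency limit boils down to a chunking step: batch the real dependencies into groups of at most 100, create one dummy task per batch (each with requires = 'all-resolved', per the note above), and have the final task depend on the dummies. The helper below is a made-up illustration; only the 100-task limit comes from the discussion.

```python
# Sketch of the dummy-task workaround for the Queue's limit of 100
# entries in task.dependencies. chunk_dependencies() is a made-up
# helper: it returns the dependency list for each dummy task plus the
# dependencies of the final "wait" task.

MAX_DEPS = 100  # Queue limit on task.dependencies

def chunk_dependencies(task_ids, limit=MAX_DEPS):
    """Split task_ids into dummy-task batches of at most `limit` each,
    and return (dummy_batches, final_deps), where final_deps names one
    dummy task per batch. With <= limit tasks no dummies are needed."""
    if len(task_ids) <= limit:
        return [], list(task_ids)
    batches = [task_ids[i:i + limit] for i in range(0, len(task_ids), limit)]
    final_deps = [f"dummy-{n}" for n in range(len(batches))]
    return batches, final_deps

# 250 coverage tasks -> 3 dummy tasks, and the final task depends on those 3.
batches, final_deps = chunk_dependencies([f"task-{i}" for i in range(250)])
print(len(batches), len(final_deps))  # -> 3 3
```

Note that one level of dummies caps out at 100 * 100 = 10,000 real dependencies; beyond that the same chunking would have to be applied recursively to the dummy layer.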
It seems this bug hasn't seen activity in a while. Marco, Brian, do you know if this is still happening and causing trouble?
QA Contact: jhford
It's still happening, but since the delay is usually short it doesn't matter that much to me.
In that case, let's tackle this as part of the move to postgres.
Depends on: 1436478
Priority: -- → P5
Component: Queue → Services
Assignee: bstack → nobody
Status: ASSIGNED → NEW

We never really got to the bottom of any delay here -- the one case Brian looked at showed the notification being sent at the correct, if surprising, time.

Given the lack of data and of activity, and now that everything has been migrated to Postgres, I'm going to close this as WORKSFORME.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME