Closed Bug 1574651 Opened 5 years ago Closed 5 years ago

Investigate ingesting pulse messages from multiple TC deployments

Categories

(Tree Management :: Treeherder: Data Ingestion, enhancement, P2)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

References

(Blocks 1 open bug)

Details

Attachments

(3 files)

I know, it scares me too.

Servo will be moving to the community Taskcluster deployment, while Treeherder is currently set to consume from pulse, which contains only data from the firefox-ci Taskcluster deployment.

Servo reports its results to treeherder and lacks another good solution for displaying status.

What would be involved in consuming messages from a second RabbitMQ cluster (or exchange, at least) and generating build status for those messages? I think the tricky bit will be that the rootUrl at which to find the status of those tasks will differ, so we'll need to be careful and make sure all URLs point to the right place.
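To make that concrete, here's a rough sketch (not actual Treeherder code) of building task URLs from a per-deployment rootUrl instead of assuming taskcluster.net; it assumes the taskcluster-urls Python helper package, and the task ID is illustrative:

    import taskcluster_urls

    def task_status_url(root_url, task_id):
        # Build the queue API URL against the deployment's rootUrl rather
        # than a hard-coded https://taskcluster.net hostname.
        return taskcluster_urls.api(root_url, "queue", "v1",
                                    "task/{}/status".format(task_id))

    # e.g. task_status_url("https://community-tc.services.mozilla.com", "abc123TaskId")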

I'll spend some time investigating this, and may bend Armen's ear to validate my findings.

I fully endorse you and Armen working together as much as needed to get this working.

What will the new rootUrl look like?
What would the new exchanges look like? Are they available now?
Are the messages going to keep the same schema?
When will this be needed by?
I believe this would be possible without a lot of code changes. Testing it will be a bit cumbersome.

There are a few instances of rootUrl in here:
https://github.com/mozilla/treeherder/pull/5042/files

The command ./manage.py ingest_push_and_tasks is not fully mature and does not currently have support for Git branches.
However, you can still test ingesting a task at a time.

I see the current implementation working for the following:
https://treeherder.mozilla.org/#/jobs?repo=servo-master
https://treeherder.mozilla.org/#/jobs?repo=servo-auto

I don't see it fully working for:
https://treeherder.mozilla.org/#/jobs?repo=servo-try
https://treeherder.mozilla.org/#/jobs?repo=servo-prs

What's the difference between all four? What are the last two used for?

(In reply to Armen [:armenzg] from comment #2)

What will the new rootUrl look like?

Current leading candidate is https://community-tc.services.mozilla.com

What would the new exchanges look like? Are they available now?

The exchanges would be the same, just on a different RabbitMQ instance.

Are the messages going to keep the same schema?

Yes

When will this be needed by?

Before we move Servo to the new deployment, which will occur before we move Firefox-CI to its new deployment.

I believe this would be possible without a lot of code changes. Testing it will be a bit cumbersome.

There are a few instances of rootUrl in here:
https://github.com/mozilla/treeherder/pull/5042/files

The command ./manage.py ingest_push_and_tasks is not fully mature and does not currently have support for Git branches.
However, you can still test ingesting a task at a time.

Thanks -- I will start by looking at those files.

Homu is the tool Servo uses to manage pull requests in https://github.com/servo/servo/. Based on certain commands in GitHub comments, it can create a merge commit and push it to a branch to trigger some tasks:

  • The auto branch, where if all tests report green (through the GitHub Status API), Homu will then push to master (which in turn triggers another couple of tasks).
  • The try branch, where all tests are run and results are reported as a PR comment, but nothing is merged to master.
  • Various try-FOO branches, where a subset of the tests is run.

Servo feeds job/task data to Treeherder by setting a tc-treeherder.v2._/{repository}/{commitSha} route when creating tasks. The queue service then takes care of sending Pulse messages that Treeherder ingests. For tasks created in response to a push to a branch (typically made by Homu), the Treeherder "repository" name is set to servo-{branchName}, except if the branch is one of try-*, in which case it is set to servo-try. The logic for this is in .taskcluster.yml for the decision task, and in decision_task.py and decisionlib.py for other tasks.

For tasks created in response to a pull request event, the Treeherder "repository" name is set to servo-prs (in .taskcluster.yml and decision_task.py).
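As a rough sketch (not the actual decisionlib code), the repository-name mapping described above amounts to something like:

    def treeherder_repo(branch_name=None, is_pull_request=False):
        # Illustrative version of the mapping described above; the real logic
        # lives in .taskcluster.yml, decision_task.py, and decisionlib.py.
        if is_pull_request:
            return "servo-prs"
        if branch_name == "try" or branch_name.startswith("try-"):
            return "servo-try"
        return "servo-" + branch_name   # e.g. master -> servo-master, auto -> servo-auto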

There’s also a daily hook. I see that we’re missing a Treeherder route for that decision task. But then that decision task goes through the same code path as a push to the master branch when creating other tasks, so those go to Treeherder’s servo-master.

Treeherder inserts the taskId into its DB, and then uses that in the context of the configured rootUrl both on the frontend and backend, mostly to support actions. So, that would need some refactoring too.

I made an appointment for Cam, Armen, and me to talk on Thursday.

jgraham made a good point on IRC: we don’t need to track a Taskcluster rootUrl for each task, only one per repository.

Dang, I apologize for missing this meeting. I chimed in on IRC, but here's the distillation of my own 2 cents, some of these points already made by others:

I think keeping a single Treeherder instance would be best. Something might come up to make having a separate Treeherder a better option, but I can't think of anything right now.

As James said, we should just make sure to add the rootUrl to the repositories fixture/schema and build our URLs from there. We have Taskcluster URLs semi-hard-coded in a few places (like runnable-jobs.json, task inspector, etc). So same deal there. That's pretty straightforward.
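A rough sketch of that idea, with an assumed field name (tc_root_url) that may not match what actually lands:

    from django.db import models

    class Repository(models.Model):
        # Existing fields elided; this is only a sketch of the shape.
        name = models.CharField(max_length=128)
        # Per-repository Taskcluster deployment; URLs for runnable-jobs.json,
        # the task inspector, actions, etc. would be built from this instead
        # of a hard-coded hostname.
        tc_root_url = models.CharField(max_length=255,
                                       default="https://taskcluster.net")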

Edits after thinking more:

And then we'd just need to be able to read messages from both exchanges. Sounds like the task/push schema is going to be the same, so that's going to be straightforward. I think it's just a matter of adding the new exchange to our Pulse queues. But I don't know if there are complications in reading from Pulse exchanges on a different RabbitMQ instance. I think that should be fine. But I've never tried it.
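As a sketch of what consuming from more than one broker could look like with kombu (names, URLs, and queue/exchange choices are illustrative, not the actual Treeherder ingestion code):

    from kombu import Connection, Exchange, Queue

    # One entry per Taskcluster deployment; each may live on a different
    # RabbitMQ (Pulse) instance.
    SOURCES = [
        {"root_url": "https://taskcluster.net",
         "pulse_url": "amqps://user:password@pulse.mozilla.org"},
        {"root_url": "https://community-tc.services.mozilla.com",
         "pulse_url": "amqps://user:password@other-rabbitmq.example.com"},
    ]

    def build_consumers(callback):
        consumers = []
        for index, source in enumerate(SOURCES):
            connection = Connection(source["pulse_url"])
            exchange = Exchange("exchange/taskcluster-queue/v1/task-completed",
                                type="topic")
            queue = Queue(
                name="queue/treeherder/tasks-{}".format(index),
                exchange=exchange,
                routing_key="route.tc-treeherder.#",  # bind on the tc-treeherder route
                durable=True,
            )
            # Remember which deployment each message came from so ingestion
            # can record the right rootUrl alongside the taskId.
            consumers.append(connection.Consumer(
                [queue],
                callbacks=[lambda body, message, root_url=source["root_url"]:
                           callback(root_url, body, message)]))
        return consumers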

So this all sounds mostly like a pretty straight shot. :)

As I got started on this, I realized that this is going to interact poorly with the current sign-in implementation, which only works with one thing (Mozilla's Auth0, basically). So it will only have creds for one deployment. There's work going on regarding third-party login and that may offer a solution: when a user clicks or hits a key to trigger an action, then Treeherder should check if it has TC credentials for that rootUrl, and if not begin the third-party signin process, if such a thing is configured for that rootUrl. That can happen in a new tab, and will proceed without user interaction in most cases.

Until then, I'll just set things up so actions only work in repos with rootUrl https://taskcluster.net.
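Something as small as the following gate would do for now (illustrative only, assuming a per-repository tc_root_url field as sketched earlier):

    def actions_enabled(repository):
        # Until we can hold credentials for more than one deployment, only
        # allow actions against the deployment our current login flow covers.
        return repository.tc_root_url == "https://taskcluster.net"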

Next up:

  • ingest from multiple pulse exchanges, each for a different rootUrl
Status: NEW → ASSIGNED
Priority: -- → P2

I have some work on the next step already. I'll wrap that up and make a PR.

This PR will also fix Bug 1578524. Please mark it as fixed when this is merged.

Blocks: 1578524

This got deployed this morning and I noticed the pulse queues growing out of control.

I have added the following variables:
PULSE_PUSH_SOURCES to [{"root_url": "https://taskcluster.net", "github": true, "hgmo": true, "pulse_url": <prod_url>}]
PULSE_PUSH_TASKS to [{"root_url": "https://taskcluster.net", "pulse_url": <prod_url>}]
ROOT_URL to https://taskcluster.net

Are these correct?
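For reference, those JSON-valued variables could be read at startup along these lines (a sketch; the exact handling in Treeherder's settings may differ):

    import json
    import os

    # Each source entry pairs a Pulse broker URL with the Taskcluster
    # root_url its messages belong to, mirroring the values listed above.
    push_sources = json.loads(os.environ.get("PULSE_PUSH_SOURCES", "[]"))
    task_sources = json.loads(os.environ.get("PULSE_PUSH_TASKS", "[]"))

    for source in push_sources + task_sources:
        print(source["root_url"], "via", source.get("pulse_url"))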

The queues are now back under control.

Hi camd,
I saw that you added and removed the variables; is there a reason for this?
https://cl.ly/b00bb096b096

That looks correct. They were added and removed because they conflicted with variables used by the previous version of Treeherder. I should not have re-used variable names like that -- sorry!

Thanks for catching this quickly and avoiding data loss!

Yeah, the code prior to this merge would detect if those vars were set. If so, it would stop ingesting the old way. But the new way wasn't complete till this merge. So I added the new vars in anticipation of a push of this code, but that caused similar havoc. So I had to remove them to get back to normal.

My apologies for not touching base with you yesterday, Armen, about being sure these variables were set at deploy time. Fortunately, since the queue is now unbounded, we would not have lost data, even if you hadn't caught it so quickly. And nice job figuring out what was needed to fix it.

I think this is in production -- OK to close?

Indeed. It did not get reverted.

Congratulations and thanks for fixing it!

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
