Closed Bug 1595623 Opened 5 years ago Closed 5 years ago

Create new worker pools for GCP POSIX builders

Categories

(Release Engineering :: Firefox-CI Administration, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: coop, Unassigned)

Attachments

(3 files)

It looks like the existing worker pools for GCP builders got deleted as part of the redeployment over the weekend. No tier 3 builds have run since:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&tier=3&searchStr=build&fromchange=6e19f038ae4d339067830ba7b30ffe1fdffb77e4&selectedJob=275458502

We were hoping to migrate build load from AWS to GCP this week in bug 1547111, but we need to validate that the tier 3 builds still work in the new deployment first.

We'll want to create new worker pools in GCP, probably with names close enough to the existing tier 1 worker pools that migration is easy.

I suggest maybe gecko-{level}/b-linux-gce?

Here's the last mozilla-history entry from Nov 8 (last Friday) showing the worker pool for gce/gecko-1-b-linux:

https://github.com/taskcluster/mozilla-history/blob/be98dfe4991a94981d4e66101141624ff465d249/WorkerPools/gce%E2%81%84gecko-1-b-linux

We'll need pools for all levels, of course.
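
As a rough illustration of the naming suggested above, a minimal sketch that expands the pattern into one pool ID per level. The level list (1 to 3) and the helper name `gcp_pool_ids` are assumptions for illustration, not part of ci-admin.

```python
# Sketch only: the level list and helper name are assumptions.
LEVELS = (1, 2, 3)


def gcp_pool_ids(suffix="b-linux-gce", levels=LEVELS):
    """Return one worker pool ID per level, following gecko-{level}/<suffix>."""
    return [f"gecko-{level}/{suffix}" for level in levels]


for pool_id in gcp_pool_ids():
    print(pool_id)
```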

The workers still exist in ci-config, but are all disabled. There are a few things we need to do for this:

  1. Clean up all the test workers, other than the gecko-<level>/...-gce workers
  2. Add some logic like [get_aws_provider_config](https://hg.mozilla.org/ci/ci-admin/file/tip/ciadmin/generate/worker_pools.py#l56). There appears to be a lot of duplication between regions. In particular, we should add support for pulling image names from worker-image.yml, rather than having them inline.
  3. Adjust the remaining workers to use the appropriate providerIds (the legacy GCP providerIds are different from the production ones), and remove the hack that disables non-AWS providerIds (this exists because of the aforementioned difference in providerIds).
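
One possible shape for item 2, as a minimal sketch rather than the actual ci-admin code: shared settings are written once and expanded into one launch config per zone, and the image name is looked up by alias instead of being written inline. The function names, the providerId, the image mapping (standing in for the parsed worker-image.yml), and the output fields are all hypothetical and do not reflect the real worker-manager schema.

```python
# Sketch only: every name, ID, and field below is hypothetical.

# Stand-in for the parsed contents of worker-image.yml:
# image alias -> providerId -> image name.
WORKER_IMAGES = {
    "example-docker-worker": {
        "example-gcp-provider": "projects/example-project/global/images/example-image",
    },
}


def resolve_image(alias, provider_id, images=WORKER_IMAGES):
    """Look up the image name for an alias under a given providerId."""
    try:
        return images[alias][provider_id]
    except KeyError:
        raise KeyError(f"no image {alias!r} for provider {provider_id!r}") from None


def get_gcp_provider_config(provider_id, image_alias, zones_by_region,
                            machine_type, max_capacity, images=WORKER_IMAGES):
    """Build one launch config per zone from a single set of shared settings."""
    image = resolve_image(image_alias, provider_id, images)
    launch_configs = [
        {
            "region": region,
            "zone": zone,
            "machineType": f"zones/{zone}/machineTypes/{machine_type}",
            "disks": [{"initializeParams": {"sourceImage": image}}],
        }
        for region, zones in zones_by_region.items()
        for zone in zones
    ]
    return {
        "providerId": provider_id,
        "config": {"maxCapacity": max_capacity, "launchConfigs": launch_configs},
    }


config = get_gcp_provider_config(
    provider_id="example-gcp-provider",
    image_alias="example-docker-worker",
    zones_by_region={"us-central1": ["us-central1-a", "us-central1-b"]},
    machine_type="n1-standard-32",
    max_capacity=10,
)
print(len(config["config"]["launchConfigs"]))
```

Keeping the image lookup in one helper means a missing alias or providerId fails loudly at generation time instead of producing a pool with a stale inline image name.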
Assignee: nobody → bstack
Status: NEW → ASSIGNED
Assignee: bstack → nobody
Status: ASSIGNED → NEW

I've added patches that show how to create images, but the old images that were generated are accessible to the production gcp projects, so I am unable to test this.

Attachment #9109203 - Attachment description: Bug 1595623: [DO-NOT-LAND] Sketch of worker-type definition. → Bug 1595623: Add gcp based workers;

Pushed by mozilla@hocat.ca:
https://hg.mozilla.org/integration/autoland/rev/c3606c9d2d17
[firefox-ci] Update gcp workers to use new names; r=coop

Possibly related to the changes here, we got a flurry of emails like this:

Worker Manager has encountered an error while trying to provision the worker pool gecko-3/b-linux-gcp:

Quota 'CPUS' exceeded.  Limit: 2400.0 in region us-central1.

ErrorId: fQuQpojlQICflD-MzMXsNw

It includes the extra information:

code: QUOTA_EXCEEDED
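
For a sense of scale, a minimal sketch of how many instances fit under a regional CPU quota. The 2400 limit comes from the error above; the 32 vCPUs per instance and the helper name are assumptions, since the machine type is not stated here.

```python
# Sketch only: vcpus_per_instance is an assumed machine size.
def max_instances_within_quota(cpu_quota, vcpus_per_instance, cpus_in_use=0):
    """Return how many more instances fit under a regional CPU quota."""
    if vcpus_per_instance <= 0:
        raise ValueError("vcpus_per_instance must be positive")
    available = max(cpu_quota - cpus_in_use, 0)
    return int(available // vcpus_per_instance)


print(max_instances_within_quota(2400, 32))  # 75
```

Under that assumption, a pool whose maximum capacity exceeds this figure would hit QUOTA_EXCEEDED once demand is high enough, unless the quota is raised.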

(In reply to Nick Thomas [:nthomas] (UTC+13) from comment #10)
> Quota 'CPUS' exceeded. Limit: 2400.0 in region us-central1.

Thanks, Nick. I'll talk to GCP and see whether we can increase our quota.

(In reply to Chris Cooper [:coop] pronoun: he from comment #12)
> (In reply to Nick Thomas [:nthomas] (UTC+13) from comment #10)
> > Quota 'CPUS' exceeded. Limit: 2400.0 in region us-central1.
>
> Thanks, Nick. I'll talk to GCP and see whether we can increase our quota.

Filed bug 1598295 for this.
