Closed Bug 1653450 Opened 4 years ago Closed 2 years ago

Reduce storage lifespan of artifacts

Categories

(Testing :: General, enhancement, P3)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1649987

People

(Reporter: egao, Unassigned)

References

(Blocks 1 open bug)

Details

Synopsis

In the CI system, we currently assign artifacts certain lifespans depending on the branch.

In try, for most* artifacts the lifespan appears to be 28 days.
In mozilla-central, artifacts are given 1 year before they are removed.

A rough calculation suggests that Mozilla spends tens of thousands of dollars per month on data storage costs alone.

Proposal

It is time to reconsider the storage lifespan of some task artifacts.

The hard question needs to be: does <artifact> need to be stored for 1 year? Can the lifespan be reduced to a lower value, e.g. 6 months, without significant impact?

Justification

While there is no accurate storage cost breakdown (as of this writing), Mozilla stores more than 1PB of data from CI runs. Every time a run produces an artifact or log, it is stored as part of this dataset.

Having a lifespan of 1 year allows users to access artifacts up to 1 year after the initial run. This can be useful for long-running bugs or for bisecting a tricky regression.

However, such situations are not the norm.

On top of that, there are types of artifacts that I suspect do not serve any use after the task run comes back green.
For example, in mozilla-central all language packs (search partial) are generated each push. Firefox supports roughly 90 languages. These tasks run on 6 platform variations, 2 times a day. Each of these tasks produces an artifact ~8.5MB in size.

So, in a given day, CI generates:
((((8.5 x 4) x 90) x 6) x 2) = 36720MB ~ 36.7GB of just the language pack artifacts.
Add on top the logs themselves, JSON dumps, etc. and the true figure is likely higher.
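
The estimate above can be reproduced with a short Python sketch. The inputs are the figures quoted in this comment; the "x 4" multiplier is copied from the formula as written, since the comment does not say what it represents. The steady-state figure is an added assumption (a constant daily rate and a full retention window), not a measured number.

```python
# Figures quoted in the comment above.
ARTIFACT_SIZE_MB = 8.5   # approximate size of one language pack artifact
UNEXPLAINED_FACTOR = 4   # the "x 4" in the original formula; meaning not stated in the source
LANGUAGES = 90           # ballpark number of supported languages
PLATFORMS = 6            # platform variations
RUNS_PER_DAY = 2         # runs per day


def daily_volume_mb() -> float:
    """Language pack artifact volume generated per day, in MB."""
    return (ARTIFACT_SIZE_MB * UNEXPLAINED_FACTOR * LANGUAGES
            * PLATFORMS * RUNS_PER_DAY)


def steady_state_gb(retention_days: int) -> float:
    """Data held once the retention window is full, in GB (1 GB = 1000 MB).

    Assumption: the daily rate is constant, so stored data is simply
    daily volume multiplied by the number of days kept.
    """
    return daily_volume_mb() * retention_days / 1000


if __name__ == "__main__":
    print(f"daily: {daily_volume_mb():.0f} MB")
    for days in (365, 182, 28, 7, 3):
        print(f"{days:>3} days retained: {steady_state_gb(days):,.1f} GB")
```

Under these assumptions a 1-year lifespan holds roughly 13.4TB of language pack artifacts at any time, while a 3-day lifespan holds roughly 110GB.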

I see this as a meta bug, under which bugs will be filed that address specific artifacts, like the language packs mentioned in the description.

Blocks: cost-reduction
No longer blocks: 1573872

(In reply to Edwin Takahashi (:egao) from comment #0)

For example, in mozilla-central all language packs (search partial) are generated each push.

Only for every push for which the Nightly release promotion runs, so 2x day.

Dropping the .mars early might make sense because the update server won't serve them anymore.

:bhearsum, this example of .mar files and their expiration would be interesting to explore. I imagine this isn't a large offender like many of our storage needs, but would probably make up 30+ TB of data.

Flags: needinfo?(bhearsum)

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #3)

:bhearsum, this example of .mar files and their expiration would be interesting to explore. I imagine this isn't a large offender like many of our storage needs, but would probably make up 30+ TB of data.

I don't think there's any need for MARs attached to tasks to be stored for more than a week or so (maybe even only a day). We don't serve updates directly from them (they're served off of archive.m.o), and partial mars are generated from MARs stored on archive.m.o as well.

Callek can probably talk more about langpacks.

Flags: needinfo?(bhearsum) → needinfo?(bugspam.Callek)

Great, so we could easily add .mar files (and the logs from the jobs?) to the cleanup of old artifacts. It would be nice to set a new expiration for these, as it is a simple fix, and then clean up the old data in the near future.

I took a quick stroll down to taskcluster/ci/partials/kind.yml and experimentally added:

expires-after: 3 days

The taskgraph generates but it seems that on try it is not possible to schedule partial tasks, so I can't verify if this actually works or not.

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #4)

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #3)

:bhearsum, this example of .mar files and their expiration would be interesting to explore. I imagine this isn't a large offender like many of our storage needs, but would probably make up 30+ TB of data.

I don't think there's any need for MARs attached to tasks to be stored for more than a week or so (maybe even only a day). We don't serve updates directly from them (they're served off of archive.m.o), and partial mars are generated from MARs stored on archive.m.o as well.

Callek can probably talk more about langpacks.

Ok, yes .mar's and langpacks are only served from archive.m.o -- we have an expiration policy on archive.m.o for (partial) mar's that expires them pretty aggressively (I forget exactly how aggressive offhand).

Langpacks are built and then moved to archive.m.o along with Nightlies (and releases) and for releases also served on AMO.

(In reply to Edwin Takahashi (:egao) (PTO 07/20-08/20) from comment #6)

I took a quick stroll down to taskcluster/ci/partials/kind.yml and experimentally added:

expires-after: 3 days

The taskgraph generates but it seems that on try it is not possible to schedule partial tasks, so I can't verify if this actually works or not.

I'd caution against only 3 days if target is all branches, on beta/release we can have up to just-slightly-over a week in theory, due to RC on Monday and then in 8 days the formal release. So we shouldn't let artifacts from an in-progress release expire earlier than we're sure we need them for later tasks...

But for Nightly 3 days is probably fine.
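
The branch distinction described in this comment could be sketched as follows. This is a hypothetical helper, not code from the taskgraph: the function name, the constants, and the 14-day value are illustrative assumptions (chosen only to exceed the "slightly over a week" window mentioned above), and unknown branches fall back to the longer value to stay on the safe side.

```python
# Hypothetical sketch: none of these names come from the real taskgraph code.
NIGHTLY_EXPIRY = "3 days"    # value tried experimentally in the earlier comment
RELEASE_EXPIRY = "14 days"   # assumption: comfortably longer than the ~8 day RC-to-release window

NIGHTLY_PROJECTS = {"mozilla-central"}


def partial_mar_expiry(project: str) -> str:
    """Pick an expires-after value for partial MAR artifacts by branch.

    Nightly artifacts can expire quickly; anything else (beta, release,
    or an unrecognised project) conservatively gets the longer lifespan.
    """
    if project in NIGHTLY_PROJECTS:
        return NIGHTLY_EXPIRY
    return RELEASE_EXPIRY


if __name__ == "__main__":
    for project in ("mozilla-central", "mozilla-beta", "mozilla-release"):
        print(project, "->", partial_mar_expiry(project))
```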

Flags: needinfo?(bugspam.Callek)
Severity: -- → S3
Priority: -- → P3

(In reply to Justin Wood (:Callek) from comment #7)

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #4)

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #3)
The taskgraph generates but it seems that on try it is not possible to schedule partial tasks, so I can't verify if this actually works or not.

I'd caution against only 3 days if target is all branches, on beta/release we can have up to just-slightly-over a week in theory, due to RC on Monday and then in 8 days the formal release. So we shouldn't let artifacts from an in-progress release expire earlier than we're sure we need them for later tasks...

But for Nightly 3 days is probably fine.

Are you sure we need to keep the MARs on the tasks? Tasks like https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/J2dV2XJ0TZWHUUrTIHbK2w (a mozilla-beta partials task) use archive.mozilla.org for from_mar - not task references. As far as I can tell, there is absolutely no reason to keep MARs on tasks after a release has shipped.

Flags: needinfo?(bugspam.Callek)

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #8)

(In reply to Justin Wood (:Callek) from comment #7)

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #4)

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #3)
The taskgraph generates but it seems that on try it is not possible to schedule partial tasks, so I can't verify if this actually works or not.

I'd caution against only 3 days if target is all branches, on beta/release we can have up to just-slightly-over a week in theory, due to RC on Monday and then in 8 days the formal release. So we shouldn't let artifacts from an in-progress release expire earlier than we're sure we need them for later tasks...

But for Nightly 3 days is probably fine.

Are you sure we need to keep the MARs on the tasks? Tasks like https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/J2dV2XJ0TZWHUUrTIHbK2w (a mozilla-beta partials task) use archive.mozilla.org for from_mar - not task references. As far as I can tell, there is absolutely no reason to keep MARs on tasks after a release has shipped.

Apologies, my "week" is primarily for "it takes us a week to ship (to users) from Build1's RC" piece, and not "We need the task artifacts after we finished a release" -- Though in thinking more, we actually take the mar files we need from the signing task which happens within a short timeframe so we don't really need the unsigned mars for long.

Flags: needinfo?(bugspam.Callek)

I think this might be a (partial?) dupe of https://bugzilla.mozilla.org/show_bug.cgi?id=1649987, which already has some work in progress.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → DUPLICATE