Please purge some old S3 artifacts (2 months old)
Categories: Taskcluster :: Operations and Service Requests (task)
Tracking: Not tracked
People: Sylvestre (Reporter), Unassigned
References: Blocks 1 open bug
Attachments: 1 file (deleted, image/png)
We are storing too much information on s3.
We are working on drafting a policy for better expiration and on a prototype (bug 1649987).
However, until it is ready, I think we can purge the biggest offenders.
Disclaimer: I am well aware of the limitations of the exercise. This is just a quick fix that saves a significant yearly amount.
To identify the biggest files, I wrote this query:
https://github.com/mozilla-releng/ci-cost-queries/blob/main/s3-agregated-size-by-filename.sql
To make sure I am not making mistakes, I will needinfo a bunch of folks for sign-off.
Currently, with this query, we will remove 315.11TB of data (a significant percentage of the overall storage) with May 14th as the cut-off date (2 months). That represents 8,846,577 files.
The list for sign off:
- "wpt_raw.log" "wptreport.json" => James
- "sccache.log" => Nathan
- "logcat-emulator-5554.log" => gbrown/snorp
- "reftest_raw.log" => ahal/joel?
- "jsreftest_raw.log" => Jandem
- "full-task-graph.json" => ahal
- "target.crashreporter-symbols.zip", "target.crashreporter-symbols-full.zip", "target.gtest.tests.tar.gz" => Decoder / Jason
- "code-coverage-jsvm.zip", "code-coverage-grcov.zip" => marco
I would also like Joel and coop to sign off on my query :)
and the query to retrieve the list of keys should be:
SELECT
  key
FROM
  `task-inventory-test.inv_20200630.inventory`
WHERE
  REGEXP_EXTRACT(key, r'.*/(.*)$') IN ("wpt_raw.log", "wptreport.json", "sccache.log", "logcat-emulator-5554.log", "reftest_raw.log", "jsreftest_raw.log", "full-task-graph.json", "target.crashreporter-symbols.zip", "target.crashreporter-symbols-full.zip", "target.gtest.tests.tar.gz", "code-coverage-jsvm.zip", "code-coverage-grcov.zip")
  AND mtime <= '2020-05-14'
Also available at:
https://github.com/mozilla-releng/ci-cost-queries/blob/main/s3-purge-keys.sql
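(For illustration: the size-by-filename query linked above isn't reproduced in the bug, so the sketch below is a rough, hypothetical reconstruction of what it likely does. The size column name is an assumption about the S3 inventory schema.)

-- Hypothetical sketch of s3-agregated-size-by-filename.sql:
-- total storage and file count per artifact filename, biggest first.
-- The size column is an assumed part of the inventory table.
SELECT
  REGEXP_EXTRACT(key, r'.*/(.*)$') AS filename,
  ROUND(SUM(size) / POW(1024, 4), 2) AS total_tib,
  COUNT(*) AS file_count
FROM
  `task-inventory-test.inv_20200630.inventory`
WHERE
  mtime <= '2020-05-14'
GROUP BY
  filename
ORDER BY
  total_tib DESC
LIMIT 50;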
Comment 1•4 years ago
(In reply to Sylvestre Ledru [:Sylvestre] from comment #0)
The list for sign off:
- "sccache.log" => Nathan
r+
For the crashreporter files, do we care about saving some of them for debugging builds identified with mozregression?
Comment 2•4 years ago
(In reply to Sylvestre Ledru [:Sylvestre] from comment #0)
- "target.crashreporter-symbols.zip", "target.crashreporter-symbols-full.zip", "target.gtest.tests.tar.gz" => Decoder / Jason
I can only speak for FUZZING builds here:
- We have disabled the crashreporter already on the fuzzing debug build so I don't think we have any fuzzing builds left that produce this artifact. Deleting the existing ones that are older than 2 months should be fine.
- The gtest archive is used for fuzzing and we need it for bisecting bugs using these fuzzing targets. However, 2 months might be sufficient for this purpose.
Outside of fuzzing, I can't judge if and how much history we need for these files. My guess is, for gtests we don't need it at all, but we might need the crashreporter symbols for mozregression or similar things.
Comment 3•4 years ago
"wpt_raw.log" "wptreport.json" => James
r+ At some future time we might want to keep wpt_raw.log for less time and wptreport.json for more time, but that depends on other discussions and is compatible with this change.
Comment 4•4 years ago
"logcat-emulator-5554.log" => gbrown/snorp
For android tests, this log complements the test log (live.log/live_backing.log) and is important for debugging test failures; I'd say it should be kept as long as the test log is kept. But snorp should make the final decision here.
Comment 5•4 years ago
(In reply to Sylvestre Ledru [:Sylvestre] from comment #0)
The list for sign off:
- "full-task-graph.json" => ahal
I think it should be fine, though there could be uses of it that I'm unaware of. So adding Tom Prince for those.
Comment 6•4 years ago
Due to https://github.com/taskcluster/taskcluster/issues/3183 it would be great to delete these artifacts by setting their expiration date in the DB. After the migration on the 18th, that should be relatively easy with some simple update queries in the DB. With that done, the queue service will delete the objects from S3 on its next run of the artifact-expiration job.
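(For illustration only: a minimal sketch of what such an update might look like after the migration. The queue_artifacts table and expires column do appear later in this bug, but the filter below is a hypothetical example, not the actual procedure that was run.)

-- Hypothetical sketch: force-expire one artifact type so the nightly
-- artifact-expiration job deletes it from S3 on its next run.
UPDATE queue_artifacts
SET expires = now()
WHERE name LIKE '%/wpt_raw.log'
  AND expires > now();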
Comment 7•4 years ago
(In reply to Christian Holler (:decoder) from comment #2)
- We have disabled the crashreporter already on the fuzzing debug build so I don't think we have any fuzzing builds left that produce this artifact. Deleting the existing ones that are older than 2 months should be fine.
I suggest we keep the crashreporter symbols for fuzzing debug builds. If we delete these for builds > 2 months old, we're unable to bisect beyond that range. And since we're no longer producing these artifacts, they'll eventually phase out on their own.
Comment 8•4 years ago
r+ for logcats -- usually we need to access that when investigating intermittents, but we should have more recent instances to draw from.
Comment 10•4 years ago
I see "target.gtest.tests.tar.gz" on the list above, but we should consider other test packages:
target.mochitest.tests.tar.gz
target.web-platform.tests.tar.gz
target.reftest.tests.tar.gz
target.cppunittest.tests.tar.gz
jsapi-tests
target.common.tests.tar.gz
js
geckoview-androidTest.apk
target.jsshell.zip
target.talos.tests.tar.gz
target.raptor.tests.tar.gz
target.xpcshell.tests.tar.gz
target.condprof.tests.tar.gz
target.updater-dep.tests.tar.gz
target.jsreftest.tests.tar.gz
Maybe not all of these are subject to a shorter retention policy, but I imagine all of them would be. Just looking at all these test zips, we have >375TB of storage prior to May 14th.
Another thing to consider in addition to _raw.log is live_backing.log. We should understand the difference between these, as I think they are very similar.
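(For illustration, a hedged sketch of how the filename list in the comment 0 query could be extended with the packages above; the actual updated list lives in the ci-cost-queries repo.)

SELECT key
FROM `task-inventory-test.inv_20200630.inventory`
WHERE REGEXP_EXTRACT(key, r'.*/(.*)$') IN (
    "target.gtest.tests.tar.gz", "target.mochitest.tests.tar.gz",
    "target.web-platform.tests.tar.gz", "target.reftest.tests.tar.gz",
    "target.cppunittest.tests.tar.gz", "target.common.tests.tar.gz",
    "target.jsshell.zip", "target.talos.tests.tar.gz",
    "target.raptor.tests.tar.gz", "target.xpcshell.tests.tar.gz",
    "target.condprof.tests.tar.gz", "target.updater-dep.tests.tar.gz",
    "target.jsreftest.tests.tar.gz", "geckoview-androidTest.apk")
  AND mtime <= '2020-05-14'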
Comment 11•4 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #10)
I see "target.gtest.tests.tar.gz" on the list above, but we should consider other test packages:
jsapi-tests
js
Bug 1649757 stopped uploading these for non-sanitizer/fuzzing builds; we should delete old copies not associated with sanitizer/fuzzing builds.
Comment 12•4 years ago
(In reply to Sylvestre Ledru [:Sylvestre] from comment #0)
- "code-coverage-jsvm.zip", "code-coverage-grcov.zip" => marco
The code coverage bot ingests these artifacts a few hours after they are generated (sometimes there could be a delay due to Pulse messages, but it's always within a couple of days).
We have used them in the past to reingest data, but it is super rare (I think we did it just once).
So, in summary, we could keep them for even less than two months.
Comment 13•4 years ago (Reporter)
I see "target.gtest.tests.tar.gz" on the list above, but we should consider other test packages:
If I add these files, we are at 627.95TB.
Do you think I could go ahead with these files too?
Comment 14•4 years ago
I would argue so. I think they are less useful than logs which are less useful than builds.
Comment 15•4 years ago (Reporter)
After the migration on the 18th, that should be relatively easy with some simple update queries in the DB
Is there documentation on how to do it?
thanks
Comment 16•4 years ago (Reporter)
Thanks Joel, I updated the query here:
https://github.com/mozilla-releng/ci-cost-queries/commit/fd7e71e11754ecdf96e8804ce6dd55092160cad1
Comment 17•4 years ago
Is there documentation on how to do it?
thanks
Nope, this is definitely a void-the-warranty kind of thing.
Comment 18•4 years ago
ActiveData reads the TC artifacts within a few hours, most of the time. There are occasional backfills that read month-old information.
Comment 19•4 years ago
(In reply to Sylvestre Ledru [:Sylvestre] from comment #0)
and the query to retrieve the list of keys should be:
SELECT key FROM `task-inventory-test.inv_20200630.inventory` WHERE REGEXP_EXTRACT(key, r'.*/(.*)$') IN ("wpt_raw.log", "wptreport.json", "sccache.log", "logcat-emulator-5554.log", "reftest_raw.log", "jsreftest_raw.log", "full-task-graph.json", "target.crashreporter-symbols.zip", "target.crashreporter-symbols-full.zip", "target.gtest.tests.tar.gz", "code-coverage-jsvm.zip", "code-coverage-grcov.zip") AND mtime <= '2020-05-14'
Query looks fine.
Comment 20•4 years ago
From what I can tell, reftest_raw.log (and other <test>_raw.log) can be purged after ~3 weeks, so we are OK with a 2-month retention on those.
Comment 21•4 years ago
(In reply to Andrew Halberstadt [:ahal] from comment #5)
(In reply to Sylvestre Ledru [:Sylvestre] from comment #0)
The list for sign off:
- "full-task-graph.json" => ahal
I think it should be fine, though there could be uses of it that I'm unaware of. So adding Tom Prince for those.
I'm not aware of anything out-of-tree looking at this (and my searches of the obvious places it might happen didn't turn up anything); though :marco was considering using them, I think.
The main impact in-tree of removing this would be that we could no longer trigger tasks on old pushes. I don't know how common this is, though I've heard of people doing this anecdotally, potentially quite a ways back in history.
It would be fairly easy, I think (given that there does not appear to be out-of-tree consumers), to switch to compressing the artifact going forward.
Comment 22•4 years ago
(In reply to Tom Prince [:tomprince] from comment #21)
(In reply to Andrew Halberstadt [:ahal] from comment #5)
(In reply to Sylvestre Ledru [:Sylvestre] from comment #0)
The list for sign off:
- "full-task-graph.json" => ahal
I think it should be fine, though there could be uses of it that I'm unaware of. So adding Tom Prince for those.
I'm not aware of anything out-of-tree looking at this (and my searches of the obvious places it might happen didn't turn up anything); though :marco was considering using them, I think.
The main impact in-tree of removing this would be that we could no longer trigger tasks on old pushes. I don't know how common this is, though I've heard of people doing this anecdotally, potentially quite a ways back in history.
It would be fairly easy, I think (given that there does not appear to be out-of-tree consumers), to switch to compressing the artifact going forward.
I was thinking of using it to detect which tasks are manifest-level, but I could (and should, since it'd be slow to load the artifact for a large number of pushes) find another way.
Comment 23•4 years ago (Reporter)
I won't have time to work on it.
Joel is aware.
Comment 24•3 years ago (Reporter)
We will probably work on this at some point this year.
Comment 25•3 years ago
:mostlygeek asked me to add some more info to this bug:
The taskcluster k8s job to expire artifacts (on firefoxcitc, anyway) does not appear to be working. It's a k8s CronJob that runs every night at midnight (UTC). It's configured to kill itself after just under 24 hours, so it doesn't collide with the next run of the job (at least, that's my take on this discussion: https://bugzilla.mozilla.org/show_bug.cgi?id=1638921).
It does indeed run for 24 hours, then dies.
The only log messages emitted are Type: db-pool-counts with pool: write and pool: read (one each per minute). No other logs.
Comment 26•3 years ago
I ran this query against firefoxcitc's postgres database:
select date_trunc('day', expires) as expires_day, count(*) from queue_artifacts group by date_trunc('day', expires);
(which seems to be the updated version of the query Dustin and Brian were using in https://bugzilla.mozilla.org/show_bug.cgi?id=1638921)
There are 586,191,940 artifacts with an expiration date of Feb 28, 2022 or prior. Those should all have been deleted, I believe.
There are 282,490,852 artifacts with an expiration date in the future.
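(For illustration, the same check can be restated, hypothetically, as two buckets rather than per-day counts.)

-- Expired vs. not-yet-expired artifact counts; '2022-02-28' matches the cut-off above.
SELECT expires <= '2022-02-28' AS past_expiry, count(*)
FROM queue_artifacts
GROUP BY 1;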
Comment 27•3 years ago
To summarize: the cron job that deletes old artifacts and terminates after 24 hours has not had enough runtime to delete the 586 million artifacts? Is it possible to run the script multiple times, or hack it to run to completion?
Comment 28•3 years ago
No idea what's actually happening here. Logs aren't telling me anything. Could be the query isn't working at all. Or it is working and the deleting-from-s3 isn't? Or both?
Comment 29•3 years ago
It looks like the query (https://github.com/taskcluster/taskcluster/blob/66941613e4242d69dd2aff3fb560359eb633d59c/db/versions/0057.yml ?) has to do a full table scan for each part? I'm not really sure which variables are being passed in. But a full table scan takes about 30 minutes per query at the current size.
Here's what I see in the database at the moment:
taskcluster=> SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state FROM pg_stat_activity WHERE (now() - pg_stat_activity.query_start) > interval '1 minutes';
pid | duration | query | state
---------+-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------
1102833 | 00:18:18.235286 | select * from "get_queue_artifacts_paginated"(task_id_in => $1,run_id_in => $2,expires_in => $3,page_size_in => $4,after_task_id_in => $5,after_run_id_in => $6,after_name_in => $7) | active
(1 row)
So that one query (returning 100 rows I think?) is taking over 19 minutes (and counting). [EDIT: it finished after 21 minutes]
Comment 30•3 years ago
I think the above query can be boiled down to:
select task_id, run_id, name, storage_type, content_type, details, present, expires from queue_artifacts where expires < '2022-02-28 00:00:00' order by task_id, run_id, name limit 10;
which takes about 20 minutes.
An index may help here, although it would need testing/EXPLAINing on a realistic table. Something like:
CREATE INDEX "expiry_date_idx" ON "queue_artifacts" ("expires", "task_id", "run_id")
(or however taskcluster does DB migrations...)
Also, given that most of the database is currently "expired", perhaps dropping the expires from the WHERE clause and just ignoring records with a future expires date in the code would be easier?
Comment 31•3 years ago
I imagine the expires is important; could we maybe run this with expires < '2021-10-01' and then keep moving a month forward until we catch up?
Comment 32•3 years ago
Doesn't matter what date you put in the expires, it'll have to do a full table scan first to satisfy the order by clause at the end. And if you remove the order by then you break how taskcluster intends to paginate these results.
Comment 33•3 years ago
I'm not sure an index would help here - at best it might create a temporary table based on that index, sort it by taskId, and then begin returning results -- still not performant!
Also, given that most of the database is currently "expired", perhaps dropping the expires from the WHERE clause and just ignoring records with a future expires date in the code would be easier?
I think that is basically what the DB is doing now; moving that logic to JS wouldn't change it. I suspect that it's spending most of its time finding the first entry that it can delete, and repeating that work every day.
Also, note that this isn't just delete from .. where .. -- the critical bit is deleting the data from S3, as that's what costs $$!
I suspect that the underlying problem is that artifacts are being created more quickly than they can be deleted. The expiration code is in https://github.com/taskcluster/taskcluster/blob/main/services/queue/src/utils.js. It's basically operating in batches of 100, with each of those 100 handled in parallel. For each one, it first deletes the object from S3, and then if successful makes a DELETE query to the DB. The pagination should be using the (taskId, runId, name) index (the primary key) to find the next batch immediately, rather than scanning from the beginning of the table.
So I think this will need some measurement and observation, maybe by adding some logging. It'd be interesting to know:
- how many batches it gets through in 24h
- how long it takes to return each batch of IDs, and whether the batches after the first are returned more quickly than the first
- how long it takes to delete each object
- how long it takes to delete each row from the DB
I suspect all of that could be made pretty clear by adding some temporary debugging info, making a docker build, and then temporarily changing the image property of just the k8s CronJob to point to that docker build (so: no need to do a TC release containing the extra logging).
Comment 34•3 years ago
(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #33)
I'm not sure an index would help here - at best it might create a temporary table based on that index, sort it by taskId, and then begin returning results -- still not performant!
Maybe adding an index across expires, task_id, run_id, name AND adding expires to the beginning of the ORDER BY would help? Shrug.
Also, given that most of the database is currently "expired", perhaps dropping the expires in there WHERE clause and just ignoring records with a future expires date in the code would be easier?
I think that is basically what the DB is doing now; moving that logic to JS wouldn't change it. I suspect that it's spending most of its time finding the first entry that it can delete, and repeating that work every day.
Unfortunately it looks like each time it does a query it takes 20 minutes (based on me watching three of the queries this afternoon). Not just the first one but each one. That limits the total number of batches to maybe 72 per 24 hours, which would delete only 7,200 records per day.
In S3 I see around 9k-15k deletes total for the firefoxcitc bucket, which is close-ish to that number.
With the expires in the where clause, it seems to do a full table scan each time before sorting.
Without the expires, then each query should be able to use the index. The JS will essentially end up doing a full table scan, but only one. Instead of one per batch.
But yeah, I'm guessing. Testing would be great. And the logging you mention!
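(A hypothetical sketch of that alternative: lead both the index and the ORDER BY with expires, so the planner can walk expired rows in index order without a full scan or sort. Note the pagination cursor would then also need to carry expires, so it isn't a drop-in change for the service.)

-- Hypothetical: index and sort order both led by expires.
CREATE INDEX CONCURRENTLY queue_artifacts_expires_idx
  ON queue_artifacts (expires, task_id, run_id, name);

SELECT task_id, run_id, name
FROM queue_artifacts
WHERE expires < '2022-02-28'
ORDER BY expires, task_id, run_id, name
LIMIT 100;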
Comment 35•3 years ago
Would a multi-task approach help here?
Query task: DB query for expired artifacts after time t, publishes set of S3 artifacts in some format. YAML, maybe.
-> S3 deletion tasks: query task fans out to multiple chunked tasks that concurrently delete those artifacts from S3, publishes set of artifacts deleted. (This may be dependent on how much concurrency AWS will allow without throttling?)
DB cleanup task: removes those artifacts from the above fanned-out set of tasks from the DB. (This could also delete from the DB after the first task, as long as we have confidence that the fanned-out tasks will do the S3 deletion properly. If we have the ability to read the artifacts from the query task and S3 deletion tasks and find which artifacts haven't been deleted successfully, we could keep running that until success.)
Also wondering if we can mark the artifact as expired on S3 rather than deleting, to speed things up.
I'm not an expert here though; I'm just wondering if patterns from other releng automation might help here.
Comment 36•3 years ago
Unfortunately it looks like each time it does a query it takes 20 minutes
OK, that points pretty strongly toward a bug in the pagination, then. We had a few of those due to typos in the SQL functions, within the migration functions (migration functions do a similar kind of pagination). This is a difficult thing to test for since the behavior looks the same whether it does a full scan or starts in the middle; and with any reasonable amount of data even the timing differences are minuscule. So it's the kind of bug that could easily have snuck through.
The function is defined here.
Chris, do you want to do an EXPLAIN on that query, passing values for the after_* parameters ('abc', 0, 'foo' should do -- the values aren't part of the planning) as well as expires (but not task_id or run_id)? It's possible that the cardinalities are such that postgres is choosing a table scan because it knows most of the rows are expired, but doesn't know that the first couple of million aren't. But it's also possible the EXPLAIN will reveal some other error.
Comment 37•3 years ago
taskcluster=> explain select task_id, run_id, name, storage_type, content_type, details, present, expires from queue_artifacts
where
(queue_artifacts.expires < '2022-02-28 00:00:00' ) and
(queue_artifacts.task_id > 'abc' or
(queue_artifacts.task_id = 'abc' and
(queue_artifacts.run_id > 0 or
(queue_artifacts.run_id = 0 and
queue_artifacts.name > 'foo'
)
)
)
)
order by queue_artifacts.task_id, queue_artifacts.run_id, queue_artifacts.name
limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.82..57.51 rows=10 width=226)
-> Index Scan using queue_artifacts_pkey on queue_artifacts (cost=0.82..3469493153.92 rows=612072520 width=226)
Filter: ((expires < '2022-02-28 00:00:00+00'::timestamp with time zone) AND ((task_id > 'abc'::text) OR ((task_id = 'abc'::text) AND ((run_id > 0) OR ((run_id = 0) AND (name > 'foo'::text))))))
(3 rows)
So I guess not a table scan, but it does have to evaluate 612,072,520 rows?
Comment 38•3 years ago
I also suspect that increasing the instance's memory significantly will speed up these operations, but that involves a restart so it's tough to experiment with...
Comment 39•3 years ago
Hm, I think that the expectation is that the Index Scan would list an index condition -- one it can use to jump to a location in the index. Let me play around a bit..
Comment 40•3 years ago
I suspect Dustin is right in that the pagination isn't performing well. This is the last 24 hours and you can see the IO Wait go from [not much] right around 5pm Pacific (which is when the old pod is killed and the new one takes over) to increasing amounts over the day. And you can see the number of batches per hour drop as each query takes longer and longer.
Looking at the query timings in trace, the ones I can see start at just over 9 minutes around 8PM and increase fairly linearly to just over 20 minutes by 5PM the next day.
Comment 41•3 years ago
We do a similar thing here:
select task_id
from tasks
where
(state_in ->> 'task_id' is null or task_id > state_in ->> 'task_id') and
task_queue_id is null
order by task_id
limit batch_size_in
in a migration script, and it was a fair bit of debugging to make sure that was using an index scan and condition as well. The issue turned out to be a typo!
I can confirm:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.15..4.52 rows=10 width=173)
-> Index Scan using queue_artifacts_pkey on queue_artifacts (cost=0.15..54.32 rows=124 width=173)
Filter: ((task_id >= 'abc'::text) OR ((task_id = 'abc'::text) AND ((run_id >= 0) OR ((run_id = 0) AND (name >= 'foo'::text)))))
(3 rows)
But simplifying, it switches to an Index Cond when just specifying the taskId. Anything more complex (such as dropping the expires expression but keeping task_id, run_id, and name) still uses Filter:
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Limit (cost=0.15..3.90 rows=10 width=173)
-> Index Scan using queue_artifacts_pkey on queue_artifacts (cost=0.15..46.30 rows=123 width=173)
Index Cond: (task_id > 'abc'::text)
(3 rows)
in fact, it's happy to do both:
QUERY PLAN
------------------------------------------------------------------------------------------------------
Limit (cost=0.15..11.48 rows=10 width=173)
-> Index Scan using queue_artifacts_pkey on queue_artifacts (cost=0.15..46.61 rows=41 width=173)
Index Cond: (task_id > 'abc'::text)
Filter: (expires < '2022-02-28 00:00:00+00'::timestamp with time zone)
(4 rows)
(which means, use the index to find the task after 'abc', then scan looking for rows matching the filter)
So, it seems like this multi-key index expression is just not something the query planner can recognize. The docs for this are here and suggest that it only really understands conditionals joined with AND at the top level, whereas this one has an OR at the top level. So, let's try a slightly longer, and somewhat redundant, WHERE condition:
explain select task_id, run_id, name, storage_type, content_type, details, present, expires from queue_artifacts
where
queue_artifacts.task_id >= 'abc' and
(queue_artifacts.task_id > 'abc' or
(queue_artifacts.task_id = 'abc' and
(queue_artifacts.run_id > 0 or
(queue_artifacts.run_id = 0 and
queue_artifacts.name > 'foo'
)
)
)
)
order by queue_artifacts.task_id, queue_artifacts.run_id, queue_artifacts.name
limit 10;
Limit (cost=0.15..11.78 rows=10 width=173)
-> Index Scan using queue_artifacts_pkey on queue_artifacts (cost=0.15..47.84 rows=41 width=173)
Index Cond: (task_id >= 'abc'::text)
Filter: ((task_id > 'abc'::text) OR ((task_id = 'abc'::text) AND ((run_id > 0) OR ((run_id = 0) AND (name > 'foo'::text)))))
That is what the explain should be! The Index Cond will get to the right task using the index (which is close enough), and the filter will make sure we don't return any undesirable rows in the subsequent scan.
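(For reference, a hypothetical equivalent formulation: Postgres treats a row-value comparison as lexicographic, so it expresses the same keyset condition as the OR-chain above and the planner can use it directly as an Index Cond on the (task_id, run_id, name) primary key. This is not the change that was made; it is just another way to reach the same plan shape.)

-- Hypothetical equivalent of the OR-chain above using a row-value comparison.
SELECT task_id, run_id, name, storage_type, content_type, details, present, expires
FROM queue_artifacts
WHERE expires < '2022-02-28 00:00:00'
  AND (task_id, run_id, name) > ('abc', 0, 'foo')
ORDER BY task_id, run_id, name
LIMIT 10;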
Comment 42•3 years ago
We can check that this worked by patching the get_queue_artifacts_paginated stored function in-place, if you're comfortable doing so:
where
(queue_artifacts.task_id = task_id_in or task_id_in is null) and
(queue_artifacts.run_id = run_id_in or run_id_in is null) and
(queue_artifacts.expires < expires_in or expires_in is null) and
(after_task_id_in is null or
+ (queue_artifacts.task_id >= after_task_id_in and
(queue_artifacts.task_id > after_task_id_in or
(queue_artifacts.task_id = after_task_id_in and
(queue_artifacts.run_id > after_run_id_in or
(queue_artifacts.run_id = after_run_id_in and
queue_artifacts.name > after_name_in
)
)
)
)
+ )
)
If not (and patching DB functions in prod is not easy, so that's reasonable), perhaps someone on the TC team can make a new DB revision, test, release, and update. I'll be happy to review.
Comment 43•3 years ago
Great!
I'm going to say that one of the devs should do it and go through the release process. Thankfully, we're currently working on streamlining that process, so hopefully the wait won't be long!
NI :mostlygeek for assignment?/prioritization?/workflow? whatever you need to do...
Comment 44•3 years ago
Following up on this. We discussed this in the Firefox-CI meeting today. We'll look at getting this situation resolved in our next sprint/s. We'll start by defining what "done" looks like here.
Comment 45•3 years ago
Adding fix here https://github.com/taskcluster/taskcluster/pull/5268
Tested migrations locally and tested this query manually on staging, it is working (although on a way smaller set).
Comment 46•2 years ago
As we are working on expiring these artifacts sooner from now on (bug 1804446), the benefit of doing a cleanup will fade over time.
If nobody can work on the cleanup soon, I'd say we should WONTFIX this.
We should have a periodic run of the query though, to make sure we don't regress with other artifacts that could expire sooner.
Comment 47•1 year ago
This has been done recently (https://mozilla-hub.atlassian.net/browse/OPST-701).