1529764 - Backfill regresses/regressed-by fields for older bugs

Emma Humphries ☕️🎸🧞‍♀️✨ (she/they) [:emceeaich] (Pacific Time) use needinfo

Reporter

Description

•

6 years ago

Once the Regresses/Regressed-By fields in bug 1461492 go live, we should clean up data in recent bugs so we can provide tools like BugBug with useful training data for spotting regressions, and do other analysis.

Going through 20 years of bugs is not feasible, but we should look at the past year's regressions and get them updated.

There are ~5,800 regressions (https://mzl.la/2Xgy4e9) since 2018-01-01. ~4,000 of them have the 'blocks' field set (which had been used as the regressed-by field), so we could update those.

So questions:

How far back to clean up?
What about the existing regressions where we haven't found the change set that introduced the bug?

Flags: needinfo?(mcastelluccio)

Kohei Yoshino

Updated

•

6 years ago

Blocks: cleanup-bugzilla

Kohei Yoshino

Updated

•

6 years ago

No longer blocks: 1461492

Depends on: 1461492

Marco Castelluccio [:marco]

Assignee

Comment 1

•

6 years ago

Assigning to myself because I've been thinking about this and I'm still thinking about this.

We need to be careful to only update old Regresses/Regressed-By fields when we are really sure, because we don't want to introduce noise (for ML and other tools it's better to have missing data than wrong data).

Assignee: nobody → mcastelluccio

Status: NEW → ASSIGNED

Flags: needinfo?(mcastelluccio)

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1543876

Marco Castelluccio [:marco]

Assignee

Comment 2

•

6 years ago

A few ideas:

Parse mozregression comments - This is bound to be almost 100% correct.
Parse the uplift request comment - Given that the field in the template was "[Feature/Bug causing the regression]", we can't be sure that the bug ID here is actually a regressor, it could be the tracking bug for the feature.
Parse comments to find "caused by bug XXX" / "regressed by bug XXX" - Bound to be not always correct, e.g. could be part of a question "was this regressed by bug XXX?" or "no, this wasn't regressed by bug XXX!".
Changes to "blocks"/"blocked-by" at the same time as adding "regression" or setting "has_regression_range" to "yes" or removing "regressionwindow-wanted".
Using the SZZ algorithm (very far from being 100% accurate).

We can also mix some of these (e.g. parse mozregression comment and check that the mentioned bug is in "blocks"/"blocked_by").

I'll try to check how many links we'd generate by applying these rules over the past 1-2 years. Maybe it'll be possible to review them manually.
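As a rough illustration of the "caused by bug XXX" / "regressed by bug XXX" idea, here is a minimal Python sketch. It is not the script that was actually used. The phrase list, the negation cues, and the function names (`split_sentences`, `candidate_regressors`) are assumptions made up for this example. It skips sentences that are questions or that contain a negation, which covers the two false-positive cases mentioned above, but it will still miss plenty of real-world wording.

```python
import re

# Phrases that hint at a regressor. The wording list is an illustrative assumption.
REGRESSOR_RE = re.compile(
    r"\b(?:caused|regressed|introduced)\s+by\s+bug\s+(\d+)", re.IGNORECASE
)
# Crude negation cues ("no", "not", "never", "wasn't", ...). Also an assumption.
NEGATION_RE = re.compile(r"\b(?:no|not|never)\b|n't\b", re.IGNORECASE)


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (very rough)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def candidate_regressors(comment_text):
    """Return bug IDs named as regressors in affirmative, non-question sentences."""
    found = []
    for sentence in split_sentences(comment_text):
        if sentence.endswith("?"):
            continue  # "was this regressed by bug XXX?"
        if NEGATION_RE.search(sentence):
            continue  # "no, this wasn't regressed by bug XXX!"
        for match in REGRESSOR_RE.finditer(sentence):
            bug_id = int(match.group(1))
            if bug_id not in found:
                found.append(bug_id)
    return found
```

Candidates found this way would still need the manual review mentioned above, or a cross-check against "blocks"/"blocked_by", before anything is written back.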

Emma Humphries ☕️🎸🧞‍♀️✨ (she/they) [:emceeaich] (Pacific Time) use needinfo

Reporter

Updated

•

6 years ago

Summary: Clean up Regression Information → Backfill regresses/regressed-by fields for older bugs

Emma Humphries ☕️🎸🧞‍♀️✨ (she/they) [:emceeaich] (Pacific Time) use needinfo

Reporter

Updated

•

5 years ago

Whiteboard: [october-2019-bmo-triage]

Emma Humphries ☕️🎸🧞‍♀️✨ (she/they) [:emceeaich] (Pacific Time) use needinfo

Reporter

Comment 3

•

5 years ago

Marco, is this still needed?

Flags: needinfo?(mcastelluccio)

Marco Castelluccio [:marco]

Assignee

Comment 4

•

5 years ago

I still plan to do this, it's definitely not high priority though.

Flags: needinfo?(mcastelluccio)

Marco Castelluccio [:marco]

Assignee

Comment 5

•

3 years ago

(In reply to Marco Castelluccio [:marco] from comment #2)

A few ideas:

Parse mozregression comments - This is bound to be almost 100% correct.

Parse the uplift request comment - Given that the field in the template was "[Feature/Bug causing the regression]", we can't be sure that the bug ID here is actually a regressor, it could be the tracking bug for the feature.

Parse comments to find "caused by bug XXX" / "regressed by bug XXX" - Bound to be not always correct, e.g. could be part of a question "was this regressed by bug XXX?" or "no, this wasn't regressed by bug XXX!".

Changes to "blocks"/"blocked-by" at the same time as adding "regression" or setting "has_regression_range" to "yes" or removing "regressionwindow-wanted".

Using the SZZ algorithm (very far from being 100% accurate).

We can also mix some of these (e.g. parse mozregression comment and check that the mentioned bug is in "blocks"/"blocked_by").

I'll try to check how many links we'd generate by applying these rules over the past 1-2 years. Maybe it'll be possible to review them manually.

Additionally, we can parse the "If not all supported branches, which bug introduced the flaw?" field from security request comments.

I've built a script to apply these rules to generate a list (bugs between 2017-04-29 and 2019-04-29), but it is pretty long (more than 1000 entries). I could go through them manually from time to time, but it's going to take a while (if I did 5 per day, it would take a year :D).
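For reference, the rule about "blocks"/"blocked-by" changing at the same time as the regression markers could look roughly like the sketch below. This is not the actual script, only a minimal illustration. It assumes history entries shaped like the ones returned by the Bugzilla REST history endpoint (a list of entries, each with a `changes` list of `field_name` / `removed` / `added`), and it assumes the field names `keywords`, `blocks`, `depends_on` and `cf_has_regression_range`. The helper names are made up for the example.

```python
def split_list(value):
    """Split a comma-separated history value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def regressors_from_history(history):
    """Return (field, bug_id) pairs added in the same change as a regression marker."""
    candidates = []
    for entry in history:
        # Index this entry's changes by field name (assumed shape, see above).
        changes = {c["field_name"]: c for c in entry["changes"]}
        keywords = changes.get("keywords", {})
        has_range = changes.get("cf_has_regression_range", {})
        marked = (
            "regression" in split_list(keywords.get("added", ""))
            or "regressionwindow-wanted" in split_list(keywords.get("removed", ""))
            or has_range.get("added") == "yes"
        )
        if not marked:
            continue
        for field in ("blocks", "depends_on"):
            added = split_list(changes.get(field, {}).get("added", ""))
            candidates.extend((field, int(i)) for i in added if i.isdigit())
    return candidates
```

The output is only a list of candidates. Since these fields were used inconsistently, each pair would still have to be checked by hand before being turned into a Regresses/Regressed-By link.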

David Lawrence [:dkl]

Comment 6

•

3 years ago

(In reply to Marco Castelluccio [:marco] from comment #5)

Additionally, we can parse the "If not all supported branches, which bug introduced the flaw?" field from security request comments.

I've built a script to apply these rules to generate a list (bugs between 2017-04-29 and 2019-04-29), but it is pretty long (more than 1000 entries). I could go through them manually from time to time, but it's going to take a while (if I did 5 per day, it would take a year :D).

If given the bug ID list, I can generate a script to run on the server to make the changes automatically. It could also be done in a way where emails are not sent out.

Marco Castelluccio [:marco]

Assignee

Comment 7

•

3 years ago

(In reply to David Lawrence [:dkl] from comment #6)

(In reply to Marco Castelluccio [:marco] from comment #5)

Additionally, we can parse the "If not all supported branches, which bug introduced the flaw?" field from security request comments.

I've built a script to apply these rules to generate a list (bugs between 2017-04-29 and 2019-04-29), but it is pretty long (more than 1000 entries). I could go through them manually from time to time, but it's going to take a while (if I did 5 per day, it would take a year :D).

If given the bug ID list, I can generate a script to run on the server to make the changes automatically. It could also be done in a way where emails are not sent out.

Unfortunately not all the items in the list are correct, as those fields were all used inconsistently (except maybe the security form one, but it is a very small subset).

Categories

(bugzilla.mozilla.org :: Bulk Bug Edit Requests, task)

Tracking

()

People

(Reporter: emceeaich, Assigned: marco)

References

(Blocks 1 open bug)

Details

(Whiteboard: [october-2019-bmo-triage])

Crash Data

Security

(public)
