Backfill regresses/regressed-by fields for older bugs
Categories
(bugzilla.mozilla.org :: Bulk Bug Edit Requests, task)
People
(Reporter: emceeaich, Assigned: marco)
References
(Blocks 1 open bug)
Details
(Whiteboard: [october-2019-bmo-triage])
Once the Regresses/Regressed-By fields from bug 1461492 go live, we should clean up the data in recent bugs so that we can give tools like BugBug useful training data for spotting regressions, and so that we can do other analyses.
Going through 20 years of bugs is not feasible, but we should look at the past year's regressions and get them updated.
There are ~5,800 regressions (https://mzl.la/2Xgy4e9) since 2018-01-01, and ~4,000 of them have the 'blocks' field set (it had been serving as the Regressed-By field), so we could update those.
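A minimal sketch of selecting those candidates, assuming the bugs come from a Bugzilla REST search. `creation_time` and `include_fields` are documented search parameters; passing `keywords` this way and the `regressed_by` field name are assumptions about BMO once bug 1461492 lands. Both function names are hypothetical, and every candidate would still need review, since "blocks" may also point at unrelated bugs.

```python
from urllib.parse import urlencode

BMO_REST = "https://bugzilla.mozilla.org/rest/bug"


def regression_search_url(since: str) -> str:
    """Hypothetical helper: URL listing regressions filed on or after `since`."""
    params = {
        "keywords": "regression",  # assumption: advanced-search style parameter
        "creation_time": since,  # bugs created at this time or later
        "include_fields": "id,blocks,regressed_by",  # regressed_by: assumed BMO field
    }
    return BMO_REST + "?" + urlencode(params)


def blocks_candidates(bugs: list[dict]) -> dict[int, list[int]]:
    """Map each regression that has "blocks" set but no Regressed-By yet
    to the bugs it blocks, i.e. its candidate regressors."""
    return {
        bug["id"]: list(bug["blocks"])
        for bug in bugs
        if bug.get("blocks") and not bug.get("regressed_by")
    }
```

`blocks_candidates` would be fed the `bugs` array of the search response; bugs that already have Regressed-By set are skipped so a rerun does not propose them again.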
Open questions:
- How far back to clean up?
- What about the existing regressions where we haven't found the change set that introduced the bug?
Comment 1•6 years ago (Assignee)
Assigning to myself, since I've been thinking about this and still am.
We need to be careful to update old Regresses/Regressed-By fields only when we are really sure, because we don't want to introduce noise (for ML and other tools, it's better to have missing data than wrong data).
Comment 2•6 years ago (Assignee)
A few ideas:
- Parse mozregression comments - This is bound to be almost 100% correct.
- Parse the uplift request comment - Given that the field in the template was "[Feature/Bug causing the regression]", we can't be sure that the bug ID here is actually a regressor; it could be the tracking bug for the feature.
- Parse comments to find "caused by bug XXX" / "regressed by bug XXX" - Not always correct, e.g. the phrase could be part of a question ("was this regressed by bug XXX?") or a denial ("no, this wasn't regressed by bug XXX!").
- Changes to "blocks"/"blocked_by" made at the same time as adding the "regression" keyword, setting "has_regression_range" to "yes", or removing "regressionwindow-wanted".
- Using the SZZ algorithm (very far from being 100% accurate).
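The comment-parsing idea, with the two false positives mentioned above, could be sketched as below. The regexes and the sentence-level filter are hypothetical heuristics, not an existing tool: any sentence ending in "?" or containing a simple negation word is skipped, which trades recall for precision.

```python
import re

# "caused by bug 123" / "regressed by bug 123" (case-insensitive).
PHRASE = re.compile(r"\b(?:caused|regressed)\s+by\s+bug\s+(\d+)", re.IGNORECASE)
# Crude negation detector for sentences like "no, this wasn't regressed by ...".
NEGATION = re.compile(r"\b(?:not|no|wasn't|isn't)\b", re.IGNORECASE)


def regressor_mentions(comment: str) -> set[int]:
    """Return bug IDs stated (not asked about or denied) as regressors."""
    found: set[int] = set()
    # Judge each sentence separately so a question elsewhere in the
    # comment does not discard an unrelated affirmative statement.
    for sentence in re.split(r"(?<=[.!?])\s+", comment):
        if sentence.rstrip().endswith("?") or NEGATION.search(sentence):
            continue
        found.update(int(bug_id) for bug_id in PHRASE.findall(sentence))
    return found
```

For example, "Was this regressed by bug 1? It was caused by bug 2." yields only bug 2.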
We can also mix some of these (e.g. parse mozregression comment and check that the mentioned bug is in "blocks"/"blocked_by").
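A sketch of that cross-check, assuming the bug comes as a Bugzilla REST bug object, where "blocked_by" is exposed as `depends_on`. `mentioned` can come from any of the parsers above (mozregression, uplift form, free-text comments); the function name is hypothetical.

```python
def confirmed_regressors(mentioned: set[int], bug: dict) -> list[int]:
    """Keep only the mentioned bug IDs that are also linked through
    "blocks" or "depends_on" (Bugzilla's name for blocked_by)."""
    linked = set(bug.get("blocks", [])) | set(bug.get("depends_on", []))
    return sorted(set(mentioned) & linked)
```

Requiring two independent signals to agree drops many true links, but that fits the goal of preferring missing data over wrong data.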
I'll try to check how many links we'd generate by applying these rules over the past 1-2 years. Maybe it'll be possible to review them manually.
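A rough sketch of the "blocks"/"blocked_by" history rule, usable for counting how many links it would generate. It assumes the change-set shape returned by Bugzilla's REST bug history endpoint (`changes` entries with `field_name`, `added`, `removed`, multi-valued fields as comma-separated strings), that "blocked_by" is `depends_on`, and that "has_regression_range" is stored as the BMO custom field `cf_has_regression_range`; the helper names are hypothetical.

```python
def _values(field_value: str) -> set[str]:
    """Split a comma-separated history value into its items."""
    return {v.strip() for v in (field_value or "").split(",") if v.strip()}


def history_regressor_candidates(history: list[dict]) -> set[int]:
    """Bugs added to blocks/depends_on in the same change set that marked
    the bug as a regression (or as having a regression range)."""
    candidates: set[int] = set()
    for change_set in history:
        signal = False
        linked: set[str] = set()
        for change in change_set["changes"]:
            name = change["field_name"]
            added = _values(change.get("added", ""))
            removed = _values(change.get("removed", ""))
            if name == "keywords" and (
                "regression" in added or "regressionwindow-wanted" in removed
            ):
                signal = True
            elif name == "cf_has_regression_range" and "yes" in added:
                signal = True
            elif name in ("blocks", "depends_on"):
                linked |= added
        if signal:
            candidates |= {int(b) for b in linked if b.isdigit()}
    return candidates


def count_links(histories: list[list[dict]]) -> int:
    """Total number of links the rule would propose across many bugs."""
    return sum(len(history_regressor_candidates(h)) for h in histories)
```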
Comment 4•5 years ago (Assignee)
I still plan to do this, though it's definitely not high priority.
Comment 5•3 years ago (Assignee)
(In reply to Marco Castelluccio [:marco] from comment #2)
Additionally, we can parse the answer to "If not all supported branches, which bug introduced the flaw?" from security approval request comments.
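A minimal sketch of extracting that answer, assuming (hypothetically) that it follows the question on the same line after a colon, or otherwise on the next line, and that it names regressors as "bug NNN". The function name is illustrative.

```python
import re

QUESTION = "If not all supported branches, which bug introduced the flaw?"


def security_form_regressors(comment: str) -> set[int]:
    """Bug IDs given as the answer to the security form's regressor question."""
    lines = [line.strip() for line in comment.splitlines()]
    for i, line in enumerate(lines):
        if line.startswith(QUESTION):
            # Answer on the same line ("...flaw?: Bug 123"), else the next line.
            answer = line[len(QUESTION):].lstrip(": ")
            if not answer and i + 1 < len(lines):
                answer = lines[i + 1]
            return {int(n) for n in re.findall(r"\bbug\s*(\d+)", answer, re.IGNORECASE)}
    return set()
```

Answers such as "None" or "N/A" produce an empty set, so the bug is simply left without a Regressed-By link.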
I've built a script to apply these rules to generate a list (bugs between 2017-04-29 and 2019-04-29), but it is pretty long (more than 1000 entries). I could go through them manually from time to time, but it's going to take a while (if I did 5 per day, it would take a year :D).
Comment 6•3 years ago
(In reply to Marco Castelluccio [:marco] from comment #5)
If given the bug ID list, I can write a script to run on the server that makes the changes automatically. It could also be done without sending out emails.
Comment 7•3 years ago (Assignee)
(In reply to David Lawrence [:dkl] from comment #6)
Unfortunately, not all the items in the list are correct, as those fields were all used inconsistently (except maybe the security form field, but that covers only a very small subset).
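Since the list cannot be trusted wholesale, a few cheap consistency checks could at least shrink the manual review before handing IDs over for a bulk update. This is a hypothetical filter, not part of any existing script: it drops self-links, pairs proposed in both directions, and pairs whose regressor was filed after the regression (a heuristic that assumes the regressing bug existed before its fix landed).

```python
from datetime import datetime


def sanity_filter(
    pairs: list[tuple[int, int]], created: dict[int, datetime]
) -> list[tuple[int, int]]:
    """pairs: (regression_id, regressor_id); created: bug ID -> creation time."""
    proposed = set(pairs)
    kept = []
    for regression, regressor in sorted(proposed):
        if regression == regressor:
            continue  # a bug cannot regress itself
        if (regressor, regression) in proposed:
            continue  # contradictory: each bug is proposed as the other's regressor
        if (
            regression in created
            and regressor in created
            and created[regressor] > created[regression]
        ):
            continue  # regressor filed after the regression was reported
        kept.append((regression, regressor))
    return kept
```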