adjust Treeherder FailureSummaryTab bug SuggestionList to only show simplified intermittent bugs instead of all related bugs
Categories
(Tree Management :: Treeherder: Frontend, task)
Tracking
(Not tracked)
People
(Reporter: jmaher, Assigned: jmaher)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
We are working to simplify the bugs that we have to track intermittent failures to be per test case, not per error.
As we will be starting this with a subset of tests, we could change this at any point of time. In an ideal situation we would need a simple change to treeherder and it would work with the existing code and then new bugs as they are available.
Keep in mind, there is a cycle to ingest the bugs (I think hourly, but lets say 24 hours), and then there is a bugscache that treeherder has which lasts for 24 hours. So any change made to bugs will take from an hour to 48 hours to be seen by the sheriffs.
I did a hack locally:
diff --git a/ui/shared/tabs/failureSummary/FailureSummaryTab.jsx b/ui/shared/tabs/failureSummary/FailureSummaryTab.jsx
index 7df2187df..df8538a95 100644
--- a/ui/shared/tabs/failureSummary/FailureSummaryTab.jsx
+++ b/ui/shared/tabs/failureSummary/FailureSummaryTab.jsx
@@ -74,6 +74,35 @@ class FailureSummaryTab extends React.Component {
}
BugSuggestionsModel.get(selectedJob.id).then(async (suggestions) => {
suggestions.forEach((suggestion) => {
+ let simpleCase = [];
+ if (suggestion.bugs.open_recent.length > 0) {
+ suggestion.bugs.open_recent.forEach((bug) => {
+ if (bug.summary.endsWith('single tracking bug')) {
+ simpleCase = [bug];
+ }
+ });
+ }
+ if (suggestion.bugs.all_others.length > 0) {
+ suggestion.bugs.all_others.forEach((bug) => {
+ if (bug.summary.endsWith('single tracking bug')) {
+ simpleCase = [bug];
+ }
+ });
+ }
+
+ // HACK: use the simple case if found.
+ if (simpleCase.length > 0) {
+ suggestion.bugs.open_recent = simpleCase;
+
+ // HACK: remove the error message from the error line to avoid confusion
+ suggestion.search =
+ suggestion.search.split(suggestion.path_end)[0] +
+ suggestion.path_end;
+
+ // HACK: remove any other bugs, keep this simple.
+ suggestion.bugs.all_others = [];
+ }
+
suggestion.bugs.too_many_open_recent =
suggestion.bugs.open_recent.length > thBugSuggestionLimit;
suggestion.bugs.too_many_all_others =
this is a bit hacky, but I imagine it would be better to make this a bit more robust. I am not sure if this is the right place to edit the hack. All the queries and matching is done in the backend in python which I had trouble setting up locally.
I am attaching a screenshot of what this looks like when running on localhost.
Assignee | ||
Comment 1•3 years ago
|
||
:aryx and :marco, do you have thoughts on this?
Comment 2•3 years ago
|
||
Comment 3•3 years ago
|
||
The hack sounds reasonable to me, this way we can easily remove it if the experiment doesn't bear the expected fruit.
A question: does this exclude timeout failures?
Assignee | ||
Comment 4•3 years ago
|
||
this hack only looks for bugs with the summary ending with single tracking bug
and of course matching the test name. If those conditions are not met, then nothing changes within treeherder. To avoid that, we would need to adjust the hack to only affect error lines starting with TEST-UNEXPECTED-FAIL
- to do this in the front end code, we would need to examine the suggestion.search field and flag for keywords like crash, leak, timeout.
If we file a bug for a test case and that test case has TEST-UNEXPECTED-FAIL
and TEST-TIMEOUT
conditions, we probably would match the new failure. Is that so bad? I see a few scenarios as end results:
- we get too many
TEST-UNEXPECTED-FAIL
but slightly higher numbers due to extraTEST-TIMEOUT
instances and disable the test - this is what would normally happen - we get too many
TEST-TIMEOUT
instances and disable the test, but it would be disabled faster due to the higher numbers by adding inTEST-UNEXPECTED-FAIL
.
In summary, we will have more tests be flagged for attention or disabling due to aggregating all failures. Ideally we will have tools/data for developers to see the types of failures and where they occur so debugging can take place.
Comment 5•3 years ago
|
||
Code looks reasonable. Please be aware it's TEST-UNEXPECTED-TIMEOUT
and let me know if the backend shall be changed to handle test failure and timeout differently.
Assignee | ||
Comment 6•3 years ago
|
||
I don't see a need to change the backend, longer term we will want to do that, but for this I don't see a need to do it.
Let me see if I can update this to do the old way for TIMEOUT|CRASH|LEAK
Assignee | ||
Comment 7•3 years ago
|
||
deployed to treeherder production
Description
•