Open Bug 1633927 Opened 5 years ago Updated 5 years ago

54.02% build times (linux64) regression on push 655d98fff192e4733f3317e233efcc6193534872 (Fri April 24 2020)

Categories

(Testing :: General, defect, P3)

Tracking

(firefox-esr68 unaffected, firefox75 unaffected, firefox76 unaffected, firefox77 fix-optional, firefox78 affected)

People

(Reporter: marauder, Unassigned)

References

(Regression)

Details

(Keywords: perf-alert, regression)

Perfherder has detected a performance regression from push 655d98fff192e4733f3317e233efcc6193534872. Since you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

54% build times linux64 opt taskcluster-c5d.4xlarge valgrind 1,039.40 -> 1,600.85
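
For reference, the 54.02% figure in the bug title follows directly from the two values in the alert line. A minimal sketch of that arithmetic, assuming the values are average build times in seconds (the alert does not state the unit) and using a hypothetical helper name `percent_change`:

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100.0


# Values copied from the alert line; the unit (seconds) is an assumption.
baseline = 1039.40
regressed = 1600.85

print(f"{percent_change(baseline, regressed):.2f}%")  # prints 54.02%
```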

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Component: Performance → DOM: Content Processes
Flags: needinfo?(nika)
Product: Testing → Core
Version: Version 3 → unspecified

Looking into the issue. Not sure yet how my patch could've caused the average build time to stabilize at the worst-case scenario.

I retriggered the build on that push and on the one that follows. Both ended up with "normal" build times. So what happened is that the push in question changed some central headers, which caused most things to rebuild because of cache misses. Then something else happened: those builds were made less frequent. The indirect result is that they get fewer cache hits. That's something to keep in mind when we change task scheduling.

Joel, do you know which bug made the valgrind builds less frequent? That seems to have happened this week, but a quick Bugzilla search didn't show anything. Also, what would the right component be for this bug?

Flags: needinfo?(nika) → needinfo?(jmaher)

Check out bug 1621764: we reduced builds that don't run tests on autoland so that they run only on every 10th autoland push.

Flags: needinfo?(jmaher)

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #3)

> check out bug 1621764, we reduced builds that don't run tests on autoland to be run every 10th push on autoland.

Except the valgrind builds are tests.

Component: DOM: Content Processes → General
Product: Core → Testing
Regressed by: 1621764
No longer regressed by: 1580565

Is there anything to change here? We build 90% less often, but each build takes about 50% longer. I recommend WONTFIX, but I'm open to hearing other suggestions.
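
As a rough check of that tradeoff, the sketch below compares total build time when building on every push at the old average against building on every 10th push at the new average. It assumes the two averages from the alert are in seconds and hold for every build; the push count and the helper name `total_build_time` are hypothetical, chosen only for illustration.

```python
def total_build_time(pushes: int, every_nth: int, seconds_per_build: float) -> float:
    """Total time spent building when only every `every_nth` push is built."""
    builds = pushes // every_nth
    return builds * seconds_per_build


pushes = 100  # hypothetical number of autoland pushes

# Old schedule: build every push at the old average time.
before = total_build_time(pushes, every_nth=1, seconds_per_build=1039.40)
# New schedule: build every 10th push at the regressed average time.
after = total_build_time(pushes, every_nth=10, seconds_per_build=1600.85)

saving = (1 - after / before) * 100.0
print(f"total build time reduced by {saving:.1f}%")  # prints 84.6%
```

Under these assumptions the slower individual builds are far outweighed by running them less often, which is the argument for WONTFIX; the sketch says nothing about whether valgrind should follow the build policy or the test policy.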

Whatever policy applies to tests should apply to valgrind, not the policy for builds.

The criterion for running on every 10th push is that a task has a low likelihood of finding regressions; running it at reduced frequency still lets it find regressions before merging to m-c. This build time regression is an infrastructure-only issue, not a regression that would cause, or need to cause, a backout to keep Nightly green. Once a task yields a few unique regressions, it becomes higher value and we want to ensure it runs more frequently. Many tasks have no history of finding a unique regression (they may fail as part of a regression that another platform or task easily finds), and those are the ones we target for reduced frequency.

Ideally, we could separate the build from the tests so that this confusion over what is a build and what is a test wouldn't happen as often.

Severity: -- → S4
Priority: -- → P3