801955 - Run Valgrind builds on all mozilla-central-based trees (inbound, try, fx-team, etc)

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Description

•

12 years ago

We should adapt Valgrind tbpl builds to run on other branches. Some that I can think of, off the top of my head, include: * mozilla-inbound * fx-team * services-central and possibly: * mozilla-aurora * mozilla-beta * mozilla-release (any future ESR branch)

Ed Morley [:emorley]

Comment 1

•

12 years ago

I think we should stick to trunk for now - given that unless we backport fixes, the Valgrind runs on them are going to be red until everything works its way through the trains. At the minimum we should run them on: * mozilla-central * mozilla-inbound * fx-team * try ...but I suppose we could enable on all trunk repos if the load wasn't an issue. Also, to reduce load, we could enable them on Try, but not include in the default 'all' TryChooser group, meaning people have to request them specifically (if this is easily doable). Adjusting summary to mention Try (also, TBPL is just the dashboard, it's the buildbot scheduling that needs to be adjusted).

Blocks: 800435

Summary: Adapt Valgrind tbpl builds to run on other branches, e.g. mozilla-inbound, fx-team etc. → Run Valgrind builds on more trunk trees + Try

Phil Ringnalda (:philor)

Comment 2

•

12 years ago

You mean you Need a way to exclude jobs from Trychooser's "all"? Bug 691177.

Ed Morley [:emorley]

Comment 3

•

12 years ago

Ah yes that was the bug (I'm even CCed) :-)

Nicholas Nethercote [inactive]

Comment 4

•

12 years ago

Enabling on mozilla-inbound seems like the most important of those cases in comment 1. As for enabling it on try... I'm trying to imagine the use case whereby someone would decide that their patch needs to enable the Valgrind testing. It wouldn't hurt to have, but I suspect it wouldn't get used much.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 5

•

12 years ago

Also, bug 631838 is the bug to get Valgrind builds on Try.

Ed Morley [:emorley]

Comment 6

•

12 years ago

(In reply to Nicholas Nethercote [:njn] from comment #4) > As for enabling it on try... I'm trying to imagine the use case > whereby someone would decide that their patch needs to enable the Valgrind > testing. It wouldn't hurt to have, but I suspect it wouldn't get used much. The use-case is "something in the last 24 hours turned Valgrind red, but we don't know what and have no way to run on Try". Running on Try is normally a prerequisite for unhiding things on TBPL (though as mentioned in bug 800435, the suppression file does make this slightly different; though without Try and thus a simple way to bisect, I imagine new suppressed failures will just end of being filed and ignored).

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 7

•

12 years ago

> I imagine new suppressed failures will just end of being filed and ignored). We track them in bug 793882.

Ed Morley [:emorley]

Comment 8

•

12 years ago

(In reply to Gary Kwong [:gkw, :nth10sd] from comment #7) > > I imagine new suppressed failures will just end of being filed and ignored). > > We track them in bug 793882. Yeah I realise - and you've done an amazing job of getting everything filed and suppressed so we have Valgrind green - but I'm just wondering how sustainable it is for only a couple of people to have a working Valgrind environment to determine regression-ranges in the future? It just seems like running on Try would alleviate some of the load on you...

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 9

•

12 years ago

> Yeah I realise - and you've done an amazing job of getting everything filed > and suppressed so we have Valgrind green Thank you! I wouldn't have been able to do it without help from you and some of the other folks around here though. It's a team effort. > but I'm just wondering how > sustainable it is for only a couple of people to have a working Valgrind > environment to determine regression-ranges in the future? It just seems like > running on Try would alleviate some of the load on you... Yes, I agree. Now if only someone could make some magic and help us put Valgrind on Try.. :)

Nicholas Nethercote [inactive]

Comment 10

•

12 years ago

> The use-case is "something in the last 24 hours turned Valgrind red, but we > don't know what and have no way to run on Try". > > Running on Try is normally a prerequisite for unhiding things on TBPL Gotcha. Thanks!

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 11

•

12 years ago

Clarifying summary. This bug is the major blocker that blocks unhiding a green tree on TBPL.

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=631838

Summary: Run Valgrind builds on more trunk trees + Try → Run Valgrind builds on mozilla-inbound, fx-team and Try

Chris AtLee [:catlee]

Assignee

Comment 12

•

12 years ago

How about we run valgrind per-push on these branches?

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 13

•

12 years ago

(In reply to Chris AtLee [:catlee] from comment #12) > How about we run valgrind per-push on these branches? If Releng is fine with the load, I'm fine with it. :) The concern in the past was that it would take up too much resources, but now that they're on AWS we might be able to scale up easily. Adjusting summary.

Summary: Run Valgrind builds on mozilla-inbound, fx-team and Try → Run Valgrind builds on mozilla-inbound, fx-team and Try, per push

Chris AtLee [:catlee]

Assignee

Updated

•

12 years ago

Priority: -- → P3

Whiteboard: [valgrind]

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 14

•

12 years ago

Getting Valgrind green on tbpl is a Q4 goal for us, and this bug is a major blocker to that. Chris, may I know what else needs to be done to help move this bug forward?

Flags: needinfo?(catlee)

Chris AtLee [:catlee]

Assignee

Comment 15

•

12 years ago

(In reply to Gary Kwong [:gkw, :nth10sd] from comment #14) > Getting Valgrind green on tbpl is a Q4 goal for us, and this bug is a major > blocker to that. > > Chris, may I know what else needs to be done to help move this bug forward? There's certainly a cost associated with running these per-checkin. Running both linux-32 and linux-64 results in about 2 hours of compute time per push. Is it worth it? Are we catching good bugs with valgrind? Can we choose a different schedule (e.g. every 4 hours) that keeps it visible on the tree, but doesn't use as many resources?

Flags: needinfo?(catlee)

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 16

•

12 years ago

> Is it worth it? Are we catching good bugs with valgrind? Yes, we've found bug 795395, bug 795631 and bug 803386 - all initially marked security-sensitive (some still are!) > There's certainly a cost associated with running these per-checkin. Running > both linux-32 and linux-64 results in about 2 hours of compute time per push. This will benefit the sheriffs as they can be clear as to which checkin caused the error. > Can we choose a different schedule (e.g. every 4 hours) that keeps it > visible on the tree, but doesn't use as many resources? I'll let the sheriffs chime in on this. Ed, maybe you have some ideas?

Flags: needinfo?(bmo)

Phil Ringnalda (:philor)

Comment 17

•

12 years ago

That depends on whether we've done a complete 180 degree turn from the plan as laid out in bug 800435, which was that whenever it failed, we would just file a bug and push a suppression. If we have, and it's now to be something which may not ever be made to fail and a push which causes it to fail must be backed out, then we want it running on push on every push on every tree. If not, then we're perfectly happy with nightly-only on mozilla-central (well, unless we're also expected to determine who is at fault, along with filing the bug and pushing the suppression), and the only question is whether or not it has to be made visible in order to shift the burden of having to file the bug and push the suppression onto edmorley except when he's gone when we'd push it onto me or RyanVM.

Nicholas Nethercote [inactive]

Comment 18

•

12 years ago

> That depends on whether we've done a complete 180 degree turn from the plan > as laid out in bug 800435, which was that whenever it failed, we would just > file a bug and push a suppression. I think file-a-bug-and-push-a-suppression is probably the better way to go, for the following reasons. - Valgrind is resource-hungry even on the short test runs that we currently have enabled. We have bugs open to run it on much larger test suites, for which running on every push seems like a bad idea. - I suspect we'll get failures only moderately often. - A nice thing about Valgrind is that when it does complain it's usually not that hard to work out the cause -- i.e. not having the exact changeset that caused the problem probably isn't dire. Please speak up if you disagree with these reasons!

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 19

•

12 years ago

It's nice to almost have a consensus. I agree with philor and njn. If edmorley is fine with the plan to have Valgrind running on more branches once a day but not per push, and to file a bug and add a suppression each time one shows up, I'd say let's go with it. (Ed, please change the summary to remove "per push" as well)

Ed Morley [:emorley]

Comment 20

•

12 years ago

Going by comments 17-19, I think we should: 1) Just run once per day, but possibly on more trees (though how likely are say changes to browser/ in fx-team likely to cause Valgrind regressions?). 2) Unhide to start with, until we get an idea as to how often it's going to break. 3) Once we have the answer to #2, make a final call as to whether it stays unhidden, given the extra burden on (majority volunteer) sheriffs & the fact that people like the jetpack team manage to maintain their hidden builds pretty well. (And also the possible slippery slope of off-loading team-specific tree tasks onto sheriffs).

Flags: needinfo?(bmo)

Phil Ringnalda (:philor)

Comment 21

•

12 years ago

You forgot the most important slippery slope: right now, there are exactly zero visible builds which you can ignore and just push on top of unstarred unfiled undealt-with red or orange. Because we are not doing anything about these failures other than hiding them via the suppression file, there is absolutely no point in telling someone who has a patch to push which is blocking a beta but needs to land on the trunk first that they have to learn the (knowing us, undocumented, or documented where nobody knows how to find it) method for hiding this orange, so this will be the first visible build for which it's permissible to push on unstarred failure. And, why make it visible? What is there to be gained by that, in exchange for even more devaluation of the already deeply devalued tree rules? (And, followup, why are we discussing this in an ever-increasing number of different bugs, rather than a newsgroup?)

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 22

•

12 years ago

I happened to chance on another failure - suppressed at https://hg.mozilla.org/mozilla-central/rev/a50a77da3cf9 Let's aim to run on multiple branches and on Try first, if per-push is not recommended, we can not do that now, and when circumstances change in the future, we can revisit that decision another day.

Summary: Run Valgrind builds on mozilla-inbound, fx-team and Try, per push → Run Valgrind builds on mozilla-inbound, fx-team and Try

Nobody; OK to take it and work on it

Updated

•

11 years ago

Product: mozilla.org → Release Engineering

Nicholas Nethercote [inactive]

Updated

•

11 years ago

Blocks: 946005

Nicholas Nethercote [inactive]

Updated

•

11 years ago

Assignee: nobody → catlee

Nicholas Nethercote [inactive]

Comment 24

•

11 years ago

catlee volunteered to do this. As for exactly which trees need it, https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy#3.29_Runs_on_mozilla-central_and_all_trees_that_merge_into_it says: > 3) Runs on mozilla-central and all trees that merge into it > > Otherwise job failures when tree X merges into mozilla-central will not > be attributable to a single changeset, resulting in either tree closure > or backout of the entire merge (see requirement #2). > > When filing the release engineering bug to enable your job on all the > required trees, ask to enable it on "mozilla-central based trees" and > release engineering will enable it in the default config from which all > trunk trees inherit (unless the various tree owners have explicitly opted > out). As a rough guide, mozilla-central based trees include > mozilla-inbound, fx-team, services-central, ionmonkey, graphics as well > as many of the other project/disposable repositories.

Nicholas Nethercote [inactive]

Updated

•

11 years ago

Summary: Run Valgrind builds on mozilla-inbound, fx-team and Try → Run Valgrind builds on all mozilla-central-based trees (inbound, try, fx-team, etc)

Chris AtLee [:catlee]

Assignee

Comment 25

•

11 years ago

Attached patch enable valgrind builds everywhere (deleted) — Details — Splinter Review

this changes the default on project branches to True. All the release branches are set as False already, so this patch ends up enabling valgrind builds on all non-release branches.

Attachment #8343717 - Flags: review?(jhopkins)

John Hopkins (:jhopkins)

Updated

•

11 years ago

Attachment #8343717 - Flags: review?(jhopkins) → review+

Chris AtLee [:catlee]

Assignee

Comment 26

•

11 years ago

Comment on attachment 8343717 [details] [diff] [review] enable valgrind builds everywhere https://hg.mozilla.org/build/buildbot-configs/rev/dcc6288a8d29

Attachment #8343717 - Flags: checked-in+

bhearsum@mozilla.com (:bhearsum)

Comment 27

•

11 years ago

In production

Nicholas Nethercote [inactive]

Comment 28

•

11 years ago

It's working on m-c, but not on m-i, for example: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&showall=1&jobname=valgrind There are multiple of these in the log: abort: unknown revision '6ecda297fb371a680efcd49e4a90e8164c561498'! philor suggested it's because m-c is being cloned?

Chris AtLee [:catlee]

Assignee

Comment 29

•

11 years ago

Yes, m-c is being cloned because of http://hg.mozilla.org/build/tools/file/d21da1c08036/scripts/valgrind/valgrind.sh#l7 I'm working on a patch

Chris AtLee [:catlee]

Assignee

Comment 30

•

11 years ago

Attached patch Use build properties to determine which repo to clone (obsolete) (deleted) — Details — Splinter Review

Attachment #8344843 - Flags: review?(nthomas)

Nick Thomas [:nthomas] (UTC+12)

Comment 31

•

11 years ago

Comment on attachment 8344843 [details] [diff] [review] Use build properties to determine which repo to clone r+ on the later version from IRC (https://diff.pastebin.mozilla.org/3744166) which >diff --git a/scripts/valgrind/valgrind.sh b/scripts/valgrind/valgrind.sh >+ wget -O branches.json $BRANCHES_JSON swaps this to the copy of tools we're running out of: BRANCHES_JSON=$SCRIPTS_DIR/buildfarm/maintenance/production-branches.json HG_REPO=$($JSONTOOL -k ${branch}.repo branches.json) and no wget. >+ echo python $SCRIPTS_DIR/clobberer/clobberer.py -s scripts -s $(basename $PROPERTIES_FILE) \ >+ $CLOBBERER_URL $branch "$builder" $builddir $slavename $master) And removed this echo.

Attachment #8344843 - Flags: review?(nthomas) → review+

Chris AtLee [:catlee]

Assignee

Comment 32

•

11 years ago

Attached patch Use build properties to determine which repo to clone (deleted) — Details — Splinter Review

Use the production-branches.json we already have! Also, remove extra echos.

Attachment #8344843 - Attachment is obsolete: true

Attachment #8344865 - Flags: review?(nthomas)

Nick Thomas [:nthomas] (UTC+12)

Updated

•

11 years ago

Attachment #8344865 - Flags: review?(nthomas) → review+

Chris AtLee [:catlee]

Assignee

Updated

•

11 years ago

Attachment #8344865 - Flags: checked-in+

Nicholas Nethercote [inactive]

Comment 33

•

11 years ago

Appears to be working -- Valgrind is running successfully (well, failing, but not due to buildbot config issues) on mozilla-inbound and fx-team. If I do a |-p all| push to try, will Valgrind run there?

Nicholas Nethercote [inactive]

Updated

•

11 years ago

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

6 years ago

Component: General Automation → General

enable valgrind builds everywhere 11 years ago Chris AtLee [:catlee] (deleted), patch	jhopkins : review+ catlee : checked-in+	Details \| Diff \| Splinter Review
Use build properties to determine which repo to clone 11 years ago Chris AtLee [:catlee] (deleted), patch	nthomas : review+	Details \| Diff \| Splinter Review
Use build properties to determine which repo to clone 11 years ago Chris AtLee [:catlee] (deleted), patch	nthomas : review+ catlee : checked-in+	Details \| Diff \| Splinter Review