Closed
Bug 945427
Opened 11 years ago
Closed 10 years ago
Upgrade Valgrind on build machines to 3.10.0
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gkw, Assigned: mshal)
References
Details
Attachments
(3 files)
(deleted),
patch
|
dustin
:
review+
bhearsum
:
checked-in+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
catlee
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
dustin
:
review+
mshal
:
checked-in+
|
Details | Diff | Splinter Review |
+++ This bug was initially created as a clone of Bug #797573 +++
Valgrind 3.9.0 is out. Perhaps we could upgrade to it?
Reporter | ||
Comment 1•11 years ago
|
||
njn, will this be a good step forward?
Flags: needinfo?(n.nethercote)
Comment 2•11 years ago
|
||
I don't think it'll make much difference. The main changes that are relevant for us:
- Improvements in the heuristics used for "possibly lost" leaks, but we use --show-possibly-lost=no, and I doubt the improvements are enough for us to use --show-possibly-lost=yes.
- Some minor optimizations.
It probably wouldn't hurt to be up-to-date, but it's certainly not urgent.
Flags: needinfo?(n.nethercote)
Comment 3•11 years ago
|
||
This doesn't block the general valgrind-on-tbpl work.
No longer blocks: valgrind-on-tbpl
Comment 4•11 years ago
|
||
Actually, I just found a reason to do this. Valgrind-3.9.0 has a flag --errors-for-leak-kinds=definite which you need to use in conjunction with --show-possibly-lost=no if you want the error counts to make sense, i.e. be zero when they should be zero. And we need that for bug 957410 to work.
BTW, this is silly behaviour on Valgrind's part (I just complained in https://bugs.kde.org/show_bug.cgi?id=307465), but we're stuck with it for now.
Blocks: 957410
Comment 5•11 years ago
|
||
catlee, is this something you could do? http://valgrind.org/downloads/valgrind-3.9.0.tar.bz2 is the tarball.
FWIW, bug 750856 was the bug for updating to 3.8.1, and might be a useful reference.
Flags: needinfo?(catlee)
Reporter | ||
Comment 6•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #5)
> FWIW, bug 750856 was the bug for updating to 3.8.1, and might be a useful
> reference.
From bug 750856 comment 24:
> Bug 797573 further upgraded Valgrind to a special 3.8.1 tarball for Mozilla.
Bug 797573 might be a better reference.
Comment 7•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #4)
> Actually, I just found a reason to do this. Valgrind-3.9.0 has a flag
> --errors-for-leak-kinds=definite which you need to use in conjunction with
> --show-possibly-lost=no if you want the error counts to make sense, i.e. be
> zero when they should be zero. And we need that for bug 957410 to work.
Per PhilippeW's comments at
https://bugs.kde.org/show_bug.cgi?id=307465#c13 maybe it would be
simpler to not use --show-possibly-lost=no and instead work with
--show-leak-kinds= and --errors-for-leak-kinds= since they make the
counting- and showing- behaviour directly controllable.
Note that those flags are new in 3.9.0 and will break runs done with
previous versions.
Comment 8•11 years ago
|
||
> Per PhilippeW's comments at
> https://bugs.kde.org/show_bug.cgi?id=307465#c13 maybe it would be
> simpler to not use --show-possibly-lost=no and instead work with
> --show-leak-kinds= and --errors-for-leak-kinds= since they make the
> counting- and showing- behaviour directly controllable.
Sure. --show-leak-kinds=definite is equivalent to --show-possibly-lost=no.
> Note that those flags are new in 3.9.0 and will break runs done with
> previous versions.
Hence this bug.
Comment 9•11 years ago
|
||
Sorry, what's the consensus here? Install vanilla valgrind-3.9.0, or do we need a custom build again?
Flags: needinfo?(catlee)
Reporter | ||
Comment 10•11 years ago
|
||
I think our custom build changes some time ago were cherry-picked after 3.8.1 was cut, from in Valgrind trunk which eventually made it into 3.9.0.
Thus, vanilla 3.9.0 *should* be enough.
Comment 11•11 years ago
|
||
Vanilla 3.9.0 please: http://valgrind.org/downloads/valgrind-3.9.0.tar.bz2
Thanks!
Comment 12•11 years ago
|
||
Build instructions, based on https://bugzilla.mozilla.org/show_bug.cgi?id=797573#c7:
wget http://valgrind.org/downloads/valgrind-3.9.0.tar.bz2
tar -jvxf valgrind-3.9.0.tar.bz2
cp valgrind-3.9.0/valgrind.spec .
mock_mozilla -r mozilla-centos6-x86_64 --init
mock_mozilla -r mozilla-centos6-x86_64 --install gcc mpfr
mock_mozilla -r mozilla-centos6-x86_64 --shell --unpriv 'mkdir -p /builds/rpmbuild/{SOURCES,SPECS}'
mock_mozilla -r mozilla-centos6-x86_64 --copyin --unpriv valgrind-3.9.0.tar.bz2 /builds/rpmbuild/SOURCES
mock_mozilla -r mozilla-centos6-x86_64 --copyin --unpriv valgrind.spec /builds/rpmbuild/SPECS
mock_mozilla -r mozilla-centos6-x86_64 --shell --unpriv --cwd '/builds/rpmbuild/SPECS' 'rpmbuild -ba valgrind.spec'
cp /builds/mock_mozilla/mozilla-centos6-x86_64/root/builds/rpmbuild/RPMS/x86_64/valgrind-3.9.0-1.x86_64.rpm ./
mock_mozilla -r mozilla-centos6-i386 --init
mock_mozilla -r mozilla-centos6-i386 --install gcc mpfr
mock_mozilla -r mozilla-centos6-i386 --shell --unpriv 'mkdir -p /builds/rpmbuild/{SOURCES,SPECS}'
mock_mozilla -r mozilla-centos6-i386 --copyin --unpriv valgrind-3.9.0.tar.bz2 /builds/rpmbuild/SOURCES
mock_mozilla -r mozilla-centos6-i386 --copyin --unpriv valgrind.spec /builds/rpmbuild/SPECS
mock_mozilla -r mozilla-centos6-i386 --shell --unpriv --cwd '/builds/rpmbuild/SPECS' 'rpmbuild -ba valgrind.spec'
cp /builds/mock_mozilla/mozilla-centos6-i386/root/builds/rpmbuild/RPMS/i686/valgrind-3.9.0-1.i686.rpm ./
I'm going to get these deployed onto the Puppet servers, then we can look at changing the packages used by the Valgrind builders.
Comment 13•11 years ago
|
||
...except I just noticed the new build instructions on https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/HowTo/Build_RPMs, I need to adjust a few things.
Assignee: nobody → bhearsum
Comment 14•11 years ago
|
||
OK. I have packages built now. Is it safe to upgrade at any time, or do we need to do some testing first?
Flags: needinfo?(n.nethercote)
Flags: needinfo?(gary)
Comment 15•11 years ago
|
||
No need for a .pp, because this is only installed at build time inside of mock.
Attachment #8365154 -
Flags: review?(dustin)
Updated•11 years ago
|
Attachment #8365154 -
Flags: review?(dustin) → review+
Updated•11 years ago
|
Attachment #8365154 -
Flags: checked-in+
Comment 16•11 years ago
|
||
In theory, upgrading should be safe. In practice, it wouldn't hurt to do a test run if that's not too hard :)
Flags: needinfo?(n.nethercote)
Comment 17•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #16)
> In theory, upgrading should be safe. In practice, it wouldn't hurt to do a
> test run if that's not too hard :)
Turns out that there's no way I can find to test this ahead of time =(. I'm going to land this Monday morning (eastern) and hope for the best. It's easy to back out if something goes wrong.
I think I can put something together to make it possible to test the next upgrade though...
Flags: needinfo?(gary)
Comment 18•11 years ago
|
||
If it's easy to back out, that sounds fine.
Comment 19•11 years ago
|
||
I think this is better than what we're doing, and should work based on this test using an existing version:
[root@bld-centos6-hp-004.build.scl1.mozilla.com ~]# yum install valgrind-3.8.1-1
Loaded plugins: security
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package valgrind.x86_64 1:3.8.1-1 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
==========================================================================================================================================================================
Package Arch Version Repository Size
==========================================================================================================================================================================
Installing:
valgrind x86_64 1:3.8.1-1 releng-public-CentOS6-x86_64 14 M
Transaction Summary
==========================================================================================================================================================================
Install 1 Package(s)
Total download size: 14 M
Installed size: 69 M
Is this ok [y/N]:
Can't be landed until 3.9.0-1 is deployed on Monday morning.
Attachment #8365333 -
Flags: review?(catlee)
Updated•11 years ago
|
Attachment #8365333 -
Flags: review?(catlee) → review+
Comment 20•11 years ago
|
||
Something urgent came up this morning and I wasn't able to land this. I'll try again tomorrow morning.
Comment 21•11 years ago
|
||
I got busy with other things again this morning. I'll be landing this around 1pm eastern today (really!)
Comment 22•11 years ago
|
||
I have to put this off again. Trees are closed for major issues. I'll try again tomorrow.
I'm very sorry for all the delays in getting this finished.
Comment 23•11 years ago
|
||
I landed this but had to back out due to bustage:
+ python2.7 ../src/mach valgrind-test
==544== Memcheck, a memory error detector
==544== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==544== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==544== Command: /builds/slave/m-in-l64-valgrind-000000000000/objdir/dist/bin/firefox http://127.0.0.1:37016/ -profile /tmp/tmp5sKQr1
==544==
==547==
==547== HEAP SUMMARY:
==547== in use at exit: 216,374 bytes in 634 blocks
==547== total heap usage: 791 allocs, 157 frees, 237,043 bytes allocated
==547==
==547== LEAK SUMMARY:
==547== definitely lost: 0 bytes in 0 blocks
==547== indirectly lost: 0 bytes in 0 blocks
==547== possibly lost: 1,101 bytes in 32 blocks
==547== still reachable: 215,273 bytes in 602 blocks
==547== suppressed: 0 bytes in 0 blocks
==547== Reachable blocks (those to which a pointer was found) are not shown.
==547== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==547==
==547== For counts of detected and suppressed errors, rerun with: -v
==547== ERROR SUMMARY: 32 errors from 32 contexts (suppressed: 6 from 6)
Xlib: extension "RANDR" missing on display ":2.0".
==544==
==544== HEAP SUMMARY:
==544== in use at exit: 860,001 bytes in 9,296 blocks
==544== total heap usage: 1,338,656 allocs, 1,329,360 frees, 955,339,648 bytes allocated
==544==
==544== LEAK SUMMARY:
==544== definitely lost: 0 bytes in 0 blocks
==544== indirectly lost: 4,800 bytes in 149 blocks
==544== possibly lost: 2,041 bytes in 57 blocks
==544== still reachable: 253,618 bytes in 2,090 blocks
==544== suppressed: 599,542 bytes in 7,000 blocks
==544== Reachable blocks (those to which a pointer was found) are not shown.
==544== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==544==
==544== For counts of detected and suppressed errors, rerun with: -v
==544== ERROR SUMMARY: 53 errors from 53 contexts (suppressed: 159 from 153)
TEST-UNEXPECTED-FAIL | valgrind-test | non-zero exit code
Full log is here: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64/1391013612/mozilla-inbound-linux64-valgrind-bm61-build1-build174.txt.gz
If you need a hand debugging this (eg, access to a machine), please let me know.
Flags: needinfo?(n.nethercote)
Comment 24•11 years ago
|
||
The problem is something to do with the highly confusing way that Valgrind counts leak errors. I know how to fix it for 3.9.0 -- it requires adding some new, 3.9.0-only flags to the Valgrind invocation. But I also don't understand why things work with 3.8.1.
I'll keep digging, but the good news is that comment 4 was wrong, and this doesn't block bug 957410 after all.
No longer blocks: 957410
Flags: needinfo?(n.nethercote)
Comment 25•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #24)
> The problem is something to do with the highly confusing way that Valgrind
> counts leak errors. I know how to fix it for 3.9.0 -- it requires adding
> some new, 3.9.0-only flags to the Valgrind invocation. But I also don't
> understand why things work with 3.8.1.
>
> I'll keep digging, but the good news is that comment 4 was wrong, and this
> doesn't block bug 957410 after all.
OK, thanks. Can you reassign this bug once you've figured out how to work around this?
Assignee: bhearsum → n.nethercote
Reporter | ||
Comment 26•11 years ago
|
||
If we do the next upgrade, please also have a build that has the Valgrind fix in:
https://bugzilla.mozilla.org/show_bug.cgi?id=970643#c49
as well - it helps eliminate asm.js false positives. (either in a new version or a custom 3.9.0 build with only that fix)
Reporter | ||
Comment 27•10 years ago
|
||
3.10.0 is out.
http://www.valgrind.org/downloads/current.html#current
Summary: Upgrade Valgrind on build machines to 3.9.0 → Upgrade Valgrind on build machines to 3.10.0
Comment 28•10 years ago
|
||
One compelling feature of Valgrind 3.10 is that it produces better stack traces. Compare this, from 3.9.0:
> ==19385== at 0x4C28A49: malloc (vg_replace_malloc.c:270)
> ==19385== by 0x782D757: moz_xmalloc (mozalloc.cpp:52)
> ==19385== by 0x84CF3F6: mozilla::net::nsHttpHandler::NewProxiedChannel(nsIURI*, nsIProxyInfo*, unsigned int, nsIURI*, nsIChannel**) (mozalloc.h:201)
> ==19385== by 0x83A9AAF: nsIOService::NewChannelFromURIWithProxyFlags(nsIURI*, nsIURI*, unsigned int, nsIChannel**) (nsIOService.cpp:591)
> ==19385== by 0x83A9C31: nsIOService::NewChannelFromURI(nsIURI*, nsIChannel**) (nsIOService.cpp:562)
with this, from 3.10.0:
> ==26147== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
> ==26147== by 0x4034684: moz_xmalloc (mozalloc.cpp:52)
> ==26147== by 0x7F390DA: operator new (mozalloc.h:201)
> ==26147== by 0x7F390DA: mozilla::net::nsHttpHandler::NewProxiedChannel(nsIURI*, nsIProxyInfo*, unsigned int, nsIURI*, nsIChannel**) (nsHttpHandler.cpp:1695)
> ==26147== by 0x7E1FF6A: nsIOService::NewChannelFromURIWithProxyFlags(nsIURI*, nsIURI*, unsigned int, nsIChannel**) (nsIOService.cpp:591)
> ==26147== by 0x7E200E9: nsIOService::NewChannelFromURI(nsIURI*, nsIChannel**) (nsIOService.cpp:562)
So I'd like to get it upgraded. I *think* the bustage that happened last time we tried upgrading to 3.9.0 (see comment 23) won't happen this time, because the way the Valgrind test job detects failures has changed.
bhearsum, do you think you could schedule this? Bug 1069034 will need to land first. It would be good if you could try this at a time that we're both online so we could troubleshoot any problems together, but unfortunately we're 10 hours apart so that might be difficult.
Flags: needinfo?(bhearsum)
Comment 29•10 years ago
|
||
Sorry for the delay in response. I won't be able to pick this up myself, but I pinged a few other folks that may have time soon.
Flags: needinfo?(bhearsum)
Comment 30•10 years ago
|
||
(In reply to Ben Hearsum [:bhearsum] from comment #29)
> Sorry for the delay in response. I won't be able to pick this up myself, but
> I pinged a few other folks that may have time soon.
Ok. Bug 1069034 just landed on mozilla-inbound, so this should be able to happen as soon as it has propagated to the other repos.
mshal, is this something you'd be able to do?
Flags: needinfo?(mshal)
Assignee | ||
Comment 32•10 years ago
|
||
I'm not sure how to test this beforehand, but the packages should already be up on mockbuild-repos and synced to s3.
Attachment #8499577 -
Flags: review?(dustin)
Comment 33•10 years ago
|
||
Comment on attachment 8499577 [details] [diff] [review]
puppet-bug945427
This looks good. You could test by manually adding the repo to a mock config and running a build manually. I think the big risk is that valgrind jobs start failing because of some issue with the upgraded version, but if it's been tested -- either by you or by developers -- then it's probably reasonable to just ship it.
Attachment #8499577 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 34•10 years ago
|
||
Comment on attachment 8499577 [details] [diff] [review]
puppet-bug945427
Guess we'll see what breaks...
https://hg.mozilla.org/build/puppet/rev/fe812b06762e
Attachment #8499577 -
Flags: checked-in+
Comment 35•10 years ago
|
||
I merged the puppet change to production.
Assignee | ||
Comment 36•10 years ago
|
||
Backed out because of:
ERROR: Command failed:
# ['/usr/bin/yum', '--installroot', '/builds/mock_mozilla/mozilla-centos6-x86_64/root/', 'install', 'bash', 'bzip2', 'coreutils', 'cpio', 'diffutils', 'fedora-release', 'findutils', 'gawk', 'gmp', 'libstdc++', 'ppl', 'cpp', 'gcc', 'gcc-c++', 'grep', 'gzip', 'info', 'make', 'patch', 'redhat-rpm-config', 'rpm-build', 'sed', 'shadow-utils', 'tar', 'unzip', 'util-linux', 'which', 'xz']
http://mockbuild-repos.pub.build.mozilla.org/custom/valgrind/x86_64/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404"
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: custom-valgrind. Please verify its path and try again
Assignee | ||
Updated•10 years ago
|
Attachment #8499577 -
Flags: checked-in+
Assignee | ||
Comment 37•10 years ago
|
||
Is it just not supposed to have the arch in the "baseurl" lines? So:
baseurl=%REPOROOT%/custom/valgrind
instead of:
baseurl=%REPOROOT%/custom/valgrind/x86_64
Assignee | ||
Comment 38•10 years ago
|
||
dustin figured out the problem was that I ran the "createrepo --update ." command in the wrong directory - it should be run in valgrind/i386 and valgrind/x86_64 instead of just valgrind. I fixed the mockbuild-repos and re-landed the puppet patch yesterday. It seems builds are starting to pick up the new valgrind:
https://tbpl.mozilla.org/php/getParsedLog.php?id=49679806&tree=Mozilla-Inbound&full=1
valgrind x86_64 1:3.10.0-1 custom-valgrind 16 M
Assignee | ||
Updated•10 years ago
|
Attachment #8499577 -
Flags: checked-in+
Comment 39•10 years ago
|
||
From the logs on inbound:
> ==31963== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
Woo! Thanks, mshal.
Updated•10 years ago
|
Assignee: n.nethercote → mshal
Assignee | ||
Comment 40•10 years ago
|
||
njn, is there anything else you need here or can we mark it fixed?
Flags: needinfo?(n.nethercote)
Comment 41•10 years ago
|
||
Looks good to me! Thank you.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(n.nethercote)
Resolution: --- → FIXED
Updated•7 years ago
|
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•