Closed
Bug 1220658
Opened 9 years ago
Closed 9 years ago
Upgrade ec2 test instances mesa versions to mesa-lts-saucy-9.2.1
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jrmuizel, Unassigned)
References
(Depends on 1 open bug)
Details
Attachments
(7 files, 3 obsolete files)
* (deleted), patch
* (deleted), patch (jmaher: review+)
* (deleted), patch (dustin: review+, Callek: checked-in-)
* (deleted), patch
* (deleted), patch
* (deleted), patch (Callek: review+, rail: checked-in+)
* (deleted), text/x-review-board-request (rail: review+, dustin: checked-in+)
There's a bug in mesa 8.0.4 that prevents us from updating the WebGL conformance tests. The bug has been fixed in mesa-lts-saucy-9.2.1, which I believe is the current version available for 12.04. I've ported the patch from bug 975034 to 9.2.1.
Updated•9 years ago
Attachment #8681965 -
Attachment is patch: true
Attachment #8681965 -
Attachment mime type: text/x-patch → text/plain
Flags: needinfo?(rail)
Comment 2•9 years ago
We use mesa 8.0.4 (see http://hg.mozilla.org/build/puppet/file/tip/modules/packages/manifests/mesa.pp) with a patch from you ;) see http://hg.mozilla.org/build/puppet/file/tip/modules/packages/manifests/mesa-debian/patches/moz-fix-llvmpipe
To upgrade to mesa-lts-saucy-9.2.1 (which can't be found at http://packages.ubuntu.com/search?suite=precise&section=all&arch=any&keywords=mesa-lts-saucy&searchon=names, but is available in some PPAs), I'd like to clear the following first:
1) jmaher may want to know about the upgrade so he can watch how it affects other tests
2) ahal has been working on porting our tests to docker/taskcluster. Maybe we are close enough that we can wait and test the changes with a try push using a different docker image
It will also require some releng time to prep the packages.
Comment 3•9 years ago
Oh, fun stuff! Adding Armen as he will be helping :ahal with the porting to taskcluster.
I assume we will need a way to test this on try prior to deployment. We still have about 60 unique failures in TaskCluster land, but that is already using a custom docker image to install compiz and some fonts.
Is there a way to upgrade this dynamically in the job for buildbot (or current automation)? There is a long roundabout way for TaskCluster, but it takes a lot of patience. I would be curious to know what other tests have problems with this.
Comment 4•9 years ago
(In reply to Joel Maher (:jmaher) from comment #3)
> Is there a way to upgrade this dynamically in the job for buildbot (or
> current automation)?
Not for tests. :(
Reporter
Comment 5•9 years ago
The package can be found here:
http://packages.ubuntu.com/search?suite=precise-updates&arch=any&searchon=names&keywords=mesa
Comment 6•9 years ago
I thought it would be faster if I built the packages with the attached patch so someone can test installing them.
* The packages are deployed in a separate repo at https://releng-puppet2.srv.releng.scl3.mozilla.com/repos/apt/custom/mesa-lts-saucy
* I have an untested puppet patch which should do the trick: https://gist.github.com/rail/8872d93c9bec47bb8ba7
* The packages cannot be installed side by side with the old ones (they are marked as conflicting).
* libglu1-mesa has been removed from the list. The packaging changelog says it was split upstream. We may need to tweak the list of packages if it fails to install.
* The best way to test this would be puppetizing a loaner instance against a user repo with the instance pinned to the user environment, see https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/HowTo/Set_up_a_user_environment#Pinning (a rough manual install sketch follows below this list). This reduces the chance of fighting conflicting packages from the previous version.
* The changes won't be applied to talos machines; we don't install these packages there.
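As a quick manual smoke test (as opposed to the full puppetized user-environment flow above), installing the backported packages directly on a loaner could look something like the sketch below; the suite/component layout of the custom repo and the exact package set are assumptions here, not the actual puppet change:
# assumed repo layout and package names; verify against the repo and mesa.pp before using
echo 'deb https://releng-puppet2.srv.releng.scl3.mozilla.com/repos/apt/custom/mesa-lts-saucy precise main' > /etc/apt/sources.list.d/mesa-lts-saucy.list
apt-get update
apt-get install libgl1-mesa-dri-lts-saucy libgl1-mesa-glx-lts-saucy libglapi-mesa-lts-saucy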
Comment 7•9 years ago
I'm creating a loaner instance (tst-linux64-ec2-coop) to test these packages right now.
Comment 8•9 years ago
Not working out of the box.
Here's the list of packages I had to purge to get to a "clean slate":
http://people.mozilla.org/~coop/bug1220658/apt-get-purge-mesa.log
Here's the syslog output of the subsequent puppet install attempt with Rail's patch applied:
http://people.mozilla.org/~coop/bug1220658/syslog
Comment 9•9 years ago
I looked at that instance and the logs. It sounds like mesa-lts-saucy-9.2.1 requires some other packages to be backported. It won't be trivial :/
Comment 10•9 years ago
Rail manually upgraded the packages on the test node in advance of bug 1225596. I'm currently running some unittests to see how many failures pop up.
Comment 11•9 years ago
I've run some tests and so far they're green modulo a failure to pull gaia for luciddream, which is expected due to last night's b2g/vcssync bustage.
Still a long way to go here though:
* make sure that tests on release branches (aurora/beta/release/esr) still work
* find the package delta between the current tester image and this trial node so we can mirror only the subset of packages we need, assuming bug 1225596 doesn't pan out (based on https://bugzilla.mozilla.org/show_bug.cgi?id=1225596#c1)
* perform above process with a tst-linux32 slave
* perform above process with a tst-emulator64 slave
Comment 12•9 years ago
(In reply to Rail Aliiev [:rail], on PTO Nov 21 - Mozlandia from comment #4)
> (In reply to Joel Maher (:jmaher) from comment #3)
> > Is there a way to upgrade this dynamically in the job for buildbot (or
> > current automation)?
>
> Not for tests. :(
Yes, this is possible with TaskCluster.
This will need to be ported to the ubuntu1204-test docker image, too, as that's what TC is using.
> root@taskcluster-worker:~# dpkg -l | grep mesa
> ii libgl1-mesa-dri 8.0.4-0ubuntu0.7 free implementation of the OpenGL API -- DRI modules
> ii libgl1-mesa-dri:i386 8.0.4-0ubuntu0.7 free implementation of the OpenGL API -- DRI modules
> ii libgl1-mesa-glx 8.0.4-0ubuntu0.7 free implementation of the OpenGL API -- GLX runtime
> ii libgl1-mesa-glx:i386 8.0.4-0ubuntu0.7 free implementation of the OpenGL API -- GLX runtime
> ii libglapi-mesa 8.0.4-0ubuntu0.7 free implementation of the GL API -- shared library
> ii libglapi-mesa:i386 8.0.4-0ubuntu0.7 free implementation of the GL API -- shared library
> ii libglu1-mesa 8.0.4-0ubuntu0.7 Mesa OpenGL utility library (GLU)
> ii libglu1-mesa:i386 8.0.4-0ubuntu0.7 Mesa OpenGL utility library (GLU)
> ii mesa-common-dev 8.0.4-0ubuntu0.7 Developer documentation for Mesa
TC image builds should not refer to the puppet repositories (since they don't use puppet!). The approach we took for xcb was to create an Ubuntu repository, put it in a tarball, and put it on tooltool. Then we download and unpack that and add its path to sources.list temporarily. See
https://dxr.mozilla.org/mozilla-central/source/testing/docker/ubuntu1204-test/system-setup.sh#181
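As a rough sketch of that flow adapted for mesa (the tooltool manifest name, tarball name, and package list below are placeholders, not the actual xcb setup):
# fetch the pre-built apt repo tarball from tooltool (manifest/tarball names are hypothetical)
python tooltool.py fetch -m mesa-repo.tt
tar -xzf mesa-lts-saucy-repo.tar.gz -C /tmp
# temporarily expose it as a local apt source, install, then drop it again
echo 'deb file:///tmp/mesa-lts-saucy-repo ./' > /etc/apt/sources.list.d/mesa-tmp.list
apt-get update
apt-get install -y libgl1-mesa-dri-lts-saucy libgl1-mesa-glx-lts-saucy libglapi-mesa-lts-saucy
rm /etc/apt/sources.list.d/mesa-tmp.list
apt-get update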
Comment 13•9 years ago
What timeline should we expect for this bug?
I want to know so I can determine what to do with mochitest-gl on TaskCluster (since we have that bug fixed there).
We have some tests that are now unexpectedly passing.
If this gets fixed soon, we will change the expected state of those tests.
If it is going to take long, we will skip the tests until this gets fixed.
Comment 14•9 years ago
:coop: trying to make some decisions on TaskCluster. We are already using the new mesa library there, and we either need to wait for this (if it is soon) or disable some tests until this is resolved. Can you help us figure out a timeline? I am not sure if Rail is the one who would be doing this work.
Flags: needinfo?(coop)
Comment 15•9 years ago
Here are the failures I saw in staging. Sounds like these correspond to what Armen is seeing:
Ubuntu VM 12.04 x64 mozilla-central opt test web-platform-tests-4
08:50:18 INFO - TEST-UNEXPECTED-PASS | /webgl/bufferSubData.html | bufferSubData - expected FAIL
08:50:18 INFO - TEST-INFO | expected FAIL
08:50:25 INFO - TEST-UNEXPECTED-PASS | /webgl/compressedTexImage2D.html | compressedTexImage2D - expected FAIL
08:50:25 INFO - TEST-INFO | expected FAIL
08:50:32 INFO - TEST-UNEXPECTED-PASS | /webgl/compressedTexSubImage2D.html | compressedTexSubImage2D - expected FAIL
08:50:32 INFO - TEST-INFO | expected FAIL
08:50:39 INFO - TEST-UNEXPECTED-PASS | /webgl/texImage2D.html | texImage2D - expected FAIL
08:50:39 INFO - TEST-INFO | expected FAIL
08:50:46 INFO - TEST-UNEXPECTED-PASS | /webgl/texSubImage2D.html | texSubImage2D - expected FAIL
08:50:46 INFO - TEST-INFO | expected FAIL
08:50:52 INFO - TEST-UNEXPECTED-PASS | /webgl/uniformMatrixNfv.html | Should not throw for 2 - expected FAIL
08:50:52 INFO - TEST-INFO | expected FAIL
08:50:52 INFO - TEST-UNEXPECTED-PASS | /webgl/uniformMatrixNfv.html | Should not throw for 3 - expected FAIL
08:50:52 INFO - TEST-INFO | expected FAIL
08:50:52 INFO - TEST-UNEXPECTED-PASS | /webgl/uniformMatrixNfv.html | Should not throw for 4 - expected FAIL
08:50:52 INFO - TEST-INFO | expected FAIL
Ubuntu VM 12.04 x64 mozilla-central opt test mochitest-e10s-browser-chrome-7
12:12:57 INFO - 514 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_save_video.js | uncaught exception - TypeError: gContextMenu is null at chrome://browser/content/browser.xul:1
12:12:58 INFO - Stack trace:
12:12:58 INFO - chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:simpletestOnerror:1519
12:12:58 INFO - chrome://mochitests/content/browser/browser/base/content/test/general/browser_save_video.js:null:62
12:12:58 INFO - Tester_execTest@chrome://mochikit/content/browser-test.js:757:9
12:12:58 INFO - Tester.prototype.nextTest</<@chrome://mochikit/content/browser-test.js:677:7
12:12:58 INFO - SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:735:59
12:12:58 INFO - JavaScript error: chrome://browser/content/browser.xul, line 1: TypeError: gContextMenu is null
12:13:40 INFO - 517 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_save_video.js | Test timed out -
12:13:40 INFO - MEMORY STAT | vsize 1490MB | residentFast 298MB | heapAllocated 126MB
Ubuntu VM 12.04 x64 mozilla-central opt test web-platform-tests-e10s-4
12:34:57 INFO - TEST-UNEXPECTED-PASS | /webgl/bufferSubData.html | bufferSubData - expected FAIL
12:34:57 INFO - TEST-INFO | expected FAIL
12:35:06 INFO - TEST-UNEXPECTED-PASS | /webgl/compressedTexImage2D.html | compressedTexImage2D - expected FAIL
12:35:06 INFO - TEST-INFO | expected FAIL
12:35:15 INFO - TEST-UNEXPECTED-PASS | /webgl/compressedTexSubImage2D.html | compressedTexSubImage2D - expected FAIL
12:35:15 INFO - TEST-INFO | expected FAIL
12:35:24 INFO - TEST-UNEXPECTED-PASS | /webgl/texImage2D.html | texImage2D - expected FAIL
12:35:24 INFO - TEST-INFO | expected FAIL
12:35:33 INFO - TEST-UNEXPECTED-PASS | /webgl/texSubImage2D.html | texSubImage2D - expected FAIL
12:35:33 INFO - TEST-INFO | expected FAIL
12:35:42 INFO - TEST-UNEXPECTED-PASS | /webgl/uniformMatrixNfv.html | Should not throw for 2 - expected FAIL
12:35:42 INFO - TEST-INFO | expected FAIL
12:35:42 INFO - TEST-UNEXPECTED-PASS | /webgl/uniformMatrixNfv.html | Should not throw for 3 - expected FAIL
12:35:42 INFO - TEST-INFO | expected FAIL
12:35:42 INFO - TEST-UNEXPECTED-PASS | /webgl/uniformMatrixNfv.html | Should not throw for 4 - expected FAIL
12:35:42 INFO - TEST-INFO | expected FAIL
Comment 16•9 years ago
I don't see the mochitest-gl unexpected-pass tests though. Do we know if these wpt4 and bc7 failures are intermittent failures, or perma failures?
Comment 17•9 years ago
(In reply to Joel Maher (:jmaher) from comment #14)
> :coop- trying to make some decisions on taskcluster- we are using the new
> mesa library already and either need to wait for this if it is soon, or
> disable some tests until this is resolved. Can you help us figure out a
> timeline- I am not sure if Rail is the one who would be doing this work.
Since the upgrade seems to be moving the ball forward, i.e. making tests pass that were previously failing, we should proceed.
I'll ask Rail to roll a similar puppet patch for linux32 and emulator64. I can handle deployment once we have the packages mirrored, etc. Rail is on PTO until next week, and next week is Mozlando. Realistically, this isn't going to get fixed until late December. We should disable what we need to in the interim.
Comment 18•9 years ago
that makes sense! Thanks for the info, I am glad that it is pretty realistic to get this done by the end of the month barring any unforeseen problems!
Comment 19•9 years ago
(In reply to Joel Maher (:jmaher) from comment #16)
> I don't see the mochitest-gl unexpected-pass tests though. Do we know if
> these wpt4 and bc7 failures are intermittent failures, or perma failures?
I ran the tests against some other branches (e.g. mozilla-release), and the web platform "failures" seem to be legit.
I only have one data point on bc7 now, so I'm re-running the test to get more data.
Comment 20•9 years ago
(In reply to Chris Cooper [:coop] from comment #19)
> I only have one data point on bc7 now, so I'm re-running the test to get
> more data.
The bc7 failure seems to be intermittent.
Flags: needinfo?(coop)
Comment 21•9 years ago
Awesome to hear that. It gives me higher confidence that fewer issues will randomly crop up.
Comment 22•9 years ago
Attachment #8695995 -
Flags: review?(jgilbert)
Comment 23•9 years ago
reference to the gl failures is here:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=31dcce5f171c&filter-searchStr=gl
Comment 24•9 years ago
Comment 25•9 years ago
The unexpected passes are gone.
I see some new unexpected *failures*:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b06014e0ba00&filter-searchStr=gl
The new failures could be related to changing the EC2 instance.
Comment 26•9 years ago
I assume they could also be related to the chunk-size changes? Perhaps some resource initialization ordering changed or is now split over multiple chunks?
Comment 27•9 years ago
Also:
23:02:41 INFO - 412 INFO TEST-FAIL | dom/canvas/test/webgl-conformance/_wrappers/test_conformance__limits__gl-min-textures.html | The author of the test has indicated that flaky timeouts are expected. Reason: untriaged
so maybe just disable that test?
Reporter
Comment 28•9 years ago
This patch should make the test give us a bit more information about what's going wrong. Can you try again with it?
Comment 29•9 years ago
Comment 30•9 years ago
Comment 31•9 years ago
Comment 32•9 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #27)
> Also:
>
> 23:02:41 INFO - 412 INFO TEST-FAIL |
> dom/canvas/test/webgl-conformance/_wrappers/test_conformance__limits__gl-min-
> textures.html | The author of the test has indicated that flaky timeouts are
> expected. Reason: untriaged
>
> so maybe just disable that test?
All dom/canvas/webgl-conformance/* tests have this right now, so this is not grounds for disabling.
Comment 33•9 years ago
Comment on attachment 8695995 [details] [diff] [review]
temporarily skip 3 gl tests so we can pass on task cluster (1.0)
Review of attachment 8695995 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/canvas/test/_webgl-conformance.ini
@@ +777,5 @@
> skip-if = os == 'android'
> [webgl-conformance/_wrappers/test_conformance__textures__texture-size.html]
> skip-if = os == 'android'
> [webgl-conformance/_wrappers/test_conformance__textures__texture-size-cube-maps.html]
> +skip-if = (os == 'android') || (os == 'linux') # remove when bug 1220658 is resolved
Remove these comments. This is a generated file.
Rather, just rerun the generator python script.
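For context, a rough sketch of the intended workflow; the generator's name and location below are placeholders, since this bug does not name the actual script:
cd dom/canvas/test
# edit the generator's skip/fail annotations, then regenerate instead of hand-editing the .ini
python webgl-conformance/generate_manifest.py   # placeholder name
hg diff _webgl-conformance.ini                  # confirm only the intended skip-if lines changed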
Attachment #8695995 -
Flags: review?(jgilbert) → review-
Reporter
Comment 34•9 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #31)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=461bd2e184a9
All of these seem orange...
Comment 35•9 years ago
Ugh, I dislike this new feature of putting try jobs on bugs -- that try push is unrelated to this bug.
Comment 36•9 years ago
Comment 38•9 years ago
this gets us green with the mesa driver:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e76f012d0faf&filter-searchStr=gl
Attachment #8695995 -
Attachment is obsolete: true
Attachment #8698592 -
Flags: review?(jgilbert)
Comment 39•9 years ago
Comment on attachment 8698592 [details] [diff] [review]
gl_mesa921_temp_disable.patch
Review of attachment 8698592 [details] [diff] [review]:
-----------------------------------------------------------------
:jrmuizel: Can you find someone to review reftest.list?
::: dom/canvas/test/webgl-conformance/mochitest-errata.ini
@@ +104,5 @@
> # Failures after enabling color_buffer_[half_]float.
> +# remove when bug 1220658 is resolved as this is unexpected-pass.
> +skip-if = (os == 'linux')
> +[_wrappers/test_conformance__textures__texture-mips.html]
> +# remove when bug 1220658 is resolved as this is unexpected-pass.
Alright. Technically we should unmark these unexpected-pass tests from being predicted to fail. I'm changing a bunch of this stuff right now, so it's fine to just hack these out if that means you can move faster.
Attachment #8698592 -
Flags: review?(jmuizelaar)
Attachment #8698592 -
Flags: review?(jgilbert)
Attachment #8698592 -
Flags: review+
Comment 40•9 years ago
Oh, I accidentally picked up another patch in this one; let me make this just the canvas changes.
Comment 41•9 years ago
OK, this one changes just the files under dom/canvas/test!
Attachment #8698592 -
Attachment is obsolete: true
Attachment #8698592 -
Flags: review?(jmuizelaar)
Attachment #8698601 -
Flags: review+
Comment 42•9 years ago
Comment 43•9 years ago
This patch landed to fix the tests which fail while running with the new mesa library AND on TaskCluster. I assume this is related specifically to the mesa library, since the 'gl' tests were at parity prior to the mesa upgrade.
Keywords: leave-open
Comment 44•9 years ago
bugherder
Comment 45•9 years ago
Joel,
I'm going to mirror the missing repo, and then we should be ready to go with this update. Do you have any preference for when we should roll this out? Some time next week, maybe?
Flags: needinfo?(jmaher)
Comment 47•9 years ago
I decided to mirror this repo separately, so it'd be easier to recover if we want to back it out.
debmirror --config-file=/etc/debmirror.conf --source --no-check-gpg \
-a i386,amd64 \
-s main,main/debian-installer,restricted,restricted/debian-installer,universe,universe/debian-installer \
-d precise-updates -h us.archive.ubuntu.com -r /ubuntu -e rsync --progress --nocleanup \
/data/repos/apt/precise-updates/
When the files are synced I'll try to land the patch and regenerate the AMIs.
Comment 48•9 years ago
Attachment #8700610 -
Flags: review?(dustin)
Updated•9 years ago
Attachment #8700610 -
Flags: review?(dustin) → review+
Comment 49•9 years ago
Don't forget to add the repo to https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Packages
Comment 50•9 years ago
Callek volunteered to deploy this change next Monday.
What we need to do:
1) Land https://bug1220658.bmoattachments.org/attachment.cgi?id=8700610
2) merge to production
3) wait until it's deployed on all masters
4) regenerate 3 AMIS:
# from /builds/aws_manager/bin on aws-manager2
./aws_manager-tst-linux64-ec2-golden.sh
./aws_manager-tst-linux32-ec2-golden.sh
./aws_manager-tst-emulator64-ec2-golden.sh
If everything goes well, this is all we need.
In case we need to back out the change:
* repeat the same steps with the patch backed out
* kill the instances based on the new AMIs, see https://wiki.mozilla.org/ReleaseEngineering/How_To/Manage_spot_AMIs
Assignee: rail → bugspam.Callek
Comment 51•9 years ago
Comment on attachment 8700610 [details] [diff] [review]
mesa-puppet.diff
https://hg.mozilla.org/build/puppet/rev/fe660b8a7f7d
https://hg.mozilla.org/build/puppet/rev/9d8b37c289ec
Attachment #8700610 -
Flags: checked-in+
Comment 52•9 years ago
Comment on attachment 8700610 [details] [diff] [review]
mesa-puppet.diff
Bustage fix: https://hg.mozilla.org/build/puppet/rev/b35230c93b8c
https://hg.mozilla.org/build/puppet/rev/c18c2a0c0a79
Comment 53•9 years ago
emulator64, linux32, and linux64 have all failed so far with:
Mon Dec 28 08:23:00 -0800 2015 Puppet (err): Could not update: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold --force-yes install libc6=2.15-0ubuntu10.10' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
libc6 : Depends: libc-bin (= 2.15-0ubuntu10.10) but 2.15-0ubuntu10.12 is to be installed
E: Unable to correct problems, you have held broken packages.
Wrapped exception:
Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold --force-yes install libc6=2.15-0ubuntu10.10' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
libc6 : Depends: libc-bin (= 2.15-0ubuntu10.10) but 2.15-0ubuntu10.12 is to be installed
E: Unable to correct problems, you have held broken packages.
Mon Dec 28 08:23:00 -0800 2015 /Stage[main]/Packages::Libc/Package[libc6]/ensure (err): change from 2.15-0ubuntu10 to 2.15-0ubuntu10.10 failed: Could not update: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold --force-yes install libc6=2.15-0ubuntu10.10' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
libc6 : Depends: libc-bin (= 2.15-0ubuntu10.10) but 2.15-0ubuntu10.12 is to be installed
E: Unable to correct problems, you have held broken packages.
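For reference, the unmet dependency is just version skew: puppet pins libc6 to 2.15-0ubuntu10.10 while the repo wants to install libc-bin 2.15-0ubuntu10.12. A minimal manual sketch of the workaround, using the versions from the log above and assuming both packages are available at 10.12 (the eventual fix was bumping the pinned versions in puppet, see comment 59):
apt-get update
# install the matching pair in one transaction so apt can satisfy the strict "=" dependency
apt-get install libc6=2.15-0ubuntu10.12 libc-bin=2.15-0ubuntu10.12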
Comment 54•9 years ago
Sorry about that. I didn't test the initial puppetization because it's hard to use a dev environment/pinned slaves without changing the production configs. I should probably add that support to puppetize.sh...
Comment 55•9 years ago
Comment on attachment 8700610 [details] [diff] [review]
mesa-puppet.diff
Review of attachment 8700610 [details] [diff] [review]:
-----------------------------------------------------------------
For future landing of this patch, assuming package deps are understood....
Backed out this patch (and my followup) due to the package dep issue above.
https://hg.mozilla.org/build/puppet/rev/9b0abc13eebe
https://hg.mozilla.org/build/puppet/rev/577bcd68bc0a
Feedback to :rail for what-to-do-next.
::: modules/packages/manifests/mesa.pp
@@ +11,5 @@
> package {
> # This package is a recompiled version of
> + # http://packages.ubuntu.com/precise-updates/mesa-common-dev-lts-saucy
> + ["libgl1-mesa-dri-lts-saucy", "libgl1-mesa-glx-lts-saucy",
> + "libglapi-mesa-lts-saucy", "libxatracker1-lts-saucy":
needs ] at the end.
@@ +22,5 @@
> + # http://packages.ubuntu.com/precise-updates/mesa-common-dev-lts-saucy
> + # libgl1-mesa-dev-lts-saucy:i386 is required by B2G emulators, Bug 1013634
> + ["libgl1-mesa-dri-lts-saucy", "libgl1-mesa-glx-lts-saucy",
> + "libglapi-mesa-lts-saucy", "libxatracker1-lts-saucy",
> + "libgl1-mesa-dev-lts-saucy:i386"]:
modules/packages/manifests/mesa.pp - ERROR: two-space soft tabs not used on line 26 - 2sp_soft_tabs
(I fixed the preceding two lines already in my travis patchset, that will be landing today)
Attachment #8700610 -
Flags: feedback?(rail)
Attachment #8700610 -
Flags: checked-in-
Attachment #8700610 -
Flags: checked-in+
Comment 56•9 years ago
(In reply to Rail Aliiev [:rail] from comment #50)
> In case we need to backout the change.
> * repeat the same steps with the patch backed out
> * kill the instances based on the new AMIs, see
> https://wiki.mozilla.org/ReleaseEngineering/How_To/Manage_spot_AMIs
It doesn't look like any of these AMIs got to the created stage during my steps...
I stopped the golden instances that were running as well.
Updated•9 years ago
Assignee: bugspam.Callek → rail
Comment 57•9 years ago
Relanded //hg.mozilla.org/build/puppet/rev/040e91335831, waiting for travis.
Comment 58•9 years ago
Comment 59•9 years ago
I had to land the following 3 bustage fixes:
update libc version:
http://hg.mozilla.org/build/puppet/rev/d5f27dc939d3
Make sure we use the new repo for libc (to make proxxy happier):
http://hg.mozilla.org/build/puppet/rev/0fb4c1e2ea65
update python version:
http://hg.mozilla.org/build/puppet/rev/1fde6f0410ba
I felt it was safe to take minor version updates to fix the bustage; we are updating the whole system in any case.
Comment 60•9 years ago
Pushed to try to re-enable the tests:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8af8fb99e5b0
We will need to get this landed on trunk/aurora/beta/release(?), probably. In addition, there are web-platform-tests which need to be adjusted as well, tracked in bug 1236047.
Comment 61•9 years ago
Comment on attachment 8700610 [details] [diff] [review]
mesa-puppet.diff
I think I sorted this out.
Attachment #8700610 -
Flags: feedback?(rail)
I created a new exclusion profile in Treeherder to hide these failing jobs until they can be fixed up. We'll need to remove that exclusion once that happens.
Flags: needinfo?(wkocher)
Comment hidden (Intermittent Failures Robot)
Asan tests don't seem too happy: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=d464ff85debc&group_state=expanded&filter-searchStr=asan&selectedJob=19165556
As far as I can tell, they're related to the mesa upgrade? I don't think I'm going to be able to get away with hiding all of these Asan tests, even for a day or two...
Flags: needinfo?(rail)
Flags: needinfo?(jmaher)
Blocks: 1236113
Blocks: 1236115
Blocks: 1236116
And the fallout is piling up. Trunk, Aurora, Beta trees closed.
Flags: needinfo?(wkocher)
Comment 66•9 years ago
backed out all changes:
remote: https://hg.mozilla.org/build/puppet/rev/7ef629f87819
remote: https://hg.mozilla.org/build/puppet/rev/87b6ce64a12e
I'm going to revert the AMIs and kill running instances.
Flags: needinfo?(rail)
Comment 67•9 years ago
This is the working combined patch.
Reopening things, but I won't be around to watch for anything else breaking tonight.
Comment 69•9 years ago
Back to the pool. It doesn't look like we are going to upgrade easily. :(
To retry this attempt we would need to apply the following 2 patches (reversed):
http://hg.mozilla.org/build/puppet/rev/7ef629f87819
https://github.com/mozilla/build-cloud-tools/commit/a9ccd55ad8341d847bc3bbdfc7b4ff50a7dd936e
Assignee: rail → nobody
Comment 70•9 years ago
:jrmuizel, this is turning into a much larger effort: it will take a few weeks to sort out the failures and make this happen, which isn't in our budget. If this is critical, we could find a way for you or someone on your team to figure out the android/reftest/asan failures on a loaner, and then we can push on upgrading our mesa libraries again. What are your thoughts?
Flags: needinfo?(jmaher) → needinfo?(jmuizelaar)
Comment 71•9 years ago
Also on this note, we should remove this version in TaskCluster until we are ready to upgrade on the buildbot side. That will let us avoid turning off tests and still see green on the webgl web-platform-tests.
Reporter
Comment 72•9 years ago
(In reply to Joel Maher (:jmaher) from comment #70)
> :jrmuizel, this is turning into a much larger effort- this will take a few
> weeks of time to sort out the failures and make this happen which isn't in
> our budget. If this is critical we could find a way for you or someone on
> your team to figure out the android/reftest/asan failures on a loaner and
> then we can push on upgrading our mesa libraries again. what are your
> thoughts?
Yeah, I can figure out the failures. How do I get a machine in a similar state so that I can reproduce them?
Flags: needinfo?(jmuizelaar)
Comment hidden (Intermittent Failures Robot)
Comment 74•9 years ago
(In reply to Rail Aliiev [:rail] from comment #69)
This patch and subsequent attempted backout also broke all of the talos-linux64-ix hardware instances. At this point, none of that pool is functional because puppet can't successfully complete. I don't see an easy way to back out (downgrading puppet2.7-minimal basically tries to wipe the entire system and start over). I think the best course of action to get back to a known good state is to reinstall them all, so I've started to do that now.
Comment 75•9 years ago
rail, is it possible to get Jeff an EC2 instance with the new mesa driver? I could help him run the tests on it.
Flags: needinfo?(rail)
Comment 76•9 years ago
Sure thing. Let me fix bug 1236550 first though
Comment 77•9 years ago
All talos-linux64-ix machines that were not loaned out have been reimaged.
Comment 78•9 years ago
(In reply to Amy Rich [:arr] [:arich] from comment #77)
> All talos-linux64-ix machines that were not loaned out have been reimaged.
Thank you for cleaning up after me. :(
Comment 79•9 years ago
(In reply to Joel Maher (:jmaher) from comment #60)
> pushed to try re-enabling the tests:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=8af8fb99e5b0
>
> we will need to get this landed on trunk/aurora/beta/release? probably. In
> addition there are web-platform-tests which need to get adjusted as well,
> tracked in bug 1236047
With the exception of the ASAN leaks, these are all basically spurious. (We should just update our expected fail/pass list)
Comment 80•9 years ago
These failures are why we backed out the tests: android failures (bug 1236116), b2g reftests (bug 1236115), asan LeakSanitizer (bug 1236113).
For the webgl-conformance and web-platform-tests we can easily fix that in the manifests (I had successfully done that on try), but it is a chicken-and-egg problem; for these tests we either need to:
* disable them, upgrade, then re-enable with new expectations
* upgrade 32/64/asan, live with failures, then land the manifest expectations on ALL branches
Updated•9 years ago
Flags: needinfo?(rail)
Reporter
Comment 81•9 years ago
For my own reference here are the failing tests:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=6ec9b53392a6&selectedJob=19165556
Reporter
Comment 82•9 years ago
This should fix the reftest failures. I wasn't able to reproduce the address sanitizer issue and if it persists we can probably just suppress it.
Attachment #8681965 -
Attachment is obsolete: true
Comment 84•9 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #83)
> Can we try again with the new patch?
We're running into a similar symptom in bug 1162375 (emu-kk-opt reftest[1] on the taskcluster docker infrastructure), and perhaps the patch to fix llvmpipe could help solve those reftest failures as well.
rail, if the patch works, can you also upgrade the mesa library in the docker tester image used by the B2G emu-kk tests[2]?
[1] http://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/c9GhjhbUQompYMp_JCKAxw/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
[2] taskcluster/tester:0.4.4 - https://tools.taskcluster.net/task-inspector/#E2gXibPRSQeM2StYMHIVIg/
Comment 85•9 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #83)
> Can we try again with the new patch?
I can try to build the package today and publish it to the same repo. Would it help to test it on the existing loaner? You'll just need to upgrade the mesa packages.
(In reply to Astley Chen [:astley] UTC+8 from comment #84)
> rail, if the patch works, can you also upgrade the mesa library for docker
> tester image used by B2G emu kk tests[2]?
Before we upgrade we should test it. AFAIK, you can do this by modifying https://dxr.mozilla.org/mozilla-central/source/testing/docker/base-test/Dockerfile#54 and pushing to try. It'd be better to ask people who worked on that file though.
Comment 86•9 years ago
The Dockerfile change will benefit the B2G stuff, but not the Android or other desktop-related failures.
Comment 87•9 years ago
Note for myself. Before the next attempt, we need to figure out what went wrong with talos machines.
Jeff, I uploaded the new packages. You can upgrade them on your loaner with something like:
apt-get update
apt-get install `dpkg -l |grep ^ii | grep 9.2.1-1ubuntu3~precise1mozilla1 | awk '{print $2}'`
Flags: needinfo?(rail)
Comment 88•9 years ago
Joel, mind trying this again? I still need to fix some stuff in the puppet patch. If I fix it before this weekend, do you think we should try to land it again over the weekend?
Flags: needinfo?(jmaher)
Comment 89•9 years ago
I am going to be out of town most of the weekend, but I'm open to trying this out when we can. If we can reduce the effects on the hardware boxes, that would be nice to see.
Flags: needinfo?(jmaher)
Comment 90•9 years ago
Let's make sure we get the debug package Jeff wants in bug 1220253 included here (libgl1-mesa-dri-dbg).
Comment 92•9 years ago
To make things work I had to cherry-pick the missing required packages (libdrm-* and libllvm3.3*) and put them into the same apt repo.
I explicitly listed all packages to be installed, including one dbg package per coop's request.
I ran this change multiple times against existing instances and a fresh one (puppetized from scratch with this change).
It should just work really soon now. :)
Attachment #8718111 -
Flags: review?(bugspam.Callek)
Updated•9 years ago
Assignee: nobody → rail
Comment 93•9 years ago
Comment on attachment 8718111 [details] [diff] [review]
mesa.diff
r+ for the puppet changes, trusting :rail on the packages themselves.
Attachment #8718111 -
Flags: review?(bugspam.Callek) → review+
Comment 94•9 years ago
Comment on attachment 8718111 [details] [diff] [review]
mesa.diff
remote: https://hg.mozilla.org/build/puppet/rev/c7464b1bbe9e
remote: https://hg.mozilla.org/build/puppet/rev/dbc19c1c3525
Attachment #8718111 -
Flags: checked-in+
Comment 95•9 years ago
We'll see the change tomorrow. I hope it works this time for all platforms.
Comment 96•9 years ago
remote: https://hg.mozilla.org/build/puppet/rev/5fb3e7d8ae6b
remote: https://hg.mozilla.org/build/puppet/rev/ca664fd30e10
... to make puppet-lint happier.
Comment 97•9 years ago
(In reply to Rail Aliiev [:rail] from comment #96)
> remote: https://hg.mozilla.org/build/puppet/rev/5fb3e7d8ae6b
> remote: https://hg.mozilla.org/build/puppet/rev/ca664fd30e10
>
> ... to make make puppet-lint happier.
We hit the same problems as bug 1236113 again :(
Comment 98•9 years ago
Flags: needinfo?(rail)
Comment 99•9 years ago
I'm going to revert this; we have a lot of test failures:
08:05 <Tomcat|sheriffduty> we run into bug 1236113
08:05 <Tomcat|sheriffduty> rail: https://treeherder.mozilla.org/logviewer.html#?job_id=21520008&repo=mozilla-inbound
08:06 <Tomcat|sheriffduty> same problem again
08:07 <Tomcat|sheriffduty> rail: can we revert this ?
08:12 <Tomcat|sheriffduty> rail: filed bug 1247575
08:12 <Tomcat|sheriffduty> there is also a unexpected pass
08:12 <Tomcat|sheriffduty> https://treeherder.mozilla.org/logviewer.html#?job_id=3274298&repo=mozilla-central
08:12 <Tomcat|sheriffduty> i guess this also related
08:13 <Tomcat|sheriffduty> it seems that asan builds react like : TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at /usr/lib/x86_64-linux-gnu/libdricore9.2.1.so.1
08:13 <Tomcat|sheriffduty> and linux opt like https://treeherder.mozilla.org/logviewer.html#?job_id=21525557&repo=mozilla-inbound
08:13 <Tomcat|sheriffduty> 671 INFO TEST-UNEXPECTED-PASS | dom/canvas/test/webgl-mochitest/ensure-exts/test_EXT_disjoint_timer_query.html | fail-if condition in manifest - We expected at least one failure
Flags: needinfo?(rail)
Comment 100•9 years ago
Backout:
https://hg.mozilla.org/build/puppet/rev/2ba82e74a032
https://hg.mozilla.org/build/puppet/rev/2075e856e7b8
Still need to regenerate AMIs.
Updated•9 years ago
Attachment #8718111 -
Flags: checked-in+ → checked-in-
Comment 102•9 years ago
(In reply to Rail Aliiev [:rail] from comment #99)
> I'm going to revert this, we have a lot of test failures:
>
> 08:05 <Tomcat|sheriffduty> we run into bug 1236113
> 08:05 <Tomcat|sheriffduty> rail:
> https://treeherder.mozilla.org/logviewer.html#?job_id=21520008&repo=mozilla-
> inbound
> 08:06 <Tomcat|sheriffduty> same problem again
> 08:07 <Tomcat|sheriffduty> rail: can we revert this ?
> 08:12 <Tomcat|sheriffduty> rail: filed bug 1247575
> 08:12 <Tomcat|sheriffduty> there is also a unexpected pass
> 08:12 <Tomcat|sheriffduty>
> https://treeherder.mozilla.org/logviewer.html#?job_id=3274298&repo=mozilla-
> central
> 08:12 <Tomcat|sheriffduty> i guess this also related
> 08:13 <Tomcat|sheriffduty> it seems that asan builds react like :
> TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at
> /usr/lib/x86_64-linux-gnu/libdricore9.2.1.so.1
> 08:13 <Tomcat|sheriffduty> and linux opt like
> https://treeherder.mozilla.org/logviewer.html#?job_id=21525557&repo=mozilla-
> inbound
> 08:13 <Tomcat|sheriffduty> 671 INFO TEST-UNEXPECTED-PASS |
> dom/canvas/test/webgl-mochitest/ensure-exts/test_EXT_disjoint_timer_query.
> html | fail-if condition in manifest - We expected at least one failure
I can give you a patch to change these to expected-pass instead of their present expected-fail.
The unexpected passes here are good news.
It's just the leaks we need to worry about.
Comment 103•9 years ago
Wait... the test-unexpected-pass failures were expected; we need to land the manifest updates. In addition, web-platform-tests will have a failure as well... and we need to do this on ALL branches.
The ASan leak is something new, though.
Comment 104•9 years ago
(In reply to Joel Maher (:jmaher) from comment #103)
> wait...we have test-unexpected-pass failures expected, we need to land the
> manifest updates. In addition, web-platform-tests will have a failure as
> well.....and...we need to do this on ALL branches.
Why would web-platform-tests have a failure?
Reporter
Comment 105•9 years ago
Mesa looks to be leaking its debug logs. We'll work around this in bug 1247762.
Depends on: 1247762
Reporter
Comment 106•9 years ago
Bug 1247762 didn't work. I've added a suppression for dricore9.2.1.so in bug 1248290. In my tests that resolves the address sanitizer issues. I assume that with that fix, this should be ready to go again.
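For reference, an LSan suppression of that shape is just a pattern entry that LeakSanitizer reads from a suppressions file passed via LSAN_OPTIONS; the file path below is an assumption for illustration, and the actual change lives in the patch on bug 1248290:
# suppressions file entry: ignore any leak whose stack goes through libdricore9.2.1.so
echo 'leak:libdricore9.2.1.so' >> lsan_suppressions.txt
# LeakSanitizer picks the file up through the environment
export LSAN_OPTIONS=suppressions=$PWD/lsan_suppressions.txt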
Comment 107•9 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #106)
> Bug 1247762 didn't work. I've added a suppression for dricore9.2.1.so in bug
> 1248290. In my tests that resolves the address sanitizer issues. I assume
> that with that fix, this should be ready to go again.
Joel, mind if we try again on Tue?
Flags: needinfo?(jmaher)
Comment 109•9 years ago
Comment on attachment 8718111 [details] [diff] [review]
mesa.diff
Alright, take N+1. It'll show up early tomorrow morning, ET.
remote: https://hg.mozilla.org/build/puppet/rev/7d907471288a
remote: https://hg.mozilla.org/build/puppet/rev/95c88502fa3d
Attachment #8718111 -
Flags: checked-in- → checked-in+
Comment 110•9 years ago
testing some patches on try to fix up the tests:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=30c827329e02
Comment 111•9 years ago
jgilbert, I am not sure how to get expected-pass for the tests. Here is a try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=99a72143fb4d
here is the patch that changes the webgl manifests:
https://hg.mozilla.org/try/rev/0328ff396b1a
I cannot figure out why these are expected-fail.
Flags: needinfo?(jgilbert)
Comment 112•9 years ago
Review commit: https://reviewboard.mozilla.org/r/35525/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/35525/
Attachment #8720966 -
Flags: review?(rail)
Comment 113•9 years ago
Comment on attachment 8720966 [details]
MozReview Request: Bug 1220658: remove old mesa-debian directory; r?rail
https://reviewboard.mozilla.org/r/35525/#review32191
Attachment #8720966 -
Flags: review?(rail) → review+
Comment 114•9 years ago
Comment on attachment 8720966 [details]
MozReview Request: Bug 1220658: remove old mesa-debian directory; r?rail
remote: https://hg.mozilla.org/build/puppet/rev/26d1d13ccb1b
remote: https://hg.mozilla.org/build/puppet/rev/36ed03cbf75f
Attachment #8720966 -
Flags: checked-in+
Depends on: 1250311
Comment 115•9 years ago
(In reply to Joel Maher (:jmaher) from comment #111)
> jgilbert, I am not sure how to get expected-pass for the tests. Here is a
> try push:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=99a72143fb4d
>
> here is the patch that changes the webgl manifests:
> https://hg.mozilla.org/try/rev/0328ff396b1a
>
> I cannot figure out why these are expected-fail.
We fixed these by marking the failures (and passes) in the resulting perma-orange bugs.
Flags: needinfo?(jgilbert)
Comment 116•9 years ago
I think I was supposed to mark this done?
If not, just reopen.
Assignee
Updated•7 years ago
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard