Closed Bug 1211659 Opened 9 years ago Closed 9 years ago

pulse_actions: Backfilling of a test job with pushes with coalesced builds should not test against the wrong build

Categories

(Testing :: General, defect)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: KWierso, Assigned: armenzg)

References

Details

Attachments

(2 files)

ignore coalesced jobs (deleted), text/x-github-pull-request, chmanchester: review+
bump mozci version + others (deleted), text/x-github-pull-request
I was trying to track down when the B2G ICS Emulator R3 failure in https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=8966326bc731&group_state=expanded&filter-searchStr=B2G%20ICS%20Emulator%20opt%20Reftest%20Reftest%20R%28R3%29&tochange=2ab2e3d60643 started failing. At the time, the pushes for bug 1205630 and bug 1202663 did NOT have R3 runs, so I did a backfill request for it on the push for bug 1209964. When the results came back from that backfill request, it showed that the failures started on the push for bug 1202663. Which sounds great, but bg-fixed-transformed-image.html wasn't added to the tree until bug 1209964. So are the backfilled jobs using the wrong binary or something? Filing this in Treeherder to start with since that's where the backfill button is sitting.
Oh, interesting. This might be hard to solve, but there are probably some most-of-the-way fixes we can do.
One interesting thing is that there is no build there, despite the existence of what looks like a B: we run emulator tests on buildbot, on the buildbot-produced emulator builds, and there's nothing on that rev except the taskcluster-produced build. So something somewhere decided that http://pvtbuilds.pvt.build.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/mozilla-inbound-emulator/20151005083938/b2g-44.0a1.en-US.android-arm.cppunittest.tests.zip and http://pvtbuilds.pvt.build.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/mozilla-inbound-emulator/20151005083938/emulator.tar.gz would be a good way to run a reftest on 443dc9a9c21c, those being the tests and build that http://ftp.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/mozilla-inbound-emulator/1444070176/mozilla-inbound_ubuntu64_vm-b2g-emulator_test-reftest-3-bm116-tests1-linux64-build512.txt.gz felt it should grab.
This is odd; I have these in the list of valid builders:

u'b2g_mozilla-inbound_emulator-debug_dep'
u'b2g_mozilla-inbound_emulator_dep'
u'b2g_mozilla-inbound_flame-kk_eng-debug_periodic'
u'b2g_mozilla-inbound_flame-kk_eng_dep'
u'b2g_mozilla-inbound_flame-kk_periodic'
u'b2g_mozilla-inbound_linux64-b2g-haz_dep'
u'b2g_mozilla-inbound_macosx64_gecko build'
u'b2g_mozilla-inbound_macosx64_gecko-debug build'
u'b2g_mozilla-inbound_nexus-4_eng_periodic'
u'b2g_mozilla-inbound_nexus-4_periodic'
u'b2g_mozilla-inbound_nexus-5-l_eng_periodic'
u'b2g_mozilla-inbound_nexus-5-l_periodic'
u'b2g_mozilla-inbound_win32_gecko build'
u'b2g_mozilla-inbound_win32_gecko-debug build'

However, all of the b2g jobs in here are produced in TC: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=443dc9a9c21c&filter-searchStr=b2g

I will ask releng.
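(For context, a rough sketch of how one might cross-check that valid-builders list against buildbot's allthethings.json report. The URL and the top-level "builders" key reflect my recollection of the buildbot-era service and are assumptions here, not something taken from this bug.)

    # Minimal sketch, assuming allthethings.json still exists at this URL and
    # keeps builder names as keys of a top-level "builders" mapping.
    import requests

    ALLTHETHINGS = "https://secure.pub.build.mozilla.org/builddata/reports/allthethings.json"

    def inbound_b2g_builders():
        data = requests.get(ALLTHETHINGS).json()
        # Builder names are the keys of the "builders" mapping (assumed schema).
        return sorted(name for name in data.get("builders", {})
                      if "mozilla-inbound" in name and "b2g" in name.lower())

    if __name__ == "__main__":
        for name in inbound_b2g_builders():
            print(name)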
So it seems that we run that build on both TC and Buildbot. You can see both builds here: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=8966326bc731&group_state=expanded&filter-searchStr=b2g%20ics%20emulator%20opt%20b2g%20emulator%20image%20build%20%28b%29&tochange=2ab2e3d60643 Anyway, I've found out that it is a visualization issue with TH; b2g_mozilla-inbound_emulator_dep is there: https://secure.pub.build.mozilla.org/buildapi/self-serve/mozilla-inbound/build/84260082 I will now look again at the original reason for this bug.
Self-serve lists coalesced jobs on the revision where they did not run, so that you can retrigger them and have them actually run there. The way to tell the difference between "treeherder sucks and didn't show the build which ran on this revision" and "buildbot coalesced away the build on this revision" is to look at the starttime listed by self-serve and see that it is actually the exact starttime of the build which does exist on a revision above yours.
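(That check can be scripted. Below is a rough sketch, not a supported tool: it assumes the old buildapi self-serve endpoint {branch}/rev/{revision}?format=json returned a JSON list of job dicts with 'buildername' and 'starttime' keys; the endpoint shape and key names are assumptions, and the LDAP authentication the service required is omitted.)

    import requests

    SELF_SERVE = "https://secure.pub.build.mozilla.org/buildapi/self-serve"

    def builds_for(branch, revision):
        # Assumed endpoint shape; real requests needed LDAP credentials.
        url = "{}/{}/rev/{}?format=json".format(SELF_SERVE, branch, revision)
        return requests.get(url).json()

    def coalesced_starttimes(branch, lower_rev, upper_rev, buildername):
        """Starttimes that appear on both revisions for the same builder.

        A non-empty result suggests the entry on lower_rev is just the
        coalesced run that actually happened on upper_rev.
        """
        lower = [j.get("starttime") for j in builds_for(branch, lower_rev)
                 if j.get("buildername") == buildername]
        upper = [j.get("starttime") for j in builds_for(branch, upper_rev)
                 if j.get("buildername") == buildername]
        return set(lower) & set(upper)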
As philor points out, what I saw was a coalesced job. I think we have a mozci bug with regard to coalescing.
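(The eventual fix is the "ignore coalesced jobs" mozci patch attached below; as a minimal illustration of the idea only, not the actual patch, a backfill helper could drop build jobs whose revision does not match the push being backfilled. The job-dict keys used here are hypothetical.)

    def usable_builds(jobs, push_revision, buildername):
        """Return build jobs that really ran on push_revision (not coalesced)."""
        return [job for job in jobs
                if job.get("buildername") == buildername
                and job.get("revision", "").startswith(push_revision)]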
Assignee: nobody → armenzg
Component: Treeherder → General
Product: Tree Management → Testing
Version: --- → Trunk
Attached file ignore coalesced jobs (deleted) —
Attachment #8670466 - Flags: review?(cmanchester)
Attachment #8670466 - Flags: review?(cmanchester) → review+
Landed. I will need to release a new version of mozci and update pulse_actions.
I released the new version of mozci with this fix. Tomorrow I will test pulse_actions against this new version of mozci.
Summary: Backfilling a job to track down a failure is causing a test to be run on pushes prior to it being added to the tree. → pulse_actions: Backfilling of a test job with pushes with coalesced builds should not test against the wrong build
I'm trying to test this on treeherder's stage since I'm a sheriff there (I don't want to be a sheriff on production). Bug 1212967 is preventing me from testing this locally. If it is not fixed by tomorrow I will deploy regardless and keep an eye on the logs.
Depends on: 1212967
Attached file bump mozci version + others (deleted) —
Hi adusca, there's no rush to have a look at this. It is mainly to keep you in the loop. Let me know if you have anything to add or to fix in a follow-up. I'm hoping to land this tomorrow.
Attachment #8671516 - Flags: review?(alicescarpa)
After all sorts of issues I tried to revert to the v77 deployment from 8 days ago (7a5e4b2). However, that does not seem to work. I'm trying to figure out how to clobber the Python dependencies currently installed for pulse_actions. I will report more as I make progress.
Comment on attachment 8671516 [details]
bump mozci version + others

This was merged.
Attachment #8671516 - Flags: review?(alicescarpa)
The issues are now clear. There will be a few more alerts in this hour, but we're out of the woods. KWierso: thanks for reporting, and let us know if you have any more issues.

Even though we requested MozillaPulse 1.2.2 and I can tell that the version is in place [1], the old behaviour is still happening and I had to handle it with: https://github.com/adusca/pulse_actions/commit/b918c94d85740323d0201dfd212999e2d7b4185a

[1]
armenzg@armenzg-thinkpad:~/repos/pulse_actions$ heroku run pip freeze
Running pip freeze on pulse-actions... up, run.8274
amqp==1.4.7
anyjson==0.3.3
beautifulsoup4==4.4.1
bugsy==0.6.0
httplib2==0.9.2
ijson==2.2
keyring==5.4
kombu==3.0.26
mozci==0.15.1
MozillaPulse==1.2.2
oauth2==1.9.0.post1
progressbar==2.3
PyHawk-with-a-single-extra-commit==0.1.5
pytz==2015.6
requests==2.7.0
slugid==1.0.6
taskcluster==0.0.27
treeherder-client==1.7.0
yajl==0.3.5
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED