Closed Bug 1037077 Opened 10 years ago Closed 10 years ago

Report run status to treeherder

Categories

(Testing Graveyard :: Autophone, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mcote, Assigned: bc)

References

Details

Attachments

(12 files, 4 obsolete files)

(deleted), patch
mcote
: review+
Details | Diff | Splinter Review
(deleted), text/plain
Details
(deleted), patch
mcote
: review+
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), text/plain
Details
(deleted), patch
mcote
: review+
Details | Diff | Splinter Review
(deleted), patch
mcote
: review+
Details | Diff | Splinter Review
(deleted), patch
mcote
: review+
Details | Diff | Splinter Review
(deleted), patch
mcote
: review+
Details | Diff | Splinter Review
(deleted), patch
mcote
: review+
Details | Diff | Splinter Review
(deleted), patch
mcote
: review+
Details | Diff | Splinter Review
(deleted), patch
mcote
: review+
Details | Diff | Splinter Review
As a first step before we get to automatically detecting and notifying on regressions (bug 971807), we should do as talos does and at least report, to Treeherder, whether a run on a given build completed successfully or not. A green result would just indicate that we were able to complete the tests and would not involve an analysis of the results.
Assignee: nobody → bclary
Status: NEW → ASSIGNED
Depends on: 1062378
Depends on: 1062385
Depends on: 1063886
Attached patch bug-1037077-treeherder.patch (obsolete) (deleted) — Splinter Review
Attachment #8487249 - Flags: feedback?(jeads)
Attached patch bug-1037077-treeherder-service.patch (obsolete) (deleted) — Splinter Review
Attachment #8487250 - Flags: feedback?(jeads)
Attached image treeherder.png (obsolete) (deleted) —
jeads, I am able to submit results to my local instance of treeherder, but the results are not grouped as I had hoped. For example, for the smoketests I am setting the following in a config file: [treeherder] job_name = 'Autophone smoketest' job_symbol = 'Sm' group_name = 'Autophone' group_symbol = 'A' which is intended to match attachment 8487250 [details] [diff] [review] You can see how I am submitting the results in attachment 8487249 [details] [diff] [review] in def post_treeherder_request(self, treeherder_job_collection): def submit_treeherder_pending(self, tests=[]): def submit_treeherder_running(self, tests=[]): def submit_treeherder_complete(self, tests=[], test_result=None): Instead of displaying in the ui as A(Sm) I am getting 'Sm' and in the summary box I am getting 'Autophone smoketest' for the job name instead of Autophone smoketest. You can see a screenshot at attachment 8487251 [details] Could you point out my error for me?
Flags: needinfo?(jeads)
As discussed in IRC an easier way to do this would be to use https://github.com/mozilla/treeherder-client to submit json for Autophone tests directly to treeherder. You can set this up locally or we can provide you with the OAuth credentials to submit data to our dev and stage environment directly (http://treeherder-dev.allizom.org and https://treeherder-stage.allizom.org) and then there's no need to have a local treeherder instance just need the client. Once you're comfortable with the data displayed we will provide you with the OAuth credentials to submit data to the production instance of treeherder. There's no need to directly modify anything in treeherder-service. The cause of the issue described above is the incorporation of single ticks around some of the data submitted to treeherder. You can see that in the example json structure that bc retrieved from his local treeherder-service instance: [{"project": "mozilla-central", "job": {"end_timestamp": 1410365925, "submit_timestamp": 1410364742, "start_timestamp": 1410365561, "build_url": "http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android/1410353652/fennec-35.0a1.en-US.android-arm.apk", "name": "'Autophone webappstartup'", "option_collection": {"opt": "opt"}, "artifacts": [], "machine_platform": {"platform": "android-2-3", "os_name": "android", "architecture": "arm"}, "who": "", "group_symbol": "'A'", "reason": "", "group_name": "'Autophone'", "machine": "nexus-one-2", "state": "completed", "log_references": [], "result": "success", "build_platform": {"platform": "android-2-3", "os_name": "android", "architecture": "arm"}, "job_symbol": "'app'", "job_guid": "de852e9873512a6724462cfb849616a78ce55f7d", "product_name": "fennec", "desc": ""}, "revision_hash": "a3380433c02d95999ff22d638d9ea87c625cbde6", "coalesced": []}] The embedded single ticks around some of the fields is causing the new mappings added to buildbot.py to fail. This structure can be generated directly with treeherder-client and submitted to the treeherder web service directly, taking that strategy will simplify things greatly.
Flags: needinfo?(jeads)
jeads, cool. I am already using the treeherder client and just need to stop trying to patch the treeherder-service. If you could send me the credentials for the dev version that would be great since the local one eats my bandwidth something horrible.
Ok, I have 3 nexus ones, 2 gs3s running locally and reporting to https://treeherder.allizom.org/ui/#/jobs?repo=mozilla-inbound I'll leave these running to collect some data and see if any edge cases appear. I still need to tweak what is being sent to include a Failure summary if applicable as well as some artifacts that point to phonedash.mozilla.org, test to make sure failures are reported properly and investigate why the jobs aren't displayed in their pending or running states. We're getting there, but we haven't arrived quite yet.
Attachment #8487249 - Attachment is obsolete: true
Attachment #8487249 - Flags: feedback?(jeads)
Attachment #8487250 - Attachment is obsolete: true
Attachment #8487250 - Flags: feedback?(jeads)
Attachment #8487251 - Attachment is obsolete: true
Attached patch bug-1037077.patch (deleted) — Splinter Review
Example output can be seen at: https://treeherder.allizom.org/ui/#/jobs?repo=mozilla-inbound&revision=de7c7bbb5679 The Job Info plugin panel doesn't show my job details. I'll attach a list of the TreeherderJobCollection objects that were submitted. They are accepted ok by the service. jeads: should I just create the collection as a raw object instead of trying to use the TreeherderClient?
Attachment #8490824 - Flags: review?(mcote)
Attachment #8490824 - Flags: feedback?(jeads)
Attached file nexus-one.js (deleted) —
example Job Info that doesn't display
Attachment #8490836 - Flags: feedback?(jeads)
Comment on attachment 8490824 [details] [diff] [review] bug-1037077.patch Review of attachment 8490824 [details] [diff] [review]: ----------------------------------------------------------------- ::: autophone.ini.example @@ +16,5 @@ > #build_cache_port = 28008 > #verbose = True > > +#treeherder_protocol = http > +#treeherder_server = local.treeherder.mozilla.org For simplicity, maybe combine them, e.g. treeherder_url = http://local.treeherder.mozilla.org? Same with command-line option. ::: configs/webappstartup_settings.ini.example @@ +22,5 @@ > +stderrp_attempts = 2 > +[signature] > +id = foo > +key = bar > +[treeherder] Should put blank lines above sections just to make them clear. ::: mailer.py @@ +59,4 @@ > > + > + def send(self, subject, body): > + if not self.from_address: Might as well check for self.mail_dest too.
Attachment #8490824 - Flags: review?(mcote) → review+
jeads: one other thing. While I do want to fix the display of my Job Info artifact, I would also like to get the error and message to appear in the Failure summary box. Is there a way to do that with the client?
as discussed on irc, you could compose an artifact containing the failure summary but it wouldn't benefit from having bug suggestions Here is an example of a simple Bug suggestions artifact without suggestions [ { "blob": [ { "search": "Return code: 1", "bugs": { "open_recent": [], "all_others": [] } }, { "search": "failed to purge builds", "bugs": { "open_recent": [], "all_others": [] } }, { "search": "Running post_fatal callback...", "bugs": { "open_recent": [], "all_others": [] } }, { "search": "Exiting 2", "bugs": { "open_recent": [], "all_others": [] } } ], "type": "json", "name": "Bug suggestions" } ]
Attached patch bug-1037077-v2.patch (deleted) — Splinter Review
* addresses previous review comments. * adds retries for submitting data to treeherder * made the attempt ranges consistent and 1..limit * fixed bug in previous patch in perftest when reporting error sending results to server. * added uninstall of webappstartup app in teardown after test and uninstall of fennec after the tests were run. hopefully will improve recharge times and leave the device in consistent state. * added bug suggestions to get error message into Failure summary I've been running this on https://treeherder.allizom.org since yesterday. Issues: * Job Info still not displaying. I believe this is a treeherder bug. * When Job is submitted with result 'skipped', it appears that the entire result set displays as empty. I think this is treeherder bug. See https://treeherder.allizom.org/ui/#/jobs?repo=b2g-inbound&revision=7cc47db168dc
Attachment #8493067 - Flags: review?(mcote)
Attachment #8493067 - Flags: feedback?(mdoglio)
Depends on: 1070979
Attachment #8493067 - Flags: feedback?(jeads)
The first issue (Job Info not displaying) is under investigation, I opened bug 1070979 for it. The second issue (jobs with result 'skipped') is actually an internal server error on the resultset endpoint happening when treeherder is not familiar with a certain job result. I don't think we have jobs with result 'skipped' anywhere else, the closest result type we have for jobs is 'coalesced' but I don't think this is the same case.
Mauro, I was using the values from the RESULT_DICT at https://github.com/mozilla/treeherder-service/blob/master/treeherder/etl/buildbot.py as possible values to send as the value of the test result. Are those not the valid values to use in a job collection add_result?
Mauro, is it a bug that the lists are identical except for one having skipped and the other having unknown? skipped is more meaningful in my case since it is used when the device is in an error state and can not run the test at all.
I'll switch to usercancel for now to prevent autophone from horking treeherder. we can revisit if skipped seems a reasonable addition to test result states.
Comment on attachment 8493067 [details] [diff] [review] bug-1037077-v2.patch Review of attachment 8493067 [details] [diff] [review]: ----------------------------------------------------------------- lgtm.
Attachment #8493067 - Flags: review?(mcote) → review+
Attached patch bug-1037077-v3.patch (deleted) — Splinter Review
This is what I finally checked in. carrying forward r+. https://github.com/mozilla/autophone/commit/5d020e1002268d1e02d86e9ebc9498e2414145d6
Attachment #8493575 - Flags: review+
Attachment #8490824 - Flags: feedback?(jeads)
Attachment #8490836 - Flags: feedback?(jeads)
Attachment #8493067 - Flags: feedback?(mdoglio)
Attachment #8493067 - Flags: feedback?(jeads)
Also realized I should have made the phonedash graph point to the appropriate test. Will fix in the AM. Note this is running in production Autophone reporting temporarily to https://treeherder.allizom.org
Depends on: 1071572
Attached patch bug-1037077-1-bail.patch (deleted) — Splinter Review
This does not keep trying to reduce the stderrp if there are no measurements. Should speed up busted build testing.
Attachment #8493794 - Flags: review?(mcote)
Attached patch bug-1037077-2-relax.patch (deleted) — Splinter Review
bug 1060171 changed the loglevel of console.log which broke webappstartup detection.
Attachment #8493795 - Flags: review?(mcote)
Attached patch bug-1037077-3-mailer.patch (deleted) — Splinter Review
The case where there was no email.ini specified was borken. This is more explicit.
Attachment #8493796 - Flags: review?(mcote)
Attachment #8493794 - Flags: review?(mcote) → review+
Comment on attachment 8493795 [details] [diff] [review] bug-1037077-2-relax.patch Review of attachment 8493795 [details] [diff] [review]: ----------------------------------------------------------------- Is this relaxed enough? I wonder if maybe we should just do .* at the beginning...
Attachment #8493795 - Flags: review?(mcote) → review+
Comment on attachment 8493796 [details] [diff] [review] bug-1037077-3-mailer.patch Review of attachment 8493796 [details] [diff] [review]: ----------------------------------------------------------------- Hm would it not make more sense to initialize all values to False (or other defaults) if we can't read from the config file, and keep the old check, rather than having a bunch of different places where we set send_mail? I'm not sure myself, just putting that out there.
Attachment #8493796 - Flags: review?(mcote) → review+
(In reply to Mark Côté [:mcote] from comment #26) > Comment on attachment 8493795 [details] [diff] [review] > bug-1037077-2-relax.patch > > Review of attachment 8493795 [details] [diff] [review]: > ----------------------------------------------------------------- > > Is this relaxed enough? I wonder if maybe we should just do .* at the > beginning... should be fine for changes in loglevel. the format is fixed otherwise. (In reply to Mark Côté [:mcote] from comment #27) > Comment on attachment 8493796 [details] [diff] [review] > bug-1037077-3-mailer.patch > > Review of attachment 8493796 [details] [diff] [review]: > ----------------------------------------------------------------- > > Hm would it not make more sense to initialize all values to False (or other > defaults) if we can't read from the config file, and keep the old check, > rather than having a bunch of different places where we set send_mail? I'm > not sure myself, just putting that out there. It might be better. I'll go ahead and push this and consider reworking it in either the setup/docs bug or the developer friendly bug.
No longer depends on: 1070979
https://github.com/mozilla/autophone/commit/11bc95c1b906f9625024c293a947a2a15b98387e https://github.com/mozilla/autophone/commit/219501013d6d7bd795e63e5c41497e18840d0344 https://github.com/mozilla/autophone/commit/219501013d6d7bd795e63e5c41497e18840d0344 I still want to adjust the phonedash link and I'll take a look at the mailer issue as well. Once they are done, I think we will be ready to post to treeherder.mozilla.org.
Depends on: 1072319
Attached patch bug-1037077-4-phonedash-url.patch (obsolete) (deleted) — Splinter Review
make the phonedash job info link point to the appropriate graph for the day.
Attachment #8494544 - Flags: review?(mcote)
Attached patch bug-1037077-5-mailer.patch (deleted) — Splinter Review
belatedly follow mcote's advice
Attachment #8494546 - Flags: review?(mcote)
Attached patch bug-1037077-6-kill-server.patch (deleted) — Splinter Review
adb kill-server is horking my mac mini. comment it out for now.
Attachment #8494547 - Flags: review?(mcote)
Attached patch bug-1037077-7-job-guid.patch (deleted) — Splinter Review
I'm having problems with grouping in the treeherder ui and was concerned there was a chance of getting duplicate job_guids. This patch switches to using uuid.uuid4(). It hasn't fixed this issue in bug 1072319 but I'd like to keep it.
Attachment #8494549 - Flags: review?(mcote)
Comment on attachment 8494544 [details] [diff] [review] bug-1037077-4-phonedash-url.patch Review of attachment 8494544 [details] [diff] [review]: ----------------------------------------------------------------- Looks like this could easily be refactored and put into some function in PerfTest, especially since PerfTest already has a bunch of phonedash-specific code in it.
Attachment #8494544 - Flags: review?(mcote) → review-
Attachment #8494546 - Flags: review?(mcote) → review+
Comment on attachment 8494547 [details] [diff] [review] bug-1037077-6-kill-server.patch Review of attachment 8494547 [details] [diff] [review]: ----------------------------------------------------------------- ::: autophone.py @@ +548,4 @@ > console_logger.info('Starting server on port %d.' % options.port) > console_logger.info('Starting build-cache server on port %d.' % > options.build_cache_port) > + # Calling kill_server on the mac mini Autophone host, horks Grammar nit: no comma.
Attachment #8494547 - Flags: review?(mcote) → review+
Comment on attachment 8494549 [details] [diff] [review] bug-1037077-7-job-guid.patch Review of attachment 8494549 [details] [diff] [review]: ----------------------------------------------------------------- Since the job guid is now totally random (i.e. you can't recreate it), maybe we should log it as well?
Attachment #8494549 - Flags: review?(mcote) → review+
sorry, was tired and lazy.
Attachment #8494544 - Attachment is obsolete: true
Attachment #8494713 - Flags: review?(mcote)
Comment on attachment 8494713 [details] [diff] [review] bug-1037077-4-phonedash-url.patch Review of attachment 8494713 [details] [diff] [review]: ----------------------------------------------------------------- ::: tests/webappstartup.py @@ +17,5 @@ > + @property > + def phonedash_url(self): > + # For webappstartup, Phonedash records the test name as > + # webappstartup. > + return self._phonedash_url('webappstartup') The old patch had the name set to self.name[:-4].lower(). Why the change here?
It seemed unnecessary as well as more readable especially considering the need to hard-code the local-blank name for S1S2. If we add additional tests we won't have to follow the exact mapping from test class to phonedash's test name.
Comment on attachment 8494713 [details] [diff] [review] bug-1037077-4-phonedash-url.patch Review of attachment 8494713 [details] [diff] [review]: ----------------------------------------------------------------- Wfm.
Attachment #8494713 - Flags: review?(mcote) → review+
I tried switching over to treeherder.mozilla.org on Sunday but of course one of the devices had an installation problem and philor had to hide it. It came out that hiding tests on Treeherder is pretty broken at the moment so I returned Autophone to treeherder.allizom.org for now.
Found during setup in MV that the requirements didn't work due to the use of the git protocal. Fixed it locally using git+https and pushing fix with self review. https://github.com/mozilla/autophone/commit/1d997f0efe332b7cb6eabc77a2ee9ae45e0a86df
Depends on: 1080782
Depends on: 1080783
Depends on: 1112693
removed dependencies for bugs 1062385, 1072319 that don't really block this bug. I've turned on reporting to https://treeherder.mozilla.org but with Autophone hidden. You can use the hidden toggle box to set exclusion_state=all so you can see Autophone jobs.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
No longer depends on: 1062385, 1072319
Resolution: --- → FIXED
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: