Closed Bug 680962 Opened 13 years ago Closed 13 years ago

Add-on perf test cron job is no longer finishing properly

Categories

(Release Engineering :: General, defect)

x86
Linux
defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: clouserw, Assigned: anodelman)

Attachments

(3 files)

The last time add-on perf tests ran was July 20th. Alice says you own a cron job that is supposed to be running every Saturday and testing these add-ons.
Assignee: nobody → bhearsum
Looks like this is because I set the crontab up on production-master02, a recently retired machine. I'll move it to a newer master, and get it in Puppet to help avoid forgetting about it in the future.
Could the job be launched right now anyway to update the stale data and public page?
We can't run this job during the week because it eats too much of our test pool. I'll definitely have it running for this coming weekend, though.
So, it looks like the cronjob _has_ been running, and hanging when trying to connect to ftp://ftp.mozilla.org for the Android info. After changing it to http it seems to run fine again. Regardless, I'm going to puppetize the setup.
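For illustration, the ftp-to-http change could look something like the sketch below. This is not the actual sendchange script: the function names `to_http` and `fetch` and the 30-second timeout are assumptions. The explicit timeout is there so a stalled server produces an error rather than a silent hang.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def to_http(url):
    """Rewrite an ftp:// URL to http://, leaving other schemes untouched."""
    parts = urlsplit(url)
    if parts.scheme == "ftp":
        parts = parts._replace(scheme="http")
    return urlunsplit(parts)


def fetch(url, timeout=30):
    """Fetch a URL, raising an error instead of hanging if the server stalls."""
    # The timeout value is an assumption; pick one that suits the job.
    with urlopen(to_http(url), timeout=timeout) as response:
        return response.read()
```

With a timeout in place, a connection problem would show up as an exception in the cron output instead of a job that never finishes.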
Attachment #555093 - Flags: review?(bear)
...because they moved to puppet-manifests
Attachment #555094 - Flags: review?(bear)
Comment on attachment 555093: puppetize addon sendchange stuff
puppet bits of the patch reviewed
Attachment #555093 - Flags: review?(bear) → review+
Comment on attachment 555094: remove sendchange files from tools repository
*stamp*
Attachment #555094 - Flags: review?(bear) → review+
Attachment #555094 - Flags: checked-in+
Attachment #555093 - Flags: checked-in+
Updated the Puppet masters with the change. These jobs should fire as usual on the weekend.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Did it actually run? https://addons.mozilla.org/en-US/firefox/performance/ still looks unchanged.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Turns out the jobs were not run as expected, which is 6am Pacific on Saturday. The reason is probably that there is no user after time description in
http://hg.mozilla.org/build/puppet-manifests/file/958f195d38a9/modules/addon-sendchanges/templates/crontab

There's no error message in syslog or the cron log to confirm this, but it seems likely for a cron run out of /etc/cron.d/.

When I ran it as cltbld@buildbot-master10 it successfully set up jobs (lots of 'change sent successfully'). bhearsum, as well as adding the user, perhaps we should set the MAILTO so we know when things fail? Would need to suppress all that normal output first though.

Some of the jobs failed in the perf test due to a python error, bug 682683 filed on that. There were also some errors with the downloading extensions, eg
* mozilla.ussg.indiana.edu was returning '503 Service Temporarily Unavailable' - wor
* Some addons get a 404 from addons.m.o, eg lightning starts off at https://addons.mozilla.org/en-US/firefox/downloads/latest/2313 and redirects until it hits the 404 at https://addons.mozilla.org/firefox/downloads/latest/lightning/. Also for
  * MemoryFox - https://addons.mozilla.org/en-US/firefox/downloads/latest/53880
  * Torbutton - https://addons.mozilla.org/en-US/firefox/downloads/latest/2275 (although this one was removed by the author)

There are logs appearing in
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/addontester-linux/1313106963/
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/addontester-win32/1313106963/

No results for mac, investigating ...
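As a rough sketch of the missing-user problem: files under /etc/cron.d/ use the system crontab format, where a user name sits between the five schedule fields and the command. The check below flags job lines whose sixth field is not a known user. The function name `missing_user`, the `known_users` set, and the script path and mail address in the usage lines are hypothetical; only the cltbld account comes from this bug.

```python
import re

# Matches environment settings such as MAILTO=someone@example.com
ENV_LINE = re.compile(r"^\s*\w+\s*=")


def missing_user(line, known_users):
    """Return True if a cron.d job line lacks a known user in field six."""
    stripped = line.strip()
    # Skip blanks, comments, and environment settings such as MAILTO=...
    if not stripped or stripped.startswith("#") or ENV_LINE.match(stripped):
        return False
    fields = stripped.split()
    # "@daily"-style shortcuts use one schedule field instead of five.
    schedule_len = 1 if fields[0].startswith("@") else 5
    if len(fields) <= schedule_len:
        return True
    return fields[schedule_len] not in known_users


# Hypothetical job lines for 6am on Saturday; the path is a placeholder.
print(missing_user("0 6 * * 6 /path/to/sendchange.sh", {"cltbld"}))
print(missing_user("0 6 * * 6 cltbld /path/to/sendchange.sh", {"cltbld"}))
```

This is only a heuristic, since cron itself cannot tell a missing user from a command that happens to be a single word, which would also explain why nothing useful shows up in the logs.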
Bug 682686 for the mac issue, which is two buildbot issues.
Could https://addons.mozilla.org/en-US/firefox/performance/ please be taken down until this issue is settled? The outdated data shown there is quite bad publicity and is very different from current reality, at least in NoScript's case, see https://addons.mozilla.org/en-US/firefox/addon/noscript/reviews/306644/
(In reply to Giorgio Maone from comment #13)
> Could https://addons.mozilla.org/en-US/firefox/performance/ please be taken
> down until this issue is settled?

I asked for this to happen Monday :\ Will file a separate bug for it.
Depends on: 682737
(In reply to Nick Thomas [:nthomas] from comment #11)
> Turns out the jobs were not run as expected, which is 6am Pacific on
> Saturday. The reason is probably that there is no user after time
> description in
>
> http://hg.mozilla.org/build/puppet-manifests/file/958f195d38a9/modules/addon-sendchanges/templates/crontab
>
> There's no error message in syslog or the cron log to confirm this, but it
> seems likely for a cron run out of /etc/cron.d/.

Yeah, I bet you're right - I forgot to take this into consideration when moving it to cron.d.

> When I ran it as cltbld@buildbot-master10 it successfully set up jobs (lots
> of 'change sent successfully'). bhearsum, as well as adding the user,
> perhaps we should set the MAILTO so we know when things fail ? Would need to
> suppress all that normal output first though.

I'll make this happen.
I gave this a quick run through from cron after changing the buildbot commands to "echo buildbot" (and disabling the stdout redirection) -- I ended up with a list of buildbot commands, as expected.
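The same "echo instead of run" check can be sketched in a few lines. This is an illustration of the dry-run idea only, not the script that was tested here; the `run` helper, its `dry_run` flag, and the placeholder argument are assumptions.

```python
import shlex
import subprocess


def run(command, dry_run=False):
    """Run a command list, or only print it when dry_run is set."""
    if dry_run:
        # Print the command exactly as it would be executed.
        print(shlex.join(command))
        return 0
    return subprocess.call(command)


# Placeholder arguments; the real sendchange options are not shown in this bug.
run(["buildbot", "sendchange", "placeholder-arg"], dry_run=True)
```

Running the job from cron in this mode exercises the schedule, the user, and the environment without consuming any of the test pool.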
Attachment #556558 - Flags: review?(nrthomas)
Attachment #556558 - Flags: review?(nrthomas) → review+
Attachment #556558 - Flags: checked-in+
Sendchanges should be happening again, starting this weekend. Once bug 682686 is fixed, Mac results will be back, and bug 682683 should fix up the addons with unicode issues.
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
What happened this time? And, more important, does (non) resolution verification really need to be dragged from week to week?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
What's going on here?
These jobs ran fine over the weekend as far as I can tell, here's a log excerpt:

Completed test ts: Stopped Sat, 03 Sep 2011 06:15:41
RETURN: cycle time: 00:05:47<br>
talos-r3-w7-062: Stopped Sat, 03 Sep 2011 06:15:41
Sending results: Started Sat, 03 Sep 2011 06:15:41
Generating results file: ts: Started Sat, 03 Sep 2011 06:15:41
Generating results file: ts: Stopped Sat, 03 Sep 2011 06:15:41
Transmitting test: ts: Started Sat, 03 Sep 2011 06:15:41
Transmitting test: ts: Stopped Sat, 03 Sep 2011 06:15:42
RETURN:addon results inserted successfully
Completed sending results: Stopped Sat, 03 Sep 2011 06:15:42

I'm not sure where to look for the results, though, they don't show up on graphs.mozilla.org.
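A quick way to check many logs for that success line could be something like the sketch below. The helper name `results_inserted` is hypothetical; the marker string is copied from the log excerpt in this comment.

```python
SUCCESS_MARKER = "RETURN:addon results inserted successfully"


def results_inserted(log_text):
    """Return True if the log contains the graph server's success line."""
    return any(SUCCESS_MARKER in line for line in log_text.splitlines())


excerpt = """Transmitting test: ts: Stopped Sat, 03 Sep 2011 06:15:42
RETURN:addon results inserted successfully
Completed sending results: Stopped Sat, 03 Sep 2011 06:15:42"""
print(results_inserted(excerpt))
```

Note that this only shows the graph server accepted the upload. It says nothing about which database the data ended up in or what the add-ons site reads from.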
[09:56] <anode> bhearsum: from what i heard from clouserw yesterday it seems like the results might be ending up on amo stage, instead of amo production
[09:56] <anode> i'm not sure why that would be happening
[09:56] <bhearsum> what can i do to help debug that?
[09:58] <anode> well, we should double check that we are sending to production graphs - that would be in the buildbot-configs somewhere
[09:58] --> aakashd has joined this channel (aakashd@moz-BBE3ABD.mv.mozilla.com).
[09:58] <anode> then we need to track down someone in IT/webdev to look at the graph server to see what amo it is pointing at
[10:02] <anode> i'll file an IT bug to check which amo credentials it is using

We're sending results to the same place as all other tests -- graphs.mozilla.org.
bug 685143 is verifying that the addons website is looking at the correct data.
(In reply to Ben Hearsum [:bhearsum] from comment #22)
> bug 685143 is verifying that the addons website is looking at the correct
> data.

"You are not authorized to access bug #685143"

(Side note: I'm a member of the core-security group and as far as I can remember I've got clearance also on web site related sensitive bugs).
According to https://bugzilla.mozilla.org/show_bug.cgi?id=685143, AMO is looking in the wrong place for data. Given that, and the fact that everything with regard to the triggering and running of test jobs looks fine to me, I'm closing this as FIXED. Once bug 685143 is fixed, I think AMO will start showing the data.
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
bug 685143 was fixed a week ago and the most recent perf results in AMO's db are still from July. It's now been 2 months since we've had a successful performance run.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
These jobs are still completing fine on the RelEng side as far as I can tell. I'm happy to help debug this further, but I don't know what to try, or what to look for. Alice, can you drive this?
Assignee: bhearsum → anodelman
The cron job is completed and talos is sending results to the production graph server. You can see the jobs running here: http://tinderbox.mozilla.org/showbuilds.cgi?tree=AddonTester&maxdate=1316270166&legend=0&norules=1 If data in amo is still old then I believe that the bug is now on the amo side of the world and should be tracked in a separate bug from this - as it doesn't have to do with releng/talos.
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
I see a bunch of results from yesterday. Before that there are 4 rows from the 2nd and then a bunch of June 18th. Is this still scheduled to run on Saturdays? If so, why are there results from yesterday and none from the 17th?
I believe that you are seeing the results from triggering the addon tests through the new addon perf testing pool, which kumar was playing with yesterday.
Kumar is using our -dev site. Do the results from that go to our production db? If yesterday's results are from Kumar's testing, where are the cron job results?
These results are from the 17th: http://tinderbox.mozilla.org/showbuilds.cgi?tree=AddonTester&maxdate=1316316496&hours=24&legend=0&norules=1 They show the tests being run and being reported to graphs.mozilla.org. They complete with 'addon results inserted successfully'. So, the tests are run and the data is generated and sent to the graph server. I'm not sure if at that point they are inserted into an incorrect db, I believe that the issue with the graph server pointing at the wrong db was fixed?
Product: mozilla.org → Release Engineering