533013 - try server result emails don't link to the log

Reporter

Description

•

15 years ago

Try server emails direct to the waterfall diagram, where people have to manually search the columns for the right result and then retrieve the link to the log from there. This is supremely annoying and error-prone.

To add injury to insult, there seems to be a slight delay. The email arrives a minute or two before the column result shows up, so I have to keep reloading the waterfall until the column is there and I can find it and then get the log.

If the result email could have a link to the brief and full logs, that would be awesome.

From: tryserver@build.mozilla.org
Date: December 4, 2009 3:22:51 PM PST
To: agal@mozilla.com
Subject: Try Server: failure on Linux try hg build

Your Try Server build (try-63331f1e1a35) failed to complete on linux.

Visit http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaTry to view the full logs.

Reed Loden [:reed]

Updated

•

15 years ago

Component: Tinderbox → Release Engineering

OS: Mac OS X → All

Product: Webtools → mozilla.org

QA Contact: tinderbox → release

Hardware: x86 → All

Version: Trunk → other

bhearsum@mozilla.com (:bhearsum)

Comment 1

•

15 years ago

Sorry, this isn't a serious enough issue to deal with right now.

It's actually really hard to find out which logs go with which builds, too - there's no way to predict what the log URL will be until Tinderbox generates it.

The delay is because we communicate to tinderbox through e-mail, and it takes a bit of time to receive and process the mail.

Component: Release Engineering → Release Engineering: Future

bhearsum@mozilla.com (:bhearsum)

Comment 2

•

15 years ago

On a side note, would it be helpful if the e-mails contained a link to: http://tests.themasta.com/tinderboxpushlog/?tree=MozillaTry instead?

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 3

•

15 years ago

(In reply to comment #1)
> Sorry, this isn't a serious enough issue to deal with right now.
>
> It's actually really hard to find out which logs go with which builds, too - there's no way to predict what the log URL will be until Tinderbox generates it.
>
> The delay is because we communicate to tinderbox through e-mail, and it takes a bit of time to receive and process the mail.

Also, this might be much easier once we put the logs on ftp.m.o, in a predictable location, per bug#530318.

Robert Sayre

Comment 4

•

15 years ago

(In reply to comment #1)
> Sorry, this isn't a serious enough issue to deal with right now.

This is actually serious. I'll leave it up to you guys to decide the dependencies, but the current situation is really broken.

Component: Release Engineering: Future → Release Engineering

Chris Cooper [:coop] (he/him)

Comment 5

•

15 years ago

(In reply to comment #4)
> This is actually serious. I'll leave it up to you guys to decide the dependencies, but the current situation is really broken.

RelEng:Future nominally means we're not going to get to it in the next 2 weeks, which is realistic considering: the number of releases we're trying to slam out between now and New Year's, other unfinished (higher priority) goals, and PTO from releng staff over the holidays.

I agree that the current system is broken/difficult to use (I push stuff to try-server too), but please have a little faith that we know how to manage our priorities.

Would comment #2 help at all?

Component: Release Engineering → Release Engineering: Future

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 6

•

15 years ago

"Future" is basically opaque to others, though, and I think this is the only component in our Bugzilla in which "Future" (as seen in the TM) doesn't mean "whenever, if ever", so that confusion will continue as long as components, rather than target milestones or priority fields or status whiteboard text or keywords are used to distinguish untriaged from triaged, or whatever the distinction is here. The component description encourages the belief that having your bug moved to Future is, if not a death sentence, at least a trip to the carbonite spa: "For longer term projects that have been agreed should be done, but have no immediate plans to so. These are not be part of the regular recurring triage. Advanced planning and placeholder goals for next quarter also go here." Should we understand that things that are in Release Engineering proper are expected within 2 weeks of being in that state and surviving a triage pass, then? With a keyword-or-whatever system, bugs could start in the "unscheduled" state, and the social dynamics would be much improved. (As with bugs in other areas, reporters would see action to indicate activity on the bug, not to indicate that no activity was planned in the near future.) Also, and more specifically about this bug: how can a motivated developer set up an environment to hack the software in question, if they are, as aforementioned, motivated? I am more than happy to have developers work on tool improvements where their perceived-critical needs exceed releng's bandwidth, but I don't know how to let them self-service. (Caveat: this may make no sense. I am currently on an awesome cocktail of medications in an attempt to regain basic human function.)

bhearsum@mozilla.com (:bhearsum)

Comment 7

•

15 years ago

Shaver, I don't necessarily disagree with your points about Future, but we should probably discuss those elsewhere.

> Also, and more specifically about this bug: how can a motivated developer set up an environment to hack the software in question, if they are, as aforementioned, motivated? I am more than happy to have developers work on tool improvements where their perceived-critical needs exceed releng's bandwidth, but I don't know how to let them self-service.

There's no fix for this with the current system. As I mentioned a few comments back, it's not possible to know what the log URL is going to be (it includes the start time of the build, the time the end-of-build mail was processed, and the PID of the processing process IIRC).

One possible alternative would be to move the e-mail notifications somewhere else. This other system could be notified from Buildbot or scrape Tinderbox for status. Either way it would have to scrape Tinderbox for the logs. This could even be added to TBPL, I think, considering it already does all of this except the actual e-mailing part. (This would have the added benefit of enabling e-mail notifications for branches, too.)

cc'ing mstange in case he has thoughts on tbpl e-mail notifications.

Robert Sayre

Comment 8

•

15 years ago

(In reply to comment #7)
> There's no fix for this with the current system. As I mentioned a few comments back, it's not possible to know what the log URL is going to be (it includes the start time of the build, the time the end-of-build mail was processed, and the PID of the processing process IIRC).

why don't we change the log URLs to use a UUID based on the start time? is that what it uses to actually draw the waterfall?

bhearsum@mozilla.com (:bhearsum)

Comment 9

•

15 years ago

(In reply to comment #8)
> (In reply to comment #7)
> > There's no fix for this with the current system. As I mentioned a few comments back, it's not possible to know what the log URL is going to be (it includes the start time of the build, the time the end-of-build mail was processed, and the PID of the processing process IIRC).
>
> why don't we change the log URLs to use a UUID based on the start time? is that what it uses to actually draw the waterfall?

We don't own the Tinderbox Server code. Getting patches into it is _extremely_ painful. Additionally, we're explicitly trying to move away from Tinderbox and therefore I don't think it's a good place to spend time and effort.

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 10

•

15 years ago

If we're trying to move away from it, it seems like we could just fork locally, though maybe there are upstream improvements that we want to take before we migrate away?

If the master specified the log file name to the slaves, we'd be all set indeed. I can almost visualize the places in TB code that would need to change for it!

bhearsum@mozilla.com (:bhearsum)

Comment 11

•

15 years ago

(In reply to comment #10)
> If we're trying to move away from it, it seems like we could just fork locally, though maybe there are upstream improvements that we want to take before we migrate away?

IT manages Tinderbox, so that would have to be run by them. I don't care either way, other than the fact that I don't want to spend much time hacking Tinderbox.

> If the master specified the log file name to the slaves, we'd be all set indeed.

I don't think the master should be telling Tinderbox what to name the log file - it wouldn't be possible for Tinderbox to guarantee uniqueness in that case. However, I think the log file name should be based on information sent in the end-of-build mail - timestamp, column name - whatever.

Chris AtLee [:catlee]

Comment 12

•

15 years ago

We could also bypass Tinderbox completely by saving the log on the slave, and uploading it alongside the build.

bhearsum@mozilla.com (:bhearsum)

Comment 13

•

15 years ago

(In reply to comment #12)
> We could also bypass Tinderbox completely by saving the log on the slave, and uploading it alongside the build.

We should do that, yeah. I don't think that's a great piece for someone without experience with our systems though. That'll require mostly Buildbot hacking.

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 14

•

15 years ago

What's stopping you from making whatever changes you want to tinderbox on CVS HEAD? As far as I know cls is already off on his own branch and doesn't care what happens on HEAD.

Reed Loden [:reed]

Comment 15

•

15 years ago

(In reply to comment #14)
> What's stopping you from making whatever changes you want to tinderbox on CVS HEAD? As far as I know cls is already off on his own branch and doesn't care what happens on HEAD.

Indeed. I'm happy to review patches, too. I'd rather not see Tinderbox fork yet again.

Chris Cooper [:coop] (he/him)

Comment 16

•

15 years ago

Mass move of bugs from Release Engineering:Future -> Release Engineering. See http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.

Component: Release Engineering: Future → Release Engineering

Priority: -- → P3

Andreas Gal :gal

Reporter

Comment 17

•

15 years ago

Three and a half months have passed. Any updates on this?

bhearsum@mozilla.com (:bhearsum)

Comment 18

•

15 years ago

I don't see anywhere that we agreed to start working on this. If you feel it is an important enough issue to bump other work please talk to joduinn and get this prioritized.

Robert Sayre

Comment 19

•

15 years ago

(In reply to comment #18)
> I don't see anywhere that we agreed to start working on this.

Agree.

> If you feel it is an important enough issue to bump other work please talk to joduinn and get this prioritized.

Have my boss talk to your boss? That's not very Mozilla.

bhearsum@mozilla.com (:bhearsum)

Comment 20

•

15 years ago

(In reply to comment #19)
> > If you feel it is an important enough issue to bump other work please talk to joduinn and get this prioritized.
>
> Have my boss talk to your boss? That's not very Mozilla.

Okay, let me put it this way then: I personally do not believe that the time that it would take to do all of the work involved here is worth bumping any of my current work for. If someone still strongly feels this is important they should speak to somebody with a higher level perspective.

Robert Sayre

Comment 21

•

15 years ago

(In reply to comment #20)
> If someone still strongly feels this is important they should speak to somebody with a higher level perspective.

How about you answer comment 14? Someone might fix this bug if they knew where to hack...

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 22

•

15 years ago

What does someone else (like myself or Andreas) need to know in order to do this ourselves? Not everything can get done right away, but I think it's important that developers are able to work on the things that they feel are worth the effort, just as I think it's important that you're able to work on the (other) things that you feel are worth the effort.

If "you could hack tinderbox, I will review patches, but we don't want to fork tinderbox" is meant to be the instructions to address this, then I will admit that I need more detail, like information on what branch of TB we're using, and what the configs are, etc. Hacking buildbot seems more palatable, tbqh, even for people who are starting from non-familiarity with our systems.

If the log filename is unique on the master, how will it not be unique on the client? Do slaves serve multiple masters? If that's the case, would a UUID for the master (or hostname, even) disambiguate well?

Chris AtLee [:catlee]

Comment 23

•

15 years ago

(In reply to comment #21)
> (In reply to comment #20)
> > If someone still strongly feels this is important they should speak to somebody with a higher level perspective.
>
> How about you answer comment 14? Someone might fix this bug if they knew where to hack...

I believe Jeff was doing some work on this.

Chris AtLee [:catlee]

Comment 24

•

15 years ago

(In reply to comment #22)
> What does someone else (like myself or Andreas) need to know in order to do this ourselves? Not everything can get done right away, but I think it's important that developers are able to work on the things that they feel are worth the effort, just as I think it's important that you're able to work on the (other) things that you feel are worth the effort.
>
> If "you could hack tinderbox, I will review patches, but we don't want to fork tinderbox" is meant to be the instructions to address this, then I will admit that I need more detail, like information on what branch of TB we're using, and what the configs are, etc. Hacking buildbot seems more palatable, tbqh, even for people who are starting from non-familiarity with our systems.

So, part of the problem is that we (releng) don't manage the tinderbox servers, so are ill-equipped to point interested parties to the right place to start hacking, or to review patches. IT might know better?

> If the log filename is unique on the master, how will it not be unique on the client? Do slaves serve multiple masters? If that's the case, would a UUID for the master (or hostname, even) disambiguate well?

Not sure what you mean here, but the current problem is that the tinderbox server generates a unique filename for the log file based on various factors, some of which aren't possible for the buildbot master to predict. At the time buildbot sends the email with the try server results, tinderbox won't have even received the logs.

To fix this requires fixing both tinderbox so one can safely specify a predictable log name (UUIDs could work, with some additional security to prevent someone from intentionally overwriting a log file), and then corresponding fixes to buildbot's try configs to include the appropriate directives to tinderbox, and then include the link in the email.
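
For readers following the "additional security" point above, here is a minimal sketch of one way a server could accept a sender-chosen log name without letting anyone overwrite an existing log. It is written in Python for illustration only (Tinderbox itself is Perl), and every name in it is an assumption: `sign_log_id`, `store_log`, the shared secret, the HMAC token, and the `.log` suffix are hypothetical and do not come from Tinderbox or Buildbot.

```python
import hashlib
import hmac
import os


def sign_log_id(secret: bytes, log_id: str) -> str:
    """Return an HMAC-SHA256 token tying a log id to a shared secret (hypothetical scheme)."""
    return hmac.new(secret, log_id.encode("utf-8"), hashlib.sha256).hexdigest()


def store_log(log_dir: str, log_id: str, token: str, data: bytes, secret: bytes) -> bool:
    """Store a log under a sender-chosen id; refuse bad tokens and existing files."""
    # Only accept plain lowercase hex ids so the id cannot escape log_dir.
    if not log_id or any(c not in "0123456789abcdef" for c in log_id):
        return False
    # The sender must prove it knows the shared secret for this id.
    if not hmac.compare_digest(sign_log_id(secret, log_id), token):
        return False
    path = os.path.join(log_dir, log_id + ".log")
    try:
        # Mode "x" fails if the file already exists, so a log is never overwritten.
        with open(path, "xb") as handle:
            handle.write(data)
    except FileExistsError:
        return False
    return True
```

The exclusive-create mode handles the overwrite case, and the token check handles the "someone intentionally" case; whether either fits the real mail-processing path is untested here.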

Aki Sasaki (not active)

Comment 25

•

15 years ago

(In reply to comment #22)
> I need more detail, like information on what branch of TB we're using, and what the configs are, etc.

If someone wanted to hack tinderbox and deal with getting it reviewed/tested/landed:

I believe http://mxr.mozilla.org/webtools/source/tinderbox/handlemail.pl#33 is the culprit. It's called in http://mxr.mozilla.org/webtools/source/tinderbox/processbuild.pl#94 as well; possibly other places. The time/pid are in the filenames to prevent overwriting previous mails/logs afaict.

We could also leave the time/pid in the mail file name, and just use a different identifier in the logfile name here: http://mxr.mozilla.org/webtools/source/tinderbox/processbuild.pl#143

We use CVS trunk tinderbox; cls has his own branch that we don't use, so we won't be breaking him if we alter our code. We would be diverging further if he doesn't take this patch, however.

I am not certain what other consequences of changing the logfile name might be, but the easiest solution here would seem to be making that log file name be predictable before we email the log to tinderbox-daemon. (The email from Try builds happens before the entire build factory exits; the log is sent to Tinderbox from the Buildbot master after the entire build factory exits.)

> If the log filename is unique on the master, how will it not be unique on the client? Do slaves serve multiple masters? If that's the case, would a UUID for the master (or hostname, even) disambiguate well?

I'm not entirely grokking the first question here, but I'm assuming the above links (tinderbox naming its files based on timestamp+pid after it accepts the email) answer that?

A UUID would work. Not entirely sure how to implement that. Possibly as another Tinderbox-blank: header in the tinderbox email body, but then there would have to be changes made to the buildbot tinderbox notifier.
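
The UUID-in-a-header idea above can be sketched roughly as follows, in Python for illustration only. This is not the Tinderbox or Buildbot implementation: the `tinderbox: loguuid:` line, the `LOG_BASE_URL` placeholder, the `.gz` suffix, and all function names are assumptions, and the other header lines are only an approximation of the mail format. The point is that the sender picks the identifier, so it can compute the log URL before the log is ever processed.

```python
import uuid

# Placeholder base URL; the real log location is not specified in this bug.
LOG_BASE_URL = "http://tinderbox.example.org/logs"

LOG_ID_PREFIX = "tinderbox: loguuid:"  # hypothetical header, not an existing one


def new_log_id() -> str:
    """Generate a random identifier on the sending (Buildbot) side."""
    return uuid.uuid4().hex


def mail_header_lines(tree: str, build_name: str, log_id: str) -> list:
    """Build the header block of the end-of-build mail, including the log id."""
    return [
        f"tinderbox: tree: {tree}",
        f"tinderbox: build: {build_name}",
        f"{LOG_ID_PREFIX} {log_id}",
        "tinderbox: END",
    ]


def parse_log_id(lines):
    """Receiving side: recover the log id from the mail, or None if absent."""
    for line in lines:
        if line.startswith(LOG_ID_PREFIX):
            return line[len(LOG_ID_PREFIX):].strip()
    return None


def tinderbox_log_url(tree: str, log_id: str) -> str:
    """Both sides can derive the same URL from the tree name and the log id."""
    return f"{LOG_BASE_URL}/{tree}/{log_id}.gz"
```

With something like this, the try result email could include `tinderbox_log_url(tree, log_id)` immediately, although the link would not resolve until the server had received and stored the log.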

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 26

•

15 years ago

To set context here: buildbot emails the logfiles to the tinderbox server. The tinderbox server dynamically munges a location for the log files on the tinderbox server filesystem. There is no way for buildbot to know where the tinderbox server will post the logfiles on its filesystem.

Figuring out the inner mechanics of the tinderbox server will take non-trivial work by whoever maintains our current tinderbox server instance.

A quicker, cleaner, and imho easier, fix is to have buildbot post the logfiles, alongside the build, on ftp.m.o. Buildbot could send email with data so that the tinderbox waterfall and TBPL would include the known location of the logs on ftp.m.o. This approach was raised in comment#12 and is in bug#530318.

Would people be ok with closing this as WONTFIX and instead working on buildbot to post files to ftp?
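
A minimal sketch of why a predictable upload location makes the original request easy: if the log path depends only on values Buildbot already knows when it sends the mail, the link can go straight into the email. This is Python for illustration; the directory layout, the `FTP_BASE` placeholder, the `build.log.gz` file name, and the function names are all assumptions, since the actual layout is left to bug 530318.

```python
from urllib.parse import quote

# Placeholder base URL; the real ftp.m.o layout is not specified in this bug.
FTP_BASE = "http://ftp.example.org/pub/tryserver-builds"


def ftp_log_url(build_id: str, platform: str, base: str = FTP_BASE) -> str:
    """Derive the log URL purely from values known when the build finishes."""
    return f"{base}/{quote(build_id, safe='')}/{quote(platform, safe='')}/build.log.gz"


def failure_email_body(build_id: str, platform: str) -> str:
    """Compose a try-server failure mail that links directly to the log."""
    return (
        f"Your Try Server build ({build_id}) failed to complete on {platform}.\n"
        f"Full log: {ftp_log_url(build_id, platform)}\n"
    )
```

For example, `failure_email_body("try-63331f1e1a35", "linux")` would produce a message pointing at one specific log rather than at the waterfall.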

Robert Sayre

Comment 27

•

15 years ago

yes, that sounds like a better plan to me, since I gather that will address a bunch of other issues as well. bug 530318 it is.

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → WONTFIX

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 28

•

15 years ago

Cool, thanks rsayre. The curious should cc themselves on bug#530318.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 30

•

15 years ago

Er, so this bug was WONTFIX'd, but the problem still remains and a fix is desired; I assume even when bug 530318 is fixed, additional work would need to be done to ensure that the links are in the emails. Can some bug hierarchy be set up to represent that this is still a feature that's desired, but instead has a dependency on the work in 530318? This bug has a lot of cruft amongst the useful content, so perhaps bug 549740 can just be reopened and marked as depending on 530318, but I don't really care about the mechanics of that.

Rail Aliiev [:rail]

Updated

•

15 years ago

Resolution: WONTFIX → DUPLICATE

Nobody; OK to take it and work on it

Assignee

Updated

•

11 years ago

Product: mozilla.org → Release Engineering