Closed Bug 533013 Opened 15 years ago Closed 15 years ago

try server result emails don't link to the log

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 511749

People

(Reporter: gal, Unassigned)

Details

Try server emails direct to the waterfall diagram, where people have to manually search the columns for the right result and then retrieve the link to the log from there. This is supremely annoying and error-prone.

To add injury to insult, there seems to be a slight delay. The email arrives a minute or two before the column result shows up, so I have to keep reloading the waterfall until the column is there and I can find it and then get the log.

If the result email could have a link to the brief and full logs, that would be awesome.

From: tryserver@build.mozilla.org
Date: December 4, 2009 3:22:51 PM PST
To: agal@mozilla.com
Subject: Try Server: failure on Linux try hg build

Your Try Server build (try-63331f1e1a35) failed to complete on linux.
Visit http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaTry to view the full logs.
Component: Tinderbox → Release Engineering
OS: Mac OS X → All
Product: Webtools → mozilla.org
QA Contact: tinderbox → release
Hardware: x86 → All
Version: Trunk → other
Sorry, this isn't a serious enough issue to deal with right now.
It's actually really hard to find out which logs go with which builds, too - there's no way to predict what the log URL will be until Tinderbox generates it.
The delay is because we communicate to tinderbox through e-mail, and it takes a bit of time to receive and process the mail.
Component: Release Engineering → Release Engineering: Future
On a side note, would it be helpful if the e-mails contained a link to: http://tests.themasta.com/tinderboxpushlog/?tree=MozillaTry instead?
(In reply to comment #1)
> Sorry, this isn't a serious enough issue to deal with right now.
>
> It's actually really hard to find out which logs go with which builds, too - there's no way to predict what the log URL will be until Tinderbox generates it.
>
> The delay is because we communicate to tinderbox through e-mail, and it takes a bit of time to receive and process the mail.
Also, this might be much easier once we put the logs on ftp.m.o, in a predictable location, per bug#530318.
(In reply to comment #1)
> Sorry, this isn't a serious enough issue to deal with right now.
This is actually serious. I'll leave it up to you guys to decide the dependencies, but the current situation is really broken.
Component: Release Engineering: Future → Release Engineering
(In reply to comment #4)
> This is actually serious. I'll leave it up to you guys to decide the dependencies, but the current situation is really broken.
RelEng:Future nominally means we're not going to get to it in the next 2 weeks, which is realistic considering: the number of releases we're trying to slam out between now and New Year's, other unfinished (higher priority) goals, and PTO from releng staff over the holidays.
I agree that the current system is broken/difficult to use (I push stuff to try-server too), but please have a little faith that we know how to manage our priorities.
Would comment #2 help at all?
Component: Release Engineering → Release Engineering: Future
"Future" is basically opaque to others, though, and I think this is the only component in our Bugzilla in which "Future" (as seen in the TM) doesn't mean "whenever, if ever", so that confusion will continue as long as components, rather than target milestones or priority fields or status whiteboard text or keywords are used to distinguish untriaged from triaged, or whatever the distinction is here.
The component description encourages the belief that having your bug moved to Future is, if not a death sentence, at least a trip to the carbonite spa: "For longer term projects that have been agreed should be done, but have no immediate plans to so. These are not be part of the regular recurring triage. Advanced planning and placeholder goals for next quarter also go here." Should we understand that things that are in Release Engineering proper are expected within 2 weeks of being in that state and surviving a triage pass, then?
With a keyword-or-whatever system, bugs could start in the "unscheduled" state, and the social dynamics would be much improved. (As with bugs in other areas, reporters would see action to indicate activity on the bug, not to indicate that no activity was planned in the near future.)
Also, and more specifically about this bug: how can a motivated developer set up an environment to hack the software in question, if they are, as aforementioned, motivated? I am more than happy to have developers work on tool improvements where their perceived-critical needs exceed releng's bandwidth, but I don't know how to let them self-service.
(Caveat: this may make no sense. I am currently on an awesome cocktail of medications in an attempt to regain basic human function.)
Shaver, I don't necessarily disagree with your points about Future, but we should probably discuss those elsewhere.
> Also, and more specifically about this bug: how can a motivated developer set up an environment to hack the software in question, if they are, as aforementioned, motivated? I am more than happy to have developers work on tool improvements where their perceived-critical needs exceed releng's bandwidth, but I don't know how to let them self-service.
There's no fix for this with the current system. As I mentioned a few comments back, it's not possible to know what the log URL is going to be (it includes the start time of the build, the time the end-of-build mail was processed, and the PID of the processing process IIRC).
One possible alternative would be to move the e-mail notifications somewhere else. This other system could be notified from Buildbot or scrape Tinderbox for status. Either way it would have to scrape Tinderbox for the logs. This could even be added to TBPL, I think, considering it already does all of this except the actual e-mailing part. (This would have the added benefit of enabling e-mail notifications for branches, too.)
cc'ing mstange in case he has thoughts on tbpl e-mail notifications.
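[Editor's illustration] The unpredictability described in the comment above can be sketched in a few lines of Python. This is only a sketch, assuming the log name is built from the three inputs recalled there (build start time, mail processing time, PID). The function name and the filename layout are hypothetical and are not taken from the Tinderbox source, which is Perl.

```python
import os
import time


def tinderbox_style_log_name(build_start, processed_at=None, pid=None):
    """Hypothetical layout: start time, processing time and PID joined by dots."""
    if processed_at is None:
        processed_at = int(time.time())  # only known when the mail is processed
    if pid is None:
        pid = os.getpid()  # only known to the process handling the mail
    return f"{build_start}.{processed_at}.{pid}.gz"


# The same build, processed at different moments by different processes,
# ends up under different names. The sender only knows the first field.
first = tinderbox_style_log_name(1259900000, processed_at=1259900120, pid=4242)
second = tinderbox_style_log_name(1259900000, processed_at=1259900185, pid=5151)
```

Two of the three fields are only fixed on the receiving side, which is why the sender cannot put the final URL into the result email.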
(In reply to comment #7)
> There's no fix for this with the current system. As I mentioned a few comments back, it's not possible to know what the log URL is going to be (it includes the start time of the build, the time the end-of-build mail was processed, and the PID of the processing process IIRC).
why don't we change the log URLs to use a UUID based on the start time? is that what it uses to actually draw the waterfall?
(In reply to comment #8)
> (In reply to comment #7)
> > There's no fix for this with the current system. As I mentioned a few comments back, it's not possible to know what the log URL is going to be (it includes the start time of the build, the time the end-of-build mail was processed, and the PID of the processing process IIRC).
>
> why don't we change the log URLs to use a UUID based on the start time? is that what it uses to actually draw the waterfall?
We don't own the Tinderbox Server code. Getting patches into it is _extremely_ painful. Additionally, we're explicitly trying to move away from Tinderbox and therefore I don't think it's a good place to spend time and effort.
If we're trying to move away from it, it seems like we could just fork locally, though maybe there are upstream improvements that we want to take before we migrate away? If the master specified the log file name to the slaves, we'd be all set indeed. I can almost visualize the places in TB code that would need to change for it!
(In reply to comment #10)
> If we're trying to move away from it, it seems like we could just fork locally, though maybe there are upstream improvements that we want to take before we migrate away?
IT manages Tinderbox, so that would have to be run by them. I don't care either way, other than the fact that I don't want to spend much time hacking Tinderbox.
> If the master specified the log file name to the slaves, we'd be all set indeed.
I don't think the master should be telling Tinderbox what to name the log file - it wouldn't be possible for Tinderbox to guarantee uniqueness in that case. However, I think the log file name should be based on information sent in the end-of-build mail - timestamp, column name - whatever.
We could also bypass Tinderbox completely by saving the log on the slave, and uploading it alongside the build.
(In reply to comment #12)
> We could also bypass Tinderbox completely by saving the log on the slave, and uploading it alongside the build.
We should do that, yeah. I don't think that's a great piece for someone without experience with our systems though. That'll require mostly Buildbot hacking.
What's stopping you from making whatever changes you want to tinderbox on CVS HEAD? As far as I know cls is already off on his own branch and doesn't care what happens on HEAD.
(In reply to comment #14)
> What stopping you from making whatever changes you want to tinderbox on CVS HEAD? As far as I know cls is already off on his own branch and doesn't care what happens on HEAD.
Indeed. I'm happy to review patches, too. I'd rather not see Tinderbox fork yet again.
Mass move of bugs from Release Engineering:Future -> Release Engineering. See http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.
Component: Release Engineering: Future → Release Engineering
Priority: -- → P3
Three and a half months have passed. Any updates on this?
I don't see anywhere that we agreed to start working on this. If you feel it is an important enough issue to bump other work please talk to joduinn and get this prioritized.
(In reply to comment #18)
> I don't see anywhere that we agreed to start working on this.
Agree.
> If you feel it is an important enough issue to bump other work please talk to joduinn and get this prioritized.
Have my boss talk to your boss? That's not very Mozilla.
(In reply to comment #19)
> > If you feel it is an important enough issue to bump other work please talk to joduinn and get this prioritized.
>
> Have my boss talk to your boss? That's not very Mozilla.
Okay, let me put it this way then: I personally do not believe that the time that it would take to do all of the work involved here is worth bumping any of my current work for. If someone still strongly feels this is important they should speak to somebody with a higher level perspective.
(In reply to comment #20)
> If someone still strongly feels this is important they should speak to somebody with a higher level perspective.
How about you answer comment 14? Someone might fix this bug if they knew where to hack...
What does someone else (like myself or Andreas) need to know in order to do this ourselves? Not everything can get done right away, but I think it's important that developers are able to work on the things that they feel are worth the effort, just as I think it's important that you're able to work on the (other) things that you feel are worth the effort.
If "you could hack tinderbox, I will review patches, but we don't want to fork tinderbox" is meant to be the instructions to address this, then I will admit that I need more detail, like information on what branch of TB we're using, and what the configs are, etc. Hacking buildbot seems more palatable, tbqh, even for people who are starting from non-familiarity with our systems.
If the log filename is unique on the master, how will it not be unique on the client? Do slaves serve multiple masters? If that's the case, would a UUID for the master (or hostname, even) disambiguate well?
(In reply to comment #21)
> (In reply to comment #20)
> > If someone still strongly feels this is important they should speak to somebody with a higher level perspective.
>
> How about you answer comment 14? Someone might fix this bug if they knew where to hack...
I believe Jeff was doing some work on this.
(In reply to comment #22)
> What does someone else (like myself or Andreas) need to know in order to do this ourselves? Not everything can get done right away, but I think it's important that developers are able to work on the things that they feel are worth the effort, just as I think it's important that you're able to work on the (other) things that you feel are worth the effort.
>
> If "you could hack tinderbox, I will review patches, but we don't want to fork tinderbox" is meant to be the instructions to address this, then I will admit that I need more detail, like information on what branch of TB we're using, and what the configs are, etc. Hacking buildbot seems more palatable, tbqh, even for people who are starting from non-familiarity with our systems.
So, part of the problem is that we (releng) don't manage the tinderbox servers, so are ill-equipped to point interested parties to the right place to start hacking, or to review patches. IT might know better?
> If the log filename is unique on the master, how will it not be unique on the client? Do slaves serve multiple masters? If that's the case, would a UUID for the master (or hostname, even) disambiguate well?
Not sure what you mean here, but the current problem is that the tinderbox server generates a unique filename for the log file based on various factors, some of which aren't possible for the buildbot master to predict. At the time buildbot sends the email with the try server results, tinderbox won't have even received the logs.
To fix this requires fixing both tinderbox so one can safely specify a predictable log name (UUIDs could work, with some additional security to prevent someone from intentionally overwriting a log file), and then corresponding fixes to buildbot's try configs to include the appropriate directives to tinderbox, and then include the link in the email.
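[Editor's illustration] The "predictable log name" idea in the comment above could look roughly like the following. This is a minimal Python sketch, assuming both sides agree on a namespace and on which build fields go into the key. The namespace host, the function name and the key layout are all hypothetical, and nothing here reflects actual Tinderbox or Buildbot code.

```python
import uuid

# Hypothetical shared namespace; any fixed UUID known to both sides would do.
LOG_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "tinderbox.example.invalid")


def predictable_log_id(master, builder, build_start):
    """Derive a name-based (version 5) UUID from values the master already knows."""
    key = f"{master}/{builder}/{build_start}"
    return str(uuid.uuid5(LOG_NAMESPACE, key))


# The master can compute this before the log is ever sent, and the receiving
# side can recompute the same value from the same three fields.
log_id = predictable_log_id("try-master", "Linux try hg build", 1259900000)
```

A name-based UUID is deterministic, so it is also guessable. The sketch does nothing about the overwrite concern raised in the comment, which would need a separate check on the receiving side.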
(In reply to comment #22)
> I need more detail, like information on what branch of TB we're using, and what the configs are, etc.
If someone wanted to hack tinderbox and deal with getting it reviewed/tested/landed:
I believe http://mxr.mozilla.org/webtools/source/tinderbox/handlemail.pl#33 is the culprit. It's called in http://mxr.mozilla.org/webtools/source/tinderbox/processbuild.pl#94 as well; possibly other places. The time/pid are in the filenames to prevent overwriting previous mails/logs afaict.
We could also leave the time/pid in the mail file name, and just use a different identifier in the logfile name here: http://mxr.mozilla.org/webtools/source/tinderbox/processbuild.pl#143
We use CVS trunk tinderbox; cls has his own branch that we don't use, so we won't be breaking him if we alter our code. We would be diverging further if he doesn't take this patch, however.
I am not certain what other consequences of changing the logfile name might be, but the easiest solution here would seem to be making that log file name be predictable before we email the log to tinderbox-daemon. (The email from Try builds happens before the entire build factory exits; the log is sent to Tinderbox from the Buildbot master after the entire build factory exits.)
> If the log filename is unique on the master, how will it not be unique on the client? Do slaves serve multiple masters? If that's the case, would a UUID for the master (or hostname, even) disambiguate well?
I'm not entirely grokking the first question here, but I'm assuming the above links (tinderbox naming its files based on timestamp+pid after it accepts the email) answer that?
A UUID would work. Not entirely sure how to implement that. Possibly as another Tinderbox-blank: header in the tinderbox email body, but then there would have to be changes made to the buildbot tinderbox notifier.
To set context here: buildbot emails the logfiles to tinderbox server. The tinderbox server dynamically munges a location for the log files on the tinderbox server filesystem. There is no way for buildbot to know where tinderbox server will post the logfiles on that filesystem. Figuring out the inner mechanics of tinderbox server will take non-trivial work by whoever maintains our current tinderbox server instance.
A quicker, cleaner, and imho easier, fix is to have buildbot post the logfiles, alongside the build, on ftp.m.o. Buildbot could send email with data so that the tinderbox waterfall and TBPL would include the known location of the logs on ftp.m.o. This approach was raised in comment#12 and is in bug#530318.
Would people be ok with closing this as WONTFIX and instead working on buildbot to post files to ftp?
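[Editor's illustration] Once logs are uploaded to a location the sender controls, the result email can carry the link directly. Below is a minimal Python sketch of that idea. The base URL, the directory layout and the function names are hypothetical, since the real layout on ftp.m.o is the subject of bug#530318 and is not specified in this bug.

```python
from email.message import EmailMessage

# Hypothetical base URL and layout, for illustration only.
LOG_BASE = "http://ftp.example.invalid/tryserver-builds"


def log_url(revision, platform):
    """Build the log URL from values known before the upload happens."""
    return f"{LOG_BASE}/{revision}/{platform}/build.log"


def result_email(to_addr, revision, platform, status):
    """Compose a result mail that links straight to the uploaded log."""
    msg = EmailMessage()
    msg["From"] = "tryserver@build.mozilla.org"
    msg["To"] = to_addr
    msg["Subject"] = f"Try Server: {status} on {platform}"
    msg.set_content(
        f"Your Try Server build (try-{revision}) finished with status "
        f"'{status}' on {platform}.\n"
        f"Full log: {log_url(revision, platform)}\n"
    )
    return msg
```

Because the URL depends only on the revision and the platform, it can be computed at the moment the email is sent, even if the upload itself finishes slightly later.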
yes, that sounds like a better plan to me, since I gather that will address a bunch of other issues as well. bug 530318 it is.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
Cool, thanks rsayrer. The curious should cc themselves on bug#530318.
Er, so this bug was WONTFIX'd, but the problem still remains and a fix is desired; I assume even when bug 530318 is fixed, additional work would need to be done to ensure that the links are in the emails. Can some bug hierarchy be set up to represent that this is still a feature that's desired, but instead has a dependency on the work in 530318? This bug has a lot of cruft amongst the useful content, so perhaps bug 549740 can just be reopened and marked as depending on 530318, but I don't really care about the mechanics of that.
Resolution: WONTFIX → DUPLICATE
Product: mozilla.org → Release Engineering