Closed Bug 489523 Opened 16 years ago Closed 15 years ago

talos reboot (connection lost) error message should say that the real errors are above

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dbaron, Unassigned)

Details

This is a continuation of bug 474950. The talos reboot error message at the end of every talos log is still very frequently confusing people into thinking that the problem is a network problem rather than a test failure.

I think the current message (I trimmed a few stars so it fits):

********************************************************************************
*** END OF RUN - NOW DOING SCHEDULED REBOOT; FOLLOWING ERROR MESSAGE EXPECTED **
********************************************************************************

should also say something like:

REAL ERROR MESSAGES, IF ANY, WILL BE BEFORE THE PREVIOUS "BuildStep started"

Especially on Windows builds, that previous "BuildStep started" line is quite a few lines up, so people often don't look that far up.
I dunno, I suspect that any warning message is just effectively hidden in the normal log output spewage, even with caps and stars. It's pretty common to scan up until seeing the first error, and to suspect that one. Since we have (had?) other common sporadic failures that also manifest as "twisted.internet.error.ConnectionLost", it's really easy to see that and not look further.

Really the best fix would be to not report this in the log in the first place. Ideally have code that explicitly watches for the connection to be dropped when it's expected to (and reports an error if it's *not*).

A second-best-that-i-hate-to-even-suggest would be to use some unexpected ascii art to really grab attention. Like:

This error is expected!
       |
       |
       |
       |
   \   |   /
    \  |  /
     \ | /
      \|/
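The "explicitly watch for the expected disconnect" idea above could look roughly like the sketch below. This is not buildbot code: `RebootWatcher` and its methods are hypothetical names made up for illustration, and it assumes something else calls `request_reboot()` when the reboot step runs and `connection_lost()` when the slave goes away.

```python
class RebootWatcher:
    """Tracks whether a slave disconnect was expected (hypothetical helper)."""

    def __init__(self):
        self.reboot_requested = False
        self.disconnected = False

    def request_reboot(self):
        # Called when the scheduled reboot step is issued.
        self.reboot_requested = True

    def connection_lost(self):
        # Called when the connection to the slave drops.
        self.disconnected = True

    def errors(self):
        """Return the problems worth reporting at the end of the run."""
        if self.reboot_requested and not self.disconnected:
            return ["slave was told to reboot but never disconnected"]
        if self.disconnected and not self.reboot_requested:
            return ["connection lost unexpectedly"]
        # Either nothing happened, or an expected reboot went as planned.
        return []
```

With something like this, the expected case (reboot requested, connection dropped) produces no output at all, so there is nothing in the log to misread.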
> A second-best-that-i-hate-to-even-suggest would be to use some unexpected ascii
> art to really grab attention. Like:
>
> This error is expected!

That wouldn't have helped me. I understood that the error was expected, but I still assumed that the box was orange for that reason.
catlee: is it possible we could get something added to buildbot to facilitate rebooting the slaves as a post-build step without it winding up as a disconnection error like this? As useful as the auto-rebooting is, it clearly causes confusion that no amount of explanatory text is going to fix.
(In reply to comment #3)
> catlee: is it possible we could get something added to buildbot to facilitate
> rebooting the slaves as a post-build step without it winding up as a
> disconnection error like this? As useful as the auto-rebooting is, it clearly
> causes confusion that no amount of explanatory text is going to fix.

It might be...buildbot is pretty good about killing off processes after a build is done, which makes it hard to spawn tasks that are supposed to happen later.

Two things I can think of that may work are to have some kind of a DisconnectStep that can run a given command and then disconnect the slave without complaining, or a disconnectOk flag on steps to indicate that it's ok if the slave disappears afterwards.

What might be easier is to modify the log output after we're done. So if the log ends with our reboot message followed by the twisted disconnection, then we strip one or both off. This feels a bit dirty to have to do though...
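For the log post-processing option, a minimal sketch might look like the following. The two marker strings are taken from the messages quoted in this bug; the function name `strip_expected_reboot_tail` is hypothetical, and the sketch assumes the whole log is available as one string after the build finishes.

```python
REBOOT_MARKER = "END OF RUN - NOW DOING SCHEDULED REBOOT"
DISCONNECT_MARKER = "twisted.internet.error.ConnectionLost"


def strip_expected_reboot_tail(log_text):
    """Drop the reboot banner and the disconnect that follows it, if present.

    The log is returned unchanged unless the reboot banner appears and a
    ConnectionLost message appears somewhere after it.
    """
    lines = log_text.splitlines()
    # Find the last line carrying the reboot banner text.
    banner_idx = None
    for i in range(len(lines) - 1, -1, -1):
        if REBOOT_MARKER in lines[i]:
            banner_idx = i
            break
    if banner_idx is None:
        return log_text
    # Only strip when the disconnect really follows the banner.
    if not any(DISCONNECT_MARKER in line for line in lines[banner_idx + 1:]):
        return log_text
    # Also drop the row of stars directly above the banner, if any.
    cut = banner_idx
    above = lines[cut - 1].strip() if cut > 0 else ""
    if above and set(above) == {"*"}:
        cut -= 1
    return "\n".join(lines[:cut]) + ("\n" if cut else "")
```

This version strips both the banner and everything after it. A ConnectionLost that shows up without the banner is left alone, so the other sporadic failures mentioned earlier would still be visible.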
Maybe we can use the Graceful Shutdown feature once the 0.7.10p1 upgrade happens.
Component: Release Engineering: Talos → Release Engineering: Future
Oops. I believe this is no longer an issue -- is that correct?
Yeah, looks better to me. They seem to no longer be showing the error; the ones I checked were:

Linux tp4 mozilla-central:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1252514181.1252515848.9113.gz

Mac nochrome mozilla-central:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1252513075.1252515853.9130.gz

Windows dirty profile mozilla-central:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1252506830.1252515657.6694.gz
Cool. Resolving fixed.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Moving closed Future bugs into Release Engineering in preparation for removing the Future component.
Component: Release Engineering: Future → Release Engineering
Product: mozilla.org → Release Engineering