Closed Bug 621304 Opened 14 years ago Closed 12 years ago

Not showing results for builds/tests more than 12 hours after the push

Categories

(Tree Management Graveyard :: TBPL, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: philor, Unassigned)

References

Details

(Keywords: regression)

Today's TraceMonkey nightlies were built against http://hg.mozilla.org/tracemonkey/rev/6255a0255dc2 which was pushed yesterday at 15:15:59. Neither http://tbpl.mozilla.org/?tree=TraceMonkey nor http://tbpl.mozilla.org/?tree=TraceMonkey&rev=6255a0255dc2 shows them or the tests that were triggered by them, and I have to back up to http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/rev/bc58c4a26094 to see them displayed. Did the bug 619673 and bug 619651 stuff give us the belief that the push for a build would always be within 12 hours of the build? That's also not true for test runs triggered later on a try build, or for that matter on a try build itself when the load is heavy.
See the discussion in bug 619673 comment 8. What would be a reasonable timeframe here? What about loading 48 hours worth of data in 12 hour chunks to keep the tinderbox load manageable?
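The chunked loading suggested above could look roughly like the following. This is only a sketch of the idea, not tbpl's actual code: the function name `chunked_windows` and its parameters are hypothetical, and the 48-hour and 12-hour figures are simply the ones floated in this comment.

```python
from datetime import datetime, timedelta


def chunked_windows(push_time, total_hours=48, chunk_hours=12, now=None):
    """Yield (start, end) time windows covering the period after a push.

    The windows are at most `chunk_hours` long and together cover
    `total_hours` after `push_time`. If `now` is given, no window
    extends past it, since there are no results in the future.
    """
    limit = push_time + timedelta(hours=total_hours)
    if now is not None:
        limit = min(limit, now)
    step = timedelta(hours=chunk_hours)
    start = push_time
    while start < limit:
        end = min(start + step, limit)
        yield start, end
        start = end


if __name__ == "__main__":
    # Arbitrary example timestamp, not taken from the bug.
    push = datetime(2011, 1, 1, 15, 15, 59)
    for start, end in chunked_windows(push):
        print(start.isoformat(), "->", end.isoformat())
```

Each window would then be one request to tinderbox, so the load per request stays the same as with the current 12-hour fetch.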
And we need to detect if the push we're loading for is the most recent push, and then load all the way to the present, no matter how old the push is.
(In reply to comment #2)
> And we need to detect if the push we're loading for is the most recent push,
> and then load all the way to the present, no matter how old the push is.
Depending on how old the push is, “all the way to the present” is probably overkill. In the case we have a range, we can just load everything between frompush and topush + 24h. In the case of a single push, we could extend the range to 24 hours, how about that?
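To make the two proposals concrete, here is a rough sketch of how the time span to load might be computed. The function name `load_range` and the `is_tip` flag are hypothetical, and the "tip loads to the present" branch is the rule from comment 2, which this comment questions; the 24-hour lookahead is the figure proposed here.

```python
from datetime import datetime, timedelta

# Lookahead proposed in this comment: 24 hours past the last push shown.
LOOKAHEAD = timedelta(hours=24)


def load_range(from_push, to_push, now, is_tip=False):
    """Return the (start, end) time span to request results for.

    from_push: time of the first push being displayed.
    to_push:   time of the last push in a range, or None for a single push.
    now:       current time; the span never extends past it.
    is_tip:    True if the last push shown is the most recent push on the
               tree, in which case we load all the way to the present.
    """
    last = to_push if to_push is not None else from_push
    if is_tip:
        return from_push, now
    return from_push, min(last + LOOKAHEAD, now)
```

With a range this yields frompush to topush + 24h, and with a single push it yields the push time plus 24 hours, in both cases capped at the present.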
For a tree with nightly builds, it isn't overkill to load to the present, because there will be builds in every 24 hour period, no matter how old the tip is. If we took the code complexity of teaching tbpl which trees do and don't have nightlies enabled, we could make loading ?tree=Cedar cheaper by not trying to load 24 days of empty tinderbox data, but we'd have to deal with the way that sometimes there haven't been any pushes for a while because the tree was broken, and you're waiting for the new builds that releng triggered on the tip rev to go green. We could solve that problem with some UI that we need anyway. Looking forward 24 hours from a particular non-tip rev is fine on most trees, where builds are only retained for 24 hours so any extra tests you get triggered have to have happened within 24 hours (though maybe there's a need for some fudge-factor for start vs. end times), but on try, builds are retained for 14 days, and it's not unusual to push on Thursday, take a long weekend, and get extra tests triggered on Tuesday. If we had a "Look for more recent builds" up-arrow any time we're showing a rev or range of revs and not showing tinderbox up to the present, we could (sort of awkwardly) solve both the try problem and the waiting-for-rebuilds-on-Cedar problem.
Blocks: 619673
The new world order is going to make this much more frequent - I just starred some stuff on mozilla-central when it popped into view because the Friday 15:13 push made the results of the nightlies built on a push from Thursday 11:16 suddenly visible. If we're going to have m-c nightlies, but no m-c pushes other than merges out of project branches, that's likely to be a daily thing: push, and then after you push suddenly the unstarred orange you pushed into becomes visible. (It's already an every-single-Monday thing on TraceMonkey, but luckily nobody there cares whether or not they pushed into unstarred orange.)
Summary: Not showing results for builds/tests more than 12 hours after the push? → Not showing results for builds/tests more than 12 hours after the push
We definitely need to take the code complexity, though slightly different complexity. Last night, I filed a bug on a new (luckily intermittent, rather than caused by building a clobbered nightly) crash from mozilla-central's nightly's tests, which I only saw because my local tbpl has a 24-hour window in its queue (and now dev.philringnalda.com/tbpl/ is up to 48 hours and climbing as we stay closed and pushless). What we need to take is a per-tree window. For trees like Cedar, that window is 24 hours, because there are no nightlies and that's how long builds are retained (unless we need some fudge-factor for "push, 23:59 later retrigger, 24:20 later it finishes"). For trees with nightlies, it's to-the-present. For MozillaTry, it's 14 days, or whatever build retention is this month - in the old days, maybe you would just push again if you wanted retests of something you pushed a week ago rather than asking releng to retrigger, but now you'll just use self-serve and expect &rev= to show your new test.
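A per-tree window could be expressed as a small lookup table. The sketch below is an assumption about how that might be structured, not existing tbpl code: `TREE_WINDOWS`, `DEFAULT_WINDOW`, and `window_end` are hypothetical names, the fallback for unlisted trees is a guess, and the fudge-factor for late-finishing retriggers is left out.

```python
from datetime import datetime, timedelta

# How far past a push to look for results, per tree.
# None means "load all the way to the present" (trees with nightlies).
TREE_WINDOWS = {
    "Cedar": timedelta(hours=24),      # no nightlies, builds kept 24 hours
    "MozillaTry": timedelta(days=14),  # build retention on try
    "Firefox": None,                   # mozilla-central has nightlies
    "TraceMonkey": None,               # has nightlies
}

# Assumption: trees not listed above behave like Cedar.
DEFAULT_WINDOW = timedelta(hours=24)


def window_end(tree, push_time, now):
    """Return the latest time to load results for, given a push on `tree`."""
    window = TREE_WINDOWS.get(tree, DEFAULT_WINDOW)
    if window is None:
        return now
    # Never ask for results from the future.
    return min(push_time + window, now)
```

Keeping the retention periods in one table means that when build retention changes, only the table needs updating.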
I'm currently considering bumping comm-central's period to 24 hours to at least improve this slightly. It seems like we need to ensure that for the latest push we take everything until now - or at least show the current real status, rather than a bogus x-hour-old one. I'm setting this as blocking the "stop using tinderbox for Firefox status" bug, as I think you can't realistically stop using tinderbox whilst we don't show the real status of the tree at all times.
Blocks: 630538
That seems like a backward dependency: the right fix depends on getting _off_ Tinderbox, which thinks that the only API it needs to expose is "tell me all the runs that happened during a particular timespan" when what we need is something new which instead exposes an API for "tell me all the runs that happened on a particular changeset." Conveniently, while I was bitching about how expanding to a 30 day retention for builds ruined my comment 7 plan, catlee pointed out that he has already exposed exactly that API - if we have ?tree=Firefox&rev=a7346f028fd6, he's happy to tell us that the builds on that rev are https://build.mozilla.org/buildapi/self-serve/mozilla-central/rev/a7346f028fd6 (well, https://build.mozilla.org/buildapi/self-serve/mozilla-central/rev/a7346f028fd6?format=json). All we need is someone smarter than me to write the code to sit between the pushlog fetch and the tinderbox fetches. Well, and IT to put it on a stronger box once we crush it by requesting dozens of URIs hundreds of times.
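The first piece of that in-between code would be mapping a tbpl tree and rev to the self-serve URL. A minimal sketch follows, assuming the URL shape is exactly as in the example above. `builds_url` and `TREE_TO_BRANCH` are hypothetical names; only the Firefox to mozilla-central mapping comes from this comment, and nothing here assumes anything about the shape of the JSON that comes back.

```python
from urllib.parse import quote

SELF_SERVE_BASE = "https://build.mozilla.org/buildapi/self-serve"

# tbpl tree name -> buildapi branch name. Only this one pair appears in
# the comment above; other trees would need their own entries.
TREE_TO_BRANCH = {
    "Firefox": "mozilla-central",
}


def builds_url(tree, rev, as_json=True):
    """Build the self-serve URL listing the builds on one changeset."""
    branch = TREE_TO_BRANCH[tree]  # raises KeyError for unknown trees
    url = f"{SELF_SERVE_BASE}/{quote(branch)}/rev/{quote(rev)}"
    return (url + "?format=json") if as_json else url


if __name__ == "__main__":
    print(builds_url("Firefox", "a7346f028fd6"))
```

One request per displayed rev is what leads to the "dozens of URIs hundreds of times" load mentioned above, so a real version would probably want to cache responses for revs whose builds have all finished.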
It needs to move out of ldap space and gain some CORS headers, then I’ll be hacking on it right away.
No longer blocks: 630538
I certainly feel sorry for anyone depending on usetinderbox=1, but that's nobody on tbpl.m.o, and you'll just have to take the code complexity of fixing this problem on your hypothetical fork of tbpl.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Product: Webtools → Tree Management
Product: Tree Management → Tree Management Graveyard