Closed Bug 827790 Opened 12 years ago Closed 10 years ago

Clobberer history should be pruned or else moved to another table in the DB, to improve perf

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 944005

People

(Reporter: emorley, Assigned: mrrrgn)

References

Details

(Keywords: sheriffing-P3, Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2046] )

At the time of bug 815556 and bug 816934, there was a discussion on IRC about the sheer number of rows in the clobberer's DB, which was not helping with perf. Whilst those bugs are now fixed via improved queries/caching, we should still clean up the clobberer history, to avoid future perf issues as the table grows. (Also, whilst the clobberer doesn't time out any more, it is still pretty slow to use.) We should either:
a) Regularly prune old rows from the table.
b) Move history to a separate table (that isn't queried by the clobberer page) and keep the main table as just one row per builder, with the timestamp of the last clobber.
Do we actually have a need for history any further than a few weeks back? (Periodic clobbers will have triggered one since then anyway.)
Bit of context: {
19:27:41 - bhearsum: how'd you fix it?
19:28:29 - jhopkins|buildduty: bhearsum: there were only db indexes on individual fields in clobber_times so i created a multi-field index (with fields ordered from most to least selectivity) ...
19:29:40 - sheeri: still, though, jhopkins|buildduty I'm not sure why all 2200 queries need to be fulfilled before the timeout.
19:31:24 - jhopkins|buildduty: sheeri: it might be possible to print the page and flush a bit at a time to the user to avoid the timeout, but either way that's still way too long for a user to wait for a page load
19:31:55 - sheeri: why not write the query so it does, maybe, 100 at a time, instead of 2200 separate queries? (or, say, 1000 at a time?)
19:32:25 - jhopkins|buildduty: sheeri: how would that help?
19:33:00 - sheeri: doing that can reduce disk I/O.
19:33:32 - jhopkins|buildduty: sheeri: oh, are you talking about displaying a subset of results in a page with prev/next page links?
19:33:40 - sheeri: Nope, not pagination
19:33:57 - sheeri: So, somewhere you get the 2200 queries you need to run, right?
19:34:03 - sheeri: where do you get that info?
19:34:07 - sheeri: you can probably combine queries.
19:34:55 - jhopkins|buildduty: there's an outer loop that fetches all builder names (m-a-lnx, m-a-android, etc.) for a particular source repo (eg. mozilla-aurora). the inner loop checks the latest clobber times for every build slave that is registered to handle those builders
19:34:56 - sheeri: make less work for the webapp
19:35:11 - jhopkins|buildduty: if we had a better db schema what you're saying would be a lot easier
19:36:00 - sheeri: It sounds like you're not bothering to do a join, or something.
19:36:11 - sheeri: so for example
19:36:11 - sheeri: SELECT id, who, lastclobber FROM clobber_times WHERE
19:36:11 - sheeri: builddir = 'm-aurora-andrd-armv6'
19:36:11 - sheeri: AND (branch IS NULL OR branch = 'mozilla-aurora') AND
19:36:11 - sheeri: (master IS NULL OR master = '') AND (slave IS NULL OR slave = 'bld-linux64-ec2-334')
19:36:11 - sheeri: ORDER BY lastclobber DESC LIMIT 1
19:36:27 - sheeri: you're doing an order limit 1, is that important?
19:36:47 - jhopkins|buildduty: yeah, we're sorting most recent->least recent clobber times and choosing the first row
19:36:51 - sheeri: OK.
19:36:58 - sheeri: no caching?
19:37:11 - jhopkins|buildduty: no i checked that - most of the requests are unique
19:37:33 - sheeri: most of the 2200, per page, but each page is different too?
19:38:13 - jhopkins|buildduty: each page load can be different, with build slaves updating the last clobber time
19:39:29 - jhopkins|buildduty: one way to speed this up would be to move the clobber history to another table and make each row in clobber_times unique
19:39:36 - sheeri: hrm, there's 207,000 rows approx in clobber times.
19:39:59 - sheeri: sure, do 2 writes, one as a write to clobber history and one as an update to clobber_times.
19:40:09 - jhopkins|buildduty: exactly
19:41:00 - sheeri: that would push more load on the writes, but it sounds like that might be OK
19:41:09 - terrence has left the room (Quit: Ping timeout).
19:42:15 - jhopkins|buildduty: sheeri: here's the index I created in case you need to do any follow-up work: create index ix_get_clobber_times on clobber_times(slave, builddir, branch);
19:42:51 - sheeri: well, since you have that you don't need an index on just slave, if one exists
19:42:59 - sheeri: and do you want that on dev/stage too?
19:45:57 - jhopkins|buildduty: sheeri: yes, please
}
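One way sheeri's "combine queries" suggestion could look: a single grouped query per branch in place of one `ORDER BY lastclobber DESC LIMIT 1` query per builder/slave pair. This is a simplified sketch under assumptions, not the query the clobberer ended up using. It ignores the `IS NULL` wildcard rows that the original query matches, the column types are guessed, and `make_demo_db` and `latest_clobbers` are hypothetical helper names. The column names and the index definition are taken from the log above; SQLite is used only to keep the example runnable.

```python
import sqlite3

def make_demo_db():
    # Assumed column types; column names come from the query in the IRC log.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE clobber_times (
            id INTEGER PRIMARY KEY,
            master TEXT,
            branch TEXT,
            builddir TEXT,
            slave TEXT,
            who TEXT,
            lastclobber INTEGER
        );
        -- The multi-field index quoted in the log.
        CREATE INDEX ix_get_clobber_times
            ON clobber_times(slave, builddir, branch);
    """)
    return conn

def latest_clobbers(conn, branch):
    # One round trip: the most recent clobber time for every
    # (builddir, slave) pair on the given branch.
    rows = conn.execute(
        """
        SELECT builddir, slave, MAX(lastclobber)
        FROM clobber_times
        WHERE branch = ?
        GROUP BY builddir, slave
        """,
        (branch,),
    )
    return {(builddir, slave): last for builddir, slave, last in rows}
```

The webapp can then look up each builder/slave pair in the returned dictionary instead of issuing a separate query for it. Handling the NULL wildcard rows would need extra logic that is left out here.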
Pretty sure we're regularly pruning the DB... or we used to, at least! Dustin, do you remember where the clobberer DB pruning is running?
Flags: needinfo?(dustin)
It runs in a cron task on relengwebadm. I haven't seen cronspam from it, and it runs without error on the command line. /var/log/cron confirms it's running. IIRC, it's not the history that's huge - it's the number of builders x slaves. Sounds like sheeri's on the right track.
Flags: needinfo?(dustin)
Product: mozilla.org → Release Engineering
Blocks: 914669
Can we confirm the number of rows of history is as expected? (If so, I'll morph this bug to be entirely about moving the history to another table)
Ed, see comment 3 - "IIRC, it's not the history that's huge - it's the number of builders x slaves." Looks like there are 1391 builders (result of select count(distinct buildername) from builds;) and 1132 slaves (result of select count(distinct slave) from builds;) Not sure if that's expected.
Yeah I read comment 3 - perf has suddenly decreased, so I wanted to ensure the purge was still working as expected :-)
I can confirm that purge is still working. The tables themselves are not large, the whole database is 265M. :(
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2037]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2037] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2046]
Bug is out of date post clobberer rewrite. Closing this one.
Assignee: nobody → winter2718
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
Component: Tools → General