Bug 827790 · Opened 12 years ago · Closed 10 years ago
Clobberer history should be pruned or else moved to another table in the DB, to improve perf
Categories: Release Engineering :: General, defect
Tracking: (Not tracked)
Resolution: RESOLVED DUPLICATE of bug 944005
People: (Reporter: emorley, Assigned: mrrrgn)
References
Details
(Keywords: sheriffing-P3, Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2046])
At the time of bug 815556 and bug 816934, there was a discussion on IRC about the sheer number of rows in the clobberer's DB, which was not helping with perf.
Whilst those bugs are now fixed via improved queries/caching, we should still clean up the clobberer history, to avoid future perf issues as the table grows. (That and whilst the clobberer doesn't time out any more, it is still pretty slow to use).
We should either:
a) Regularly prune old rows from the table.
b) Move history to a separate table (that isn't queried by the clobberer page) - and keep the main table as just one row per builder with timestamp of last clobber.
Do we actually have a need for history any further than a few weeks back? (Given that periodic clobbers will have done one since then anyway)
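Option (b) can be sketched concretely. Below is a minimal, hypothetical sqlite3 sketch (column names follow the queries quoted in comment 1; the `clobber_history` table name and exact schema are assumptions, not the real clobberer schema): each clobber does two writes, an upsert of the single current row plus an append to history, so the clobberer page only ever reads one row per builder/slave.

```python
# Sketch of option (b): keep clobber_times at one row per builder/slave and
# append the full event stream to a separate history table.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clobber_times (
    builddir    TEXT,
    branch      TEXT,
    slave       TEXT,
    who         TEXT,
    lastclobber INTEGER,
    PRIMARY KEY (builddir, branch, slave)   -- one row per builder/slave
);
CREATE TABLE clobber_history (              -- append-only audit trail
    builddir    TEXT,
    branch      TEXT,
    slave       TEXT,
    who         TEXT,
    lastclobber INTEGER
);
""")

def record_clobber(builddir, branch, slave, who, when=None):
    """Two writes per clobber: upsert the current row, append to history."""
    when = when or int(time.time())
    conn.execute(
        "INSERT OR REPLACE INTO clobber_times VALUES (?, ?, ?, ?, ?)",
        (builddir, branch, slave, who, when))
    conn.execute(
        "INSERT INTO clobber_history VALUES (?, ?, ?, ?, ?)",
        (builddir, branch, slave, who, when))
    conn.commit()

record_clobber("m-aurora-andrd-armv6", "mozilla-aurora",
               "bld-linux64-ec2-334", "emorley", when=100)
record_clobber("m-aurora-andrd-armv6", "mozilla-aurora",
               "bld-linux64-ec2-334", "emorley", when=200)

# The hot table holds only the latest clobber per builder/slave...
row = conn.execute(
    "SELECT lastclobber FROM clobber_times WHERE builddir=? AND slave=?",
    ("m-aurora-andrd-armv6", "bld-linux64-ec2-334")).fetchone()
print(row[0])  # 200

# ...while the history table keeps every event for auditing or pruning.
print(conn.execute("SELECT COUNT(*) FROM clobber_history").fetchone()[0])  # 2
```

This matches the split jhopkins and sheeri converge on in the IRC log below: slightly more write load, in exchange for reads that no longer sort the whole event history per lookup.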
Reporter
Comment 1•12 years ago
Bit of context:
{
19:27:41 - bhearsum: how'd you fix it?
19:28:29 - jhopkins|buildduty: bhearsum: there were only db indexes on individual fields in clobber_times so i created a multi-field index (with fields ordered from most to least selectivity)
...
19:29:40 - sheeri: still, though, jhopkins|buildduty I'm not sure why all 2200 queries need to be fulfilled before the timeout.
19:31:24 - jhopkins|buildduty: sheeri: it might be possible to print the page and flush a bit at a time to the user to avoid the timeout, but either way that's still way too long for a user to wait for a page load
19:31:55 - sheeri: why not write the query so it does, maybe, 100 at a time, instead of 2200 separate queries? (or, say, 1000 at a time?)
19:32:25 - jhopkins|buildduty: sheeri: how would that help?
19:33:00 - sheeri: doing that can reduce disk I/O.
19:33:32 - jhopkins|buildduty: sheeri: oh, are you talking about displaying a subset of results in a page with prev/next page links?
19:33:40 - sheeri: Nope, not pagination
19:33:57 - sheeri: So, somewhere you get the 2200 queries you need to run, right?
19:34:03 - sheeri: where do you get that info?
19:34:07 - sheeri: you can probably combine queries.
19:34:55 - jhopkins|buildduty: there's an outer loop that fetches all builder names (m-a-lnx, m-a-android, etc.) for a particular source repo (eg. mozilla-aurora). the inner loop checks the latest clobber times for every build slave that is registered to handle those builders
19:34:56 - sheeri: make less work for the webapp
19:35:11 - jhopkins|buildduty: if we had a better db schema what you're saying would be a lot easier
19:36:00 - sheeri: It sounds like you're not bothering to do a join, or something.
19:36:11 - sheeri: so for example
19:36:11 - sheeri: SELECT id, who, lastclobber FROM clobber_times WHERE
19:36:11 - sheeri: builddir = 'm-aurora-andrd-armv6'
19:36:11 - sheeri: AND (branch IS NULL OR branch = 'mozilla-aurora') AND
19:36:11 - sheeri: (master IS NULL OR master = '') AND (slave IS NULL OR slave = 'bld-linux64-ec2-334')
19:36:11 - sheeri: ORDER BY lastclobber DESC LIMIT 1
19:36:27 - sheeri: you're doing an order limit 1, is that important?
19:36:47 - jhopkins|buildduty: yeah, we're sorting most recent->least recent clobber times and choosing the first row
19:36:51 - sheeri: OK.
19:36:58 - sheeri: no caching?
19:37:11 - jhopkins|buildduty: no i checked that - most of the requests are unique
19:37:33 - sheeri: most of the 2200, per page, but each page is different too?
19:38:13 - jhopkins|buildduty: each page load can be different, with build slaves updating the last clobber time
19:39:29 - jhopkins|buildduty: one way to speed this up would be to move the clobber history to another table and make each row in clobber_times unique
19:39:36 - sheeri: hrm, there's 207,000 rows approx in clobber times.
19:39:59 - sheeri: sure, do 2 writes, one as a write to clobber history and one as an update to clobber_times.
19:40:09 - jhopkins|buildduty: exactly
19:41:00 - sheeri: that would push more load on the writes, but it sounds like that might be OK
19:41:09 - terrence has left the room (Quit: Ping timeout).
19:42:15 - jhopkins|buildduty: sheeri: here's the index I created in case you need to do any follow-up work: create index ix_get_clobber_times on clobber_times(slave, builddir, branch);
19:42:51 - sheeri: well, since you have that you don't need an index on just slave, if one exists
19:42:59 - sheeri: and do you want that on dev/stage too?
19:45:57 - jhopkins|buildduty: sheeri: yes, please
}
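The N+1 pattern described in the log (one `SELECT ... ORDER BY lastclobber DESC LIMIT 1` per builder/slave, roughly 2200 round trips per page load) is the kind of thing sheeri's "combine queries" suggestion addresses. A hedged sketch, using a hypothetical sqlite3 copy of the schema and made-up data: fetch the latest clobber time for every builder/slave combination on a branch in one grouped query.

```python
# Sketch of collapsing the ~2200 per-slave lookups into one grouped query.
# Columns follow the SELECT quoted in the IRC log; the data is invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE clobber_times (
    id INTEGER PRIMARY KEY, who TEXT, builddir TEXT,
    branch TEXT, master TEXT, slave TEXT, lastclobber INTEGER)""")
conn.executemany(
    "INSERT INTO clobber_times (who, builddir, branch, master, slave, lastclobber)"
    " VALUES (?, ?, ?, ?, ?, ?)",
    [("a", "m-a-lnx", "mozilla-aurora", "", "slave1", 100),
     ("b", "m-a-lnx", "mozilla-aurora", "", "slave1", 300),
     ("c", "m-a-lnx", "mozilla-aurora", "", "slave2", 200)])

# One round trip: the latest clobber per (builddir, slave) on a branch,
# instead of one ORDER BY ... LIMIT 1 query per combination.
rows = conn.execute("""
    SELECT builddir, slave, MAX(lastclobber)
      FROM clobber_times
     WHERE branch IS NULL OR branch = ?
     GROUP BY builddir, slave
""", ("mozilla-aurora",)).fetchall()

latest = {(builddir, slave): t for builddir, slave, t in rows}
print(latest[("m-a-lnx", "slave1")])  # 300
print(latest[("m-a-lnx", "slave2")])  # 200
```

One caveat: `MAX()` drops the other columns of the winning row (e.g. `who`), so recovering them needs a self-join or a window function; this is presumably part of what jhopkins means by "if we had a better db schema what you're saying would be a lot easier".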
Comment 2•12 years ago
Pretty sure we're regularly pruning the DB... or we used to be, at least!
Dustin, do you remember where the clobberer DB pruning is running?
Flags: needinfo?(dustin)
Comment 3•12 years ago
It runs in a crontask on relengwebadm. I haven't seen cronspam from it, and it runs without error on the command line. /var/log/cron confirms it's running.
IIRC, it's not the history that's huge - it's the number of builders x slaves.
Sounds like sheeri's on the right track.
Flags: needinfo?(dustin)
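For reference, the cron-driven pruning (option (a) in the description) amounts to a periodic retention-window delete. A minimal sketch, assuming a 30-day window and a simplified table; the real crontask's query and window are not documented in this bug:

```python
# Sketch of option (a): a cron-style prune that drops clobber rows older
# than a retention window. The 30-day window is an assumption.
import sqlite3
import time

RETENTION_DAYS = 30  # hypothetical; periodic clobbers make older rows moot

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clobber_times (builddir TEXT, slave TEXT, lastclobber INTEGER)")
now = int(time.time())
conn.executemany(
    "INSERT INTO clobber_times VALUES (?, ?, ?)",
    [("m-a-lnx", "slave1", now),                # recent: kept
     ("m-a-lnx", "slave1", now - 90 * 86400)])  # stale: pruned

cutoff = now - RETENTION_DAYS * 86400
deleted = conn.execute(
    "DELETE FROM clobber_times WHERE lastclobber < ?", (cutoff,)).rowcount
conn.commit()
print(deleted)  # 1
```

Note that, per this comment and comment 5, pruning alone does not bound the table: the floor is builders × slaves rows, which pruning cannot reduce.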
Updated•11 years ago
Product: mozilla.org → Release Engineering
Reporter
Comment 4•11 years ago
Can we confirm the number of rows of history is as expected? (If so, I'll morph this bug to be entirely about moving the history to another table)
Comment 5•11 years ago
Ed, see comment 3 - "IIRC, it's not the history that's huge - it's the number of builders x slaves."
Looks like there are 1391 builders (result of select count(distinct buildername) from builds;) and 1132 slaves (result of select count(distinct slave) from builds;)
Not sure if that's expected.
Reporter
Comment 6•11 years ago
Yeah, I read comment 3 - perf has suddenly decreased, so I wanted to ensure the purge was still working as expected :-)
Comment 7•11 years ago
I can confirm that purge is still working. The tables themselves are not large, the whole database is 265M. :(
Updated•10 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2037]
Updated•10 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2037] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2046]
Assignee
Comment 8•10 years ago
This bug is out of date following the clobberer rewrite. Closing this one.
Assignee: nobody → winter2718
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
Updated•8 years ago
Component: Tools → General