Closed
Bug 176599
Opened 22 years ago
Closed 22 years ago
Improve performance of duplicates.cgi
Categories
(Bugzilla :: Reporting/Charting, defect)
Bugzilla
Reporting/Charting
Tracking
()
RESOLVED
FIXED
Bugzilla 2.18
People
(Reporter: gerv, Assigned: gerv)
References
Details
(Keywords: perf)
Attachments
(3 files, 4 obsolete files)
(deleted),
text/html
|
Details | |
(deleted),
text/html
|
Details | |
(deleted),
patch
|
Details | Diff | Splinter Review |
Split off from bug 176002.
Moving the stats into the DB is desireable, but an effort. I think we should
immediately make the following changes:
- Incorporate the "openonly" functionality into the query.
- Incorporate the access check into the query
- Incorporate the "numwanted" functionality into the query. Because the access
check is also incorporated, we won't need to iterate.
This should make a significant difference. Then, we can work out what schema we
need to move the stats into the DB.
Gerv
Assignee | ||
Comment 1•22 years ago
|
||
Let MySQL do all the work.
bbaetz: you've been pushing for this; any time to review it? ;-) Seriously,
it's not that bug - some stuff copied from CanSeeBug, and a template
optimisation now that we pass in the same number of bugs that we display.
Gerv
Comment 2•22 years ago
|
||
Comment on attachment 104272 [details] [diff] [review]
Patch v.1
You should:
+ detaint_natural($maxrows);
|| die (...)
Also, dkl is rewriting the stuff in Search.pm to be ~ 10 times faster; you want
to use the new method instead. See bug 114696.
Attachment #104272 -
Flags: review-
Assignee | ||
Comment 3•22 years ago
|
||
It's not clear that that bug will make it in before b.m.o upgrades - let's take
this now, and if we can improve it later, great.
Nits fixed.
Gerv
Attachment #104272 -
Attachment is obsolete: true
Comment 4•22 years ago
|
||
Gerv:
I'd suggest either making this depend on bug 114696 or use the same appproach.
It is faster, works better for differrent DBs, and does not have to be heavuilty
rewritten to switch between AND and OR groups.
Assignee | ||
Comment 5•22 years ago
|
||
Joel: as I said in comment 3:
"It's not clear that that bug will make it in before b.m.o upgrades - let's take
this now, and if we can improve it later, great."
If it requires rewriting to fit in with bug 114696, then we can do that in
another round of patching. We do not need multi-DB support in the next week :-)
I do not want to miss the boat with this - now that the Bugzilla Helper's being
used on b.m.o., that page gets a lot of hits.
Gerv
Comment 6•22 years ago
|
||
The patch looks good (a few indentation nits, but nothing review-breaking),
seems to correctly hide bugs the user is unauthorized to see, and is much
faster, but it doesn't show the same bug set. Here's an example of what I get
on a several-days old copy of b.m.o data.
Comment 7•22 years ago
|
||
Here's what duplicates.cgi produces on the same installation if I reverse the
patch.
Comment 8•22 years ago
|
||
Comment on attachment 104691 [details] [diff] [review]
Patch v.2
OK, got it.
You've got a LIMIT, but no ORDER BY, so you're going to be picking $maxrows
_random_ rows, then sorting, rather than sorting then limiting.
You can move the foundgroups stuff into the query, I guess, and then ORDER BY
if you're not sorting on dupes. See my discussion in bug 176002.
Attachment #104691 -
Flags: review-
Assignee | ||
Comment 9•22 years ago
|
||
OK, my idea here is to put the limiting back in the template; duplicates.cgi's
cool bi-directional column sorting mechanism is based on having all the data
available in the template, and changing that requires quite significant surgery.
We can still provide a performance boost by moving the access check and open
check into the query. Although it's not as much as we'd like ideally, we have
deadlines...
Anyone think that's a bad plan?
Gerv
Assignee | ||
Comment 10•22 years ago
|
||
Here's v.3, which moves the row limiting back to the template. Still a perf
improvement, though, surely.
Gerv
Attachment #104691 -
Attachment is obsolete: true
Comment 11•22 years ago
|
||
Comment on attachment 105506 [details] [diff] [review]
Patch v.3
>Index: duplicates.cgi
>+ my $query =
>+ "SELECT bugs.bug_id, components.name, bug_severity, op_sys, " .
>+ "target_milestone, short_desc, bug_status, resolution, " .
>+ "reporter, assigned_to, qa_contact, reporter_accessible, " .
>+ "cclist_accessible, cc.who IS NOT NULL, " .
>+ "COUNT(DISTINCT(bug_group_map.group_id)) as cntbugingroups, " .
>+ "COUNT(DISTINCT(user_group_map.group_id)) as cntuseringroups " .
>+ "FROM bugs, components " .
>+ "LEFT JOIN cc ON bugs.bug_id = cc.bug_id " .
>+ " AND cc.who = $userid " .
>+ "LEFT JOIN bug_group_map ON bugs.bug_id = bug_group_map.bug_id " .
>+ "LEFT JOIN user_group_map ON " .
>+ " user_group_map.group_id = bug_group_map.group_id " .
>+ " AND user_group_map.isbless = 0 " .
>+ " AND user_group_map.user_id = $userid " .
Nit: indentation here is a bit ambiguous.
>+ # Don't add CLOSED, and don't add VERIFIED unless they are INVALID or
>+ # WONTFIX. We want to see VERIFIED INVALID and WONTFIX because common
>+ # "bugs" which aren't bugs end up in this state.
>+ $query .= "AND (bug_status != 'CLOSED') " .
>+ "AND ((bug_status = 'VERIFIED' " .
>+ " AND resolution IN ('INVALID', 'WONTFIX')) " .
>+ " OR (bug_status != 'VERIFIED')) ";
Nit: indentation here is inconsistent.
With this patch applied the "changed since" column doesn't show up.
Also, I'm getting mixed results with performance tests on the code. F.e.,
hitting the CGI twenty five times takes 2 minutes 27 seconds for the version
currently on b.m.o, 3 minutes 2 seconds for the trunk without the patch
applied, and 2 minutes 13 seconds for the trunk with the patch applied on a
test installation. I'm not 100% confident in these results, but did
performance regress on the trunk recently?
Attachment #105506 -
Flags: review-
Comment 12•22 years ago
|
||
myk - that slowdown is probably due to the groups stuff again. bitwise & is
always better than an sql join...
Assignee | ||
Comment 13•22 years ago
|
||
>+ my $query =
>+ "SELECT bugs.bug_id, components.name, bug_severity, op_sys, " .
>+ "target_milestone, short_desc, bug_status, resolution, " .
>+ "reporter, assigned_to, qa_contact, reporter_accessible, " .
>+ "cclist_accessible, cc.who IS NOT NULL, " .
>+ "COUNT(DISTINCT(bug_group_map.group_id)) as cntbugingroups, " .
>+ "COUNT(DISTINCT(user_group_map.group_id)) as cntuseringroups " .
>+ "FROM bugs, components " .
>+ "LEFT JOIN cc ON bugs.bug_id = cc.bug_id " .
>+ " AND cc.who = $userid " .
>+ "LEFT JOIN bug_group_map ON bugs.bug_id = bug_group_map.bug_id " .
>+ "LEFT JOIN user_group_map ON " .
>+ " user_group_map.group_id = bug_group_map.group_id " .
>+ " AND user_group_map.isbless = 0 " .
>+ " AND user_group_map.user_id = $userid " .
> Nit: indentation here is a bit ambiguous.
Well, all the ANDs are lined up with their ONs, and the only anomaly is that
user_group_map.group_id = bug_group_map.group_id is down one line because it
won't fit.
>+ # Don't add CLOSED, and don't add VERIFIED unless they are INVALID or
>+ # WONTFIX. We want to see VERIFIED INVALID and WONTFIX because common
>+ # "bugs" which aren't bugs end up in this state.
>+ $query .= "AND (bug_status != 'CLOSED') " .
>+ "AND ((bug_status = 'VERIFIED' " .
>+ " AND resolution IN ('INVALID', 'WONTFIX')) " .
>+ " OR (bug_status != 'VERIFIED')) ";
> Nit: indentation here is inconsistent.
Things are lined up with their brackets, to show the structure - the OR is an OR
for the entire clause above, etc. etc.
I'm happy to accept suggestions for better indentation, but that's the best I
could come up with.
> With this patch applied the "changed since" column doesn't show up.
Are you sure your test installation has the historical data? This works for me.
Gerv
Comment 14•22 years ago
|
||
extra thing - I'm in the process of redoing bug 114696 for a 10-20% speedup on
the group checking stuff. (Summary: groupby sucks). I don't know if we want to
wait for that before this, or to do this in towo stages. I guess it depends on
timing, at this stage.
Assignee | ||
Comment 15•22 years ago
|
||
That's up to myk. This patch is ready to go (perhaps with indentation tweaks)
and gives us a win. Or we can wait for bug 114696 - but we are running out of
time fast.
Gerv
Comment 16•22 years ago
|
||
Bug 114696 has a perf improvement for the security stuff. I was going to merge
it into duplicates, but then I decided that it would be better to use
Bugzilla::Search instead.
This isn't finished - as was pointed out on IRC, I don't need teh extra stuff,
because I can represent the extra bit with boolean charts.
Attachment #105506 -
Attachment is obsolete: true
Comment 17•22 years ago
|
||
-> 2.18; I'll play with this tomorrow
Target Milestone: --- → Bugzilla 2.18
Comment 18•22 years ago
|
||
Thanks to some help on #mozwebtools, I don't need the extracond stuff.
Note that I use anyexact, even though this is not directly available from the
query.cgi UI - see bug 179302
myk - can I get you to run some speed/correctness tests, please?
Attachment #105673 -
Attachment is obsolete: true
Updated•22 years ago
|
Attachment #105728 -
Flags: review?
Comment 19•22 years ago
|
||
Comment on attachment 105728 [details] [diff] [review]
v5
>Index: duplicates.cgi
>+ # Restrict to product if requested
>+ if ($::FORM{'product'}) {
>+ $params->param('product', $::FORM{'product'});
>+ }
Nit: simple conditionals like this are more readable if postfix:
$params->param('product', $::FORM{'product'}) if ($::FORM{'product'});
>Index: template/en/default/global/user-error.html.tmpl
>+ [% ELSIF error == "invalid_maxrow" %]
>+ [% title = "Invalid Max Rows" %]
>+ The maximum number of rows, '[% maxrows FILTER html %]', must be a positive
>+ integer.
Nit: quoting values is better accomplished with style (bold, italic,
strikethrough, monospaced font, different font color or background color,
etc.).
This patch produces identical output and provides a significant speed boost.
In my tests on Bugzilla data, five runs of duplicates.cgi took ~27 seconds
without the patch and ~9.5 seconds with it. Impressive.
r=myk
Attachment #105728 -
Flags: review? → review+
Comment 20•22 years ago
|
||
a=justdave
Comment 22•22 years ago
|
||
I should say that I find this syntax poor:
SELECT ... FROM bugs... LEFT JOIN longdescs longdescs_email_1 ON bugs.bug_id =
longdescs_email_1.bug_id AND longdescs_email_1.who IN(###)... WHERE ... AND
longdescs_email_1.who IS NOT NULL...
I think it would make more sense (cleaner, easier to read, more correct) as:
SELECT ... FROM bugs... LEFT JOIN longdescs longdescs_email_1 ON bugs.bug_id =
longdescs_email_1.bug_id... WHERE ... AND longdescs_email_1.who = ###...
The MySQL manual also discourages row selection criteria in JOIN statements in
the FROM clause. Preliminary testing says the queries are equally performant.
This is perhaps another bug, though.
Comment 23•22 years ago
|
||
The mysql manual discourages it because people assume that the two things are
equivalent.
I'm not particularly attached to either form, mind you, although I do think that
the current one is clearer. mysql optimises tehm to be the same, so tahts not an
issue.
Updated•12 years ago
|
QA Contact: matty_is_a_geek → default-qa
You need to log in
before you can comment on or make changes to this bug.
Description
•