Closed
Bug 80157
Opened 24 years ago
Closed 22 years ago
patch that adds "regenerate" option to collectstats.pl
Categories
(Bugzilla :: Reporting/Charting, enhancement, P3)
Tracking
()
RESOLVED
FIXED
Bugzilla 2.18
People
(Reporter: rwalters, Assigned: gerv)
References
Details
Attachments
(1 file, 9 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review |
I've added a "regenerate" command line option to collectstats.pl, similar to the
option found in the "processmail" script. It will regenerate all the
data/mining files from scratch, based on the current bug_status of each bug and
tracing back the changes to the bug_status field as recorded in the
bugs_activity table.
I'm going to attach a patch that can be applied to rev 1.7 of collectstats.pl to
add the enhancement. Sorry I'm not operating off the tip but it's been a while
since I did a cvs update. Now it looks like for this to work in the latest
Bugzilla some merging will have to be done.
Reporter | ||
Comment 1•24 years ago
|
||
Updated•24 years ago
|
Target Milestone: --- → Bugzilla 2.16
Comment 2•24 years ago
|
||
This is cool :-)
Based on visual inspection, I'll confirm for those who haven't looked at it that
this snarfs the snapshot of what the bugs looked like on a given day based on the
bugs starting status and applying changes listed in bugs_activity prior to that
day.
I think it may be missing some fields in the generated files however, and/or may
not be creating the files in the proper format. CCing Gerv because he's messed
with that code most recently.
Assignee | ||
Comment 3•24 years ago
|
||
Yep, this patch is from before the big changes which added a large number of
other states to chart. From superficial inspection, the idea seems sound,
though.
rwalters@qualcomm.com - would you be up for fixing this patch up for the new,
shiny, chart-lots-of-things world?
If not, I could give it a go, but can't promise to get to it for three weeks or
so.
Gerv
Reporter | ||
Comment 4•23 years ago
|
||
Sure, I'll see what I can do. Maybe I can make it run a little faster too.
Reporter | ||
Comment 5•23 years ago
|
||
Comment 6•23 years ago
|
||
I tried this patch on our local install, and running "./collectstats.pl
regenerate" gives the error:
Can't locate object method "autoflush" via package "IO::Handle"
at ./collectstats.pl line 233.
Comment 7•23 years ago
|
||
Stephen: look in your checksetup.pl file for the version checks, and add this
line:
unless (have_vers("IO::Handle",0)) { push @missing,"IO::Handle" }
And report what output you get from it.
My system has autoflush, it reports 1.21. I found reference to autoflush in the
ChangeLog for version 1.16, however, I don't see anything in the ChangeLog
describing when it was added. They only kept track back to 1.10 though.
Comment 8•23 years ago
|
||
Initially the { push @missing,"IO::Handle" } gave a compilation error,
as "@missing" didn't exist. When I removed that clause, I got the required
output, including:
Checking for IO::Handle (any) ok: found v1.21
After looking at a couple of man pages, I found that adding the following line
to the top of collectstats.pl made the error go away and made it do something
useful:
use IO::Handle;
I notice that the regenerate option gives no output. Perhaps it could output
something like:
Regenerating -All- ..................................
Regenerating TestProduct ..................................
etc.
with a . printed for each date, so that you know it is doing something?
Comment 9•23 years ago
|
||
I can confirm that when modified as stated above, the new regenerate option on
our local bugzilla installation generates identical output for the date range
over which we had data previously. It also generates sensible-looking data for
the earlier dates.
Reporter | ||
Comment 10•23 years ago
|
||
Reporter | ||
Comment 11•23 years ago
|
||
Good idea, Stephan - take a look at the new patch I put in and see if you like.
Comment 12•23 years ago
|
||
Looks good to me...
Assignee | ||
Comment 13•23 years ago
|
||
This looks good at first glance; very good indeed. It's excellent also for those
of us who boot into Linux part time and so don't have a collectstats.pl cron
job; it generates all the missing data values. It works for me too, albeit on a
tiny DB. I hope to see many more patches of this quality from this man :-)
r=gerv if you fix the mismatched bracket in the comment around line 35. ;-)
Gerv
Updated•23 years ago
|
Comment 14•23 years ago
|
||
See bug 55161, the column names for the stuff in the bugs_activity table has
changed, this will need to take that into account.
Updated•23 years ago
|
Priority: -- → P3
Comment 15•23 years ago
|
||
-> Bugzilla product, Charting component, reassigning.
This sounds really cool. Is this somehow related to my 2001-02-14 00:18 comments
in bug 16009 "General summary charts" ?
Assignee: tara → gerv
Component: Bugzilla → Reporting/Charting
Product: Webtools → Bugzilla
Version: Bugzilla 2.11 → 2.11
Assignee | ||
Comment 16•23 years ago
|
||
Comment on attachment 39510 [details] [diff] [review]
another patch to -r1.20, this outputs progress indicators
r=gerv. Looking for second review from some other Bugzilla hacker.
Gerv
Attachment #39510 -
Flags: review+
Assignee | ||
Comment 17•23 years ago
|
||
CCing Dave to find a reviewer. ;-)
Gerv
Comment 18•23 years ago
|
||
Comment 19•23 years ago
|
||
Comment on attachment 39510 [details] [diff] [review]
another patch to -r1.20, this outputs progress indicators
as noted in other comments, this doesn't work with 2.14+ because of the
activity table having some columns renamed.
Attachment #39510 -
Attachment is obsolete: true
Attachment #39510 -
Flags: review-
Updated•23 years ago
|
Attachment #34000 -
Attachment is obsolete: true
Updated•23 years ago
|
Attachment #39369 -
Attachment is obsolete: true
Comment 20•23 years ago
|
||
Comment on attachment 52690 [details] [diff] [review]
new version changes bugs_activity.oldvalue to bugs_activity.removed to make it work with 2.14
it works great except for one small little problem...
the generated data files have the permissions set wrong, and the webserver
can't read them. In fact, it completely disables charting because reports.cgi
can no longer read the -All- file. running checksetup.pl does not fix it.
Attachment #52690 -
Flags: review-
Comment 21•23 years ago
|
||
Oh, I did want to note that the regenerated data appears to be pretty darn close
to accurate. I compared it with the "before" data, and with the exception of
the period of time when charting was broke for a month or two shortly after 2.12
was out, the data was all within one or two points of the old data. I blame the
discrepancies on the fact that my cron job runs at 6am instead of midnight, and
we tend to do a lot of work between midnight and 6, so it all credits to the
previous day when the cron job generates it.
Comment 22•23 years ago
|
||
We are currently trying to wrap up Bugzilla 2.16. We are now close enough to
release time that anything that wasn't already ranked at P1 isn't going to make
the cut. Thus this is being retargetted at 2.18. If you strongly disagree with
this retargetting, please comment, however, be aware that we only have about 2
weeks left to review and test anything at this point, and we intend to devote
this time to the remaining bugs that were designated as release blockers.
Target Milestone: Bugzilla 2.16 → Bugzilla 2.18
Comment 23•23 years ago
|
||
I tried the patch on my company's bugzilla installation and it worked
beautifully!! I ran it as the webserver user using sudo so I didn't have any
permission problems. Although it did create the mining directory as 777 which I
thought would kinda dangerous in that someone could delete all your mining
files, but now that you can regenerate them, it's not much of a lose.
Assignee | ||
Comment 24•23 years ago
|
||
rwalters: are you able to address the couple of issues that have been raised here?
I apologise that this didn't make it in sooner.
Gerv
Reporter | ||
Comment 25•23 years ago
|
||
Gerv:
As far as the permissions issues reported, I thought I changed the code that
creates the mining directory so that it uses mode 0770 instead of mode 0777 - I
just checked and apparently it did not get in the patch (hmm, I wonder how that
happened?) Anyway, it is a simple fix - change "777" to "770" in two places in
the "check_data_dir" subroutine.
In my installation I set the group sticky bit on the whole Bugzilla directory,
and set the owning group to "webadmin" which consists of myself, httpd, and
mail. The group sticky bit should cause the mining directory to inherit that
group ownership from the parent, and mode 0770 should allow only user httpd and
group webadmin to read/write/modify/delete it.
As for any issues relating to Bugzilla versions newer than 2.13, I haven't
updated my Bugzilla installation since then so I don't know of any problems.
It looks like someone put in a patch to make this work with 2.14, and I scanned
the comments, which seems to indicate any issues have been resolved.
Comment 26•23 years ago
|
||
Okay, I may have a shot at this if you want.
What permissions should the files have, and which owner/group shoudl they belong to?
Assignee | ||
Comment 27•23 years ago
|
||
Dave?
Gerv
Comment 28•23 years ago
|
||
since collectstats.pl is run either by the maintainer or a cron job, the files
do not need to be writable by the webserver, only readable. This mean 750 for
the directory containing them if you have $webservergroup and 755 if you don't,
and 640 for the files with, and 644 without.
Comment 29•23 years ago
|
||
This code does the same as patch 52690, only *much* faster (4 min compared to
140 min with patch 52690 on a Bugzilla installation with about 30 projects and
1000 bugs.
The statistics have an additional column "total", which is the total bug count.
I think this is useful.
I tested this against patch 52690 on Bugzilla 2.14 with our db mentioned above;
the only differences were the extra "totals" column and all my dates are off by
one day (I'm not sure whether it is my code which is incorrect or whether the
patch is incorrect).
One can probably rewrite the awk script in perl but I was too lazy to do it.
The code keeps intermediate results in RAM. This might be a problem with very
large installations, but it is very fast.
Assignee | ||
Comment 30•23 years ago
|
||
Is that an application/zip, an application/x-gzip or what? :-)
Gerv
Comment 31•23 years ago
|
||
It is two scripts, stored as a .tar.gz file.
(Sorry, but I thought that the file name is stored
too)
Comment 32•22 years ago
|
||
We should rewrite possible awk scripts into Perl. Not all Bugzilla installations
run on Unix, and you don't usually have awk installed on Windows.
Assignee | ||
Comment 33•22 years ago
|
||
I am assigning all the bugs I am not working on in the immediate future to
nobody@bugzilla.org. This means:
- I will be able to search for bugs assigned to me as a list of bugs I'm going
to fix (which is as it should be), and
- people won't falsely assume I might be about to fix a bug when I'm not.
Gerv
Assignee: gerv → nobody
Comment 34•22 years ago
|
||
Very nice patch (I just tried the Perl version, since I'm on Windows).
I updated the regenerate_stats sub's SQL to work on 2.17.x, and also added some
(simple) code to benchmark the time it takes for each product. The output is
now :
Regenerating -All- [100.0%] - 00:01:03
... (for each product)
Total time taken 00:02:09
But I don't think the code is clean enough to be posted here. Should I attach
it here anyways, and let everyone ridicule my Perl (one of the best ways to
learn, I know, but still painful :) or should I post it on the webtools mailing
list and ask for comments there before attaching it here?
Or then again, should I just post the SQL changes here and let someone else
change the actual file?
Comment 35•22 years ago
|
||
Sure, go ahead and post it here. (add it as an attachment).
A diff -u against cvs or against the original would work best, but if you don't
have diff, you can just upload the file and we'll figure it out. :)
Comment 36•22 years ago
|
||
Patch against the current collectstats.pl from CVS (as of Dec. 06 2002). The
SQL to regenerate the stats did not work with 2.17, since the product name has
been replaced by a foreign key in the bugs table. This complicates the SQL a
bit.
Also, when 'collectstats.pl regenerate' is run, the time taken for each product
and the total time will be printed.
Should make the previous patch (patch 52690) obsolete, but I don't have the
option ([no attachments can be made obsolete] above).
(justdave : diff is no problem, I work with CygWin because Win32 is strangely
lacking for any real development... :-)
Updated•22 years ago
|
Attachment #52690 -
Attachment is obsolete: true
Updated•22 years ago
|
Attachment #108486 -
Flags: review?(gerv)
Comment 37•22 years ago
|
||
Notice that my patch does not fix the primary reason why the other patch got a
negative review, which was a permissions problem... I don't know what the final
concensus on that was. My patch only fixes the SQL...
Assignee | ||
Comment 38•22 years ago
|
||
Comment on attachment 108486 [details] [diff] [review]
Patch based on the one in attachment 52690 [details] [diff] [review], with SQL updated to work with 2.17.x
This patch still looks OK; but I'd much rather review the final version, with
the permissions problem solved.
I assume it works?
Gerv
Comment 39•22 years ago
|
||
Yep, it works for me anyways. And I have no idea what the permission problem
was, since I'm on Windows. If I had a Linux machine to check, or if someone can
tell me what the changes are, I can make them. But other than that, I think the
original patch author might have to fix that part.
Comment 40•22 years ago
|
||
Shouldn't someone try to get this patch completed? It seems like this bug has
been at a status quo (except for my SQL update) for over a year, for what
basically is a permissions problem that shouldn't take too long to fix by
someone who knows how...
If no one has the time, just tell me what has to be done, I'll do it and submit
the patch, so that this bug can get reviewed and resolved once and for all!
This is useful enough to be checked in, IMHO, so we should complete it before
it gets to the point of needing another SQL code update...
Comment 41•22 years ago
|
||
it's in Gerv's review queue and he hasn't denied it yet... hopefully he'll
catch up soon :)
Assignee | ||
Comment 42•22 years ago
|
||
jean_seb@hybride.com: I've been avoiding this review for ages, because this code
scares me ;-) However, that's rather unreasonable and, as it's an _option_,
no-one has to run it. So if you check the patch and make sure it's fine for the
current tip, I'll look at it and check it in.
Gerv
Comment 43•22 years ago
|
||
grev : I've checked against collectstats.pl taken from CVS this morning
(25/02/2003 ~10:50 Eastern) and the last patch applies with the first hunk
rejected, but it's very easy to merge it manually. Here is another version of
the patch that applies cleanly.
I also tried to fix the permissions problem mentioned in comment #20, with the
information in comment #28. I set the permissions on creation of the directory,
and each time data is written to a file, in case the file had to be created
before writing to it. I don't know how to check "if you have $webservergroup",
so I set the permissions to what Dave said they should be WITH $webservergroup.
Before reviewing, if someone can tell me how to check it, I can add it to the
patch and it would be pretty final.
Note that if there's anything wrong with the patch other than the SQL code and
the permissions fix, the original author of the patch (rwalters@qualcomm.com)
should probably fix it. The code scares me too :-)
Attachment #108486 -
Attachment is obsolete: true
Comment 44•22 years ago
|
||
Comment on attachment 115509 [details] [diff] [review]
Patch that applies cleanly as of 2003/02/25 11:10 AM Eastern
Oops, sorry about the nasty tabs in the patch. If there are any changes to be
made, I'll remove them at the same time.
Updated•22 years ago
|
Attachment #115509 -
Flags: review?(gerv)
Comment 45•22 years ago
|
||
The SQL queries this uses are incredibly slow, because they perform the function on the
column, rather than the constant, and thus, can't use any indexes.
For example, it does:
SendSQL ("select bug_id from bugs" .
(($product ne '-All-') ? ", products" : "") .
" where to_days(bugs.creation_ts)=" . ($day - 1) .
(($product ne '-All-') ? " and bugs.product_id=products.id " .
" and products.name=" . SqlQuote ($product) : "") .
" order by bug_id");
Notice it does "to_days(bugs.creation_ts) = ($day - 1)", which can't use any indexes
because it has to apply to_days to *every* bugs.creation_ts value.
If you write this as "bugs.creation_ts < from_days($day - 1) and bugs.creation_ts >=
from_days ($day - 2)", it will be able to use the bugs.creation_ts index that exists (since
it'll turn the comparisons into comparisons against constants, and thus, just do a range
select).
This makes the first 16% (which are what use this query) *much* faster, because it will
turn from_days
A similar problem exists in the other later queries like "... to_days
(bugs_activity.bug_when) >= $day " instead of "bugs_activity.bug_when >= from_days
($day - 1)", but modifying them doesn't speed up renegerate measurably.
Just thought i'd point this out
Comment 46•22 years ago
|
||
dberlin : (this may sound like an excuse, but it's actually an explanation) I
didn't write the original SQL, I just changed it to work on Bugzilla 2.17.x,
but I agree that what you mention represents a good speedup. Of course, since
regenerate will theoretically only be run once (for example, as in my case,
when an installation moves from not collecting stats to collecting them), it's
not that important. Nevertheless, if you want to fix the code and submit
another patch, that would be great.
Assignee | ||
Comment 47•22 years ago
|
||
Yes, dberlin: are you able to submit an improved patch with better queries?
Gerv
Comment 48•22 years ago
|
||
Sure, i can submit a patch against the patch (since it's only changing a few lines).
While regenerate is only run once, on my bugzilla installation (which is for the GCC project), regenerate takes >24 hours with the current queries. Since this is a gnats->bugzilla conversion (using a heavily modified gnats2bz.pl script), and we haven't done the final cutover yet, whenever i reimport the gnats data (once or twice a week right now), i need to run regenerate.
There is also some perl code that is doing unnecessary work somewhere (it really should be query bound, but it's not right now, ) that i'm working on tracking down. It gets exponentially slower as it approaches 100%.
Once i fix that, i'll submit an updated patch.
Comment 49•22 years ago
|
||
dberlin : Ouch, 24 hours! Thanks for your cooperation. I look forward to your
update.
Assignee | ||
Comment 50•22 years ago
|
||
dberlin: ping? :-)
Gerv
Comment 51•22 years ago
|
||
Just to be clear, the reason we have to change to_days (creation_ts) = $day - 1
to creation_ts >= from_days($day-1) and creation_ts < from_days ($day-2) is
because there is no easy *AND* fast way to remove the time from the creation_ts
, and with the time part there, "03-11-03 12:54:11" != "03-11-03".
Thus, we check if it's greater than the beginning of the day and less than the
beginning of the next day.
I've excluded other speedups than the query changes, i'll submit them
seperately when they are completely finished.
Comment 52•22 years ago
|
||
*** Bug 197064 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 53•22 years ago
|
||
OK. Here's a patch, cleaned up to match our coding standards with a small
dabble of rearrangement. The command-line parameter is now the more-standard
"--regenerate". I've tested it locally, and it works.
I'm attaching this patch so I can request a=.
Gerv
Attachment #76414 -
Attachment is obsolete: true
Attachment #115509 -
Attachment is obsolete: true
Attachment #116907 -
Attachment is obsolete: true
Assignee | ||
Comment 54•22 years ago
|
||
And now the version without the debugging code.
Regarding permissions: I've used the looser of the two given by justdave in
comment 28, because I don't think we can easily check if webservergroup is set.
Gerv
Attachment #117339 -
Attachment is obsolete: true
Assignee | ||
Updated•22 years ago
|
Attachment #108486 -
Flags: review?(gerv) → review-
Assignee | ||
Updated•22 years ago
|
Attachment #115509 -
Flags: review?(gerv) → review-
Comment 57•22 years ago
|
||
Regarding comment 54: Correct me if I'm wrong as I haven't actually looked at
the code/patch, but doesn't collectstats.pl require globals.pl which in turn
imports everything from localconfig into the global namespace? If so, it should
be trivial to tell whether or not webservergroup is set.
Comment 58•22 years ago
|
||
Just so it doesn't get lost, i'll point out the pathological behavior in the
regenerate option that causes it to be still incredibly slow.
It currently works like so:
for all days:
Initialize bug counts for each status to 0
add all bugs from previous day to list of bugs to process (processlist).
for all bugs in processlist:
determine status of bug and update bugcount.
Obviously, because we never take bugs off the processlist (necessary because we
reset the bugcounts every day), this gets slower and slower as we move through
the days.
I've completely reworked it so that we take advantage of the fact that the only
time the counts change is when a bug is added, or it is resolved. Thus, we only
need to keep looking at the *unresolved* bugs, not all of them, and we don't
need to update the count on the unresolved bugs until they get resolved. (IE
once we've seen it once, until it's resolved, we don't care about it).
Unfortunately, my current version assumes that it's only resolved once.
Once it is resolved, we don't look at it again.
When i get back to it, i'll make it do a quick query once a resolution occurs to
see if it ever gets reopened later on, and put it back on the processlist if
this is the case.
With changes to do this, it takes roughly 20-30 seconds to regenerate stats,
rather than ungodly large amounts of time.
I should also note that AFAICT, it's the perl code that's slow in all of this
(both before and after), not the large amount of queries. Mysql was waiting
most of the time, and wasn't eating much CPU at all.
Assignee | ||
Comment 59•22 years ago
|
||
dberlin: I noticed it got slower as you went, and better queries are always good
:-) But I'd much rather not do the cleanups I did again, so please base any
future patch on my version 2, above.
When do you anticipate being able to make these changes?
Gerv
Comment 60•22 years ago
|
||
I don't see any reviewed patches ready to check in here, so nothing to approve...
Flags: approval?
Assignee | ||
Comment 61•22 years ago
|
||
I'd hate to see this patch get lost _again_.
dberlin: will you be able to fix the dodgy cases in the next week, or should I
get it checked in and maybe you can fix it in a later patch?
Gerv
Comment 62•22 years ago
|
||
I doubt it, i've got finals coming up, and the big final gcc switchover to
attend to.
I'll fix the slowness later, so just go with what we have now
Assignee | ||
Comment 63•22 years ago
|
||
Requesting approval. This patch was basically written by someone else and tidied
up by me, so I've reviewed it as well as uploaded it :-) Also, it's been tested
by several people and is very self-contained (you have to pass an option to
collectstats.pl to enable the function.)
Gerv
Flags: approval?
Updated•22 years ago
|
Flags: approval? → approval+
Assignee | ||
Comment 64•22 years ago
|
||
Fixed. Apologies for the delay.
Checking in collectstats.pl;
/cvsroot/mozilla/webtools/bugzilla/collectstats.pl,v <-- collectstats.pl
new revision: 1.29; previous revision: 1.28
done
Gerv
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Comment 65•22 years ago
|
||
Do we have a bug number for fixing the speed issues mentioned in comment #62?
Assignee | ||
Comment 66•22 years ago
|
||
No, we don't have a bug number for the speed issue: I assume dberlin will file
one when and if he gets around to making a patch for it.
Gerv
Updated•20 years ago
|
Flags: documentation2.18?
Updated•20 years ago
|
Flags: documentation?
OS: Linux → All
Hardware: PC → All
Updated•19 years ago
|
Flags: documentation?
Flags: documentation2.18?
Updated•12 years ago
|
QA Contact: matty_is_a_geek → default-qa
You need to log in
before you can comment on or make changes to this bug.
Description
•