Closed
Bug 608347
Opened 14 years ago
Closed 14 years ago
create AMO collection mode in collect.py
Categories
(Webtools Graveyard :: Graph Server, defect)
Webtools Graveyard
Graph Server
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: anodelman, Assigned: rhelmer)
References
Details
Attachments
(8 files, 1 obsolete file)
(deleted),
patch
|
ianbicking
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
ianbicking
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
rhelmer
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
rhelmer
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
anodelman
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
anodelman
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
rhelmer
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
catlee
:
review+
|
Details | Diff | Splinter Review |
Based upon spec in
https://wiki.mozilla.org/Buildbot/Talos/DataFormat#AMO
Updated•14 years ago
|
Assignee: nobody → ianb
Comment 1•14 years ago
|
||
This is implemented for the graphs code in: http://hg.mozilla.org/users/ibicking_mozilla.com/graphs/rev/043b9d1e4e38
This hasn't been deployed, and may need to be backported to deploy into production. The core code is here: http://hg.mozilla.org/users/ibicking_mozilla.com/graphs/file/tip/server/pyfomatic/collect_amo.py (with tests here: http://hg.mozilla.org/users/ibicking_mozilla.com/graphs/file/tip/tests/), along with a small change to server/pyfomatic/collect.py that would need to be backported (http://hg.mozilla.org/users/ibicking_mozilla.com/graphs/diff/043b9d1e4e38/server/pyfomatic/collect.py)
Also server/graphsdb.py will have to be updated to have a amo_db connection object that points to the AMO database.
Comment 2•14 years ago
|
||
One issue: the database schema in at https://github.com/jbalogh/zamboni/blob/master/migrations/91-adjust-perf-tables.sql#L29 removing the perf_osversions.version column, as graphs does not contain any specific OS version information (except embedded in the OS name I guess). Or... maybe we were going to parse the OS name into two parts, I wasn't sure.
Comment 3•14 years ago
|
||
Thanks! If you could parse it out, it would be easier to use on AMO since we'll want them separated.
Comment 4•14 years ago
|
||
OS name and version are split in http://hg.mozilla.org/users/ibicking_mozilla.com/graphs/rev/befb31830c64
Comment 5•14 years ago
|
||
Thanks. CCing alice to get this deployed on staging
Reporter | ||
Comment 6•14 years ago
|
||
I'll get this onto staging this week.
Thanks for all the hard work, Ian.
Reporter | ||
Comment 7•14 years ago
|
||
Attachment #493796 -
Flags: review?
Reporter | ||
Updated•14 years ago
|
Attachment #493796 -
Flags: review? → review?(ianb)
Updated•14 years ago
|
Attachment #493796 -
Flags: review?(ianb) → review+
Comment 8•14 years ago
|
||
This is on stage and working, thanks everyone. Two things so far:
1) The exponential numbers. Several averages are coming through as things like 8.16079e+07. Technically, this is fine, but is something wrong there? Those numbers are way different than most.
2) It looks like everything is getting inserted into the database 4 times, sometimes with different averages. There should be 1 row per add-on, per app, per os, per test (in this case, always ts). You can see the current data as of this post at http://pastebin.mozilla.org/874871
Reporter | ||
Comment 9•14 years ago
|
||
Looks like there is some poor name matching going on:
http://pastebin.mozilla.org/874936
You can see that I'm sending results for a bunch of metrics, but they are all going into the 'ts' bucket. Anything that isn't called 'ts' or 'tp4' should be ignored.
Reporter | ||
Comment 10•14 years ago
|
||
Attachment #496305 -
Flags: review?(ianb)
Updated•14 years ago
|
Attachment #496305 -
Flags: review?(ianb) → review+
Reporter | ||
Comment 11•14 years ago
|
||
Looking over the code, the average that you use is a straight average. The
rest of talos world always excludes the max values before calculating an
average (talos average = total - max_result/ runs-1). Can you fix up the
calculation?
Assignee | ||
Comment 12•14 years ago
|
||
(In reply to comment #11)
> Looking over the code, the average that you use is a straight average. The
> rest of talos world always excludes the max values before calculating an
> average (talos average = total - max_result/ runs-1). Can you fix up the
> calculation?
Pushed to http://hg.mozilla.org/users/ibicking_mozilla.com/graphs/rev/911b07ebff80 and also now live on staging.
Reporter | ||
Comment 13•14 years ago
|
||
There are now quadruplicate entries for addon_id = NULL:
http://pastebin.mozilla.org/882712
Reporter | ||
Comment 14•14 years ago
|
||
A couple of fixes in this patch:
- correctly calculate average
- handle 'NULL' addon_id so as not to create quadruplicate entries
Attachment #497921 -
Flags: review?(ianb)
Reporter | ||
Comment 15•14 years ago
|
||
Take the working collect_amo.py and add it to the currently used graph server front end.
Attachment #497922 -
Flags: review?(ianb)
Assignee | ||
Updated•14 years ago
|
Attachment #497921 -
Flags: review?(ianb) → review+
Assignee | ||
Updated•14 years ago
|
Attachment #497922 -
Flags: review?(ianb) → review+
Reporter | ||
Comment 16•14 years ago
|
||
Comment on attachment 497922 [details] [diff] [review]
[checked in]backport new amo collection mode to old graph server front end
changeset: 326:6360c723794c
Attachment #497922 -
Attachment description: backport new amo collection mode to old graph server front end → [checked in]backport new amo collection mode to old graph server front end
Assignee | ||
Comment 17•14 years ago
|
||
Comment on attachment 497921 [details] [diff] [review]
[checked in] final fixes for ian's collect_amo.py
Pushed to ian's repo, ce1940d12c17
Attachment #497921 -
Attachment description: final fixes for ian's collect_amo.py → [checked in] final fixes for ian's collect_amo.py
Assignee | ||
Comment 18•14 years ago
|
||
This patch was pushed live in bug 620015 and the AMO DB being unavailable caused a tree closure (bug 620572)
Bug 620596 is about putting a queue in place, which is a good long-term solution but will take some time to implement. For right now, let's throw the AMO data away if we cannot contact the database (per clouserw).
Reassigning this to me (Ian is now in Labs and no longer works on graph server).
Assignee: ianb → robert
Status: NEW → ASSIGNED
Assignee | ||
Comment 19•14 years ago
|
||
I've set up a Hudson job to run the unit tests for this feature:
https://hudson.mozilla.org/job/graphs
I see two failures right now, they both look like rounding errors. Will take a look at these before closing this bug.
Assignee | ||
Comment 20•14 years ago
|
||
Not tested for real yet, but introduces no new errors in the unit test.
Attachment #500853 -
Flags: review?(anodelman)
Reporter | ||
Comment 21•14 years ago
|
||
Erg - just realized that that is a patch for the 2.0 graph server branch. We need one for 1.0 so that this can be rolled out to the current production graph server. I can hack something together.
Assignee | ||
Comment 22•14 years ago
|
||
This patch is against hg.m.o/graphs branch 1.0
After giving it some more thought I think catching Exception might be the right thing to do here, since we really honestly don't care what happens in the "try" block and want to continue processing if at all possible.
Alice, I don't know if "RETURN\tfail (...)" is going to do be useful to talos, feel free to change this.
Attachment #500853 -
Attachment is obsolete: true
Attachment #500972 -
Flags: review?(anodelman)
Attachment #500853 -
Flags: review?(anodelman)
Reporter | ||
Comment 23•14 years ago
|
||
Comment on attachment 500972 [details] [diff] [review]
[checked in]catch and report mysql connection problems
This patch covers the issue in collect.py, but there is still connection problems in graphsdb.py. If the amo_db is not able to connect then regular talos sends also fail, as when graphsdb is called it attempts to set up both the regular db connection and the amo_db connection.
graphdb.py should be changed to hold the db credentials, actual db connections should be set up elsewhere on an as-needed basis.
Attachment #500972 -
Flags: review?(anodelman) → review+
Assignee | ||
Comment 24•14 years ago
|
||
This handles the connection problem described in comment 23. The actual connection is now done inside collect_amo.py, which is wrapped in a try/except by collect.py
I'd really like to move the regular "db" connect out of graphsdb.py too, but it'd make this patch noisy and there isn't any real benefit right now. Stuck a FIXME comment in there, we can do this more consistently on the 2.0 branch.
Attachment #501375 -
Flags: review?(anodelman)
Reporter | ||
Updated•14 years ago
|
Attachment #501375 -
Flags: review?(anodelman) → review+
Reporter | ||
Comment 25•14 years ago
|
||
Solution tested on staging and regular talos data passing to graph server is successful even if amo_db is down. This effectively removes the amo_db dependency and clears the way for this to be rolled out.
Reporter | ||
Comment 26•14 years ago
|
||
There's an issue in collect.py (1.0 branch) is the spacing, this fixes it. In the old state the VALUES/AVERAGE code was called even if AMO data was sent. By adding the correct depth to the if statement when AMO data is sent the VALUES/AVERAGE code is no longer executed.
Tested green on staging.
Attachment #501452 -
Flags: review?(rhelmer)
Assignee | ||
Updated•14 years ago
|
Attachment #501452 -
Flags: review?(rhelmer) → review+
Reporter | ||
Comment 27•14 years ago
|
||
Comment on attachment 501452 [details] [diff] [review]
[checked in]fix spacing in collect.py
changeset: 329:e69875df700d
Attachment #501452 -
Attachment description: fix spacing in collect.py → [checked in]fix spacing in collect.py
Reporter | ||
Comment 28•14 years ago
|
||
Comment on attachment 500972 [details] [diff] [review]
[checked in]catch and report mysql connection problems
changeset: 330:c8063e009870
Attachment #500972 -
Attachment description: catch and report mysql connection problems → [checked in]catch and report mysql connection problems
Reporter | ||
Comment 29•14 years ago
|
||
Comment on attachment 501375 [details] [diff] [review]
[checked in]move amo_db connection into collect_amo.py
changeset: 331:91bdf3d3cf29
Attachment #501375 -
Attachment description: move amo_db connection into collect_amo.py → [checked in]move amo_db connection into collect_amo.py
Assignee | ||
Comment 30•14 years ago
|
||
This has been pushed to production in bug 623386, and we tested that the DB being unavailable (bad auth, bad IP) does not impact normal data collection.
During the downtime, we were not able to confirm normal operation using an example from:
https://wiki.mozilla.org/Buildbot/Talos/DataFormat#AMO on prod or staging, although it tested green on talos staging after the change in comment 29 landed (by anode).
I don't have access or visibility into talos staging; however re-testing now I see different behavior on prod and stage:
$ curl -F filename=@/tmp/test http://graphs-stage.mozilla.org/server/collect.cgi
RETURN fail unsubscriptable object
$ curl -F filename=@/tmp/test http://graphs.mozilla.org/server/collect.cgi
RETURN fail global name 'db' is not defined
It would be good to test this in Talos staging (I have no visibility into or access to this), if it is not working then we probably need to change something in the environment on prod, and/or ensure we've got the same code in both prod and stage for graphserver.
clouserw is following up on this testing, as I am out this week.
Comment 31•14 years ago
|
||
(In reply to comment #30)
> This has been pushed to production in bug 623386, and we tested that the DB
> being unavailable (bad auth, bad IP) does not impact normal data collection.
>
> During the downtime, we were not able to confirm normal operation using an
> example from:
> https://wiki.mozilla.org/Buildbot/Talos/DataFormat#AMO on prod or staging,
> although it tested green on talos staging after the change in comment 29 landed
> (by anode).
>
> I don't have access or visibility into talos staging; however re-testing now I
> see different behavior on prod and stage:
>
> $ curl -F filename=@/tmp/test
> http://graphs-stage.mozilla.org/server/collect.cgi
> RETURN fail unsubscriptable object
>
> $ curl -F filename=@/tmp/test http://graphs.mozilla.org/server/collect.cgi
> RETURN fail global name 'db' is not defined
>
> It would be good to test this in Talos staging (I have no visibility into or
> access to this), if it is not working then we probably need to change something
> in the environment on prod, and/or ensure we've got the same code in both prod
> and stage for graphserver.
>
> clouserw is following up on this testing, as I am out this week.
Bhearsum, no one on my team (except Alice, who's away) knows how to get into graphs-staging. Is this something you guys (RelEng) can verify? Or teach us(me, harth, jmaher) how to verify? Who actually owns this piece of the puzzle?
Thanks for the help.
Comment 32•14 years ago
|
||
I tried running this in staging and found that we don't have the newly required "AddonTester" branch in the staging or production database.
I did do a run through of bug 599169 in staging production against the Tryserver branch, however. In staging, I got this error when it tried to submit:
DEBUG: process_Request line: fail (1452, 'Cannot add or update a child row: a foreign key constraint fails (`addons_reskin/perf_results`, CONSTRAINT `perf_results_addon_id_key` FOREIGN KEY (`addon_id`) REFERENCES `addons` (`id`))')
DEBUG: process_Request line:
In production, I got the same error rhelmer did in his test:
DEBUG: process_Request line: fail global name 'db' is not defined
DEBUG: process_Request line:
Comment 33•14 years ago
|
||
> I did do a run through of bug 599169 in staging production against the
> Tryserver branch, however. In staging, I got this error when it tried to
> submit:
> DEBUG: process_Request line: fail (1452, 'Cannot add or update a child row: a
> foreign key constraint fails (`addons_reskin/perf_results`, CONSTRAINT
> `perf_results_addon_id_key` FOREIGN KEY (`addon_id`) REFERENCES `addons`
> (`id`))')
> DEBUG: process_Request line:
What data were you submitting? It sounds like you were trying to insert data for an add-on that doesn't exist.
Assignee | ||
Comment 34•14 years ago
|
||
Waiting to get my access fixed on graphs-stage (bug 613267), catlee was kind enough to look on graphs-stage for local patches and we found this one. I am sure that this is causing the exception in comment 30 that we can't repro on staging.
Just a missing import. We should remove any unnecessary local patches on staging and have a good test case before pushing this to prod though, so we don't keep spinning our wheels on this one (comment 32 and comment 33 are looking close to a good test).
Attachment #512852 -
Flags: review?(catlee)
Updated•14 years ago
|
Attachment #512852 -
Attachment is patch: true
Attachment #512852 -
Attachment mime type: application/octet-stream → text/plain
Updated•14 years ago
|
Attachment #512852 -
Flags: review?(catlee) → review+
Comment 35•14 years ago
|
||
(In reply to comment #33)
> > I did do a run through of bug 599169 in staging production against the
> > Tryserver branch, however. In staging, I got this error when it tried to
> > submit:
> > DEBUG: process_Request line: fail (1452, 'Cannot add or update a child row: a
> > foreign key constraint fails (`addons_reskin/perf_results`, CONSTRAINT
> > `perf_results_addon_id_key` FOREIGN KEY (`addon_id`) REFERENCES `addons`
> > (`id`))')
> > DEBUG: process_Request line:
>
> What data were you submitting? It sounds like you were trying to insert data
> for an add-on that doesn't exist.
I _think_ this is what's being sent, possibly constructed differently:
START
AMO
Firefox,3.6.15pre,https://addons.mozilla.org/en-US/firefox/downloads/latest/4925
talos-r3-snow-001,ts,Tryserver,baaf2315445d,20110216033208,1297952166
0,1267.00,NULL
1,958.00,NULL
END
Do we have to manually add entries for all the add-ons that we'll be testing, or is that done on demand?
Assignee | ||
Comment 36•14 years ago
|
||
Comment on attachment 512852 [details] [diff] [review]
[checked in] import db (was patched locally on stage)
Can someone please land this in hg.m.o/graphs ? I don't have access to push there (filed bug 635119 for that).
Comment 37•14 years ago
|
||
ping? We still need the "import db" patch landed, and to figure out why submissions are failing because of foreign key constraints.
Assignee | ||
Comment 38•14 years ago
|
||
(In reply to comment #37)
> ping? We still need the "import db" patch landed, and to figure out why
> submissions are failing because of foreign key constraints.
I am waiting on getting access to hg.m.o/graphs (set it blocking), maybe someone with access can land this sooner (catlee?)
I can take a look at the foreign key constraints problem tomorrow (will probably need help from the AMO team on this).
Depends on: 635119
Assignee | ||
Comment 39•14 years ago
|
||
(In reply to comment #35)
> I _think_ this is what's being sent, possibly constructed differently:
> START
> AMO
> Firefox,3.6.15pre,https://addons.mozilla.org/en-US/firefox/downloads/latest/4925
> talos-r3-snow-001,ts,Tryserver,baaf2315445d,20110216033208,1297952166
> 0,1267.00,NULL
> 1,958.00,NULL
> END
OK this works if we pass just the id (4925) and not the whole URL on the first line after AMO (per https://wiki.mozilla.org/Buildbot/Talos/DataFormat#AMO)
Just did a successful test:
"""
[rhelmer@bm-buildgraph01 ~]$ cat amo.txt
START
AMO
Firefox,3.6.15pre,4925
talos-r3-snow-001,ts,Tryserver,baaf2315445d,20110216033208,1297952166
0,1267.00,NULL
1,958.00,NULL
END
[rhelmer@bm-buildgraph01 ~]$ curl -F filename=@amo.txt 'http://graphs-stage.mozilla.org/server/collect.cgi'
RETURN success
"""
clouserw verified that "958" was inserted for that addon ID in the perf_appversions table, for Firefox 3.6.15pre OS X 10.6.6 (the way averaging works in talos I believe is to throw out the first result and do a mean average on the rest, with only one value I think this is expected behavior).
Conclusion - looks like it works ok, we just need that little "import db" patch landed. I'll prod about my access today, if someone else can land it in the meantime that would be grand.
Assignee | ||
Comment 40•14 years ago
|
||
Comment on attachment 512852 [details] [diff] [review]
[checked in] import db (was patched locally on stage)
aki landed this for us: http://hg.mozilla.org/graphs/rev/092eb6cef1ea
Attachment #512852 -
Attachment description: import db (was patched locally on stage) → [checked in] import db (was patched locally on stage)
Assignee | ||
Comment 41•14 years ago
|
||
OK we are ready to go. Should I file an IT bug to get this pushed to prod? Do we need a downtime window?
Assignee | ||
Comment 42•14 years ago
|
||
(In reply to comment #41)
> OK we are ready to go. Should I file an IT bug to get this pushed to prod? Do
> we need a downtime window?
Went ahead and did this, bug 636851.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 43•14 years ago
|
||
Thanks!
Updated•8 years ago
|
Product: Webtools → Webtools Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•