Closed
Bug 1124779
Opened 10 years ago
Closed 8 years ago
talos compare.py should pull data from treeherder
Categories
(Testing :: Talos, defect)
Testing
Talos
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: jmaher, Unassigned, Mentored)
References
Details
Attachments
(2 files, 10 obsolete files)
(deleted),
patch
|
jmaher
:
review-
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jmaher
:
review-
|
Details | Diff | Splinter Review |
currently compare.py in talos pulls data from graph server. In transitioning to treeherder in the coming months, we should ensure we have full data and we can calculate similar results. This does a few things: 1) prepares a commandline tool for investigates to work in the future 2) validates the treeherder data is 100% accurate 3) validates the treeherder apis make sense :) compare.py is here: http://hg.mozilla.org/build/talos/file/tip/talos/compare.py there is code in there for datazilla- I envision replacing that verbatim so that for a given test, platform, revision, branch we can get all the related data. One difference in treeherder is we store each page of the test as a separate item, in this case we will need to take a geometric mean of the results and use that for the value to match graph server.
Reporter | ||
Comment 1•10 years ago
|
||
no need to use my example, but maybe as a means for a place to start.
Comment 2•9 years ago
|
||
I am interested in taking this up.
Reporter | ||
Comment 3•9 years ago
|
||
I would love to help you get started with this, here is some general information about talos: https://wiki.mozilla.org/Auto-tools/Projects/Talos https://wiki.mozilla.org/Buildbot/Talos Do keep me updated as you have questions or make progress.
Reporter | ||
Comment 4•9 years ago
|
||
some background: graph server results have been in use for years and have a simple api: http://graphs.mozilla.org/api/test/runs/?id=72&branchid=113&platformid=33 more docs on the graph server api (although I don't think they will be needed): https://wiki.mozilla.org/Perfomatic:API Now to translate this into the land of treeherder, we have a slightly different paradigm. In treeherder we store data in related performance signatures. The signatures have an id that maps to a unique collection of parameters (branch, platform, testname, options). To get all signatures, you can do this: https://treeherder.mozilla.org//api/project/mozilla-central/performance-data/0/get_performance_series_summary/?interval=2592000 ^ this gets signatures for the mozilla-central branch over an interval of 2592000 seconds Now that you have the signatures, you can filter out what you care about based on platform, branch, test, options, etc, then do a query for the raw data: https://treeherder.mozilla.org/api/project/mozilla-central/performance-data/0/get_performance_data/?interval_seconds=2592000 This will return a set of data points over the last 2592000 seconds for that given signature. In the patch I attached here I sort of do that already. So what is missing. There is a field in the signatures called 'options_collection_hash', sort of like a hash of options inside of the signature. In this case, I do not do anything with this and we end up merging e10s (multi process) results with non-e10s results because we have two signatures with identical data except for the options hash. In bug 1116601, there is discussion of the options hash. My understanding here is that we will need to pull useful information out of the options hash to make decisions. Right now I am concerned about e10s and that might be the only piece of data in the options hash which is useful at this time.
Reporter | ||
Comment 5•9 years ago
|
||
ok, wlach has corrected me, the options collection hash doesn't do anything with e10s (that is found in the signature "job group name", and "job group symbol" and luckily in the "job_type_name"). We have pgo vs non pgo (pgo is program optimized build). here we do this on all branches (although the branch is usually called mozilla-inbound (pgo) vs mozilla-inbound-non-pgo (non-pgo)) and graph server uses the branch name to differentiate. Maybe that is what is unique in the options hash when the rest of the fields are unique in the performance signature.
Comment 6•9 years ago
|
||
Joel i am trying to understand the correspondence between the functions of the old and previous scripts. Can you give me a valid link for the datazilla link??
Comment 7•9 years ago
|
||
It is required at two places, one is the talos/testdata one and the other is refdata/pushlog one. Sorry to bother you again and again.
Reporter | ||
Comment 8•9 years ago
|
||
hmm, the datazilla stuff is something we wish to get rid of asap. Here is a reference to the specific calls we would do: In datazilla we can get a pushlog: https://datazilla.mozilla.org//refdata/pushlog/list/?days_ago=7&branches=Mozilla-Inbound Then using the revisions from the pushlog, we can get the raw data: https://datazilla.mozilla.org/talos/testdata/raw/Mozilla-Inbound-Non-PGO/7fd0966c2650?os_name=win&os_version=6.1.7601&test_name=tp5n That should help clarify a few things!
Comment 9•9 years ago
|
||
Joel, can you attach the CompareResults function of your compare.py, coz there are many conflicts which i cannot resolve without looking at the function you had.
Flags: needinfo?(jmaher)
Reporter | ||
Comment 10•9 years ago
|
||
I had some other patches which got in the way of the uploaded patch, here is a copy of compare.py that should help understand what I had started with.
Flags: needinfo?(jmaher)
Comment 11•9 years ago
|
||
Joel, can you check the logic i have used, You just have to check the compare function and a new function i have made "GetDataFromResults". I have attached the whole compare.py because i made changes in the one you attached , rather than the original one in the repo.
Flags: needinfo?(jmaher)
Attachment #8557617 -
Flags: feedback?(jmaher)
Reporter | ||
Comment 12•9 years ago
|
||
Comment on attachment 8557617 [details] compare.py > >def getTreeHerderSignatures(branch, history, test, platform): > sigfile = 'th_sigs.json' > ># https://treeherder.mozilla.org//api/project/mozilla-central/performance-data/0/get_performance_series_summary/?interval=2592000 > seconds = history * 86400 > if branch == 'Firefox': > branch = 'mozilla-central' > > thplatforms = {} > thplatforms['Linux'] = 'linux32' > thplatforms['Linux64'] = 'linux64' > thplatforms['WinXP'] = 'windowsxp' > thplatforms['Win7'] = 'windows7-32' > thplatforms['Win8'] = 'windows8-64' > thplatforms['OSX64'] = 'osx-10-6' > thplatforms['OSX10.8'] = 'osx-10-8' > thplatforms['Android'] = 'android-4-0-armv7-api10' these definitions should go up top and we should call them treeherder_platforms to be more specific. > retVal = [] > for sig in signature_data: > d = signature_data[sig] > if d['job_group_symbol'] == 'T-e10s': > continue please add a #TODO comment here that indicates we need to address the e10s issue in a followup bug > >def getTreeHerderResults(branch, history, signatures): > ts = {} we could use a more specific variable name than ts > >def getTreeHerderData(test, branch, platform): > #TODO: hardcoding 14 days in here > print "getting treeherder data for: %s, %s, %s" % (test, platform, branch) > signatures = getTreeHerderSignatures(branch, 14, test, platform) > print "got signatures: %s" % signatures > replicates = getTreeHerderResults(branch, 14, signatures) > print "got replicates: " > for timestamp in replicates: > print "%s: %s" % (timestamp, replicates[timestamp]) do we have >1 signature for a branch/test/platform? how many timestamps do we have? >def compareResults(revision, branch, masterbranch, skipdays, history, platforms, tests, pgo=False, printurl=False, compare_e10s=False, verbose=False, doPrint=False): > > platVal = {} > thdata = getTreeHerderSignatures(masterbranch,history,t p) > # data = getGraphData(test_map[t]['id'], bid, platform_map[p]) > if compare_e10s: > #testdata = getGraphData(test_map[t]['id'], test_bid, platform_map[p + " (e10s)"]) > testdata = getTreeHerderSignatures(branch,history,t,p+" (e10s)") this won't work because we ignore these with the code in getTreeHerderSignatures
Flags: needinfo?(jmaher)
Attachment #8557617 -
Flags: feedback?(jmaher) → feedback+
Reporter | ||
Comment 13•9 years ago
|
||
can you submit changes as a patch in the future? Here is a good way to look at submitting patches: https://developer.mozilla.org/en-US/docs/Mercurial_Queues This will help me see the differences faster. What this needs to do is print both graph server and treeherder data. Ideally you should be able to run this and see output: python compare.py --rev d46487a3cd1d --branch Inbound --platform Linux --test tresize --verbose test min. -> max. rev. Linux: tresize 12.5 -> 20.5 19.3 We could tackle this two ways: 1) print results for both data sources at the same time 2) add flag --treeHerder and when that is specified, pull from treeherder sources and print that data I would prefer to see option #2, so then we could run the above command and verify the results are the same, etc.
Comment 14•9 years ago
|
||
I am getting the following error when i run the command Traceback (most recent call last): File "compare.py", line 707, in <module> main() File "compare.py", line 684, in main datazilla, pgodatazilla, xperfdata = getDatazillaCSET(args.revision, branch) File "compare.py", line 178, in getDatazillaCSET conn.request("GET", cset) File "/usr/lib/python2.7/httplib.py", line 973, in request self._send_request(method, url, body, headers) File "/usr/lib/python2.7/httplib.py", line 1007, in _send_request self.endheaders(body) File "/usr/lib/python2.7/httplib.py", line 969, in endheaders self._send_output(message_body) File "/usr/lib/python2.7/httplib.py", line 829, in _send_output self.send(msg) File "/usr/lib/python2.7/httplib.py", line 791, in send self.connect() File "/usr/lib/python2.7/httplib.py", line 1172, in connect self.timeout, self.source_address) File "/usr/lib/python2.7/socket.py", line 571, in create_connection raise err socket.error: [Errno 111] Connection refused
Flags: needinfo?(jmaher)
Reporter | ||
Comment 15•9 years ago
|
||
as discussed on irc, this might be a problem locally with a connection. In general we will be removing all calls to datazilla, so this specific error is not of concern, we just need to remove the code that calls it.
Flags: needinfo?(jmaher)
Comment 16•9 years ago
|
||
Attachment #8557960 -
Flags: feedback?
Comment 17•9 years ago
|
||
The differentiation for pgo and non pgo has been implemented.
Attachment #8557960 -
Attachment is obsolete: true
Attachment #8557960 -
Flags: feedback?
Flags: needinfo?(jmaher)
Attachment #8557987 -
Flags: feedback?(jmaher)
Reporter | ||
Comment 18•9 years ago
|
||
Comment on attachment 8557987 [details] [diff] [review] patch2.diff Review of attachment 8557987 [details] [diff] [review]: ----------------------------------------------------------------- throughout the entire patch, please look at the 'hg diff' and ensure that there are no new blank lines or extra spaces at the end of the lines. ::: talos/compare.py @@ +187,5 @@ > platform = getDatazillaPlatform(testrun['test_machine']['os'], > testrun['test_machine']['platform'], > testrun['test_machine']['osversion'], > testrun['test_build']['name']) > + nit: this is a blank line which has a tab/whitespace, if we have a blank line, lets leave it a blank line. @@ +199,4 @@ > pgo = True > if 'Non-PGO' in testrun['test_build']['branch']: > pgo = False > + there is not need to add blank lines when not needed, especially ones that create two blank lines in the code. @@ +369,5 @@ > + with open(sigfile, 'w') as fHandle: > + fHandle.write(data) > + else: > + with open(sigfile, 'r') as fHandle: > + data = fHandle.read() 4 space indentation here @@ +381,5 @@ > + continue > + > + if d['suite'] == test and d['machine_platform'] == treeherder_platforms[platform]: > + # Updating the dictionary with the new signature and the corresponding option collection hash > + retVal.update({sig:d['option_collection_hash']}) I don't understand this line @@ +407,5 @@ > + > + ts = {} > + # now go through all signatures, find matching 'push_timestamp', and take a geometric "mean" values > + for item in results_data: > +# print item please remove this- it is debugging related. @@ +423,5 @@ > + high = 0 > + count = 0 > + vals = [] > + for push_timestamp in data: > + if push_timestamp>=start and push_timestamp<= end: please be consistent with spacing here, ideally: push_timestamp >= start <- notice the spaces around the operator. @@ +433,5 @@ > + count += 1 > + > + average = 0 > + geomean = 0 > + if count>0 : fix the spacing here: count > 0: @@ +526,5 @@ > + testdata = getTreeHerderSignatures(branch,history,t,p) > + # Checking for the pgo and non pgo branches > + for signature in thdata: > + mastersignature = signature > + if thdata[signature] == "32faaecac742100f7753f0c1d0aa0add01b4046b": please document what this magic string is for.
Attachment #8557987 -
Flags: feedback?(jmaher) → feedback-
Reporter | ||
Comment 19•9 years ago
|
||
as a note, if you ask for feedback or review, there is no need for a needinfo on the bug!
Flags: needinfo?(jmaher)
Comment 20•9 years ago
|
||
The corrections suggested in the previous review have been corrected.
Attachment #8558023 -
Flags: feedback?(jmaher)
Reporter | ||
Updated•9 years ago
|
Attachment #8557987 -
Attachment is obsolete: true
Reporter | ||
Updated•9 years ago
|
Attachment #8553231 -
Attachment is obsolete: true
Reporter | ||
Updated•9 years ago
|
Attachment #8557092 -
Attachment is obsolete: true
Reporter | ||
Updated•9 years ago
|
Attachment #8557617 -
Attachment is obsolete: true
Reporter | ||
Comment 21•9 years ago
|
||
Comment on attachment 8558023 [details] [diff] [review] patch3.diff Review of attachment 8558023 [details] [diff] [review]: ----------------------------------------------------------------- there is still a lot of 'whitespace' only changes, please do take a few extra minutes to look over your changes. I recommend looking at them in the splinter view: https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=1124779&attachment=8558023 Running this locally, I get: (talos)jmaher@jmaher-ThinkPad-X230:~/mozilla/talos/talos$ python compare.py --rev d46487a3cd1d --branch Inbound --platform Linux --test tresize --verbose --treeHerder test min. -> max. rev. Traceback (most recent call last): File "compare.py", line 703, in <module> main() File "compare.py", line 684, in main compareResults(args.revision, args.branch, args.masterbranch, args.skipdays, args.history, platforms, tests, args.pgo, args.printurl, args.compare_e10s, datazilla, pgodatazilla,args.treeherder, args.verbose, True) File "compare.py", line 530, in compareResults testsignaturedata = getTreeHerderResults(branch, history, testsignature) File "compare.py", line 403, in getTreeHerderResults blob = item['blob'] TypeError: string indices must be integers (talos)jmaher@jmaher-ThinkPad-X230:~/mozilla/talos/talos$ If you have trouble running the script, maybe you can download the raw bits and feed them in to make sure the logic works for extracting the information.
Attachment #8558023 -
Flags: feedback?(jmaher) → feedback-
Comment 22•9 years ago
|
||
Attachment #8558632 -
Flags: feedback?(jmaher)
Reporter | ||
Comment 23•9 years ago
|
||
Comment on attachment 8558632 [details] [diff] [review] patch4.diff Review of attachment 8558632 [details] [diff] [review]: ----------------------------------------------------------------- A great start- lets figure out a way to ensure you have httpsconnection when you submit the patch and can test with a local proxy as well. the big problem here right now is that we have >1 value for nonpgo results coming back: cart 50.0 -> 59.7 56.9 The Non Pgo Branch :) cart 10.5 -> 13.4 7.2 The Non Pgo Branch cart 6.8 -> 7.8 7.2 The Non Pgo Branch :) cart 257.8 -> 291.2 7.2 The Non Pgo Branch :) cart 12.5 -> 16.0 7.2 The Non Pgo Branch :) cart 83.0 -> 101.0 7.2 The Pgo Branch :) cart 10.0 -> 12.0 9.6 The Non Pgo Branch :) cart 11.3 -> 17.9 7.2 The Pgo Branch :) cart 66.0 -> 74.6 9.6 The Pgo Branch :) cart 12.8 -> 17.1 9.6 The Non Pgo Branch :) cart 10.9 -> 12.1 7.2 The Non Pgo Branch :) cart 307.4 -> 337.4 7.2 The Pgo Branch :) cart 229.3 -> 256.8 9.6 The Pgo Branch cart 9.3 -> 10.1 9.6 The Pgo Branch :) cart 9.9 -> 12.8 9.6 The Pgo Branch :( cart 5.6 -> 6.4 9.6 The Non Pgo Branch :) cart 15.7 -> 25.1 7.2 The Pgo Branch cart 8.7 -> 10.5 9.6 The Pgo Branch :) cart 206.1 -> 221.6 9.6 I suspect the problem is that we are reporting values for each subtest whereas graph server is reporting a geometric mean of all the subtests. Maybe you could consider for each revision or specific date, taking the individual results for a test (i.e. cart) and doing a geometric mean (look in compare.py, we call filter.geometric_mean already from there).
Attachment #8558632 -
Flags: feedback?(jmaher) → feedback+
Comment 24•9 years ago
|
||
Joel , i manually checked there were more than 4 tests signatures for cart/linux32 , on https://treeherder.mozilla.org/api/project/mozilla-inbound/performance-data/0/get_performance_series_summary/?interval=2592000. And similar may be the case for other tests.
Flags: needinfo?(jmaher)
Reporter | ||
Comment 25•9 years ago
|
||
I believe they are related to the individual tests as mentioned above in Comment 23. Can you verify these are specific test files?
Flags: needinfo?(jmaher)
Comment 26•9 years ago
|
||
Yes they have the same suite but different tests. So we need to collect data of different signature of non pgo branch , and same for the pgo branch and give only one output for each.
Flags: needinfo?(jmaher)
Reporter | ||
Comment 27•9 years ago
|
||
correct. For all objects of the same test (and signature - i.e. non-pgo), we need to gather the raw data and create a geometric mean for it. This should be for the same data points. The good news is we could print out differences for individual test names as well, so this ends up being much more comprehensive! for now, lets do the geometric mean. If you want to do it now, feel free to print the 'page' or 'test' name out and do all the subtests as well. Of course the final format will be different here as time goes on.
Flags: needinfo?(jmaher)
Comment 28•9 years ago
|
||
Joel, this is now also printing the geomean.
Attachment #8559302 -
Flags: feedback?(jmaher)
Reporter | ||
Comment 29•9 years ago
|
||
this fails to work for me: http://pastebin.mozilla.org/8545443 It would be nice if you put some error handling logic around the pulling of the data as well as the json.load() of the data. Please take some time, clean up your whitespace and error handling and then we can try again!
Reporter | ||
Updated•9 years ago
|
Attachment #8558023 -
Attachment is obsolete: true
Reporter | ||
Updated•9 years ago
|
Attachment #8558632 -
Attachment is obsolete: true
Reporter | ||
Updated•9 years ago
|
Attachment #8559302 -
Flags: feedback?(jmaher)
Comment 30•9 years ago
|
||
Joel, sure the next version would be bad white spaces free and also i'll try to put some error handling logic. The main problem i face is that since i am behind the proxy i have to change some statements to run the script on my computer and then change them back to give them to you. I searched and i could not find a solution.
Reporter | ||
Comment 31•9 years ago
|
||
Kartrk- one solution would be to have a helper function for the retrieval of data. Then you only have to edit the code in one place and I would be fine adding --proxy to the commandline: python compare.py --proxy host:port --verbose --treeHerder ... Then you could determine if there is a proxy from the options and have all your logic in a single function.
Comment 32•9 years ago
|
||
Joel , i have asked my sys admin to give me an exception to access the mozilla resources without proxy, as soon as i get the exception i will resume my work.
Reporter | ||
Comment 33•9 years ago
|
||
sounds good. You are also welcome to put a function to retrive the data and support the proxy needs that you have.
Comment 34•9 years ago
|
||
I have removed all the white spaces and a little bit of error handling too. This code is working fine with me without proxy.
Attachment #8562828 -
Flags: review?(jmaher)
Comment 35•9 years ago
|
||
The previous version considered the existence of both pgo and non pgo branches due to which it was giving errors, however now it has been corrected.
Attachment #8562946 -
Flags: review?(jmaher)
Reporter | ||
Updated•9 years ago
|
Attachment #8559302 -
Attachment is obsolete: true
Reporter | ||
Updated•9 years ago
|
Attachment #8562828 -
Attachment is obsolete: true
Attachment #8562828 -
Flags: review?(jmaher)
Reporter | ||
Comment 36•9 years ago
|
||
Comment on attachment 8562946 [details] [diff] [review] patch7.diff Review of attachment 8562946 [details] [diff] [review]: ----------------------------------------------------------------- hey, this runs and gives me data- lets get the edits lined up well and looking good, then we can work on solving small differences in data as well as output formatting! I would be fine doing this in parts, so lets make the code readable! ::: talos/compare.py @@ +102,5 @@ > +treeherder_platforms['Win7'] = 'windows7-32' > +treeherder_platforms['Win8'] = 'windows8-64' > +treeherder_platforms['OSX64'] = 'osx-10-6' > +treeherder_platforms['OSX10.8'] = 'osx-10-8' > +treeherder_platforms['Android'] = 'android-4-0-armv7-api10' it would be nice to have a blank line between these definitions and the function below. @@ +192,5 @@ > pgodata[platform] = {} > > pgo = True > if 'Non-PGO' in testrun['test_build']['branch']: > + pgo = False nit: you have whitespace after False, ensure it is not a tab or a few whitespaces. @@ +349,5 @@ > + > + data = response.read() > + response.close() > + with open(sigfile, 'w') as fHandle: > + fHandle.write(data) do we need to have a signature file cached locally? I had this in my original patch to speed up local development. I don't believe we really cache this, but if we do that is great- we really should delete it at the end of this script. @@ +365,5 @@ > + continue > + > + if d['suite'] == test and d['machine_platform'] == treeherder_platforms[platform]: > + # This contains all the signature with the given platform est/build and its corresponding option_collection_hash to differentiate between Pgo and Non Pgo branches. > + retVal.update({sig:(d['option_collection_hash'],d['test'])}) the comment has an odd space after platform. also retVal.update has whitespace at the end of the line. @@ +393,5 @@ > + for d in blob: > + if not d['push_timestamp'] in ts: > + ts[d['push_timestamp']] = [] > + > + ts.update({d['push_timestamp']:d['mean']}) call me old fashioned, but I like to read: ts[d['push_timestamp']] = d['mean'] is there a beneifit to calling .update() ? @@ +396,5 @@ > + > + ts.update({d['push_timestamp']:d['mean']}) > + return ts > + > +def GetDataFromResults(data,start,end): nit: in an argument list add a ' ' after each ,: data, start, end @@ +427,5 @@ > + geomean = filter.geometric_mean(data) > + return {'low': low, 'high': high, 'avg': average, 'geomean': geomean} > + > + > +def GetString(test,results,t): t is not descriptive, and add a ' ' after the , @@ +568,5 @@ > + master_pgo_results.update({test:GetDataFromResults(master_pgo_data[test], startdate, enddate)}) > + master_pgo_combined.extend(GetDataFromResults(master_pgo_data[test], startdate, enddate)['data']) > + test_pgo_data.update({test:getTreeHerderResults(branch_name,history,test_pgo_signatures[test])}) > + test_pgo_results.update({test:GetDataFromResults(test_pgo_data[test], startdate, enddate)}) > + test_pgo_combined.extend(GetDataFromResults(test_pgo_data[test], startdate, enddate)['data']) this is really hard to read, it seems like we have 2 blocks of similar code: for index in master_pgo_signatures: master_sig = master_pgo_signatures[test] test_sig = test_pgo_signatures[test] branches = [masterbranch_name, branch_name] for branch in branches: b = branches[branch] results = getTreeHerderResults(b, history, master_sig) master_pgo_data[test] = results data = GetDataFromResults(results, startdate, enddate) master_pgo_results[test] = data master_pgo_combined.extend(data['data']) of course we would need to change out master_pgo_* with test_pgo_* @@ +576,5 @@ > + master_nonpgo_combined.extend(GetDataFromResults(master_nonpgo_data[test], startdate, enddate)['data']) > + test_nonpgo_data.update({test:getTreeHerderResults(branch_name,history,test_nonpgo_signatures[test])}) > + test_nonpgo_combined.extend(GetDataFromResults(test_nonpgo_data[test], startdate, enddate)['data']) > + master_nonpgo_results.update({test:GetDataFromResults(master_nonpgo_data[test], startdate, enddate)}) > + test_nonpgo_results.update({test:GetDataFromResults(test_nonpgo_data[test], startdate, enddate)}) this suffers fromthe same issue- maybe we could loop this better and just iterate through: master_pgo, test_pgo, master_nonpgo, test_nonpgo, we could use a dict instead of a list of variables, so 4 dict objects should suffice. @@ +586,5 @@ > + output.append(string) > + if len(master_pgo_results)>1: > + for test in master_pgo_results: > + string = GetString(test_pgo_results[test],master_pgo_results[test],test) > + output.append(string) we should only do the pgo stuff if we pass in --pgo on the cli, otherwise this might not need much cleanup! @@ +687,5 @@ > > if any(branch in args.branch or branch in args.masterbranch for branch in ['Aurora', 'Beta']) and not args.pgo: > parser.error("Error: please specify --pgo flag in case of Aurora/Beta branch") > > + we don't need this extra line
Attachment #8562946 -
Flags: review?(jmaher) → review-
Comment 37•9 years ago
|
||
I have tried to make the code less redundant and also i have made the corrections pointed out in the previous feedback.
Attachment #8568079 -
Flags: review?(jmaher)
Reporter | ||
Updated•9 years ago
|
Attachment #8562946 -
Attachment is obsolete: true
Reporter | ||
Comment 38•9 years ago
|
||
Comment on attachment 8568079 [details] [diff] [review] patch8.diff Review of attachment 8568079 [details] [diff] [review]: ----------------------------------------------------------------- One thing I would like to do is only show pgo if we have the --pgo flag. In fact, I would prefer to simplify the code and just use the nonpgo or pgo depending on if the flag is specified. This way we don't need duplicated code to handle pgo/non pgo variables. That leaves an open issue of comparing pgo vs nonpgo, but we still have an open issue of comparing e10s and non e10s, I think both of those could be addressed in the same way. Overall this patch is looking great- Take a pass through my review feedback and lets see if the next one is ready to land! ::: talos/compare.py @@ +326,5 @@ > + platform = 'WinXP' > + if mozinfo.os == 'mac': > + if mozinfo.version == "OS X 10.6.8": > + platform = "OSX64" > + #TODO: verify this can you remove this todo line, it works. Also this function is almost identical to getDatazillaPlatform, can you remove getDatazillaPlatform() ? @@ +350,5 @@ > + > + data = response.read() > + response.close() > + with open(sigfile, 'w') as fHandle: > + fHandle.write(data) please remove the local sigfile, we are not caching this across queries. If we were to load the signatures once for an entire compare.py run, then I would be happier to use sigfile (I would prefer, but can be done in a followup bug) @@ +357,5 @@ > + > + signature_data = json.loads(data) > + retVal = {} > + if not signature_data: > + print "Sorry unable to get signatures for the branch %s" %branch it would be nice to print the URL here as well. @@ +361,5 @@ > + print "Sorry unable to get signatures for the branch %s" %branch > + sys.exit(0) > + for sig in signature_data: > + d = signature_data[sig] > + if d['job_group_symbol'] == 'T-e10s': here we are ignoring talos-e10s jobs, how can we get those signatures in the future? @@ +366,5 @@ > + continue > + > + if d['suite'] == test and d['machine_platform'] == treeherder_platforms[platform]: > + # This contains all the signature with the given platform test/build and its corresponding option_collection_hash to differentiate between Pgo and Non Pgo branches. > + retVal.update({sig:(d['option_collection_hash'],d['test'])}) I would prefer: retVal[sig] = d['option_collection_hash'],d['test'] @@ +372,5 @@ > + return retVal > + > +def getTreeHerderResults(branch, history, signatures): > + > + seconds = history * 86400 we should make all instances of 86400 a const at the top of the file. @@ +386,5 @@ > + response.close() > + results_data = json.loads(data) > + ts = {} > + if not results_data: > + print "Sorry unable to get test results for %s" %branch please add output of the URl that failed. @@ +447,5 @@ > + else: > + string = "%2s %-18s\t%7.1f\t->\t%7.1f\t%7.1f" % (status, test_name, results['low'], results['high'], test['avg']) > + return string > + > +def GetDataDict(master_signatures,test_signatures,masterbranch_name,branch_name,history,startdate,enddate): nit: I prefer a space after each comma also you could remove _name and just have masterbranch and testbranch @@ +451,5 @@ > +def GetDataDict(master_signatures,test_signatures,masterbranch_name,branch_name,history,startdate,enddate): > + master = {} > + master['data'] = {} > + master['results'] = {} > + master['combined'] = [] you could do: master = {'data': {}, 'results': {}, 'combined': []} same for test below. @@ +462,5 @@ > + master['results'][test] = GetDataFromResults(master['data'][test], startdate, enddate) > + master['combined'].extend(master['results'][test]['data']) > + tests['data'][test] = getTreeHerderResults(branch_name,history,test_signatures[test]) > + tests['results'][test] = GetDataFromResults(tests['data'][test], startdate, enddate) > + tests['combined'].extend(tests['results'][test]['data']) do we use ['data'][test] for anything except input into GetDataFromResults? It appears we use ['results'][test], not sure if we can find a way to just use the ['combined'] value to make things simpler. @@ +478,5 @@ > itertests = getListOfTests(p, tests) > if p == "Android": > itertests = android_tests > > + nit: please remove this blank line @@ +546,5 @@ > else: > output.append(" %-18s\tNo data for platform" % t) > + if treeherder == True: > + masterbranch_name = branch_map[masterbranch]['pgo']['name'].lower() > + thdata = getTreeHerderSignatures(masterbranch_name,history,t,p) nit: please add a space after each , @@ +564,5 @@ > + for test_signatures in testdata: > + if testdata[test_signatures] == thdata[signature]: > + test_pgo_signatures.update({testdata[test_signatures][1]:test_signatures}) > + else: > + master_nonpgo_signatures.update({thdata[signature][1]:signature}) please avoid update calls: master_nonpgo_signatures[thdata[signature][1]] = signature same for the test one below. @@ +567,5 @@ > + else: > + master_nonpgo_signatures.update({thdata[signature][1]:signature}) > + for test_signatures in testdata: > + if testdata[test_signatures] == thdata[signature]: > + test_nonpgo_signatures.update({testdata[test_signatures][1]:test_signatures}) you have the a forloop for testdata inside the larger loop for master signatures. Is this needed? @@ +701,5 @@ > > if any(branch in args.branch or branch in args.masterbranch for branch in ['Aurora', 'Beta']) and not args.pgo: > parser.error("Error: please specify --pgo flag in case of Aurora/Beta branch") > > + nit: please remove this blank line.
Attachment #8568079 -
Flags: review?(jmaher) → review-
Comment 39•9 years ago
|
||
As we talked on IRC, I worked on the code to reduce the lines, and i could find some lines which could be reduced. And also the changes suggested in the last review have been implemented.
Attachment #8570603 -
Flags: review?(jmaher)
Reporter | ||
Comment 40•9 years ago
|
||
Comment on attachment 8570603 [details] [diff] [review] patch9.diff Review of attachment 8570603 [details] [diff] [review]: ----------------------------------------------------------------- thanks for fixing the issues I raised last time. I have a few additional comments, mostly more cleanup to avoid pgo. ::: talos/compare.py @@ +419,5 @@ > + for test in master_signatures: > + master['data'][test] = getTreeHerderResults(masterbranch, history, master_signatures[test]) > + master['results'][test] = GetDataFromResults(master['data'][test], startdate, enddate) > + master['combined'].extend(master['results'][test]['data']) > + tests['data'][test] = getTreeHerderResults(branch,history, test_signatures[test]) we are assuming that we have a signature 'test' in test_signatures. @@ +422,5 @@ > + master['combined'].extend(master['results'][test]['data']) > + tests['data'][test] = getTreeHerderResults(branch,history, test_signatures[test]) > + tests['results'][test] = GetDataFromResults(tests['data'][test], startdate, enddate) > + tests['combined'].extend(tests['results'][test]['data']) > + return (master,tests) I think to simplify this, it would be better to have this get a dict for one type, and we could call it twice. def getDataDict(signatures, branch, history, startdate, enddate): retVal = {'results': {}, 'combined': []} for test in signatures: data = getTreeHerderResults(branch, history, signatures[test]) retVal['results'][test] = GetDataFromResults(data, startdate, enddate) retVal['combined'].extend(retVal['results'][test]['data']) return retVal That would be a lot cleaner! @@ +512,5 @@ > + # Checking for the pgo and non pgo branches > + master_pgo_signatures = {} > + master_nonpgo_signatures = {} > + test_pgo_signatures = {} > + test_nonpgo_signatures = {} I thought we were going to not do pgo here. I would rather see: master_signatures = {} test_signatures = {} @@ +525,5 @@ > + else: > + master_nonpgo_signatures.update({thdata[signature][1]:signature}) > + for test_signatures in testdata: > + if testdata[test_signatures] == thdata[signature]: > + test_nonpgo_signatures.update({testdata[test_signatures][1]:test_signatures}) in this for loop should be the only stuff for pgo: if options.pgo and thdata[signature][0] == "f69e1b00908837bf0550250abb1645014317e8ec": master_singatures = ... test_signatures = ... please avoid the .update() calls as well. @@ +557,5 @@ > + if len(tests[pgo_or_nonpgo])>0: > + output.append(("##### The %s Branch %s #####" %(pgo_or_nonpgo, t))) > + tests[pgo_or_nonpgo]['combined'] = GetDataForGeomean(tests[pgo_or_nonpgo]['combined']) > + master[pgo_or_nonpgo]['combined'] = GetDataForGeomean(master[pgo_or_nonpgo]['combined']) > + string = GetString(tests[pgo_or_nonpgo]['combined'], master[pgo_or_nonpgo]['combined'], t) removing pgo by default will remove a lot of this code as well!
Attachment #8570603 -
Flags: review?(jmaher) → review-
Reporter | ||
Comment 41•9 years ago
|
||
Kartik, any progress on this? We have made a few small changes to compare.py, which means this patch would be bitrotted.
Comment 42•9 years ago
|
||
Heyy Joel, Sorry i got busy with some other org for my gsoc, i shall resume my work in mozilla maybe after 15 days or so.
Flags: needinfo?(jmaher)
Reporter | ||
Comment 43•9 years ago
|
||
Thanks Kartik! if you get too busy with your other org, do let us know so we can find someone else to finish this great work you have done.
Flags: needinfo?(jmaher)
Comment 44•9 years ago
|
||
I can never be too busy to work for mozilla. Mozilla is the place I grew as an open source contributor.
Updated•9 years ago
|
Reporter | ||
Comment 45•9 years ago
|
||
Hi Kartik, let me know if you can start looking back into this soon!
Comment 46•9 years ago
|
||
Hi, Joel, I think i have got some time to work on this now. Has the compare.py already changed?
Reporter | ||
Comment 47•9 years ago
|
||
compare.py has had a few minor changes, but nothing bit- it will probably bitrot your code and cause it to fail to apply. We can work through it. I am excited to see you working on this again!
Comment 48•9 years ago
|
||
Joel , can you tell me how to obtain the new compare.py, a simple hg pull does not change my compare.py?
Reporter | ||
Comment 49•9 years ago
|
||
ah, simple fix: hg pull hg update or: hg pull -u
Reporter | ||
Comment 50•8 years ago
|
||
now that we have better tools in perfherder this is a deprecated tool.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•