Closed Bug 1165356 Opened 9 years ago Closed 8 years ago

Use requests everywhere instead of urllib / urllib2 / httplib

Categories

(Tree Management :: Treeherder, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

Details

Attachments

(1 file, 1 obsolete file)

Bug 1144417 already switched treeherder-client over to using requests rather than httplib. There are no other uses of httplib in the repo, but we have urllib / urllib2 calls sprinkled all over the place that we should replace with requests. This will fix things like bug 1165335, since we can avoid using urllib.urlencode(). The admin node has requests installed, so we can even switch over the update.py deployment script - which currently uses not just urllib2 but urllib too. Stage/prod venv have version 2.6.2 installed and the admin node has 2.7.0, so both support the new .post() json param - which means we can can simplify some of our existing requests usage too.
Blocks: 1165335
No longer blocks: 1144417
Depends on: 1144417
Depends on: 1167168
No longer blocks: 1165335
This will fix a few things (eg bug 1165335, possibly some of the ETL leaks), so let's do this sooner rather than later.
Priority: P3 → P2
No longer depends on: 1167168
Attached file PR 640 (obsolete) (deleted) —
Attachment #8623783 - Flags: feedback?(emorley)
Comment on attachment 8623783 [details] PR 640 Left a comment - re-request feedback/review when that change is made, so flake8 passes and the tests can run :-)
Attachment #8623783 - Flags: feedback?(emorley)
:KWierso what's the current status of this bug? Is attachment 8623783 [details] ready for a r?
Flags: needinfo?(wkocher)
The PR hasn't been updated since the feedback request in comment 6. (Side note: camd also has a PR open for this bug, from a while ago https://github.com/mozilla/treeherder/pull/637)
:camd :KWierso are your patches overlapping?
Flags: needinfo?(cdawson)
I'd rather someone else take over this.
Flags: needinfo?(wkocher)
Yeah, I'm happy to take this on. I'll revisit my PR and see if it can be(or needs to be) merged with wes' and made ready for review.
Flags: needinfo?(cdawson)
Assignee: nobody → cdawson
Depends on: 1165335
No longer blocks: 1188661
As part of this work, we'll end up having to revert bug 1155451 (streaming the gzipped log and parsing as we do so), unless we can find a way to do similar with requests. If that's not possible with requests, IMO we should file a feature request against requests and then still switch to it, since the simplification if offers is preferable (again IMO) to the perf optimisation from streaming.
(In reply to Ed Morley [:emorley] from comment #12) > As part of this work, we'll end up having to revert bug 1155451 (streaming > the gzipped log and parsing as we do so), unless we can find a way to do > similar with requests. If that's not possible with requests, IMO we should > file a feature request against requests and then still switch to it, since > the simplification if offers is preferable (again IMO) to the perf > optimisation from streaming. Requests offers an API to let you stream content: http://docs.python-requests.org/en/latest/user/quickstart/#raw-response-content I would push back against any patch which didn't preserve the optimization. Log parsing speed is still an issue when we're trying to recover from problems -- simplicity of implementation is less of a concern for a 20-line, well-isolated block of code.
The perf gain was pretty small in bug 1155451, and that was testing locally rather than on a datacenter that's both closer to where the logs are located (at least for FTP) and also likely with a faster connection. The streaming also obscured the problem with slow downloads recently, since it distorted the profiles shown in New Relic. So I think it's not as straightforwards a tradeoff as it seems at first glance. Though either way I think this comes down to benchmarks (like it did for Cython) - we can run them with and without - and on the actual treeherder stage (or Heroku) nodes to make a decision later :-)
(In reply to Ed Morley [:emorley] from comment #14) > The streaming also obscured the problem with slow downloads recently, > since it distorted the profiles shown in New Relic. There may also be more we can do to annotate the method so this is clearer in New Relic.
Sure, profiles (which is something that should be done regardless) and/or compelling arguments on maintainability could convince me otherwise. That said, replacing this chunk of code is pretty low priority unless there's something I'm missing...
I didn't say it was high priority :-) James queried in IRC why we were using a chunk size of 4K, which reminded me about the implementation and how it might conflict with the desire to move to the requests library. Now we know there is a way to implement similar with requests, deciding whether to stream or not stream is orthogonal to this bug and can be decided at another point.
Depends on: 1214183
I'm going to be touching parts of this for bug 1230610 et al.
Assignee: cdawson → emorley
This will be helped by the refactoring in bug 1230610.
Depends on: 1230610
Depends on: 1240809
I have a WIP for the ETL parts using urllib2, in bug 1240809. With that fixed, it just leaves the log parser iirc, which can be the focus of this bug.
Assignee: emorley → nobody
Priority: P2 → P3
Depends on: 1275903
Depends on: 1191934
Depends on: 1276213
Depends on: 1276219
Assignee: nobody → emorley
Blocks: 1295997
Attachment #8849352 - Flags: review?(cdawson)
Attachment #8623783 - Attachment is obsolete: true
Comment on attachment 8849352 [details] [treeherder] mozilla:logparser-requests > mozilla:master Awesome! Nice work! :)
Attachment #8849352 - Flags: review?(cdawson) → review+
Commits pushed to master at https://github.com/mozilla/treeherder https://github.com/mozilla/treeherder/commit/9c14899e0cfe781bb90be13f0534ee36672b9550 Bug 1165356 - Log parser: Improve readability of test_parse_log output It's only ever consumed by humans, so readability takes priority. https://github.com/mozilla/treeherder/commit/823189e4c7793131cc2b0491853a5234eaec8153 Bug 1165356 - Log parser: Simplify error messages in post_log_artifacts Prior to bug 1284289, exceptions of type `urllib2.URLError` had different retry behaviour, however there's now no need for the conditional just for the sake of a different error message wording. https://github.com/mozilla/treeherder/commit/6717d9f5fc71ff2477968b827a55344d765e13bd Bug 1165356 - Log parser: Fetch logs using requests instead of urllib2 Doing so also fixes incorrect line numbers in logs that have Windows line endings (bug 1328880), hence having to update the expected logview output. The tests have to be adjusted to use responses, since requests doesn't support `file://` URLs. https://github.com/mozilla/treeherder/commit/aff2152a810b5d68d17252669185aca9f34d28bd Bug 1165356 - Log parser: Remove tests workaround for file:// log URLs Now that we're using requests, the log URL being used to access the file is consistent across environments (since it doesn't reference the local directory structure), so we don't need to exclude it from comparisons.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Blocks: 1328880
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: