Closed Bug 1267683 Opened 9 years ago Closed 8 years ago

Prevent default scripting user agents from interacting with Treeherder

Categories

(Tree Management :: Treeherder: API, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

Details

Attachments

(2 files)

In the past we've had infra incidents caused by excessive use of Treeherder's API (both accidental and deliberate). To make these easier to diagnose, IMO all consumers of our API should set an appropriate user agent. Bug 1230222 and its dependants have been making existing consumers do so, and we're now at the point where we can consider blocking some default User Agents, such as: 'libcurl/' 'Python-urllib/' 'python-requests/' To implement this, we can use Django Rest framework's permissions system, per: http://www.django-rest-framework.org/api-guide/permissions/#custom-permissions This is preferable to using Django's own user agent blacklisting feature (https://docs.djangoproject.com/en/1.8/ref/settings/#std:setting-DISALLOWED_USER_AGENTS), since that (a) affects static assets too, (b) doesn't allow for setting a custom permission denied response: https://github.com/django/django/blob/1.8.12/django/middleware/common.py#L58 With Django rest framework, we can set a custom error response so the user knows that they just need to set an appropriate UA, per: https://github.com/tomchristie/django-rest-framework/issues/3754#issuecomment-206953020 Note: I'll be specifically not blacklisting UAs for tools like curl itself (only the library libcurl), so CLI experimenting still works.
Thinking about this more, I don't think it's worth the added complexity of using a custom django-rest-framework permissions class, just so we can set a custom error message instead of Django's default of 'Forbidden user agent'. By sticking with Django's feature we save needing to use the workaround in https://github.com/tomchristie/django-rest-framework/issues/3754#issuecomment-206953020 as well as having to add our own tests, and we can just document the custom user agent requirement on read the docs.
Summary: Prevent default scripting user agents from interacting with Treeherder's API → Prevent default scripting user agents from interacting with Treeherder
Attachment #8750802 - Flags: review?(wlachance)
Comment on attachment 8750802 [details] [treeherder] mozilla:user-agent-blacklist > mozilla:master I'd prefer if we only enforced this on treeherder production, could you add a setting for that and document it? r+ with that addressed.
Attachment #8750802 - Flags: review?(wlachance) → review+
As opposed to stage, local or ..? One disadvantage of that is if someone is testing something against stage and it works, they would I think rightly be surprised if it doesn't work against prod. It's worth noting that the UAs blacklisted specifically don't include the command line client curl.
(In reply to Ed Morley [:emorley] from comment #4) > As opposed to stage, local or ..? One disadvantage of that is if someone is > testing something against stage and it works, they would I think rightly be > surprised if it doesn't work against prod. > > It's worth noting that the UAs blacklisted specifically don't include the > command line client curl. Possibly... though I think such a person is more likely to just start testing in production (where they would also be confused). :) I personally like being able to just open a developer console, loading requests, and experimenting against treeherder. I think we should make that use-case "just work" without any extra hoops to jump through.
(In reply to William Lachance (:wlach) from comment #5) > I personally like being able to just open a developer console, loading > requests, and experimenting against treeherder. I think we should make that > use-case "just work" without any extra hoops to jump through. Doing so via the console is pretty easy: >>> requests.get(url, headers={'User Agent': 'wlach'}) I can add that example to the docs entry if that helps? The problem is that is one of the very cases where people can abuse our API. Or at least it's hard to tell that case from some script running locally. Also, once we set rate limits on GETs too (bug 1203556), setting a UA is going to be the least of it. In addition, I would posit that a naive user experimenting is most likely going to be doing so against Treeherder production, in which case leaving this blocklist switched off for local/stage doesn't help them, and only contributes towards an ever-growing list of differences between environments, when in an ideal world we should be making them as similar as possible.
In parallel with this work I was thinking about sending out an email to some of the tools/automation lists encouraging unrelated projects to set User Agents when they make requests too. IMO it should be considered best (or even normal) practice to set UAs when making requests - it sure would have helped people diagnose hg.m.o issues for example.
(In reply to Ed Morley [:emorley] from comment #6) > (In reply to William Lachance (:wlach) from comment #5) > > I personally like being able to just open a developer console, loading > > requests, and experimenting against treeherder. I think we should make that > > use-case "just work" without any extra hoops to jump through. > > Doing so via the console is pretty easy: > > >>> requests.get(url, headers={'User Agent': 'wlach'}) > > I can add that example to the docs entry if that helps? Yes, I think that would be helpful. Ok, let's just land it as-is (with that docs update). We can revisit this approach if it proves too annoying. I think a message to the tools mailing lists (maybe even dev.platform) would be useful too.
(In reply to William Lachance (:wlach) from comment #8) > I think a message to the tools mailing lists (maybe even dev.platform) would > be useful too. Please do send out a message to (at least) the tools mailing list.
That's the plan :-) (Also waiting on bug 1230222 comment 9)
Commits pushed to master at https://github.com/mozilla/treeherder https://github.com/mozilla/treeherder/commit/decf573e9ff146fcf91488b478adb5ff8e53378c Bug 1267683 - Docs: Add a generic API section For content that's not specifically related to only one of submitting or retrieving data from the Treeherder API, to avoid duplication. Ideally the submitting/retrieving sections would be nested under this new REST API section, however there isn't a way to get Sphinx to do this that doesn't then mis-display the subheadings: http://stackoverflow.com/questions/25276415/prevent-sub-section-nesting-in-python-sphinx-when-using-toctree https://github.com/mozilla/treeherder/commit/c49e6953908ca8881217be449361052d9ad294d1 Bug 1267683 - Docs: Move Python client section under 'REST API' https://github.com/mozilla/treeherder/commit/f66cd6db9387e84889a2425198b9e8915069ddc8 Bug 1267683 - Docs: Move authentication section under 'REST API' Since in the future we'll be adding rate limiting for all API requests and as such authentication is not specific to just data submissions. https://github.com/mozilla/treeherder/commit/f93329bb7b7a23e36d1dc56aea9516e3496597e1 Bug 1267683 - Docs: Move 'managing API credentials' under 'REST API' To make the content easier to find, and reduce the size of the generic 'common tasks' section. https://github.com/mozilla/treeherder/commit/c5a0513413e549c9b8d6c79338edaa0683fcaf9c Bug 1267683 - Docs: Describe the upcoming User Agent requirements We're soon going to start blacklisting some default scripting User Agents, to try and make accidental API abuse easier to trace back to the source. The docs are being updated in advance, so that the newsgroup posts have the real pages available to link to.
Newsgroup post sent to tools-treeherder and the tools list: https://groups.google.com/forum/#!topic/mozilla.tools.treeherder/fOIGkDG60JA I'll hold off landing the actual changes here for a bit.
Commit pushed to master at https://github.com/mozilla/treeherder https://github.com/mozilla/treeherder/commit/fb29dd89428345188416fb55f747eb6ac9552c07 Bug 1267683 - Forbid requests made using default scripting user agents In the past we've had infra incidents caused by excessive use of Treeherder's API. To make the accidental cases of these easier to diagnose, consumers of our API should set an appropriate user agent. To enforce this, common default user agents are now blacklisted using the Django common middleware: https://docs.djangoproject.com/en/1.8/ref/middleware/#module-django.middleware.common https://docs.djangoproject.com/en/1.8/ref/settings/#disallowed-user-agents Docs for this change have landed in advance, in #1534.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: