Closed Bug 1203556 Opened 9 years ago Closed 7 years ago

Implement a GET rate limit for Treeherder's API

Categories

(Tree Management :: Treeherder: API, defect, P3)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: emorley, Unassigned)

References

Details

We could always set quite a low limit (not so low that the UI hits it, but low for tooling) and then allow people to apply for read-only API keys, as part of bug 1160111, that would give them higher limits? (Similar to many other sites' APIs.)
+1 to your proposal in comment 1!
(In reply to Ed Morley [:emorley] from comment #1)
> We could always set quite a low limit (not so low that the UI hits it, but
> low for tooling) and then allow people to apply for read-only API keys, as
> part of bug 1160111, that would give them higher limits? (Similar to many
> other sites' APIs.)

I'm not inherently opposed to the idea, but I think we do want to encourage people to use the Treeherder API to mine data and create dashboards, and I'm a bit concerned that making them jump through hoops might discourage this. Also, using the API is one of the easiest routes to populating a local server with performance data (which we might want to expand later to encompass jobs and result sets). Can we start by setting some kind of high (but reasonable) limit to make sure people don't DDoS our server, and then tighten it if necessary?
That's exactly what has been proposed. Note that the solution in bug 1160111 will eventually be self-serve, so even then it won't be a big hoop.
(In reply to Ed Morley [:emorley] from comment #4)
> That's exactly what has been proposed.

In that: bug 1160111 might not be ready for a while (and presumably won't have full self-serve initially), and we're not going to impose a low limit here whilst that's the case, but we could lower it in the future. Once it is self-serve, that seems a small price to pay to avoid recurrences of today's tree closure caused by one IP hammering us.
Sounds like we need to use AnonRateThrottle: http://www.django-rest-framework.org/api-guide/throttling/

"The AnonRateThrottle will only ever throttle unauthenticated users. The IP address of the incoming request is used to generate a unique key to throttle against."
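For reference, a minimal sketch of what enabling AnonRateThrottle could look like in the Django settings; the '2000/hour' figure is a placeholder for illustration, not an agreed limit:

```python
# settings.py -- minimal sketch; the rate below is a placeholder, not a decided value.
REST_FRAMEWORK = {
    'DEFAULT_THROTTLE_CLASSES': (
        'rest_framework.throttling.AnonRateThrottle',
    ),
    'DEFAULT_THROTTLE_RATES': {
        # AnonRateThrottle keys on the client IP for unauthenticated requests.
        'anon': '2000/hour',
    },
}
```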
Though we'll have to check that in production:
(a) Zeus sets X-Forwarded-For correctly
(b) when Apache proxies the request to gunicorn, X-Forwarded-For is passed through too
...otherwise all requests will appear to django-rest-framework as having come from the same IP, and we'll have some fun and games.
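Related to the X-Forwarded-For question: django-rest-framework has a NUM_PROXIES setting that its throttles use when deriving the client IP. A hedged sketch follows; the proxy count here is a guess and would need to match the real Zeus/Apache/gunicorn chain:

```python
# settings.py -- sketch only; NUM_PROXIES must match the number of trusted
# proxies in front of the app, otherwise the throttle will key on a proxy's
# address instead of the real client IP.
REST_FRAMEWORK = {
    'NUM_PROXIES': 2,  # placeholder value, to be confirmed against production
}
```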
With some of the other recent changes I think this is less urgent. It's also going to end up being more of a can of worms than it appears at first glance, IMO (eg multiple people sharing an IP address using the Treeherder UI but who are not logged in, etc).
Priority: P1 → P3
Blocks: 1289830
Assignee: nobody → emorley
Rough thoughts on how to do this:
* Adjust Hawk credentials so that even when they are not authorised, they can be used for GETs.
* Improve docs for Hawk auth, emphasising that credential use is recommended even when not submitting data. (The Python client already supports their use for GETs etc.)
* Make sure the throttle key handling is working properly for Hawk client_ids (see the sketch after this list).
* Add support to the throttle key handling for a user's Django login too.
* As a fallback, add an IP-based throttle.
* Check that the Zeus/gunicorn/Django settings are correctly passing the IP through to the throttling layer.
* Pick initially *very* lenient throttling defaults.
* Make sure we add lots of logging.
* Add support to Treeherder's UI for detecting when the throttling limit is hit (the limits would be set so high this should never happen, but just in case, eg for shared IP addresses), which would tell users "Too many requests from this IP, please log in to raise your limit" or similar.
* Open PRs against various projects that interact with our API (starting with those that appear high in the "counts by User Agent" analysis of the webhead logs) to get them to use credentials.
* Add extensive docs for all this on Read the Docs + email newsgroups etc.

Even a very high rate limit (eg 20,000 requests per hour per user or IP address) would still have helped with bug 1289830, where 1 million requests were made in 4 hours.
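A rough sketch of the throttle-key handling described above: key on the Hawk client_id when present, fall back to the Django user, and finally to the IP. The attribute carrying the Hawk client_id is a guess here (it depends on the auth backend), and a rate for the 'api_get' scope would need adding to DEFAULT_THROTTLE_RATES:

```python
from rest_framework.throttling import SimpleRateThrottle


class GetRequestThrottle(SimpleRateThrottle):
    """Sketch only: prefer the Hawk client_id, then the Django user,
    then the client IP, as the throttle cache key."""
    scope = 'api_get'  # needs a matching entry in DEFAULT_THROTTLE_RATES

    def get_cache_key(self, request, view):
        # The attribute holding the Hawk client_id is hypothetical here; it
        # depends on how the Hawk authentication backend populates request.auth.
        hawk_client_id = getattr(getattr(request, 'auth', None), 'client_id', None)
        if hawk_client_id:
            ident = 'hawk:%s' % hawk_client_id
        elif request.user and request.user.is_authenticated:
            ident = 'user:%s' % request.user.pk
        else:
            # Fall back to the client IP (honours NUM_PROXIES / X-Forwarded-For).
            ident = 'ip:%s' % self.get_ident(request)
        return self.cache_format % {'scope': self.scope, 'ident': ident}
```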
Adding an additional limit, based on User-Agent, will help with the ActiveData-ETL case. ActiveData's ETL machines are constantly changing IPs because they are spot instances.
(In reply to Ed Morley [:emorley] from comment #9)
> Rough thoughts on how to do this:

Also:
* Make sure responses have things like the `Retry-After` header set, so clients know when to try again.
* Make sure the Python client (and optionally the Node client) treat HTTP 429 as an indication that they should back off (see the sketch below).
* See if we can add "remaining API request quota" counts to the credentials management page.

An interesting alternative throttling strategy we could use in the future:
http://jsatt.com/blog/abusing-django-rest-framework-part-2-non-rate-based-throttling/

(In reply to Kyle Lahnakoski [:ekyle] from comment #10)
> Adding an additional limit, based on User-Agent, will help with the
> ActiveData-ETL case. ActiveData's ETL machines are constantly changing IPs
> because they are spot instances.

The problem is that User Agents can be spoofed. In the ActiveData-ETL case, we'd just get you to use Hawk credentials for all requests, and then your user id would give you your own limit :-)
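A hypothetical client-side sketch of the back-off behaviour mentioned above: on an HTTP 429 response, wait for the Retry-After interval (which DRF's throttling sets, in seconds) and then retry. The function name and retry count are illustrative only:

```python
import time

import requests


def get_with_backoff(url, max_retries=5):
    """Sketch: retry GETs that are throttled with HTTP 429, honouring Retry-After."""
    for _ in range(max_retries):
        response = requests.get(url)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # DRF sets Retry-After (in seconds) on throttled responses.
        time.sleep(int(response.headers.get('Retry-After', 60)))
    raise RuntimeError('Still throttled after %d attempts' % max_retries)
```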
> The problem is that User Agents can be spoofed.

ie a malicious party could:
a) use a random user agent each time, thereby working around any limit completely
b) spoof eg the ActiveData-ETL user agent and use up your limit, causing your legitimate requests to fail
My suggestion for User-Agent limiting was to resolve the problem of friendly agents making too many requests. Limiting malicious parties seems harder, and does not seem to be an existential problem.
Assignee: emorley → nobody
Let's not worry about this for now.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX