Closed Bug 1203556 Opened 9 years ago Closed 7 years ago

Implement a GET rate limit for Treeherder's API

Categories

(Tree Management :: Treeherder: API, defect, P3)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: emorley, Unassigned)

References

Details

We could always set quite a low limit (not so low that the UI hits it, but low for tooling) and then allow people to apply for read-only API keys, as part of bug 1160111, that would give them higher limits? (Similar to many other sites' APIs.)
+1 to your proposal in comment 1!
(In reply to Ed Morley [:emorley] from comment #1)
> We could always set quite a low limit (not so low that the UI hits it, but
> low for tooling) and then allow people to apply for read-only API keys, as
> part of bug 1160111, that would give them higher limits? (Similar to many
> other sites' APIs.)

I'm not inherently opposed to the idea, but I think we do want to encourage people to use the Treeherder API to mine data and create dashboards, and I'm a bit concerned that making them jump through hoops might discourage this. Also, using the API is one of the easiest routes to populating a local server with performance data (which we might want to expand later to encompass jobs and result sets). Can we start by setting some kind of high (but reasonable) limit to make sure people don't DDoS our server, and then tighten it if necessary?
That's exactly what has been proposed. Note that the solution in bug 1160111 will eventually be self-serve, so even then it won't be a big hoop.
(In reply to Ed Morley [:emorley] from comment #4)
> That's exactly what has been proposed.

In that: bug 1160111 might not be ready for a while (and presumably won't have full self-serve initially), and we're not going to impose a low limit here whilst that's the case, but we could lower it in the future. Once it is self-serve, that seems a small price to pay to avoid recurrences of today's tree closure caused by one IP hammering us.
Sounds like we need to use AnonRateThrottle: http://www.django-rest-framework.org/api-guide/throttling/

"The AnonRateThrottle will only ever throttle unauthenticated users. The IP address of the incoming request is used to generate a unique key to throttle against."
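For reference, a minimal sketch of what enabling AnonRateThrottle could look like in the Django settings; the '2000/hour' figure is a placeholder for illustration, not an agreed limit:

```python
# settings.py -- minimal sketch; the rate below is a placeholder, not a decided value.
REST_FRAMEWORK = {
    'DEFAULT_THROTTLE_CLASSES': (
        'rest_framework.throttling.AnonRateThrottle',
    ),
    'DEFAULT_THROTTLE_RATES': {
        # AnonRateThrottle keys on the client IP for unauthenticated requests.
        'anon': '2000/hour',
    },
}
```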
Though we'll have to check that in production:
(a) Zeus sets X-Forwarded-For correctly
(b) when Apache proxies the request to gunicorn, X-Forwarded-For is passed through too
...otherwise all requests will appear to django-rest-framework as having come from the same IP, and we'll have some fun and games.
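Related to the X-Forwarded-For question: django-rest-framework has a NUM_PROXIES setting that its throttles use when deriving the client IP. A hedged sketch follows; the proxy count here is a guess and would need to match the real Zeus/Apache/gunicorn chain:

```python
# settings.py -- sketch only; NUM_PROXIES must match the number of trusted
# proxies in front of the app, otherwise the throttle will key on a proxy's
# address instead of the real client IP.
REST_FRAMEWORK = {
    'NUM_PROXIES': 2,  # placeholder value, to be confirmed against production
}
```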
With some of the other recent changes I think this is less urgent. It's also going to end up being more of a can of worms than it appears at first glance, IMO (eg multiple people sharing an IP address using the Treeherder UI but who are not logged in, etc).
Priority: P1 → P3
Blocks: 1289830
Assignee: nobody → emorley
Rough thoughts on how to do this:
* Adjust Hawk credentials so that even when they are not authorised, they can be used for GETs.
* Improve docs for Hawk auth, emphasising that credential use is recommended even when not submitting data. (The Python client already supports their use for GETs etc.)
* Make sure the throttle key handling is working properly for Hawk client_ids (see the sketch after this list).
* Add support to the throttle key handling for a user's Django login too.
* As a fallback, add an IP-based throttle.
* Check that the Zeus/gunicorn/Django settings are correctly passing the IP through to the throttling layer.
* Pick initially *very* lenient throttling defaults.
* Make sure we add lots of logging.
* Add support to Treeherder's UI for detecting when the throttling limit is hit (the limits would be set so high this should never happen, but just in case, eg for shared IP addresses), which would tell users "Too many requests from this IP, please log in to raise your limit" or similar.
* Open PRs against various projects that interact with our API (starting with those that appear high in the "counts by User Agent" analysis of the webhead logs) to get them to use credentials.
* Add extensive docs for all this on Read the Docs + email newsgroups etc.

Even a very high rate limit (eg 20,000 requests per hour per user or IP address) would still have helped with bug 1289830, where 1 million requests were made in 4 hours.
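A rough sketch of the throttle-key handling described above: key on the Hawk client_id when present, fall back to the Django user, and finally to the IP. The attribute carrying the Hawk client_id is a guess here (it depends on the auth backend), and a rate for the 'api_get' scope would need adding to DEFAULT_THROTTLE_RATES:

```python
from rest_framework.throttling import SimpleRateThrottle


class GetRequestThrottle(SimpleRateThrottle):
    """Sketch only: prefer the Hawk client_id, then the Django user,
    then the client IP, as the throttle cache key."""
    scope = 'api_get'  # needs a matching entry in DEFAULT_THROTTLE_RATES

    def get_cache_key(self, request, view):
        # The attribute holding the Hawk client_id is hypothetical here; it
        # depends on how the Hawk authentication backend populates request.auth.
        hawk_client_id = getattr(getattr(request, 'auth', None), 'client_id', None)
        if hawk_client_id:
            ident = 'hawk:%s' % hawk_client_id
        elif request.user and request.user.is_authenticated:
            ident = 'user:%s' % request.user.pk
        else:
            # Fall back to the client IP (honours NUM_PROXIES / X-Forwarded-For).
            ident = 'ip:%s' % self.get_ident(request)
        return self.cache_format % {'scope': self.scope, 'ident': ident}
```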
Adding an additional limit, based on User-Agent, will help with the ActiveData-ETL case. ActiveData's ETL machines are constantly changing IPs because they are spot instances.
(In reply to Ed Morley [:emorley] from comment #9)
> Rough thoughts on how to do this:

Also:
* Make sure responses have things like the `Retry-After` header set, so clients know when to try again.
* Make sure the Python client (and optionally the Node client) treat HTTP 429 as an indication that they should back off (see the sketch below).
* See if we can add "remaining API request quota" counts to the credentials management page.

An interesting alternative throttling strategy we could use in the future:
http://jsatt.com/blog/abusing-django-rest-framework-part-2-non-rate-based-throttling/

(In reply to Kyle Lahnakoski [:ekyle] from comment #10)
> Adding an additional limit, based on User-Agent, will help with the
> ActiveData-ETL case. ActiveData's ETL machines are constantly changing IPs
> because they are spot instances.

The problem is that User Agents can be spoofed. In the ActiveData-ETL case, we'd just get you to use Hawk credentials for all requests, and then your user id would give you your own limit :-)
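A hypothetical client-side sketch of the back-off behaviour mentioned above: on an HTTP 429 response, wait for the Retry-After interval (which DRF's throttling sets, in seconds) and then retry. The function name and retry count are illustrative only:

```python
import time

import requests


def get_with_backoff(url, max_retries=5):
    """Sketch: retry GETs that are throttled with HTTP 429, honouring Retry-After."""
    for _ in range(max_retries):
        response = requests.get(url)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # DRF sets Retry-After (in seconds) on throttled responses.
        time.sleep(int(response.headers.get('Retry-After', 60)))
    raise RuntimeError('Still throttled after %d attempts' % max_retries)
```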
> The problem is that User Agents can be spoofed.

ie a malicious party could:
a) use a random user agent each time, thereby working around any limit completely
b) spoof eg the ActiveData-ETL user agent and use up your limit, causing your legitimate requests to fail
My suggestion for User-Agent limiting was to resolve the problem of friendly agents making too many requests. Limiting malicious parties seems harder, and does not seem to be an existential problem.
Assignee: emorley → nobody
Let's not worry about this for now.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX