Closed
Bug 1088329
Opened 10 years ago
Closed 10 years ago
Investigate bottleneck in homepage feed API endpoint
Categories
(Marketplace Graveyard :: API, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: cvan, Unassigned)
References
()
Details
(Keywords: perf)
Attachments
(2 files)
This bug was filed in response to bug 1085739 (bug 1085739 comment 7, specifically). This bug is to do profiling to figure out what our the bottlenecks in returning a speedy response. My hope is that we can discover exactly where all of the computing time is spent in the client->server->client request lifecycle.
I'm excited!
(By Christopher Van Wiemeersch [:cvan] from bug 1085739 comment 7)
> We agree we should increase the `max-age`, we already serve the important public API responses from the CDN to avoid hitting Zamboni directly, and we cache all API responses in `localStorage` and refresh them asynchronously - but all these perf tricks are only masking an underlying problem with Zamboni/the webheads. Do we know why hitting Zamboni directly is taking so long? Servers? TLS/SSL? Nginx? Python? Django? Elasticsearch? MySQL? Redis? Which tools can we use to profile this and who can help?
from CDN (with response header of `Cache-Control:must-revalidate, max-age=180`, gets cached on CDN):
https://marketplace.cdn.mozilla.net/api/v2/feed/get/?lang=en-US&limit=10®ion=us&cache=1&vary=0
from Zamboni directly:
https://marketplace.firefox.com/api/v2/feed/get/?lang=en-US&limit=10®ion=us&cache=1&vary=0
Reporter | ||
Comment 1•10 years ago
|
||
New Relic seems like a legit tool to use. Ask oremj for access, if you don't have perms.
Examples:
mkt.fireplace.views:SearchView
https://rpm.newrelic.com/accounts/315282/applications/2914756/transactions?type=app#id=570429503
mkt.feed.views:FeedView
https://rpm.newrelic.com/accounts/315282/applications/2914756/transactions?type=app#id=570429503
https://rpm.newrelic.com/accounts/315282/applications/2914756/databases#id=280041187
https://docs.newrelic.com/docs/apm/transactions-menu/x-ray-sessions/x-ray-sessions (but seems outdated)
Check out the "Thread profiler":
https://rpm.newrelic.com/accounts/315282/applications/2914756/profiles
Comment 2•10 years ago
|
||
So, FeedView from start to finish, including middlewares etc take between 1 and 1.5 seconds. That's already a lot, there is room for improvement.
Interestingly, our Ajax response times for marketplace.cdn.mozilla.net/api/v2/feed/get/ are much worse. This seems to indicate that the CDN is evicting our content from its cache.
Locally:
- Loading the endpoint a couple times it takes between 1.5 and 3 seconds to load
- All subsequent loads take 0.150 to 0.250 ms.
That seems to prove this is a sound theory. I've merged bug 1088362 and bug 1088361 so *next tuesday* we should see some improvements, as this endpoint will get cached for 6 hours.
Updated•10 years ago
|
Priority: -- → P3
Comment 3•10 years ago
|
||
I started looking into this and dropping in some timer decorators around various things to try to get a more fine-grained idea of what inside FeedView is taking so long.
Pretty quickly I discovered that all of FeedView is pretty fast except for these lines here where the majority of time is spent:
https://github.com/mozilla/zamboni/blob/2014.11.04/mkt/feed/views.py#L726-L731
From there I started digging deeper. There are many `to_native` calls, some taking longer than others but nothing that really seemed to take a majority of time. Then I found that this call:
https://github.com/mozilla/zamboni/blob/2014.11.04/mkt/feed/serializers.py#L70-L80
would take up 1/3rd of the time of the view but only the first time it is called. On successive calls it takes 1/10th of its original time. I couldn't figure why the first time takes so long.
There are also a couple thing just reading the code that might be cause for concern:
1. In some edge cases when a feed for a certain region has feed items but those feed items get filtered out, we essentially call the FeedView a 2nd time except with region=RESTOFWORLD. This is to ensure we don't deliver an empty feed. But this also means if the feed took 1s to load and came up empty we've got another 1s to get the RoW feed. This happens here: https://github.com/mozilla/zamboni/blob/2014.11.04/mkt/feed/views.py#L664-L674 We should probably added some logging here to get a sense of how often this might happen (if at all?) https://github.com/mozilla/zamboni/blob/2014.11.04/mkt/feed/views.py#L738
2. The note here about https://github.com/mozilla/zamboni/blob/2014.11.04/mkt/feed/serializers.py#L70-L80 to native getting called twice. In my timing output I saw while `to_native` is pretty fast it's by no means free and cumulatively could add up to a decent amount of time since it gets called a lot. If we could avoid the double call I think it would help a little.
There isn't much actionable here but I think knowing that Elasticsearch isn't the culprit is a good discovery. On -dev loading the region=us feed results in 3 ES queries with the following times as reported by the ES call output: 0.004s, 0.006s, and 0.068s... 0.078s in total. All the time spent was after these 3 ES calls in Python serializing the data into JSON.
Another oddity is that if you hit the API endpoint it seems the first time is always slow, while successive calls are faster.
time curl 'https://marketplace-dev.allizom.org/api/v2/feed/get/?cache=1&lang=en-US&limit=25®ion=us' => 2.339s and a successive call is 0.590s. I'm not sure why this is.
I'm open to exploring more and digging deeper but I wanted to report what I had so far if anyone had ideas of where to go next.
Reporter | ||
Comment 4•10 years ago
|
||
(In reply to Rob Hudson [:robhudson] from comment #3)
> time curl
> 'https://marketplace-dev.allizom.org/api/v2/feed/get/?cache=1&lang=en-
> US&limit=25®ion=us' => 2.339s and a successive call is 0.590s. I'm not
> sure why this is.
Drop the `?cache=1` and you'll notice that both take ~2.339s. This is because passing `?cache=1` causes Zamboni to set this response header, `Cache-Control: must-revalidate, max-age=180`, which caches the response for 3 minutes.
––
Lots of great discoveries. Thanks, Rob! What are the actionable tasks in your opinion?
Comment 5•10 years ago
|
||
?cache=1 shouldn't affect the timings, since it only sets the response header, we are not using the CDN here, and we are using `curl` and not a browser, so there is nothing doing caching ?
Comment 6•10 years ago
|
||
(In reply to Mathieu Pillard [:mat] from comment #5)
> ?cache=1 shouldn't affect the timings, since it only sets the response
> header, we are not using the CDN here, and we are using `curl` and not a
> browser, so there is nothing doing caching ?
Whatever it does, however, does make the times consistently slow when doing cache=0.
(In reply to Christopher Van Wiemeersch [:cvan] from comment #4)
> Lots of great discoveries. Thanks, Rob! What are the actionable tasks in
> your opinion?
I'm not sure exactly. I'd definitely like to see a log so we know how often we run through the full FeedView twice with the 2nd time using region=RESTOFWORLD. Hopefully that never happens.
Other than that I think looking into speeding up the serialization process would be the big win and along the way trying to avoid the double `to_native` call? The labyrinth of serializer classes is a challenge to really see what's going on.
Reporter | ||
Comment 7•10 years ago
|
||
(In reply to Mathieu Pillard [:mat] from comment #5)
> ?cache=1 shouldn't affect the timings, since it only sets the response
> header, we are not using the CDN here, and we are using `curl` and not a
> browser, so there is nothing doing caching ?
Yeah, that's what I thought.
I know that `?cache=1` should add a `Cache-Control: must-revalidate, max-age=180`, but why is it also adding an `Expires` header for 3 minutes in the future?
> Expires: Fri, 31 Oct 2014 22:37:04 GMT
And `?cache=1` is also adding this header:
> X-Cache-Info: cached
https://www.dropbox.com/s/36jsy70eab3shyl/Screenshot%202014-10-31%2015.37.22.png?dl=0
FWIW, I was not able to reproduce this behaviour on prod.
That same endpoint on prod takes the same amount of time regardless of the `?cache=1` query-string parameter:
> % time curl 'https://marketplace.firefox.com/api/v2/feed/get/?lang=en-US&limit=25®ion=us' -I ~
> curl -I 0.04s user 0.01s system 2% cpu 2.031 total
> % time curl 'https://marketplace.firefox.com/api/v2/feed/get/?lang=en-US&limit=25®ion=us&cache=1' -I ~
> curl -I 0.04s user 0.01s system 2% cpu 2.033 total
Same thing on stage.
Y'all think this is a regression from bug 1088362? Or something wonky going on with -dev setup?
Comment 8•10 years ago
|
||
I don't think it's a regression from bug 1088362, I think it's something on dev configuration: X-Cache-Info sounds like something from nginx. I suspect django is not doing any caching, it's just that when we add headers suddently nginx starts caching it.
We should ask ops.
Comment 9•10 years ago
|
||
(In reply to Christopher Van Wiemeersch [:cvan] from comment #7)
> That same endpoint on prod takes the same amount of time regardless of the
> `?cache=1` query-string parameter:
That's not what I'm seeing. I get 2.138 first time, then 0.393 after when passing ?cache=1.
Reporter | ||
Comment 10•10 years ago
|
||
Flags: needinfo?(jthomas)
Comment 11•10 years ago
|
||
On -dev and stage X-Cache-Info is set by the LB. On prod it will say something like 'Via: NS-CACHE-10.1'.
-dev: cache=1 vs no cache
sdiff <(curl https://marketplace-dev.allizom.org/api/v2/feed/get/\?lang\=en-US\&limit\=25\®ion\=us\&cache\=1 -D- -so /dev/null) <(curl https://marketplace-dev.allizom.org/api/v2/feed/get/\?lang\=en-US\&limit\=25\®ion\=us -D- -so /dev/null)
HTTP/1.1 200 OK HTTP/1.1 200 OK
Content-Security-Policy-Report-Only: script-src https://*.new Content-Security-Policy-Report-Only: script-src https://*.new
Server: nginx Server: nginx
Vary: Accept, API-Filter, Accept-Language, Cookie, Accept-Enc Vary: Accept, API-Filter, Accept-Language, Cookie, Accept-Enc
X-Backend-Server: dev1 X-Backend-Server: dev1
Cache-Control: must-revalidate, max-age=180 <
Content-Type: application/json Content-Type: application/json
Access-Control-Expose-Headers: API-Filter, API-Status, API-Ve Access-Control-Expose-Headers: API-Filter, API-Status, API-Ve
Strict-Transport-Security: max-age=31536000 Strict-Transport-Security: max-age=31536000
Date: Tue, 04 Nov 2014 17:47:47 GMT | Date: Tue, 04 Nov 2014 17:49:22 GMT
API-Pinned: False API-Pinned: False
Expires: Tue, 04 Nov 2014 17:50:47 GMT <
Transfer-Encoding: chunked Transfer-Encoding: chunked
Access-Control-Allow-Origin: * Access-Control-Allow-Origin: *
ETag: "692375c51aa4e102e8c23daec64ea8f0" ETag: "692375c51aa4e102e8c23daec64ea8f0"
Via: Moz-zlb10 Via: Moz-zlb10
> Connection: keep-alive
API-Filter: carrier=&lang=en-US&pro=®ion=us API-Filter: carrier=&lang=en-US&pro=®ion=us
Last-Modified: Tue, 04 Nov 2014 17:47:47 GMT <
Allow: GET, HEAD, OPTIONS Allow: GET, HEAD, OPTIONS
Access-Control-Allow-Methods: GET, OPTIONS Access-Control-Allow-Methods: GET, OPTIONS
Access-Control-Allow-Headers: X-HTTP-Method-Override, Content Access-Control-Allow-Headers: X-HTTP-Method-Override, Content
API-Version: 2 API-Version: 2
Connection: Keep-Alive <
X-Cache-Info: cached <
prod:
sdiff <(curl https://marketplace.firefox.com/api/v2/feed/get/\?lang\=en-US\&limit\=25\®ion\=us\&cache\=1 -D- -so /dev/null) <(curl https://marketplace.firefox.com/api/v2/feed/get/\?lang\=en-US\&limit\=25\®ion\=us -D- -so /dev/null)
HTTP/1.1 200 OK HTTP/1.1 200 OK
Age: 3 <
Date: Tue, 04 Nov 2014 17:51:12 GMT <
Expires: Tue, 04 Nov 2014 17:48:06 GMT <
Cache-Control: max-age=180 ,must-revalidate <
Content-Length: 24225 <
Connection: Keep-Alive <
Via: NS-CACHE-10.1: 11 <
ETag: "cd6faa3d6d4b974c89e9209fe554832e" <
Server: nginx Server: nginx
> Date: Tue, 04 Nov 2014 17:51:14 GMT
Content-Type: application/json Content-Type: application/json
> Transfer-Encoding: chunked
> Connection: keep-alive
API-Pinned: False API-Pinned: False
Access-Control-Allow-Methods: GET, OPTIONS Access-Control-Allow-Methods: GET, OPTIONS
API-Filter: carrier=&lang=en-US&pro=®ion=us API-Filter: carrier=&lang=en-US&pro=®ion=us
Access-Control-Expose-Headers: API-Filter, API-Status, API-Ve Access-Control-Expose-Headers: API-Filter, API-Status, API-Ve
Vary: Accept, API-Filter, Accept-Language, Cookie, Accept-Enc Vary: Accept, API-Filter, Accept-Language, Cookie, Accept-Enc
Last-Modified: Tue, 04 Nov 2014 17:45:06 GMT | ETag: "cd6faa3d6d4b974c89e9209fe554832e"
Strict-Transport-Security: max-age=31536000 Strict-Transport-Security: max-age=31536000
Allow: GET, HEAD, OPTIONS Allow: GET, HEAD, OPTIONS
API-Version: 2 API-Version: 2
Access-Control-Allow-Origin: * Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: X-HTTP-Method-Override, Content Access-Control-Allow-Headers: X-HTTP-Method-Override, Content
X-Backend-Server: web2 | X-Backend-Server: web4
Flags: needinfo?(jthomas)
Comment 12•10 years ago
|
||
https://gist.github.com/jasonthomas/b16114fa3ac73f57b494 for a better output.
Comment 13•10 years ago
|
||
As per my conversation with :mat, we aren't really gaining much by having LB cache enabled. This is now disabled on -dev, stage and prod.
Comment 14•10 years ago
|
||
Found some small optimizations committed here:
https://github.com/mozilla/zamboni/commit/06ed8d2
1. The app serializer was using `AppESField` which serializes all fields for an app object but we only need a handful for the feed page.
2. In some cases we could have queried for a list of apps and that list wasn't unique.
Comment 15•10 years ago
|
||
After profiling the serializer call it seems much of the time spent is spent resolving urls. See the attached profile output. We can see what urls this serialization process is resolving and possibly resolve them during indexing and skip the need for this at view time.
Comment 16•10 years ago
|
||
Reporter | ||
Comment 17•10 years ago
|
||
Heh very interesting. On a different Django project I discovered the same thing.
Which tools did you use to profile this?
Comment 18•10 years ago
|
||
Comment 19•10 years ago
|
||
(In reply to Christopher Van Wiemeersch [:cvan] from comment #17)
> Heh very interesting. On a different Django project I discovered the same
> thing.
>
> Which tools did you use to profile this?
I copy/pasted the (attached) profile.py into `./manage.py shell` and it prints out the profile view. I tried to run it directly but zamboni does some patch hacking I couldn't reproduce so this was easier. Hopefully we'll move the path hacking once we finish moving everything out of apps/.
Comment 20•10 years ago
|
||
This commit slims down the app object of collection listing feed elements:
https://github.com/mozilla/zamboni/commit/7108c1a
Updated•10 years ago
|
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•10 years ago
|
Blocks: marketplace-perf
You need to log in
before you can comment on or make changes to this bug.
Description
•