Closed Bug 926640 Opened 11 years ago Closed 10 years ago

[tracking] Improve API performance

Categories

(Marketplace Graveyard :: API, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: chuck, Unassigned)

References

Details

(Keywords: perf)

Attachments

(3 files, 2 obsolete files)

In the weekly performance report, New Relic indicated some potential performance problems in Marketplace production. Despite no notable differences in traffic, the Apdex score dipped to its 12-week low, largely on the back of a 4.4% "Frustrated" rate, several times higher than it has been in recent weeks.

Full SLA report (login required): https://rpm.newrelic.com/accounts/315282/applications/2914756/optimize/sla_report

We should investigate possible causes of this spike. GA and Nagios might be good places to start. Screenshot of email attached.

For reference, Apdex calculation methodology: https://docs.newrelic.com/docs/site/apdex-measuring-user-satisfaction
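For context, the standard Apdex score counts satisfied requests (response time <= T) fully, tolerating requests (<= 4T) at half weight, and frustrated requests (> 4T) not at all. A minimal sketch in Python; the sample counts in the example are made up to show how a 4.4% frustrated rate pulls the score down, and are not taken from the report above.

    def apdex(satisfied, tolerating, frustrated):
        """Apdex = (satisfied + tolerating / 2) / total samples."""
        total = satisfied + tolerating + frustrated
        return (satisfied + tolerating / 2.0) / total if total else 1.0

    # Hypothetical sample: 4.4% frustrated, 6% tolerating, rest satisfied.
    print(apdex(satisfied=896, tolerating=60, frustrated=44))  # ~0.926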
I'd bet that it has to do with the filtering stuff. I'm not really sure what we can do. Are there statsd graphs that correspond to this?
(In reply to Matt Basta [:basta] from comment #1)
> I'd bet that it has to do with the filtering stuff. I'm not really sure what
> we can do. Are there statsd graphs that correspond to this?

If by filtering you mean search and buckets, I see no indication of that in the graphite charts: http://dashboard.mktadm.ops.services.phx1.mozilla.com/graphite?site=marketplace&graph=search
I mean collection filtering in the search/featured/ endpoint(s).
It might also be the case that it's the rocketfuel endpoints and a significant portion of our traffic is just curators/admins :-/
I've spent some time digging into the numbers this morning, and it's pretty safe to say that the new filtering features are at least partially responsible for these problems. The view has been consuming almost 70% of our wall clock time (a high number is expected, as it is one of the most-accessed views in Marketplace), and it has the highest average response time of any of our public-facing views by an order of magnitude (see attachment in comment 5).

The complexity of this view will make optimization difficult: we serve different responses based on a large number of variables: carrier, region, category, device capabilities, etc. I'd like to improve those numbers, as it is a very high-traffic endpoint that will contribute to first impressions of an important part of the platform.

What are some strategies we can take to improve our response time? One potential tack: gather metrics on each of those variables, then precalculate and cache responses for the most common configurations for a short period of time.
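A minimal sketch of the "precalculate and cache per configuration" idea, assuming Django's cache framework (the Marketplace backend is Django); the key layout, TTL, and build_featured_response() helper are hypothetical, not existing code.

    import hashlib

    from django.core.cache import cache

    FEATURED_TTL = 60 * 5  # cache each configuration for a short period


    def featured_cache_key(carrier, region, category, device):
        # One key per combination of the variables mentioned above.
        raw = 'featured:%s:%s:%s:%s' % (carrier, region, category, device)
        return 'api:featured:%s' % hashlib.md5(raw.encode('utf-8')).hexdigest()


    def cached_featured_response(carrier, region, category, device):
        key = featured_cache_key(carrier, region, category, device)
        data = cache.get(key)
        if data is None:
            # Expensive path: run the full filtering logic for this
            # combination and cache the serialized result.
            data = build_featured_response(carrier, region, category, device)
            cache.set(key, data, FEATURED_TTL)
        return data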
We could try caching collections/FACs/OSCs independently (not sure if we do this already). Curated collections can't be created per-carrier, so anyone in a particular region will see them. OSCs can only be created for the homepage, so there might be an optimization there.
Some additional data: a breakdown of where time is spent in various segments of the response. It appears that the majority of the time is spent outside of database transactions, so caching at that level probably wouldn't be sufficient.
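One way to feed this kind of per-segment breakdown into the graphite dashboards mentioned earlier is to wrap each segment of the view in a statsd timer. A rough sketch assuming the Python statsd client; the metric prefix, segment names, and helper functions are hypothetical.

    from statsd import StatsClient

    statsd = StatsClient('localhost', 8125, prefix='marketplace.api.featured')


    def timed_featured_view(request):
        with statsd.timer('parse_params'):
            params = parse_request_params(request)        # hypothetical helper

        with statsd.timer('db_queries'):
            collections = fetch_collections(params)       # hypothetical helper

        with statsd.timer('serialize'):
            payload = serialize_collections(collections)  # hypothetical helper

        return payload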
The 3 types of collections (basic/fac/os) in search/featured are queried, and therefore cached, separately by cache-machine. However, note that if the fallback kicks in, each query made by the fallback is also cached separately, so we may want to add a layer of caching on top of it all (or maybe not).

We should probably gather more metrics to find out what exactly is slow, since, as chuck pointed out, we do quite a lot in that view because of all the variables.

One thing to note is that we didn't create any specific database indexes for the collections. That might be worth looking into; even though we have a very limited number of collections right now, and they are quite small, it might help a little.
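If indexes do turn out to help, a sketch of what they could look like on a hypothetical collection membership model, using era-appropriate Django options (index_together; newer Django would use Meta.indexes). Field and model names are placeholders, not the actual Marketplace schema.

    from django.db import models


    class FeaturedCollectionMembership(models.Model):
        # Placeholder model, not the real schema.
        collection = models.ForeignKey('Collection', on_delete=models.CASCADE)
        region = models.PositiveIntegerField(db_index=True)
        carrier = models.PositiveIntegerField(null=True, db_index=True)
        category = models.ForeignKey('Category', null=True,
                                     on_delete=models.SET_NULL)

        class Meta:
            # Composite index matching the common lookup pattern:
            # filter by region and carrier together.
            index_together = [('region', 'carrier')]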
Component: General → API
Turning this into a tracking bug for our API performance issues. We are going to open new bugs and make them block this one.
Depends on: 927420
Summary: New Relic indicates performance problems week of 10/3 → [tracking] API performance problems
Priority: -- → P2
Depends on: 881926
Depends on: 946061
To follow: attachments containing updated data for the 7-day period ending right now, after the DRF conversion has finished.
Attachment #817205 - Attachment is obsolete: true
This is, by far, the highest-impact view: 54.7% of processor time is spent processing these requests.
Attachment #817225 - Attachment is obsolete: true
Depends on: 956860
Depends on: 956987
Depends on: 958608
Depends on: 912097
Summary: [tracking] API performance problems → [tracking] Improve API performance
Depends on: 961719
Whiteboard: [perf]
Depends on: 963730
Depends on: 973735
Keywords: perf
Whiteboard: [perf]
Removing priority now that it's a tracking bug.
Priority: P2 → --
No longer blocks: tarako-marketplace
Blocks: 992365
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED