Closed
Bug 926640
Opened 11 years ago
Closed 10 years ago
[tracking] Improve API performance
Categories
(Marketplace Graveyard :: API, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: chuck, Unassigned)
References
Details
(Keywords: perf)
Attachments
(3 files, 2 obsolete files)
In the weekly performance report, New Relic indicated some potential performance problems in Marketplace production. Despite no notable differences in traffic, the Apdex score dipped to its 12-week low, largely on the back of a 4.4% "Frustrated" rate, several times higher than it has been in recent weeks.
Full SLA report (login required):
https://rpm.newrelic.com/accounts/315282/applications/2914756/optimize/sla_report
We should investigate possible causes of this spike. GA and Nagios might be good places to start.
Screenshot of email attached.
For reference, Apdex calculation methodology: https://docs.newrelic.com/docs/site/apdex-measuring-user-satisfaction
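For quick reference, that methodology boils down to Apdex = (satisfied + tolerating/2) / total samples. A tiny illustration with made-up numbers (not taken from our SLA report):

def apdex(satisfied, tolerating, frustrated):
    # Apdex gives full credit for "satisfied" samples, half credit for
    # "tolerating" ones, and none for "frustrated" ones.
    total = satisfied + tolerating + frustrated
    return (satisfied + tolerating / 2.0) / total

# Hypothetical sample: 90% satisfied, 5.6% tolerating, 4.4% frustrated.
print(apdex(900, 56, 44))  # -> 0.928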
Comment 1•11 years ago
I'd bet that it has to do with the filtering stuff. I'm not really sure what we can do. Are there statsd graphs that correspond to this?
Comment 2•11 years ago
(In reply to Matt Basta [:basta] from comment #1)
> I'd bet that it has to do with the filtering stuff. I'm not really sure what
> we can do. Are there statsd graphs that correspond to this?
If by filtering you mean search and buckets (?), I see no indication of that in the graphite charts:
http://dashboard.mktadm.ops.services.phx1.mozilla.com/graphite?site=marketplace&graph=search
Comment 3•11 years ago
I mean collection filtering in the search/featured/ endpoint(s)
Comment 4•11 years ago
It might also be the case that it's the rocketfuel endpoints and a significant portion of our traffic is just curators/admins :-/
Reporter
Comment 5•11 years ago
Reporter
Comment 6•11 years ago
I've spent some time digging into numbers this morning, and it's pretty safe to say that the new filtering features are at least partially responsible for these problems. The view has been consuming almost 70% of our wall clock time (a high number is expected, as it is one of the most-accessed views in Marketplace), and it has the highest average response time of any of our public-facing views by an order of magnitude (see attachment in comment 5).
The complexity of this view will make optimization difficult: we serve different responses based on a large number of variables (carrier, region, category, device capabilities, etc.).
I'd like to improve those numbers, as it is a very high-traffic endpoint that will contribute to first impressions of an important part of the platform.
What are some strategies we can take to improve our response time?
One potential tactic: gather metrics on each of those variables, then precalculate and cache responses for the most common configurations for a short period of time.
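A rough sketch of what that could look like with Django's low-level cache API (the key layout and build_featured_payload() are placeholders, not the actual mkt code):

from django.core.cache import cache

def featured_response(carrier, region, category, device):
    # Hypothetical sketch: key the cached payload on the variables that
    # actually change the response, with a short TTL so curator edits
    # still show up reasonably quickly.
    key = 'api:featured:%s:%s:%s:%s' % (carrier, region, category, device)
    payload = cache.get(key)
    if payload is None:
        # build_featured_payload() stands in for the existing view logic.
        payload = build_featured_payload(carrier, region, category, device)
        cache.set(key, payload, 60 * 5)  # cache for 5 minutes
    return payload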
Comment 7•11 years ago
We could try caching collections/FACs/OSCs independently (not sure if we do this already). Curated collections can't be created per-carrier, so anyone in a particular region will see them. OSCs can only be created for the homepage, so there might be an optimization there.
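If we go the independent-caching route, a sketch of assembling the response from separately cached pieces (names and keys are illustrative, not the real models or managers):

from django.core.cache import cache

def collections_for_request(region, carrier):
    # Hypothetical: curated collections are region-only, so carrier stays
    # out of their key; FACs key on both; OSCs are homepage-only so their
    # key space is tiny. Each piece is cached and expires independently.
    pieces = {}
    for kind, key in (('curated', 'col:curated:%s' % region),
                      ('fac', 'col:fac:%s:%s' % (region, carrier)),
                      ('osc', 'col:osc:%s:%s' % (region, carrier))):
        data = cache.get(key)
        if data is None:
            data = fetch_collections(kind, region, carrier)  # placeholder
            cache.set(key, data, 60 * 10)
        pieces[kind] = data
    return pieces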
Reporter
Comment 8•11 years ago
Some additional data: a breakdown of where time is spent in various segments of the response. It appears that the majority of the time is spent outside of database transactions, so caching at that level probably wouldn't be sufficient.
Comment 9•11 years ago
The 3 types of collections (basic/fac/os) in search/featured are queried, and therefore cached, separately by cache-machine. However, note that if the fallback kicks in, each query made by the fallback is also cached separately, so we may want to add a layer of caching on top of it all (or maybe not).
We should probably gather more metrics to find out exactly what is slow, since, as chuck pointed out, we do quite a lot in that view because of all the variables.
One thing to note is that we didn't create any specific database indexes for the collections; that might be worth looking into. Even though we have a very limited number of collections right now and they are quite small, it might help a little.
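For the index idea, something along these lines on the collection model (the field names are a guess at what the view filters on, not the actual schema; Meta.indexes is the modern spelling, an equivalent raw ALTER TABLE works too):

from django.db import models

class Collection(models.Model):
    # Hypothetical subset of fields; the point is a composite index on the
    # columns the featured/search view filters by.
    region = models.PositiveIntegerField(null=True)
    carrier = models.PositiveIntegerField(null=True)
    category = models.PositiveIntegerField(null=True)

    class Meta:
        indexes = [
            models.Index(fields=['region', 'carrier', 'category']),
        ]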
Updated•11 years ago
Component: General → API
Comment 10•11 years ago
Turning this into a tracking bug for our API performance issues. We are going to open new bugs and make them block this one.
Depends on: 927420
Summary: New Relic indicates performance problems week of 10/3 → [tracking] API performance problems
Reporter
Updated•11 years ago
Priority: -- → P2
Reporter
Comment 11•11 years ago
To follow: attachments containing updated data for the 7-day period ending right now, after the DRF conversion has finished.
Reporter
Comment 12•11 years ago
Attachment #817205 - Attachment is obsolete: true
Reporter
Comment 13•11 years ago
This is, by far, the highest-impact view: 54.7% of processor time is spent processing these requests.
Attachment #817225 - Attachment is obsolete: true
Updated•11 years ago
Summary: [tracking] API performance problems → [tracking] Improve API performance
Updated•11 years ago
Whiteboard: [perf]
Updated•11 years ago
Blocks: tarako-marketplace
Updated•11 years ago
No longer blocks: tarako-marketplace
Updated•10 years ago
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED