Closed Bug 1392457 Opened 7 years ago Closed 7 years ago

Process list views as background jobs w/ auto-loading

Categories

(developer.mozilla.org Graveyard :: Performance, enhancement)

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: jwhitlock, Unassigned)

Details

(Keywords: in-triage, Whiteboard: [specification][type:change])

What feature should be changed? Please provide the URL of the feature if possible.
==================================================================================
Lists of documents are part of a few workflows. These should be split into a background job that gathers the data, and a front-end display that displays the data or says it is being processed.

What problems would this solve?
===============================
Non-compliant scrapers often find paginated lists, which have a pagination widget for requesting different pages. The scrapers then request several lists at once. This makes the database unavailable for other queries, increasing the request queue and leading to downtime.

Who would use this?
===================
"Regular" visitors and non-compliant scrapers

What would users see?
=====================
Visitors will see a message like "Gathering documents, please refresh in a few seconds.". With JS, the page will refresh when data is available. Visitors may get data that is a little out of date. Logged-in users can force-refresh to get the latest data.

Scrapers will get a lot of "Gathering documents, please refresh in a few seconds." pages.

What would users do? What would happen as a result?
===================================================
Visitors will be able to access document lists.
Scrapers will be able to crawl the site, without getting their IP blocked or making the site unavailable.

Is there anything else we should know?
======================================
Scraper issue first reported in confidential bug 1388492, which was handled with a manual and temporary ban of the IP.
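The split described above (a background job that gathers the data, and a view that either shows the data or the "Gathering documents" message) could be sketched roughly as follows. This is a minimal illustration, not Kuma code: `BackgroundListCache`, `get`, `wait`, and the in-memory dictionaries are hypothetical names, and a thread stands in for whatever task queue and cache a real deployment would use.

```python
import threading

PLACEHOLDER = "Gathering documents, please refresh in a few seconds."


class BackgroundListCache:
    """Hypothetical in-memory stand-in for a task queue plus a result cache."""

    def __init__(self, gather):
        self._gather = gather   # slow callable: key -> list of documents
        self._results = {}      # key -> most recently gathered list
        self._pending = {}      # key -> worker thread currently gathering
        self._lock = threading.Lock()

    def get(self, key, force_refresh=False):
        """Return (ready, payload); the slow query never runs in the caller."""
        with self._lock:
            if key in self._results and not force_refresh:
                # Possibly slightly stale data, served without touching the DB.
                return True, self._results[key]
            if key not in self._pending:
                # Start at most one gathering job per key, however many
                # requests arrive while it is running.
                worker = threading.Thread(
                    target=self._run, args=(key,), daemon=True
                )
                self._pending[key] = worker
                worker.start()
        return False, PLACEHOLDER

    def _run(self, key):
        data = self._gather(key)
        with self._lock:
            self._results[key] = data
            self._pending.pop(key, None)

    def wait(self, key, timeout=5.0):
        """Block until the job for key finishes (useful in tests)."""
        with self._lock:
            worker = self._pending.get(key)
        if worker is not None:
            worker.join(timeout)
```

Under these assumptions, a scraper that requests the same list many times in a row triggers only one gathering job and otherwise receives the placeholder, while `force_refresh=True` models the logged-in user's forced refresh.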
Severity: normal → major
Keywords: in-triage
Commits pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/b27c1ecd5eb505484600be7aab0cf8eaff8eeffd
bug 1392457: Add Document manager tests

https://github.com/mozilla/kuma/commit/9317800966b98923140f0815700e88c399ace2a2
bug 1392457: Use one method for counting documents

The paginator's count method saves the result, meaning there is one rather than two COUNT(*) database queries.

https://github.com/mozilla/kuma/commit/85a0cab446c9f4203d551640490f9f7762185314
bug 1392457: Only return data needed for display

The Document model has gained several pre-processed fields over the years. Rather than a blacklist, use a whitelist of the fields needed by the views that call this method (sitemaps, filtered lists of documents).

https://github.com/mozilla/kuma/commit/602b3ba9e831bc588fdae0345e1c23ac942b9c85
bug 1392457: Test that Project_talk: is excluded

https://github.com/mozilla/kuma/commit/14e1ab43cb3ef0ed00c5c4d75e14f480af07838c
bug 1392457: Combine filter for revision flags

Combine the logic for the revision review and localization flags into an implementation method _filter_by_revision_flag.

https://github.com/mozilla/kuma/commit/f0ae239bb72216b01bf112e2f647606c80f3d15d
Merge pull request #4463 from jwhitlock/background-list-1392457

bug 1392457, 1274874: Improve document list views
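The "one rather than two COUNT(*) queries" point in the commit messages above can be illustrated with a small sketch. It is not the Kuma or Django source: `CountingQuerySet` and `SimplePaginator` are hypothetical stand-ins, and `functools.cached_property` (Python 3.8+) is used here only to show the idea of a paginator remembering its count.

```python
from functools import cached_property


class CountingQuerySet:
    """Hypothetical stand-in for a queryset; records each COUNT(*) it runs."""

    def __init__(self, rows):
        self.rows = rows
        self.count_queries = 0

    def count(self):
        self.count_queries += 1   # each call models one COUNT(*) query
        return len(self.rows)


class SimplePaginator:
    """Hypothetical paginator that stores the count after the first lookup."""

    def __init__(self, object_list, per_page):
        self.object_list = object_list
        self.per_page = per_page

    @cached_property
    def count(self):
        # Runs once; later reads reuse the stored value.
        return self.object_list.count()

    @property
    def num_pages(self):
        # Ceiling division, with at least one (possibly empty) page.
        return max(1, -(-self.count // self.per_page))
```

If a view reads the total from the paginator instead of calling `count()` on the queryset a second time, the stand-in queryset above records a single count query no matter how often `count` or `num_pages` is read.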
The views are now kinder to the database. These views have not been associated with any recent downtime events, so further changes are on hold.
Testing my luck worked. A scraper that started requesting the document-by-tag pages was associated with a downtime event today. I'll start work on moving the processing to the backend.
Assignee: nobody → jwhitlock
Status: NEW → ASSIGNED
Rate limiting (bug 1423738) seems to have protected these pages from being involved in further downtime incidents. We'll save the big architectural changes for later as we deal with higher priority stuff.
Assignee: jwhitlock → nobody
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Product: developer.mozilla.org → developer.mozilla.org Graveyard