Closed Bug 1536147 Opened 6 years ago Closed 6 years ago

make archivescraper faster

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Details

Attachments

(1 file)

pr 4849: bug 1536147: make archivescraper faster (deleted), text/x-github-pull-request, attached 6 years ago by Will Kahn-Greene [:willkg]

Will Kahn-Greene [:willkg]

Assignee

Description

6 years ago

I wrote archivescraper to scrape version information for betaversion lookups. It's based on ftpscraper and when I wrote it, I wanted to stay as close to ftpscraper as I could so as to make the jump from one to the other as small as possible.

archivescraper takes a while to run. In a fresh local dev environment, it can take 20+ minutes. I run it at least once a week, so it's an irritating time sink.

Relatedly, I wrote a verifyprocessed job. It uses multiprocessing to significantly reduce the time it takes to run.

archivescraper has similar properties--the bulk of the time it takes to run is traversing links on a website which is predominantly slow HTTP conversations. That's pretty ideal for multiprocessing with lots of workers.
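A minimal sketch of that idea, not the code from the PR: walk the links one level at a time and hand each level's URLs to a pool of workers, so many slow HTTP conversations are in flight at once. It uses the thread-backed pool from the multiprocessing package (`multiprocessing.pool.ThreadPool`), which has the same `map` interface as `multiprocessing.Pool`; this bug doesn't say which one the real job uses. `crawl`, `fetch_links`, `fake_fetch`, and the `/root/` paths are all made up for illustration.

```python
from multiprocessing.pool import ThreadPool


def crawl(start_urls, fetch_links, max_depth=3, workers=20):
    """Breadth-first link traversal, fetching each level in parallel.

    fetch_links(url) must return a list of child URLs. It is a hypothetical
    callable supplied by the caller; in a real scraper it would do the
    HTTP request and parse the links out of the response.
    """
    seen = set(start_urls)
    frontier = list(start_urls)
    with ThreadPool(workers) as pool:
        for _ in range(max_depth):
            if not frontier:
                break
            # Every URL in the current level is fetched by some worker;
            # map() blocks until the whole level is done.
            results = pool.map(fetch_links, frontier)
            next_frontier = []
            for children in results:
                for child in children:
                    if child not in seen:
                        seen.add(child)
                        next_frontier.append(child)
            frontier = next_frontier
    return seen


# Stand-in for a website so the sketch runs without network access.
FAKE_SITE = {
    "/root/": ["/root/a/", "/root/b/"],
    "/root/a/": ["/root/a/1/", "/root/a/2/"],
    "/root/b/": ["/root/a/1/"],  # duplicate link, visited only once
}


def fake_fetch(url):
    return FAKE_SITE.get(url, [])


all_urls = crawl(["/root/"], fake_fetch)
```

Because the work is dominated by waiting on the network, the worker count can be much higher than the number of CPU cores; the default of 20 here is an arbitrary choice, not a measured one.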

This bug covers taking what I did with verifyprocessed and applying it to archivescraper.

Will Kahn-Greene [:willkg]

Assignee

Comment 1

6 years ago

The last fresh run I did took 40 minutes.

Will Kahn-Greene [:willkg]

Assignee

Comment 2

6 years ago

Grabbing this to tinker with today. I think it's straightforward except for error handling and reporting. That's a bit trickier.
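One common way to deal with the error handling part, sketched here as an assumption and not as what the PR does: have each worker catch its own exception and return it alongside the input, so one bad page doesn't take down the whole run and the parent can report all failures at the end. `run_all`, `_safe_call`, and `flaky` are hypothetical names, and the pool is again `multiprocessing.pool.ThreadPool`.

```python
from multiprocessing.pool import ThreadPool


def _safe_call(args):
    """Run work(item) and return (item, result, error) instead of raising."""
    work, item = args
    try:
        return item, work(item), None
    except Exception as exc:
        # Keep a printable description so the parent can log it later.
        return item, None, repr(exc)


def run_all(items, work, workers=4):
    """Apply work to every item in parallel; collect successes and failures."""
    successes, failures = {}, {}
    with ThreadPool(workers) as pool:
        jobs = [(work, item) for item in items]
        # imap_unordered yields results as workers finish them.
        for item, result, error in pool.imap_unordered(_safe_call, jobs):
            if error is None:
                successes[item] = result
            else:
                failures[item] = error
    return successes, failures


def flaky(n):
    # Hypothetical unit of work that fails for one input.
    if n == 3:
        raise ValueError("boom")
    return n * 2


ok, bad = run_all([1, 2, 3, 4], flaky)
```

After the run, `bad` maps each failed input to a description of its exception, which the job can log or use to decide whether to exit non-zero.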

Assignee: nobody → willkg

Status: NEW → ASSIGNED

Priority: -- → P2

Will Kahn-Greene [:willkg]

Assignee

Comment 3

6 years ago

Attached file pr 4849: bug 1536147: make archivescraper faster (deleted)

Will Kahn-Greene [:willkg]

Assignee

Comment 4

6 years ago

willkg merged PR #4849: "bug 1536147: make archivescraper faster" in d8d1136.

Will Kahn-Greene [:willkg]

Assignee

Comment 5

6 years ago

This has been running on stage for a while and it's significantly faster. Yay!

We just pushed this to prod. Marking as FIXED.

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

Resolution: --- → FIXED


Categories

(Socorro :: General, task, P2)
