Closed
Bug 629211
Opened 14 years ago
Closed 13 years ago
Need Data Ageing Policy for Socorro
Categories
(Socorro :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
2.1
People
(Reporter: jberkus, Assigned: laura)
Before we can go "full throttle" on Socorro, or indeed before we can last another 6 months, we need to have a policy on "data ageing", i.e. what data we keep for how long, and what data we get rid of. This is not a single number, but a set of numbers across the various different kinds of data we keep. At the least, we need to decide how long we keep:
1. Base-level data
a. raw dumps on HBase (Daniel suggests 6 months)
b. processed dumps on HBase
c. processed reports in PostgreSQL
2. Summary Data
a. hourly summary and counts (TCBS etc.)
b. daily summary and counts (would be a rollup of (a))
c. monthly/per product summary and counts (would be a rollup of (a))
Of the two, expiring (1) is far more important than expiring (2), which is comparatively much smaller. However, both kinds of data need to expire eventually and be purged, so let's set a policy and automate it.
Expiration times could be based on the calendar, or based on product release dates.
Reporter
Comment 1•14 years ago
Oh, I forgot:
3. Other data
a. e-mail campaigns
b. raw_adu
c. probably other stuff I'm not thinking about right now ...
Comment 2•14 years ago
Does this need to be global or per product, i.e. have one policy for Ff and a different one for products with fewer users (Camino, SeaMonkey and Thunderbird)?
Assignee
Comment 3•14 years ago
(In reply to comment #2)
> Does this need to be global or per product, i.e. have one policy for Ff and a
> different one for products with fewer users (Camino, SeaMonkey and Thunderbird)?
Right now everything but Fx is a drop in the data bucket, so I'm less concerned about those. If you have input, though, Ludo, please let us know.
Reporter
Comment 4•14 years ago
Data sizes:
Currently the PG database is ~330GB in size. With 10% throttling of the major release versions of FF, that grows at about 12GB per week. This means that if we were concerned about disk space on master01 alone, we could allow data to persist for about another 30 weeks before we started running out of disk space.
HOWEVER, there are other considerations:
1) We currently make full data copies for the relay server and devDB. These servers have less disk space, and in fact only have room for another 8 weeks of data.
2) The size of the PG database adds to the following:
a) the amount of time required to resync if the relay server gets out of sync
b) the amount of time required to make archival backups, should we start doing so
c) the amount of time required to fail back, if required
Therefore, within the next 8 weeks, we either need to increase the amount of disk space available to relayDB and devDB, or we need to start purging data.
See also bug 635098
Reporter
Comment 5•14 years ago
Oh, other data:
* We currently have 41 weeks of data on the Postgres database.
* Currently devDB and relayDB have 500GB each available, and master01 has 830GB.
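A rough sketch of the disk-runway arithmetic behind estimates like these, in Python. It assumes the 830GB and 500GB figures are total capacity and that growth stays at 12GB per week; the function name and the `reserve_gb` safety margin are hypothetical. The thread does not show its own calculation, so these naive results need not match the 30-week and 8-week estimates in comment 4.

```python
def weeks_until_full(capacity_gb, used_gb, growth_gb_per_week, reserve_gb=0.0):
    """Estimate how many weeks of growth fit in the remaining disk space.

    reserve_gb is a hypothetical safety margin kept free; it is not a
    figure taken from the thread.
    """
    if growth_gb_per_week <= 0:
        raise ValueError("growth_gb_per_week must be positive")
    free_gb = capacity_gb - used_gb - reserve_gb
    # A disk that is already over budget has zero weeks left, not negative.
    return max(free_gb, 0.0) / growth_gb_per_week


# Figures quoted in the thread, read here as total capacity (an assumption).
DB_SIZE_GB = 330
GROWTH_GB_PER_WEEK = 12

master01_weeks = weeks_until_full(830, DB_SIZE_GB, GROWTH_GB_PER_WEEK)
relay_weeks = weeks_until_full(500, DB_SIZE_GB, GROWTH_GB_PER_WEEK)
```

Passing a non-zero `reserve_gb`, or a higher growth rate for unthrottled FF4 crashes, shortens the runway accordingly.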
Comment 6•14 years ago
Josh, out of these 330GB, can we break the numbers down by version?
I.e. how much do 3.5 and earlier represent, how much does 3.6 represent, etc.?
Reporter
Comment 7•14 years ago
Clarification per e-mail:
The 30 weeks and 8 weeks projections were based on the idea that we start throttling FF4 crashes within the next couple of days.
If we continue FF4 at 100%, we only have 3-4 weeks on RelayDB and DevDB.
Assignee
Updated•14 years ago
Assignee: nobody → laura
Reporter
Comment 8•14 years ago
Laura,
Update on this?
Reporter
Comment 9•14 years ago
Ludovic,
I don't currently have a machine where I can run a query which will answer that question, since it would involve scanning the entire database. Unfortunately, DevDB is far too slow, and I can't run such a report on prod. Possibly StageDB will become available for this purpose sometime soon.
Assignee
Comment 10•14 years ago
Waiting on data reconciliation from Lars.
Assignee
Comment 11•14 years ago
PS this isn't a code bug so it doesn't block 1.7.8 freeze.
Lars will have the reconciliation finished by the end of this week. We'll give that data to CrashKill next week, and they'll get back to us with comments.
Assignee
Updated•14 years ago
Target Milestone: 1.7.8 → 2.0
Assignee
Updated•13 years ago
Target Milestone: 2.0 → 2.1
Assignee
Comment 12•13 years ago
Policy drafted and sent to the CrashKill team.
Assignee
Comment 13•13 years ago
Got signoff on policy, here it is:
Keep indefinitely:
- Crashes/ADU (ideally put this into some kind of metrics DB, Pentaho or whatever)
- Data on crash-analysis (revisit as needed)
Raw crashes: Delete after 6 months
Processed crashes:
- Nightly and Aurora crashes - 3 months
- Beta crashes - 6 months
- Release crashes - 12 months
If non-Firefox projects have longer needs (please let me know) that may be okay, depending on how hard that is to implement. I believe for now we're going to delete everything over a year old until we get some infrastructure in place for the granularity we want.
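As an illustration only, the policy above could be encoded as a small lookup table plus an expiry check. This is a minimal Python sketch, not Socorro's actual implementation: the names (`RAW_CRASH_RETENTION_DAYS`, `PROCESSED_RETENTION_DAYS`, `is_expired`) are hypothetical, and treating a month as 30 days and 12 months as 365 days is an assumption.

```python
from datetime import date

# Retention periods from the signed-off policy, converted to days.
# Assumption: 1 month = 30 days, 12 months = 365 days.
RAW_CRASH_RETENTION_DAYS = 180          # raw crashes: 6 months
PROCESSED_RETENTION_DAYS = {
    "nightly": 90,                      # 3 months
    "aurora": 90,                       # 3 months
    "beta": 180,                        # 6 months
    "release": 365,                     # 12 months
}
INTERIM_MAX_AGE_DAYS = 365              # interim rule: drop everything over a year old


def is_expired(kind, channel, crash_date, today, interim=False):
    """Return True if a crash record is older than its retention period.

    kind is "raw" or "processed"; channel is only used for processed crashes.
    With interim=True, the blanket one-year rule applies instead.
    """
    age_days = (today - crash_date).days
    if interim:
        return age_days > INTERIM_MAX_AGE_DAYS
    if kind == "raw":
        return age_days > RAW_CRASH_RETENTION_DAYS
    if kind == "processed":
        return age_days > PROCESSED_RETENTION_DAYS[channel]
    raise ValueError(f"unknown kind: {kind!r}")


# Example: a Nightly processed crash from two months ago is still retained.
still_kept = not is_expired("processed", "nightly", date(2011, 6, 1), date(2011, 8, 1))
```

Crashes/ADU and crash-analysis data are kept indefinitely under the policy, so they are deliberately absent from the table.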
Comment 14•13 years ago
This sounds reasonable to me for now.
Updated•13 years ago
Component: Socorro → General
Product: Webtools → Socorro
Assignee
Comment 15•13 years ago
Calling this done.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED