Closed Bug 645704 Opened 14 years ago Closed 8 years ago

rein in memory usage on crash_analysis scripts

Categories

(Socorro :: General, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: rhelmer, Unassigned)

References

Details

The first casualty from Fx4 being set full-throttle (643661) was bug 645530; the crash analysis scripts seem to be using up all available memory and sending sp-admin01 into swap. sp-admin01 has 8GB of RAM, so as a temporary measure jabba moved the cron job (cron_libraries.sh) to sp-processor10, which has 24GB of RAM, and the job seems to complete there.

I took 2-second samples from ps (SIZE) while this was running; here is the peak per-process memory usage during that time period:

pid    size(kb)  cmd
16614  5824204   python /data/crash-data-tools/per-crash-interesting-modules.py -p Firefox -r 4.0 -f /tmp/Firefox_4.0.tar
5912   1288136   python /data/socorro/application/socorro/storage/hbaseClient.py -h socorro-thrift1.zlb.phx1.mozilla.com export_jsonz_tarball_for_ooids /tmp /tmp/Firefox_4.0.tar

We should determine if using this much memory is necessary.
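For anyone wanting to repeat the measurement, here is a minimal sketch of the sampling approach described above: poll ps at a fixed interval and keep the largest SIZE seen per pid. The function names (update_peaks, sample_peaks) are made up for illustration, and the `ps -eo pid,size,args` invocation assumes a Linux procps ps; this is not the script that produced the numbers in this bug.

```python
import subprocess
import time


def update_peaks(peaks, ps_output):
    """Fold one ps sample into peaks, a dict of pid -> (peak size in KB, command)."""
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        # Skip the header row and any malformed lines.
        if len(fields) < 3 or not fields[0].isdigit() or not fields[1].isdigit():
            continue
        pid, size_kb, cmd = int(fields[0]), int(fields[1]), fields[2].rstrip()
        # Keep only the largest size observed so far for each pid.
        if pid not in peaks or size_kb > peaks[pid][0]:
            peaks[pid] = (size_kb, cmd)
    return peaks


def sample_peaks(duration_s, interval_s=2.0):
    """Poll ps every interval_s seconds for duration_s seconds (assumes procps ps)."""
    peaks = {}
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        out = subprocess.run(
            ["ps", "-eo", "pid,size,args"],
            capture_output=True, text=True, check=True,
        ).stdout
        update_peaks(peaks, out)
        time.sleep(interval_s)
    return peaks
```

Note that a 2-second polling interval can miss short-lived spikes, so the peaks found this way are a lower bound on the true peak.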
The other solution to the problem is just to sample a subset of the data for any given release. It seems the script ran fine up to the point where we had about 11 million active daily users (ADUs) on Firefox 4.0 reporting crashes. If we get a release with more ADUs than that, or the volume of any one crash gets out of control, or the number of modules that we are tracking or the number of versions of those modules increases, any of these could lead to high memory use.

Something to think about for all reports as we move to processing 100% of all report submissions would be to reduce the span of data we look at, or to sample from that window, which in this case is 24 hours.
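One way to sample a subset with bounded memory, regardless of how many crashes arrive in the window, is reservoir sampling. The sketch below is only an illustration of that idea, not something the crash analysis scripts do today; reservoir_sample and its parameters are hypothetical names, and the input is assumed to be any iterable of crash records.

```python
import random


def reservoir_sample(items, k, rng=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Uses Algorithm R: memory use is O(k) no matter how long the input is.
    """
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample
```

With this approach the memory ceiling is set by k rather than by the crash volume of the release, at the cost of the correlations being computed from a sample instead of the full data.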
The 11 million unthrottled ADUs is under what we might expect in crash volume from 150 million users throttled at 10%. It's probably something more like trying to process the module correlations for the top crash on 4.0:

25680 crashes per day
signature: mozalloc_abort(char const* const) | NS_DebugBreak_P | nsCycleCollectingAutoRefCnt::decr(nsISupports*)

The bug tracking that signature is bug 633445. 25k crashes per day probably exceeds anything we have ever seen for a single signature by a wide margin.
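A quick check of the arithmetic behind that comparison, under one reading of the comment: that 150 million users throttled at 10% behave like 15 million unthrottled users in terms of crash volume. The variable names are illustrative, and the figures are the ones quoted in this bug.

```python
total_users = 150_000_000      # user base quoted above
throttle_percent = 10          # share of submissions processed when throttled
unthrottled_adus = 11_000_000  # Firefox 4.0 ADUs at the time of the failure

# Unthrottled-equivalent population for the throttled case
# (integer math avoids float rounding).
equivalent_adus = total_users * throttle_percent // 100

# Fraction of the throttled-equivalent volume that Firefox 4.0 represents.
ratio = unthrottled_adus / equivalent_adus

print(f"equivalent ADUs: {equivalent_adus:,}")
print(f"Firefox 4.0 share of that volume: {ratio:.0%}")
```

On that reading the overall volume alone does not explain the memory blowup, which is consistent with the suspicion that a single very high-volume signature is responsible.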
Component: Socorro → General
Product: Webtools → Socorro
We don't support the correlation script on crash-analysis any more.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INVALID