Closed Bug 493779 Opened 16 years ago Closed 14 years ago

Breakpad should collect/detect OOM

Categories

(Toolkit :: Crash Reporting, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- -

People

(Reporter: jruderman, Assigned: ted)

References

(Blocks 1 open bug)

Details

(Whiteboard: [crashkill][crashkill-metrics])

Attachments

(1 file, 1 obsolete file)

It would be nice if Breakpad collected information to distinguish OOM crashes from other crashes. With this information, it will be more clear how many users are hitting terrible memory leaks, and crashes due to OOM will be less confusing. My guesses are: 1) virtual memory used by the Firefox process 2) disk space free but I'm probably wrong about what to measure ;)
Fun fact: there were 215 crash reports corresponding to OOM crash bug 502687 in the last week. Some Linux users have ulimits set, which means Breakpad can't determine whether we're in an OOM condition just by looking at the amount of virtual memory used. I wonder if there's another way. FWIW, the xmalloc work might obsolete the need for Breakpad changes.
Jesse: if you can tell me exactly what you'd like to see collected here on all 3 platforms, then I can make it happen.
I'm not really sure :( Maybe we should try measuring virtual memory usage, and try it with a known OOM crash to see if it works reliably. Or maybe dbaron has other ideas.
Whiteboard: [crashkill][crashkill-metrics]
free(1) on Linux seems to give reasonable output for swap so, if the process is still alive, you should be able to read /proc/{pid}/limits Limit Soft Limit Hard Limit Units Max cpu time unlimited unlimited seconds Max file size unlimited unlimited bytes Max data size unlimited unlimited bytes Max stack size 10485760 unlimited bytes Max core file size 0 unlimited bytes Max resident set unlimited unlimited bytes Max processes 16384 16384 processes Max open files 1024 1024 files Max locked memory 32768 32768 bytes Max address space unlimited unlimited bytes Max file locks unlimited unlimited locks Max pending signals 16384 16384 signals Max msgqueue size 819200 819200 bytes Max nice priority 0 0 Max realtime priority 0 0 that handles getting limits on Linux.
on os x, vm_stat(1) seems like Linux's free(1). For Solaris, it sounds like that's swap -l/swap -s. for os x/solaris, it's probably best to just use getrlimit at startup and dump that data in advance, or just rely on the fact that the spawned crash reporter will inherit the limits and have it look them up. For windows, probably: http://msdn.microsoft.com/en-us/library/ms683227(VS.85).aspx GetProcessWorkingSetSizeEx http://msdn.microsoft.com/en-us/library/aa366589(VS.85).aspx GlobalMemoryStatusEx
The thing is (for me) that breakpad does not seem to catch some OOM crashes (such as i reported in bug 590674 ), instead windows 7 just shows the "problem with application" dialog offering me to close it or search for solutions without opening the crash reporter window. So it's possible that they're under-reported.
Yes, it's quite possible. Since we write browser dumps in-process, if we're OOM we might not have enough memory left to write a dump. I filed bug 587729 on making us do all dump generation out of process, which should help this.
(In reply to comment #5) > http://msdn.microsoft.com/en-us/library/aa366589(VS.85).aspx > GlobalMemoryStatusEx This looks pretty useful, the resulting structure has ullTotalVirtual / ullAvailVirtual fields, which I think would be pretty indicative of whether we're crashing due to OOM. (Although if it's VM fragmentation, it might not be as obvious.)
Assignee: nobody → ted.mielczarek
This patch adds three extra annotations to Windows crash reports: SystemMemoryUsePercentage: the percentage of physical memory in use by the system (in bytes) TotalVirtualMemory: the total amount of virtual memory available to the process (in bytes) AvailableVirtualMemory: the current amount of virtual memory available to the process (in bytes) The first should let us estimate if we're hitting swap on the system, and the latter two should let us determine if we're OOMing.
blocking2.0: --- → ?
Attachment #497701 - Flags: review?(benjamin)
blocking2.0: ? → -
Attachment #497701 - Flags: review?(benjamin) → review+
Comment on attachment 497701 [details] [diff] [review] Report some memory information with Windows crash reports Low-risk patch that should give us more info about OOM crashes on Windows.
Attachment #497701 - Flags: approval2.0?
Attachment #497701 - Flags: approval2.0? → approval2.0+
Pushed to m-c: http://hg.mozilla.org/mozilla-central/rev/b84f528e2e10 I'll file a followup on doing something useful on Mac/Linux, but I think most of our problems were on Windows.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Simple patch, just switches to use the alternate constructor for ExceptionHandler that allows passing in the MINIDUMP_TYPE parameter. The other two parameters have to do with out-of-process dump generation, passing them as NULL makes them effectively ignored. The fewer-argument constructor we were previously using would just pass them down as NULL internally anyway.
Attachment #499530 - Flags: review?(timeless)
Comment on attachment 499530 [details] [diff] [review] Report some memory information with Windows crash reports. r+ Oops, user error on my part with bzexport. :)
Attachment #499530 - Attachment is obsolete: true
Attachment #499530 - Flags: review?(timeless)
Comment on attachment 497701 [details] [diff] [review] Report some memory information with Windows crash reports it would be helpful if we could have this on the 1.9.2 branch. this is safe and localized.
Attachment #497701 - Flags: approval1.9.2.15?
Comment on attachment 497701 [details] [diff] [review] Report some memory information with Windows crash reports Approved for 1.9.2.15, a=dveditz
Attachment #497701 - Flags: approval1.9.2.15? → approval1.9.2.15+
Comment on attachment 497701 [details] [diff] [review] Report some memory information with Windows crash reports Didn't make the code-freeze for non-blockers, you can try again next time.
Attachment #497701 - Flags: approval1.9.2.17+ → approval1.9.2.17-
Blocks: 528657
Blocks: 733262
Blocks: 669108
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: