Closed Bug 1001760 Opened 10 years ago Closed 8 years ago

Calculate wasted address space from OOM crash reports

Categories

(Core :: General, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: away, Assigned: away)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(5 files)

jemalloc and JS GC both want 1MB blocks at 1MB alignment. In an OOM crash we might have some "available" virtual memory that cannot be used because the free blocks are too small or misaligned.

I'd like to get a sense of the magnitude of that wasted space. Some amount of fragmentation is unavoidable, so I don't expect the number to be zero, but if it's huge then that might indicate a problem somewhere.
Whiteboard: [MemShrink]
Do you want to be doing this from crash reports, in about:memory, in telemetry, or via some other system?
Flags: needinfo?(dmajor)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #1)
> Do you want to be doing this from crash reports, in about:memory, in
> telemetry, or via some other system?

Initially I just want to sample some crash reports to get a rough idea of the situation. If the results warrant further investigation then we can do something more formal.
Flags: needinfo?(dmajor)
OK. I think you can do the hand-sampling yourself; https://github.com/bsmedberg/minidump-memorylist/ produces output in .csv format which might be easier to script. Let me know if you need assistance.
Assignee: nobody → dmajor
Whiteboard: [MemShrink] → [MemShrink:P1]
Blocks: 1005844
Blocks: 1005849
Attached file OOMData0526.txt (deleted) —
This is a sample of 50 crashes from recent Nightly builds that ran out of address space at 1MB chunk size. (Maybe I should have done CSV, but this text format is good enough for my purposes.)

Explanation of fields:
Free: Total free virtual memory, sum of all the other fields
Tiny: Blocks smaller than 1MB
Misaligned: Blocks at least 1MB but smaller than 2MB, unusable by our allocators due to alignment
Usable: Blocks at least 1MB but smaller than 2MB, suitably aligned for our allocators
Other: Blocks greater than 2MB. These are leftovers from the Breakpad reservation. They weren't actually free at the time of failure (or else we would have used them)

Some observations:
- We generally lose 70-100MB to fragmentation (the "Tiny" category). That's unfortunate but not horrible.
- The "Misaligned" category is not worth worrying about.
- In theory the "Usable" blocks should be zero. The large numbers there are what led me to bug 1005844 and bug 1005849. About 1 in 5 crashes had Usable numbers in the hundreds of megabytes (ouch). Once the allocator fixes land, I expect these numbers to hit 0MB or very close (Breakpad residuals).

I am going to keep this bug open and run the numbers again after the allocator fixes.
Attached patch minidump-memorylist patch (deleted) — Splinter Review
This is not meant for checkin, I'm just sharing it here.
Whiteboard: [MemShrink:P1] → [MemShrink:P2]
Blocks: 1101179
Attached file OOMData0613.txt (deleted) —
After the JS fix (bug 1005849)
Attached file OOMData1113.txt (deleted) —
Before jemalloc cleanup (bug 1073662)
Attached file OOMData1118.txt (deleted) —
After jemalloc cleanup (bug 1073662)
I'm mostly interested in the "Usable" columns. The reports that have huge numbers for "Tiny" are a different issue, for which I opened bug 1101179.

What I see in the new reports is that:
* The JS fix alone did not really make a dent in the number of "missed opportunity" OOMs -- the kind where there are hundreds of free blocks. This is fine; I wasn't expecting much improvement from that half alone. We're going to fail as long as either allocator has the issue.
* Between June and November, the situation got better on its own, without fixes for bug 1073662 or bug 1005844. I'm no longer seeing hundreds of megabytes of unused blocks. I don't really understand this though, since the log doesn't show anything major in jemalloc.c during this time.
* Bug 1073662 did not further reduce these types of OOMs (not that we were really expecting it to).

My conclusion from this is that we could still get some wins from fixing bug 1005844, but it's less of an issue now. The gains would be in the 50-100 MB range rather than many hundreds.
Has anyone tried to make the JS allocator malloc() its memory zones instead of mmap/VirtualAlloc'ing them? From the JS engine's perspective that would be essentially the same, and it would still be free to do whatever it wants within those zones. The difference is that the global chunk logic would be shared between jemalloc and the JS engine, so the JS engine could also benefit from the chunks recycled by jemalloc.
There was some talk of doing that when I was working on bug 1005849, but mozjs hadn't been folded back into xul yet at the time. We could probably do it now. That said, since bug 1073662 didn't help reduce the amount of wasted address space, I'd want to land the old patch from bug 1005844 first.
(In reply to David Major [:dmajor] (UTC+13) from comment #4)
> Explanation of fields:
> Free: Total free virtual memory, sum of all the other fields
> Tiny: Blocks smaller than 1MB
> Misaligned: Blocks at least 1MB but smaller than 2MB, unusable by our
> allocators due to alignment
> Usable: Blocks at least 1MB but smaller than 2MB, suitably aligned for our
> allocators
> Other: Blocks greater than 2MB. These are leftovers from the Breakpad
> reservation. They weren't actually free at the time of failure (or else we
> would have used them)

I realize this post is old, but there's one thing I'm missing from this description: are the 'Usable' blocks all aligned, or do they include misaligned-but-alignable blocks? Or do these blocks fall in the 'Misaligned' category? It would be nice to have data with 'Unusable', 'Unaligned', 'Aligned' categories to get a sense of how useful it is to *attempt* to align chunks before falling back to other strategies.
"Usable" includes blocks that can be aligned. For example:
A block from 0x100000 to 0x200000 is Usable
A block from 0x2F0000 to 0x400000 is Usable
A block from 0x4F0000 to 0x5F0000 is Misaligned
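To make the categories concrete, here is a minimal Python sketch of the classification. It is not the minidump-memorylist patch attached to this bug. It assumes free blocks are half-open [start, end) ranges and treats a block of exactly 2MB as Other (the field descriptions leave that case open). The names `classify` and `summarize` are hypothetical.

```python
MIB = 0x100000  # 1MB chunk size, which is also the required alignment


def classify(start, end):
    """Return the category of one free block [start, end)."""
    size = end - start
    if size < MIB:
        return "Tiny"
    if size >= 2 * MIB:
        return "Other"  # assumption: exactly 2MB is counted here
    # Round the start up to the next 1MB boundary and check whether a
    # whole 1MB chunk still fits before the end of the block.
    aligned = (start + MIB - 1) // MIB * MIB
    return "Usable" if aligned + MIB <= end else "Misaligned"


def summarize(blocks):
    """Sum block sizes per category; Free is the sum of all the others."""
    totals = {"Tiny": 0, "Misaligned": 0, "Usable": 0, "Other": 0}
    for start, end in blocks:
        totals[classify(start, end)] += end - start
    totals["Free"] = sum(totals.values())
    return totals


# The three examples from the comment above:
print(classify(0x100000, 0x200000))  # Usable
print(classify(0x2F0000, 0x400000))  # Usable
print(classify(0x4F0000, 0x5F0000))  # Misaligned
```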
OK, thanks. It's interesting to see that the total size of Misaligned blocks can be fairly high - for instance in the most recent data there's a report with 24 MiB worth, and the set before that includes a report with 23 MiB of Misaligned blocks.

These are essentially blocks that must be 'overcome' for us to be able to allocate, so it makes sense for a last ditch allocation pass to be able to hold at least this many blocks (the alternative is crashing, so I don't think the performance impact matters that much).
> These are essentially blocks that must be 'overcome' for us to be able to allocate

Sort of. There are a couple of factors in our favor:
* Unless all of the Misaligned blocks are exactly 1MiB, the actual block count will be a little lower than the MiB total (each block is at least 1MiB but smaller than 2MiB, so the divisor is somewhere between 1 and 2)
* We only need to overcome the Misaligned blocks that occur before the first Usable block
* It's OK if we don't program for the absolute worst case
(I don't remember for sure, but I think that's how I arrived at "<= 7 misaligned blocks" in bug 1005844 comment 16)
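The "divisor between 1 and 2" point can be worked through with a little arithmetic. The sketch below is only an illustration: it assumes every Misaligned block is at least 1MiB and strictly smaller than 2MiB, as in the field descriptions, and the function name is hypothetical.

```python
MIB = 1024 * 1024


def misaligned_count_bounds(total_bytes):
    """Bound the number of Misaligned blocks given their combined size.

    Each block is >= 1MiB and < 2MiB, so for a total of T MiB the block
    count n satisfies T/2 < n <= T.
    """
    if total_bytes == 0:
        return (0, 0)
    if total_bytes < MIB:
        raise ValueError("a Misaligned total cannot be smaller than 1MiB")
    fewest = total_bytes // (2 * MIB) + 1  # every block just under 2MiB
    most = total_bytes // MIB              # every block exactly 1MiB
    return (fewest, most)


# The report with 24 MiB of Misaligned space mentioned earlier:
print(misaligned_count_bounds(24 * MIB))  # (13, 24)
```

So a report with 24 MiB of Misaligned space holds somewhere between 13 and 24 such blocks, and only those below the first Usable block would have to be overcome.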
True, but we're talking about the size of a stack-allocated array and the number of system calls - I don't think setting it to 32 or so is really a problem, and that way the last ditch pass would live up to its name.
> * We only need to overcome the Misaligned blocks that occur before the first Usable block
Granted, as we use more memory, there will be more Misaligned blocks before we find a Usable one.
One other reason that we might as well try as hard as possible is that with chunk recycling implemented, we'll keep hold of up to 128 of these hard-to-obtain chunks, so we won't keep hitting the OS allocator unless we keep using up more (and eventually crash).
Fair enough. I have no objection to holding a larger number of blocks. :) Just saying that if you were forced to use a smaller number, we'd probably still be okay.
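For readers following along, here is a toy Python simulation of the kind of last ditch pass discussed above: keep hold of misaligned blocks so the OS cannot hand them out again, up to a fixed limit, then release them all. It is a sketch under stated assumptions, not the code from bug 1005844. The address space is faked with a set of 64KiB granules, the fake OS always returns the lowest address that fits, and the "release, then retry at the next aligned address" step is an assumption about how such a pass could work. `FakeAddressSpace` and `last_ditch_alloc` are hypothetical names.

```python
GRANULE = 0x10000              # 64KiB allocation granularity (assumed)
CHUNK = 0x100000               # 1MiB chunk size and alignment
PER_CHUNK = CHUNK // GRANULE   # granules per chunk


class FakeAddressSpace:
    """Free memory modelled as a set of free granule indices."""

    def __init__(self, free_blocks):
        self.free = set()
        for start, end in free_blocks:  # half-open [start, end)
            self.free.update(range(start // GRANULE, end // GRANULE))

    def alloc_at(self, addr):
        """Reserve one chunk at addr; return False if any part is in use."""
        wanted = range(addr // GRANULE, addr // GRANULE + PER_CHUNK)
        if not all(g in self.free for g in wanted):
            return False
        self.free.difference_update(wanted)
        return True

    def alloc_anywhere(self):
        """Reserve one chunk at the lowest address that fits, or None."""
        for g in sorted(self.free):
            if self.alloc_at(g * GRANULE):
                return g * GRANULE
        return None

    def release(self, addr):
        self.free.update(range(addr // GRANULE, addr // GRANULE + PER_CHUNK))


def last_ditch_alloc(space, max_held=32):
    """Return an aligned chunk address, or None if the pass gives up."""
    held = []      # misaligned blocks we sit on so they are not returned again
    result = None
    while True:
        addr = space.alloc_anywhere()
        if addr is None:
            break                      # no 1MiB block left at all
        if addr % CHUNK == 0:
            result = addr              # got lucky: already aligned
            break
        # Misaligned: give it back and try the aligned address inside it.
        space.release(addr)
        aligned = (addr + CHUNK - 1) // CHUNK * CHUNK
        if space.alloc_at(aligned):
            result = aligned
            break
        if len(held) == max_held:
            break                      # out of slots for held blocks
        space.alloc_at(addr)           # re-reserve it so the next try skips it
        held.append(addr)
    for addr in held:
        space.release(addr)
    return result


blocks = [(0x4F0000, 0x5F0000), (0x7F0000, 0x8F0000), (0xAF0000, 0xC00000)]
print(hex(last_ditch_alloc(FakeAddressSpace(blocks))))       # 0xb00000
print(last_ditch_alloc(FakeAddressSpace(blocks), max_held=1))  # None
```

In this toy layout two Misaligned blocks sit below the first Usable one, so a limit of 1 fails while the default of 32 succeeds, which is the trade-off being weighed in the comments above.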
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED