Closed Bug 527356 Opened 15 years ago Closed 15 years ago

Infinite loop at start-up with ASLR (mmap() randomization)

Categories: Core :: Memory Allocator, defect
Hardware: All
OS: Linux
Priority: Not set
Severity: normal

RESOLVED DUPLICATE of bug 470217

People

(Reporter: linkfanel, Unassigned)

Details

Attachments

(1 file)

User-Agent: Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.9.1.4) Gecko/20091028 Iceweasel/3.5.4 (Debian-3.5.4-1)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.9.1.4) Gecko/20091028 Iceweasel/3.5.4 (Debian-3.5.4-1)

On Linux kernels featuring PaX RANDMMAP, Firefox 3.5 gets caught in an infinite mmap()/munmap() loop, because mmap() returns an address different from the one expected. This forces the user to turn off this security feature. See the original bug report: http://bugs.gentoo.org/show_bug.cgi?id=278698. A patch by the PaX Team is available.

Reproducible: Always

Steps to Reproduce:
1. Start Firefox after upgrading to 3.5

Actual Results: Infinite loop; nothing shows up; CPU at 100%

Expected Results: Firefox starts up
Attached patch Patch by Pax Team (deleted) — Splinter Review
Component: General → jemalloc
Product: Firefox → Core
QA Contact: general → jemalloc
I think we may need a better solution than the one in the patch, which reverts to a very old behavior of jemalloc. The problem is that this approximately triples the number of system calls (mmap, munmap, munmap) in the common case. Some possible solutions are:

* Only switch to the slow-but-sure chunk allocation algorithm if mmap has previously refused to return the requested mapping. The main failure mode for this solution is that race conditions between threads can cause failures that are indistinguishable from those caused by RANDMMAP; this is why I haven't already implemented such a solution.
* Modify the arena-related code to be able to handle unaligned over-sized mappings, such that the extraneous pages are tracked, but never touched or otherwise used for anything. This inflates the amount of virtual memory used, but it avoids the vast majority of extra munmap system calls.
* Increase the chunk size in order to reduce the total number of chunk-related system calls.
* Increase the amount of chunk caching to more than one per chunk.
> The problem is that this approximately triples the number of system calls
> (mmap, munmap, munmap) in the common case.

What are the performance targets here, that two munmap calls are considered so much of a burden? I take it they're still considered faster than an infinite loop, at least ;). Do you perhaps have real-life test cases/traces where these two extra munmap calls showed up? I'm asking because if the problem is not syscall speed per se but lock contention (mmap/munmap/etc. are serialized, in the Linux kernel at least), then solving that is a different issue (and was in fact a topic on lkml lately).

> Only switch to the slow-but-sure chunk allocation algorithm if mmap has
> previously refused to return the requested mapping.

Why is there a concern with threads racing their allocations? This happens already with the current code anyway and must be handled (by reissuing the hinted mmap from the thread that lost the race). I don't think turning this second mmap into an mmap+munmap sequence can be measured, as such races must be rare, or else there are bigger problems at a higher level.

> Modify the arena-related code to be able to handle unaligned over-sized
> mappings, such that the extraneous pages are tracked, but never touched or
> otherwise used for anything.

Why can't the oversized mapping be used fully?

> Increase the chunk size in order to reduce the total number of chunk-related
> system calls.

This would in general (regardless of the PaX issue) be a good idea, as there is a cost in the kernel for tracking individual mappings that cannot be merged (in Linux terms these are the vm_area_struct structures; /proc/pid/maps shows them). It's not only about kernel memory usage but, more importantly, the algorithmic complexity when the kernel has to traverse/modify this list/tree of mappings.
I cannot comment on the other ideas as I don't know the jemalloc internals, but in general I'd like to suggest that this old/new approach be made configurable, so that at least Gentoo/PaX users can easily enable it (vs. having to patch jemalloc, as is the case now).
(In reply to comment #3)
> > The problem is that this approximately triples the number of system calls
> > (mmap, munmap, munmap) in the common case.
>
> what are the performance targets here that two munmap calls are considered so
> much of a burden? i take it that they're still considered faster than an
> infinite loop at least ;).
>
> do you perhaps have real life test cases/traces where these two extra munmap
> calls showed up?

I've had reports of this being a performance issue on both FreeBSD and Windows. I just spent the past 6 hours experimenting with this on Linux, and discovered that at least on my 64-bit Ubuntu 9.04 system, mmap never honors requested addresses (perhaps some aspect of SELinux?), but it does not appear to actively randomize addresses, so it is possible to repeatedly "get lucky". I implemented the following solution in the standalone jemalloc repository:

http://canonware.com/cgi-bin/hg_jemalloc/rev/d763e24ca024

My experiments with the standalone jemalloc preloaded into Firefox indicated that the fast method got used only ~30% of the time. That's only a marginal improvement, so I withdraw my concerns regarding performance.

> > Modify the arena-related code to be able to handle unaligned over-sized
> > mappings, such that the extraneous pages are tracked, but never touched or
> > otherwise used for anything.
>
> why cannot the oversized mapping be used fully?

Chunks must be properly aligned so that bit masking of an object pointer results in the base address of the associated chunk.
Comment on attachment 411077 [details] [diff] [review]
Patch by Pax Team

It wouldn't hurt to check whether the length arguments to the munmap() calls are non-zero.
> http://canonware.com/cgi-bin/hg_jemalloc/rev/d763e24ca024

If I'm reading your code correctly, chunk_alloc_mmap_slow itself will never set 'unaligned' to false (it should be in the 'else' branch, I guess), which is probably not what you intended.

> It wouldn't hurt to check whether the length arguments to the munmap() calls
> are non-zero.

Sure; that patch was written to solve the immediate problem of breaking the infinite loop, not to save every last cycle ;). Still, one would think that memory allocation itself is not a fast path (i.e., not something you execute a million times a second), so an effectively empty syscall (at least on Linux) shouldn't show up anywhere.
Status: UNCONFIRMED → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE