Randomize small allocations in mozjemalloc
Categories
(Core :: Memory Allocator, enhancement, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox69 | --- | fixed |
People
(Reporter: stephen, Assigned: tjr)
References
(Blocks 1 open bug)
Details
(Keywords: parity-edge, sec-want, Whiteboard: [adv-main69-])
Attachments
(3 files, 2 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
text/x-phabricator-request
|
Details |
Assignee | ||
Comment 1•7 years ago
|
||
Updated•7 years ago
|
Reporter | ||
Comment 2•7 years ago
|
||
Assignee | ||
Comment 3•7 years ago
|
||
Reporter | ||
Comment 4•7 years ago
|
||
Assignee | ||
Comment 5•7 years ago
|
||
Updated•7 years ago
|
Assignee | ||
Comment 6•7 years ago
|
||
Reporter | ||
Comment 7•7 years ago
|
||
Updated•7 years ago
|
Comment 8•7 years ago
|
||
Assignee | ||
Updated•7 years ago
|
Assignee | ||
Updated•7 years ago
|
Assignee | ||
Updated•7 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Assignee | ||
Comment 9•6 years ago
|
||
Updated•6 years ago
|
Comment 10•6 years ago
|
||
Comment 11•6 years ago
|
||
Assignee | ||
Comment 12•6 years ago
|
||
Comment 13•6 years ago
|
||
Assignee | ||
Comment 14•6 years ago
|
||
I'm going to pick this back up again. I'm making it per-arena opt-in; but I'm trying to get performance good enough that it's not infeasible to enable it on all arenas by default. Using mozilla/RandomNum.h unsurprisingly was abysmal from a performance standpoint so I will be trying some tricks to get it better...
Assignee | ||
Comment 15•6 years ago
|
||
Performance update for those following at home: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=02a5e3b5b4fb03fc774298814eaf8c7dc023057d&selectedTimeRange=172800
This run is with a fake RNG (I chose 12!) so it shows some of the impact of affecting the memory layout without the RNG cost. The real memory layout would be a bit more random than skipping to the 12th slow in the freelist.
The cost is generally pretty low: around 1% sometimes 2% with the exception of the nth-index tests in perf-reftest-singletons which tests stylo's nth-index-cache where we obliterate performance with a regression of like 40%. The other outliers (4% regression on two dromaeo_css tests) might (probably?) feed from that general regression.
That's probably a sufficient argument for either not enabling this on all arenas or perhaps taking rust code into account or something - but I think using overall performance cost is still a good metric to refine a performant RNG design even if it's only used on a few high-value arenas.
Assignee | ||
Comment 16•6 years ago
|
||
Okay; more performance data (focused on linux64):
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=1ed7e00e1d7dcbcf23bff6bc2b7af451df1245cd&newProject=try&newRevision=b858a4e2d1c512f86719d8b06bb09f96d3371a36
This is the comparison between using a 64-bit state xorshift PRNG and a 128-bit state. 128-bit is preferable and we see it adds a max of about 2% all with low confidence results.
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=b858a4e2d1c512f86719d8b06bb09f96d3371a36&selectedTimeRange=172800 is the comparison of -central vs a 128-bit state. On a few specific CSS tests we see significant regressions. (Which I believe is caused by the aforementioned n-th index cache.) On responsiveness we see 2.5-4% regressions. Other than those we see a lot of low-percentage increases, a few decreases, and a few 2-4% increases with low confidence.
I'm going to see what I can do about this one specific cache...
Assignee | ||
Comment 17•6 years ago
|
||
So I spent some time switching rust's allocator to use a separate arena that did not have the randomization applied. This did not fix the tests. After consultation, it seems like the culprit isn't specifically the nth-index cache but rather those tests have a lot of DOM Traversal that happens. So the Dromaeo isn't an indicator of the nth-index, it's an indication of hurting dom traversal, and the nth-index tests show the same thing.
I'm going to prepare patches to turn this on by default in private arenas (which will hit the segregated String and Arraybuffer) but off by default in the non-private arenas.
Assignee | ||
Comment 18•6 years ago
|
||
More updates: Turning it on on all private arenas (which included the default JS Runtime) was still too expensive **. I'm going to try and turn it on for just the arraybuffer and string arenas and see what that does to performance.
The gains from this are very slight though. I'm going to try and land something in this patch, and then - assuming the performance isn't too bad - prepare a more complicated follow-up patch that will enable the protection by default in non-content processes. (The parent, and utility processes.) This will provide greater protection against sandbox escapes as the attacker will be going over IPC and have less ability to craft the heap as they desire (like they can when they have a scripting engine.)
** https://treeherder.mozilla.org/#/jobs?repo=try&revision=7b4646d0aba90eaa6eb6b5b9321d188c01df4c7c&selectedJob=247429116 and https://treeherder.mozilla.org/#/jobs?repo=try&revision=381431222c7f002d79fcd29929ef1418d8d88ca2
Assignee | ||
Comment 19•6 years ago
|
||
This allows freelist randomization on a per-arena basis, by supplying parameters to
arena creation. It opts in the Arraybuffer and StringBuffer arenas.
It uses an xorshift PRNG with a 128-bit state. It is not cryptographically secure. An
attacker who can observe outputs of the RNG, or read its state, is already in a position
to bypass the randomization applied. At the same time we make its state 128 bit to prevent
a trivial bypass if one or two outputs are observed.
The way a run selects masks to check has not been modified, so the randomization is limited
to at most 32 bits in the current mask being tested. It should be noted that while allocations
from the same run may now be non deterministic (up to the maximum entropy as previously
stated), an attacker who can perform multiple allocations will still be able to allocate
a targeted free region (for example while exploiting a use after free vulnerability in the
DOM). Non deterministic allocations will only impede an attacker who has less control over
how they allocate a targeted free region, and may provide some benefit during exploitation
of a heap based buffer overflow vulnerability where the attacker wishes to construct a
precise layout of regions pre overflow.
Comment 20•6 years ago
|
||
FWIW, I don't know how much of a drop-in replacement it would be but there's a version of the same or a very similar PRNG in MFBT: https://searchfox.org/mozilla-central/source/mfbt/XorShift128PlusRNG.h
Assignee | ||
Comment 21•6 years ago
|
||
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #20)
FWIW, I don't know how much of a drop-in replacement it would be but there's a version of the same or a very similar PRNG in MFBT: https://searchfox.org/mozilla-central/source/mfbt/XorShift128PlusRNG.h
Thanks! I should be able to switch over to that; I'll do that soon.
The patch I attached adds randomization and enables it for the arraybuffer and stringbuffer arenas only. Here is a try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3ca00141959506ff8305356334a655da6a7b6c15
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=3ca00141959506ff8305356334a655da6a7b6c15&selectedTimeRange=172800
There are small regressions, most of them low confidence. Joel, wondering what you think of this performance-wise.
Comment 22•6 years ago
|
||
perf wise this is looking good- the small changes seem to be similar to your baseline which was 4 days ago, thanks for double checking!
Comment 23•6 years ago
|
||
Tom, can you please run the awsy tests against shippable builds for all tier-1 platforms as well? I'm concerned about possible memory regressions. I can help triage the results when they come in.
Assignee | ||
Comment 24•6 years ago
|
||
(In reply to Eric Rahm [:erahm] from comment #23)
Tom, can you please run the awsy tests against shippable builds for all tier-1 platforms as well? I'm concerned about possible memory regressions. I can help triage the results when they come in.
They're run in https://treeherder.mozilla.org/#/jobs?repo=try&revision=3ca00141959506ff8305356334a655da6a7b6c15&searchStr=SY - but I don't know where to find the results :)
Assignee | ||
Updated•6 years ago
|
Comment 25•6 years ago
|
||
(In reply to Tom Ritter [:tjr] from comment #24)
(In reply to Eric Rahm [:erahm] from comment #23)
Tom, can you please run the awsy tests against shippable builds for all tier-1 platforms as well? I'm concerned about possible memory regressions. I can help triage the results when they come in.
They're run in https://treeherder.mozilla.org/#/jobs?repo=try&revision=3ca00141959506ff8305356334a655da6a7b6c15&searchStr=SY - but I don't know where to find the results :)
Similar to talos we need a bunch of re-triggers. I ran some overnight and have a few more triggered now. You can see the results here:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=3ca00141959506ff8305356334a655da6a7b6c15&framework=4&selectedTimeRange=172800
For the "Base" numbers you can just look at the top-level summary that is shown. For the measurements w/o "Base" in the title those are actually a summary of sub-tests, you'll need to drill down to see if any of the sub-tests regressed. A cursory look says yes there are rather large regressions, and by a fair amount although talos seems uncertain.
It would help to have a baseline push (ie pop your current patch) that runs the same tests to get a more accurate comparison.
Assignee | ||
Comment 26•6 years ago
|
||
(In reply to Eric Rahm [:erahm] from comment #25)
It would help to have a baseline push (ie pop your current patch) that runs the same tests to get a more accurate comparison.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7a5ddff304ec6427f74c3816b6dab26af376f9bd is a run without my patches apply but using the same base.
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=7a5ddff304ec6427f74c3816b6dab26af376f9bd&newProject=try&newRevision=3ca00141959506ff8305356334a655da6a7b6c15&framework=4 doesn't seem to show the same regressions.
Comment 27•6 years ago
|
||
(In reply to Tom Ritter [:tjr] from comment #26)
(In reply to Eric Rahm [:erahm] from comment #25)
It would help to have a baseline push (ie pop your current patch) that runs the same tests to get a more accurate comparison.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7a5ddff304ec6427f74c3816b6dab26af376f9bd is a run without my patches apply but using the same base.
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=7a5ddff304ec6427f74c3816b6dab26af376f9bd&newProject=try&newRevision=3ca00141959506ff8305356334a655da6a7b6c15&framework=4 doesn't seem to show the same regressions.
That looks like a net improvement across the board! r=me and we can monitor the situation in central. Ignore the windows7-32-shippable base content resident unique
regression, that measurement is known to be problematic.
Comment 28•6 years ago
|
||
That looks like a net improvement across the board!
The patch can not really have any impact on memory usage, either good or bad.
Assignee | ||
Comment 29•5 years ago
|
||
I worked on the review comments. As I did so, I examined the freelist selection algorithm in more detail. It was bad.
Start with 10 slots, 2 of them available for choosing. Intuitively, we'd want to choose between the two equally with a 50/50 chance - assuming our RNG is good.
Instead, even with a good RNG, the choice was weighted by the number of unavailable slots to the right (lesser significance) of an available slot. Consider:
A0000B0000 - A has a 50% chance; B has a 50% chance
000AB00000 - A has a 10% chance; B has a 90% chance.
I improved the algorithm so each slot has an equal chance of being chosen. Doing so is slower. Significantly, if you perform a micro-benchmark.
Here's a Talos run that enables randomization on the arraybuffer and string arenas, which was what we reviewed earlier:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=556792925854b34a9042b3ce882f11a6931050e7&selectedTimeRange=172800
Even with a slower bit in the hot path of allocation (well, hot path if you've enabled randomization) it doesn't seem to have really moved the needle much. It's really just bit twiddling in a tight loop, so even though it's 'slower', I think it's dwarfed by everything else.
Comment 30•5 years ago
|
||
Instead, even with a good RNG, the choice was weighted by the number of unavailable slots to the right (lesser significance) of an available slot.
You mean, it's weighted by the number of unavailable slots between available slots.
Comment 31•5 years ago
|
||
Tom, this is Tongping who was supported by Mozilla to work on an randomized allocator, thank you so much for the support. I wonder whether you could use the array (originated from Guarder paper --- USENIX Security'18) to solve this issue. When only these two available objects in the allocation-buffer, then both of them will get 50% of being chosen. Please let me know if you would like to discuss this with me over the phone, and I can also consider to fly there to talk about my projects on the randomized allocator.
Assignee | ||
Comment 32•5 years ago
|
||
Oh waw, that is very much more expensive than the previous iteration. A modulo operation with a non-constant compiles to an integer division instruction, which is super slow. And you follow that with a loop that depends on the result of the division. I'm really not convinced your concern in https://bugzilla.mozilla.org/show_bug.cgi?id=1376408#c29 justifies this cost.
Performance for this should really be measured on Android. Also note that the less performant you make this the more difficult it will be to enable on more arenas.
I compared the old and new algorithms here:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=309f656d81f2d7045588438f85cce4f43a5b3339&newProject=try&newRevision=ec78dd86fa30ab971dcf960f8ba86349aca29d03
It does hurt quite a bit. I'm reverted to the old version and will think about a better approach.
Assignee | ||
Updated•5 years ago
|
Comment 33•5 years ago
|
||
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/autoland/rev/c277d0025188
Randomize free region selection for small allocations in a run r=glandium
Comment 34•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Description
•