Add support for freezing shared memory
Categories
(Core :: IPC, enhancement, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox70 | --- | fixed |
People
(Reporter: jld, Assigned: jld)
References
(Blocks 1 open bug)
Details
Attachments
(10 files)
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details |
Comment 1•6 years ago
|
||
Assignee | ||
Comment 2•6 years ago
|
||
Assignee | ||
Comment 3•6 years ago
|
||
Assignee | ||
Comment 4•6 years ago
|
||
Assignee | ||
Comment 5•6 years ago
|
||
Comment 6•6 years ago
|
||
Just to be clear, the security bugs mentioned above do not affect our current read-only shared memory implementations, only the, er, more elegant solutions Jed intends to use for the IPC-owned helper. So it shouldn't block any of the shared memory work that we currently have in-flight which does not depend on that.
Updated•6 years ago
|
Comment 7•6 years ago
|
||
To clarify, when this bug speaks of "freezing" a shared memory region, does that mean the region will become permanently read-only for all users, or will it be possible to enforce a region being read-only for content processes, while remaining writable for the chrome process that originally created it?
Assignee | ||
Comment 8•6 years ago
|
||
(In reply to Jonathan Kew (:jfkthame) from comment #7)
To clarify, when this bug speaks of "freezing" a shared memory region, does that mean the region will become permanently read-only for all users, or will it be possible to enforce a region being read-only for content processes, while remaining writable for the chrome process that originally created it?
Read-only for all users. One-sided read-only isn't possible with the ashmem and memfd backends. So, you'd need to create and send a new snapshot, or send incremental updates in separate memory segments.
Comment 9•6 years ago
|
||
This sounds like a potential problem for the shared-fontlist code, which depends on being able to incrementally update the shared data that the content processes are reading. The current patches handle this by allocating a large (256K) shared memory segment to use as an allocation arena, and then handing out smaller chunks of this for the various records that need to be stored and shared. Eventually (depending on the number of fonts installed and used) additional 256K segments will be allocated to extend the arena as needed.
In the course of a session, we're likely to do several hundred separate updates to the font list, as and when we load data for additional font families and individual faces. So if we were to use a new segment for each update, what sort of overheads are there per shared-memory segment? How many separate segments can we reasonably allocate, before running into performance degradation or hard limits -- dozens? hundreds? thousands?
Comment 10•6 years ago
|
||
(In reply to Jonathan Kew (:jfkthame) from comment #9)
In the course of a session, we're likely to do several hundred separate updates to the font list, as and when we load data for additional font families and individual faces. So if we were to use a new segment for each update, what sort of overheads are there per shared-memory segment? How many separate segments can we reasonably allocate, before running into performance degradation or hard limits -- dozens? hundreds? thousands?
Thousands. There shouldn't be much additional overhead for shared memory mappings over non-shared mappings.
In principal, the main limiting factor should be the 64KB kernel VMA limit on Linux, which applies equally to shared mappings and non-shared mappings. In practice, the fact that some Linux environments place a hard limit on the number of file descriptors a process is allowed to have open is more likely to be a problem, since the parent process will need to keep a file descriptor open for each segment it needs to be able to share with new file processes.
The best option is probably to occasionally coalesce segments and broadcast them to the child processes to remap in place of their existing segments. The SharedMap code that we use for shared JS data does something similar, except that it skips incremental updates entirely and just rebuilds the entire snapshot with each update (while trying to coalesce as many updates as possible into a single rebuild, and defer the rebuild until an idle slice).
I don't know if the shared font code would be able to get away with that, but I'm pretty sure it should be able to get away with a hybrid incremental/full rebuild approach.
Comment 11•6 years ago
|
||
How many separate segments can we reasonably allocate, before running into performance degradation or hard limits -- dozens? hundreds? thousands?
Thousands. There shouldn't be much additional overhead for shared memory mappings over non-shared mappings.
I guess the other thing I should've asked is what is the typical cost (time-wise) of allocating a new shared memory segment? I'm imagining it is probably a lot more expensive than simply grabbing the next N bytes of an existing region.
The best option is probably to occasionally coalesce segments and broadcast them to the child processes to remap in place of their existing segments.
I'm a bit wary of doing that, as I think it would force all the processes to entirely flush their font and shaped-text caches, and reflow everything, which we really don't want to be triggering if we can help it. With the current patches, the only time this happens is if the actual list of available fonts changes during the browser session -- e.g. the user installs a new font while Firefox is running -- which is (a) a rare event, and (b) likely to be at a time when Firefox performance isn't in focus, because the user is busy managing fonts rather than loading web pages.
I'll do some instrumentation to determine how many individual allocations are currently happening in a few typical runs, though I suspect the number will be rather large.
Comment 12•6 years ago
|
||
(In reply to Jonathan Kew (:jfkthame) from comment #11)
I guess the other thing I should've asked is what is the typical cost (time-wise) of allocating a new shared memory segment? I'm imagining it is probably a lot more expensive than simply grabbing the next N bytes of an existing region.
That's a good question. It might be a good idea to profile, but as a rough idea:
- On Windows, it should be pretty cheap, even as things stand now. The shared memory APIs are pretty lightweight.
- On Linux, when we're using /dev/shm it's relatively expensive. Each allocation requires several path-based filesystem operations, each of which also happens to require a system call. Once we switch to memfd_create and ashmem, it should be comparable to Windows.
Regardless, it would obviously be more expensive than just adding onto the end of an existing allocation, but probably not much more than a malloc().
The real cost in either case, though, would probably be broadcasting the descriptors to the content processes.
The best option is probably to occasionally coalesce segments and broadcast them to the child processes to remap in place of their existing segments.
I'm a bit wary of doing that, as I think it would force all the processes to entirely flush their font and shaped-text caches, and reflow everything, which we really don't want to be triggering if we can help it. With the current patches, the only time this happens is if the actual list of available fonts changes during the browser session -- e.g. the user installs a new font while Firefox is running -- which is (a) a rare event, and (b) likely to be at a time when Firefox performance isn't in focus, because the user is busy managing fonts rather than loading web pages.
Hm. I was assuming we would be able to do this with little disruption as long as data doesn't change, but I don't know how references to the shared memory data are stored in the current regime.
If it's that difficult to update it without rebuilding everything, it would probably be possible to remap the coalesced data to the same addresses, and with the same layout as the piecemeal data. That would leave us some wasted padding, and we'd need some sort of fallback to handle another thread mapping something else in our target address space in the middle of the update, but it would probably still be better than not coalescing the mappings.
Assignee | ||
Comment 13•6 years ago
|
||
We might have a solution for one-sided write access; it should work with memfd but it's not yet known if it works with ashmem.
On Linux, a process can open /proc/self/fd/N
to get a new file descriptor for the object referenced by its existing fd N
, with a new set of flags and a new check of its credentials against the object's permissions. This came up before in some context previously (but apparently not the comments of this bug, yet), because a process with access to /proc/self/fd
could regain write permissions on a “frozen” fd to an unlinked /dev/shm
file, for example. (The sandbox policies don't allow that, so we're safe there.)
But it can also drop permissions, so it's possible to get an O_RDONLY
fd to a memfd
that way, and it's not possible to create PROT_WRITE
MAP_SHARED
mappings with that fd (or write it with write
).
(I can't take the credit for this discovery; it was pointed out by someone Alex Gaynor asked about this problem, whereas I'd forgotten about that feature of procfs magic symlinks.)
What I'm not sure about yet is Android: the same trick should work to get a read-only ashmem fd, but I don't trust that ashmem will follow the normal Unix file behavior without testing. (We also don't have sandboxing on Android yet, but there are plans for it, and I don't want it to become impossible because of what we do here.)
But also, newer Androids should have memfd: definitely Oreo (when Google started enforcing kernel versions) and potentially back to Marshmallow depending on the device.
Comment 14•6 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #13)
We might have a solution for one-sided write access; it should work with memfd but it's not yet known if it works with ashmem.
On Linux, a process can open
/proc/self/fd/N
to get a new file descriptor for the object referenced by its existing fdN
, with a new set of flags and a new check of its credentials against the object's permissions. This came up before in some context previously (but apparently not the comments of this bug, yet), because a process with access to/proc/self/fd
could regain write permissions on a “frozen” fd to an unlinked/dev/shm
file, for example. (The sandbox policies don't allow that, so we're safe there.)
The problem with this is that we don't currently require procfs, and certain users (and Tor Browser) want to run without it.
Assignee | ||
Comment 15•6 years ago
|
||
(In reply to Kris Maglione [:kmag] from comment #14)
The problem with this is that we don't currently require procfs, and certain users (and Tor Browser) want to run without it.
We kind of do require procfs on kernels before 3.17: it's used by the sandbox and we effectively release-assert that it's present, unless sandboxing is completely turned off. (And this doesn't seem to have come up on the Tor bug yet; maybe nobody interested in the full-browser sandbox is using an older LTS distro?) But, yes. We'd have to detect that case and fall back from memfd to POSIX shm, and hopefully that won't be a problem on Android.
Assignee | ||
Comment 16•6 years ago
|
||
It doesn't work on Android, but not the way I thought it wouldn't work. A read/write mapping of a read-only ashmem fd fails with EACCES
, as expected. However, a read-only mapping of a read-only fd also fails, with EINVAL
.
The problem seems to be using procfs to reopen the inode: doing that with O_RDWR
instead of O_RDONLY
returns a fd that fails mmap
with EINVAL
both for read-only and read/write.
Assignee | ||
Comment 17•6 years ago
|
||
So it looks like the only way we could get memory that one process can write and another can only read (including if malicious but adequately sandboxed) on Android with kernel < 3.17 is to create a temporary file on an actual storage-backed filesystem (i.e., not tmpfs), which is not ideal. I just did some experiments with memory-mapped ext4 files (vs. memfd) and checking the clean/dirty stats in /proc/N/smaps
, and Linux appears to write changes back to secondary storage even if the file was unlinked.
Assignee | ||
Comment 18•6 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #4)
In light of https://bugs.chromium.org/p/project-zero/issues/detail?id=1671
(Mac POSIX shm was always mappable read/write, even from a read-only fd), an
additional requirement is a unit test to make sure freezing actually works.
I did this, and there's a problem: CVE-2018-4435 is fixed only in MacOS 10.12 and up (because older versions are no longer supported by Apple), but we still support back to 10.9, and (helpfully, in this case) our CI runs on 10.10
So we're going to need to continue using temporary files for freezable shared memory on older MacOS. (Non-freezeable can still use shm_open
.) I have an implementation and it passes simple unit tests.
Assignee | ||
Comment 19•6 years ago
|
||
FreeBSD's SHM_ANON is useful for the usual case of shared memory, but it
doesn't support freezing. It could be re-added later, but for now it's
simplest to remove it (and use named shm instead) while refactoring.
Depends on D26738
Assignee | ||
Comment 20•6 years ago
|
||
Despite the comment saying not to use the "handle" except as an opaque
identifier, it is being used to pass the handle to other OS APIs. Direct
access to the handle needs to be controlled to make sure freezing is
safe, so this patch replaces that with interfaces that are more explicit
about ownership and lifetime.
Depends on D26739
Assignee | ||
Comment 21•6 years ago
|
||
The Unix backend for shared memory needs to keep the mapped size to pass
to munmap, while the Windows backend doesn't. Currently it's reusing the
max_size field, and then zeroing it when it's unmapped, which breaks the
freezing use case. This patch uses a dedicated field for that.
Depends on D26740
Assignee | ||
Comment 22•6 years ago
|
||
This uses RAII to handle error-case cleanup in the POSIX backend for
SharedMemory::Create, to simplify the complexity that will be added to
support freezing.
Depends on D26741
Assignee | ||
Comment 23•6 years ago
|
||
This allows writing to shared memory and then making it read-only before
sharing it to other processes, such that a malicious sandboxed process
cannot regain write access. This is currently available only in the
low-level base::SharedMemory interface.
The freeze operation exposes the common subset of read-only shared
memory that we can implement on all supported OSes: with some APIs
(POSIX shm_open) we can't revoke writeability from existing capabilies,
while for others (Android ashmem) we must revoke it. Thus, we require
that the writeable capability not have been duplicated or shared to
another process, and consume it as part of freezing. Also, because in
some backends need special handling at creation time, freezeability must
be explicitly requested.
In particular, this doesn't allow giving an untrusted process read-only
access to memory that the original process can write.
Note that on MacOS before 10.12 this will use temporary files in order to
avoid an OS security bug that allows regaining write access; those OS
versions are no longer supported by Apple (but are supported by Firefox).
Depends on D26742
Assignee | ||
Comment 24•6 years ago
|
||
One problem with using shared memory instead of files for MemMapSnapshot
is that AutoMemMap was trying to use fstat() to obtain the object size;
that doesn't work with ashmem on Android and was causing problems with
the Mac sandbox, but it's not necessary, because we already know the
size. This patch changes it to not do that.
Depends on D26743
Assignee | ||
Comment 25•6 years ago
|
||
There was a strange bug when converting SharedStringMap to use shared
memory: on Android, some tests would fail because a pref wasn't set or
there was something wrong with an expected error message. The root
cause was that mapping ashmem with MAP_PRIVATE results in all zeroes
(see bug 1525803), but that was read as a valid SharedStringMap with
length 0.
To prevent any possible future issues like that, this patch adds a
nonzero magic number to the header. It fits into padding on 64-bit and
the cost of setting and checking it should be essentially free.
Depends on D26744
Assignee | ||
Comment 26•6 years ago
|
||
Depends on D26745
Assignee | ||
Comment 27•6 years ago
|
||
Also refactor SharedMemoryBasic::SystemProtect to allow testing cases
that are expected to fail.
Depends on D26748
Updated•6 years ago
|
Updated•6 years ago
|
Comment 28•6 years ago
|
||
Jed, can you add a quick summary of where we're at on this?
It also seems like this should still block 1550037, from what I understand the shared font impl doesn't need read/write yet (I could be wrong about that), Jonathan can you confirm?
Comment 29•6 years ago
|
||
The shared font-list implementation wants ongoing read/write access from the parent process, while child (content) processes have read-only access. AIUI that's what bug 1550900 intends to offer, so bug 1550037 (which should probably be re-titled a bit) will build on that.
To make the font-list work with totally-frozen blocks (i.e. the parent loses all write access as soon as it shares the block to any content process) would require its allocation strategy to be reworked to use many smaller shmem blocks, instead of allocating larger "arenas" within which it can incrementally sub-allocate extremely cheaply. I'm hoping bug 1550900 means we can avoid going that way.
Assignee | ||
Comment 30•6 years ago
|
||
(In reply to Eric Rahm [:erahm] from comment #28)
Jed, can you add a quick summary of where we're at on this?
It turns out there's a regression caused by bug 1540135 encoding the old Chromium code's path names, as seen on Try here and in the code here, so I need to fix that.
IMHO we should ignore anything in /dev/shm
for that — it's not really I/O any more than mmap
/munmap
are. But it will also need a special case for MacOS ≤ 10.11 (where we actually will do disk I/O, as a workaround for the above-mentioned security bug; and in theory that would have also failed in my Try run, because the filenames changed, except for some reason the fancy new task-based try chooser didn't run any Mac opt tests and this test is disabled on debug builds pending bug 1549723).
Assignee | ||
Comment 31•5 years ago
|
||
The tests for unexpected main thread I/O had exemptions for the specific
paths that were being used for shared memory, which would cause it to
fail with the changes in this bug. This patch does two things:
-
On Linux, /dev/shm is always tmpfs (a memory filesystem), so it's not
going to cause disk I/O, and it's used by glibc to implement the POSIX
standard shm_open API. This allows all /dev/shm paths instead of
limiting it to a hard-coded prefix. -
On MacOS, with the patches in this bug, we'll no longer use temporary
files for shared memory on current OS versions, but we still need them on
older versions to avoid an OS bug (https://crbug.com/project-zero/1671),
and they are backed by disk in this case, so we want to allow only the
IPC files. However, the path prefix has changed.
Updated•5 years ago
|
Comment 32•5 years ago
|
||
Assignee | ||
Comment 33•5 years ago
|
||
Backed out for Android build breakage: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=5645e0cc3915e49c4ffee8ce5c0080615caf565c
Comment 34•5 years ago
|
||
Assignee | ||
Comment 35•5 years ago
|
||
The one-line fix for the ifdef has passed Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6dd820ec86c7e89e002cbc20ca1de72862dd19d3
Comment 36•5 years ago
|
||
Comment 37•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/6c2b995a9d30
https://hg.mozilla.org/mozilla-central/rev/39e18373e3d3
https://hg.mozilla.org/mozilla-central/rev/036809330a51
https://hg.mozilla.org/mozilla-central/rev/0aa8af4965c5
https://hg.mozilla.org/mozilla-central/rev/9d7f1835f96f
https://hg.mozilla.org/mozilla-central/rev/8332565a6943
https://hg.mozilla.org/mozilla-central/rev/47d387b6cd4a
https://hg.mozilla.org/mozilla-central/rev/8a322064cf6d
https://hg.mozilla.org/mozilla-central/rev/2c4d12bdfec3
https://hg.mozilla.org/mozilla-central/rev/a8518ea4b594
Comment 38•5 years ago
|
||
Backed out for toolchain bustages on UniquePtrExtensions.h
Backout link: https://hg.mozilla.org/integration/autoland/rev/56834c5fc9390da3bdfb41a3fe3f06119fbb3502
Push link: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=a8518ea4b5944cd5563a882c27b304fc82d07dba&selectedJob=253975140
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=253975140&repo=autoland&lineNumber=922
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 39•5 years ago
|
||
Bug 1562358 will take care of the build breakage (confirmed on Try), but I'm going to wait until after the branch next Monday to try to re-land.
Comment 40•5 years ago
|
||
Hey Jed, can we try again now that the soft freeze is over?
Assignee | ||
Updated•5 years ago
|
Comment 41•5 years ago
|
||
There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:jld, could you have a look please?
For more information, please visit auto_nag documentation.
Assignee | ||
Updated•5 years ago
|
Comment 42•5 years ago
|
||
Comment 43•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/16d99c5a8a55
https://hg.mozilla.org/mozilla-central/rev/f814f58d43d6
https://hg.mozilla.org/mozilla-central/rev/bdf7a041928a
https://hg.mozilla.org/mozilla-central/rev/6da235c5a77c
https://hg.mozilla.org/mozilla-central/rev/ddfa5ff81061
https://hg.mozilla.org/mozilla-central/rev/225411558a4e
https://hg.mozilla.org/mozilla-central/rev/0f466f2a18c0
https://hg.mozilla.org/mozilla-central/rev/1fa598bf2699
https://hg.mozilla.org/mozilla-central/rev/c3ca7014893c
https://hg.mozilla.org/mozilla-central/rev/d8ac382b5f17
Assignee | ||
Updated•5 years ago
|
Description
•