Open Bug 1465669 Opened 6 years ago Updated 2 years ago

Evaluate/consider using shm_open instead of our current shared memory on Mac

Categories

(Core :: IPC, enhancement, P3)

x86_64
macOS

Tracking


People

(Reporter: jld, Assigned: jld)

References

Details

Attachments

(1 file, 1 obsolete file)

Once we have shm_open as a possible backend for shared memory, it might be interesting to see how it compares to our current OS X implementation for performance. They're both implemented directly on top of Mach shared memory (i.e., not using the filesystem like shm_open on Linux, and especially not using files on a disk-backed FS like we would have been doing before bug 1161166), so there probably isn't a large difference. But maybe the overhead of file descriptors is significant, or maybe SCM_RIGHTS winds up being more efficient than our Mach port passing code, or maybe there's no significant difference (in which case we might want to consider not effectively duplicating code that's already provided by the OS). Also, using shm_open would avoid whatever edge case of our current implementation is causing bug 1264209.
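For context, the shm_open-backed approach under discussion follows the usual POSIX pattern: create a named object, immediately unlink the name so only the descriptor keeps the segment alive, then size and map it. This is a minimal sketch under that assumption; the helper name and naming scheme are made up for illustration, not taken from the actual patch:

```c
// Sketch of an shm_open-backed anonymous shared-memory segment. The name is
// unlinked right after creation, so only the file descriptor (which can be
// passed over a socketpair with SCM_RIGHTS) refers to it. Illustrative only,
// not Gecko's actual implementation.
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

// Returns an fd for an anonymous shared-memory segment of `size` bytes,
// or -1 on failure.
int create_anon_shm(size_t size) {
  char name[64];
  // A per-process name; real code would also mix in a counter or random part.
  snprintf(name, sizeof(name), "/org.example.shm.%d", (int)getpid());
  int fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
  if (fd < 0) return -1;
  shm_unlink(name);  // The fd keeps the segment alive; the name is gone.
  if (ftruncate(fd, (off_t)size) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}
```

Both processes then mmap the descriptor; once the mapping exists, the fd can be closed, which is what keeps descriptor usage low in practice (see the later comments about fd exhaustion).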
Blocks: 1473156
Summary: Try benchmarking shm_open against our current shared memory on Mac → Evaluate/consider using shm_open instead of our current shared memory on Mac
I looked at Chromium's code, and there's a comment[1] that Mac shm_open is limited to 4MB. If that's still true it would be a problem, although I was able to use the profiler (which sends the entire profile as JSON text in a shared memory segment) with a patch to use shm_open, on 10.13.

[1] https://chromium.googlesource.com/chromium/src/+/7eb2180cf9d43c64984e0f190ab17e1cc5a17bd6/base/memory/shared_memory_helper.cc#41

Chromium also has first-class support for attaching Mach ports to IPC messages (using Mojo, not the old IPC that we forked nine years ago). I haven't looked at the implementation yet, but I'd guess it's doing something reasonable like giving each channel a pair of Mach ports as well as a socketpair, and sending attached ports out-of-band.

SharedMemoryBasic_mach, in contrast, does *blocking* Mach IPC to the other process to send the port, gets back the numeric value of the port in the other process, and then sends it back to that process in the IPDL IPC message. This might not be a huge problem in practice, given that each ordered pair of processes has a dedicated thread to handle this IPC…

…except that, thanks to bug 1265824, that thread can block for an uncertain amount of time in WaitForTextureIdsToUnlock, which is kind of not good.

One way or another we'll want to do something about Mach IPC integration, but if we have to keep using it for shared memory, then that influences the priority.
I did a Try run: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=e516fba5632770a80ecc2dbd155180bb0a5b8a57&framework=1&showOnlyComparable=1&selectedTimeRange=172800

At the default setting of 2 days' m-c history, there are no significant changes. If I dial it up to 7 days, I get a regression on Kraken which I'm pretty sure is spurious: that's a pure-JS computation benchmark, and it goes away if I include more history. At 14 days, there are significant improvements on tpaint and tsvgr_opacity, and an improvement approaching significance on tabpaint, but I don't know whether that might be due to some other change that landed during that time.

So it seems that there isn't a strong reason to do this, but there's also no obvious performance reason not to.
Attached patch WIP (obsolete) (deleted) — Splinter Review
This is the patch I was testing with; it's also the base for my attempt at bug 1480205. It renames the Mach SharedMemoryBasic class (because that now has dependencies from layers) and builds both that and the POSIX SharedMemoryBasic.
Is this more likely to run into file descriptor exhaustion than the existing approach?
Flags: needinfo?(jld)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #4)

Is this more likely to run into file descriptor exhaustion than the existing
approach?

It will use file descriptors instead of not using them, so strictly speaking the answer is “yes”. But generally we close file descriptors after setting up mappings (ipc::Shmem does this, and CrossProcess{Semaphore,Mutex} expose the ability to do so, which graphics uses). Also, macOS appears not to set a hard limit on RLIMIT_NOFILE, so we can always raise the descriptor limit as much as we want (which is not the case on many Linux distributions; e.g., Ubuntu sets a hard limit of 4k by default).
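The "raise the descriptor limit" step described above can be sketched like this; the helper is hypothetical, not the code Gecko actually runs:

```c
// Sketch of raising the soft RLIMIT_NOFILE toward `desired`. Where the hard
// limit is RLIM_INFINITY (as reported on macOS per the comment above), the
// soft limit can be raised arbitrarily; elsewhere (e.g. Ubuntu's 4k default)
// the hard limit clamps it for an unprivileged process. Illustrative only.
#include <sys/resource.h>

// Returns the soft limit in effect afterwards, or 0 on error.
rlim_t raise_fd_limit(rlim_t desired) {
  struct rlimit rl;
  if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return 0;
  rlim_t target = desired;
  if (rl.rlim_max != RLIM_INFINITY && target > rl.rlim_max) {
    target = rl.rlim_max;  // Can't exceed the hard cap without privileges.
  }
  if (target > rl.rlim_cur) {
    rl.rlim_cur = target;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) return 0;
  }
  if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return 0;
  return rl.rlim_cur;
}
```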

Flags: needinfo?(jld)
Attached patch WIP (deleted) — Splinter Review

Rebased, and fixed to handle the content sandbox's early init (maybe not the ideal way, but it works). The RDD and probably Flash sandboxes are broken but should be relatively easy to fix.

Attachment #9008589 - Attachment is obsolete: true

It looks like bug 1474793 is going to just use base::SharedMemory directly instead of blocking on this (see bug 1474793 comment #45).

FWIW, Mojo has switched to using mach on macOS for IPC because of file handle exhaustion: https://docs.google.com/document/d/1nEUEAP3pq6T9WgEucy2j02QwEHi0PaFdpaQcT8ooH9E/edit#

(In reply to Jeff Muizelaar [:jrmuizel] from comment #8)

FWIW, Mojo has switched to using mach on macOS for IPC because of file handle exhaustion: https://docs.google.com/document/d/1nEUEAP3pq6T9WgEucy2j02QwEHi0PaFdpaQcT8ooH9E/edit#

That's interesting. The comment about fd limits in the background section is a little off — for setrlimit/getrlimit, the default soft limit is 256 but there's no hard limit, so it can be increased arbitrarily (and we do, and they do), as I mentioned in comment #5.

But the part about a system-wide limit led me into sysctl, where, on 10.13:

kern.maxfiles: 49152
kern.maxfilesperproc: 24576

So that's potentially a problem. And https://crbug.com/714614 reports a value of 12288 for maxfiles (maybe an older OS version?), which is worse.
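For reference, a sketch of how that system-wide cap can be queried programmatically. The macOS sysctl name comes from the output above; the Linux branch reads the analogous /proc value and is an assumption added for illustration:

```c
// Sketch: query the system-wide open-file cap discussed above. On macOS this
// is the kern.maxfiles sysctl; on Linux the analogous value lives in
// /proc/sys/fs/file-max. Illustrative only.
#include <stdio.h>
#ifdef __APPLE__
#include <sys/sysctl.h>
#endif

// Returns the system-wide maximum number of open files, or -1 on failure.
long system_max_files(void) {
#ifdef __APPLE__
  int maxfiles = 0;
  size_t len = sizeof(maxfiles);
  if (sysctlbyname("kern.maxfiles", &maxfiles, &len, NULL, 0) != 0) return -1;
  return maxfiles;
#else
  FILE* f = fopen("/proc/sys/fs/file-max", "r");
  if (!f) return -1;
  long v = -1;
  if (fscanf(f, "%ld", &v) != 1) v = -1;
  fclose(f);
  return v;
#endif
}
```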

We might need to worry about how many fds we're using for IPC channels, too — Chromium's new IPC can multiplex many independent channels over one socket, but ours has a lot of baggage about single-threadedness and message ordering, so various things create their own channels, and empirically (open about:memory and search for ipc-channels) each content process currently has about 10-12 endpoints as a result. But that's a separate problem.

We're definitely going to need some way to transport Mach ports (but probably not the current implementation long-term), and Mach shm has some advantages over POSIX just from an API point of view anyway, so this bug might end up being WONTFIX.

Severity: normal → S3
