Crash in [@ mozilla::SandboxFork::SandboxFork]
Categories
(Core :: Security: Process Sandboxing, defect, P1)
Tracking
()
People
(Reporter: sefeng, Assigned: jld)
References
Details
(Keywords: crash, topcrash, Whiteboard: [not-a-fission-bug])
Crash Data
Attachments
(1 file, 2 obsolete files)
Maybe Fission related. (DOMFissionEnabled=1)
Crash report: https://crash-stats.mozilla.org/report/index/19f3b6a8-c652-4f46-bb4d-b7bb90210107
MOZ_CRASH Reason: MOZ_CRASH(socketpair failed)
Top 10 frames of crashing thread:
0 libxul.so mozilla::SandboxFork::SandboxFork security/sandbox/linux/launch/SandboxLaunch.cpp:409
1 libxul.so mozilla::SandboxLaunchPrepare security/sandbox/linux/launch/SandboxLaunch.cpp:353
2 libxul.so mozilla::ipc::GeckoChildProcessHost::AsyncLaunch ipc/glue/GeckoChildProcessHost.cpp:686
3 libxul.so mozilla::dom::ContentParent::BeginSubprocessLaunch dom/ipc/ContentParent.cpp:2414
4 libxul.so mozilla::dom::ContentParent::PreallocateProcess dom/ipc/ContentParent.cpp:658
5 libxul.so mozilla::PreallocatedProcessManagerImpl::AllocateNow dom/ipc/PreallocatedProcessManager.cpp:304
6 libxul.so mozilla::detail::RunnableMethodImpl<mozilla::PreallocatedProcessManagerImpl*, void xpcom/threads/nsThreadUtils.h:1201
7 libxul.so mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:739
8 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1200
9 libxul.so mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:87
We've had this assertion for a few years; however, we have been hitting it regularly since the middle of last year.
Reporter
Comment 1•4 years ago
This link lists all the crashes since the beginning of 2020; 5 of them had Fission enabled, so this might not be Fission-related. (I'm not sure when we added the Fission-enabled flag to crash reports.)
Comment 2•4 years ago
Very low volume. We have some ideas and will re-investigate if it rises (might be fd exhaustion).
Comment 3•4 years ago
I can reproduce this with fission enabled and a lot of tabs, if right after startup I try to close all tabs by keeping Ctrl+W pressed:
https://crash-stats.mozilla.org/report/index/541fc4ab-1e62-4ea7-9c68-ef3f90210416
It does look like FD exhaustion of some sort, because I get a similar crash with `MOZ_RELEASE_ASSERT(result.mFd.fd != -1) (DuplicateDescriptor failed)` with the same STR:
https://crash-stats.mozilla.org/report/index/b68af5c4-5689-4442-9d51-9bff60210416
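A minimal sketch of why fd exhaustion would explain both failures, assuming Linux and Python's standard library. The helper `exhaust_and_probe` is hypothetical and is not Firefox code. It lowers the soft fd limit, uses up every descriptor, and then shows that `socketpair()` and `dup()` both fail with `EMFILE`:

```python
import errno
import os
import resource
import socket


def exhaust_and_probe(limit=64):
    """Fill the fd table under a lowered soft limit, then try socketpair() and dup().

    Returns a dict mapping each call to the errno it failed with (None on success).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    lowered = limit if soft == resource.RLIM_INFINITY else min(limit, soft)
    # Opened before the table fills up, so dup() has a valid source fd.
    probe = os.open(os.devnull, os.O_RDONLY)
    held = []
    results = {}
    resource.setrlimit(resource.RLIMIT_NOFILE, (lowered, hard))
    try:
        # Open descriptors until the kernel refuses with EMFILE.
        try:
            for _ in range(lowered):
                held.append(os.open(os.devnull, os.O_RDONLY))
        except OSError as e:
            if e.errno != errno.EMFILE:
                raise
        # socketpair() needs two free descriptors.
        try:
            a, b = socket.socketpair()
            a.close()
            b.close()
            results["socketpair"] = None
        except OSError as e:
            results["socketpair"] = e.errno
        # dup() needs one free descriptor.
        try:
            os.close(os.dup(probe))
            results["dup"] = None
        except OSError as e:
            results["dup"] = e.errno
    finally:
        # Release everything and restore the original soft limit.
        for fd in held:
            os.close(fd)
        os.close(probe)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return results
```

Calling `exhaust_and_probe()` in a throwaway process should report `errno.EMFILE` for both entries, which matches the pattern of one crash in `socketpair` and another in `DuplicateDescriptor`.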
Comment 4•3 years ago
(In reply to Emilio Cobos Álvarez (:emilio) from comment #3)
> I can reproduce this with fission enabled and a lot of tabs, if right after startup I try to close all tabs by keeping Ctrl+W pressed:
> https://crash-stats.mozilla.org/report/index/541fc4ab-1e62-4ea7-9c68-ef3f90210416
> It does look like FD exhaustion of some sort, because I get a similar crash with `MOZ_RELEASE_ASSERT(result.mFd.fd != -1) (DuplicateDescriptor failed)` with the same STR:
> https://crash-stats.mozilla.org/report/index/b68af5c4-5689-4442-9d51-9bff60210416
Could this come from bug 1719391?
Comments 5 to 23 hidden (obsolete)
Comment 24•1 year ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 5 desktop browser crashes on Linux on release
:gcp, could you consider increasing the severity of this top-crash bug?
For more information, please visit BugBot documentation.
Comment 25•1 year ago
Jed, this started spiking, can you have a look?
Assignee
Comment 26•1 year ago
This looks like file descriptor exhaustion, which tends to cause a lot of other problems besides this. In general we don't have good data on how much that happens (even crash reports might be an undercount, because I'm pretty sure parent process fd exhaustion can prevent being able to write out a minidump). Possibly we could increase our fd limit; the Debian/Ubuntu family as well as Fedora have fairly high hard limits, but RHEL could need some outreach. That's a larger issue than this particular bug, of course.
Comment 27•1 year ago
I think there are several indicators here suggesting that this may be a file descriptor leak introduced in 116:
- Most crashes are with Ubuntu which according to comment 26 has a high hard limit.
- Some user comments mention that they did not have this issue before updating and that now it is recurrent (e.g. here).
- Graphs per version suggest 116 was a turning point and that this is likely to continue in 117 (see attachment).
There could be some interesting hints in the correlations, where e.g. the following modules have "100% in signature", which seems unusual: libpixbufloader-svg.so, libdrm_radeon.so.1, libdrm_nouveau.so.2.
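One way to look for a descriptor leak locally, sketched under the assumption of a Linux system with `/proc` mounted. The helpers `classify` and `tally_fds` are hypothetical names, not part of any Firefox tooling. The idea is to count a process's open descriptors by kind and compare the tallies over time:

```python
import collections
import os


def classify(target):
    """Reduce a /proc/<pid>/fd link target to a coarse category."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return target  # e.g. anon_inode:[eventfd]
    if target.startswith("/memfd:"):
        # Drop the trailing " (deleted)" marker the kernel appends.
        return target.split(" ")[0]
    return "file"


def tally_fds(pid="self"):
    """Count open descriptors of `pid` by category, read from /proc/<pid>/fd."""
    fd_dir = f"/proc/{pid}/fd"
    counts = collections.Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
        counts[classify(target)] += 1
    return counts
```

Running `tally_fds(pid)` against the parent process at intervals and watching which category keeps growing would show whether a particular kind of descriptor is accumulating.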
Assignee
Comment 28•1 year ago
To clarify the resource limit terminology: the soft limit (rlim_cur) is the limit that's applied when using the resource in question; the hard limit (rlim_max) is how high an unprivileged process can increase the soft limit. We raise the fd soft limit to 4096 or the hard limit, whichever is lower; so, right now we have 4096 fds per process whether it's Red Hat or Ubuntu or anything else, but in principle we could change this number and it should work on every(?) major distro other than Red Hat (where the hard limit is 4096).
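The soft/hard relationship described here can be sketched with Python's standard `resource` module. This only illustrates the rule "raise the soft limit to 4096 or the hard limit, whichever is lower"; it is not Firefox's actual code, and the names `pick_soft_limit` and `raise_fd_soft_limit` are hypothetical:

```python
import resource


def pick_soft_limit(soft, hard, target=4096):
    """Return the soft limit to request: min(target, hard), never below soft."""
    inf = resource.RLIM_INFINITY
    if soft == inf:
        return soft  # already unlimited; nothing to raise
    wanted = target if hard == inf else min(target, hard)
    return max(soft, wanted)


def raise_fd_soft_limit(target=4096):
    """Raise RLIMIT_NOFILE's soft limit toward `target`; return the new soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = pick_soft_limit(soft, hard, target)
    if new_soft != soft:
        # An unprivileged process may raise rlim_cur up to rlim_max.
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft
```

Assuming a starting soft limit of 1024, `pick_soft_limit(1024, 4096)` yields 4096, and so does a much higher hard limit; only changing `target` would make the two cases differ.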
If we're seeing a lot of crashes on Ubuntu, it's probably because a lot of Linux users are on Ubuntu.
As for the correlations, note that this is specific to Linux and to the parent process, but the “overall” numbers are all OSes and all process types, so the presence of common Linux libraries maybe isn't too informative.
I was wondering if the memfd:xshmfence correlation might be a clue, but it's marked “96.15% vs 65.70% if platform_pretty_version = Ubuntu 22.04.3 LTS” which might mean nothing more than that it's only seen in the parent process, which is expected.
There are a lot of graphics errors about unexpected remote texture size: Size(0,0) and similar, but that might be a side-effect of fd exhaustion.
I don't have any great ideas here (at least nothing that can be implemented quickly), and this doesn't seem to happen in significant volumes except on release, so if we wanted to do something like increase the limit and see what happens we'd need to wait for a release cycle.