Closed Bug 1176099 Opened 9 years ago Closed 8 years ago

glib blocks all signals on "gmain" thread

Categories

(Core :: Security: Process Sandboxing, defect)

All
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox41 --- affected
firefox49 --- fixed

People

(Reporter: jld, Assigned: tedd)

References

Details

(Whiteboard: sblc1)

Attachments

(5 files, 6 obsolete files)

(deleted), application/x-shellscript
Details
(deleted), application/x-shellscript
Details
(deleted), application/x-shellscript
Details
(deleted), patch
tedd
: review+
Details | Diff | Splinter Review
(deleted), patch
luke
: review+
Details | Diff | Splinter Review
g_get_worker_context in glib creates the thread named "gmain" with all signals blocked (except SIGKILL and SIGSTOP, which can't be blocked, and the glibc-reserved signals that are masked out by pthread_sigmask).

There are two problems here, both of which will affect Linux desktop content process sandboxing.  One is that our seccomp-bpf sandboxing code, which signals each thread to cause it to enable system call filtering, won't be able to signal that thread.  Bug 1004011 will fix this on newer systems, which might be good enough.

The other problem is that any seccomp-bpf trap taken on that thread will kill the process.  The kernel's logic is a little weird — if the signal is blocked OR ignored, the signal is unblocked AND the disposition is reset to default.  So if there's a handler but the signal is blocked, the process is just killed.  This means that nothing done on that thread can be polyfilled via SIGSYS.

This might be fixable by using glib's main-loop interface to dispatch a task to that thread that unmasks the desired signals.
Alternate approach: intercept the sigprocmask() calls at the library level — ask glandium or someone else who knows ELF better than me how best to do that — and remove the signals in question from the mask.  (The caller's sigset_t is const, so this is a little fiddly, but glibc documents an extension sigandset() that looks useful.)

For the broadcast signal, that means “allocating” it (by installing a signal handler) ahead of time, probably in SandboxEarlyInit — if needed; if a sandbox prototoype is tsync-only until that can be done, that's not the worst outcome.

One advantage of this approach is that it also covers any other all-signals-blocked worker threads; in particular, pulseaudio is known to create them, so this could help with making progress on sandboxing before audio remoting happens.
I would like to work on this
Assignee: nobody → julian.r.hector
Whiteboard: sblc1
I have investigated a few ways to solve the problem and here are the ways I could come up with to solve it.

First off, my firefox binary is dynamically linked against the following libraries:

> ldd firefox
> linux-vdso.so.1 (0x00007fff5d4e9000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fc938a0c000)
> libdl.so.2 => /lib64/libdl.so.2 (0x00007fc938808000)
> librt.so.1 => /lib64/librt.so.1 (0x00007fc938601000)
> libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/libstdc++.so.6 (0x00007fc9382f2000)
> libm.so.6 => /lib64/libm.so.6 (0x00007fc937ff1000)
> libgcc_s.so.1 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/libgcc_s.so.1 (0x00007fc937dda000)
> libc.so.6 => /lib64/libc.so.6 (0x00007fc937a43000)
> /lib64/ld-linux-x86-64.so.2 (0x00007fc938c28000)

But gecko does a bunch of dlopen() to load more libraries along the way.
When taking the approach of intercepting sigprocmask() mentioned by :jld in Comment 1, we can do it in the following ways.

1. Use of LD_PRELOAD
I think most of the people familiar with Linux know of LD_PRELOAD, we could simply write an library that implements sigprocmask() and set the LD_PRELOAD variable prior to starting firefox, this would probably be the fastest to implement, but in my opinion also the most hackish way. Plus it also requires to build an additional library and ship it. 

But an added benefit for this method is that it would also 'hook' the calls to that function inside the libraries that firefox links against (might turn into a problem with the other approaches).

2. Wrap functionality of ld
ld provides a way to wrap library functions using the --wrap parameter (or -wrap for gcc), we already use this for Android [1]. For example getuid(), we could implement a function called __wrap_getuid() and specify the --wrap parementer, and everytime getuid() is called, __wrap_getuid(). Inside our wrapper we can call, __real_getuid() which will be the actual getuid() function.

With this approach we can easily hook the desired function, but I don't think it will solve the problem because it is only valid in the context of the binary compiled with the --wrap parameter, so each library that is not statically linked in the binary will still call the actual getuid() function.

3. Patch DT_SYMTAB of libpthread.so
With this approach, we patch the symbol information for sigprocmask() in memory, so that whenever a the address for sigprocmask() gets resolved by the linker, it will actually point to our wrapper function.

Using this method, we won't have to ship an extra library and load it with LD_PRELOAD, and patching the symbol table in memory shouldn't have too much of an impact performance wise. It also needs to be done only once. 

The drawback here is that every library that is already loaded or has the symbol for sigprocmask() already resolved prior to the hooking is not affected by the hook. Unless the library is loaded with RTLD_LAZY and sigprocmask() hasn't been called prior to the hooking. Assuming we do the hooking in the main() function, the above mentioned libraries could not be affected.

4. Inline hooking
I think this is probably the least favorable method, because it requires to patch the function prolog of sigprocmask() to jump to our hook method upon entering the function. This heavily relies on the underlying architecture (x86, x86_64, ARM, etc.).

I wrote a PoC for the first 3 methods where the PoC for method (2) shows the problem with loading shared libraries. (I will attach them to this bug)
I think that method (3) would be the best choice (assuming we don't want to mess around with LD_PRELOAD).

:glandium what do you think? Maybe you know of a better way to do this.

[1] https://dxr.mozilla.org/mozilla-central/rev/9c5d494d05485aebf3fedf649abc0e7ae9d2dcf2/old-configure.in#6142
Flags: needinfo?(mh+mozilla)
Attached file PoC for LD_PRELOAD (deleted) —
Attached file PoC for --wrap option (deleted) —
Attached file Poc for DT_SYMTAB patching (deleted) —
> With this approach we can easily hook the desired function, but I don't think it will solve the problem because it is only valid in the context of the binary compiled with the --wrap parameter, so each library that is not statically linked in the binary will still call the actual getuid() function.

I don't think that's a problem.  The executable precedes all libraries in the symbol search path, so if it defines "sigprocmask" (with appropriate visibility; i.e., MOZ_EXPORT) then any calls to sigprocmask should resolve to that instead of the one in libc.  This includes calls from within libc itself (they indirect via libc's PLT), at least on not-Android.

https://software.intel.com/sites/default/files/m/a/1/e/dsohowto.pdf isn't the most help explanation of ELF linking, but it is fairly complete.
Thanks :jld, I just did some testing, and defining:

> __attribute__((visibility("default"))) int getuid() {...}

in my test cases above, seemed to work as you described it.

So the easiest way would be define sigprocmask() with MOZ_EXPORT in plugin-container [1] or rather firefox itself?

We might also want to hook pthread_sigmask() since that seems to be used in glib [2], or is that just a wrapper around sigprocmask() to guarantee thread safety?

[1] https://dxr.mozilla.org/mozilla-central/rev/f14898695ee0dd14615914f3e1401f17df57fdd7/ipc/contentproc/plugin-container.cpp
[2] https://github.com/winlibs/glib/blob/6d362f0abc17050454c86d03e93127a0685b7b77/glib/gmain.c#L5646
I think we could also do it on the seccomp layer, since sigprocmask is a syscall, or am I missing something? Would only be a solution for this particular case though.
I think doing it on the seccomp layer has an additional benefit. If a library decides. for whatever reason, to use the syscall instead of the library function we would catch those cases as well.

:jld, :glandium what do you both think?
Flags: needinfo?(jld)
If you can do it with seccomp-bpf, then I'd say that's the best thing to do. Otherwise, what :jld said, you can just expose the function from the relevant executable (the one for the sandbox, whichever it is these days).
Flags: needinfo?(mh+mozilla)
(In reply to Julian Hector [:tedd] [:jhector] from comment #9)
> I think we could also do it on the seccomp layer, since sigprocmask is a
> syscall, or am I missing something? Would only be a solution for this
> particular case though.

The signal mask is passed by pointer, so the seccomp-bpf program can't read it; you'd have to do something like allowing the syscall only from a given instruction pointer value that normal code wouldn't use (see security/sandbox/chromium/sandbox/linux/seccomp-bpf/syscall.h), then using that to reissue the intercepted syscalls with a possibly modified signal mask.  (Malicious code could bypass this interception, but it's not a security exposure, so that's fine.)  And it looks like the Chromium bpf_dsl doesn't expose functionality for checking the IP value, so that would need to be added — and, ideally, upstreamed.  Note that sigprocmask needs to be thread-safe and async signal safe, so other ways of distinguishing the post-interception syscalls, like using a known pointer value for the signal mask pointer, are problematic.  Also, this wouldn't help for signal mask changes made before seccomp-bpf is started (which includes the glib thread).

Using ELF-level interception for this seems simpler to me.
Flags: needinfo?(jld)
Attached patch [WIP] Add sigprocmask/pthread_sigmask hooks (obsolete) (deleted) — Splinter Review
This is a WIP patch for adding the desired hooks. Just hooking sigprocmask() doesn't seem to work, when I did local tests I also had to hook pthread_sigmask(). 

libpthread can be compiled with INTERNAL_SYSCALL [1] which causes pthread_sigmask() to use the system call instead of the library function sigprocmask(). Nonetheless, I also added a hook for sigprocmask() in case some other threads block all signals.

I did a try push, but didn't run any tests for now since it is the first version of the patch. https://treeherder.mozilla.org/#/jobs?repo=try&revision=d28e3cca118d

What test would you suggest I should run for the final push to try?

[1] https://github.com/lattera/glibc/blob/59ba27a63ada3f46b71ec99a314dfac5a38ad6d2/nptl/sysdeps/pthread/pthread_sigmask.c#L45
Attachment #8734100 - Flags: feedback?(mh+mozilla)
Attachment #8734100 - Flags: feedback?(jld)
Comment on attachment 8734100 [details] [diff] [review]
[WIP] Add sigprocmask/pthread_sigmask hooks

Review of attachment 8734100 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/sandbox/linux/hooks/moz.build
@@ +5,5 @@
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +SOURCES += [
> +    'pthread_sigmask.cpp',
> +    'sigprocmask.cpp',

There's not much added value from having one source file per function.

@@ +16,5 @@
> +OS_LIBS += [
> +    'dl',
> +]
> +
> +FINAL_LIBRARY = 'mozglue'

Do you really want this applied to all programs? I seems to me this should go in mozsandbox. So you might as well just add it to security/sandbox/linux instead of adding a directory.

::: security/sandbox/linux/hooks/pthread_sigmask.cpp
@@ +20,5 @@
> +extern "C" MOZ_EXPORT int
> +pthread_sigmask(int how, const sigset_t* set, sigset_t* oldset)
> +{
> +  if (!gRealPthreadSigmask) {
> +    gRealPthreadSigmask = (int (*)(int, const sigset_t*, sigset_t*))

I think you can do

static auto gRealPthreadSigmask = (int (*)(int, const sigset_t*, sigset_t*)) dlsym(RTLD_NEXT, "pthread_sigmask");

and remove the if and the declaration outside the function.

::: security/sandbox/linux/hooks/sigprocmask.cpp
@@ +12,5 @@
> +int (*gRealSigprocmask)(int, const sigset_t*, sigset_t*) = nullptr;
> +
> +// This file defines a hook for the sigprocmask() function. As described in
> +// Bug 1176099, some threads block SIGSYS signal which breaks our seccomp-bpf
> +// sandbox. To avoid this, we intercept the call and remove

seems like the carriage return here is too early
Attachment #8734100 - Flags: feedback?(mh+mozilla)
Comment on attachment 8734100 [details] [diff] [review]
[WIP] Add sigprocmask/pthread_sigmask hooks

Review of attachment 8734100 [details] [diff] [review]:
-----------------------------------------------------------------

Would it work to use the -Wl,--wrap approach (see MOZ_GLUE_WRAP_LDFLAGS in old-configure.in, although I'm not sure where to add new things like that in the not-quite-post-autoconf-yet world) instead of dlsym and RTLD_NEXT?

Also, would it be possible to reduce the code duplication between the pthread_sigmask and sigprocmask wrappers?

::: security/sandbox/linux/hooks/pthread_sigmask.cpp
@@ +36,5 @@
> +
> +  sigset_t mask = *set;
> +  if (sigdelset(&mask, SIGSYS) < 0) {
> +    errno = ENOSYS;
> +    return -1;

pthread_sigmask returns the error code directly instead of using the -1/errno convention.  See http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_sigmask.html#tag_16_459_04
Attachment #8734100 - Flags: feedback?(jld)
(In reply to Mike Hommey [:glandium] from comment #14)
> Do you really want this applied to all programs? I seems to me this should
> go in mozsandbox. So you might as well just add it to security/sandbox/linux
> instead of adding a directory.

mozsandbox might not always be statically linked into plugin-container, so it seemed better to not depend on that and to make it its own directory instead.  It could also be its own library instead of reusing mozglue, if that would make sense.
(In reply to Jed Davis [:jld] from comment #15)
> Would it work to use the -Wl,--wrap approach (see MOZ_GLUE_WRAP_LDFLAGS in
> old-configure.in, although I'm not sure where to add new things like that in
> the not-quite-post-autoconf-yet world) instead of dlsym and RTLD_NEXT?

That wouldn't work.

(In reply to Jed Davis [:jld] from comment #16)
> (In reply to Mike Hommey [:glandium] from comment #14)
> > Do you really want this applied to all programs? I seems to me this should
> > go in mozsandbox. So you might as well just add it to security/sandbox/linux
> > instead of adding a directory.
> 
> mozsandbox might not always be statically linked into plugin-container, so
> it seemed better to not depend on that and to make it its own directory
> instead.  It could also be its own library instead of reusing mozglue, if
> that would make sense.

It is always statically linked on the platform where this hack is necessary, right?
Attached patch [WIP] Add sigprocmask/pthread_sigmask hooks v2 (obsolete) (deleted) — Splinter Review
Thanks :jld and :glandium for your feedback, I made the changes to the patch. Fixed the return values, combined both function into a single file and tried to reduce the redundant code. 

Regarding the FINAL_LIBRARY, how should we handle this now? Leave it in mozglue, or put it somewhere else?
Attachment #8735454 - Flags: feedback?(mh+mozilla)
Attachment #8735454 - Flags: feedback?(jld)
Attachment #8734100 - Attachment is obsolete: true
Comment on attachment 8735454 [details] [diff] [review]
[WIP] Add sigprocmask/pthread_sigmask hooks v2

Review of attachment 8735454 [details] [diff] [review]:
-----------------------------------------------------------------

I think the comment about putting it in mozsandbox and sticking it in security/sandbox/linux still applies.
Attachment #8735454 - Flags: feedback?(mh+mozilla)
:gcp found an interesting discussion on this topic from pulseaudio:

https://lists.freedesktop.org/archives/pulseaudio-discuss/2015-October/024533.html
I guess it could go into mozsandbox if there's a comment in the moz.build warning that something needs to change if mozsandbox is ever moved out of the executable.

As for pulseaudio: it's great that they're doing (or considering doing?) that; how much it will actually help depends on how much of the userbase is on a new enough (or patched?) version of the library by the time it would be needed.
Attachment #8735454 - Flags: feedback?(jld) → feedback+
Thanks for the feedback. I moved the code into security/sandbox/linux/, and added a comment to moz.build saying that SandboxHooks.cpp needs to be statically linked into the main executable.

Here is the latest try push for builds: https://treeherder.mozilla.org/#/jobs?repo=try&revision=191ff55eca8e

I don't know which tests I should cover for try, any ideas?
Attachment #8735454 - Attachment is obsolete: true
Attachment #8738427 - Flags: review?(mh+mozilla)
Attachment #8738427 - Flags: review?(jld)
Comment on attachment 8738427 [details] [diff] [review]
Add hooks for sigprocmask/pthread_sigmask r=jld r=glandium

Review of attachment 8738427 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/sandbox/linux/SandboxHooks.cpp
@@ +25,5 @@
> +
> +extern "C" MOZ_EXPORT int
> +sigprocmask(int how, const sigset_t* set, sigset_t* oldset)
> +{
> +  static auto gRealSigprocmask = (int (*)(int, const sigset_t*, sigset_t*))

you could further factor things by having a function like HandleSigset(const char* func_name, int how, const sigset_t* set, sigset_t* oldset, bool use_errno)
Attachment #8738427 - Flags: review?(mh+mozilla) → feedback+
Attachment #8738427 - Flags: review?(jld) → review+
Thanks :glandium, I think I see what you mean with the new helper function signature. But I don't think it is going to work with a static local function pointer, because if were like this:

> static int HandleSigset(const char* aName, ...)
> {
>   static auto sRealFunc = (...) dlsym(RTLD_NEXT, name);
>   ...
> }

It would initialize sRealFunc only once, with the symbol of the first function that called it, so either sigprocmask() or pthread_sigmask()...

Instead of passing the name, I passed a function pointer as the argument and did the dlsym() call in the wrapper function.

If you had something different in mind, please let me know.

Carry over r+ from :jld

Push to try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=811be4f108e7
Attachment #8739389 - Flags: review?(mh+mozilla)
Attachment #8738427 - Attachment is obsolete: true
Comment on attachment 8739389 [details] [diff] [review]
Add hooks for sigprocmask/pthread_sigmask r=jld r=glandium

Review of attachment 8739389 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/sandbox/linux/SandboxHooks.cpp
@@ +21,5 @@
> +                        int aHow, const sigset_t* aSet,
> +                        sigset_t* aOldSet, bool aUseErrno)
> +{
> +  if (!aRealFunc) {
> +    errno = ENOSYS;

You shouldn't set errno in the case where aUseErrno is true. The meaning you gave to aUseErrno is also the opposite of what I'd expect. That is, that aUseErrno = true should mean that the error is set in errno, not that errno is returned. Maybe a clearer variable name, like aReturnErrorCode, would make things clearer.
Attachment #8739389 - Flags: review?(mh+mozilla) → feedback+
Carry over r+ from :jld.

:glandium, I think you meant to say "You shouldn't set errno when aUseErrno is false", I adjusted the patch to reflect your intended meaning of aUseErrno (if true -> error is set in errno)
Attachment #8739389 - Attachment is obsolete: true
Attachment #8742368 - Flags: review?(mh+mozilla)
Comment on attachment 8742368 [details] [diff] [review]
Add hooks for sigprocmask/pthread_sigmask r=jld r=glandium

Review of attachment 8742368 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/sandbox/linux/SandboxHooks.cpp
@@ +21,5 @@
> +                        int aHow, const sigset_t* aSet,
> +                        sigset_t* aOldSet, bool aUseErrno)
> +{
> +  if (!aRealFunc) {
> +    errno = aUseErrno ? ENOSYS : errno;

This always reads and sets errno if aUseErrno is not set. It would be better to go with a simple if (aUseErrno) {}
Attachment #8742368 - Flags: review?(mh+mozilla)
Carry over r+ from :jld.
Attachment #8742368 - Attachment is obsolete: true
Attachment #8743270 - Flags: review?(mh+mozilla)
Attachment #8743270 - Flags: review?(mh+mozilla) → review+
Created a proper hg patch.

Another push to try, just in case: https://treeherder.mozilla.org/#/jobs?repo=try&revision=06d24a4556f5

Carry over r+ from :jld and :glandium
Attachment #8743270 - Attachment is obsolete: true
Attachment #8743814 - Flags: review+
I ran all tests on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=37dc53399d65&selectedJob=19966874

some of them failed. Some of those that failed, also failed on other tests.
Just to check I ran another try push while selecting the tests that failed before:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ee2d273e5d10

Only M-e10s(6), and tc-M-e10s(1) have failed in both try pushes. Where M-e10s(6) seemed to have failed in previous test runs.

And the failures on windows, can't be related to these changes (since this is Linux only).

I will check tc-M-e10s(1) to see if it is related to this bug.
I couldn't find the problem to be related to my changes, I did a final try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3798cce9739a

And all the tests passed at least once on one of the 3 try runs, so I think we are good to land.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/458d4af56800
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
:luke encountered a crash because of this patch. Here is his error log: https://pastebin.mozilla.org/8869111

Thanks to his help, it seems the problem is caused by a NULL set being passed to the function.
I will rewrite to patch to include a NULL check for the set.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch Fix missing NULL check (deleted) — Splinter Review
Quick fix. According to the man pages, if |set| is NULL, |how| is ignored and the signal mask remains unchanged, but the current signal mask is returned in |oldset|

So if |set| is NULL, we just call the original function.

Push to try for builds: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a17c2c7019ed
Attachment #8746659 - Flags: review?(luke)
Comment on attachment 8746659 [details] [diff] [review]
Fix missing NULL check

Thanks!
Attachment #8746659 - Flags: review?(luke) → review+
Depends on: 1268897
https://hg.mozilla.org/mozilla-central/rev/723d1f49780a
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: