Closed Bug 1660131 Opened 4 years ago Closed 3 years ago

Don't crash the RDD process due to sandbox violation

Categories

(Core :: Security: Process Sandboxing, defect, P5)

x86_64
Linux
defect

RESOLVED FIXED

People

(Reporter: jya, Unassigned)

References

(Blocks 1 open bug)

Details

In bug 1595994 we are moving all decoders to the RDD process.

On Linux, to determine whether FFmpeg is usable, the media framework attempts to load it and, if that fails, disables further use.

On Linux, any attempt to violate the sandbox rules will cause the process to be killed.
The side effect is that when we attempt to see if FFmpeg is usable in the RDD, this will cause a sandbox violation; what should be a soft failure ends up causing a crash.

And so when the RDD is re-spawned, it will attempt to load FFmpeg again and crash again.

This should be a soft failure: FFmpeg should fail to initialise and simply be disabled.

Of course, ultimately we will relax the RDD sandbox to allow FFmpeg to load (and the content sandbox can then be tightened instead); but in the meantime, crashing is not a desired behaviour.
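
To make the requested behaviour concrete, here is a minimal sketch of a "soft failure" probe. It is not the Firefox implementation: `SoftFailLinker`, `LinkStatus` and the injected `loader` callback are hypothetical names, and the callback stands in for whatever actually opens the library. The only point is that a failed load is remembered and reported as "disabled" rather than retried or treated as fatal.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical names throughout; this is an illustration, not Firefox code.
enum class LinkStatus { NotAttempted, Usable, Disabled };

class SoftFailLinker {
 public:
  // `loader` stands in for the real library-opening step (an assumption).
  // It must return false on any failure instead of aborting.
  explicit SoftFailLinker(std::function<bool(const std::string&)> loader)
      : mLoader(std::move(loader)) {
    assert(mLoader && "a callable loader is required");
  }

  // Attempts the load at most once. A failure is recorded so later calls
  // report "not usable" without touching the loader again.
  bool Init(const std::string& libName) {
    if (mStatus == LinkStatus::NotAttempted) {
      ++mAttempts;
      mStatus = mLoader(libName) ? LinkStatus::Usable : LinkStatus::Disabled;
    }
    return mStatus == LinkStatus::Usable;
  }

  LinkStatus Status() const { return mStatus; }
  int Attempts() const { return mAttempts; }

 private:
  std::function<bool(const std::string&)> mLoader;
  LinkStatus mStatus = LinkStatus::NotAttempted;
  int mAttempts = 0;
};
```

Note that this only helps if the failing load is reported back to the caller; if the sandbox kills the process on the first violation, the recorded state is lost and the re-spawned process tries again, which is the loop described above.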

Assignee: nobody → gpascutto
Severity: -- → S4
Priority: -- → P1

what should be a soft failure ends up causing a crash.

Note that this is the case in Nightly, but not in Release, see e.g.: https://searchfox.org/mozilla-central/rev/d54210d490ef335b13fc1fcac817525120c8c46b/security/sandbox/linux/Sandbox.cpp#513

So I think we're good here?
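
For readers following along, the Nightly/Release difference can be pictured with the sketch below. This is a simplified model and not the code in Sandbox.cpp: the `Verdict` and `Decision` types, the `Evaluate` function, the `crashOnViolation` flag and the choice of `ENOSYS` as the returned error are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <set>

// Hypothetical model of a sandbox policy decision; not the real Sandbox.cpp.
enum class Verdict { Allow, FailWithErrno, KillProcess };

struct Decision {
  Verdict verdict;
  int err;  // Only meaningful when verdict == FailWithErrno.
};

// `allowed` is the syscall allowlist. `crashOnViolation` models a build
// where violations are fatal (assumed here to be the Nightly behaviour).
inline Decision Evaluate(const std::set<int>& allowed, int syscallNr,
                         bool crashOnViolation) {
  assert(syscallNr >= 0);
  if (allowed.count(syscallNr) != 0) {
    return {Verdict::Allow, 0};
  }
  if (crashOnViolation) {
    return {Verdict::KillProcess, 0};
  }
  // Soft failure: the caller sees an error code and can carry on.
  return {Verdict::FailWithErrno, ENOSYS};
}
```

With `crashOnViolation` set to false, a disallowed call simply fails, which is the soft-failure outcome the report asks for.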

I'll help investigate what causes the violation through your WIP

It's crashing because of some syscalls, which we can whitelist, but it's from gtk_init, which we may not want if at all possible. Looking at the callers, argh:

Sandbox: frame #07: gtk_parse_args (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0 + 0x22f5cb)
Sandbox: frame #08: gtk_init_check (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0 + 0x22f686)
Sandbox: frame #09: gtk_init (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0 + 0x22f6c9)
Sandbox: frame #10: gfxPlatformGtk::gfxPlatformGtk() (/home/morbo/hg/firefox/gfx/thebes/gfxPlatformGtk.cpp:80)
Sandbox: frame #11: gfxPlatform::Init() (/home/morbo/hg/firefox/gfx/thebes/gfxPlatform.cpp:936)
Sandbox: frame #12: gfxPlatform::GetPlatform() (/home/morbo/hg/firefox/gfx/thebes/gfxPlatform.cpp:511)
Sandbox: frame #13: mozilla::FFmpegRuntimeLinker::Init() (/home/morbo/hg/firefox/dom/media/platforms/ffmpeg/FFmpegRuntimeLinker.cpp:60)

(In reply to Gian-Carlo Pascutto [:gcp] from comment #2)

It's crashing because of some syscalls, which we can whitelist, but it's from gtk_init, which we may not want if at all possible. Looking at the callers, argh:

Side question, then.

So we don't want to ever be able to call gfxPlatform::GetPlatform() from the RDD?

On Linux this is used to determine if we can use a HW decoder or not.

Sandbox: frame #07: gtk_parse_args (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0 + 0x22f5cb)
Sandbox: frame #08: gtk_init_check (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0 + 0x22f686)
Sandbox: frame #09: gtk_init (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0 + 0x22f6c9)
Sandbox: frame #10: gfxPlatformGtk::gfxPlatformGtk() (/home/morbo/hg/firefox/gfx/thebes/gfxPlatformGtk.cpp:80)
Sandbox: frame #11: gfxPlatform::Init() (/home/morbo/hg/firefox/gfx/thebes/gfxPlatform.cpp:936)
Sandbox: frame #12: gfxPlatform::GetPlatform() (/home/morbo/hg/firefox/gfx/thebes/gfxPlatform.cpp:511)
Sandbox: frame #13: mozilla::FFmpegRuntimeLinker::Init() (/home/morbo/hg/firefox/dom/media/platforms/ffmpeg/FFmpegRuntimeLinker.cpp:60)

This code is only used if Wayland is enabled and compiled in, which I don't believe is the default.

Flags: needinfo?(gpascutto)

This was a compile directly from your WIP branch (and I don't have Wayland). I'll continue investigating.

So we don't want to ever be able to call gfxPlatform::GetPlatform() from the RDD?

Not from RDD, but the gfxPlatform data can be forwarded to RDD from the parent.
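
A rough sketch of what "forwarding" could look like follows. The parent computes the few facts the decoder needs and ships them to the child, so the child never initialises GTK itself. Everything here is hypothetical: `GfxInfoSnapshot`, its two fields, and the text wire format are invented for illustration, and real Firefox IPC does not use this format.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical snapshot of the platform facts a decoder might need.
struct GfxInfoSnapshot {
  bool hwDecodingAllowed = false;
  bool usingWayland = false;
};

// Parent side: flatten the snapshot into a string that could be sent
// to the child process.
inline std::string Serialize(const GfxInfoSnapshot& s) {
  std::ostringstream out;
  out << (s.hwDecodingAllowed ? 1 : 0) << ' ' << (s.usingWayland ? 1 : 0);
  return out.str();
}

// Child side: rebuild the snapshot without calling into any platform
// library. Returns false and leaves *out untouched on malformed input.
inline bool Deserialize(const std::string& wire, GfxInfoSnapshot* out) {
  assert(out != nullptr);
  std::istringstream in(wire);
  int hw = 0;
  int wl = 0;
  if (!(in >> hw >> wl)) {
    return false;
  }
  if ((hw != 0 && hw != 1) || (wl != 0 && wl != 1)) {
    return false;
  }
  out->hwDecodingAllowed = (hw == 1);
  out->usingWayland = (wl == 1);
  return true;
}
```

The design point is that the child only parses plain data; the library calls that trigger the sandbox violation stay in the parent.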

Flags: needinfo?(gpascutto)

Ultimately, we will want to do HW decoding in the RDD too on Linux.
Right now, all of that is done in the content process, including the loading of GTK, which is less than ideal.

Ultimately, we will want to do HW decoding in the RDD too on Linux.

For X Windows or only for Wayland? IIRC most of the movement I saw there were newer Wayland APIs. In that case we can perhaps avoid the X socket? (I'm not too familiar with Wayland details here...)

Avoiding GTK in RDD is still useful because a ton of external libraries get loaded that can all do arbitrary syscalls, which makes it much harder to restrict what they can call if you don't know ahead of time what it is.

Do I understand correctly that I should be able to debug this against mozilla-central now by flipping some pref or similar to move the decoding into RDD?

Flags: needinfo?(jyavenard)

(In reply to Gian-Carlo Pascutto [:gcp] from comment #7)

Do I understand correctly that I should be able to debug this against mozilla-central now by flipping some pref or similar to move the decoding into RDD?

Yes: media.rdd-ffmpeg.enabled.

You need to restart Firefox for this pref to be acted upon (because of this bug).

Flags: needinfo?(jyavenard)

I guess this is what I am seeing every time I play a video:

Crash report: https://crash-stats.mozilla.org/report/index/04a4675c-b3b8-4ddd-b02d-ff4ed0201026

MOZ_CRASH Reason: MOZ_RELEASE_ASSERT(!XRE_IsRDDProcess()) (GFX: Not allowed in RDD process.)

Top 10 frames of crashing thread:

0 libxul.so libxul.so@0x3bac385 
1 libxul.so libxul.so@0x3baa2cf 
2 libxul.so libxul.so@0x5010a2d 
3 libxul.so libxul.so@0x4fd45ba 
4 libxul.so libxul.so@0x4fea668 
5 libxul.so libxul.so@0x2f609a3 
6 libxul.so libxul.so@0x2eed34a 
7 libxul.so libxul.so@0x2eec02c 
8 libxul.so libxul.so@0x2eeb0d5 
9 libxul.so libxul.so@0x2eee8bd 

I get a dozen of these every time, but then video starts playing anyway, using ffmpeg and vaapi (according to intel_gpu_top).

I guess this is what I am seeing every time I play a video:

That's an intentional crash, not a sandbox violation. Arch seems to be using its own build which we have no symbols for so there's no way for us to make sense of that stack.

Can you try a Mozilla build on your configuration, and if it crashes similarly, file a bug? That would help us debug your problem.

Flags: needinfo?(kubrick)

(In reply to Gian-Carlo Pascutto [:gcp] from comment #10)

Arch seems to be using its own build which we have no symbols for so there's no way for us to make sense of that stack.

This is my own build, any tips on the build config to make the backtrace more usable?

With the official firefox nightly, I get this backtrace bp-1cd73931-a2b0-48fb-a372-279790201026 which points to bug 1672558.

Thank you.

Flags: needinfo?(kubrick)

(In reply to Francois Guerraz from comment #12)

(In reply to Gian-Carlo Pascutto [:gcp] from comment #10)

Arch seems to be using its own build which we have no symbols for so there's no way for us to make sense of that stack.

This is my own build, any tips on the build config to make the backtrace more usable?

With the official firefox nightly, I get this backtrace bp-1cd73931-a2b0-48fb-a372-279790201026 which points to bug 1672558.

Thank you.

You must have set media.rdd-ffmpeg.enabled to true, which you shouldn't; there's no reason today to set that pref unless you're a Mozilla dev.
And I believe you've been given that exact same information in another bug.

We now understand that ffvpx is already causing problems here: bug 1685463.

Assignee: gpascutto → nobody
Priority: P1 → --
Blocks: RDD
Priority: -- → P5

The gfxPlatform::GetPlatform() RDD crash (caused by gtk_init()) was fixed by Bug 1660336.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED