Firefox content process has a live connection to the X11 server.
Categories
(Core :: Security: Process Sandboxing, defect, P3)
Tracking
()
People
(Reporter: jld, Assigned: gcp)
References
(Blocks 2 open bugs, Regressed 1 open bug)
Details
(Whiteboard: sblc4, qa-not-actionable)
Attachments
(2 files)
Reporter | ||
Comment 1•10 years ago
|
||
Comment 3•10 years ago
|
||
Reporter | ||
Comment 4•10 years ago
|
||
Comment 5•10 years ago
|
||
Comment 6•10 years ago
|
||
Reporter | ||
Comment 7•10 years ago
|
||
Reporter | ||
Comment 8•10 years ago
|
||
Updated•9 years ago
|
Reporter | ||
Comment 10•9 years ago
|
||
Updated•8 years ago
|
Comment 12•8 years ago
|
||
Assignee | ||
Comment 13•8 years ago
|
||
Reporter | ||
Updated•8 years ago
|
Assignee | ||
Updated•8 years ago
|
Reporter | ||
Comment 14•7 years ago
|
||
Reporter | ||
Comment 15•7 years ago
|
||
Reporter | ||
Updated•7 years ago
|
Updated•7 years ago
|
Reporter | ||
Updated•6 years ago
|
Comment 16•5 years ago
|
||
It seems like this is coming up again due to Fission. With the high number of content processes, we run the risk of encountering the 256 Xorg connection limit, which can cause crashes. :jesup has apparently run into this multiple times while dogfooding fission with high tab counts.
What remaining features are currently requiring that we create X connections in every content process? If it's only needed for WebGL, for example, it might be nice to delay creating the Xorg connection until a webpage requests it.
Comment 17•5 years ago
|
||
Bug 788319 may help get rid of GLX, if needed.
Comment 18•5 years ago
|
||
WebGL can also potentially be remoted if need be. Does anyone know what's currently making X connections?
Reporter | ||
Comment 19•5 years ago
|
||
WebGL is most of it and there's work in progress to remote that (see bug 1621762).
We're also initializing GTK in the content process, but we don't use it to draw native widgets directly; as I understand it they're drawn to shared memory and composited. There are still some unnecessary calls to XSync
when that happens, but otherwise normal non-WebGL operation doesn't seem to actually talk to the X server. However, starting content processes in headless mode means no widgets are drawn (most noticeably scroll bars), and the last time I tried that, enabling :spohl's non-native widgets with widget.disable-native-theme-for-content
didn't help. And I don't currently know enough about graphics to say what might be going on there or what it might take to fix it.
Updated•5 years ago
|
Comment 20•5 years ago
|
||
Perhaps we can just switch to calling gtk-parse-args instead of gtk_init() to avoid the display connection?
Updated•5 years ago
|
Reporter | ||
Comment 21•5 years ago
|
||
I tested MOZ_HEADLESS
again, using the same patch as my previous attempt (just the existing headless mode, no gtk_parse_args
), and it seems to work now — scrollbars are drawn (same as non-native + non-headless), and form controls seem to work. Headless content + native GTK widgets continues to not work, but that's expected. And WebGL is broken, but that's also expected.
I confirmed the absence of X connections with a patch to log child processes' sockets before sandbox startup. I also observed that we normally make two connections to the X server per content process (a finding also reported in bug 1635451): one from gtk_init
called by ContentChild::Init
, and the other from XOpenDisplay
called by gfxPlatform::Init
; both are held open for the life of the process.
I haven't tested extensively, but if there aren't other regressions, this seems useful for use cases that don't need (or actively don't want) WebGL, like Tor Browser.
Reporter | ||
Comment 22•5 years ago
|
||
So I did some bisecting, and headless non-native widgets appear to have been fixed in bug 1619664 (specifically the “Decide which theme to use per document” patch), then regressed in bug 1615026, then fixed again in bug 1620246.
The second broken range does look like a coordinate system problem: scroll bars are blank but occupy the expected size, and form controls are displaced and partly cut off by their bounding boxes.
With the original breakage, the absent scroll bars are the narrower GTK width… because it was ignoring the pref and using GTK in the headless case, until bug 1619664 reordered the conditionals. So non-native widgets have probably always worked in headless-content mode.
Updated•5 years ago
|
Comment 23•5 years ago
|
||
Tracking for Fission Nightly milestone M6c because this bug causes crashes with many tabs. This problem affects all Linux users, but Fission makes it worse because we'll have more processes.
The fix is probably a lot of work so we should start soon for Fission.
Updated•5 years ago
|
Reporter | ||
Comment 24•5 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #23)
Tracking for Fission Nightly milestone M6c because this bug causes crashes with many tabs. This problem affects all Linux users, but Fission makes it worse because we'll have more processes.
The fix is probably a lot of work so we should start soon for Fission.
There are two problems here:
- Remove X11 connections from content processes in all cases, for security, which is this bug
- Avoid X11 connections in the common case, to avoid exhausting resource limits, which is bug 1635451
This bug has two main blockers at this point:
- WebGL, which needs to be either remoted (bug 1607940) or changed use DRI directly of using X (bug 788319, if I understand it correctly)
- Flash, which will be removed eventually but probably not before Fission is intended to ship
The first one of those has two potential solutions actively being worked on; I don't know how close either one is to being shippable. The second is more difficult; the NPAPI code has a number of dependencies on the content process being an X client (search nsPluginInstanceOwner.cpp
for uses of Display*
and Window
for some examples; see also bug 1548475) and I have no idea how feasible it would be to change that.
If Fission Nightly is intended to ship before we remove support for Flash, then it will likely be necessary to figure out a smaller-scoped solution for bug 1635451.
Comment 25•5 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #24)
There are two problems here:
- Remove X11 connections from content processes in all cases, for security, which is this bug
- Avoid X11 connections in the common case, to avoid exhausting resource limits, which is bug 1635451
Sounds like this bug might not need to block Fission Nightly, but bug 1635451 (avoiding X11 in the common case) would be good to fix before Nightly. I'll move this bug to Fission milestone M7 (Beta) so we continue to track this work without blocking Fission Nightly.
- Flash, which will be removed eventually but probably not before Fission is intended to ship
...
If Fission Nightly is intended to ship before we remove support for Flash, then it will likely be necessary to figure out a smaller-scoped solution for bug 1635451.
We currently plan to remove Flash support in Firefox 84 (in Nightly in October, Release in December). If Fission is indeed ready to roll out to (some percentage of) Nightly users before October, we can disable Flash for those Fission users even before Flash EOL in Fx84. Flash is going away soon and we don't want any extra Flash work to delay Fission development or rollout.
Updated•5 years ago
|
Comment 26•5 years ago
|
||
As :cpeterson mentioned, we explicitly have no intention of supporting Flash with Fission enabled, so it's OK to land code which breaks NPAPI when Fission is enabled.
Comment 27•5 years ago
|
||
I've been told there are plans to implement WebGL remoting in Firefox 80 (July) or 81 (August). Tracking for Fission Nightly M6c.
Comment 28•5 years ago
|
||
To help alleviate this bug, we only need to disable Flash for Fission Linux users (bug 1642463), not all users on all platforms (bug 1455897).
Updated•4 years ago
|
Assignee | ||
Comment 29•4 years ago
|
||
Pieces of this are slowly getting into place. If you enable all of these:
- widget.disable-native-theme-for-content (https://bugzilla.mozilla.org/show_bug.cgi?id=1411425)
- webgl.out-of-process
- layers.gpu-process.enabled (https://bugzilla.mozilla.org/show_bug.cgi?id=1653444)
- security.sandbox.content.headless (https://bugzilla.mozilla.org/show_bug.cgi?id=1640345)
- widget.remote-look-and-feel (in progress in bug 1503054 and bug 1470983)
- ...and don't have Flash.
Then in theory you shouldn't have an X connection. So it's a matter of stabilizing/finishing and shipping these features.
Assignee | ||
Updated•4 years ago
|
Comment 30•4 years ago
|
||
I think we should remove all unused / legacy X11 code from the codebase as I think the only way forward for HW rendering is GL rendering. So we should support only SW fallback and GL. Karl, what do you think? Do you agree to remove the stuff like XRender, cairo XLib surfaces?
Comment 31•4 years ago
|
||
(In reply to Martin Stránský [:stransky] from comment #30)
... Do you agree to remove the stuff like XRender, cairo XLib surfaces?
I personally can't agree more, but it would great to have some numbers about affected users - apparently there are still some out there, e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=1280795#c6
Comment 32•4 years ago
|
||
https://www.scm.com/doc/Installation/Remote_GUI.html
OpenGL with X11 can run in two different modes: direct rendering and indirect rendering. The difference between them is that indirect rendering uses the GLX protocol to relay the OpenGL commands from the program to the hardware, which limits OpenGL capabilities to OpenGL 1.4.
My naive understanding:
When using X over the network (X11 forwarding), you can choose between two GLX modes: Direct Rendering on the remote computer you are accessing, or Indirect Rendering on your computer by transmitting GLX commands over the network. Indirect rendering supports OpenGL 1.4 at maximum, and Firefox anyway wants to move to EGL (bug 788319). WebRender requests GLX with OpenGL 3.2 or EGL with OpenGL 3.1, so in any case direct rendering (or software rendering) has to be used and the end result be uploaded over the network.
From reading on Bugzilla, XRender seems to help with scroll performance of such previously painted Basic compositor tiles(??). WebRender doesn't have such tiles by default and it seems the external compositor trait couldn't be implemented for X11 as you can't composite all surfaces "with a single atomic transaction" (bug 1617498 comment 0).
Regular remote desktop tools use video codecs, but X seems to transmit in raw or when using SSH compression (-C
) still half the size of raw.(?)
https://stackoverflow.com/questions/12977879/ssh-compression-for-x11-forwarding
X11 graphics take up a lot of bandwidth. If your remote host is some distance away (i.e. not on the LAN), then you'll probably suffer sluggishness in your exported X11 applications.
Instead of interacting with the GUI using X11 forwarding, consider something else that has better optimization/compression, such as VNC or NX/FreeNX.
Actually well written X11 programs don't comsume a lot of bandwidth. The main issue with X11 is the abysmal badly written Xlib which performs a full synchronous roundtrip for almost every operation, while many operations could be performed asynchronously. If using another X11 protocol backend, like Xcb and not doing moronic things like rasterizing the whole GUI on the client and then just blitting the damn thing over the network, plain X11 over TCP over a low bandwidth, high latency connection is actually very usable. However many programs are written badly.
https://superuser.com/questions/1217280/why-is-x11-forwarding-so-inefficient
X11, by default, doesn't do any compression on the network data that gets passed between the application and the display server. As you mentioned, you can use the -C option on SSH to enable compression, and while this does help, it's not optimized for compressing graphical data. Compared to formats like H.264 which can obtain compression rates of 100 to 1, the -C compression will be lucky to achieve 2 to 1 compression. A better solution is to use a graphics or video optimized codec for the X11 video, but we have to be careful not to make it too lossy as desktops generally need to have sharper images than a movie so the user can still clearly read the text and make out fine details.
https://unix.stackexchange.com/questions/286691/more-efficient-x-forwarding
https://superuser.com/questions/1400952/x11-forwarding-and-efficiency
- Over time this changed, programs started doing the rendering by themselves, and many of those instructions became just "here's a bitmap which I already rendered, put this on screen" – very fast locally, but very slow over the network due to X11 lacking any image compression.
This means programs built using modern toolkits are much slower over networked X11, even if it's something as basic as antialiased fonts.
(In contrast, RDP has adapted over time and includes various forms of image compression such as JPEG and even H.264; you can often notice the compression artifacts while the full image is loading.)
- Fortunately, most UI toolkits such as GTK implement damage tracking so only updated regions are re-sent. However, a few programs (such as several Firefox/Thunderbird versions), don't support this and request a complete re-render of the entire window, even if it hasn't really updated.
That's another difference between programs – well-behaving ones are still quite usable over the network, but buggy ones can saturate a 100 Mbps link doing absolutely nothing useful.
To 3: Partial present seems to work with Basic. Was it broken in February 2019 or just not enough in regard to scroll performance?
Anyway, if users choose X11 forwarding, they consciously or unconsciously prefer a technically bad solution over more performant and more secure ones (regular remote desktop solutions).
Comment 33•4 years ago
|
||
I think maintenance cost of Xrender as deprecated and technically inferior edge case is unjustifiable in the light of Linux user benefit, marketshare and Mozilla's aims. We regular Linux users don't profit from it at all. Those few commercial thin client users who want to keep absolutely best performance should invest into using regular remote desktop solutions that use video codecs and are more secure. If even RedHat is no longer interested in maintaining deprecated Xrender, then why should poor Mozilla bear the costs/risks?
Comment 34•4 years ago
|
||
I would agree with the observation that GL is preferred over XRender for HW rendering.
I assumed the XRender situation would already be poorly supported or at least inefficient with e10s, so I'm surprised that it is still used successfully.
I'm not able to weigh in on the VNC / X11-forwarding argument, but I doubt we could justify the maintenance burden of XRender just for X11 forwarding, particularly if VNC is a viable alternative.
We should keep some fallback for when neither GL nor XShm is available, which may mean keeping some basic xlib surface concept in the parent process.
Assignee | ||
Comment 35•4 years ago
|
||
I think we should remove all unused / legacy X11 code from the codebase as I think the only way forward for HW rendering is GL rendering.
I don't understand the relevance to this bug? I think you want a separate bug about removing it for HW rendering, but that doesn't seem to be the only case for which Firefox currently uses the X connection.
Comment 36•4 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #35)
I think we should remove all unused / legacy X11 code from the codebase as I think the only way forward for HW rendering is GL rendering.
I don't understand the relevance to this bug? I think you want a separate bug about removing it for HW rendering, but that doesn't seem to be the only case for which Firefox currently uses the X connection.
The relevance is that when we're fine to draw Gtk widgets to image surfaces it may be possible to render native theming without X connection as we use GtkStyles for that and create the widgets from widget path. I'll look at it. Anyway, the connection from gtkPlatform was removed in Bug 1657315 so there's only one left (when GL is not used).
Comment 37•4 years ago
|
||
(In reply to Martin Stránský [:stransky] from comment #36)
The relevance is that when we're fine to draw Gtk widgets to image surfaces it may be possible to render native theming without X connection as we use GtkStyles for that and create the widgets from widget path. I'll look at it. Anyway, the connection from gtkPlatform was removed in Bug 1657315 so there's only one left (when GL is not used).
Hm, looks like we can't create GtkStyleContext without a display connection / screen so we can't use Gtk to render in headless mode.
Reporter | ||
Updated•4 years ago
|
Comment 38•4 years ago
|
||
Nika doesn't think removing 100% of X11 from the content process needs to block Fission MVP. Non-native theming will remove the biggest use of X11 in content processes and that's probably good enough for Fission MVP.
Updated•4 years ago
|
Comment 40•3 years ago
|
||
Hey Jed,
Can you still reproduce this issue or should we close it?
Reporter | ||
Comment 41•3 years ago
|
||
Yes, this is still a live issue. It's blocked on bug 1638466, which needs more work to deal with GPU driver issues as I understand it.
(Note that bug 1635451 avoids keeping the X11 connection open if it's not needed, but access is still allowed.)
Updated•3 years ago
|
Reporter | ||
Comment 43•3 years ago
|
||
Background: The X11 protocol has a very permissive security model;
clients have essentially full access to the windows of other clients,
and to global resources like input devices. Previously, our sandbox
policy for content processes needed to allow access to the X server;
this limited its effectiveness against a dedicated attacker.
This patch turns on the security.sandbox.content.headless
pref added
in bug 1640345, which removes the sandbox policy rules that allowed
making new X11 connections, as well as opening the Xauthority file,
reading hardware info needed by Mesa, etc. It also runs content
processes in headless mode (whence the name) so they won't connect to a
display server at startup.
This also removes access to the Wayland compositor: the sandbox policy
never allowed that (as of when socket connections became default-deny),
but now content processes won't connect to it at startup. Wayland is
more capability-oriented so this is less significant for security, but at
a minimum it removes unnecessary attack surface.
Note that if the webgl.out-of-process
pref is turned off, WebGL
will break unless security.sandbox.content.headless
is also turned
off. (Similarly, widget.non-native-theme.enabled
is needed to render
scrollbars and form controls in content.) As a result, this patch
adjusts the job definitions used by CI to test in-process WebGL so that
that they will continue to work.
Comment 44•3 years ago
|
||
Comment 45•3 years ago
|
||
bugherder |
Assignee | ||
Comment 46•3 years ago
|
||
Release Note Request (optional, but appreciated)
[Why is this notable]: Important improvement in security for majority of Linux users (on older distros).
[Affects Firefox for Android]: No.
[Suggested wording]: The Linux sandbox was strengthened: processes exposed to web content no longer have access to the X Window system (X11).
[Links (documentation, blog post, etc)]: TBD - May blog about this depending on win32k lockdown rollout.
Comment 47•3 years ago
|
||
Release note added, please see https://www.mozilla.org/firefox/99.0beta/releasenotes/
Updated•3 years ago
|
Description
•