Closed
Bug 1384336
Opened 7 years ago
Closed 7 years ago
Avoid running the appshell in content processes
Categories
(Core :: DOM: Content Processes, enhancement)
Core
DOM: Content Processes
Tracking
()
RESOLVED
FIXED
mozilla57
Tracking | Status | |
---|---|---|
firefox57 | --- | fixed |
People
(Reporter: billm, Assigned: billm)
References
Details
Attachments
(2 files, 3 obsolete files)
(deleted),
patch
|
bugzilla
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
mstange
:
review+
|
Details | Diff | Splinter Review |
Having to run the appshell in content processes makes Quantum DOM a lot harder. It's also mostly useless, and it probably is a small drag on our performance. And it makes sandboxing harder (see bug 1381022).
I'm trying to make this bug pretty tightly scoped: rather than remove the appshell entirely from the content process, I just want to avoid calling the nsIThreadObserver callbacks for it. That will allow some of the widget-specific code that gets initialized in appshell to continue working. But it means we won't be using OS-specific event waiting mechanisms, which is what I really want to avoid.
Assignee | ||
Comment 1•7 years ago
|
||
How does this seem to you, Olli?
Attachment #8890124 -
Flags: review?(bugs)
Assignee | ||
Comment 2•7 years ago
|
||
Our accessibility code wants relies on us being in an "alertable state" (as defined by Win32) while processing events since that allows us to process APCs. I tried getting rid of the APC mechanism entirely and just using events, but that caused deadlocks if the main thread is blocked in EnsureMTA. So this patch does both, as we discussed at the workweek, Aaron.
If we decide to go this route, I think there's a bit more cleanup that could be done. However, I want to get some feedback before doing that.
These patches are green on Linux and Windows. I haven't tried Mac yet.
Attachment #8890125 -
Flags: feedback?(aklotz)
Comment 3•7 years ago
|
||
Comment on attachment 8890124 [details] [diff] [review]
don't register thread observer in child process
Is this ok for all the non-parent processes?
I think someone familiar with our plugin processes and gpu processes should take a look too.
But if we can do this, great!
Attachment #8890124 -
Flags: review?(bugs) → review+
Comment 4•7 years ago
|
||
Comment on attachment 8890125 [details] [diff] [review]
accessibility changes
Review of attachment 8890125 [details] [diff] [review]:
-----------------------------------------------------------------
This looks great, Bill!
FYI, with respect to additional cleanup work and a11y, we can do some stuff in MessageChannel::WaitForSyncNotifyWithA11yReentry once the main event loop changes are done.
Attachment #8890125 -
Flags: feedback?(aklotz) → feedback+
Comment 5•7 years ago
|
||
(In reply to Olli Pettay [:smaug] (vacation-ish until July 30) from comment #3)
> Is this ok for all the non-parent processes?
>
> I think someone familiar with our plugin processes and gpu processes should
> take a look too.
I can't comment on gpu or GMP processes, but we probably can't in the NPAPI plugin container. I suspect that Flash assumes a fully-fledged Windows message pump in there.
Comment 6•7 years ago
|
||
(Unless we're already using something different down there... I can't remember)
Comment 7•7 years ago
|
||
(In reply to Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) from comment #4)
> Comment on attachment 8890125 [details] [diff] [review]
> accessibility changes
> FYI, with respect to additional cleanup work and a11y, we can do some stuff
> in MessageChannel::WaitForSyncNotifyWithA11yReentry once the main event loop
> changes are done.
I filed bug 1385014 for this.
Assignee | ||
Comment 8•7 years ago
|
||
OK, here's a cleaner patch.
Attachment #8890125 -
Attachment is obsolete: true
Attachment #8891509 -
Flags: review?(aklotz)
Updated•7 years ago
|
Attachment #8891509 -
Flags: review?(aklotz) → review+
Assignee | ||
Comment 9•7 years ago
|
||
Ugh, this is totally orange on Mac.
Comment 10•7 years ago
|
||
I don't know the appshell code very well, but please let me know if there's anyway I can help out with the mac bits.
Assignee | ||
Comment 11•7 years ago
|
||
This patch is similar to the previous one: it disables the appshell's event loop code in the content process. However, I made it pref-controlled. I also needed some extra changes for Mac, since the MacOS appshell code is a little different.
Stephen, could you review this? The main question here is whether there's any need to run a native event loop in the content process on Macs.
Attachment #8890124 -
Attachment is obsolete: true
Attachment #8893563 -
Flags: review?(spohl.mozilla.bugs)
Comment 12•7 years ago
|
||
Comment on attachment 8893563 [details] [diff] [review]
don't register thread observer in child process, v2
(In reply to Bill McCloskey (:billm) from comment #11)
> The main question here is whether there's
> any need to run a native event loop in the content process on Macs.
Forwarding to mstange to make sure.
Attachment #8893563 -
Flags: review?(spohl.mozilla.bugs) → review?(mstange)
Comment 13•7 years ago
|
||
Things that would need the native event loop in the content process are:
(1) Callers that use [someObjCObject performSelector:@selector(someSelector) afterDelay:someDelay].
(2) Notification observers, e.g. things that listen for changes to system settings.
I don't see either of those things used in any code that runs in the content process. That's great! I was expecting more problems here.
Looking through nsAppShell.mm, I also noticed two other things: HangReporter integration, and MacWakeLockListener. We need to make sure that this patch doesn't cause hang reporting to detect the content process as hanging all the time. And the fact that we currently set up MacWakeLockListeners in the content process is probably just a bug.
I also checked what kinds of events go through -[GeckoNSApplication sendEvent:] in the content process at the moment. I've only seen two event types: NSApplicationDefined events, which we use internally to wake up the event loop, and NSSystemDefined events which seem to be firing system-wide when certain things happen. For some of them I wasn't able to determine a cause. Some seem to fire whenever you use the dictionary popup anywhere on the system.
I'm excited about this change and I think it's time that we made it. Sometimes I see overhead from the native event loop in profiles, and somethings things related to -[NSApplication setWindowsNeedUpdate:] which I don't understand, and this change should reduce that overhead.
I've briefly tested this patch now, and the only problem I've found is that Activity Monitor now says our child processes are "Not Responding". We should fix that. There's a Chrome bug about a similar problem with some useful information, it gets interesting around here: https://bugs.chromium.org/p/chromium/issues/detail?id=304860#c21
Comment 14•7 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #13)
> Looking through nsAppShell.mm, I also noticed two other things: HangReporter
> integration, and MacWakeLockListener. We need to make sure that this patch
> doesn't cause hang reporting to detect the content process as hanging all
> the time. And the fact that we currently set up MacWakeLockListeners in the
> content process is probably just a bug.
Looking at the patch again, I can see that you're not changing anything about the MacWakeLockListener situation, so let's not worry about that.
Comment 15•7 years ago
|
||
In bug 1386308 I removed the use of MacWakeLockListener in content.
Assignee | ||
Comment 16•7 years ago
|
||
I'm afraid I didn't understand that Chrome bug. Is there something we could do to fix it?
Comment 17•7 years ago
|
||
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #15)
> In bug 1386308 I removed the use of MacWakeLockListener in content.
Oh, good.
(In reply to Bill McCloskey (:billm) from comment #16)
> I'm afraid I didn't understand that Chrome bug. Is there something we could
> do to fix it?
I think we need to find out where we trigger a call to _RegisterApplication and avoid making that call. The chromium issue points out process renaming as one source of these calls. The general idea is: if the WindowServer has never heard from our process, it won't care about whether we're responding. So we need to make sure that we never communicate with it in the first place.
Comment 18•7 years ago
|
||
Not sure if it'd solve this problem, but bug 1322024 deals with blocking new connections to the windoserver at the sandbox level -- as long as we don't obtain a connection to the windowserver before the sandbox is enabled, then it'd become impossible for us to ever talk to the windowserver. (If you find any places where we're talking to the windowserver, I guess they're blockers for that bug!)
Assignee | ||
Comment 19•7 years ago
|
||
The patch in bug 1322024 doesn't fix the "not responding" problem. It seems like the window server connection is probably created before the sandbox is initialized. I set a breakpoint in _RegisterApplication and saw it being called from SetProcessName [1]. When I commented out that call, I saw it in appshell initialization [2]. I was afraid to comment out that thing since I don't understand what it does.
However, I did find a Chromium patch [3] that shuts down the connection to the window server and prevents all future connections. I tried that and it seemed to work. I've got a try push pending to see how that goes [4]. I'm assuming that would subsume the change to the sandboxing rules, but I don't really know. None of this stuff seems to be documented.
[1] http://searchfox.org/mozilla-central/rev/765cc1b925c5d32d05111c364257a0b79bf2952e/dom/plugins/ipc/PluginUtilsOSX.mm#276
[2] http://searchfox.org/mozilla-central/rev/765cc1b925c5d32d05111c364257a0b79bf2952e/widget/cocoa/nsAppShell.mm#287
[3] https://codereview.chromium.org/673443002/patch/20001/30001
[4] https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b4356599ff4d04d3233fad5ca7701913963173c
Comment 20•7 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #19)
> However, I did find a Chromium patch [3] that shuts down the connection to
> the window server and prevents all future connections. I tried that and it
> seemed to work. I've got a try push pending to see how that goes [4]. I'm
> assuming that would subsume the change to the sandboxing rules, but I don't
> really know. None of this stuff seems to be documented.
We'd still want to remove the allowance for windowserver.active from the sandbox rules because that blocks access at the kernel level. I don't know exactly how CGSSetDenyWindowServerConnections works, but from the disassembly it appears to set a process-local flag which could be easily subverted.
It's a shame to have to use private API's, but it seems like a reasonable thing to do in this situation.
I'm not sure it's worth it, but if we wanted to avoid the private API's, we could check if _RegisterApplication still works properly after the sandbox has initialized. If it does, we might be able to refactor our ContentChild setup code so that _RegisterApplication is called after we've blocked connections to the window server in the sandbox and then we wouldn't need to use the private API calls. Chromium intentionally uses the windowserver in their "sandbox warmup code" before enabling the sandbox. Since this is all undocumented behavior, I'm not sure we'd be gaining much though. And explicitly closing the windowserver.active is a good thing to do in case it is ever opened as a side-effect of a different Apple API call / in a new OS version.
--
I think that not using a native event loop will resolve bug 1306663. That's an example of messages coming in on the native event loop trying to register an IPC service in plugin-container. (i.e. an IPC server in plugin-container). We disable registering IPC services in the sandbox rules, but we still get the event loop messages which attempt to do it. And they trigger error messages on the command line console and in the Console app.
Assignee | ||
Comment 21•7 years ago
|
||
What do you think, Markus? I think our alternatives are:
1. Use CGSShutdownServerConnections when we shut down the sandbox.
2. Avoid setting the process name and also figure out how to fix up nsAppShell::Init so it doesn't use the window server. And maybe fix up anything else that contacts the window server after that.
Flags: needinfo?(mstange)
Comment 22•7 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #19)
> I set a breakpoint in _RegisterApplication and saw it being
> called from SetProcessName [1].
Ok, this one matches what the Chrome people encountered in their bug.
> When I commented out that call, I saw it in
> appshell initialization [2]. I was afraid to comment out that thing since I
> don't understand what it does.
This initializes the process-wide NSApplication instance. NSApplication manages the event loop and any open windows. Since we don't need either of those in the content process, ideally we wouldn't ever need to create NSApplication.
I'd prefer not using anything of the Mac nsAppShell in the content process, instead of initializing some of it and then never calling back into its code. But I don't know how much work that would be.
> However, I did find a Chromium patch [3] that shuts down the connection to
> the window server and prevents all future connections. I tried that and it
> seemed to work.
Interesting. So maybe CGSShutdownServerConnections() effectively tells the window server to stop monitoring this process for responsiveness.
(In reply to Bill McCloskey (:billm) from comment #21)
> What do you think, Markus? I think our alternatives are:
> 1. Use CGSShutdownServerConnections when we shut down the sandbox.
> 2. Avoid setting the process name and also figure out how to fix up
> nsAppShell::Init so it doesn't use the window server. And maybe fix up
> anything else that contacts the window server after that.
2 certainly seems like the more prudent thing to do, but 1 is so much simpler. And since Chrome is using that API, they must have determined that it's necessary. So if we have to use it anyway, we might as well use it to save us some work. Let's go with 1.
Flags: needinfo?(mstange)
Assignee | ||
Comment 23•7 years ago
|
||
Attachment #8893563 -
Attachment is obsolete: true
Attachment #8893563 -
Flags: review?(mstange)
Attachment #8895537 -
Flags: review?(mstange)
Comment 24•7 years ago
|
||
Comment on attachment 8895537 [details] [diff] [review]
don't register thread observer in child process, v3
Review of attachment 8895537 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/ipc/ContentChild.cpp
@@ +1480,5 @@
> + // future connections as well. We do this for sandboxing, but it also ensures
> + // that the Activity Monitor will not label the content process as "Not
> + // responding" because it's not running a native event loop. See bug 1384336.
> + CGSSetDenyWindowServerConnections(true);
> + CGSShutdownServerConnections();
So this also runs if dom.ipc.useNativeEventProcessing.content is true? Does that work?
Attachment #8895537 -
Flags: review?(mstange) → review+
Comment 25•7 years ago
|
||
Pushed by wmccloskey@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4786ec6700d0
Fix to MainThreadInvoker to avoid deadlocks (r=aklotz)
https://hg.mozilla.org/integration/mozilla-inbound/rev/01fd73e4b8b8
Stop using OS-level event loop in content process (r=mstange)
Comment 26•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/4786ec6700d0
https://hg.mozilla.org/mozilla-central/rev/01fd73e4b8b8
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Comment 28•7 years ago
|
||
According to the kick-off meeting with Andrew, stage 2 from https://docs.google.com/document/d/1KYuJu23tzweO4WOceF5Fhle6SQ-PXnfV6NfrGf4kiBM is to be tested by Softvision - to which bug 1384336 refers to.
Our current understanding is that we need to cover testing Firefox with third-party software (accessibility, antivirus).
So, we are planning to test as follows:
Environment:
- Windows : Win 7 x32, Win 7x 64, Win 8.1 x32, Win 8.1 x64, Win 10 x32, Win 10 x64
- Linux : Ubuntu 16.04 x64 , Ubuntu 15.04 x32
- Mac OS X 10.12
Test Scenarios: running FF in the above environments with:
-Mac+ Ubuntu - native firewalls, native screen readers
-Windows: KAV 10.3.0, NVDA
Andrew, Bill could you list that pref. that is mentioned in comment 11 if that's still applicable?
Could you also please advise if there are any variations to the above scenarios or any other scenarios or setups that would be relevant to test against? Is there anything related to content processes that would need to be tested during this test run?
Flags: needinfo?(overholt)
Flags: needinfo?(WJMII)
Comment 29•7 years ago
|
||
I'll let Bill answer.
Flags: needinfo?(wmccloskey)
Flags: needinfo?(overholt)
Flags: needinfo?(WJMII)
Assignee | ||
Comment 30•7 years ago
|
||
The pref is set by default, so it's fine to test a normal Nightly here. The testing setup sounds good, although the more AV/accessibility software we can test with, the better. I think that skipping Windows 8 should be fine. Probably no testing on Linux is needed at all.
Flags: needinfo?(wmccloskey)
Assignee | ||
Updated•7 years ago
|
Assignee | ||
Comment 31•7 years ago
|
||
I'm going to be disabling this for 57 in bug 1384336. I'll try to sort out these regressions and hopefully we can re-enable in 58.
Assignee | ||
Comment 32•7 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #31)
> I'm going to be disabling this for 57 in bug 1384336. I'll try to sort out
> these regressions and hopefully we can re-enable in 58.
That should have been bug 1397956.
Comment 33•6 years ago
|
||
This got re-enabled in bug 1426100.
You need to log in
before you can comment on or make changes to this bug.
Description
•