Closed
Bug 1506776
Opened 6 years ago
Closed 6 years ago
17.78% Base Content Resident Unique Memory (osx-10-10) regression on push 22d335fc020fb0509c67c4a945db21a27b8332d4 (Sat Nov 10 2018)
Categories
(Core :: Security: Process Sandboxing, defect, P1)
Tracking
()
VERIFIED
FIXED
mozilla65
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox63 | --- | unaffected |
firefox64 | --- | unaffected |
firefox65 | + | fixed |
People
(Reporter: igoldan, Assigned: haik)
References
Details
(Keywords: perf, regression)
Attachments
(1 file)
(deleted),
text/x-phabricator-request
|
Details |
We have detected an awsy regression from push:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=e51037702c9e8febad9b39f219fec3f22a6d7f26&tochange=22d335fc020fb0509c67c4a945db21a27b8332d4
As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
18% Base Content Resident Unique Memory osx-10-10 opt stylo 22,576,298.67 -> 26,590,549.33
You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=17483
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.
To learn more about the regressing test(s), please see: https://wiki.mozilla.org/AWSY/Tests
Reporter | ||
Updated•6 years ago
|
Component: General → Security: Process Sandboxing
Product: Testing → Core
Reporter | ||
Updated•6 years ago
|
Flags: needinfo?(haftandilian)
Comment 1•6 years ago
|
||
This is a pretty huge regression from a Fission Memshrink point of view, I think we might need to backout.
Assignee | ||
Comment 2•6 years ago
|
||
I'll be looking into this. I first need to understand at what point in content process startup do we snapshot the memory size. Starting the sandbox earlier (which is enabled with bug 1501126) is going to use memory earlier in the content process startup than before.
Comment 3•6 years ago
|
||
The diff is pretty weird. 'explicit', 'resident', and 'vsize' went down but 'resident-unique' went up.
> Web Content (pid NNN)
> Explicit Allocations
>
> -1.82 MB (100.0%) -- explicit
> ├──-1.77 MB (96.98%) ── heap-unclassified
> ├──-0.06 MB (03.38%) -- heap-overhead
> │ ├──-0.09 MB (04.94%) ── page-cache [2]
> │ ├───0.03 MB (-01.55%) ── bin-unused [2]
> │ └───0.00 MB (-00.01%) ── bookkeeping [2]
> └───0.01 MB (-00.36%) ++ (3 tiny)
>
> Other Measurements
>
> 0 (100.0%) ++ extensions
>
> -1.82 MB (100.0%) -- heap-committed
> ├──-1.76 MB (96.62%) ── allocated [2]
> └──-0.06 MB (03.38%) ── overhead [2]
>
> 0.01 MB (100.0%) -- js-main-runtime
> ├──0.01 MB (101.12%) -- zones
> │ ├──0.01 MB (114.29%) ── type-pool [2]
> │ ├──-0.00 MB (-15.29%) -- shapes
> │ │ ├──-0.00 MB (-14.29%) ── malloc-heap/tree-kids [2]
> │ │ └──-0.00 MB (-01.00%) ++ gc-heap
> │ └──0.00 MB (02.12%) ── unused-gc-things [2]
> └──-0.00 MB (-01.12%) ── realms/classes/objects/gc-heap [2]
>
> 0.00 MB (100.0%) -- js-main-runtime-gc-heap-committed
> ├──0.00 MB (100.0%) -- unused/gc-things
> │ ├──0.00 MB (100.0%) ── objects [2]
> │ ├──0.00 MB (100.0%) ── shapes [2]
> │ └──0.00 MB (100.0%) ── base-shapes [2]
> └──-0.00 MB (100.0%) -- used/gc-things
> ├──-0.00 MB (100.0%) ── objects [2]
> ├──-0.00 MB (100.0%) ── shapes [2]
> └──-0.00 MB (100.0%) ── base-shapes [2]
>
> -1.76 MB ── heap-allocated [2]
> 2,282 ── page-faults-soft [2]
> -6.25 MB ── resident [2]
> -4.48 MB ── resident-peak [2]
> 7.03 MB ── resident-unique [2]
> -25.96 MB ── vsize [2]
Updated•6 years ago
|
status-firefox63:
--- → unaffected
status-firefox64:
--- → unaffected
status-firefox65:
--- → affected
status-firefox-esr60:
--- → unaffected
tracking-firefox65:
--- → +
Assignee | ||
Updated•6 years ago
|
Assignee: nobody → haftandilian
Flags: needinfo?(haftandilian)
Priority: -- → P1
Comment 4•6 years ago
|
||
How much do we know about the services we're blocking access to with the early sandbox startup? Could there be something that normally accesses shared memory but falls back to making its own copy when that fails? If it's something like that, comparing the output of vmmap(1) before/after could help find what's different.
Assignee | ||
Comment 5•6 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧ from comment #4)
> How much do we know about the services we're blocking access to with the
> early sandbox startup? Could there be something that normally accesses
> shared memory but falls back to making its own copy when that fails? If
> it's something like that, comparing the output of vmmap(1) before/after
> could help find what's different.
Yes, that's definitely possible and I think I've come across some instances of that before--where without access to a service, the process creates its own instances of things. Once I can reproduce this, vmmap is first on my list of things to check.
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 6•6 years ago
|
||
I've found that adding back access to the coreservicesd daemon via com.apple.CoreServices.coreservicesd improves the "Base Content Resident Unique Memory" measurement bringing it back in line with the measurement prior to enabling early sandbox startup.
This compares earlyinit=true (base) with earlyinit=true + coreservicesd:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3f0e50d050e7007f8b15fb1f6f2434a2f3b45f49&newProject=try&newRevision=5c7d54b858be6213be146661bd752fcc095a960b&framework=4
This compares earlyinit=false (base) with earlyinit=true + coreservicesd:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=28035db7d9a0&newProject=try&newRevision=5c7d54b858be6213be146661bd752fcc095a960b&framework=4
coreservicesd is sandboxed, but unfortunately its sandbox policy is highly permissive allowing global filesystem read access, filesystem write access to temp and cache dirs, outbound network access and more. If we added this back, it would still be a win to be dropping access to the unsandboxed dock daemon along with distnoted.
I'm going to continue debugging to try to get a better understanding of what's happening and to see if we can realize the same memory savings without granting access to coreservicesd.
Assignee | ||
Comment 7•6 years ago
|
||
I have not found a workaround for enabling CoreServices so I plan to add back CoreServices to the content policy and file a new bug to cover investigating this more. See below for some debugging details.
The memory impact of blocking CoreServices seems to be less on newer OS versions. To investigate this further, I think we should compare the memory impact and possibly turn on blocking just for newer OS versions. In addition, tracing all allocations and trying to see where/why we do the additional allocations when CoreServices is blocked may lead to some discovery.
On 10.10, comparing vmmap output between an about:blank content process without allowing CoreServices vs allowing Coreservices, the total writable regions is larger by ~4 MB when CoreServices is blocked which is consistent with the difference flagged by the awsy base test "Base Content Resident Unique Memory" (ab on try). I don't have an explanation for why just the awsy base test regressed, but not the "after tabs open" test. More debugging needed.
There are also some memory mapped files that I only see when CoreServices is allowed, but they are small and don't appear to be causing the difference.
Diff of vmmap output:
*** vmmap-base.51799
--- vmmap-with-core-services.51862
! ==== Summary for process 51799
ReadOnly portion of Libraries: Total=345.5M resident=0K(0%) swapped_out_or_unallocated=345.5M(100%)
! Writable regions: Total=61.4M written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=61.4M(100%)
VIRTUAL REGION
REGION TYPE SIZE COUNT (non-coalesced)
=========== ======= =======
CG image 4K 2
CG shared images 32K 2
! CoreServices 500K 14
--
! ==== Summary for process 51862
ReadOnly portion of Libraries: Total=345.5M resident=0K(0%) swapped_out_or_unallocated=345.5M(100%)
! Writable regions: Total=56.7M written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=56.7M(100%)
VIRTUAL REGION
REGION TYPE SIZE COUNT (non-coalesced)
=========== ======= =======
CG image 4K 2
CG shared images 32K 2
! CoreServices 128K 4
I looked into trying to avoid triggering the connection to CoreServices. We first connect in ContentChild::Init() when we call SetProcessName(). With that code removed, we then try to connect in nsURLHelperOSX.cpp calling ::FSGetVolumeInfo() for some HFS-specific path checks. And with that code removed, we try to connect when calling [NSApplication sharedApplication] from nsAppShell::Init().
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
* frame #0 libxpc.dylib`bootstrap_look_up3()
frame #1 libxpc.dylib`bootstrap_look_up2()
frame #2 CarbonCore`SCClientSession::checkinWithServer()
frame #3 CarbonCore`connectToCoreServicesD()
frame #4 CarbonCore`getStatus()
frame #5 CarbonCore`getSystemSession()
frame #6 CarbonCore`scCreateSystemCacheable()
frame #7 CarbonCore`___ZL17_CLV_GetCacheablev_block_invoke()
frame #8 libdispatch.dylib`_dispatch_client_callout()
frame #9 libdispatch.dylib`_dispatch_once_callout()
frame #10 CarbonCore`_CLV_GetCacheable()
frame #11 CarbonCore`_standalone_CopyCheckFixForBundleID()
frame #12 CarbonCore`GetBugsForOurBundleIDFromCoreservicesd()
frame #13 CarbonCore`_CSCheckFix()
frame #14 SkyLight`initDisplayState()
frame #15 SkyLight`initDisplayMappings()
frame #16 SkyLight`__SLSInitialize_block_invoke()
frame #17 libdispatch.dylib`_dispatch_client_callout()
frame #18 libdispatch.dylib`_dispatch_once_callout()
frame #19 SkyLight`CGS_CHECK_INIT()
frame #20 SkyLight`SLSMainConnectionID()
frame #21 CoreFoundation`_CFAppSleepSetupCoreGraphics()
frame #22 CoreFoundation`____CFRunLoopSetOptionsReason_block_invoke_2()
frame #23 libdispatch.dylib`_dispatch_client_callout()
frame #24 libdispatch.dylib`_dispatch_once_callout()
frame #25 CoreFoundation`__CFRunLoopSetOptionsReason()
frame #26 LaunchServices`_LSApplicationCheckIn()
frame #27 HIServices`_RegisterApplication()
frame #28 HIServices`GetCurrentProcess()
frame #29 HIToolbox`MenuBarInstance::GetAggregateUIMode()
frame #30 HIToolbox`MenuBarInstance::IsVisible()
frame #31 AppKit`_NSInitializeAppContext()
frame #32 AppKit`-[NSApplication init]()
frame #33 AppKit`+[NSApplication sharedApplication]()
frame #34 XUL`nsAppShell::Init() at nsAppShell.mm:337
frame #35 XUL`nsAppShellInit() at nsAppShellSingleton.h:45
frame #36 XUL`nsComponentManagerImpl::KnownModule::Load() at nsComponentManager.cpp:811
frame #37 XUL`nsFactoryEntry::GetFactory() at nsComponentManager.cpp:1811
frame #38 XUL`nsComponentManagerImpl::CreateInstance() at nsComponentManager.cpp:1048
frame #39 XUL`nsComponentManagerImpl::GetService() at nsComponentManager.cpp:1293
frame #40 XUL`nsGetServiceByCID::operator()() at nsComponentManagerUtils.cpp:56
frame #41 XUL`nsGetServiceByCID::operator()() at nsComponentManagerUtils.cpp:253
frame #42 XUL`nsCOMPtr_base::assign_from_gs_cid() at nsCOMPtr.cpp:73
frame #43 XUL`XRE_RunAppShell() at nsCOMPtr.h:660
frame #44 XUL`XRE_RunAppShell() at nsCOMPtr.h:657
frame #45 XUL`XRE_RunAppShell() at nsEmbedFunctions.cpp:911
frame #46 XUL`MessageLoop::Run() at message_loop.cc:325
frame #47 XUL`MessageLoop::Run() at message_loop.cc:318
frame #48 XUL`MessageLoop::Run() at message_loop.cc:298
frame #49 XUL`XRE_InitChildProcess() at nsEmbedFunctions.cpp:780
frame #50 plugin-container`main() at plugin-container.cpp:50
frame #51 plugin-container`main() at MozillaRuntimeMain.cpp:25
frame #52 libdyld.dylib`start()
frame #53 libdyld.dylib`start()
Assignee | ||
Comment 8•6 years ago
|
||
Add back access to com.apple.CoreServices.coreservicesd to avoid extra allocations that are triggered when the service is blocked.
Pushed by haftandilian@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/823acff9cd2f
17.78% Base Content Resident Unique Memory regression r=Alex_Gaynor
Comment 10•6 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Assignee | ||
Updated•6 years ago
|
Reporter | ||
Updated•6 years ago
|
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•