Perma toolkit/components/glean/tests/browser/browser_fog_utility.js | Cannot start Utility process? -
Categories
(Core :: IPC, defect, P5)
Tracking
Release | Tracking | Status |
---|---|---|
firefox-esr91 | --- | unaffected |
firefox96 | --- | unaffected |
firefox97 | --- | fixed |
firefox98 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: gerard-majax)
References
Details
(Keywords: regression)
Attachments
(7 files, 1 obsolete file)
(deleted), text/x-phabricator-request | Details
(deleted), text/plain | Details
(deleted), text/plain | Details
(deleted), text/plain | Details
(deleted), text/x-phabricator-request | Details
(deleted), text/x-phabricator-request | Details
(deleted), text/x-phabricator-request, RyanVM: approval-mozilla-release+ | Details
Filed by: alissy [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=364628559&repo=try
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/IScwqDqtSZKCr75hy0emgg/runs/0/artifacts/public/logs/live_backing.log
UtilityProcess seems broken because https://searchfox.org/mozilla-central/rev/c3cbf7e56630d12443459a6bd7938358c231ce3b/xpcom/rust/gecko_logger/src/lib.rs#34 breaks in the same way as in bug 1746254.
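For reference, here is a minimal standalone sketch (an editor's illustration, not Firefox code) of the OS call that ends up failing in the sandboxed child: as detailed in comment 7 below, Rust's standard library makes an equivalent BCryptGenRandom request when it first needs OS randomness and panics if it fails.

// Editor's sketch: probe the system-preferred RNG the way Rust std does.
// Build with MSVC; bcrypt.lib provides BCryptGenRandom.
#include <windows.h>
#include <bcrypt.h>
#include <cstdio>
#pragma comment(lib, "bcrypt.lib")

int main() {
  unsigned char buf[16] = {};
  NTSTATUS status = BCryptGenRandom(
      nullptr,                           // no explicit algorithm handle
      buf, sizeof(buf),
      BCRYPT_USE_SYSTEM_PREFERRED_RNG);  // ask for the system-preferred RNG
  // Failure is reported only through the NTSTATUS return value; the thread's
  // last-error code is left untouched, which is why the Rust panic message
  // quoted later in this bug ends with "(os error 0)".
  std::printf("BCryptGenRandom -> 0x%08lx, GetLastError() = %lu\n",
              static_cast<unsigned long>(status), GetLastError());
  return BCRYPT_SUCCESS(status) ? 0 : 1;
}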
Comment 1•3 years ago (Assignee)
Comment 2•3 years ago
Set release status flags based on info from the regressing bug 1731890
Comment hidden (Intermittent Failures Robot)
Comment 4•3 years ago (Assignee)
So I can reproduce this with a 64-bit Firefox running on 64-bit Windows 7 when I try to run the test from bug 1745173. As mentioned in bug 1746254 comment 4, there is a call to ZwDeviceIoControlFile from BCryptGenRandom, but interestingly, I see that call both when the process dies with the sandbox enabled and when the process works without the sandbox.
Comment 5•3 years ago (Assignee)
placing a breakpoint on BCryptGenRandom in utility child process with sandboxing enabled
Comment 6•3 years ago (Assignee)
placing a breakpoint on BCryptGenRandom in utility child process with sandboxing disabled (I had to stop it manually; tracing was continuing with the process running)
Comment 7•3 years ago (Assignee)
this one is after:
- adding WinBuiltinUsersSid to the USER_LOCKDOWN restricted token
- allowing a few registry keys related to bcrypt
diff --git a/security/sandbox/chromium/sandbox/win/src/restricted_token_utils.cc b/security/sandbox/chromium/sandbox/win/src/restricted_token_utils.cc
--- a/security/sandbox/chromium/sandbox/win/src/restricted_token_utils.cc
+++ b/security/sandbox/chromium/sandbox/win/src/restricted_token_utils.cc
@@ -164,18 +164,20 @@ DWORD CreateRestrictedToken(HANDLE effec
if (use_restricting_sids) {
restricted_token.AddRestrictingSid(WinRestrictedCodeSid);
if (unique_restricted_sid)
restricted_token.AddRestrictingSid(Sid(unique_restricted_sid));
}
break;
}
case USER_LOCKDOWN: {
+ sid_exceptions.push_back(WinBuiltinUsersSid);
restricted_token.AddUserSidForDenyOnly();
if (use_restricting_sids) {
+ restricted_token.AddRestrictingSid(WinBuiltinUsersSid);
restricted_token.AddRestrictingSid(WinNullSid);
if (unique_restricted_sid)
restricted_token.AddRestrictingSid(Sid(unique_restricted_sid));
}
break;
}
default: { return ERROR_BAD_ARGUMENTS; }
}
diff --git a/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp b/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp
--- a/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp
+++ b/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp
@@ -1325,16 +1325,28 @@ bool SandboxBroker::SetSecurityLevelForU
// Add the policy for the client side of the crash server pipe.
result = mPolicy->AddRule(sandbox::TargetPolicy::SUBSYS_FILES,
sandbox::TargetPolicy::FILES_ALLOW_ANY,
L"\\??\\pipe\\gecko-crash-server-pipe.*");
SANDBOX_ENSURE_SUCCESS(
result,
"With these static arguments AddRule should never fail, what happened?");
+ result = mPolicy->AddRule(sandbox::TargetPolicy::SUBSYS_REGISTRY,
+ sandbox::TargetPolicy::REG_ALLOW_READONLY,
+ L"HKEY_LOCAL_MACHINE\\System\\CurrentControlSet\\Control\\Cryptography\\Providers");
+
+ result = mPolicy->AddRule(sandbox::TargetPolicy::SUBSYS_REGISTRY,
+ sandbox::TargetPolicy::REG_ALLOW_READONLY,
+ L"HKEY_LOCAL_MACHINE\\System\\CurrentControlSet\\Control\\Cryptography\\Configuration");
+
+ result = mPolicy->AddRule(sandbox::TargetPolicy::SUBSYS_REGISTRY,
+ sandbox::TargetPolicy::REG_ALLOW_READONLY,
+ L"HKEY_LOCAL_MACHINE\\System\\CurrentControlSet\\Control\\CMF\\Config");
+
switch (aSandbox) {
case mozilla::ipc::SandboxingKind::GENERIC_UTILITY:
// Nothing specific to perform yet?
break;
default:
MOZ_ASSERT(false, "Invalid SandboxingKind");
break;
One of the problems identified so far is that bcrypt!BCryptGenRandom is called by Rust, e.g. https://github.com/rust-lang/rust/blob/21b4a9cfdcbb1e76f4b36b5c3cfd64d627285093/library/std/src/sys/windows/rand.rs#L10-L14, and it fails, thus hitting the panic code path.
Not being able to access those registry keys is one of the reasons for the BCryptGenRandom failure.
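To make the registry angle concrete, the following small probe (an editor's sketch, not part of the patch above) checks whether the three keys allow-listed in the sandboxBroker.cpp hunk are readable; running it in a lockdown-token process versus a normal one should show the difference.

// Editor's sketch: check read access to the CNG-related keys the patch allow-lists.
#include <windows.h>
#include <cstdio>
#pragma comment(lib, "advapi32.lib")

static void Probe(const wchar_t* aSubkey) {
  HKEY key = nullptr;
  LONG rv = RegOpenKeyExW(HKEY_LOCAL_MACHINE, aSubkey, 0, KEY_READ, &key);
  // ERROR_SUCCESS (0) means readable; ERROR_ACCESS_DENIED (5) is what a
  // lockdown-token process would be expected to see without the new rules.
  wprintf(L"%ls -> %ld\n", aSubkey, rv);
  if (rv == ERROR_SUCCESS) {
    RegCloseKey(key);
  }
}

int main() {
  Probe(L"System\\CurrentControlSet\\Control\\Cryptography\\Providers");
  Probe(L"System\\CurrentControlSet\\Control\\Cryptography\\Configuration");
  Probe(L"System\\CurrentControlSet\\Control\\CMF\\Config");
  return 0;
}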
As mentioned by :toshi on Matrix, Windows 7 enforces OBJ_FORCE_ACCESS_CHECK when performing ZwOpenKey calls in cng!BCryptResolveProviders (cng.sys). This seems not to be the case on Windows 10, at least.
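As a rough user-mode illustration of what that flag means (an editor's sketch; the real call happens in kernel mode inside cng.sys, and from user mode the access check is performed regardless of the flag), this is roughly how an NT-native open of the providers key with OBJ_FORCE_ACCESS_CHECK looks. With the flag set, the open is checked against the calling process's token, here the utility child's lockdown token, which is why it fails on Windows 7.

// Editor's sketch only: NtOpenKey with OBJ_FORCE_ACCESS_CHECK, approximating the
// ZwOpenKey call made by cng!BCryptResolveProviders on Windows 7.
#include <windows.h>
#include <winternl.h>
#include <cstdio>
#include <cwchar>

#ifndef OBJ_CASE_INSENSITIVE
#  define OBJ_CASE_INSENSITIVE 0x00000040L
#endif
#ifndef OBJ_FORCE_ACCESS_CHECK
#  define OBJ_FORCE_ACCESS_CHECK 0x00000400L
#endif
#ifndef NT_SUCCESS
#  define NT_SUCCESS(s) (((NTSTATUS)(s)) >= 0)
#endif

using NtOpenKeyFn = NTSTATUS(NTAPI*)(PHANDLE, ACCESS_MASK, POBJECT_ATTRIBUTES);

int main() {
  auto ntOpenKey = reinterpret_cast<NtOpenKeyFn>(
      GetProcAddress(GetModuleHandleW(L"ntdll.dll"), "NtOpenKey"));
  if (!ntOpenKey) {
    return 1;
  }

  // NT-native path of the provider configuration key read during provider resolution.
  wchar_t path[] =
      L"\\Registry\\Machine\\System\\CurrentControlSet\\Control\\Cryptography\\Providers";
  UNICODE_STRING name;
  name.Buffer = path;
  name.Length = static_cast<USHORT>(wcslen(path) * sizeof(wchar_t));
  name.MaximumLength = static_cast<USHORT>(name.Length + sizeof(wchar_t));

  OBJECT_ATTRIBUTES attrs = {};
  attrs.Length = sizeof(attrs);
  attrs.ObjectName = &name;
  attrs.Attributes = OBJ_CASE_INSENSITIVE | OBJ_FORCE_ACCESS_CHECK;

  HANDLE key = nullptr;
  NTSTATUS status = ntOpenKey(&key, KEY_READ, &attrs);
  std::printf("NtOpenKey -> 0x%08lx\n", static_cast<unsigned long>(status));
  if (NT_SUCCESS(status)) {
    CloseHandle(key);  // registry keys are kernel handles; this forwards to NtClose
  }
  return NT_SUCCESS(status) ? 0 : 1;
}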
Comment 8•3 years ago (Assignee)
a very nasty first hack: https://treeherder.mozilla.org/jobs?repo=try&revision=7353e71382e45a83e8d06913561df1022d70c346
Comment 9•3 years ago (Assignee)
Depends on D136477
Comment 10•3 years ago (Assignee)
Depends on D136477
Comment 11•3 years ago (Assignee)
This was not really regressed by bug 1731890; rather, that bug made it very visible. It is almost certain that bug 1751094 is another facet of the same problem.
Comment 12•3 years ago
Comment 13•3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/98ee4ba4b96f
https://hg.mozilla.org/mozilla-central/rev/e566984baff7
https://hg.mozilla.org/mozilla-central/rev/1917bf0f9ff6
Comment 14•3 years ago
Any chance we could consider this as a ride-along in a point release for 97?
Comment 15•3 years ago
I was under the impression that bug 1746254 was supposed to be sufficient to work around these crashes on 97. Is that not the case after all?
Comment 16•3 years ago (Assignee)
Comment 17•3 years ago (Assignee)
Depends on D138589
Comment 18•3 years ago (Assignee)
So I lack visibility into the problem on the release side, but in case it turns out that the bug 1746254 uplift was not enough, I'm preparing two patches so that this one can be uplifted, minus the UtilityProcess test added here, because UtilityProcess is not a thing on release yet.
This is a try: https://treeherder.mozilla.org/jobs?repo=try&revision=157d13dde76b46543260776ac48db19c1c6c0b26
Comment 19•3 years ago (Assignee)
(In reply to Alexandre LISSY :gerard-majax from comment #18)
> So I lack visibility into the problem on the release side, but in case it turns out that the bug 1746254 uplift was not enough, I'm preparing two patches so that this one can be uplifted, minus the UtilityProcess test added here, because UtilityProcess is not a thing on release yet.
> This is a try: https://treeherder.mozilla.org/jobs?repo=try&revision=157d13dde76b46543260776ac48db19c1c6c0b26
mach try auto against release seems not to work, so here: https://treeherder.mozilla.org/jobs?repo=try&revision=45dad7d42a9730696ea52a6a85d68689eedd1943
Comment 20•3 years ago
I'm not sure which signatures are associated with the work done here, but I see similar error messages for crashes that seem to point to a continuing problem on release. Some of the signatures we get from crash ping telemetry include:
std::thread::local::fast::Key::try_initialize<T>
core::ops::function::Fn::call<T> | _tailMerge_d3dcompiler_47.dll | xul.dll | moz_xmalloc | OpenSystemPreferredAlgorithmProvider
core::ops::function::Fn::call<T> | env_logger::Builder::new
core::ops::function::Fn::call<T> | _tailMerge_d3dcompiler_47.dll | xul.dll | moz_xmalloc | bcrypt.pdb@0x14f75
core::ops::function::Fn::call<T> | _tailMerge_d3dcompiler_47.dll | xul.dll | moz_xmalloc | mozilla::ipc::NodeController::ForwardEvent
core::ops::function::Fn::call<T> | _tailMerge_d3dcompiler_47.dll | xul.dll | moz_xmalloc | bcrypt.pdb@0x14fd5
core::ops::function::Fn::call<T> | je_malloc | moz_xmalloc | _tailMerge_d3dcompiler_47.dll | std::thread::local::fast::Key::try_initialize<T>
core::ops::function::Fn::call<T> | szDynamicDaylightDisabled
These tend to point to bug 1751094 - Crash in [@ std::thread::local::fast::Key::try_initialize<T>] with "couldn't generate random bytes", which was marked RESOLVED FIXED as a result of the work here.
These crashes show up in the RDD and Socket processes currently; all have a top frame of RustMozCrash and the same error message: 'couldn't generate random bytes: The operation completed successfully. (os error 0)'
My guess is the mixed signatures above are the result of oddness in the way we collect stacks for crash ping telemetry.
https://www.mathies.com/mozilla/crashes/rddrelease.html
https://www.mathies.com/mozilla/crashes/sockrelease.html
Thoughts?
Comment 21•3 years ago (Assignee)
(In reply to Jim Mathies [:jimm] from comment #20)
> I'm not sure which signatures are associated with the work done here, but I see similar error messages for crashes that seem to point to a continuing problem on release. Some of the signatures we get from crash ping telemetry include:
> std::thread::local::fast::Key::try_initialize<T>
> core::ops::function::Fn::call<T> | _tailMerge_d3dcompiler_47.dll | xul.dll | moz_xmalloc | OpenSystemPreferredAlgorithmProvider
> core::ops::function::Fn::call<T> | env_logger::Builder::new
> core::ops::function::Fn::call<T> | _tailMerge_d3dcompiler_47.dll | xul.dll | moz_xmalloc | bcrypt.pdb@0x14f75
> core::ops::function::Fn::call<T> | _tailMerge_d3dcompiler_47.dll | xul.dll | moz_xmalloc | mozilla::ipc::NodeController::ForwardEvent
> core::ops::function::Fn::call<T> | _tailMerge_d3dcompiler_47.dll | xul.dll | moz_xmalloc | bcrypt.pdb@0x14fd5
> core::ops::function::Fn::call<T> | je_malloc | moz_xmalloc | _tailMerge_d3dcompiler_47.dll | std::thread::local::fast::Key::try_initialize<T>
> core::ops::function::Fn::call<T> | szDynamicDaylightDisabled
> These tend to point to bug 1751094 - Crash in [@ std::thread::local::fast::Key::try_initialize<T>] with "couldn't generate random bytes", which was marked RESOLVED FIXED as a result of the work here.
> These crashes show up in the RDD and Socket processes currently; all have a top frame of RustMozCrash and the same error message: 'couldn't generate random bytes: The operation completed successfully. (os error 0)'
> My guess is the mixed signatures above are the result of oddness in the way we collect stacks for crash ping telemetry.
> https://www.mathies.com/mozilla/crashes/rddrelease.html
> https://www.mathies.com/mozilla/crashes/sockrelease.html
> Thoughts?
Well, as I said, I asked whether uplifting https://bugzilla.mozilla.org/show_bug.cgi?id=1746254 would be enough, and it was unclear.
Comment 22•3 years ago (Assignee)
Can I ask you to have a look at https://treeherder.mozilla.org/jobs?repo=try&revision=45dad7d42a9730696ea52a6a85d68689eedd1943? It's mostly finished; from what I checked, only known intermittents remain, but I'm not used to verifying that properly, so another pair of eyes would be welcome.
Comment 24•3 years ago (Assignee)
Comment on attachment 9263568 [details]
Bug 1751177 - Give sandbox permissions for BCryptGenRandom on Win7 r=tkikuchi
Beta/Release Uplift Approval Request
- User impact if declined: child process crash (content, rdd, socket)
- Is this code covered by automated tests?: Yes (on nightly)
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Small change, tested on try/release: https://treeherder.mozilla.org/jobs?repo=try&revision=45dad7d42a9730696ea52a6a85d68689eedd1943
- String changes made/needed: N/A
Comment 25•3 years ago
Comment on attachment 9263567 [details]
Bug 1751177 - Revert glean workaround for HashMap r=janerik
Low-risk crash fix, approved for 97.0.1.
Comment 26•3 years ago
bugherder uplift
https://hg.mozilla.org/releases/mozilla-release/rev/c9241cacf79e
https://hg.mozilla.org/releases/mozilla-release/rev/724b2d7ef3a9
Comment 27•3 years ago
(In reply to Ryan VanderMeulen [:RyanVM] from comment #26)
> https://hg.mozilla.org/releases/mozilla-release/rev/c9241cacf79e
> https://hg.mozilla.org/releases/mozilla-release/rev/724b2d7ef3a9
Looking at crash ping telemetry in the RDD and socket processes, the .1 release appears unaffected. Great work!