Closed Bug 1751177 Opened 3 years ago Closed 3 years ago

Perma toolkit/components/glean/tests/browser/browser_fog_utility.js | Cannot start Utility process? -

Categories

(Core :: IPC, defect, P5)

Desktop
Windows 7
defect

Tracking

()

RESOLVED FIXED
98 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox96 --- unaffected
firefox97 --- fixed
firefox98 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: gerard-majax)

References

Details

(Keywords: regression)

Attachments

(7 files, 1 obsolete file)

Filed by: alissy [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=364628559&repo=try
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/IScwqDqtSZKCr75hy0emgg/runs/0/artifacts/public/logs/live_backing.log


UtilityProcess seems broken because of https://searchfox.org/mozilla-central/rev/c3cbf7e56630d12443459a6bd7938358c231ce3b/xpcom/rust/gecko_logger/src/lib.rs#34 breaking like in bug 1746254
Assignee: nobody → lissyx+mozillians
OS: Unspecified → Windows 7
Hardware: Unspecified → Desktop
Version: unspecified → Trunk

Set release status flags based on info from the regressing bug 1731890

Has Regression Range: --- → yes

So I can repro that on a Firefox 64-bits, running on Windows 7 64-bits when I try to run test from bug 1745173. As mentionned in bug 1746254 comment 4, there's a call to ZwDeviceIoControlFile from BCryptGenRandom, but interestingly, I do see the call in both the case where the process dies with the sandbox enabled, as well as the case where the process works without the sandbox.

Attached file utility process crash (deleted) —

placing a breakpoint on BCryptGenRandom in utility child process with sandboxing enabled

Attached file utility process no crash (deleted) —

placing a breakpoint on BCryptGenRandom in utility child process with sandboxing disabled (I had to manually stop, tracing was continuing with the process running)

Attached file utility process still crash (deleted) —

this one is after:

  • adding WinBuiltinUsersSid to the USER_LOCKDOWN restricted token
  • allowing a few registry keys related to bcrypt
diff --git a/security/sandbox/chromium/sandbox/win/src/restricted_token_utils.cc b/security/sandbox/chromium/sandbox/win/src/restricted_token_utils.cc
--- a/security/sandbox/chromium/sandbox/win/src/restricted_token_utils.cc
+++ b/security/sandbox/chromium/sandbox/win/src/restricted_token_utils.cc
@@ -164,18 +164,20 @@ DWORD CreateRestrictedToken(HANDLE effec
       if (use_restricting_sids) {
         restricted_token.AddRestrictingSid(WinRestrictedCodeSid);
         if (unique_restricted_sid)
           restricted_token.AddRestrictingSid(Sid(unique_restricted_sid));
       }
       break;
     }
     case USER_LOCKDOWN: {
+      sid_exceptions.push_back(WinBuiltinUsersSid);
       restricted_token.AddUserSidForDenyOnly();
       if (use_restricting_sids) {
+        restricted_token.AddRestrictingSid(WinBuiltinUsersSid);
         restricted_token.AddRestrictingSid(WinNullSid);
         if (unique_restricted_sid)
           restricted_token.AddRestrictingSid(Sid(unique_restricted_sid));
       }
       break;
     }
     default: { return ERROR_BAD_ARGUMENTS; }
   }
diff --git a/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp b/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp
--- a/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp
+++ b/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp
@@ -1325,16 +1325,28 @@ bool SandboxBroker::SetSecurityLevelForU
   // Add the policy for the client side of the crash server pipe.
   result = mPolicy->AddRule(sandbox::TargetPolicy::SUBSYS_FILES,
                             sandbox::TargetPolicy::FILES_ALLOW_ANY,
                             L"\\??\\pipe\\gecko-crash-server-pipe.*");
   SANDBOX_ENSURE_SUCCESS(
       result,
       "With these static arguments AddRule should never fail, what happened?");
 
+  result = mPolicy->AddRule(sandbox::TargetPolicy::SUBSYS_REGISTRY,
+                            sandbox::TargetPolicy::REG_ALLOW_READONLY,
+                            L"HKEY_LOCAL_MACHINE\\System\\CurrentControlSet\\Control\\Cryptography\\Providers");
+
+  result = mPolicy->AddRule(sandbox::TargetPolicy::SUBSYS_REGISTRY,
+                            sandbox::TargetPolicy::REG_ALLOW_READONLY,
+                            L"HKEY_LOCAL_MACHINE\\System\\CurrentControlSet\\Control\\Cryptography\\Configuration");
+
+  result = mPolicy->AddRule(sandbox::TargetPolicy::SUBSYS_REGISTRY,
+                            sandbox::TargetPolicy::REG_ALLOW_READONLY,
+                            L"HKEY_LOCAL_MACHINE\\System\\CurrentControlSet\\Control\\CMF\\Config");
+
   switch (aSandbox) {
     case mozilla::ipc::SandboxingKind::GENERIC_UTILITY:
       // Nothing specific to perform yet?
       break;
 
     default:
       MOZ_ASSERT(false, "Invalid SandboxingKind");
       break;

One of the problem identified so far is that bcrypt!BCryptGenRandom called by rust, e.g., https://github.com/rust-lang/rust/blob/21b4a9cfdcbb1e76f4b36b5c3cfd64d627285093/library/std/src/sys/windows/rand.rs#L10-L14 and it fails, thus hitting the panic code path.

Accessing those registry keys is one of the reasons for BCryptGenRandom failure.

As mentionned by :toshi on Matrix, Windows 7 enforces a OBJ_FORCE_ACCESS_CHECK when performing ZwOpenKey calls, in cng!BCryptResolveProviders (cng.sys). It seems not to be the case on at least Windows 10.

Attachment #9259932 - Attachment description: WIP: Bug 1751177 - Add browser chrome tests for UtilityProcess → Bug 1751177 - Add browser chrome tests for UtilityProcess r?#xpcom-reviewers!
Attachment #9261159 - Attachment description: WIP: Bug 1751177 - Give sandbox permissions for BCryptGenRandom on Win7 → Bug 1751177 - Give sandbox permissions for BCryptGenRandom on Win7 r?tkikuchi!

It was not really regressed by bug 1731890, but rather it made it very visible. It is mostly certain that bug 1751094 is another facet of that.

Pushed by alissy@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/98ee4ba4b96f Add browser chrome tests for UtilityProcess r=xpcom-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/e566984baff7 Revert glean workaround for HashMap r=janerik https://hg.mozilla.org/integration/autoland/rev/1917bf0f9ff6 Give sandbox permissions for BCryptGenRandom on Win7 r=tkikuchi
Blocks: 1751094

Any chance we could consider this for a ride along in a point release for 97?

Flags: needinfo?(ryanvm)

I was under the impression that bug 1746254 was supposed to be sufficient to work around these crashes on 97. Is that not the case after all?

Flags: needinfo?(ryanvm) → needinfo?(jmathies)

So I lack visibility on the problem on the side of release, but in case it turns out bug 1746254 uplift was not enough, I'm preparing two patches that should uplift this one. Except the UtilityProcess test added here, because UtilityProcess is not a thing on release yet.
This is a try: https://treeherder.mozilla.org/jobs?repo=try&revision=157d13dde76b46543260776ac48db19c1c6c0b26

(In reply to Alexandre LISSY :gerard-majax from comment #18)

So I lack visibility on the problem on the side of release, but in case it turns out bug 1746254 uplift was not enough, I'm preparing two patches that should uplift this one. Except the UtilityProcess test added here, because UtilityProcess is not a thing on release yet.
This is a try: https://treeherder.mozilla.org/jobs?repo=try&revision=157d13dde76b46543260776ac48db19c1c6c0b26

mach try auto against release seems not to work, so here: https://treeherder.mozilla.org/jobs?repo=try&revision=45dad7d42a9730696ea52a6a85d68689eedd1943

I'm not sure which signatures are associated with the work related here, but I see similar error messages for crashes that seem to point to a continuing problem on release. Some of the signatures we get from crash ping telemetry include -

std::thread::local::fast::Key::try_initialize<T>
core::ops::function::Fn::call<T> | _tailMerge_d3dcompiler_47.dll | xul.dll | moz_xmalloc | OpenSystemPreferredAlgorithmProvider
core::ops::function::Fn::call<T> | env_logger::Builder::new
core::ops::function::Fn::call<T> | _tailMerge_d3dcompiler_47.dll | xul.dll | moz_xmalloc | bcrypt.pdb@0x14f75
core::ops::function::Fn::call<T> | _tailMerge_d3dcompiler_47.dll | xul.dll | moz_xmalloc | mozilla::ipc::NodeController::ForwardEvent
core::ops::function::Fn::call<T> | _tailMerge_d3dcompiler_47.dll | xul.dll | moz_xmalloc | bcrypt.pdb@0x14fd5
core::ops::function::Fn::call<T> | je_malloc | moz_xmalloc | _tailMerge_d3dcompiler_47.dll | std::thread::local::fast::Key::try_initialize<T>
core::ops::function::Fn::call<T> | szDynamicDaylightDisabled

These tend to point to Bug 1751094 - Crash in [@ std::thread::local::fast::Key::try_initialize<T>] with "couldn't generate random bytes"

Which was marked resolved fixed as a result of the work here.

These crashes show up in the RDD and Socket processes currently, all have a top frame of RustMozCrash, and the same error message - ' couldn't generate random bytes: The operation completed successfully. (os error 0)'

My guess is the mixed signatures above are the result of oddness in the way we collect stacks for crash ping telemetry.

https://www.mathies.com/mozilla/crashes/rddrelease.html
https://www.mathies.com/mozilla/crashes/sockrelease.html

Thoughts?

Flags: needinfo?(jmathies) → needinfo?(lissyx+mozillians)

(In reply to Jim Mathies [:jimm] from comment #20)

I'm not sure which signatures are associated with the work related here, but I see similar error messages for crashes that seem to point to a continuing problem on release. Some of the signatures we get from crash ping telemetry include -

std::thread::local::fast::Key::try_initialize<T>
core::ops::function::Fn::call<T> | _tailMerge_d3dcompiler_47.dll | xul.dll | moz_xmalloc | OpenSystemPreferredAlgorithmProvider
core::ops::function::Fn::call<T> | env_logger::Builder::new
core::ops::function::Fn::call<T> | _tailMerge_d3dcompiler_47.dll | xul.dll | moz_xmalloc | bcrypt.pdb@0x14f75
core::ops::function::Fn::call<T> | _tailMerge_d3dcompiler_47.dll | xul.dll | moz_xmalloc | mozilla::ipc::NodeController::ForwardEvent
core::ops::function::Fn::call<T> | _tailMerge_d3dcompiler_47.dll | xul.dll | moz_xmalloc | bcrypt.pdb@0x14fd5
core::ops::function::Fn::call<T> | je_malloc | moz_xmalloc | _tailMerge_d3dcompiler_47.dll | std::thread::local::fast::Key::try_initialize<T>
core::ops::function::Fn::call<T> | szDynamicDaylightDisabled

These tend to point to Bug 1751094 - Crash in [@ std::thread::local::fast::Key::try_initialize<T>] with "couldn't generate random bytes"

Which was marked resolved fixed as a result of the work here.

These crashes show up in the RDD and Socket processes currently, all have a top frame of RustMozCrash, and the same error message - ' couldn't generate random bytes: The operation completed successfully. (os error 0)'

My guess is the mixed signatures above are the result of oddness in the way we collect stacks for crash ping telemetry.

https://www.mathies.com/mozilla/crashes/rddrelease.html
https://www.mathies.com/mozilla/crashes/sockrelease.html

Thoughts?

Well, as I said, I've asked whether https://bugzilla.mozilla.org/show_bug.cgi?id=1746254 would be enough to be uplifted and it was unclear.

Flags: needinfo?(lissyx+mozillians)

Can I ask you to have a look at https://treeherder.mozilla.org/jobs?repo=try&revision=45dad7d42a9730696ea52a6a85d68689eedd1943 ? It's mostly finished, from what I verified it seems only known intermittents, but I'm not so used to properly verify that, so another pair of eyes could be welcome.

Flags: needinfo?(ryanvm)

lgtm

Flags: needinfo?(ryanvm)

Comment on attachment 9263568 [details]
Bug 1751177 - Give sandbox permissions for BCryptGenRandom on Win7 r=tkikuchi

Beta/Release Uplift Approval Request

  • User impact if declined: child process crash (content, rdd, socket)
  • Is this code covered by automated tests?: Yes (on nightly)
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Small change, tested on try/release: https://treeherder.mozilla.org/jobs?repo=try&revision=45dad7d42a9730696ea52a6a85d68689eedd1943
  • String changes made/needed: N/A
Attachment #9263568 - Flags: approval-mozilla-release?
Attachment #9263567 - Flags: approval-mozilla-release?

Comment on attachment 9263567 [details]
Bug 1751177 - Revert glean workaround for HashMap r=janerik

Low-risk crash fix, approved for 97.0.1.

Attachment #9263567 - Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #9263568 - Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #9263568 - Attachment is obsolete: true

(In reply to Ryan VanderMeulen [:RyanVM] from comment #26)

https://hg.mozilla.org/releases/mozilla-release/rev/c9241cacf79e
https://hg.mozilla.org/releases/mozilla-release/rev/724b2d7ef3a9

Looking at crash ping telemetry in the RDD and socket processes, the .1 release appears unaffected. Great work!

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: