Closed Bug 1686138 Opened 4 years ago Closed 3 years ago

ThreadSanitizer: data race [@ PR_CallOnce] vs. [@ PR_CallOnce]

Categories

(NSPR :: NSPR, defect, P1)

Tracking

(firefox-esr78- wontfix, firefox86 wontfix, firefox87 wontfix, firefox88 wontfix, firefox89 wontfix, firefox90 fixed)

RESOLVED FIXED

People

(Reporter: tsmith, Assigned: keeler)

References

(Blocks 1 open bug)

Details

(Keywords: csectype-race, sec-moderate, testcase, Whiteboard: [fuzzblocker][sec-survey][post-critsmash-triage][adv-main90+r])

Attachments

(3 files)

Attached file testcase.html (deleted)

The attached crash information was detected by ThreadSanitizer while fuzzing on mozilla-central 20210107-958d142c083c.

General information about TSan reports

Why fix races?

Data races are undefined behavior and can cause crashes as well as correctness issues. Compiler optimizations can cause racy code to have unpredictable and hard-to-reproduce behavior.

Rating

If you think this race can cause crashes or correctness issues, please rate the bug appropriately as P1/P2 and/or say so in the bug. This makes it much easier for us to assess the actual impact of these reports and whether they are helpful to you.

False Positives / Benign Races

Typically, races reported by TSan are not false positives [1], but it is possible that the race is benign. Even in this case it would be nice to come up with a fix if it is easily doable and does not regress performance. Every race that we cannot fix will have to remain on the suppression list and slow down overall TSan performance. Also note that seemingly benign races can be harmful, depending on the compiler, optimizations, and architecture [2][3].

[1] One major exception is the involvement of uninstrumented code from third-party libraries.
[2] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[3] How to miscompile programs with "benign" data races: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf

Suppressing unfixable races

If the bug cannot be fixed, then a runtime suppression needs to be added in mozglue/build/TsanOptions.cpp. Suppressions match on the full stack, so each one should be chosen so that it is unique to this particular race. The bug number of this bug should also be included so we have some documentation on why the suppression was added.
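
For illustration only, a suppression entry might look roughly like the sketch below. It relies on the ThreadSanitizer runtime hook `__tsan_default_suppressions` and the `race:` pattern syntax. The entry itself and the helper `suppression_present` are hypothetical and are not what is in the tree. The sketch is written in C for brevity even though TsanOptions.cpp is C++.

```c
#include <assert.h>
#include <string.h>

/* Hook that the ThreadSanitizer runtime consults for built-in suppressions.
 * The entry below is a hypothetical example, not the real list. */
const char *__tsan_default_suppressions(void) {
    return
        /* Record the bug number so the reason for the entry is documented. */
        "# Bug 1686138 - PR_CallOnce vs. PR_CallOnce\n"
        /* "race:" suppresses reports whose stack contains a matching frame. */
        "race:PR_CallOnce\n";
}

/* Hypothetical helper: returns 1 if the given pattern appears in the list. */
int suppression_present(const char *pattern) {
    assert(pattern != NULL);
    return strstr(__tsan_default_suppressions(), pattern) != NULL;
}
```

A pattern as broad as a single function name would hide every race that passes through that function, which is one reason to prefer a fix over a suppression.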

Flags: in-testsuite?
Attached file Detailed Crash Information (deleted)

Looks like PR_CallOnce (which isn't threadsafe, despite using the trappings of thread safety) is being used to set useFreeList in PORT_FreeArena. Presumably that could go earlier in initialization when we're hoping that multiple threads aren't running?

Component: DOM: Security → NSPR
Flags: in-testsuite?
Product: Core → NSPR
Version: unspecified → other

Would a Pernosco session be helpful in this case?

The fuzzers are frequently tripping over this issue, and it has been marked as a fuzzblocker[1]. Please prioritize this issue accordingly.

[1] https://firefox-source-docs.mozilla.org/tools/fuzzing/index.html#fuzz-blockers

Whiteboard: [fuzzblocker]

I don't think pernosco is necessary here - it's pretty clear we need to ensure NSS calls SetupUseFreeList early in init, when in theory there aren't multiple threads running in NSS.

Assignee: nobody → nobody
Severity: -- → S1
Component: NSPR → Libraries
Flags: needinfo?(dkeeler)
Product: NSPR → NSS
Whiteboard: [fuzzblocker] → [nss-fx]

Actually, we really should just fix PR_CallOnce.
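
One way to make a call-once primitive thread safe is to read and write its state only while holding a lock. The following is a minimal sketch of that idea using POSIX threads. It is not the NSPR implementation or the actual patch: `once_t`, `call_once_locked`, and the demo helpers are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a call-once control block (not PRCallOnceType). */
typedef struct {
    int initialized;  /* set after func() has finished */
    int in_progress;  /* set by the thread that claims the call */
    int status;       /* result of func(), valid once initialized is set */
} once_t;

typedef int (*once_fn)(void);

static pthread_mutex_t once_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t once_cv = PTHREAD_COND_INITIALIZER;

/* Every read and write of *once happens with once_lock held. */
int call_once_locked(once_t *once, once_fn func) {
    int status;
    pthread_mutex_lock(&once_lock);
    if (!once->initialized) {
        if (!once->in_progress) {
            once->in_progress = 1;
            /* Drop the lock while running func() so waiters can block on the cv. */
            pthread_mutex_unlock(&once_lock);
            status = func();
            pthread_mutex_lock(&once_lock);
            once->status = status;
            once->initialized = 1;
            pthread_cond_broadcast(&once_cv);
        } else {
            /* Another thread claimed the call; wait until it has finished. */
            while (!once->initialized) {
                pthread_cond_wait(&once_cv, &once_lock);
            }
        }
    }
    status = once->status;
    pthread_mutex_unlock(&once_lock);
    return status;
}

/* Demo: several threads race to initialize; the function must run once. */
static int init_calls;
static once_t demo_once;

static int demo_init(void) {
    init_calls++;  /* only ever executed by the single claiming thread */
    return 0;
}

static void *demo_worker(void *arg) {
    (void)arg;
    int status = call_once_locked(&demo_once, demo_init);
    assert(status == 0);
    return NULL;
}

/* Starts nthreads workers (at most 16) and returns how often demo_init ran. */
int run_once_demo(int nthreads) {
    pthread_t threads[16];
    assert(nthreads > 0 && nthreads <= 16);
    for (int i = 0; i < nthreads; i++) {
        pthread_create(&threads[i], NULL, demo_worker, NULL);
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(threads[i], NULL);
    }
    return init_calls;
}
```

Calling `run_once_demo(8)` should report that the initializer ran exactly once. Taking the lock on every call costs more than an unlocked fast-path check, but it avoids unsynchronized reads of the state fields, which is the kind of access TSan reports.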

Assignee: nobody → nobody
Component: Libraries → NSPR
Product: NSS → NSPR
Whiteboard: [nss-fx] → [fuzzblocker]
Assignee: nobody → dkeeler
Status: NEW → ASSIGNED

Shall we target Firefox 90, or do you have other preferences?

You have marked this as a security issue. Are you worried it is exploitable?

Dana, will you file the sec-approval request?

Please let me know when we're good to commit the fix to NSPR, I'm happy to land and drive uplifts with a NSPR beta.

Comment on attachment 9214837 [details]
Bug 1686138 - lock access to PRCallOnceType members in PR_CallOnce* for thread safety r?kaie

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: I think it would be rather difficult to exploit this.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: Yes
  • Which older supported branches are affected by this flaw?: all
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: This file hasn't changed in a year and a half, so this patch will probably apply cleanly to all supported branches.
  • How likely is this patch to cause regressions; how much testing does it need?: Unlikely - if there is an issue, it should be obvious pretty quickly.
Attachment #9214837 - Flags: sec-approval?

Targeting 90 sounds good to me.

Comment on attachment 9214837 [details]
Bug 1686138 - lock access to PRCallOnceType members in PR_CallOnce* for thread safety r?kaie

Approved to land and request uplift. Not setting the flags because I don't know if there's an unusual uplift procedure for NSPR.

Attachment #9214837 - Flags: sec-approval? → sec-approval+
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 4.31

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #12)

Targeting 90 sounds good to me.

setting 89 beta to wontfix

Group: crypto-core-security → core-security-release

Actually, I might have been premature setting wontfix for ESR78 given the sec-high rating. We're using NSPR 4.25.1 on ESR78 - is this something we should consider backporting and spinning a 4.25.2 release for?

Flags: needinfo?(dkeeler)
Blocks: 1708093

As part of a security bug pattern analysis, we are requesting your help with a high level analysis of this bug. It is our hope to develop static analysis (or potentially runtime/dynamic analysis) in the future to identify classes of bugs.

Please visit this Google form to reply.

Flags: needinfo?(dkeeler)
Whiteboard: [fuzzblocker] → [fuzzblocker][sec-survey]

(In reply to Ryan VanderMeulen [:RyanVM] from comment #16)

Actually, I might have been premature setting wontfix for ESR78 given the sec-high rating. We're using NSPR 4.25.1 on ESR78 - is this something we should consider backporting and spinning a 4.25.2 release for?

We could, to be safe. I'm having a hard time coming up with a way this could be exploitable/dangerous. So far all I've got is the existing code could hang, which isn't great, but at least it isn't RCE or something.

Flags: needinfo?(dkeeler)

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #18)

We could, to be safe. I'm having a hard time coming up with a way this could be exploitable/dangerous. So far all I've got is the existing code could hang, which isn't great, but at least it isn't RCE or something.

Should we revisit the severity rating in that case?

Flags: needinfo?(dveditz)

NSS isn't the only user of PR_CallOnce*(). We don't know races in all cases would be benign, but we don't know a specific instance that's harmful, either. sec-high is too much but it's still sec-something.

Might be nice to fix on ESR-78 but ESR-90 is right around the corner and might be good enough.

Flags: needinfo?(dveditz)
Keywords: sec-high → sec-moderate
Flags: qe-verify+
Whiteboard: [fuzzblocker][sec-survey] → [fuzzblocker][sec-survey][post-critsmash-triage]
Priority: -- → P1
Whiteboard: [fuzzblocker][sec-survey][post-critsmash-triage] → [fuzzblocker][sec-survey][post-critsmash-triage][adv-main90+r]
Group: core-security-release