Closed
Bug 1486400
Opened 6 years ago
Closed 6 years ago
Make use-after-free due to thread races better detectable in ASan Nightly
Categories
(Firefox :: Security, defect)
Tracking
()
RESOLVED
FIXED
Firefox 64
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox62 | --- | wontfix |
firefox63 | --- | wontfix |
firefox64 | --- | fixed |
People
(Reporter: decoder, Assigned: decoder)
References
(Blocks 1 open bug)
Details
(Keywords: sec-other, sec-want, Whiteboard: [post-critsmash-triage][adv-main64-])
Attachments
(2 files, 1 obsolete file)
After the first few use-after-free bugs reported by the ASan Nightly project have been fixed, I reached out to developers to figure out what we can do to make thse bugs more likely to trigger. All of the bugs reported (with the exception of one) only had a single report associated, indicating that there might be many more similar bugs that we cannot detect because the userbase is too small. Before asking developers, I thought GC/CC was the key and filed bug 1484655, but now it seems that thread races or other issues around thread/task scheduling are the primary source of the uafs we have seen so far.
Some more detailed comments:
Bug 1478849:
> For this UAF, an async runnable/task needs to run before something else that's
> done synchronously, which more or less means that the first thread needs to be
> preempted. This is the kind of thing that rr chaos mode approaches more
> generally, but for this specific case we might get some mileage out of doing a
> sched_yield() and/or a short sleep when posting a runnable/task to another
> thread.
Bug 1478575:
> This is influenced by thread timing, ordering, load, etc. some form of thread
> 'chaos' mode may expose more (see rr's chaos mode, etc). Especially timing
> around IPC; typically on multicore an IPC may not force an immediate yield by
> the caller; conversely a receiver may run immediately and finish before the
> caller has done "a lot". Ditto Dispatch() (perhaps even more in Dispatch).
> Having the sender or receiver occasionally (randomly) yield for a short time
> might increase the odds of an IPC/Dispatch-related UAF.
In ASan Nightly, we have the option to implement the things suggested in these comments. It is perfectly acceptable to add any kinds of delays, even if it slows down the browser a bit. For the purpose of ASan Nightly, the crashes also don't have to be reproducible. We have a very high fix rate for use-after-free, typically the trace is enough to point out the problem.
Since I'm not very familiar with our IPC or our threading code, I won't be able to implement any of this on my own, so I'm asking any of you to help with suggestions or by implementing parts of these suggestions to make this happen.
Comment 1•6 years ago
|
||
Short, even random-ish delays on the dispatching thread for tasks/runnables are super-easy to implement. It's less easy for me to imagine how to implement delays on the dispatched-to thread, but I probably just lack imagination here.
Comment 2•6 years ago
|
||
If you're willing to block the receiving thread, it isn't hard; if you want to not block it or the sender, you could send it to a "delay" thread which holds it for N ms and then sends it to the target thread. the runnable to the delay thread would just take a runnable, a target thread, and a delay value, and on the delay thread would set up a timer to fire and Dispatch() it to the real target.
Assignee | ||
Comment 3•6 years ago
|
||
I see two action points here:
1) Enable MOZ_CHAOSMODE for ASan Nightly builds. I can make a patch for that, but should we just enable it unconditionally? I tested it locally and did not see any noticeable slowdown, so performance-wise I would think this is ok. If we feared that we would miss certain bugs because it is enabled, I can enable it 50:50 at random every time the browser starts. Opinions?
2) Implement comment 1 and comment 2 as an additional mode for MOZ_CHAOSMODE. Nathan, can you help with this?
Flags: needinfo?(nfroyd)
Assignee | ||
Comment 4•6 years ago
|
||
Comment 5•6 years ago
|
||
This is maybe not perfect, but it is at least a start. Seems to compile at least.
Attachment #9008461 -
Flags: review?(rjesup)
Updated•6 years ago
|
Flags: needinfo?(nfroyd)
Comment 6•6 years ago
|
||
Comment on attachment 9008461 [details] [diff] [review]
add task dispatch/run delays for ChaosMode
Review of attachment 9008461 [details] [diff] [review]:
-----------------------------------------------------------------
::: xpcom/threads/nsThreadPool.cpp
@@ +242,5 @@
> +
> + // Delay event processing to encourage whoever dispatched this event
> + // to run.
> + if (ChaosMode::isActive(ChaosFeature::TaskRunning)) {
> + const uint32_t milliseconds = 100;
Not that I think it's a problem, but is there a reason this is 100ms and the others are all 1000ms?
Attachment #9008461 -
Flags: review?(rjesup) → review+
Comment 7•6 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #6)
> Comment on attachment 9008461 [details] [diff] [review]
> add task dispatch/run delays for ChaosMode
>
> Review of attachment 9008461 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: xpcom/threads/nsThreadPool.cpp
> @@ +242,5 @@
> > +
> > + // Delay event processing to encourage whoever dispatched this event
> > + // to run.
> > + if (ChaosMode::isActive(ChaosFeature::TaskRunning)) {
> > + const uint32_t milliseconds = 100;
>
> Not that I think it's a problem, but is there a reason this is 100ms and the
> others are all 1000ms?
The idea was to delay event running less, but I can change it to be consistent.
Also, thinking about it now, delaying for 1000ms means delays up to 1s, which is probably unacceptable. What we really want are delays of up to 1000us, I think? Will probably have to modify things because PR_Sleep is not usable with such intervals...
Comment 8•6 years ago
|
||
Comment on attachment 9007735 [details]
Bug 1486400 - Enable MOZ_CHAOSMODE by default for ASan Nightly Reporter builds. r?froydnj
Nathan Froyd [:froydnj] has approved the revision.
Attachment #9007735 -
Flags: review+
Comment 9•6 years ago
|
||
probably a 10ms sleep is good enough. We don't need this to have no impact on browser speed, and we may need it to service other code waiting to run, or to run "enough".
Assignee | ||
Comment 10•6 years ago
|
||
Nathan, is this ready to land or do we need to do something more here?
Flags: needinfo?(nfroyd)
Comment 11•6 years ago
|
||
OK, fixed up to do something reasonable on Windows and Unix.
decoder, are you able to try this locally and make sure it doesn't break too much?
Attachment #9010067 -
Flags: review+
Attachment #9010067 -
Flags: feedback?(choller)
Updated•6 years ago
|
Attachment #9008461 -
Attachment is obsolete: true
Comment 12•6 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #10)
> Nathan, is this ready to land or do we need to do something more here?
This is ready to land once decoder successfully browses with it for a bit.
Flags: needinfo?(nfroyd)
Assignee | ||
Comment 13•6 years ago
|
||
Comment on attachment 9010067 [details] [diff] [review]
add task dispatch/run delays for ChaosMode
I surfed around the whole day with the Linux try build and also started the build on a Windows machine for some quick surfing. Overall, I feel it has a somewhat noticable performance impact, but nothing that wouldn't be acceptable within the project. Also no crashes on any of the two platforms, so I think we're good to land this. \o/
Flags: needinfo?(nfroyd)
Attachment #9010067 -
Flags: feedback?(choller) → feedback+
Comment 14•6 years ago
|
||
Already landed in https://hg.mozilla.org/integration/mozilla-inbound/rev/1dd7a055bae0985e895946fa62b561feb8fbe78d
Flags: needinfo?(nfroyd)
Assignee | ||
Comment 15•6 years ago
|
||
Landed the second patch:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3a99736fa8479cbac5da5226c14db5004203dc21
Comment 16•6 years ago
|
||
Assignee: nobody → choller
status-firefox62:
--- → wontfix
status-firefox64:
--- → affected
status-firefox-esr60:
--- → unaffected
Comment 17•6 years ago
|
||
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 64
Updated•6 years ago
|
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
Updated•6 years ago
|
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main64-]
Updated•5 years ago
|
Group: core-security-release
You need to log in
before you can comment on or make changes to this bug.
Description
•