Closed Bug 1421963 Opened 7 years ago Closed 7 years ago

Intermittent GECKO(3202) | ==3255==ERROR: AddressSanitizer: heap-use-after-free on address 0x61d000a582b4 at pc 0x7f1472493c22 bp 0x7f14688f2d30 sp 0x7f14688f2d28

Categories

(Core :: WebRTC: Audio/Video, defect, P1)

defect

Tracking


RESOLVED FIXED
mozilla59
Tracking Status
firefox-esr52 - wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 + fixed

People

(Reporter: intermittent-bug-filer, Assigned: jesup)

References

(Blocks 1 open bug)

Details

(Keywords: csectype-uaf, intermittent-failure, sec-high, Whiteboard: [post-critsmash-triage][adv-main59+])

Attachments

(1 file)

Nico, is this a dupe of the other one you're looking at?
Group: media-core-security
Component: Build Config → WebRTC: Audio/Video
Flags: needinfo?(na-g)
Ryan, this doesn't look related to me; this appears to be DataChannels.
Flags: needinfo?(na-g)
Should be fixed by backout of bug 1297418. Please reopen if you think otherwise.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Reopening to track UAF issues in the new library, which may also be relevant to the current impl and cause the existing security crashes.

(Copied from an email.) I *finally* reproed it with ASAN + rr, so I could look into why it happens. The memory was freed by setsockopt() called from SendOutgoingStreamReset() in DataChannels.cpp. It's freed on "MainThread" aka thread 0. The crash happens when the memory is accessed on thread 6 ("Socket Thread") in response to a packet input. We appear to be getting a stream reset while we're in the middle of closing the stream from our end. The ordering is apparently (per rr):

  t0: DataChannel::Close() {
        MutexAutoLock lock(mLock)
        ....
        setsockopt()
        ...
  t6: <packet input>
      usrsctp_conninput() {
        ... sctp_reset_out_streams() {
          ... sctp_ulp_notify(SCTP_NOTIFY_STR_RESET_SEND,...) {
            ... sctp_notify_stream_reset() {
              ... sctp_add_to_readq() {
                ... sctp_invoke_recv_callback() {
                  ... inp->recv_callback() {
                    ... mozilla::receive_cb() {
                      ... DataChannelConnection::ReceiveCallback() {
                        MutexAutoLock lock(mLock);  <wait on lock>
  t0: <finish freeing buffers>
      <release lock in Close()>
      ....
  t6: <wake up with lock>
      HandleStreamResetEvent()
      finish sctp_reset_out_streams();
      call sctp_reset_clear_pending() {
        crash

Now to figure out what's the right thing to do here, and whether this is in some way a hole in the library or in how Close() is implemented...

Note that the sctp library doesn't lock access while it's invoking the recv callback or running sctp_reset_out_streams(); setsockopt() does lock inside sctp. The lock in ReceiveCallback() just makes this more likely/repeatable; if you wrote code that called setsockopt() on one thread, and it interleaved with conninput() and you simply task-switched at the right points, you'd have the same problem.

Grabbing the global lock before calling conninput(), instead of in ReceiveCallback(), would be another option, though I worry about possible deadlocks. Another solution would be to force Close() to proxy the setsockopt() to the Socket thread, though that raises the question of what else needs to be proxied. Is it supposed to be allowed to call conninput() on a different thread than other calls like setsockopt()? (Or is it allowed with external locking?) In general I thought the API was supposed to be threadsafe.

Phew... that wasn't easy (and solutions aren't obvious either).

Further notes: it appears that moving the lock to SctpDtlsInput() (which calls usrsctp_conninput()) works (one minor mod: assert that the lock is held in the threshold code instead of grabbing it there). This might be worth doing in beta 58 if we believe that's safe deadlock-wise. It's tough to believe it wouldn't be safe in general; there are only a couple of callbacks into our code, and this acts as a giant lock around the library's input code.
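To make the window concrete, here is a minimal standalone C++ sketch of the interleaving described above. StreamState, gStream, the sleeps, and the gFreed check are invented for the demo; they stand in for the usrsctp stream bookkeeping and are not the real DataChannel or library code. The point is that taking the lock only inside the receive callback is too late to protect the library's surrounding reset processing.

#include <atomic>
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>

// Stand-in for the SCTP stream bookkeeping freed by Close()/setsockopt().
struct StreamState { int pendingResets = 1; };

std::mutex gLock;                               // analogue of mLock
std::atomic<StreamState*> gStream{new StreamState()};
std::atomic<bool> gFreed{false};

// Analogue of ReceiveCallback(): the only place the old code took the lock,
// which is too late to protect the library's reset processing around it.
void ReceiveCallback() {
  std::lock_guard<std::mutex> guard(gLock);     // blocks until Close() is done
  // ...handle the stream-reset notification...
}

void MainThreadClose() {                        // t0: DataChannel::Close()
  std::lock_guard<std::mutex> guard(gLock);
  std::this_thread::sleep_for(std::chrono::milliseconds(50));  // widen window
  delete gStream.load();                        // "setsockopt() frees state"
  gStream = nullptr;
  gFreed = true;
}

void SocketThreadInput() {                      // t6: usrsctp_conninput()
  StreamState* state = gStream.load();          // library keeps its own pointer
  ReceiveCallback();                            // waits on the lock...
  // ...then finishes the reset (sctp_reset_clear_pending() analogue):
  if (gFreed.load()) {
    std::puts("use-after-free would happen here");
  } else {
    state->pendingResets = 0;
  }
}

int main() {
  std::thread t0(MainThreadClose);
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  std::thread t6(SocketThreadInput);
  t0.join();
  t6.join();
  return 0;
}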
Status: RESOLVED → REOPENED
Flags: needinfo?(tuexen)
Flags: needinfo?(lennart.grahl)
Resolution: DUPLICATE → ---
sec-high since this may affect trunk
Assignee: nobody → rjesup
Rank: 7
Priority: P5 → P1
Lock around the entire input processing of the sctp library, since there appears to be incomplete locking within the library. Assert we own the lock on the callbacks from the library instead of taking the lock. Ran 1000 iterations of the failing test with no problems; before it would fail in 5-50 iterations. This may also solve bug 1400563 and related issues if applied on the old library, since that seems related to resets also.
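A rough sketch of that locking pattern, assuming the input path looks roughly like SctpDtlsInput() -> usrsctp_conninput() -> receive callback; FakeConnection and FakeConnInput are invented stand-ins for illustration, not the real DataChannel or usrsctp API.

#include <cassert>
#include <mutex>

// Invented stand-in for DataChannelConnection; only the locking pattern is
// meant to mirror the patch description, not the real implementation.
struct FakeConnection {
  std::mutex mLock;
  bool mLockHeld = false;  // illustrative; real code would assert thread
                           // ownership with a debug-only check

  // With the patch, the callback asserts the lock is already held instead of
  // taking it, since the caller now locks around all of the input processing.
  void ReceiveCallback() {
    assert(mLockHeld && "caller must hold mLock around library input");
    // ...consume the message / stream-reset notification...
  }
};

// Stand-in for usrsctp_conninput(): may synchronously call back into
// ReceiveCallback() while processing the packet.
static void FakeConnInput(FakeConnection* conn, const unsigned char*, size_t) {
  conn->ReceiveCallback();
}

// Analogue of SctpDtlsInput(): hold the lock around the *entire* library
// input call, so Close()/setsockopt() on another thread cannot interleave
// with the library's reset processing.
static void SctpDtlsInput(FakeConnection* conn, const unsigned char* pkt,
                          size_t len) {
  std::lock_guard<std::mutex> guard(conn->mLock);
  conn->mLockHeld = true;
  FakeConnInput(conn, pkt, len);
  conn->mLockHeld = false;
}

int main() {
  FakeConnection conn;
  unsigned char pkt[1] = {0};
  SctpDtlsInput(&conn, pkt, sizeof(pkt));
  return 0;
}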
Attachment #8934387 - Flags: review?(tuexen)
Attachment #8934387 - Flags: review?(drno)
Comment on attachment 8934387 [details] [diff] [review]
lock around SCTP input processing, not just the receive callback

Review of attachment 8934387 [details] [diff] [review]:
-----------------------------------------------------------------

In general the patch looks good to me. I am worried, though, that our mochitests for data channels don't cover much of what (else) could go wrong. Dispatching from MainThread to STS seems to be the safer option to me, but I understand it's probably harder to get right. Michael should have the final verdict here, I think.
Attachment #8934387 - Flags: review?(drno) → review+
Flags: needinfo?(lennart.grahl)
Blocks: 1408485
Blocks: 1400563
Status: REOPENED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Where does this need uplift to?
Flags: needinfo?(rjesup)
Beta, and ESR52 - though I'm tempted to wait for confirmation in beta before uplifting to ESR. The signal in beta is strong enough to know.
Flags: needinfo?(rjesup)
Comment on attachment 8934387 [details] [diff] [review]
lock around SCTP input processing, not just the receive callback

Approval Request Comment
[Feature/Bug causing the regression]: N/A
[User impact if declined]: Continued UAFs
[Is this code covered by automated tests?]: Yes, but they don't hit it; it requires simultaneous closes from both ends, plus luck in timing.
[Has the fix been verified in Nightly?]: No; will watch for crashes after the merge date. Volume is low but not 0 in Nightly. Easily verified in beta via crash-stats.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Not very
[Why is the change risky/not risky?]: The only risk is of deadlock. Analysis indicates it shouldn't deadlock, and in tests it hasn't.
[String changes made/needed]: None
Attachment #8934387 - Flags: review?(tuexen) → approval-mozilla-beta?
This still needs sec-approval, right?
Flags: needinfo?(rjesup)
Tracking this and bug 1400563; it sounds like this might fix both bugs.
(In reply to Julien Cristau [:jcristau] from comment #12)
> This still needs sec-approval, right?

This report was for a new checkin that was then backed out; we landed this fix with the new code. We're supposing that it will also fix a series of existing UAF crashes. For completeness, even though this code is already in Nightly as part of the major SCTP update, I'll ask for sec-approval.
Flags: needinfo?(rjesup)
Comment on attachment 8934387 [details] [diff] [review]
lock around SCTP input processing, not just the receive callback

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
Very hard.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
No. Clearly we move where the lock is grabbed, which gives a very indirect idea that there's a locking issue somewhere, but not what it might be or cause.

Which older supported branches are affected by this flaw?
All; this appears to be a bug in the imported library.

If not all supported branches, which bug introduced the flaw?

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
Trivial; the patch should apply cleanly.

How likely is this patch to cause regressions; how much testing does it need?
Very unlikely to cause regressions; none reported yet (this is on Nightly as part of a bigger SCTP update). The only likely regression would be deadlocks.
Attachment #8934387 - Flags: sec-approval?
Asking for sec-approval and beta, but please don't land on beta until I do more checking of crash-stats to see if we see an improvement. The crashes in the field may be related to this, but not the same bug. Crashes in the field are (mostly) due to timeouts, while this bug was directly related to crossing an internal close with an external close packet. This may point to the problem when a timeout occurs, however. We may need a different patch to solve the crash-stats failures, or this patch plus another.
Comment on attachment 8934387 [details] [diff] [review]
lock around SCTP input processing, not just the receive callback

Sec-approval+ and beta approval.
Attachment #8934387 - Flags: sec-approval?
Attachment #8934387 - Flags: sec-approval+
Attachment #8934387 - Flags: approval-mozilla-beta?
Attachment #8934387 - Flags: approval-mozilla-beta+
Thanks - a reminder that we don't want to actually uplift this until I do more checking of whether this problem exists in the older SCTP library.
Group: media-core-security → core-security-release
Any news on this?
Flags: needinfo?(rjesup)
No noted regressions in 59. It doesn't fix the timeout in beta, but it should block the same issue if it exists in 58 (58 doesn't have the updated SCTP library, and sctp_add_to_readq() has been significantly rewritten). It *appears* that the same sort of issue could occur, but it's hard to be 100% sure. Given the lack of in-the-field crashes, I don't think we should risk it with the older SCTP library. The other bug is worth trying to fix in 58 if we can.
Flags: needinfo?(rjesup)
I assume that means ESR52 is also wontfix.
Comment on attachment 8934387 [details] [diff] [review]
lock around SCTP input processing, not just the receive callback

This wasn't uplifted to 58 after all; clearing the flag.
Attachment #8934387 - Flags: approval-mozilla-beta+
Flags: needinfo?(tuexen)
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main59+]
Group: core-security-release