Closed
Bug 1059081
Opened 10 years ago
Closed 10 years ago
TCPSocket crash in nsMultiplexInputStream::ReadSegments on "Socket Thread" due to apparent racey use of nsMultiplexInputStream
Categories
(Core :: Networking, defect)
Tracking
()
VERIFIED
FIXED
mozilla41
People
(Reporter: asuth, Assigned: jdm)
References
Details
(Keywords: sec-low, Whiteboard: [adv-main39+][b2g-adv-main2.2+])
Attachments
(1 file)
(deleted),
patch
|
froydnj
:
review+
gerard-majax
:
feedback+
ritu
:
approval-mozilla-aurora+
ritu
:
approval-mozilla-beta+
jocheng
:
approval-mozilla-b2g34+
jocheng
:
approval-mozilla-b2g37+
|
Details | Diff | Splinter Review |
The Gaia email app uses TCPSocket in our back-end tests using b2g-desktop in a non-OOP mode. I have observed intermittent crashes in nsMultiplexInputStream::ReadSegments. I'm not exactly clear what's going on, but the fact that TCPSocket.js calls removeStream(0) and appendStream(...) on the main thread while nsMultiplexInputStream::ReadSegments is doing something on the "Socket Thread" without any apparent synchronization constructs seems like it could explain things.
The call stack on an -O1 self-built trunk build from earlier today looks like this:
#0 nsMultiplexInputStream::ReadSegments (this=0x2aaaca7bb740, aWriter=<optimized out>, aClosure=<optimized out>, aCount=65536, aResult=0x2aaabeaffbcc)
318 rv = mStreams[mCurrentStream]->ReadSegments(ReadSegCb, &state, aCount, &read);
#1 0x00002aaaac9dae65 in nsStreamCopierIB::DoCopy (this=<optimized out>, aSourceCondition=0x2aaabeaffc08, aSinkCondition=0x2aaabeaffc0c) at xpcom/io/nsMultiplexInputStream.cpp:318
#2 0x00002aaaac9db093 in Process (this=0x2aaad01f0500) at xpcom/io/nsStreamUtils.cpp:291
#3 nsAStreamCopier::Run (this=0x2aaad01f0500) at xpcom/io/nsStreamUtils.cpp:414
#4 0x00002aaaac9e7588 in nsThread::ProcessNextEvent (this=0x2aaabe8263c0, aMayWait=<optimized out>, aResult=0x2aaabeaffcef)
#5 0x00002aaaaca053ee in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=<optimized out>)
#6 0x00002aaaaca37c3e in nsSocketTransportService::Run (this=0x2aaaabd8f600) at netwerk/base/src/nsSocketTransportService2.cpp:744
#7 0x00002aaaac9e7588 in nsThread::ProcessNextEvent (this=0x2aaabe8263c0, aMayWait=<optimized out>, aResult=0x2aaabeaffdcf)
#8 0x00002aaaaca053ee in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=<optimized out>)
#9 0x00002aaaacc56927 in mozilla::ipc::MessagePumpForNonMainThreads::Run (this=0x2aaabe82d040, aDelegate=0x2aaaabd67060)
#10 0x00002aaaacc3532a in MessageLoop::RunInternal (this=this@entry=0x2aaaabd67060)
#11 0x00002aaaacc35598 in RunHandler (this=0x2aaaabd67060)
#12 MessageLoop::Run (this=this@entry=0x2aaaabd67060)
#13 0x00002aaaac9ebee3 in nsThread::ThreadFunc (aArg=0x2aaabe8263c0)
#14 0x00002aaaabfd2fb8 in _pt_root (arg=0x2aaaabd37920)
#15 0x00002aaaaacd8182 in start_thread (arg=0x2aaabeb00700)
#16 0x00002aaaab7f738d in clone ()
(gdb) p *mStreams.mHdr
$20 = {
static sEmptyHdr = {
static sEmptyHdr = <same as static member of an already seen type>,
mLength = 0,
mCapacity = 0,
mIsAutoArray = 0
},
mLength = 2,
mCapacity = 3,
mIsAutoArray = 0
}
(gdb) x/4g mStreams.mHdr
0x2aaad0f0ee20: 0x0000000300000002 0x00002aaac153a400
0x2aaad0f0ee30: 0x00002aaac153a4c0 0x5a5a5a5a5a5a5a5a
(gdb) p mCurrentStream
$21 = 1
Unfortunately, Elements() seems to have been optimized out
The disassembly around the crash point start from the while loop comparator is:
0x00002aaaac9ccc58 <+102>: cmp %r12d,%edx
0x00002aaaac9ccc5b <+105>: jae 0x2aaaac9ccce6 <nsMultiplexInputStream::ReadSegments(tag_nsresult (*)(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*)+244>
0x00002aaaac9ccc61 <+111>: jmp 0x2aaaac9cccd5 <nsMultiplexInputStream::ReadSegments(tag_nsresult (*)(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*)+227>
0x00002aaaac9ccc63 <+113>: lea 0x1c(%rsp),%r15
0x00002aaaac9ccc68 <+118>: lea 0x20(%rsp),%r14
0x00002aaaac9ccc6d <+123>: lea -0x3e4(%rip),%r13 # 0x2aaaac9cc890 <nsMultiplexInputStream::ReadSegCb(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*)>
0x00002aaaac9ccc74 <+130>: mov %edx,%edx
// get mStreams
0x00002aaaac9ccc76 <+132>: mov 0x28(%rbx),%rax
// get mStreams.mHdr.element[1] (not literally, but)
0x00002aaaac9ccc7a <+136>: mov 0x8(%rax,%rdx,8),%rdi
=> 0x00002aaaac9ccc7f <+141>: mov (%rdi),%rax
0x00002aaaac9ccc82 <+144>: mov %r15,%r8
0x00002aaaac9ccc85 <+147>: mov %ebp,%ecx
0x00002aaaac9ccc87 <+149>: mov %r14,%rdx
0x00002aaaac9ccc8a <+152>: mov %r13,%rsi
0x00002aaaac9ccc8d <+155>: callq *0x30(%rax)
(gdb) info registers rax rbx rdx rdi
rax 0x2aaaca642500 46913028367616
rbx 0x2aaaca7bb740 46913029912384
rdx 0x1 1
rdi 0x100000001 4294967297
// just confirming offsets here
(gdb) p this
$27 = (nsMultiplexInputStream * const) 0x2aaaca7bb740
(gdb) p &mStreams
$26 = (nsTArray<nsCOMPtr<nsIInputStream> > *) 0x2aaaca7bb768
(gdb) x 0x100000001
0x100000001: Cannot access memory at address 0x100000001
I'm guessing that subscripting the nsTArray raced a main-thread call to appendStream and this resulted in a corrupt/crazy value. Although 0x100000001 is a pretty bad-ass pointer.
I think this is probably concerning for B2G in both non-OOP and OOP since based on the remoting currently in use, all the same risky business would happen in the parent process. I'm not marking this a security bug, though, since TCPSocket is a privileged-only API within the circle of trust, if you will.
cc'ing other affected email people, leaving this up to Core::Network people for assessment/etc.
Comment 1•10 years ago
|
||
looks like jdm and donovan are the likely people to understand this?
Flags: needinfo?(josh)
I have the same problem as you.
I encountered the problem in xulRunner 31.
At this point, I compiled Xulrunner 32 and TCPSocket doesn't seem to work.
For now, I have to disable optimize flag to make things work.
I'm working on MSWINDOWS.
Assignee | ||
Comment 3•10 years ago
|
||
Yeah, this class says it's threadsafe and does absolutely nothing to back that up. Whee.
Flags: needinfo?(josh)
Assignee | ||
Comment 4•10 years ago
|
||
I suspect the original intention of nsMultiplexInputStream was that you would set it up just so, then you would process it, and then you would drop your references to it whenever you stop caring about it. We're using it as an ongoing queue of streams, however, so that's a bit of a problem. My original thought was just to throw a lock into the mix and move on, but there's a lot of shared, mutating state that looks hairy to lock efficiently. I'm going to ponder a mechanism more suitable to our requirements and whether that would be less work and/or more correct.
Assignee: nobody → josh
Reporter | ||
Comment 5•10 years ago
|
||
If the core of TCPSocket weren't implemented in JS we could always do the append on the necko network thread via runnable dispatch. (Or if the multiplex stream did that itself after it had been activated and if it wasn't on the network thread.)
Another possible partial mitigation for the worst of this would be to call SetCapacity on the multiplex stream to a large enough number initially that it should never actually expand need to dynamically resize the array. It seems like there's likely to still be races on the concurrent array access, but at least it wouldn't involve new memory allocations / freed memory allocations in the nsTArray itself.
Reporter | ||
Comment 6•10 years ago
|
||
It just occurred to me that because the actual data sending happens in the parent process in e10s mode, a segfault will take the root b2g process down and that could constitute a DoS security bug. (Originally I wasn't thinking things through and believed that because the tcp-socket API was privileged and subprocesses were the vulnerable ones, so we were safe.) So marking to get triaged.
The specific the specific attack would be something like this.
Prereqs:
- The user is using a legitimate app that uses TCPSocket. For example the built-in email app or the loqui IM app for Firefox OS (https://github.com/loqui/im). These apps tend to have some combination of auto-connection (ex: periodic email sync) and auto-reconnection (ex: email has some exponential backoff stuff with one immediate retry on connection reuse primarily to deal with stale network connections to compensate for the network stack's crappy TCP keep-alive settings as they relate to TCPSocket.)
- Attacker has the ability to passively notice the connection (to see TCP connections) and some ability to reset the connection (via TCP reset or knocking the device off wi-fi) to cause the device to reconnect.
Attack:
- The attacker keeps resetting the TCP connections to try and cause the nsMultiplexStream vulnerability window to trigger and crash the root b2g process.
- In the case mozAlarms are used to launch the vulnerable app, this could result in the phone ending up in the same situation again soon, depending on the alarm interval. mozAlarms fires all alarms that were in the past when it starts up. The email app supports a maximum interval of every 5 minutes; other apps like IM apps may be more aggressive.
Group: core-security
This problem is really annoying for my xulrunner application (on microsoft Windows).
At this point, I can't use the optimized version.
Cc["@mozilla.org/tcp-socket;1"].createInstance(Ci.nsIDOMTCPSocket)
is no longer reliable for my work.
Reporter | ||
Comment 8•10 years ago
|
||
I agree we need to resolve this. As a temporary workaround, you might be able to issue multiple .send() calls in rapid succession when you first open the socket (possibly even zero-length?), forcing the array to perform all of its reallocations up-front. Of course, this may also just immediately crash the application instead. In a non-e10s situation this may turn out okay, but the latency of the messages in the e10s case may increase the probability of collision/explosion too much.
Comment 9•10 years ago
|
||
I'm also hitting this in a very reliable way while working on an addon that relies on ADB Helper to pull lots of files from a device.
Comment 10•10 years ago
|
||
Any hope to investigate this ? It's blocking me badly to create a new tool that makes intensive use of ADB pull and I do have 100% repro on the crash in my case (need two addons and specific devices though for now).
Flags: needinfo?(josh)
Flags: needinfo?(bugmail)
Updated•10 years ago
|
status-firefox38:
--- → ?
status-firefox38.0.5:
--- → ?
status-firefox39:
--- → ?
status-firefox40:
--- → affected
status-firefox-esr31:
--- → ?
Assignee | ||
Comment 11•10 years ago
|
||
My best idea so far is to create a new C++ class like this:
class nsThreadsafeMultiplexInputStream : public nsIMultiplexStream
, public nsISeekableStream
, public nsIIPCSerializableStream
{
Mutex mLock;
nsCOMPtr<nsIMultiplexStream> mStream;
public:
NS_DECL_THREADSAFE_ISUPPORTS
NS_DECL_NSIMULTIPLEXSTREAM
NS_DECL_NSISEEKABLESTREAM
NS_DECL_NSIIPCSERIALIZABLESTREAM
};
Every method would first lock mLock before calling through to mStream. This wouldn't guarantee perfect threadsafety (due to the single use of the `this` pointer in nsMultiplexStream::ReadSegments) but it should be enough to avoid the crash exhibited here.
Flags: needinfo?(josh)
Assignee | ||
Comment 12•10 years ago
|
||
Assignee | ||
Comment 13•10 years ago
|
||
Comment on attachment 8604631 [details] [diff] [review]
Add a threadsafe wrapper for persistent nsMultiplexStream queues
Does this help?
Attachment #8604631 -
Flags: feedback?(lissyx+mozillians)
Comment 14•10 years ago
|
||
Comment on attachment 8604631 [details] [diff] [review]
Add a threadsafe wrapper for persistent nsMultiplexStream queues
Review of attachment 8604631 [details] [diff] [review]:
-----------------------------------------------------------------
And trust me, if I could put more '+' on this 'f', I would. This has fixed the issue for me so far: while the code was crashing nearly instantaneously when doing my set of adb pull (850+ files to get from the device), I have now been able to complete more than 5 consecutive full run of pulling, without hitting a single crash. Thank you so much for the fast workaround!
Attachment #8604631 -
Flags: feedback?(lissyx+mozillians) → feedback+
Assignee | ||
Comment 16•10 years ago
|
||
Comment on attachment 8604631 [details] [diff] [review]
Add a threadsafe wrapper for persistent nsMultiplexStream queues
Nathan, what do you think about this solution? It looks like TCPSocket is using nsMultiplexInputStream in a different way than it way originally intended (a dedicated queue of streams vs. a fire-and-forget collection of multiple streams), hence the threading issues that we ran into. This is the most naive locking behaviour possible because the ReadSegments code is pretty hairy to pull apart.
Attachment #8604631 -
Flags: review?(nfroyd)
Comment 17•10 years ago
|
||
To reword comment 16, the theory is that nsMultiplexInputStream was meant to be used like:
1. make nsMultiplexInputStream
2. appendStream a bunch of things
3. drain the stream entirely
But this different usage is constantly calling appendStream and removeStream as new streams show up or are emptied? Is that correct?
Flags: needinfo?(josh)
Comment 18•10 years ago
|
||
Oh, also, do we have any tests for tcpsocket in the tree that might show issues under something like TSan?
Reporter | ||
Comment 19•10 years ago
|
||
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #18)
> Oh, also, do we have any tests for tcpsocket in the tree that might show
> issues under something like TSan?
Just calling send a bunch of times in a for loop should trigger the problem blatantly enough to crash stuff. Smart tooling that understands locks might already detect stuff with the mochitest:
https://dxr.mozilla.org/mozilla-central/source/dom/network/tests/test_tcpsocket_client_and_server_basics.html
Comment 21•10 years ago
|
||
Comment on attachment 8604631 [details] [diff] [review]
Add a threadsafe wrapper for persistent nsMultiplexStream queues
Review of attachment 8604631 [details] [diff] [review]:
-----------------------------------------------------------------
Yeah, this is probably as good a solution as any.
::: xpcom/io/nsMultiplexInputStream.cpp
@@ +58,5 @@
> static NS_METHOD ReadSegCb(nsIInputStream* aIn, void* aClosure,
> const char* aFromRawSegment, uint32_t aToOffset,
> uint32_t aCount, uint32_t* aWriteCount);
>
> + Mutex mLock;
Can you add a comment here that specifies the mutex protects all of the members below?
Attachment #8604631 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 22•10 years ago
|
||
Assignee | ||
Comment 23•10 years ago
|
||
Assignee | ||
Comment 24•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
status-firefox41:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Updated•10 years ago
|
Comment 26•10 years ago
|
||
Seems like we want this on Aurora/Beta/b2g37 at a minimum? Not sure about b2g34 or not.
status-b2g-v2.0:
--- → wontfix
status-b2g-v2.0M:
--- → wontfix
status-b2g-v2.1:
--- → affected
status-b2g-v2.1S:
--- → affected
status-b2g-v2.2:
--- → affected
status-b2g-master:
--- → fixed
status-firefox-esr38:
--- → wontfix
Flags: needinfo?(josh)
Assignee | ||
Comment 27•10 years ago
|
||
Comment on attachment 8604631 [details] [diff] [review]
Add a threadsafe wrapper for persistent nsMultiplexStream queues
NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Original TCPSocket implementation.
User impact if declined: Intermittent crashes in certain b2g apps such as the email client.
Testing completed: Manual testing; easily-reproduced crashes no longer occur.
Risk to taking this patch (and alternatives if risky): Should be none. This merely serializes some code that could run simultaneously on multiple threads before.
String or UUID changes made by this patch: None
Flags: needinfo?(josh)
Attachment #8604631 -
Flags: approval-mozilla-beta?
Attachment #8604631 -
Flags: approval-mozilla-b2g37?
Attachment #8604631 -
Flags: approval-mozilla-aurora?
Reporter | ||
Comment 28•10 years ago
|
||
(In reply to Josh Matthews [:jdm] from comment #27)
> User impact if declined: Intermittent crashes in certain b2g apps such as
> the email client.
For clarity, the crash will occur in the root/parent b2g process, since that's where the actual crashy network code is running. This is notable because it's much worse than just the app process crashing.
Comment 29•10 years ago
|
||
Comment on attachment 8604631 [details] [diff] [review]
Add a threadsafe wrapper for persistent nsMultiplexStream queues
This fix has been on mozilla-central for over a week and has not had a negative impact that we know of. Approving for uplift to Beta and Aurora.
Attachment #8604631 -
Flags: approval-mozilla-beta?
Attachment #8604631 -
Flags: approval-mozilla-beta+
Attachment #8604631 -
Flags: approval-mozilla-aurora?
Attachment #8604631 -
Flags: approval-mozilla-aurora+
Comment 30•10 years ago
|
||
Andrew, can you please verify that this is fixed? Thanks!
Flags: needinfo?(bugmail)
It would definitely be good to verify that this is fixed once it lands in 39 and 40.
Flags: qe-verify+
Reporter | ||
Comment 32•10 years ago
|
||
(In reply to Ritu Kothari (:ritu) from comment #30)
> Andrew, can you please verify that this is fixed? Thanks!
This fixed the crashes we were encountering, yes. I am marking as verified for trunk.
Status: RESOLVED → VERIFIED
Flags: needinfo?(bugmail)
Comment 33•10 years ago
|
||
Comment on attachment 8604631 [details] [diff] [review]
Add a threadsafe wrapper for persistent nsMultiplexStream queues
Approving for b2g37. Please NI if causing any side effect.
Attachment #8604631 -
Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Comment 34•10 years ago
|
||
Comment 35•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/65aab9ab5a05
Josh, what do you think about this for v2.1?
Flags: needinfo?(jocheng)
Comment 36•10 years ago
|
||
This is hitting non-unified bustage on b2g37:
https://treeherder.mozilla.org/logviewer.html#?job_id=133699&repo=mozilla-b2g37_v2_2
Josh, easy fix?
Flags: needinfo?(josh)
Comment 37•10 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #35)
> https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/65aab9ab5a05
>
> Josh, what do you think about this for v2.1?
Hmm, let's wait until Josh fix b2g37 issues. Thanks
Flags: needinfo?(jocheng)
Comment 38•10 years ago
|
||
Hi Josh,
Please also raise approval for b2g34 for better stability. Thanks!
Comment 39•10 years ago
|
||
Follow-up fix from jdm for the non-unified bustage:
https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/57fa202d3421
Assignee | ||
Comment 40•10 years ago
|
||
Comment on attachment 8604631 [details] [diff] [review]
Add a threadsafe wrapper for persistent nsMultiplexStream queues
See earlier approval request.
Flags: needinfo?(josh)
Attachment #8604631 -
Flags: approval-mozilla-b2g34?
Comment 41•10 years ago
|
||
Landed the non-unified fix on aurora/inbound too. Will fold it in for other branches.
https://hg.mozilla.org/integration/mozilla-inbound/rev/9715aeb5d7a4
https://hg.mozilla.org/releases/mozilla-aurora/rev/036a8a5d4a2d
Comment 42•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/a8c1768e0d87
And because I'm a glutton for punishment, I only half-applied jdm's earlier bustage fix. Here's the second half:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7e68af7a1922
https://hg.mozilla.org/releases/mozilla-aurora/rev/1e7ff09e4903
https://hg.mozilla.org/releases/mozilla-beta/rev/4b3a7292ddad
https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/cb8a5117ee1a
Comment 43•10 years ago
|
||
Updated•10 years ago
|
Attachment #8604631 -
Flags: approval-mozilla-b2g34? → approval-mozilla-b2g34+
Comment 44•10 years ago
|
||
Updated•9 years ago
|
Whiteboard: [adv-main39+]
Updated•9 years ago
|
Whiteboard: [adv-main39+] → [adv-main39+][b2g-adv-main2.2+]
Updated•9 years ago
|
Group: core-security → core-security-release
Updated•8 years ago
|
Group: core-security-release
You need to log in
before you can comment on or make changes to this bug.
Description
•