Closed Bug 557586 Opened 15 years ago Closed 15 years ago

XPCPerThreadData::~XPCPerThreadData race leading to possible crash

Categories

(Core :: XPConnect, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.3a5

People

(Reporter: jseward, Assigned: jseward)

Details

Attachments

(1 file)

There's a race in XPCPerThreadData::~XPCPerThreadData that can cause XPCPerThreadData::gLock to be PR_Destroy'd and nulled out whilst a second thread is holding it. In the worst case this could cause an essentially unreproducible segfault at shutdown. All assuming the following analysis is correct, of course. What follows is a bit of background and an interleaving of execution which I believe can lead to the failure. All line numbers pertain to js/src/xpconnect/src/xpcthreadcontext.cpp (m-c as of today). BACKGROUND This code appears to maintain a linked list of XPCPerThreadDatas. Each such element has a field mNextThread which points to the next element. The list is rooted by a global variable 325 XPCPerThreadData* XPCPerThreadData::gThreads = nsnull; which is protected by this lock 324 PRLock* XPCPerThreadData::gLock = nsnull; XPCPerThreadData::~XPCPerThreadData() is intended to remove itself from the linked list |gThreads|. Then, if the list is empty, |gLock| is destroyed and nulled out. However, these actions are not carried out atomically, and it can be that a second thread T2 entering ~XPCPerThreadData() just after a previous thread T1 entered it, can cause T1 to destroy and null out |gLock| while T2 still holds it. FAILURE INTERLEAVING Imagine there are two threads, T1 and T2, there are two entries in |gThreads|, and |gLock| is non-NULL. Here's the relevant code: 394 // Unlink 'this' from the list of threads. 395 if(gLock) 396 { 397 nsAutoLock lock(gLock); 398 if(gThreads == this) 399 gThreads = mNextThread; 400 else 401 { 402 XPCPerThreadData* cur = gThreads; 403 while(cur) 404 { 405 if(cur->mNextThread == this) 406 { 407 cur->mNextThread = mNextThread; 408 break; 409 } 410 cur = cur->mNextThread; 411 } 412 } 413 } 414 415 if(gLock && !gThreads) 416 { 417 PR_DestroyLock(gLock); 418 gLock = nsnull; 419 } T1: 395 (taken) 397 (T1 acquires lock) 398 .. 412 (T1 removes itself from list) At this point |gThreads| is non-NULL (still one entry in it), and we still hold the lock. 413 (T1 drops lock at end of scope) 414 stall T2: 395 (taken) 397 (T2 acquires lock) 398 .. 412 (T2 removes itself from list) |gThreads| is now null 412 stall, still holding the lock T1: 415 (taken - since |gLock| is non-NULL and gThreads is NULL) 416 .. 419 destroy |gLock| and null it T2: 413 (drop lock at end of scope). Unfortunately it's now NULL. Possible segfault. This race was pointed out by Helgrind. It reports that the write of |gThreads| at 399 conflicts with the read of it at 415: Possible data race during write of size 8 at 0x6b23220 by thread #16 at 0x558138C: XPCPerThreadData::~XPCPerThreadData() (xpcthreadcontext.cpp:399) by 0x55813AD: xpc_ThreadDataDtorCB(void*) (xpcthreadcontext.cpp:427) by 0x76DFA4E: _PR_DestroyThreadPrivate (prtpd.c:265) by 0x76F5033: _pt_root (ptthread.c:270) by 0x4C2A5D8: mythread_wrapper (hg_intercepts.c:213) by 0x4E34A03: start_thread (pthread_create.c:300) by 0xAB2680C: clone (clone.S:112) This conflicts with a previous read of size 8 by thread #14 at 0x5581357: XPCPerThreadData::~XPCPerThreadData() (xpcthreadcontext.cpp:415) by 0x55813AD: xpc_ThreadDataDtorCB(void*) (xpcthreadcontext.cpp:427) by 0x76DFA4E: _PR_DestroyThreadPrivate (prtpd.c:265) by 0x76F5033: _pt_root (ptthread.c:270) by 0x4C2A5D8: mythread_wrapper (hg_intercepts.c:213) by 0x4E34A03: start_thread (pthread_create.c:300) by 0xAB2680C: clone (clone.S:112)
This isn't a problem in practice, because we know the main thread outlives all other threads: we explicitly join all other threads (at least the ones which might run xpconnect) during shutdown well before we get around to destroying the XPCPerThreadData for the main thread.
Comment on attachment 437374 [details] [diff] [review] patch to make update and delete-decision atomic i can live with this.
Attachment #437374 - Flags: review+
(In reply to comment #1) > This isn't a problem in practice, [...] It would be better if the code was "obviously correct" without having to invoke non-local invariants, given that in this case it's easy to make it so.
Keywords: checkin-needed
Assignee: nobody → jseward
Status: NEW → ASSIGNED
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a5
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: