Closed Bug 828630 Opened 12 years ago Closed 12 years ago

data race on nsSocketTransportService::mOffline

Categories

(Core :: Networking, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla21
Tracking Status
firefox20 --- fixed
firefox21 --- fixed

People

(Reporter: jseward, Assigned: jaas)

References

Details

(Keywords: valgrind)

Attachments

(1 file)

nsSocketTransportService::mOffline and possibly mGoingOffline are accessed from multiple threads but there is no lock that consistently protects it. nsSocketTransportService::Run() does protect them both with mLock. But nsSocketTransportService::SetOffline(bool offline) accesses them without any holding mLock.
STR: start up on desktop and let it idle. Then quit. Helgrind's complainery follows. ---------------------------------------------------------------- Lock at 0x4EDFDA0 was first observed at 0x4A0A0C9: pthread_mutex_init (hg_intercepts.c:429) by 0x50589FD: PR_NewLock (ptsynch.c:147) by 0x6BF5F07: nsSocketTransportService::nsSocketTransportService() (Mutex.h:49) by 0x6BCBECC: nsSocketTransportServiceConstructor(nsISupports*, nsID const&, void**) (in /home/sewardj/MOZ/MC-08-01-2013-HG/ff-opt/toolkit/library/libxul.so) by 0x7C3E49A: mozilla::GenericFactory::CreateInstance(nsISupports*, nsID const&, void**) (GenericFactory.cpp:16) by 0x7C6E930: nsComponentManagerImpl::CreateInstanceByContractID(char const*, nsISupports*, nsID const&, void**) (nsComponentManager.cpp:1034) by 0x7C6FD0D: nsComponentManagerImpl::GetServiceByContractID(char const*, nsID const&, void**) (nsComponentManager.cpp:1426) by 0x7C380DC: CallGetService(char const*, nsID const&, void**) (nsComponentManagerUtils.cpp:62) by 0x7C38367: nsGetServiceByContractIDWithError::operator()(nsID const&, void**) const (nsComponentManagerUtils.cpp:256) by 0x7C37105: nsCOMPtr_base::assign_from_gs_contractid_with_error(nsGetServiceByContractIDWithError const&, nsID const&) (nsCOMPtr.cpp:101) by 0x6BDCF13: nsIOService::InitializeSocketTransportService() (nsCOMPtr.h:682) by 0x6BDD03A: nsIOService::SetOffline(bool) (nsIOService.cpp:728) Possible data race during write of size 1 at 0x4EDFD03 by thread #1 Locks held: none at 0x6BF570E: nsSocketTransportService::SetOffline(bool) (nsSocketTransportService2.cpp:536) by 0x6BDD153: nsIOService::SetOffline(bool) (nsIOService.cpp:714) by 0x6BDF24D: nsIOService::Observe(nsISupports*, char const*, unsigned short const*) (nsIOService.cpp:917) by 0x7C4CD16: nsObserverList::NotifyObservers(nsISupports*, char const*, unsigned short const*) (nsObserverList.cpp:99) by 0x7C4D210: nsObserverService::NotifyObservers(nsISupports*, char const*, unsigned short const*) (nsObserverService.cpp:161) by 0x6BB04A8: nsXREDirProvider::DoShutdown() (nsXREDirProvider.cpp:847) by 0x6BA9B42: ScopedXPCOMStartup::~ScopedXPCOMStartup() (nsAppRunner.cpp:1119) by 0x6BAF33B: XREMain::XRE_main(int, char**, nsXREAppData const*) (nsAppRunner.cpp:3915) by 0x6BAF5A6: XRE_main (nsAppRunner.cpp:4093) by 0x40208E: do_main(int, char**, nsIFile*) (nsBrowserApp.cpp:195) by 0x402195: main (nsBrowserApp.cpp:388) This conflicts with a previous read of size 1 by thread #4 Locks held: 1, at address 0x4EDFDA0 at 0x6BF6F81: nsSocketTransportService::Run() (nsSocketTransportService2.cpp:660) by 0x7C74686: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:627) by 0x7C3B1D0: NS_ProcessNextEvent_P(nsIThread*, bool) (nsThreadUtils.cpp:237) by 0x7C751F9: nsThread::ThreadFunc(void*) (nsThread.cpp:265) by 0x505DFD6: _pt_root (ptthread.c:158) by 0x4A09F6C: mythread_wrapper (hg_intercepts.c:219) by 0x30E2C07D13: start_thread (in /usr/lib64/libpthread-2.15.so) by 0x30E28F167C: clone (in /usr/lib64/libc-2.15.so) Address 0x4EDFD03 is 83 bytes inside a block of size 168 alloc'd at 0x4A077BC: malloc (vg_replace_malloc.c:270) by 0x62A7078: moz_xmalloc (mozalloc.cpp:54) by 0x6BCBEC1: nsSocketTransportServiceConstructor(nsISupports*, nsID const&, void**) (mozalloc.h:200) by 0x7C3E49A: mozilla::GenericFactory::CreateInstance(nsISupports*, nsID const&, void**) (GenericFactory.cpp:16) by 0x7C6E930: nsComponentManagerImpl::CreateInstanceByContractID(char const*, nsISupports*, nsID const&, void**) (nsComponentManager.cpp:1034) by 0x7C6FD0D: nsComponentManagerImpl::GetServiceByContractID(char const*, nsID const&, void**) (nsComponentManager.cpp:1426) by 0x7C380DC: CallGetService(char const*, nsID const&, void**) (nsComponentManagerUtils.cpp:62) by 0x7C38367: nsGetServiceByContractIDWithError::operator()(nsID const&, void**) const (nsComponentManagerUtils.cpp:256) by 0x7C37105: nsCOMPtr_base::assign_from_gs_contractid_with_error(nsGetServiceByContractIDWithError const&, nsID const&) (nsCOMPtr.cpp:101) by 0x6BDCF13: nsIOService::InitializeSocketTransportService() (nsCOMPtr.h:682) by 0x6BDD03A: nsIOService::SetOffline(bool) (nsIOService.cpp:728) by 0x6BDE479: nsIOService::InitializeNetworkLinkService() (nsIOService.cpp:278) ----------------------------------------------------------------
Keywords: valgrind
Taking this in order to find an owner.
Assignee: nobody → joshmoz
this is probably from adam's patch..
(In reply to Patrick McManus [:mcmanus] from comment #3) > this is probably from adam's patch.. I don't know what this is a reference to.
(In reply to Josh Aas (Mozilla Corporation) from comment #4) > (In reply to Patrick McManus [:mcmanus] from comment #3) > > this is probably from adam's patch.. > > I don't know what this is a reference to. bug 87717
Blocks: 87717
Attached patch fix v1.0 (deleted) — Splinter Review
Maybe the fix is this simple? I don't know this code, just trying to help move things along.
Attachment #700706 - Flags: review?(unusualtears)
Comment on attachment 700706 [details] [diff] [review] fix v1.0 Thanks Josh. That should do it. I am building a patched, valgrind-enabled build to verify it with helgrind, but I don't really have any doubt. The traces in comment 1 are clear: the main thread toggles |mGoingOffline| while another thread may be locking it in |Run|.
Attachment #700706 - Flags: review?(unusualtears)
Attachment #700706 - Flags: review?(mcmanus)
Attachment #700706 - Flags: review+
Getting helgrind to behave nicely is more work than I anticipated (building suppression file and all). I don't have the patience at the moment for it. But I stand by my comment that Josh's patch to grab the lock in |SetOffline| will eliminate the race for |mGoingOffline|.
Getting H to behave nicely requires marking up the source in a few critical places; see bug 551155. Anyway, with the patch in place I am no longer seeing the race. So, +1 from me.
Attachment #700706 - Flags: review?(mcmanus) → review+
Comment on attachment 700706 [details] [diff] [review] fix v1.0 [Approval Request Comment] Bug caused by (feature/regressing bug #): 87717 User impact if declined: This was found by a tool, not a user. I guess Firefox could get confused about its offline state. Generally races like this are pretty problematic, hard to diagnose issues because they can be hard to reproduce. Testing completed (on m-c, etc.): on m-c now Risk to taking this patch (and alternatives if risky): doesn't seem risky String or UUID changes made by this patch: none
Attachment #700706 - Flags: approval-mozilla-aurora?
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
Comment on attachment 700706 [details] [diff] [review] fix v1.0 We'll take a low risk fix on Aurora to prevent the need to diagnose amorphous networking bugs. Approving.
Attachment #700706 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: