Closed Bug 1222782 Opened 9 years ago Closed 9 years ago

TSan: data race netwerk/cache2/CacheIOThread.cpp:148 Target (race on mXPCOMThread)

Categories

(Core :: Networking: Cache, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla45
Tracking Status
firefox45 --- fixed

People

(Reporter: jseward, Assigned: michal)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 2 obsolete files)

Attached file TSan stack trace (deleted) —
STR: apply test harness patch in bug 1222043, then run (with a TSan'd build of course) ./mach xpcshell-test --sequential --log-mach - browser/components/dirprovider/tests/unit/test_bookmark_pref.js The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer). * Specific information about this bug This is a race on CacheIOThread::mXPCOMThread. CacheIOThread::ThreadFunc() writes to it .. MonitorAutoLock lock(mMonitor); ... mXPCOMThread.swap(xpcomThread); whilst holding the mMonitor lock for this object. But CacheIOThread::Target() reads it without holding that lock .. it holds some other (I assume unrelated) lock instead: target = mXPCOMThread; if (!target && mThread) { MonitorAutoLock lock(mMonitor); if (!mXPCOMThread) lock.Wait(); target = mXPCOMThread; } If we move the acquisition of mMonitor out of the conditional and put it before the first line of this fragment, the read of mXPCOMThread is then suitably protected, and TSan stops complaining. * General information about TSan, data races, etc. Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1][2]. If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist. [1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong [2] _How to miscompile programs with "benign" data races_: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Attached patch bug1222782-1.diff (obsolete) (deleted) — Splinter Review
Proposed fix. Moves the acquisition of mMonitor a little earlier in Target().
This race is reported 1527 times out of a total of 18693 races reported during a complete xpcshell test run. That makes it the 3rd most common race reported.
I think this is a duplicate of bug 1190969? But this bug actually has a plausible fix, so maybe we should dup the other bug to this one.
Flags: needinfo?(jseward)
(In reply to Nathan Froyd [:froydnj] from comment #3) > I think this is a duplicate of bug 1190969? Plausible, but hard to say for sure, because bug 1190969 doesn't have a stack trace to look at. Maybe we should just fix this and then see if bug 1190969 is still alive after that.
Flags: needinfo?(jseward)
Attachment #8684638 - Flags: review?(michal.novotny)
Attached patch patch (obsolete) (deleted) — Splinter Review
(In reply to Julian Seward [:jseward] from comment #1) > Proposed fix. Moves the acquisition of mMonitor a little earlier in > Target(). I'm pretty sure Honza wouldn't like this fix. I guess making mXPCOMThread atomic should fix the data race too while being a bit more efficient.
Attachment #8686347 - Flags: review?(jseward)
(In reply to Michal Novotny (:michal) from comment #5) > I'm pretty sure Honza wouldn't like this fix. I guess making mXPCOMThread > atomic should fix the data race too while being a bit more efficient. For a complete xpcshell test run, the proposed patch adds 36000 monitor entries and exits, which presumably means 72000 global bus operations. Spread over many minutes (~ 30) of CPU time I would think it very unlikely that it would be possible to measure any performance loss. > I guess making mXPCOMThread atomic should fix the data race I am not convinced of that. IIUC, atomics only de-race certain operations such as add-a-number-to-a-memory location. In this case the read side of the race looks like this target = mXPCOMThread; if (!target && mThread) { MonitorAutoLock lock(mMonitor); if (!mXPCOMThread) lock.Wait(); and making mXPCOMThread atomic won't do anything to stop the two uses of it here getting inconsistent values due to racing against a writer.
(In reply to Julian Seward [:jseward] from comment #6) > For a complete xpcshell test run, the proposed patch adds 36000 monitor > entries > and exits, which presumably means 72000 global bus operations. Spread over > many > minutes (~ 30) of CPU time I would think it very unlikely that it would be > possible > to measure any performance loss. Maybe, but please note that cache isn't used too much during xpcshell tests when compared to normal firefox run. > > I guess making mXPCOMThread atomic should fix the data race > > I am not convinced of that. IIUC, atomics only de-race certain operations > such as > add-a-number-to-a-memory location. In this case the read side of the race > looks > like this > > target = mXPCOMThread; > if (!target && mThread) > { > MonitorAutoLock lock(mMonitor); > if (!mXPCOMThread) > lock.Wait(); > > and making mXPCOMThread atomic won't do anything to stop the two uses of it > here > getting inconsistent values due to racing against a writer. It might be that I don't understand enough how atomic works. I thought that making mXPCOMThread atomic ensures that CacheIOThread::Target() will see the assigned pointer immediately once it is set on the different thread in CacheIOThread::ThreadFunc().
(In reply to Michal Novotny (:michal) from comment #7) > Maybe, but please note that cache isn't used too much during xpcshell tests > when compared to normal firefox run. Some more numbers, with the proposed patch applied. The test is: start the browser with an empty cache, load news.bbc.co.uk, wait till it loads, and then re-load 3 times. This requires 47.54 million global bus events, of which about half are from malloc and free. The cost incurred by mozilla::net::CacheIOThread::Target() and the functions it calls, is 21695 global bus events. That is, about 0.046% of the total.
(In reply to Michal Novotny (:michal) from comment #5) > Created attachment 8686347 [details] [diff] [review] > patch > I'm pretty sure Honza wouldn't like this fix. I guess making mXPCOMThread > atomic should fix the data race too while being a bit more efficient. Ah, sorry. I didn't see this new patch at first. It does remove the race. The profile data is difficult to interpret but I suspect it is more efficient than the patch I proposed in comment #2.
Comment on attachment 8686347 [details] [diff] [review] patch The patch does remove the race, so I am r+'ing it on that basis. I can't say anything about the wider implications of the patch, except one thing: + ((nsIThread *)mXPCOMThread)->Release(); + rv = ((nsIThread *)mXPCOMThread)->ProcessNextEvent(false, These casts are ugly and IMO dangerous; what happens if one day mXPCOMThread stops being an nsIThread* ? Please remove them if possible. Are they actually necessary, given that mXPCOMThread is declared as follows? + Atomic<nsIThread *> mXPCOMThread;
Attachment #8686347 - Flags: review?(jseward) → review+
I get following error when I don't use the cast: /opt/moz/hg-inbound/netwerk/cache2/CacheIOThread.cpp: In member function ‘void mozilla::net::CacheIOThread::ThreadFunc()’: /opt/moz/hg-inbound/netwerk/cache2/CacheIOThread.cpp:211:28: error: base operand of ‘->’ has non-pointer type ‘mozilla::Atomic<nsIThread*>’ rv = mXPCOMThread->ProcessNextEvent(false, &processedEvent); Instead of casting I can do: nsIThread *thread = mXPCOMThread; rv = thread->ProcessNextEvent(false, &processedEvent); It is safer because the assignment would fail to compile if declaration of mXPCOMThread changes. Is this OK?
Flags: needinfo?(jseward)
(In reply to Michal Novotny (:michal) from comment #11) > Is this OK? Yes that's fine. Thanks for fixing it.
Flags: needinfo?(jseward)
Attached patch patch v2 (deleted) — Splinter Review
Assignee: nobody → michal.novotny
Attachment #8684638 - Attachment is obsolete: true
Attachment #8686347 - Attachment is obsolete: true
Attachment #8684638 - Flags: review?(michal.novotny)
Attachment #8687296 - Flags: review+
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
Thanks for the patch here! I was thinking of something similar for CacheFile::mHandle race, just have an Atomic<T*> and nsRefPtr<T> side by side. The atomic to provide access and refptr to automate the release.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: