Closed Bug 1222782 Opened 9 years ago Closed 9 years ago

TSan: data race netwerk/cache2/CacheIOThread.cpp:148 Target (race on mXPCOMThread)

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla45

Tracking Flags:

Tracking

Status

firefox45

---

fixed

People

(Reporter: jseward, Assigned: michal)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 2 obsolete files)

TSan stack trace 9 years ago Julian Seward [:jseward] (deleted), text/plain		Details
bug1222782-1.diff 9 years ago Julian Seward [:jseward] (deleted), patch		Details \| Diff \| Splinter Review
patch 9 years ago Michal Novotny [:michal] (deleted), patch	jseward : review+	Details \| Diff \| Splinter Review
patch v2 9 years ago Michal Novotny [:michal] (deleted), patch	michal : review+	Details \| Diff \| Splinter Review

Julian Seward [:jseward]

Reporter

Description

•

9 years ago

Attached file TSan stack trace (deleted) — Details

STR: apply test harness patch in bug 1222043, then run (with a TSan'd build of course) ./mach xpcshell-test --sequential --log-mach - browser/components/dirprovider/tests/unit/test_bookmark_pref.js The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer). * Specific information about this bug This is a race on CacheIOThread::mXPCOMThread. CacheIOThread::ThreadFunc() writes to it .. MonitorAutoLock lock(mMonitor); ... mXPCOMThread.swap(xpcomThread); whilst holding the mMonitor lock for this object. But CacheIOThread::Target() reads it without holding that lock .. it holds some other (I assume unrelated) lock instead: target = mXPCOMThread; if (!target && mThread) { MonitorAutoLock lock(mMonitor); if (!mXPCOMThread) lock.Wait(); target = mXPCOMThread; } If we move the acquisition of mMonitor out of the conditional and put it before the first line of this fragment, the read of mXPCOMThread is then suitably protected, and TSan stops complaining. * General information about TSan, data races, etc. Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1][2]. If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist. [1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong [2] _How to miscompile programs with "benign" data races_: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf

Julian Seward [:jseward]

Reporter

Comment 1

•

9 years ago

Attached patch bug1222782-1.diff (obsolete) (deleted) — Details — Splinter Review

Proposed fix. Moves the acquisition of mMonitor a little earlier in Target().

Julian Seward [:jseward]

Reporter

Comment 2

•

9 years ago

This race is reported 1527 times out of a total of 18693 races reported during a complete xpcshell test run. That makes it the 3rd most common race reported.

Nathan Froyd [:froydnj]

Comment 3

•

9 years ago

I think this is a duplicate of bug 1190969? But this bug actually has a plausible fix, so maybe we should dup the other bug to this one.

Flags: needinfo?(jseward)

Julian Seward [:jseward]

Reporter

Comment 4

•

9 years ago

(In reply to Nathan Froyd [:froydnj] from comment #3) > I think this is a duplicate of bug 1190969? Plausible, but hard to say for sure, because bug 1190969 doesn't have a stack trace to look at. Maybe we should just fix this and then see if bug 1190969 is still alive after that.

Flags: needinfo?(jseward)

Julian Seward [:jseward]

Reporter

Updated

•

9 years ago

Attachment #8684638 - Flags: review?(michal.novotny)

Michal Novotny [:michal]

Assignee

Comment 5

•

9 years ago

Attached patch patch (obsolete) (deleted) — Details — Splinter Review

(In reply to Julian Seward [:jseward] from comment #1) > Proposed fix. Moves the acquisition of mMonitor a little earlier in > Target(). I'm pretty sure Honza wouldn't like this fix. I guess making mXPCOMThread atomic should fix the data race too while being a bit more efficient.

Attachment #8686347 - Flags: review?(jseward)

Julian Seward [:jseward]

Reporter

Comment 6

•

9 years ago

(In reply to Michal Novotny (:michal) from comment #5) > I'm pretty sure Honza wouldn't like this fix. I guess making mXPCOMThread > atomic should fix the data race too while being a bit more efficient. For a complete xpcshell test run, the proposed patch adds 36000 monitor entries and exits, which presumably means 72000 global bus operations. Spread over many minutes (~ 30) of CPU time I would think it very unlikely that it would be possible to measure any performance loss. > I guess making mXPCOMThread atomic should fix the data race I am not convinced of that. IIUC, atomics only de-race certain operations such as add-a-number-to-a-memory location. In this case the read side of the race looks like this target = mXPCOMThread; if (!target && mThread) { MonitorAutoLock lock(mMonitor); if (!mXPCOMThread) lock.Wait(); and making mXPCOMThread atomic won't do anything to stop the two uses of it here getting inconsistent values due to racing against a writer.

Michal Novotny [:michal]

Assignee

Comment 7

•

9 years ago

(In reply to Julian Seward [:jseward] from comment #6) > For a complete xpcshell test run, the proposed patch adds 36000 monitor > entries > and exits, which presumably means 72000 global bus operations. Spread over > many > minutes (~ 30) of CPU time I would think it very unlikely that it would be > possible > to measure any performance loss. Maybe, but please note that cache isn't used too much during xpcshell tests when compared to normal firefox run. > > I guess making mXPCOMThread atomic should fix the data race > > I am not convinced of that. IIUC, atomics only de-race certain operations > such as > add-a-number-to-a-memory location. In this case the read side of the race > looks > like this > > target = mXPCOMThread; > if (!target && mThread) > { > MonitorAutoLock lock(mMonitor); > if (!mXPCOMThread) > lock.Wait(); > > and making mXPCOMThread atomic won't do anything to stop the two uses of it > here > getting inconsistent values due to racing against a writer. It might be that I don't understand enough how atomic works. I thought that making mXPCOMThread atomic ensures that CacheIOThread::Target() will see the assigned pointer immediately once it is set on the different thread in CacheIOThread::ThreadFunc().

Julian Seward [:jseward]

Reporter

Comment 8

•

9 years ago

(In reply to Michal Novotny (:michal) from comment #7) > Maybe, but please note that cache isn't used too much during xpcshell tests > when compared to normal firefox run. Some more numbers, with the proposed patch applied. The test is: start the browser with an empty cache, load news.bbc.co.uk, wait till it loads, and then re-load 3 times. This requires 47.54 million global bus events, of which about half are from malloc and free. The cost incurred by mozilla::net::CacheIOThread::Target() and the functions it calls, is 21695 global bus events. That is, about 0.046% of the total.

Julian Seward [:jseward]

Reporter

Comment 9

•

9 years ago

(In reply to Michal Novotny (:michal) from comment #5) > Created attachment 8686347 [details] [diff] [review] > patch > I'm pretty sure Honza wouldn't like this fix. I guess making mXPCOMThread > atomic should fix the data race too while being a bit more efficient. Ah, sorry. I didn't see this new patch at first. It does remove the race. The profile data is difficult to interpret but I suspect it is more efficient than the patch I proposed in comment #2.

Julian Seward [:jseward]

Reporter

Comment 10

•

9 years ago

Comment on attachment 8686347 [details] [diff] [review] patch The patch does remove the race, so I am r+'ing it on that basis. I can't say anything about the wider implications of the patch, except one thing: + ((nsIThread *)mXPCOMThread)->Release(); + rv = ((nsIThread *)mXPCOMThread)->ProcessNextEvent(false, These casts are ugly and IMO dangerous; what happens if one day mXPCOMThread stops being an nsIThread* ? Please remove them if possible. Are they actually necessary, given that mXPCOMThread is declared as follows? + Atomic<nsIThread *> mXPCOMThread;

Attachment #8686347 - Flags: review?(jseward) → review+

Michal Novotny [:michal]

Assignee

Comment 11

•

9 years ago

I get following error when I don't use the cast: /opt/moz/hg-inbound/netwerk/cache2/CacheIOThread.cpp: In member function ‘void mozilla::net::CacheIOThread::ThreadFunc()’: /opt/moz/hg-inbound/netwerk/cache2/CacheIOThread.cpp:211:28: error: base operand of ‘->’ has non-pointer type ‘mozilla::Atomic<nsIThread*>’ rv = mXPCOMThread->ProcessNextEvent(false, &processedEvent); Instead of casting I can do: nsIThread *thread = mXPCOMThread; rv = thread->ProcessNextEvent(false, &processedEvent); It is safer because the assignment would fail to compile if declaration of mXPCOMThread changes. Is this OK?

Flags: needinfo?(jseward)

Julian Seward [:jseward]

Reporter

Comment 12

•

9 years ago

(In reply to Michal Novotny (:michal) from comment #11) > Is this OK? Yes that's fine. Thanks for fixing it.

Flags: needinfo?(jseward)

Michal Novotny [:michal]

Assignee

Comment 13

•

9 years ago

Attached patch patch v2 (deleted) — Details — Splinter Review

Assignee: nobody → michal.novotny

Attachment #8684638 - Attachment is obsolete: true

Attachment #8686347 - Attachment is obsolete: true

Attachment #8684638 - Flags: review?(michal.novotny)

Attachment #8687296 - Flags: review+

Pulsebot

Comment 14

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/058f15f9cf5e

Phil Ringnalda (:philor)

Comment 15

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/058f15f9cf5e

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox45: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla45

Honza Bambas (:mayhemer)

Comment 16

•

9 years ago

Thanks for the patch here! I was thinking of something similar for CacheFile::mHandle race, just have an Atomic<T*> and nsRefPtr<T> side by side. The atomic to provide access and refptr to automate the release.

You need to log in before you can comment on or make changes to this bug.

Bugzilla

TSan: data race netwerk/cache2/CacheIOThread.cpp:148 Target (race on mXPCOMThread)

Categories

(Core :: Networking: Cache, defect)

Tracking

()

People

(Reporter: jseward, Assigned: michal)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)

User Story

Attachments

(2 files, 2 obsolete files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Comment 14

Comment 15

Comment 16

Attachment

General

Description

File Name

Content Type