Closed
Bug 1222782
Opened 9 years ago
Closed 9 years ago
TSan: data race netwerk/cache2/CacheIOThread.cpp:148 Target (race on mXPCOMThread)
Categories
(Core :: Networking: Cache, defect)
Tracking
()
RESOLVED
FIXED
mozilla45
Tracking | Status | |
---|---|---|
firefox45 | --- | fixed |
People
(Reporter: jseward, Assigned: michal)
References
(Blocks 1 open bug)
Details
Attachments
(2 files, 2 obsolete files)
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
michal
:
review+
|
Details | Diff | Splinter Review |
STR: apply test harness patch in bug 1222043, then run (with a TSan'd build of course)
./mach xpcshell-test --sequential --log-mach - browser/components/dirprovider/tests/unit/test_bookmark_pref.js
The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer).
* Specific information about this bug
This is a race on CacheIOThread::mXPCOMThread. CacheIOThread::ThreadFunc() writes to it ..
MonitorAutoLock lock(mMonitor);
...
mXPCOMThread.swap(xpcomThread);
whilst holding the mMonitor lock for this object. But CacheIOThread::Target() reads it
without holding that lock .. it holds some other (I assume unrelated) lock instead:
target = mXPCOMThread;
if (!target && mThread)
{
MonitorAutoLock lock(mMonitor);
if (!mXPCOMThread)
lock.Wait();
target = mXPCOMThread;
}
If we move the acquisition of mMonitor out of the conditional and put it before the
first line of this fragment, the read of mXPCOMThread is then suitably protected, and TSan stops complaining.
* General information about TSan, data races, etc.
Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1][2].
If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist.
[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[2] _How to miscompile programs with "benign" data races_: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Reporter | ||
Comment 1•9 years ago
|
||
Proposed fix. Moves the acquisition of mMonitor a little earlier in Target().
Reporter | ||
Comment 2•9 years ago
|
||
This race is reported 1527 times out of a total of 18693 races
reported during a complete xpcshell test run. That makes it the
3rd most common race reported.
Comment 3•9 years ago
|
||
I think this is a duplicate of bug 1190969? But this bug actually has a plausible fix, so maybe we should dup the other bug to this one.
Flags: needinfo?(jseward)
Reporter | ||
Comment 4•9 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #3)
> I think this is a duplicate of bug 1190969?
Plausible, but hard to say for sure, because bug 1190969 doesn't have a
stack trace to look at.
Maybe we should just fix this and then see if bug 1190969 is still alive
after that.
Flags: needinfo?(jseward)
Reporter | ||
Updated•9 years ago
|
Attachment #8684638 -
Flags: review?(michal.novotny)
Assignee | ||
Comment 5•9 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #1)
> Proposed fix. Moves the acquisition of mMonitor a little earlier in
> Target().
I'm pretty sure Honza wouldn't like this fix. I guess making mXPCOMThread atomic should fix the data race too while being a bit more efficient.
Attachment #8686347 -
Flags: review?(jseward)
Reporter | ||
Comment 6•9 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #5)
> I'm pretty sure Honza wouldn't like this fix. I guess making mXPCOMThread
> atomic should fix the data race too while being a bit more efficient.
For a complete xpcshell test run, the proposed patch adds 36000 monitor entries
and exits, which presumably means 72000 global bus operations. Spread over many
minutes (~ 30) of CPU time I would think it very unlikely that it would be possible
to measure any performance loss.
> I guess making mXPCOMThread atomic should fix the data race
I am not convinced of that. IIUC, atomics only de-race certain operations such as
add-a-number-to-a-memory location. In this case the read side of the race looks
like this
target = mXPCOMThread;
if (!target && mThread)
{
MonitorAutoLock lock(mMonitor);
if (!mXPCOMThread)
lock.Wait();
and making mXPCOMThread atomic won't do anything to stop the two uses of it here
getting inconsistent values due to racing against a writer.
Assignee | ||
Comment 7•9 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #6)
> For a complete xpcshell test run, the proposed patch adds 36000 monitor
> entries
> and exits, which presumably means 72000 global bus operations. Spread over
> many
> minutes (~ 30) of CPU time I would think it very unlikely that it would be
> possible
> to measure any performance loss.
Maybe, but please note that cache isn't used too much during xpcshell tests when compared to normal firefox run.
> > I guess making mXPCOMThread atomic should fix the data race
>
> I am not convinced of that. IIUC, atomics only de-race certain operations
> such as
> add-a-number-to-a-memory location. In this case the read side of the race
> looks
> like this
>
> target = mXPCOMThread;
> if (!target && mThread)
> {
> MonitorAutoLock lock(mMonitor);
> if (!mXPCOMThread)
> lock.Wait();
>
> and making mXPCOMThread atomic won't do anything to stop the two uses of it
> here
> getting inconsistent values due to racing against a writer.
It might be that I don't understand enough how atomic works. I thought that making mXPCOMThread atomic ensures that CacheIOThread::Target() will see the assigned pointer immediately once it is set on the different thread in CacheIOThread::ThreadFunc().
Reporter | ||
Comment 8•9 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #7)
> Maybe, but please note that cache isn't used too much during xpcshell tests
> when compared to normal firefox run.
Some more numbers, with the proposed patch applied. The test is: start the
browser with an empty cache, load news.bbc.co.uk, wait till it loads, and then
re-load 3 times.
This requires 47.54 million global bus events, of which about half are from
malloc and free. The cost incurred by mozilla::net::CacheIOThread::Target()
and the functions it calls, is 21695 global bus events. That is, about 0.046%
of the total.
Reporter | ||
Comment 9•9 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #5)
> Created attachment 8686347 [details] [diff] [review]
> patch
> I'm pretty sure Honza wouldn't like this fix. I guess making mXPCOMThread
> atomic should fix the data race too while being a bit more efficient.
Ah, sorry. I didn't see this new patch at first. It does remove the race.
The profile data is difficult to interpret but I suspect it is more efficient
than the patch I proposed in comment #2.
Reporter | ||
Comment 10•9 years ago
|
||
Comment on attachment 8686347 [details] [diff] [review]
patch
The patch does remove the race, so I am r+'ing it on that basis. I can't
say anything about the wider implications of the patch, except one thing:
+ ((nsIThread *)mXPCOMThread)->Release();
+ rv = ((nsIThread *)mXPCOMThread)->ProcessNextEvent(false,
These casts are ugly and IMO dangerous; what happens if one day mXPCOMThread
stops being an nsIThread* ? Please remove them if possible. Are they actually
necessary, given that mXPCOMThread is declared as follows?
+ Atomic<nsIThread *> mXPCOMThread;
Attachment #8686347 -
Flags: review?(jseward) → review+
Assignee | ||
Comment 11•9 years ago
|
||
I get following error when I don't use the cast:
/opt/moz/hg-inbound/netwerk/cache2/CacheIOThread.cpp: In member function ‘void mozilla::net::CacheIOThread::ThreadFunc()’:
/opt/moz/hg-inbound/netwerk/cache2/CacheIOThread.cpp:211:28: error: base operand of ‘->’ has non-pointer type ‘mozilla::Atomic<nsIThread*>’
rv = mXPCOMThread->ProcessNextEvent(false, &processedEvent);
Instead of casting I can do:
nsIThread *thread = mXPCOMThread;
rv = thread->ProcessNextEvent(false, &processedEvent);
It is safer because the assignment would fail to compile if declaration of mXPCOMThread changes. Is this OK?
Flags: needinfo?(jseward)
Reporter | ||
Comment 12•9 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #11)
> Is this OK?
Yes that's fine. Thanks for fixing it.
Flags: needinfo?(jseward)
Assignee | ||
Comment 13•9 years ago
|
||
Assignee: nobody → michal.novotny
Attachment #8684638 -
Attachment is obsolete: true
Attachment #8686347 -
Attachment is obsolete: true
Attachment #8684638 -
Flags: review?(michal.novotny)
Attachment #8687296 -
Flags: review+
Comment 14•9 years ago
|
||
Comment 15•9 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
Comment 16•9 years ago
|
||
Thanks for the patch here! I was thinking of something similar for CacheFile::mHandle race, just have an Atomic<T*> and nsRefPtr<T> side by side. The atomic to provide access and refptr to automate the release.
You need to log in
before you can comment on or make changes to this bug.
Description
•