1277709 - threadsafe refcounting uses stronger memory synchronization than is needed

Assignee

Description

•

8 years ago

The threadsafe reference counting provided in XPCOM uses stronger memory synchronization than is needed. In particular, it uses read-modify-write (RMW) operations with sequentially consistent memory synchronization for both incrementing and decrementing the reference count. A comment indicates that we could switch to release/acquire semantics, but we should actually be able to go even further than that. As explained in http://www.chaoticmind.net/~hcb/projects/boost.atomic/doc/atomic/usage_examples.html#boost_atomic.usage_examples.example_reference_counters (and I thought I had another link, but I can't find it), we should be able to use relaxed semantics when incrementing the reference count, release semantics when decrementing it to nonzero, and then acquire semantics when decrementing to zero. This shouldn't make a difference on Intel processors, but it should make a difference on ARM. We don't have a whole lot of performance benchmarks that run on ARM, but I pushed a patch to try to get some numbers, and did some retriggers of the autophone-s1s2 test, which does performance measurements of load time (throbber start to throbber end, I think) of cached copies of twitter, the new york times, and a blank page. My current results are at: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=864cdd00360c&newProject=try&newRevision=f8126d991e02&framework=3&showOnlyImportant=0 which seem to be showing a 2.5%-4% performance improvement on those benchmarks on armv8 CPUs (although no significant change so far on ARMv7). The patch I pushed, which seems to compile on try on platforms other than Mac, is: https://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/6917fd1a20d0/threadsafe-reference-counting-min-synchronization It would probably need a bit more work in order to be landable. The other question is whether we consider this worth doing for that amount of performance gain; it has some risk that other code may be depending on synchronization that this patch will remove.

David Baron :dbaron:

Assignee

Updated

•

8 years ago

Blocks: 1277711

Nathan Froyd [:froydnj]

Comment 1

•

8 years ago

(In reply to David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) (vacation June 4-12) from comment #0) > As explained in > http://www.chaoticmind.net/~hcb/projects/boost.atomic/doc/atomic/ > usage_examples.html#boost_atomic.usage_examples.example_reference_counters > (and I thought I had another link, but I can't find it), we should be able > to use relaxed semantics when incrementing the reference count, release > semantics when decrementing it to nonzero, and then acquire semantics when > decrementing to zero. One concern about this is that it relies on RMW operations always being used to provide the necessary synchronization, and we don't always use RMW operations, e.g.: http://dxr.mozilla.org/mozilla-central/source/xpcom/glue/nsISupportsImpl.h#346 I don't know how widely used those member functions are, though; maybe it's just a matter of deleting them.

David Baron :dbaron:

Assignee

Comment 2

•

8 years ago

In the patch, I made those be acquire and release operations, though I agree I'd want to investigate the uses further before landing.

Bobby Holley (:bholley)

Comment 3

•

8 years ago

NI njn to get it on the LLTT radar - 2.5%-4% pageload win is pretty big. No need for a response - I would just CC, but I'm guessing that might get lost in the PTO backlog.

Flags: needinfo?(n.nethercote)

Nathan Froyd [:froydnj]

Comment 4

•

8 years ago

(In reply to Bobby Holley (busy) from comment #3) > NI njn to get it on the LLTT radar - 2.5%-4% pageload win is pretty big. How many devices actually have armv8 CPUs, though? That benchmark only showed a win for those chips. If the same patch showed a win on x86 windows, that's a much different story.

Bobby Holley (:bholley)

Comment 5

•

8 years ago

(In reply to Nathan Froyd [:froydnj] from comment #4) > (In reply to Bobby Holley (busy) from comment #3) > > NI njn to get it on the LLTT radar - 2.5%-4% pageload win is pretty big. > > How many devices actually have armv8 CPUs, though? That benchmark only > showed a win for those chips. > > If the same patch showed a win on x86 windows, that's a much different story. Good point - I don't know.

Bobby Holley (:bholley)

Comment 6

•

8 years ago

Though even if they're low market share, we do get disproportionate benefits from being fast on high-end devices, since that's often what the press and other influencers use.

Nicholas Nethercote [inactive]

Comment 7

•

8 years ago

Thanks for letting me know about this.

Flags: needinfo?(n.nethercote)

Nathan Froyd [:froydnj]

Comment 8

•

8 years ago

(In reply to Bobby Holley (busy) from comment #6) > Though even if they're low market share, we do get disproportionate benefits > from being fast on high-end devices, since that's often what the press and > other influencers use. This is a good point.

David Baron :dbaron:

Assignee

Comment 9

•

8 years ago

So since I last worked on this, a Mac OS X compiler upgrade means that it now compiles on all of our platforms: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f69d7878b2175c3de06a2fab791ba72f42e835f8&group_state=expanded so I'm inclined to move forward on landing it.

David Baron :dbaron:

Assignee

Comment 10

•

8 years ago

(The last substantive changes to this code appear to have been in bug 884061.)

David Baron :dbaron:

Assignee

Comment 11

•

8 years ago

Attached patch Make threadsafe reference counting use the minimum memory sychronization needed (deleted) — Details — Splinter Review

This uses std::atomic rather than mozilla::Atomic since mozilla::Atomic does not support using different memory synchronization for different atomic operations on the same variable. The added comments could use careful review since, while they reflect my understanding of the issue, I don't consider myself an expert on the topic. MozReview-Commit-ID: 7xByCXt17Dr

Attachment #8853775 - Flags: review?(nfroyd)

Nathan Froyd [:froydnj]

Comment 12

•

8 years ago

Comment on attachment 8853775 [details] [diff] [review] Make threadsafe reference counting use the minimum memory sychronization needed Review of attachment 8853775 [details] [diff] [review]: ----------------------------------------------------------------- r=me; the comments below are just me thinking out loud as I reviewed secondary sources for help. Can you file a followup bug to apply this optimization to nsStringBuffer, either manually repeating the code here, or making it use ThreadSafeAutoRefCnt (preferred)? That should result in about as much of a win as this patch. Also, does this count as multi-threaded code for the purposes of the sign above your desk? :) ::: xpcom/base/nsISupportsImpl.h @@ +330,5 @@ > + // We're going to destroy the object on this thread, so we need > + // acquire semantics to synchronize with the memory released by > + // the last release on other threads, that is, to ensure that > + // writes prior to that release are now visible on this thread. > + result = mValue.load(std::memory_order_acquire); Boost suggests that you can get by with the equivalent of: atomic_thread_fence(std::memory_order_acquire); here. But I'm not sure it makes much difference in practice, given that your load is likely to hit in cache anyway. Keeping fences out of the picture is probably good practice anyway. Herb Sutter's "atomic<> Weapons" presentation suggests that you could do: nsrefcnt result = mValue.fetch_sub(1, std::memory_order_acq_rel); which would be slightly less efficient, since you need the acquire ordering on every operation, rather than just the one that will result in the destruction of the object.

Attachment #8853775 - Flags: review?(nfroyd) → review+

David Baron :dbaron:

Assignee

Comment 13

•

8 years ago

(In reply to Nathan Froyd [:froydnj] from comment #12) > Can you file a followup bug to apply this optimization to nsStringBuffer, > either manually repeating the code here, or making it use > ThreadSafeAutoRefCnt (preferred)? That should result in about as much of a > win as this patch. I filed bug 1353181. (It seemed faster to just repeat the code than to refactor ThreadSafeAutoRefCnt to be templatized over the underlying reference count type. I was also a little worried about how to measure codesize since we stopped running our automated tests of it.)

David Baron :dbaron:

Assignee

Comment 14

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/a52e75fdda073ca38e2a88b91ad7070c4138d702 Bug 1277709 - Make threadsafe reference counting use the minimum memory sychronization needed. r=froydnj

Carsten Book [:Tomcat]

Comment 15

•

8 years ago

backed this out in https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=2ba7fac730edb476a51a732011872e95d34d82c8&filter-searchStr=OS+X+10.10+debug+XPCShell+(X) for failing xpcshell tests like https://treeherder.mozilla.org/logviewer.html#?job_id=88513551&repo=mozilla-inbound&lineNumber=7989 even after backout of the other patches. to keep the tree clear backing out this changes

Flags: needinfo?(dbaron)

David Baron :dbaron:

Assignee

Updated

•

8 years ago

Flags: needinfo?(dbaron)

David Baron :dbaron:

Assignee

Comment 16

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=50f3c56e5af5be0e01c02a67496ae5da91f7a8ac&group_state=expanded looks pretty green (though not *quite* done yet).

David Baron :dbaron:

Assignee

Comment 17

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/24f889bd554967c3008fd325c36903b5651188af Bug 1277709 - Make threadsafe reference counting use the minimum memory sychronization needed. r=froydnj

Carsten Book [:Tomcat]

Comment 18

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/24f889bd5549

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox55: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla55

Xidorn Quan [:xidorn] UTC+11

Updated

•

7 years ago

Depends on: 1397052

David Baron :dbaron:

Assignee

Updated

•

6 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1353181, https://bugzilla.mozilla.org/show_bug.cgi?id=1385438