Closed
Bug 592557
Opened 14 years ago
Closed 14 years ago
Eliminate uses of PR_Atomic{Increment,Decrement} functions in favor of PR_ATOMIC_{INCREMENT,DECREMENT} macros
Categories
(Core :: General, defect)
Tracking
()
RESOLVED
FIXED
mozilla5
People
(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)
References
Details
(Keywords: perf)
Attachments
(1 file, 1 obsolete file)
(deleted),
patch
|
benjamin
:
review+
gal
:
review+
benjamin
:
approval2.0-
|
Details | Diff | Splinter Review |
There are many uses of PR_Atomic{Increment,Decrement} left in the code. Let's get rid of 'em.
Assignee | ||
Comment 1•14 years ago
|
||
Wow. Some of these calls -- especially uses of atomic set -- are really fishy.
Feast your eyes, for instance, on
http://mxr.mozilla.org/mozilla-central/source/security/manager/ssl/src/nsNSSComponent.cpp#300
I'm going to try to only modify code which is plausibly using the atomic ops correctly.
Assignee | ||
Comment 2•14 years ago
|
||
Just figured out an ugly Windows build error and pushed to try. We'll see tomorrow morning what the perf looks like.
Based on the results from bug 587853, I'm not hopeful that this will speed things up a lot. But I think it's still useful if only for hygiene.
Assignee | ||
Comment 3•14 years ago
|
||
Results from my try push are
Assignee | ||
Comment 4•14 years ago
|
||
Results from my try push are trickling in. Looks like a 2% win on Dromaeo DOM on Linux 32- and 64-bit. No results from other platforms yet.
http://bit.ly/btOXE3
Wait, where's your patch?
Assignee | ||
Comment 6•14 years ago
|
||
Looks like an infra problem meant that I only got Linux builds for my latest push. I'll post here when we get the rest of the results.
Assignee | ||
Comment 7•14 years ago
|
||
Assignee: nobody → justin.lebar+bug
Status: NEW → ASSIGNED
Assignee | ||
Comment 8•14 years ago
|
||
New results coming in at
http://bit.ly/d1SS2D
It's hard to tell what's statistically significant and what isn't. For instance, the Dromaeo win from comment 4 has disappeared on Linux 32-bit. No 64-bit results yet.
Assignee | ||
Comment 9•14 years ago
|
||
We're not going to get Linux-64 results on this push-to-try, due to another infra failure.
Assignee | ||
Comment 10•14 years ago
|
||
Miracle of miracles, we actually got a full try run with talos. Well...mostly. We're missing some Windows results.
http://bit.ly/96LJ1t
This is clean on try, but it actually looks like a perf regression on a number of tests. Maybe this is noise, but tsspider looks better, while dromaeo sunspider and dromaeo v8 look worse? dromaeo_dom is 4.5% better on Win7, dromaeo_jslib is improved ~1% everywhere.
The results are all across the board. I find it conceivable that this patch makes things slower, but it seems more likely that the data is unreliable. I'm not sure how to proceed here...
Assignee | ||
Comment 11•14 years ago
|
||
Taras, maybe you have a better idea how to interpret these numbers?
Comment 12•14 years ago
|
||
(In reply to comment #11)
> Taras, maybe you have a better idea how to interpret these numbers?
Nope
Assignee | ||
Comment 13•14 years ago
|
||
Since we don't have numbers we're confident in, maybe we should wait until we branch, land on trunk, and see if we observe an improvement there. If so, we can backport to 2.0 branch.
Assignee | ||
Comment 14•14 years ago
|
||
For kicks, I pushed to try to get new talos numbers. But I must have gotten a dying mac slave, because I saw things like a 735% regression on ts_cold_generated_med.
The high points of the results are a 5.4% improvement in dromaeo_jslib on Win7 and WinXP, and a general 1.5-4% improvement in dromaeo_css across the board (except on mac-32, but again, maybe that's just weirdness).
I'll get new numbers and see if they're more sensible.
http://goo.gl/FgFH
Comment 15•14 years ago
|
||
- PRInt32 n = PR_AtomicIncrement((PRInt32*)&_refc); \
+ PRInt32 n = NS_AtomicIncrementRefcnt(_refc); \
please eventually try to get the alignment of the slashes right :)
i think sometimes we find that changing from inline to function call changes a function's size and thus how the compiler handles inlining which affects the instruction cache.
Assignee | ||
Comment 16•14 years ago
|
||
(In reply to comment #15)
> please eventually try to get the alignment of the slashes right :)
Thanks for catching that. :)
> i think sometimes we find that changing from inline to function call changes a
> function's size and thus how the compiler handles inlining which affects the
> instruction cache.
Note that here we're changing from a function call to an inline, not the other way around. Since the inlined function consists of a single instruction, I doubt it's affecting the compiler's heuristics. But who knows...
Assignee | ||
Comment 17•14 years ago
|
||
Here's another set of numbers: http://goo.gl/6HtG (missing some Mac-64 results).
The compare talos site just went down, but I'll analyze when it comes back up.
Assignee | ||
Comment 18•14 years ago
|
||
New links:
http://goo.gl/yOHX
http://goo.gl/mWO2
These two sets of tests were run on the exact same code. The giant Mac-32 regressions from comment 14 didn't reproduce. It looks like the second time, I got a bad Linux slave for the same set of tests, so Linux-32 shows a ~10% regression on a bunch of them.
Regressions common to both runs are:
* dromaeo_sunspider -- The first run shows a Win7 and Linux-32 regression of 11%, and the second shows Linux flat and Win7 still down 11%. So I'd guess that the Windows result also noise.
* ts_cold_generated_max_shutdown -- There's a consistent large (40-16%) regression on mac-32 ts_cold_generated_max_shutdown. Seeing as this test is clearly wonky, I imagine this is not meaningful.
* ts_shutdown -- 30-50% mac-64 regression. All the other tests are flat, so I'd guess this doesn't mean anything.
Common upshots are a little less clear. There's a consistent 12-14% improvement on WinXP dromaeo_v8, and about a 1% improvement on Win7 tsspider_nochrome. Other tests show some amount of improvement, but it's hard to separate signal from noise.
I'm going to push to try again and see if those three common regressions go away. If so, I'd be in favor of taking this, since unless something really weird is going on, it should make us faster.
Assignee | ||
Comment 19•14 years ago
|
||
Fixing whitespace (comment 15).
Attachment #473609 -
Attachment is obsolete: true
Comment 20•14 years ago
|
||
Yeah, I know. The way I was thinking is that whereas before we had a fat function (AddRef/Release) calling another function (pr_inc/pr_dec), it couldn't be inlined elsewhere (everyone using addref/release). Potentially we now have a "seemingly" small function (addref/release) which could be inlined lots of places (everyone uses addref/release). In the case of AddRef/Release, you could actually be hurting the cpu cache since perhaps the classic addref/release were kept hot and now the inlined ones in lots of places wouldn't share a hot point.
If I'm right, it should be pretty easy to verify. But I don't have the resources to do that. I'm just back from vacation and have a month's worth of mail to read and projects to worry about.
Assignee | ||
Comment 21•14 years ago
|
||
I'd be really surprised if the wackyness we're seeing on Talos has to do with inlining effects. If it were, I'd expect to see consistent slow/fastness on a given platform, not certain tests (shutdown tests??) running tens of percents slower.
Assignee | ||
Comment 22•14 years ago
|
||
More Talos runs have been added to http://goo.gl/mWO2
I've verified that the three regressions identified in comment 18 are due to outliers.
The large WinXP dromaeo_v8 win may also be the work of an outlier, and the tsspider_nochrome Win7 improvement seems marginal at best.
There's a fair amount of green (well, green on my copy of compare talos, red on sdwilsh's copy) in the dromaeo suite. I'm not convinced that this is anything but noise, though.
The solution to this uncertainty, of course, is more Talos runs. :)
Assignee | ||
Comment 23•14 years ago
|
||
Newest numbers are at http://goo.gl/mWO2 (same link as comment 22).
I now have enough Talos runs to say that I don't think this actually results in any perf regressions.
The large (~9%) WinXP dromaeo_v8 win persisted even after a third Talos run, so I judge that as likely to be real, although maybe not quite 9%. There's also a more tenuous but solidly plausible ~3% Win XP/7 dromaeo_dom win.
With those numbers, I'm comfortable asking for review.
Assignee | ||
Updated•14 years ago
|
Attachment #480648 -
Flags: review?(benjamin)
Comment 24•14 years ago
|
||
Comment on attachment 480648 [details] [diff] [review]
Patch v1.1
This is fine. Brendan or somebody from JS should probably rubberstamp this as well.
Attachment #480648 -
Flags: review?(benjamin) → review+
Assignee | ||
Updated•14 years ago
|
Attachment #480648 -
Flags: superreview?(brendan)
Comment 25•14 years ago
|
||
Comment on attachment 480648 [details] [diff] [review]
Patch v1.1
r=me for the js engine and xpconnect changes
Attachment #480648 -
Flags: review+
Assignee | ||
Comment 26•14 years ago
|
||
Comment on attachment 480648 [details] [diff] [review]
Patch v1.1
Requesting a2.0. Are we good to go here, or does this need another set of eyes?
Attachment #480648 -
Flags: superreview?(brendan) → approval2.0?
Comment 27•14 years ago
|
||
More than good to go this point. Go!
/be
Assignee | ||
Comment 28•14 years ago
|
||
I'll land once the tree re-opens for non-b7 blockers.
Assignee | ||
Updated•14 years ago
|
Whiteboard: [check-in-after-b7]
Comment 29•14 years ago
|
||
This has nonzero risk and so I think we should wait.
Attachment #480648 -
Flags: approval2.0? → approval2.0-
Assignee | ||
Comment 30•14 years ago
|
||
(In reply to comment #29)
> This has nonzero risk and so I think we should wait.
Until when?
If Talos is to believed, this is a sizable Windows V8 win. If the numbers we have aren't convincing enough, I can have releng kick ten runs overnight so we can see.
Comment 31•14 years ago
|
||
Until after FF4 branches. I'm being conservative on approvals: this is a medium perf boost on a semi-contrived benchmark, and a small/medium risk that we got something wrong or that the behavior of the macro is different from the function call (or that the compiler will do optimize things incorrectly with the macro and skip necessary barriers). I'm willing to be overruled! ;-)
Assignee | ||
Comment 32•14 years ago
|
||
I definitely agree that there's risk, although if we're misunderstanding something about these macros, it's possible/likely that bug 587853 also messed something up. But that bug changed much less than this bug, so I understand the conservatism.
We could reduce risk by making this patch smaller, since I imagine that most of these callsites aren't performance-relevant. On the other hand, an advantage of using the macro most everywhere is that if it does do the Wrong Thing, we're more likely to notice that there's a problem.
The risk isn't unknowable, is it? I mean, we're in beta still. Let's get this in and then if we start noticing problems we can easily back this out.
Assignee | ||
Comment 34•14 years ago
|
||
I just re-ran the Talos numbers [1].
The big dromaeo-v8 improvement has gone away, I presume because the JS engine has gotten rid of atomic instructions in that benchmark's hot path.
It's still an improvement of 2% on dromaeo DOM and a 1.5% on dromaeo jslib on Windows.
Given that there's no longer a large perf win to be had, I'm OK waiting until after we branch.
[1] http://goo.gl/Bg0X3
Assignee | ||
Updated•14 years ago
|
Whiteboard: [check-in-after-b7] → [check-in-after-branch]
Updated•14 years ago
|
Comment 35•14 years ago
|
||
This patch doesn't apply cleanly on trunk any more.
Whiteboard: [check-in-after-branch] → [check-in-after-branch][not-ready]
Assignee | ||
Comment 36•14 years ago
|
||
Fixed in cedar: http://hg.mozilla.org/projects/cedar/rev/d49c938dbded
Whiteboard: [check-in-after-branch][not-ready] → [fixed-in-cedar]
Comment 37•14 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-cedar]
Target Milestone: --- → mozilla2.2
You need to log in
before you can comment on or make changes to this bug.
Description
•