Closed
Bug 1226627
Opened 9 years ago
Closed 9 years ago
3.5% Win7 Canvasmark PGO regression on November 17, 2015 (v.45) from push 029ab28c46cc
Categories
(Core :: Layout, defect)
Core
Layout
Tracking
()
RESOLVED
FIXED
mozilla45
Tracking | Status | |
---|---|---|
firefox45 | --- | fixed |
People
(Reporter: jmaher, Assigned: MatsPalmgren_bugz)
References
(Blocks 1 open bug)
Details
(Keywords: perf, regression, Whiteboard: [talos_regression])
Attachments
(2 files)
(deleted),
patch
|
roc
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
roc
:
review+
|
Details | Diff | Splinter Review |
This is a PGO only regression, it is not seen on non-pgo. This is important because we ship PGO!
You can see a regression in canvasmark (higher is better):
https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,975bd3027c1a2bc77dadf2224503b425474e2e09,1]&zoom=1447680842451.128,1447877789338.346,7698.884758364313,8960.96654275093
This comes from filling in some missing builds and doing extra retriggers:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=ee70b9acae8a&tochange=233f67378440&filter-searchStr=Windows%207%20pgo%20Talos%20Performance%20Talos%20chrome%20T%28c%29
and we come up with a list of patches checked in at once:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=029ab28c46cc
There are a lot of patches there, pgo builds take 4.5 hours, I don't mind pushing to try to narrow this down, thought I would see if someone had an idea of the culprit, or was expecting this and knows a valid reason why this is happening.
Reporter | ||
Comment 1•9 years ago
|
||
:mats, can you take a brief look at this and let me know your thoughts?
Flags: needinfo?(mats)
Assignee | ||
Comment 2•9 years ago
|
||
It seems unlikely any of my patches could affect this test.
Most of them only touch CSS Grid related code. Bug 576927 is the only
one I can think of that could affect performance on anything else.
Does this test calculate font sizes *a lot*?
Flags: needinfo?(mats) → needinfo?(jmaher)
Reporter | ||
Comment 3•9 years ago
|
||
I don't know a lot about this test specifically, when it runs, I don't believe it does a lot of text related stuff, it is more rendering of the canvas.
:snorp, do you have knowledge if canvasmark calculates font sizes many times?
Flags: needinfo?(jmaher) → needinfo?(snorp)
Reporter | ||
Comment 4•9 years ago
|
||
I have pushed to try to bisect this down:
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&fromchange=cef44a16b418&tochange=7f7e9ecd0d74
In ~8 hours we should have results, I can determine the cause tomorrow morning.
(In reply to Joel Maher (:jmaher) from comment #3)
> I don't know a lot about this test specifically, when it runs, I don't
> believe it does a lot of text related stuff, it is more rendering of the
> canvas.
>
> :snorp, do you have knowledge if canvasmark calculates font sizes many times?
Nope, no idea, sorry.
Flags: needinfo?(snorp)
Reporter | ||
Comment 6•9 years ago
|
||
oh, try is 90-120 minutes faster for PGO windows builds- I have results and the culprit is:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4b328d0b448c (bug 576927)
:mats, your initial guess was correct :)
Here is a breakdown of the subtests of canvasmark if it helps:
https://treeherder.allizom.org/perf.html#/comparesubtest?originalProject=try&originalRevision=465d422c1fcb&newProject=try&newRevision=7d396a39245c&originalSignature=fcc1e66ac3fc1b53d6b620e94eafc1e2aac54363&newSignature=fcc1e66ac3fc1b53d6b620e94eafc1e2aac54363
Blocks: 576927
Assignee | ||
Comment 7•9 years ago
|
||
Is this the testsuite we're talking about?
http://mxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/canvasmark/
Flags: needinfo?(jmaher)
Assignee | ||
Comment 8•9 years ago
|
||
FTR, the code involved in that patch:
from gfx/src/nsCoord.h:
inline nscoord NSCoordSaturatingMultiply(nscoord aCoord, float aScale) {
return _nscoordSaturatingMultiply(aCoord, aScale, false);
}
inline nscoord _nscoordSaturatingMultiply(nscoord aCoord, float aScale,
bool requireNotNegative) {
VERIFY_COORD(aCoord);
if (requireNotNegative) {
MOZ_ASSERT(aScale >= 0.0f,
"negative scaling factors must be handled manually");
}
#ifdef NS_COORD_IS_FLOAT
return floorf(aCoord * aScale);
#else
float product = aCoord * aScale;
if (requireNotNegative ? aCoord > 0 : (aCoord > 0) == (aScale > 0))
return NSToCoordRoundWithClamp(std::min<float>(nscoord_MAX, product));
return NSToCoordRoundWithClamp(std::max<float>(nscoord_MIN, product));
#endif
}
inline nscoord NSToCoordRound(double aValue)
{
#if defined(XP_WIN32) && defined(_M_IX86) && !defined(__GNUC__) && !defined(__clang__)
return NS_lroundup30((float)aValue);
#else
return nscoord(floor(aValue + 0.5f));
#endif /* XP_WIN32 && _M_IX86 && !__GNUC__ */
}
inline nscoord NSToCoordRoundWithClamp(float aValue)
{
#ifndef NS_COORD_IS_FLOAT
// Bounds-check before converting out of float, to avoid overflow
if (aValue >= nscoord_MAX) {
return nscoord_MAX;
}
if (aValue <= nscoord_MIN) {
return nscoord_MIN;
}
#endif
return NSToCoordRound(aValue);
}
and from xpcom/ds/nsMathUtils.h:
#if defined(XP_WIN32) && defined(_M_IX86) && !defined(__GNUC__) && !defined(__clang__)
inline int32_t NS_lroundup30(float x)
{
/* Code derived from Laurent de Soras' paper at */
/* http://ldesoras.free.fr/doc/articles/rounding_en.pdf */
/* Rounding up on Windows is expensive using the float to */
/* int conversion and the floor function. A faster */
/* approach is to use f87 rounding while assuming the */
/* default rounding mode of rounding to the nearest */
/* integer. This rounding mode, however, actually rounds */
/* to the nearest integer so we add the floating point */
/* number to itself and add our rounding factor before */
/* doing the conversion to an integer. We then do a right */
/* shift of one bit on the integer to divide by two. */
/* This routine doesn't handle numbers larger in magnitude */
/* than 2^30 but this is fine for NSToCoordRound because */
/* Coords are limited to 2^30 in magnitude. */
static const double round_to_nearest = 0.5f;
int i;
__asm {
fld x ; load fp argument
fadd st, st(0) ; double it
fadd round_to_nearest ; add the rounding factor
fistp dword ptr i ; convert the result to int
}
return i >> 1; /* divide by 2 */
}
#endif /* XP_WIN32 && _M_IX86 && !__GNUC__ */
Assignee | ||
Comment 9•9 years ago
|
||
So from that I would guess that it's the rounding operation that is expensive.
The original code before bug 576927 truncated the value so we could try that
and see if it helps.
BTW, after looking briefly at the tests I think it's the "FPS" and other stat
numbers that triggers these calls. I suspect they are frequent and they pass
the font by name and set it for every single drawing operation:
http://mxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/canvasmark/scripts/canvasmark_v6.js#1202
Assignee | ||
Comment 10•9 years ago
|
||
This should fix it I think:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ca371aedc56b
Reporter | ||
Comment 11•9 years ago
|
||
Thanks for finding a fix and pushing to try. Looking at the numbers they seem to be half fixed (which is great), but could be fully fixed depending on the baseline used. Overall, this seems like a win for us.
As for the canvasmark code, yes- you found the correct source. That is my fault for not linking to the source and giving instructions for how to run it locally and on try. I am glad you were able to figure it out.
Flags: needinfo?(jmaher)
Assignee | ||
Comment 12•9 years ago
|
||
Assignee: nobody → mats
Attachment #8690405 -
Flags: review?(roc)
Comment on attachment 8690405 [details] [diff] [review]
fix
Review of attachment 8690405 [details] [diff] [review]:
-----------------------------------------------------------------
I find it hard to believe these measurably regress performance, but OK.
Attachment #8690405 -
Flags: review?(roc) → review+
Assignee | ||
Updated•9 years ago
|
Keywords: checkin-needed
Comment 14•9 years ago
|
||
Keywords: checkin-needed
Assignee | ||
Comment 15•9 years ago
|
||
I don't see any difference from this change, do you agree?
https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,975bd3027c1a2bc77dadf2224503b425474e2e09,1]
Flags: needinfo?(jmaher)
Reporter | ||
Comment 16•9 years ago
|
||
I agree, this isn't showing any effect on the data after landing for the last 24+ hours.
Flags: needinfo?(jmaher)
Comment 17•9 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
Assignee | ||
Updated•9 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 18•9 years ago
|
||
I tried this:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffb241bad8dc
but I can't seem to get a useful graph for it at:
https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=ffb241bad8dc
Joel, can you get the results out of that Try push?
Flags: needinfo?(jmaher)
Reporter | ||
Comment 19•9 years ago
|
||
I looked at the parent revision and found it on inbound, right now I have retriggered the test on inbound, so in 30 minutes or so we should have enough data in perfherder to see the real impact. Here is the compare view for it:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=cf322a94b295&newProject=try&newRevision=ffb241bad8dc&originalSignature=fcc1e66ac3fc1b53d6b620e94eafc1e2aac54363&newSignature=fcc1e66ac3fc1b53d6b620e94eafc1e2aac54363
Flags: needinfo?(jmaher)
Reporter | ||
Comment 20•9 years ago
|
||
data is in, from what I can tell this is a win.
Assignee | ||
Comment 21•9 years ago
|
||
Thanks! Yes, this looks nice. I'll see if I can make the same change to the other
clamping functions too...
Assignee | ||
Comment 22•9 years ago
|
||
fmin[f]/fmax[f] are in C99 so I think we can use it generally without #ifdef.
The numbers certainly looks impressive, but I'm not sure how much I believe
in them though...
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=c321d8403851&newProject=try&newRevision=bc6178eb89f9
Attachment #8693114 -
Flags: review?(roc)
Comment 23•9 years ago
|
||
(In reply to Mats Palmgren (:mats) from comment #22)
> Created attachment 8693114 [details] [diff] [review]
> fix part 2
>
> fmin[f]/fmax[f] are in C99 so I think we can use it generally without #ifdef.
>
> The numbers certainly looks impressive, but I'm not sure how much I believe
> in them though...
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-
> central&originalRevision=c321d8403851&newProject=try&newRevision=bc6178eb89f9
As hinted at in the graphs view, you should do more retriggers on the base revision to increase confidence in your results.
If you click on the graph for (e.g.) cart, you'll see that the test has bimodal behaviour:
https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-central,8d47c06d6df551fe085883ad80cfe3fcb2d501f6,1]&series=[try,8d47c06d6df551fe085883ad80cfe3fcb2d501f6,1]&highlightedRevisions=c321d8403851&highlightedRevisions=bc6178eb89f9&timerange=604800
Attachment #8693114 -
Flags: review?(roc) → review+
Comment 24•9 years ago
|
||
(In reply to William Lachance (:wlach) from comment #23)
> As hinted at in the graphs view, you should do more retriggers on the base
> revision to increase confidence in your results.
>
> If you click on the graph for (e.g.) cart, you'll see that the test has
> bimodal behaviour:
Actually ignore that, I just realized I misread the graphs. The improvements do seem to be real, though I did a bunch of retriggers on the base (m-c) revision to be more sure.
Comment 25•9 years ago
|
||
Assignee | ||
Comment 26•9 years ago
|
||
It turns out we can't use this technique in NSToCoordRoundWithClamp because it
calls NSToCoordRound which calls NS_lroundup30 on Windows and that function
can't deal with values larger than (2^30)-1.
http://hg.mozilla.org/mozilla-central/annotate/7883e81f3c30/xpcom/ds/nsMathUtils.h#l43
nscoord_MAX is 2^30 and that makes NSToCoordRound(float(nscoord_MAX)) return a negative
value on win32. (Which was a surprise to me, I didn't know NSToCoordRound had badness
like that.) So I landed without the change in NSToCoordRoundWithClamp to avoid this.
The author of the paper points that out and suggests using a 64-bit int for the temp
value if you need the full integer range (see the PDF link in the code). I filed
a follow-up bug 1228858 to investigate that.
Blocks: 1228858
Assignee | ||
Comment 27•9 years ago
|
||
Backed out part 2 in https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=53aa14927954
because it may have caused some Talos regressions on Android, e.g. http://mzl.la/1NXppAf
Comment 28•9 years ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 29•9 years ago
|
||
should we reopen this as the second patch was backed out?
Assignee | ||
Comment 30•9 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #29)
> should we reopen this as the second patch was backed out?
I don't think so; the tcanvasmark numbers in the past 14 days have slowly
improved in periods unrelated to this bug (Nov 24 - Nov 28) and is now back
where it was before:
https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,975bd3027c1a2bc77dadf2224503b425474e2e09,1
The two clusters on Nov 29 are before/after part 2 landed, and it shows just
a minor improvement in comparison. I'm not very confident it's significant.
(I'll investigate this more in bug 1228858 and I'll run some micro-benchmarks
on these *Clamp functions to get more accurate figures. When time allows...)
Updated•9 years ago
|
Comment 31•9 years ago
|
||
Note that the Android regressions attributed to this bug are almost certainly misfiled due to an omitted clobber. See bug 1229118 for more details.
You need to log in
before you can comment on or make changes to this bug.
Description
•