Closed Bug 1226627 Opened 9 years ago Closed 9 years ago

3.5% Win7 Canvasmark PGO regression on November 17, 2015 (v.45) from push 029ab28c46cc

Categories

(Core :: Layout, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla45
Tracking Status
firefox45 --- fixed

People

(Reporter: jmaher, Assigned: MatsPalmgren_bugz)

References

(Blocks 1 open bug)

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Attachments

(2 files)

This is a PGO only regression, it is not seen on non-pgo. This is important because we ship PGO! You can see a regression in canvasmark (higher is better): https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,975bd3027c1a2bc77dadf2224503b425474e2e09,1]&zoom=1447680842451.128,1447877789338.346,7698.884758364313,8960.96654275093 This comes from filling in some missing builds and doing extra retriggers: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=ee70b9acae8a&tochange=233f67378440&filter-searchStr=Windows%207%20pgo%20Talos%20Performance%20Talos%20chrome%20T%28c%29 and we come up with a list of patches checked in at once: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=029ab28c46cc There are a lot of patches there, pgo builds take 4.5 hours, I don't mind pushing to try to narrow this down, thought I would see if someone had an idea of the culprit, or was expecting this and knows a valid reason why this is happening.
:mats, can you take a brief look at this and let me know your thoughts?
Flags: needinfo?(mats)
It seems unlikely any of my patches could affect this test. Most of them only touch CSS Grid related code. Bug 576927 is the only one I can think of that could affect performance on anything else. Does this test calculate font sizes *a lot*?
Flags: needinfo?(mats) → needinfo?(jmaher)
I don't know a lot about this test specifically, when it runs, I don't believe it does a lot of text related stuff, it is more rendering of the canvas. :snorp, do you have knowledge if canvasmark calculates font sizes many times?
Flags: needinfo?(jmaher) → needinfo?(snorp)
I have pushed to try to bisect this down: https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&fromchange=cef44a16b418&tochange=7f7e9ecd0d74 In ~8 hours we should have results, I can determine the cause tomorrow morning.
(In reply to Joel Maher (:jmaher) from comment #3) > I don't know a lot about this test specifically, when it runs, I don't > believe it does a lot of text related stuff, it is more rendering of the > canvas. > > :snorp, do you have knowledge if canvasmark calculates font sizes many times? Nope, no idea, sorry.
Flags: needinfo?(snorp)
Flags: needinfo?(jmaher)
FTR, the code involved in that patch: from gfx/src/nsCoord.h: inline nscoord NSCoordSaturatingMultiply(nscoord aCoord, float aScale) { return _nscoordSaturatingMultiply(aCoord, aScale, false); } inline nscoord _nscoordSaturatingMultiply(nscoord aCoord, float aScale, bool requireNotNegative) { VERIFY_COORD(aCoord); if (requireNotNegative) { MOZ_ASSERT(aScale >= 0.0f, "negative scaling factors must be handled manually"); } #ifdef NS_COORD_IS_FLOAT return floorf(aCoord * aScale); #else float product = aCoord * aScale; if (requireNotNegative ? aCoord > 0 : (aCoord > 0) == (aScale > 0)) return NSToCoordRoundWithClamp(std::min<float>(nscoord_MAX, product)); return NSToCoordRoundWithClamp(std::max<float>(nscoord_MIN, product)); #endif } inline nscoord NSToCoordRound(double aValue) { #if defined(XP_WIN32) && defined(_M_IX86) && !defined(__GNUC__) && !defined(__clang__) return NS_lroundup30((float)aValue); #else return nscoord(floor(aValue + 0.5f)); #endif /* XP_WIN32 && _M_IX86 && !__GNUC__ */ } inline nscoord NSToCoordRoundWithClamp(float aValue) { #ifndef NS_COORD_IS_FLOAT // Bounds-check before converting out of float, to avoid overflow if (aValue >= nscoord_MAX) { return nscoord_MAX; } if (aValue <= nscoord_MIN) { return nscoord_MIN; } #endif return NSToCoordRound(aValue); } and from xpcom/ds/nsMathUtils.h: #if defined(XP_WIN32) && defined(_M_IX86) && !defined(__GNUC__) && !defined(__clang__) inline int32_t NS_lroundup30(float x) { /* Code derived from Laurent de Soras' paper at */ /* http://ldesoras.free.fr/doc/articles/rounding_en.pdf */ /* Rounding up on Windows is expensive using the float to */ /* int conversion and the floor function. A faster */ /* approach is to use f87 rounding while assuming the */ /* default rounding mode of rounding to the nearest */ /* integer. This rounding mode, however, actually rounds */ /* to the nearest integer so we add the floating point */ /* number to itself and add our rounding factor before */ /* doing the conversion to an integer. We then do a right */ /* shift of one bit on the integer to divide by two. */ /* This routine doesn't handle numbers larger in magnitude */ /* than 2^30 but this is fine for NSToCoordRound because */ /* Coords are limited to 2^30 in magnitude. */ static const double round_to_nearest = 0.5f; int i; __asm { fld x ; load fp argument fadd st, st(0) ; double it fadd round_to_nearest ; add the rounding factor fistp dword ptr i ; convert the result to int } return i >> 1; /* divide by 2 */ } #endif /* XP_WIN32 && _M_IX86 && !__GNUC__ */
So from that I would guess that it's the rounding operation that is expensive. The original code before bug 576927 truncated the value so we could try that and see if it helps. BTW, after looking briefly at the tests I think it's the "FPS" and other stat numbers that triggers these calls. I suspect they are frequent and they pass the font by name and set it for every single drawing operation: http://mxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/canvasmark/scripts/canvasmark_v6.js#1202
Thanks for finding a fix and pushing to try. Looking at the numbers they seem to be half fixed (which is great), but could be fully fixed depending on the baseline used. Overall, this seems like a win for us. As for the canvasmark code, yes- you found the correct source. That is my fault for not linking to the source and giving instructions for how to run it locally and on try. I am glad you were able to figure it out.
Flags: needinfo?(jmaher)
Attached patch fix (deleted) — Splinter Review
Assignee: nobody → mats
Attachment #8690405 - Flags: review?(roc)
Comment on attachment 8690405 [details] [diff] [review] fix Review of attachment 8690405 [details] [diff] [review]: ----------------------------------------------------------------- I find it hard to believe these measurably regress performance, but OK.
Attachment #8690405 - Flags: review?(roc) → review+
Keywords: checkin-needed
Flags: needinfo?(jmaher)
I agree, this isn't showing any effect on the data after landing for the last 24+ hours.
Flags: needinfo?(jmaher)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I tried this: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffb241bad8dc but I can't seem to get a useful graph for it at: https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=ffb241bad8dc Joel, can you get the results out of that Try push?
Flags: needinfo?(jmaher)
I looked at the parent revision and found it on inbound, right now I have retriggered the test on inbound, so in 30 minutes or so we should have enough data in perfherder to see the real impact. Here is the compare view for it: https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=cf322a94b295&newProject=try&newRevision=ffb241bad8dc&originalSignature=fcc1e66ac3fc1b53d6b620e94eafc1e2aac54363&newSignature=fcc1e66ac3fc1b53d6b620e94eafc1e2aac54363
Flags: needinfo?(jmaher)
data is in, from what I can tell this is a win.
Thanks! Yes, this looks nice. I'll see if I can make the same change to the other clamping functions too...
Attached patch fix part 2 (deleted) — Splinter Review
fmin[f]/fmax[f] are in C99 so I think we can use it generally without #ifdef. The numbers certainly looks impressive, but I'm not sure how much I believe in them though... https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=c321d8403851&newProject=try&newRevision=bc6178eb89f9
Attachment #8693114 - Flags: review?(roc)
(In reply to Mats Palmgren (:mats) from comment #22) > Created attachment 8693114 [details] [diff] [review] > fix part 2 > > fmin[f]/fmax[f] are in C99 so I think we can use it generally without #ifdef. > > The numbers certainly looks impressive, but I'm not sure how much I believe > in them though... > https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla- > central&originalRevision=c321d8403851&newProject=try&newRevision=bc6178eb89f9 As hinted at in the graphs view, you should do more retriggers on the base revision to increase confidence in your results. If you click on the graph for (e.g.) cart, you'll see that the test has bimodal behaviour: https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-central,8d47c06d6df551fe085883ad80cfe3fcb2d501f6,1]&series=[try,8d47c06d6df551fe085883ad80cfe3fcb2d501f6,1]&highlightedRevisions=c321d8403851&highlightedRevisions=bc6178eb89f9&timerange=604800
(In reply to William Lachance (:wlach) from comment #23) > As hinted at in the graphs view, you should do more retriggers on the base > revision to increase confidence in your results. > > If you click on the graph for (e.g.) cart, you'll see that the test has > bimodal behaviour: Actually ignore that, I just realized I misread the graphs. The improvements do seem to be real, though I did a bunch of retriggers on the base (m-c) revision to be more sure.
It turns out we can't use this technique in NSToCoordRoundWithClamp because it calls NSToCoordRound which calls NS_lroundup30 on Windows and that function can't deal with values larger than (2^30)-1. http://hg.mozilla.org/mozilla-central/annotate/7883e81f3c30/xpcom/ds/nsMathUtils.h#l43 nscoord_MAX is 2^30 and that makes NSToCoordRound(float(nscoord_MAX)) return a negative value on win32. (Which was a surprise to me, I didn't know NSToCoordRound had badness like that.) So I landed without the change in NSToCoordRoundWithClamp to avoid this. The author of the paper points that out and suggests using a 64-bit int for the temp value if you need the full integer range (see the PDF link in the code). I filed a follow-up bug 1228858 to investigate that.
Blocks: 1228858
Backed out part 2 in https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=53aa14927954 because it may have caused some Talos regressions on Android, e.g. http://mzl.la/1NXppAf
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
should we reopen this as the second patch was backed out?
(In reply to Joel Maher (:jmaher) from comment #29) > should we reopen this as the second patch was backed out? I don't think so; the tcanvasmark numbers in the past 14 days have slowly improved in periods unrelated to this bug (Nov 24 - Nov 28) and is now back where it was before: https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,975bd3027c1a2bc77dadf2224503b425474e2e09,1 The two clusters on Nov 29 are before/after part 2 landed, and it shows just a minor improvement in comparison. I'm not very confident it's significant. (I'll investigate this more in bug 1228858 and I'll run some micro-benchmarks on these *Clamp functions to get more accurate figures. When time allows...)
Note that the Android regressions attributed to this bug are almost certainly misfiled due to an omitted clobber. See bug 1229118 for more details.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: