1548187 - 21.05 - 38.25% glterrain (windows10-64-shippable, windows7-32-shippable) regression on push e8d2d9aff5026ef1f1777b781b47fdcbdb9d8f20 (Mon Apr 29 2019)

Cristina Coroiu [:ccoroiu]

Reporter

Description

•

6 years ago

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=e8d2d9aff5026ef1f1777b781b47fdcbdb9d8f20

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

38% glterrain windows10-64-shippable opt e10s stylo 0.73 -> 1.01
21% glterrain windows7-32-shippable opt e10s stylo 0.74 -> 0.89

Improvements:

11% displaylist_mutate linux64-shippable opt e10s stylo 1,673.98 -> 1,488.03

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=20725

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling

Cristina Coroiu [:ccoroiu]

Reporter

Updated

•

6 years ago

Component: General → Graphics: Layers

Flags: needinfo?(bas)

Product: Testing → Core

Andreea Pavel [:apavel]

Updated

•

6 years ago

Flags: needinfo?(igoldan)

Bas Schouten (:bas.schouten)

Comment 1

•

6 years ago

So Linux is not affected by this pref change, so that improvement is most certainly nonsense.

As for the glterrain regression, we had measured this ourselves as well, we don't understand completely what glterrain is measuring here but the number is so small in absolutes, and this patch merely reduces the work we do, we believe it is some sort of ordering or scheduling issue.

There is significant pageload improvements from this patch so we would be very reluctant to switch back, in my discussions with graphics team members there seemed to be agreement with this.

Jeff, do you have any idea how doing -less- work could be regressing glterrain? I'm a little unsure how this number is being generated. As a point of information here, the only -additional- work this patch do, is copy regions which are -valid- in the front buffer from the previous frame but invalid in the backbuffer from the front to the backbuffer before validating the backbuffer.

Flags: needinfo?(bas) → needinfo?(jgilbert)

Bas Schouten (:bas.schouten)

Comment 2

•

6 years ago

Sotaro, Jeff Muizelaar suggested you may have an idea on this as well, I have actually come up with a theory of why this is based on how the test works, and why this regression also occurs with WebRender), I'd like to know what your idea is.

Flags: needinfo?(sotaro.ikeda.g)

Kelsey Gilbert [:jgilbert]

Comment 3

•

6 years ago

I'm curious what the theories are, but I'd be very comfortable accepting such a small absolute-unit regression. This degree of change is generally from adding or removing a copy, which is not a deal-breaker.

It's possible webgl isn't properly tied into the damage-rect code. (over-invalidation?)
We might have been getting lucky with single-buffer mode on this test, also. (Never blocking on reuse, in part because the workload is so small) Switching to double-buffered would thus potentially incur extra memory bandwidth.

Also, switching to double-buffered would indeed incur an extra copy if we're preserving the undamaged remainder of the page around the damanged webgl rect. Particularly if this surrounding content is cheap to recreate (color layers, clears), this could amount to more work being done by copying undirtied regions front-to-back-buffer.

Flags: needinfo?(jgilbert)

Timothy Nikkel (:tnikkel)

Updated

•

5 years ago

Priority: -- → P3

Ionuț Goldan [:igoldan]

Updated

•

5 years ago

Blocks: 1534654

Flags: needinfo?(igoldan)

Sotaro Ikeda [:sotaro]

Comment 4

•

5 years ago

(In reply to Bas Schouten (:bas.schouten) from comment #2)

Sotaro, Jeff Muizelaar suggested you may have an idea on this as well, I have actually come up with a theory of why this is based on how the test works, and why this regression also occurs with WebRender), I'd like to know what your idea is.

Sorry for late response. Japan was in long public holidays. I am going to take a look.

Sotaro Ikeda [:sotaro]

Comment 5

•

5 years ago

It might be caused by a difference of BitBlt model and Flip model.
https://docs.microsoft.com/en-us/windows/desktop/direct3ddxgi/dxgi-flip-model

On Flip model, 2 buffers are used on D3D compositor. In this model, if 2 presents are called during 1 vsync period or 2 buffers are still in use because of heavy tasks, WaitForPreviousPresentQuery() or Present() is going to block until one buffer becomes available. To avoid it, WebRender uses 3 buffers by Bug 1500017.

On BitBlt model, if the above problem happens, it just drops Present() frame.

Flags: needinfo?(sotaro.ikeda.g)

Sotaro Ikeda [:sotaro]

Comment 6

•

5 years ago

During looking into this bug, I found that Bug 1547775 caused another problem like Bug 1435995. If DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL is used for HWND, when firefox fallback to BasicCompositor, Firefox does not update painting anymore until restart.

Sotaro Ikeda [:sotaro]

Comment 7

•

5 years ago

Created Bug 1549674 for comment 6.

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 8

•

5 years ago

It looks like the glterrain results were restored after bug 1547805 landed, is this likely to be related, or is there still potential for improvements here?

Flags: needinfo?(sotaro.ikeda.g)

Sotaro Ikeda [:sotaro]

Comment 9

•

5 years ago

bug 1547805 seems not related to this bug. Bug 1544538 disabled "advanced layers" usage and bug 1547805 just re-enabled "advanced layers" usage. performance of "advanced layers" is better than CompositorD3D11 in general.

Flags: needinfo?(sotaro.ikeda.g)

Alexandru Ionescu (needinfo me) [:alexandrui]

Updated

•

5 years ago

status-firefox69: --- → affected

status-firefox70: --- → affected

Ryan VanderMeulen [:RyanVM]

Updated

•

5 years ago

status-firefox69: affected → wontfix

status-firefox70: affected → wontfix

status-firefox71: --- → fix-optional

status-firefox-esr60: --- → unaffected

status-firefox-esr68: --- → wontfix

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

5 years ago

Keywords: perf-alert

Florin Strugariu [:Bebe]

Comment 10

•

5 years ago

[:sotaro] any other progress on this?

Flags: needinfo?(sotaro.ikeda.g)

Sotaro Ikeda [:sotaro]

Updated

•

5 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1594436

Sotaro Ikeda [:sotaro]

Comment 11

•

5 years ago

With current m-c, Enabling double buffering seems to cause 8-9% regression on glterrain. It is not like 21.05 - 38.25%.
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=5fd5ca84ffffaf6985510bd5030d47e5e15871ff&newProject=try&newRevision=d59c331cef0e689917ac8661b3ca7dab1d2a48df&framework=1

Sotaro Ikeda [:sotaro]

Comment 12

•

5 years ago

As in comment 5, there is a SwapChain model difference.

https://docs.microsoft.com/en-us/windows/desktop/direct3ddxgi/dxgi-flip-model

When double buffering is disabled, BitBlt mode SwapChain is used. When double buffering is enabled, flip model SwapChain is used.

When BitBlt mode SwapChain is used, there could be a case that gecko rendering is skipped and the rendering is not displayed to display. But when flip model SwapChain is used, the rendering skip does not happen. Then, when BitBlt mode SwapChain is used, score of glterrain could be better by the rendering skip. But it does not mean that actual fps on display is better by disabling double buffering.

By Bug 1547775 comment 0, actual performance became better by enabling double buffering. And by Comment 11, actual glterrain regression is not big. Then it seems that we could accept the glterrain regression.

Flags: needinfo?(sotaro.ikeda.g)

Darkspirit

Updated

•

3 years ago

Status: NEW → RESOLVED

Closed: 3 years ago

status-firefox71: fix-optional → wontfix

Resolution: --- → WONTFIX

BugBot [:suhaib / :marco/ :calixte]

Updated

•

3 years ago

Has Regression Range: --- → yes

Bugzilla

Quick Search

21.05 - 38.25% glterrain (windows10-64-shippable, windows7-32-shippable) regression on push e8d2d9aff5026ef1f1777b781b47fdcbdb9d8f20 (Mon Apr 29 2019)

Categories

(Core :: Graphics: Layers, defect, P3)

Tracking

()

People

(Reporter: ccoroiu, Unassigned)

References

(Regression)

Details

(4 keywords)

Crash Data

Security

(public)

User Story

Description

Updated

Updated

Comment 1

Comment 2

Comment 3

Updated

Updated

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Updated

Updated

Updated

Comment 10

Updated

Comment 11

Comment 12

Updated

Updated