Closed Bug 1676390 Opened 4 years ago Closed 4 years ago

0.57 - 3.87% tscrollx (linux64-shippable-qr) regression on push 5f5c269d15d6e521b15243b5014a65965df0cc08 (Sat November 7 2020)

Categories

(Core :: Graphics: WebRender, defect)

Firefox 84
defect

Tracking

()

RESOLVED FIXED
85 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox82 --- unaffected
firefox83 --- unaffected
firefox84 --- fixed
firefox85 --- fixed

People

(Reporter: alexandrui, Assigned: jnicol)

References

(Regression)

Details

(4 keywords)

Attachments

(1 file)

Perfherder has detected a talos performance regression from push 5f5c269d15d6e521b15243b5014a65965df0cc08. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
4% tscrollx linux64-shippable-qr e10s stylo webrender 1.57 -> 1.63
1% tscrollx linux64-shippable-qr e10s stylo webrender 1.58 -> 1.59

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(jnicol)
Component: Performance → Graphics: WebRender
Product: Testing → Core

Set release status flags based on info from the regressing bug 1661528

Please don't back this out. It's an improvement on real world sites with larger amounts of texture upload. The regression here comes from testcases with a very low amount of texture upload. I will look for a way to mitigate the impact in those cases, but we're better off with the change even as it is.

Flags: needinfo?(jnicol)

Dzmitry, from looking at the profile it appears that the glFenceSync call after uploading to the vertex data textures explains the regression. I know we discussed only requiring 1 flush per frame, by using a single texture uploader for all uploads. However, even that won't help in this test case: after the initial upload of glyphs, we only upload very small amounts of vertex data each frame. There are no more texture cache uploads.

Is glFenceSync expected to be taking that long? If not, any ideas on how to avoid it?

Do you think this is representative of real world enough to worry about it? If so, should we avoid reusing PBOs altogether when they are so small that the fence is more expensive than allocating a new one?

Flags: needinfo?(dmalyshau)

I don't think creation of a fence is expensive. It should be totally fine to have one fence per frame created.

I suspect what's happening here is that inserting a fence requires the driver to stop recording the command list, and internally submit it to a GPU queue. After that, we continue submitting commands, so it has to record them into other command streams, and then submit again at the end of the frame. So we end up with at least 2 submissions instead of 1.

What we need to do is to merge these two submissions together, i.e. insert the fence only at a place where we expect the driver to finish recording command stream anyway, at the end of the frame (like we discussed).

Flags: needinfo?(dmalyshau)

UploadPBOPool::end_frame() creates a GLsync object for synchronizing
PBO access, which internaly may require flushing the command
stream. Previously this was being called mid-frame, and potentially
multiple times per frame. This overhead caused talos regressions in
cases with very small amounts of texture upload per frame. To fix
this, call UploadPBO::end_frame() only once at the end of the frame.

Assignee: nobody → jnicol
Status: NEW → ASSIGNED
Pushed by jnicol@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b5c5164ab3ca Call UploadPBOPool::end_frame() once per frame and at the end of the frame. r=kvark
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 85 Branch

IIUC we're targeting Mali GPUs in 84. Do we need to uplift this to Beta in support of that?

Flags: needinfo?(jnicol)

We are, and in fact because this regression affects other platforms too there is even more reason to uplift.

Flags: needinfo?(jnicol)

Comment on attachment 9189848 [details]
Bug 1676390 - Call UploadPBOPool::end_frame() once per frame and at the end of the frame. r?kvark

Beta/Release Uplift Approval Request

  • User impact if declined: Performance regression on some Talos tests / probably in the wild too
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Minor change, just delays creating a gl sync object until later in frame.
  • String changes made/needed:
Attachment #9189848 - Flags: approval-mozilla-beta?

Comment on attachment 9189848 [details]
Bug 1676390 - Call UploadPBOPool::end_frame() once per frame and at the end of the frame. r?kvark

Approved for 84.0b7.

Attachment #9189848 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: