Closed Bug 1798929 Opened 2 years ago Closed 2 years ago

Large percentage of time spent in ScopedResolveTexturesForDraw in Proxx-Tables-Canvas benchmark

Categories

(Core :: Graphics: Canvas2D, enhancement, P3)

Tracking

()

RESOLVED FIXED
108 Branch
Tracking Status
firefox108 --- fixed

People

(Reporter: mstange, Assigned: jgilbert)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [sp3-proxx-tables-canvas])

Attachments

(1 file)

Profile: https://share.firefox.dev/3hbJr5H

WebGLContext::DrawArraysInstanced seems to spend a fair amount of time allocating and freeing memory via ScopedResolveTexturesForDraw.

I found this on the Proxx-Tables-Canvas benchmark (bug 1798923): https://grandprixbench.netlify.app/?suites=Proxx-Tables-Canvas

Kelsey, you've been in this code quite a bit. Are these memory allocations avoidable, or can the number of texUnits be known up-front? ScopedResolveTexturesForDraw::ScopedResolveTexturesForDraw also shows up oddly in the profile: the constructor appears to invoke itself, and to allocate multiple maps within a single call. Perhaps that just shows that the reserved capacity has been exceeded.

Flags: needinfo?(jgilbert)
Severity: -- → S3
Type: defect → enhancement
Priority: -- → P3

We should just reuse the allocations somehow.
The max size is known and small; I'm mostly using a map for ergonomics.

Flags: needinfo?(jgilbert)

I'm tempted to just make this a static thread_local, but we can just as easily hang it off of the 'this' object.
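To illustrate the "reuse the allocations" idea, here is a minimal sketch of a per-context scratch list that is cleared on every draw but keeps its capacity. It is not the patch that landed: DrawContext, SamplerByTexUnit, Sampler, BeginResolve, Add and kMaxTexUnits are all hypothetical names, the bound of 32 is an assumption, and a flat std::vector stands in for the map mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the real Gecko types.
struct Sampler {};
constexpr uint32_t kMaxTexUnits = 32;  // assumed small, known upper bound

struct SamplerByTexUnit {
  uint32_t texUnit;
  const Sampler* sampler;
};

class DrawContext {
 public:
  // Per-draw entry point: empties the scratch list but keeps its buffer,
  // so steady-state draws do not go through the allocator.
  std::vector<SamplerByTexUnit>& BeginResolve() {
    mScratch.clear();  // size becomes 0; capacity is left as it was
    return mScratch;
  }

  void Add(uint32_t texUnit, const Sampler* sampler) {
    assert(texUnit < kMaxTexUnits);
    mScratch.push_back({texUnit, sampler});
  }

 private:
  std::vector<SamplerByTexUnit> mScratch;  // lives as long as the context
};
```

Keeping the buffer as a member ties its lifetime to the context, whereas a static thread_local would share one buffer across every context on the thread. Either way, only the first few draws pay for growth.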

Also showing up kinda hot there is overhead around these four related commands:

  • SetEnabled
  • Clear
  • ClearColor
  • Scissor

Feels like we're ending up doing something like:

for () {
  Enable(SCISSOR_TEST)
  Scissor(x,y,...)
  ClearColor(r,g,b,a)
  Clear(COLOR)
  Disable(SCISSOR_TEST)
}

You can imagine us e.g. lazily combining ClearColor+Clear into Clear(flags, r,g,b,a, d,s), and/or eliding unused Enables/Disables.
Deserialization overhead is cheap per command but not free, and it does show up in this profile.
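As a rough sketch of what lazy combining and eliding could look like on the sending side: the encoder below defers SetEnabled-style state and ClearColor until a Clear actually needs them, and folds the color into the clear command. Everything here (LazyEncoder, Command, Op, Flush, Sent) is hypothetical and only models the loop above; it is not the real WebGL command stream.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical encoder for illustration; not the real WebGL command stream.
enum class Op { SetScissorTest, Scissor, ClearWithColor };

struct Command {
  Op op;
  bool enabled;               // used by SetScissorTest
  std::array<int, 4> rect;    // used by Scissor
  std::array<float, 4> rgba;  // used by ClearWithColor
};

class LazyEncoder {
 public:
  // State setters only record intent; nothing is serialized yet.
  void SetScissorTest(bool enabled) { mWantScissorTest = enabled; }
  void ClearColor(float r, float g, float b, float a) {
    mClearColor = {r, g, b, a};
  }

  // The rect changes on every iteration, so it is sent as-is.
  void Scissor(int x, int y, int w, int h) {
    assert(w >= 0 && h >= 0);
    Command cmd{};
    cmd.op = Op::Scissor;
    cmd.rect = {x, y, w, h};
    mSent.push_back(cmd);
  }

  // Clear reads the scissor-test state, so pending state goes out first.
  // The clear color rides along in the same command.
  void Clear() {
    Flush();
    Command cmd{};
    cmd.op = Op::ClearWithColor;
    cmd.rgba = mClearColor;
    mSent.push_back(cmd);
  }

  // Sends the scissor-test toggle only if it differs from what was last sent.
  // Call this before anything else that depends on the state, e.g. a draw.
  void Flush() {
    if (mWantScissorTest == mSentScissorTest) return;
    Command cmd{};
    cmd.op = Op::SetScissorTest;
    cmd.enabled = mWantScissorTest;
    mSent.push_back(cmd);
    mSentScissorTest = mWantScissorTest;
  }

  const std::vector<Command>& Sent() const { return mSent; }

 private:
  bool mWantScissorTest = false;
  bool mSentScissorTest = false;
  std::array<float, 4> mClearColor{};
  std::vector<Command> mSent;
};
```

With this scheme the Disable at the end of one iteration and the Enable at the start of the next cancel out before anything is serialized, so each iteration after the first sends two commands (Scissor and the combined clear) instead of five.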

Assignee: nobody → jgilbert
Status: NEW → ASSIGNED

While that profile link does make it look like we're spending 15% of the time in this function, that's because the profile drops samples that are waiting on events, so this is closer to a 2% win, all else equal.

Pushed by jgilbert@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/25b53dd457af Reuse samplerByTexUnit capacity to avoid (de)allocs. r=gfx-reviewers,lsalzman
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 108 Branch
Blocks: 1800301

== Change summary for alert #36053 (as of Sun, 13 Nov 2022 01:24:52 GMT) ==

Improvements:

Ratio  Test                               Platform                   Options                       Absolute values (old vs new)
8%     motionmark_webgl 3DGraphics-WebGL  windows10-64-shippable-qr  e10s fission stylo webrender  10.59 -> 11.46

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=36053

Blocks: speedometer3
Whiteboard: [sp3:proxx-tables-canvas]
Whiteboard: [sp3:proxx-tables-canvas] → [sp3-proxx-tables-canvas]