Closed Bug 1239861 Opened 9 years ago Closed 8 years ago

Composite time can go backwards

Categories

(Core :: Graphics: Layers, defect)

46 Branch
ARM
Android
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox45 --- fixed
firefox46 --- fixed
firefox47 --- fixed

People

(Reporter: kats, Assigned: mchang)

References

Details

(Keywords: regression, Whiteboard: gfx-noted)

Attachments

(3 files, 3 obsolete files)

While trying to land bug 1226826 I got assertion failures (in code that I added) which I can reproduce locally. What's happening is that the timestamp passed to AsyncCompositionManager::TransformShadowTree can go backwards at times, which is unexpected and probably bad. The way it can happen (on Android anyway) is if ForceComposeToTarget gets called in the middle of a sequence of regular composites. ForceComposeToTarget sets mLastCompose to TimeStamp::Now(), which can be significantly (e.g. 100ms) ahead of the previous value of mLastCompose. Then, on the next vsync, mLastCompose gets reset backwards. For example, I added logging to see how much mLastCompose was moving forward by every time it was touched, and I got this:

01-14 14:56:55.215 I/Gecko   ( 2776): staktrace: Composite advancing by 183.333326
01-14 14:56:55.295 I/Gecko   ( 2776): staktrace: Composite advancing by 83.333330
01-14 14:56:55.395 I/Gecko   ( 2776): staktrace: Composite advancing by 83.190303
01-14 14:56:55.425 I/Gecko   ( 2776): staktrace: ForceCompose advancing by 124.548012
01-14 14:56:55.875 I/Gecko   ( 2776): staktrace: Composite advancing by -24.521002
01-14 14:56:55.955 I/Gecko   ( 2776): staktrace: Composite advancing by 483.180048
...
01-14 14:56:56.755 I/Gecko   ( 2776): staktrace: Composite advancing by 49.632502
01-14 14:56:56.805 I/Gecko   ( 2776): staktrace: Composite advancing by 50.367494
01-14 14:56:56.855 I/Gecko   ( 2776): staktrace: Composite advancing by 33.333332
01-14 14:56:56.905 I/Gecko   ( 2776): staktrace: Composite advancing by 66.666664
01-14 14:56:56.955 I/Gecko   ( 2776): staktrace: ForceCompose advancing by 85.747534
01-14 14:56:57.755 I/Gecko   ( 2776): staktrace: Composite advancing by -35.747536


These negative jumps are what triggered my assertion failure. I think in general we shouldn't have mLastCompose going backwards in time. Not sure what the right fix is though - any thoughts Mason?
OS: Unspecified → Android
Hardware: Unspecified → ARM
Version: unspecified → 46 Branch
Flags: needinfo?(mchang)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #0)
> While trying to land bug 1226826 I got assertion failures (in code that I
> added) which I can reproduce locally. What's happening is that the timestamp
> passed to AsyncCompositionManager::TransformShadowTree can go backwards at
> times, which is unexpected and probably bad. The way it can happen (on
> Android anyway) is if ForceComposeToTarget gets called in the middle of a
> sequence of regular composites. ForceComposeToTarget sets mLastCompose to
> TimeStamp::Now(), which can be significantly (e.g. 100ms) ahead of the
> previous value of mLastCompose. Then, on the next vsync, mLastCompose gets
> reset backwards. For example, I added logging to see how much mLastCompose
> was moving forward by every time it was touched, and I got this:
> 
> 01-14 14:56:55.215 I/Gecko   ( 2776): staktrace: Composite advancing by
> 183.333326
> 01-14 14:56:55.295 I/Gecko   ( 2776): staktrace: Composite advancing by
> 83.333330
> 01-14 14:56:55.395 I/Gecko   ( 2776): staktrace: Composite advancing by
> 83.190303
> 01-14 14:56:55.425 I/Gecko   ( 2776): staktrace: ForceCompose advancing by
> 124.548012
> 01-14 14:56:55.875 I/Gecko   ( 2776): staktrace: Composite advancing by
> -24.521002
> 01-14 14:56:55.955 I/Gecko   ( 2776): staktrace: Composite advancing by
> 483.180048
> ...
> 01-14 14:56:56.755 I/Gecko   ( 2776): staktrace: Composite advancing by
> 49.632502
> 01-14 14:56:56.805 I/Gecko   ( 2776): staktrace: Composite advancing by
> 50.367494
> 01-14 14:56:56.855 I/Gecko   ( 2776): staktrace: Composite advancing by
> 33.333332
> 01-14 14:56:56.905 I/Gecko   ( 2776): staktrace: Composite advancing by
> 66.666664
> 01-14 14:56:56.955 I/Gecko   ( 2776): staktrace: ForceCompose advancing by
> 85.747534
> 01-14 14:56:57.755 I/Gecko   ( 2776): staktrace: Composite advancing by
> -35.747536
> 
> 
> These negative jumps are what triggered my assertion failure. I think in
> general we shouldn't have mLastCompose going backwards in time. Not sure
> what the right fix is though - any thoughts Mason?

The values by which composites are advancing are quite high. I wonder why those intervals are so high rather than being closer to 16.6ms. Is the Software Vsync source that behind [3]? Or is the main thread really busy?

I'm leaning towards skipping the whole composite step at [1], and on force composites, also dispatch touch / vr events like we do at [2]. We could just force ComposeToTarget to call CompositorVsyncScheduler::Composite(TimeStamp::Now()). Then if a vsync comes in that's a previous timestamp, we'll alter the timestamp to now(), like your patch does in comment 1. We already do this on Windows 10 because we sometimes get negative timestamps as well.

[1] https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorParent.cpp?case=true&from=CompositorParent.cpp#435
[2] https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorParent.cpp?case=true&from=CompositorParent.cpp#435
[3] https://dxr.mozilla.org/mozilla-central/source/gfx/thebes/SoftwareVsyncSource.cpp?case=true&from=SoftwareVsyncSource.cpp#108
Flags: needinfo?(mchang)
Assignee: nobody → mchang
Whiteboard: gfx-noted
(In reply to Mason Chang [:mchang] from comment #2)
> The values by which composites are advancing are quite high. I wonder why
> those intervals are so high rather than being closer to 16.6ms. Is the
> Software Vsync source that behind [3]? Or is the main thread really busy?

This is on Fennec, so yes - I believe that's software vsync.

> I'm leaning towards skipping the whole composite step at [1], and on force
> composites, also dispatch touch / vr events like we do at [2]. We could just
> force ComposeToTarget to call
> CompositorVsyncScheduler::Composite(TimeStamp::Now()). Then if a vsync comes
> in that's a previous timestamp, we'll alter the timestamp to now(), like
> your patch does in comment 1. We already do this on Windows 10 because we
> sometimes get negative timestamps as well.

Sounds reasonable. I don't know how much that will affect vlad's refactoring of vsync though - he should probably weigh in. Also fyi I found a different solution to my problem so this isn't blocking me any more.
No longer blocks: 1226826
This works locally, but I don't have an android setup. Can you also see if this fixes your problem on android kats? Thanks!
Attachment #8708471 - Flags: review?(vladimir)
Attachment #8708471 - Flags: review?(bugmail.mozilla)
Comment on attachment 8708471 [details] [diff] [review]
Normalize composite timestamps to now if timestamp is in the past.

Review of attachment 8708471 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/ipc/CompositorParent.cpp
@@ +480,3 @@
>    OnForceComposeToTarget();
> +  SetNeedsComposite();
> +  Composite(TimeStamp::Now());

aren't we losing the drawtarget and rect here?
Attachment #8708471 - Attachment is obsolete: true
Attachment #8708471 - Flags: review?(bugmail.mozilla)
Attachment #8708559 - Flags: review?(bugmail.mozilla)
Comment on attachment 8708559 [details] [diff] [review]
Normalize composite timestamps to now if timestamp is in the past.

Review of attachment 8708559 [details] [diff] [review]:
-----------------------------------------------------------------

Yup, this fixes the issue I was seeing locally.
Attachment #8708559 - Flags: review?(bugmail.mozilla) → review+
https://hg.mozilla.org/mozilla-central/rev/415f713d58b0
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Depends on: 1241387
Depends on: 1241363
This patch may also have perma-busted Android 4.3 debug R34. See https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&tochange=10d4e8736dbb&fromchange=52b41d0b5edb&filter-searchStr=Reftest%20R%28R34%29 - I triggered a couple of jobs to whittle down the regression range but this bug is the most likely candidate.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #10)
> This patch may also have perma-busted Android 4.3 debug R34. See
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-
> inbound&tochange=10d4e8736dbb&fromchange=52b41d0b5edb&filter-
> searchStr=Reftest%20R%28R34%29 - I triggered a couple of jobs to whittle
> down the regression range but this bug is the most likely candidate.

backed out also now from central since the pushes showed that the test failure started with this changes from this bug
Status: RESOLVED → REOPENED
Flags: needinfo?(mchang)
Resolution: FIXED → ---
Target Milestone: mozilla46 → ---
as a note, this has a 4% winxp webgl terrain e10s regression:
http://alertmanager.allizom.org:8080/alerts.html?rev=415f713d58b0803cf143f0a605ed79409d57fc1d&showAll=1&testIndex=0&platIndex=0#

that was fixed when this was backed out.
Can you please try these builds to see if they reproduce the issues you filed in bug 1241363 and bug 1241387?

Thanks!

https://archive.mozilla.org/pub/firefox/try-builds/mchang@mozilla.com-e1072d701dba593b8efa1198671bb951fa62cb91/
Flags: needinfo?(mchang)
Flags: needinfo?(garyshap)
Flags: needinfo?(alice0775)
(In reply to Mason Chang [:mchang] from comment #14)
> Can you please try these builds to see if they reproduce the issues you
> filed in bug 1241363 and bug 1241387?
> 
> Thanks!
> 
> https://archive.mozilla.org/pub/firefox/try-builds/mchang@mozilla.com-
> e1072d701dba593b8efa1198671bb951fa62cb91/

I tested the bug 1241387, and the try build worked fine for me.
https://hg.mozilla.org/try/rev/e1072d701dba593b8efa1198671bb951fa62cb91
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0 ID:20160130165630
Flags: needinfo?(alice0775)
The first time we create a widget and composite it, we don't have a default render target in Compositor D3D11. This causes us to hit this path at [1]. Since we don't have a default render target, we don't resize the buffer of the swap chain to the invalid region we need to composite. We then create a default render target that is the size of the back buffer at [2], which in the first composite, will not be equal to mSize [3]. Thus the bug 1241363 and bug 1241387 were composites of texture that was in the back buffer scaled up to the invalid size instead of the appropriate invalid region. This patch resizes the back buffer to the invalid region if we don't have a default render target yet. Then the newly created render target will be the appropriate size.

[1] https://dxr.mozilla.org/mozilla-central/source/gfx/layers/d3d11/CompositorD3D11.cpp?case=true&from=CompositorD3D11.cpp#1235
[2] https://dxr.mozilla.org/mozilla-central/source/gfx/layers/d3d11/CompositorD3D11.cpp?case=true&from=CompositorD3D11.cpp#1324
[3] https://dxr.mozilla.org/mozilla-central/source/gfx/layers/d3d11/CompositorD3D11.cpp?case=true&from=CompositorD3D11.cpp#1325
Attachment #8714939 - Flags: review?(bas)
This is actually just a consequence of me being silly in https://hg.mozilla.org/mozilla-central/diff/6899517c16a2/gfx/layers/d3d11/CompositorD3D11.cpp#l1263. Can we just lift the ResizeBuffers call back out of the if (mDefaultRT) statement? Also, mSize ought to be the client rect and not the invalid region that you're sizing to?
Flags: needinfo?(mchang)
Updated with feedback from comment 17.
Flags: needinfo?(mchang)
Attachment #8715544 - Flags: review?(bas)
Attachment #8714939 - Attachment is obsolete: true
Attachment #8714939 - Flags: review?(bas)
Attachment #8715544 - Flags: review?(bas) → review+
I've been thinking about this more, and I actually think it's better just to skip a past vsync notification due to force composites. If we dispatched touch/vr events at force compose timestamps, it'd dispatch non-uniform events. Instead, it's probably better to just halve the frame rate but at least be uniform in this case.
Attachment #8708559 - Attachment is obsolete: true
Attachment #8715958 - Flags: review?(bugmail.mozilla)
Attachment #8715958 - Flags: review?(bugmail.mozilla) → review+
Flags: needinfo?(garyshap)
https://hg.mozilla.org/mozilla-central/rev/9d1a4ca0fe8d
https://hg.mozilla.org/mozilla-central/rev/59ae52ad5880
Status: REOPENED → RESOLVED
Closed: 9 years ago8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Blocks: 1241364
Comment on attachment 8715544 [details] [diff] [review]
Properly resize DXGI swap chain buffer size

Approval Request Comment
[Feature/regressing bug #]: Bug 1229533, accidental failure to resize swap chain buffers sometimes. This patch reverts a small change from bug 122953 to what it was in Gecko 44 and before.
[User impact if declined]: Some random rendering artifacts, see bug 1241364.
[Describe test coverage new/current, TreeHerder]: Manual, treeherder
[Risks and why]: Low, this properly resizes our buffers to the client region.
[String/UUID change made/needed]: None

Requesting uplift to beta since that's when bug 1229533 landed.
Attachment #8715544 - Flags: approval-mozilla-beta?
Attachment #8715544 - Flags: approval-mozilla-aurora?
Comment on attachment 8715544 [details] [diff] [review]
Properly resize DXGI swap chain buffer size

Fix artifact, taking it.
Should be in 45 beta 5.
Attachment #8715544 - Flags: approval-mozilla-beta?
Attachment #8715544 - Flags: approval-mozilla-beta+
Attachment #8715544 - Flags: approval-mozilla-aurora?
Attachment #8715544 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: