Closed
Bug 1453801
Opened 7 years ago
Closed 7 years ago
In SourceSurfaceSharedDataWrapper::SourceSurfaceSharedDataWrapper, mConsumer being initialized to 0 causes an assertion to trigger
Categories
(Core :: Graphics: WebRender, defect, P1)
Core
Graphics: WebRender
Tracking
()
RESOLVED
FIXED
mozilla61
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox59 | --- | disabled |
firefox60 | --- | disabled |
firefox61 | --- | fixed |
People
(Reporter: tristanbourvon, Assigned: aosmond)
References
Details
Attachments
(3 files, 3 obsolete files)
(deleted),
patch
|
sotaro
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
sotaro
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
sotaro
:
review+
|
Details | Diff | Splinter Review |
This hunk https://hg.mozilla.org/integration/mozilla-inbound/diff/d7d2f08e051c/gfx/layers/SourceSurfaceSharedData.h caused an assertion to trigger (https://treeherder.mozilla.org/logviewer.html#?job_id=173104451&repo=mozilla-inbound), presumably because RemoveConsumer is called before AddConsumer somewhere, and currently mConsumer gets positive garbage so this is invisible.
The hunk has since been backed out but should be a correct fix to land.
Updated•7 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(aosmond)
Assignee | ||
Comment 1•7 years ago
|
||
Well that's embarrassing.
Assignee: nobody → aosmond
Status: NEW → ASSIGNED
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(aosmond)
Comment 2•7 years ago
|
||
I guess this also means that in SharedSurfacesParent::Release we never do sInstance->mSurfaces.Remove(id) ?
Assignee | ||
Comment 3•7 years ago
|
||
I'm not sure if I even need the consumer code anymore, the code has evolved a lot since it was first introduced. Also this bug only affects WebRender enabled builds.
Assignee | ||
Comment 4•7 years ago
|
||
Should be fine just ripping it out. The image cache in the content process keeps the surfaces alive in the GPU process as long as necessary anyways. The races I feared should be very uncommon, and now WebRender accepts an external image ID not mapping to anything (it will log a critical warning).
try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c081d1c862f527c8efa22f578b60daea022ee7eb
Attachment #8967829 -
Flags: review?(jmuizelaar)
Updated•7 years ago
|
Attachment #8967829 -
Flags: review?(jmuizelaar) → review+
Assignee | ||
Comment 5•7 years ago
|
||
This should be unmarked as a security bug, given it will only affect WebRender, and the consequence is that we won't release images from the cache in the GPU process when we should (memory leak).
Comment 6•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/2efe54944e8cb91f1198d0c55ccb440de5b37997
https://hg.mozilla.org/mozilla-central/rev/2efe54944e8c
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox61:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Updated•7 years ago
|
status-firefox59:
--- → disabled
status-firefox60:
--- → disabled
status-firefox-esr52:
--- → unaffected
Comment 7•7 years ago
|
||
I think we need to back this out, it caused a huge AWSY regression w/ WebRender (bug 1454346) as well as frequent crashes (bug 1454112, bug 1454114).
Flags: needinfo?(aryx.bugmail)
Comment 8•7 years ago
|
||
backout |
Status: RESOLVED → REOPENED
Flags: needinfo?(aryx.bugmail)
Resolution: FIXED → ---
Target Milestone: mozilla61 → ---
Updated•7 years ago
|
Group: core-security → gfx-core-security
Assignee | ||
Comment 9•7 years ago
|
||
My latest attempt to fix this has yielded fruit.
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=0e45c13b34e815cb42a9f08bb44142d1a81e186e&newProject=try&newRevision=385b2b7f02edc4049f1bf48fc61299d174a39beb&framework=4&showOnlyImportant=1
I haven't looked into why the the Images measurement went up by 30% / 1.5MB, but the Resident Memory went down by 6% / 70MB. Finally results that are starting to make some sense.
Comment 10•7 years ago
|
||
(In reply to Andrew Osmond [:aosmond] from comment #9)
> My latest attempt to fix this has yielded fruit.
>
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-
> central&originalRevision=0e45c13b34e815cb42a9f08bb44142d1a81e186e&newProject=
> try&newRevision=385b2b7f02edc4049f1bf48fc61299d174a39beb&framework=4&showOnly
> Important=1
>
> I haven't looked into why the the Images measurement went up by 30% / 1.5MB,
> but the Resident Memory went down by 6% / 70MB. Finally results that are
> starting to make some sense.
Looking at the subtests it feels like you're keeping images alive longer [1]. The after 30s / forced GC numbers look okay.
[1] https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&originalRevision=0e45c13b34e815cb42a9f08bb44142d1a81e186e&newProject=try&newRevision=385b2b7f02edc4049f1bf48fc61299d174a39beb&originalSignature=192131af763ce4043e6b7dacf039fd3330f47e1a&newSignature=192131af763ce4043e6b7dacf039fd3330f47e1a&framework=4
Assignee | ||
Comment 11•7 years ago
|
||
After some consideration, I think this is what is happened before:
1) Load a page. Surface cache touched to create image container. Image container cached.
2) Tab switch away to prepare for a new page. Image container remains cached by the caller.
3) Repeat.
4a) Return to tabs in order to close. Surface cache not touched as the image container was kept alive. Doesn't matter if imgIContainer::GetImageContainer(AtSize) is called because it exits before hitting the cache if the container already exists.
4b) In parallel, items in cache are expired and removed from reporting. The surfaces may still be kept alive by the image container, but no longer reported.
With my new changes:
2) Tab switch away to prepare for a new page. Image container is discarded.
4a). Return to tabs in order to close. Hits surface cache in order to get the image container for the last brief display before closing.
4b). Same as before, but the timer got reset, so the images live a bit longer.
Assignee | ||
Updated•7 years ago
|
Component: Graphics: Layers → Graphics: WebRender
Assignee | ||
Comment 12•7 years ago
|
||
This reveals a shortcoming in imagelib as well. Ideally we wouldn't expire cache entries that still have an image container alive. This would also make reporting more accurate, e.g. not underreporting.
Assignee | ||
Comment 13•7 years ago
|
||
I don't believe linux64 suffers much from this misreporting. Images are less likely to be layerized. That would cause them to go via the imgIContainer::Draw path, which will touch the surface cache again and keep the images alive longer.
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&originalRevision=2c32c973232d7362ca9dfe103e1c3a4863e9a423&newProject=mozilla-central&newRevision=8ed49dd81059dfdd876cf62ad5def1cfa56ffbbf&originalSignature=64f1ad4137b67d2c468d4d5efc381c4aec875872&newSignature=64f1ad4137b67d2c468d4d5efc381c4aec875872&framework=4
While the patch appears to regress linux64-qr, in reality it just brings us back in line with what one would expect. Since we are showing the same images, why are we would imagelib report less memory usage? There isn't any obvious reason why WebRender would be more efficient here.
Updated•7 years ago
|
Blocks: stage-wr-trains
Priority: -- → P1
Updated•7 years ago
|
Group: gfx-core-security
Assignee | ||
Comment 14•7 years ago
|
||
This fixes how we would hold onto the image containers for too long. Tab switching away would keep the WebRenderUserData objects, while scrolling would toss them.
Attachment #8967829 -
Attachment is obsolete: true
Assignee | ||
Comment 15•7 years ago
|
||
This sort of does a 180 on the consumer code and actually uses it. I didn't run into any failures without it, but unfortunately there is a small chance of a race...
Assignee | ||
Comment 16•7 years ago
|
||
And finally, add the missing shutdown code that should have been there all along. It is a bit tricky because we want to ensure the render thread has finished all its tasks before we remove any lingering textures.
try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3f3a7290c590d109202fb61340ecff91c5af8b83
Assignee | ||
Comment 17•7 years ago
|
||
Attachment #8969372 -
Attachment is obsolete: true
Attachment #8969745 -
Flags: review?(sotaro.ikeda.g)
Assignee | ||
Comment 18•7 years ago
|
||
Attachment #8969375 -
Attachment is obsolete: true
Attachment #8969747 -
Flags: review?(sotaro.ikeda.g)
Assignee | ||
Comment 19•7 years ago
|
||
Comment on attachment 8969371 [details] [diff] [review]
0001-Bug-1453801-Part-1.-Ensure-WebRenderUserData-objects.patch, v1
try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=80cba9c2fe8009cba5ebabaae4c4498eb1c93689&selectedJob=174820076
Attachment #8969371 -
Flags: review?(sotaro.ikeda.g)
Assignee | ||
Comment 20•7 years ago
|
||
I see why the original change made things worse instead of fixing them. Correcting the image lifetime issue on the content process caused us to redecode the images more frequently. Since the rendering thread was keeping the old images alive, more images decoded will mean the memory footprint keeps getting higher. Hence why I needed to fix all of the bugs to see the improvement.
Updated•7 years ago
|
Attachment #8969371 -
Flags: review?(sotaro.ikeda.g) → review+
Updated•7 years ago
|
Attachment #8969745 -
Flags: review?(sotaro.ikeda.g) → review+
Comment 21•7 years ago
|
||
Comment on attachment 8969747 [details] [diff] [review]
0003-Bug-1453801-Part-3.-Fix-race-condition-shutting-down.patch, v2
Looks good!
Attachment #8969747 -
Flags: review?(sotaro.ikeda.g) → review+
Comment 22•7 years ago
|
||
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f3eeeae62a5a
Part 1. Ensure WebRenderUserData objects are freed after a tab switch. r=sotaro
https://hg.mozilla.org/integration/mozilla-inbound/rev/c9f4fd7170ad
Part 2. Ensure shared surfaces are properly released from render texture cache. r=sotaro
https://hg.mozilla.org/integration/mozilla-inbound/rev/d1479b21a284
Part 3. Fix race condition shutting down the render thread and shared surfaces. r=sotaro
Comment 23•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f3eeeae62a5a
https://hg.mozilla.org/mozilla-central/rev/c9f4fd7170ad
https://hg.mozilla.org/mozilla-central/rev/d1479b21a284
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
You need to log in
before you can comment on or make changes to this bug.
Description
•