Some images in Google Docs sometimes disappear, turn white/blank
Categories
(Core :: Graphics: WebRender, defect, P1)
People
(Reporter: cpeterson, Assigned: aosmond)
References
(Depends on 1 open bug, Regression)
Details
(Keywords: crash, regression)
Attachments
(5 files)
(deleted), image/png
(deleted), application/x-gzip
(deleted), application/x-gzip
(deleted), application/x-gzip
(deleted), text/x-phabricator-request (RyanVM: approval-mozilla-beta-, RyanVM: approval-mozilla-esr91+)
When reading Google Docs that contain images I've pasted into the docs, sometimes some of the images turn white/blank. The problem usually seems to affect images that I pasted into the doc minutes or hours earlier that day, not when I immediately paste the image into the doc. See the attached screenshot.
This problem seems to have gotten worse during the 90 Nightly cycle. I only see this problem when running a 32-bit Firefox build (on 64-bit Windows OS). I typically have about 5-10 Google Docs tabs open at the same time. Perhaps this is a graphics OOM issue like bug 1703839?
I see this problem with or without Fission.
Comment 1•4 years ago
Potentially you could use the dev tools to find the uri of the image in question and then use a verbose memory report to find that uri, it would have information about what imagelib thinks is the status of the image.
Comment 2•4 years ago
A full memory report would be interesting too.
Assignee
Comment 3•4 years ago
If the raster image fails to decode because of an OOM in the content process, then the canvas probably shows nothing. Bug 1703839 and friends are a little different, where the GPU process ran out of virtual memory -- the image would have been decoded -- and we would crash the GPU process explicitly in that case.
Reporter
Comment 4•4 years ago
(In reply to Timothy Nikkel (:tnikkel) from comment #1)
Potentially you could use the dev tools to find the uri of the image in question and then use a verbose memory report to find that uri, it would have information about what imagelib thinks is the status of the image.
The inspector shows the blank image element is:
<image xlink:href="blob:https://docs.google.com/f90b21d2-359c-4ef7-abc6-7d790fc2bc09" width="100%" height="100%" preserveAspectRatio="none" transform="scale(0.0005)" transform-origin="0 0"><title></title><desc></desc></image>
The web console warns:
Cannot access blob URL “blob:https://docs.google.com/f90b21d2-359c-4ef7-abc6-7d790fc2bc09” from a different agent cluster.
Cannot access blob URL “blob:https://docs.google.com/f90b21d2-359c-4ef7-abc6-7d790fc2bc09” from a different agent cluster.
Info about that blob URL in about:memory:
├───31,045,584 B (08.28%) -- images
│ ├──30,993,104 B (08.27%) -- content
│ │ ├──20,966,256 B (05.59%) -- raster/used
│ │ │ ├──14,882,720 B (03.97%) -- progress=18f
│ │ │ │ ├───6,279,440 B (01.68%) -- image(2048x503, blob:https://docs.google.com/f90b21d2-359c-4ef7-abc6-7d790fc2bc09)
│ │ │ │ │ ├──6,021,312 B (01.61%) -- unlocked/types=2000
│ │ │ │ │ │ ├──3,178,592 B (00.85%) -- surface(1797x442)
│ │ │ │ │ │ │ ├──3,178,496 B (00.85%) ── decoded-nonheap
│ │ │ │ │ │ │ └─────────96 B (00.00%) ── decoded-heap
│ │ │ │ │ │ └──2,842,720 B (00.76%) -- surface(1702x417)
│ │ │ │ │ │ ├──2,842,624 B (00.76%) ── decoded-nonheap
│ │ │ │ │ │ └─────────96 B (00.00%) ── decoded-heap
│ │ │ │ │ └────258,128 B (00.07%) ── source
...
514,942 B (100.0%) -- memory-blob-urls
└──514,942 B (100.0%) -- owner(https://docs.google.com/document/u/1/d/14THPtVTl3jwdvkrnuxybn678N3F0-84-VjsMR4wlq0c/edit)
├──257,471 B (50.00%) ── blob:https://docs.google.com/e88b8584-6cef-481a-9d88-2bf5fdce79d6
└──257,471 B (50.00%) ── blob:https://docs.google.com/f90b21d2-359c-4ef7-abc6-7d790fc2bc09
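As a sanity check on the decoded-nonheap figures in the tree above, here is a minimal sketch of the arithmetic. It assumes 4 bytes per pixel and rounding up to 4 KiB pages; both are assumptions chosen because they reproduce the reported numbers, not statements about how imagelib allocates. The function name is hypothetical.

```python
PAGE_SIZE = 4096       # assumed allocation granularity (4 KiB pages)
BYTES_PER_PIXEL = 4    # assumed 32-bit pixel format


def page_rounded_surface_bytes(width: int, height: int) -> int:
    """Raw pixel bytes rounded up to a whole number of pages."""
    raw = width * height * BYTES_PER_PIXEL
    pages = -(-raw // PAGE_SIZE)  # integer ceiling division
    return pages * PAGE_SIZE


# Surface sizes and decoded-nonheap values copied from the report above.
reported = {(1797, 442): 3_178_496, (1702, 417): 2_842_624}
for (width, height), nonheap in reported.items():
    estimate = page_rounded_surface_bytes(width, height)
    print(f"{width}x{height}: estimate={estimate} reported={nonheap}")
```

Under those assumptions both surfaces match the report exactly, which suggests the decoded data is fully present for this image.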
Also, the GPU process section of about:memory warns:
WARNING: the following values are negative or unreasonably large.
explicit/gfx
explicit/gfx/webrender
explicit/gfx/webrender/swgl
explicit/heap-unclassified
This indicates a defect in one or more memory reporters. The invalid values are highlighted.
Explicit Allocations
476,663,808 B (100.0%) -- explicit
├──1,371,045,144 B (287.63%) -- gfx [?!]
│ ├──1,371,045,144 B (287.63%) -- webrender [?!]
│ │ ├──1,343,361,312 B (281.83%) ── swgl [?!]
...
└──-903,281,400 B (-189.50%) ── heap-unclassified [?!]
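The negative heap-unclassified value follows from simple arithmetic once one reporter over-counts. The sketch below assumes heap-unclassified is whatever remains after the reported entries are subtracted from the explicit total; the variable names are illustrative and the figures are copied from the report above.

```python
explicit_total = 476_663_808
gfx = 1_371_045_144            # includes the oversized swgl entry
heap_unclassified = -903_281_400


def percent_of_total(amount: int, total: int) -> float:
    """Percentage of the explicit total, as about:memory displays it."""
    return round(100.0 * amount / total, 2)


# Everything hidden behind "..." in the excerpt must account for the rest.
other_entries = explicit_total - gfx - heap_unclassified

print(percent_of_total(gfx, explicit_total))
print(percent_of_total(heap_unclassified, explicit_total))
print(other_entries)
```

Because gfx alone is reported as larger than the whole explicit total, the remainder is forced negative, which is what the [?!] markers flag.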
Reporter
Comment 5•4 years ago
(In reply to Jim Mathies [:jimm] from comment #2)
A full memory report would be interesting too.
Is attaching the full memory report to a public bug safe?
I saved a full memory report file, but the memory report is from my regular user profile, which is logged into my Google and Bugzilla accounts. I saved both a full and "anonymized" version of the memory report.
Comment 6•4 years ago
The memory report looks okay, I'm thinking the "different agent cluster" might be the reason. You could try flipping privacy.partition.bloburl_per_agent_cluster to false and see if it still happens.
Reporter
Comment 7•4 years ago
(In reply to Chris Peterson [:cpeterson] from comment #4)
The web console warns:
Cannot access blob URL “blob:https://docs.google.com/f90b21d2-359c-4ef7-abc6-7d790fc2bc09” from a different agent cluster. Cannot access blob URL “blob:https://docs.google.com/f90b21d2-359c-4ef7-abc6-7d790fc2bc09” from a different agent cluster.
@ Baku, could this "image pasted into a Google Doc is blank" bug be a privacy.partition.bloburl_per_agent_cluster bug?
Nika says: "If you're getting that error, it doesn't look like [a WebRender OOM]. Sounds like a privacy.partition.bloburl_per_agent_cluster bug. perhaps the blob url is being created by a gdocs service worker or shared worker."
kmag says: "I think BroadcastChannel may be a possibility too... If there are multiple docs tabs and they're trying to conserve resources."
Reporter
Comment 8•4 years ago
I have been testing with privacy.partition.bloburl_per_agent_cluster = false for two days and I just reproduced the blank images problem again. I did have Fission disabled for the last two days and I re-enabled it just this morning and minutes later reproduced this bug. Perhaps there is a connection with Fission, but I know I have seen this bug without Fission (comment 0).
Reporter
Comment 10•3 years ago
I still see this disappearing image problem almost every day using a 32-bit build of Firefox Nightly on Windows 10 and 11 (with or without SW-WR), even when I don't have a ton of tabs or Google Docs open.
Comment 11•3 years ago
We limit the surface cache in image lib to 1GB in 32 bit builds, so you could be hitting that without hitting OOMs.
What if you set image.mem.surfacecache.max_size_kb to 1024*1024 (1 GB) or smaller in a 64 bit build, does that make it show up more often in 64 bit builds?
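For reference, a small sketch of what those pref values mean in bytes, assuming the "kb" in the pref name means units of 1024 bytes. The helper name is hypothetical.

```python
def surface_cache_limit_bytes(max_size_kb: int) -> int:
    """Convert an image.mem.surfacecache.max_size_kb value to bytes (1 KB = 1024 B assumed)."""
    return max_size_kb * 1024


for label, kb in [("1024 * 1024", 1024 * 1024), ("512 * 1024", 512 * 1024)]:
    limit = surface_cache_limit_bytes(kb)
    print(f"{label} KB = {limit} bytes = {limit / 2**30} GiB")
```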
Reporter
Comment 12•3 years ago
(In reply to Timothy Nikkel (:tnikkel) from comment #11)
What if you set image.mem.surfacecache.max_size_kb to 1024*1024 (1 GB) or smaller in a 64 bit build, does that make it show up more often in 64 bit builds?
I tested a 64-bit build with image.mem.surfacecache.max_size_kb = 512 * 1024 (smaller than the 32-bit default 1024 * 1024 to try to make the problem easier to reproduce) for three days and couldn't reproduce the disappearing image problem.
I wonder if the problem is related to 32-bit and multiple Fission content processes? I will test with 32-bit and Fission disabled.
Reporter
Comment 13•3 years ago
I tested a 64-bit build with image.mem.surfacecache.max_size_kb = 512 * 1024 (smaller than the 32-bit default 1024 * 1024 to try to make the problem easier to reproduce) for three days and couldn't reproduce the disappearing image problem.
I did see some images disappear (in particular, the slide preview thumbnails in Google Presentations), but they always reloaded when I scrolled the page. In contrast, when images disappeared in 32-bit build, I had to reload the page to get them to reappear.
Reporter
Comment 14•3 years ago
I reinstalled a 32-bit build and reproduced the problem in less than an hour of using Google Docs (pasting screenshots into a doc). I then disabled Fission (and reduced the e10s process limit from 8 to 1) and again reproduced the problem in less than an hour.
Perhaps this problem is related to 32-bit's image.mem.surfacecache.max_size_kb limit 1024 * 1024, but I wasn't able to reproduce the problem with a 64-bit build and image.mem.surfacecache.max_size_kb = 512 * 1024 after even a couple days of using Google Docs.
Reporter
Comment 15•3 years ago
I still see this bug every day when using 32-bit Windows Firefox. This bug makes Google Docs practically unusable with 32-bit. I want to use 32-bit Nightly to help find unique bugs (like this one), but I have to revert to 64-bit to do work.
Comment 16•3 years ago
Can you get a new about:memory report when this happens? If we're still getting bad numbers for WebRender we should fix that.
Assignee
Comment 17•3 years ago
Comment 4 seems very fishy to me, as if we decoded the image but were unable to use it in the rendering pipeline in the GPU process. If we fail to map it in, we OOM abort the process, but maybe we don't even try that because something failed earlier in the pipeline?
Chris, if you could flip image.mem.debug-reporting to true, and capture the memory report again (check verbose as well) for a missing image, I would be most interested in its contents. There will be relevant bits included in the explicit/images section of the content process, and possibly in the gfx.webrender.images section of the compositor process.
Assignee
Comment 18•3 years ago
To explain a bit more what I'm looking for:
│ │ │ │ ├────4,512 B (00.00%) -- image(16x16, resource://gre-resources/loading-image.png)
│ │ │ │ │ ├──4,240 B (00.00%) -- locked/types=400/surface(16x16, external_id:1200000001, compositor_ref:1)
│ │ │ │ │ │ ├──4,096 B (00.00%) ── decoded-nonheap
│ │ │ │ │ │ └────144 B (00.00%) ── decoded-heap
│ │ │ │ │ └────272 B (00.00%) ── source
The presence of an external_id and non-zero compositor_ref will at least tell us that it should be able to be mapped in by the compositor process.
Under gfx/webrender/images/owner_cache_missing in the content process, you will find references to images that the compositor process knows about but that no longer have entries in the surface cache. These should go away with time, and it is a good idea to double check that there aren't too many (which would suggest a leak). I would not expect this number to be zero due to the extra layer of caching by ImageContainers (which I will be removing!).
Under gfx/webrender/images/mapped_from_owner in the compositor process, you will find a list of all of the images mapped into the compositor process. You should be able to cross reference using the external_id to find the corresponding entry in the content process (either under explicit/images or gfx/webrender/images/owner_cache_missing). It would be interesting to see how many images / how much memory appears to be mapped in via the summary here as well.
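The cross-referencing described above could be scripted roughly as follows. This is a sketch under stated assumptions: that a saved report is gzip-compressed JSON with a top-level "reports" array whose entries carry "process", "path" and "amount" fields, and that both processes embed "external_id:<id>" in their paths the way the surface(...) entry above does. The function names, the sample process names and the sample compositor path are all made up for illustration.

```python
import gzip
import json
import re
from collections import defaultdict

# Assumption: debug reporting embeds "external_id:<id>" in report paths.
EXTERNAL_ID_RE = re.compile(r"external_id:(\w+)")


def load_reports(path):
    """Load the "reports" array from a saved .json.gz report (assumed layout)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)["reports"]


def index_external_ids(reports):
    """Map each external_id to the set of process names whose paths mention it."""
    index = defaultdict(set)
    for report in reports:
        for match in EXTERNAL_ID_RE.finditer(report["path"]):
            index[match.group(1)].add(report["process"])
    return index


def ids_in_single_process(index):
    """Return ids seen in only one process, i.e. with no counterpart to match."""
    return sorted(eid for eid, procs in index.items() if len(procs) < 2)


# Synthetic stand-in data, not taken from any real report.
sample = [
    {"process": "content (pid 1)", "amount": 4096,
     "path": "explicit/images/surface(16x16, external_id:1200000001, compositor_ref:1)/decoded-nonheap"},
    {"process": "gpu (pid 2)", "amount": 4096,
     "path": "gfx/webrender/images/mapped_from_owner/external_id:1200000001"},
    {"process": "content (pid 1)", "amount": 4096,
     "path": "explicit/images/surface(32x32, external_id:1200000002, compositor_ref:0)/decoded-nonheap"},
]
print(ids_in_single_process(index_external_ids(sample)))
```

On a real report, `load_reports("memory-report.json.gz")` would replace the synthetic list; ids that appear on only one side are the ones worth inspecting by hand.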
Reporter
Comment 19•3 years ago
Here is a memory report (with verbose and image.mem.debug-reporting = true) after some images disappeared on Google Slides.
Caveat: after I saved the memory report and switched back to the Google Slides tab, the images had reloaded. So I don't know if the memory report was taken before or after the images reloaded.
Comment 20•3 years ago
(In reply to Chris Peterson [:cpeterson] from comment #20)
Created attachment 9246916 [details]
Google_Slides_missing_image_memory-report.json.gz
Here is a memory report (with verbose and image.mem.debug-reporting = true) after some images disappeared on Google Slides.
Caveat: after I saved the memory report and switched back to the Google Slides tab, the images had reloaded. So I don't know if the memory report was taken before or after the images reloaded.
In this report you have a bunch of memory under swgl. Are you intentionally using swgl?
Reporter
Comment 21•3 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #21)
In this report you have a bunch of memory under swgl. Are you intentionally using swgl?
Yes. I've been running with gfx.webrender.software = true since last year to dogfood test swgl (since most Nightly users will be using Accelerated WR and 64-bit builds). I can disable swgl and retest with Accelerated WR.
Reporter
Comment 22•3 years ago
I disabled swgl (by resetting gfx.webrender.software = false and restarting) and I still reproduced the missing Google Slide images with Accelerated WR. Here's a new memory report with verbose, image.mem.debug-reporting, and Accelerated WR.
Assignee
Comment 23•3 years ago
There is nothing suspicious in those memory reports unfortunately. My hypothesis that perhaps we were not getting the necessary bindings in the compositor process seems false. As part of my investigation, I filed bug 1737257 and bug 1737277, with patches to fix minor issues with SVGImageElement, which I suppose could be related, but unlikely.
Assignee
Comment 24•3 years ago
Another thing you can try is flipping the pref image.mem.shared.unmap.force-enabled (restart required) when testing on 64-bit builds. On 32-bit builds we will unmap the memory for images out of the compositor process when we hit memory pressure and/or just hit the expiry timer for unused images. On 64-bit builds we do not do this, and as such there is a difference between the two variants. You can force the same behaviour on 64-bit builds via said pref. You won't hit the address space pressure, but it should unmap images over time.
Assignee
Comment 25•3 years ago
Also set image.mem.shared.unmap.min_threshold_kb to 0 to more closely mirror the 32-bit behaviour. image.mem.shared.unmap.min_expiration_ms should be the same on both, but perhaps you can lower the expiration period to be more aggressive about the unmapping to aid reproduction.
Reporter
Comment 26•3 years ago
I was able to reproduce the problem easily using a 64-bit Firefox and just image.mem.shared.unmap.force-enabled = true, leaving image.mem.shared.unmap.min_threshold_kb, image.mem.shared.unmap.min_expiration_ms, and gfx.webrender.software at their default values. (I was also able to reproduce the problem a little more easily when changing those other prefs, but they weren't required.)
Here is a new memory report with just image.mem.shared.unmap.force-enabled = true.
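For anyone wanting to repeat this on a 64-bit build, the pref settings from the last few comments can be written as user.js lines in a throwaway test profile. The sketch below is only a convenience helper; the function name is hypothetical, and the optional min_threshold_kb entry reflects the suggestion in comment 25 rather than something required to reproduce. No value is given for min_expiration_ms because the thread does not state one.

```python
import json

# Required pref from comment 26, plus the optional one suggested in comment 25.
REPRO_PREFS = {
    "image.mem.shared.unmap.force-enabled": True,
    "image.mem.shared.unmap.min_threshold_kb": 0,  # optional
}


def to_user_js(prefs: dict) -> str:
    """Render a dict of prefs as user_pref(...) lines for a profile's user.js."""
    lines = []
    for name, value in prefs.items():
        if isinstance(value, bool):        # check bool before int: bool is an int subclass
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = str(value)
        else:
            rendered = json.dumps(value)   # quote string values
        lines.append(f'user_pref("{name}", {rendered});')
    return "\n".join(lines) + "\n"


print(to_user_js(REPRO_PREFS), end="")
```

The same values can be set by hand in about:config; a restart is needed for force-enabled either way.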
Assignee
Comment 27•3 years ago
Okay, that's a useful smoking gun. I've already reviewed SourceSurfaceSharedDataWrapper/RenderSharedSurfaceTextureHost/SharedSurfacesParent, but clearly somewhere in those 3 classes something is broken.
Assignee
Comment 28•3 years ago
This should be returning DATA_SHARED_WRAPPER to be used in:
I think non-blob recording uses of these images are fine. This is crazy.
Assignee
Comment 29•3 years ago
When replaying blob recordings, we need to ensure that we map and unmap
the surface properly so that we don't accidentally expire the mapping too
early to free up virtual memory on 32-bit Firefox. The code was written
but not activated because we returned the wrong type from the
SourceSurfaceSharedDataWrapper::GetType implementation.
Assignee
Comment 30•3 years ago
The images do not show up when we have unmapped the image before we try to create a Skia binding for the surface.
There must be Skia crash signatures related to this on the flip side, where it was still mapped in when we started replaying the blob, but it got unmapped in the middle of the Skia draw operation.
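To make the failure mode concrete, here is a toy model of the two paths. It is not Gecko code: the class, its methods and the "blank"/"drawn" results are all hypothetical stand-ins. It only illustrates why holding an explicit map for the duration of a replay prevents the expiry logic from pulling the pixels away.

```python
from contextlib import contextmanager


class ToySharedSurface:
    """Stand-in for a shared surface whose mapping can expire when unused."""

    def __init__(self, pixels: bytes):
        self._backing = pixels
        self._mapped = pixels      # None once the mapping has been dropped
        self._map_count = 0

    def expire_if_unused(self):
        """Expiry timer: drop the mapping only if no consumer holds a map."""
        if self._map_count == 0:
            self._mapped = None

    @contextmanager
    def scoped_map(self):
        """Explicit map/unmap around a use, remapping on demand."""
        self._map_count += 1
        if self._mapped is None:
            self._mapped = self._backing
        try:
            yield self._mapped
        finally:
            self._map_count -= 1

    def peek(self):
        """Unguarded access: whatever mapping happens to exist right now."""
        return self._mapped


def replay_unguarded(surface):
    # Takes no map reference, so an earlier expiry leaves nothing to draw.
    return "blank" if surface.peek() is None else "drawn"


def replay_guarded(surface, expire_midway=False):
    with surface.scoped_map():
        if expire_midway:
            surface.expire_if_unused()  # no effect while the map is held
        return "blank" if surface.peek() is None else "drawn"


surface = ToySharedSurface(b"\xff" * 16)
surface.expire_if_unused()
print(replay_unguarded(surface))
print(replay_guarded(surface, expire_midway=True))
```

In the model, the unguarded path sees no data after an expiry, while the guarded path both remaps on entry and survives an expiry attempt in the middle of the draw, which corresponds to the crash case described above.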
Assignee
Comment 31•3 years ago
Crashes such as:
https://crash-stats.mozilla.org/report/index/b6729abf-38e8-428e-aaf4-1b6e90211022#tab-bugzilla
are likely because of this bug.
Comment 32•3 years ago
Assignee
Comment 33•3 years ago
Comment on attachment 9247342 [details]
Bug 1709679 - Properly map/unmap SourceSurfaceSharedDataWrapper for blob recordings.
Beta/Release Uplift Approval Request
- User impact if declined: Users on 32-bit operating systems may sometimes see missing images, or experience a low volume crash in the compositor process (either parent or GPU process).
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This patch forces us to explicitly map/unmap shared surfaces for blob images, effectively forcing a lock and unlock on the pixel data. This is a strict improvement over today where it is nondeterministic what will happen. The scope of code which is impacted is small and well understood.
- String changes made/needed:
Assignee
Comment 34•3 years ago
Comment on attachment 9247342 [details]
Bug 1709679 - Properly map/unmap SourceSurfaceSharedDataWrapper for blob recordings.
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Poor experience, won't see all web content as expected, particularly for users who load some content, tab away and later return to it.
- User impact if declined: Users may not see images embedded in blob recordings (e.g. Google Slides is a common use case) or may experience low volume crashes as a result.
- Fix Landed on Version: 95
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This patch forces us to explicitly map/unmap shared surfaces for blob images, effectively forcing a lock and unlock on the pixel data. This is a strict improvement over today where it is nondeterministic what will happen. The scope of code which is impacted is small and well understood.
- String or UUID changes made by this patch:
Assignee
Comment 35•3 years ago
Upping the severity as I think this warrants more attention/prioritization now that I understand what is going wrong and how common this is likely to be for users with particular patterns (effectively open a tab with blob recordings with embedded images, move away from it, and return later, on 32-bit OSes).
Assignee
Comment 36•3 years ago
Assignee
Comment 37•3 years ago
Chris, now that the fix is in the autoland queue, I'd appreciate if you could confirm that it fixes your problem once it makes it into the next nightly/daily :).
Comment 38•3 years ago
bugherder
Comment 39•3 years ago
We're out of betas this cycle and this doesn't feel like an issue that warrants uplift directly into an RC without any bake time first. Let's let this ride 95 to release.
Comment 40•3 years ago
Comment on attachment 9247342 [details]
Bug 1709679 - Properly map/unmap SourceSurfaceSharedDataWrapper for blob recordings.
Leaving the ESR nomination for possible inclusion in the 91.4 release next cycle, however.
Reporter
Comment 41•3 years ago
(In reply to Andrew Osmond [:aosmond] from comment #38)
Chris, now that the fix is in the autoland queue, I'd appreciate if you could confirm that it fixes your problem once it makes it into the next nightly/daily :).
LGTM. I've tested 32-bit Firefox for a day now and haven't seen any images disappear in Google Docs.
Thanks for tracking down this bug!
Comment 42•3 years ago
Comment on attachment 9247342 [details]
Bug 1709679 - Properly map/unmap SourceSurfaceSharedDataWrapper for blob recordings.
This has gotten a good amount of bake time with no known regressions and has been verified on 95. Approved for 91.4esr.
Comment 43•3 years ago
bugherder uplift
Comment 44•3 years ago
Chris, could you please when time permits help us to verify this also on Firefox esr91?
Reporter
Comment 45•3 years ago
(In reply to Hani Yacoub from comment #45)
Chris, could you please when time permits help us to verify this also on Firefox esr91?
Hani, where can I find the esr91 build to test?
Comment 46•3 years ago
Reporter
Comment 47•3 years ago
(In reply to Hani Yacoub from comment #45)
Chris, could you please when time permits help us to verify this also on Firefox esr91?
LGTM. I tested Google Docs in 91.4.0esr-candidates build1 for a few hours today. I didn't see any missing image, so I'll mark esr91 as verified fixed.