Crash in [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mozilla::gfx::SourceSurfaceRawData::GuaranteePersistance]
Categories
(Core :: Graphics: ImageLib, defect, P2)
Tracking
()
People
(Reporter: philipp, Assigned: aosmond)
References
Details
(Keywords: crash, perf, perf-alert)
Crash Data
Attachments
(3 files)
This bug is for crash report bp-3a13251a-bfd9-4ec0-a9b3-20da40191217.
Top 10 frames of crashing thread:
0 mozglue.dll mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:33
1 mozglue.dll mozalloc_handle_oom memory/mozalloc/mozalloc_oom.cpp:51
2 mozglue.dll moz_xmalloc memory/mozalloc/cxxalloc.h:33
3 xul.dll mozilla::gfx::SourceSurfaceRawData::GuaranteePersistance gfx/2d/SourceSurfaceRawData.cpp:39
4 xul.dll void mozilla::gfx::DrawTargetCaptureImpl::FillRect gfx/2d/DrawTargetCapture.cpp:196
5 xul.dll mozilla::layers::FillRectWithMask gfx/layers/basic/BasicLayersImpl.cpp:148
6 xul.dll void mozilla::layers::BasicImageLayer::Paint gfx/layers/basic/BasicImageLayer.cpp:81
7 xul.dll void mozilla::layers::BasicLayerManager::PaintSelfOrChildren gfx/layers/basic/BasicLayerManager.cpp:703
8 xul.dll void mozilla::layers::BasicLayerManager::PaintLayer gfx/layers/basic/BasicLayerManager.cpp
9 xul.dll void mozilla::layers::BasicLayerManager::PaintSelfOrChildren gfx/layers/basic/BasicLayerManager.cpp:723
this oom crash signature is on the rise since mid-november 2019 - commonly hitting 32bit versions of the browser. most of the time urls submitted with these reports are pointing to facebook.com.
Comment 1•5 years ago
|
||
@andrew: This seems to have been occuring for a long time and I'm guessing some facebook site changes have increased the frequency as beta and release increased in frequency at the same date. Is there anything else you can see here?
We could also try contacting facebook if there's any useful info we could give them.
Assignee | ||
Comment 3•5 years ago
|
||
SourceSurfaceRawData is used in a lot of corner cases. Whenever we call GuaranteePersistence, we do a copy of the buffer to ensure the data remains unchanged.
Image surfaces may be wrapped by SourceSurfaceRawData, especially when the basic compositor is in use on Windows (I notice almost all of the crashes are for non-WebRender). This is because an image will be stored in volatile memory, and when we "optimize" the image after we finish decoding, we will likely just leave it in a SourceSurfaceVolatileBuffer. When imgFrame::Draw is called with a DrawTargetCapture, we end up calling CreateLockedSurface and wrapping the data with a SourceSurfaceRawData.
For images, GuaranteePersistence should actually do nothing, so we should use a new source surface type to avoid that.
Assignee | ||
Comment 4•5 years ago
|
||
I'm assuming this is mostly images fault, because that seems like the most common scenario with Facebook. There are many uses of SourceSurfaceRawData.
Assignee | ||
Comment 5•5 years ago
|
||
~4/5 crashes are for non-D2D and non-WR users which should be hitting the scenario I am concerned about.
Assignee | ||
Comment 6•5 years ago
|
||
This is also a performance issue with basic compositor on Windows, since we end up copying every large image when we draw with OMTP.
Assignee | ||
Comment 7•5 years ago
|
||
When one uses SourceSurfaceRawData to wrap a data pointer, it will
perform a copy of said data if GuaranteePersistence is called. This is
done for DrawTargetCapture, which is used with OMTP. Prior to this
patch, image surfaces would be wrapped by a SourceSurfaceRawData when
using the basic compositor on any non-Linux platform (since Linux does
not support volatile memory). This means every time imgFrame::Draw is
called with OMTP, a copy of the image will be made. Images don't need
this property since the data is already going to persist, so this patch
adds a new class SourceSurfaceMappedData, which takes a ScopedMap
keeping the underlying surface open and readable.
Comment 9•5 years ago
|
||
bugherder |
Reporter | ||
Comment 10•5 years ago
|
||
we got another crash from today's nightly including the fix: bp-ce6a6787-74c7-426b-bd1a-e184a0191227
Assignee | ||
Comment 11•5 years ago
|
||
(In reply to [:philipp] from comment #10)
we got another crash from today's nightly including the fix: bp-ce6a6787-74c7-426b-bd1a-e184a0191227
I don't expect it to fix it entirely, but hopefully it will reduce the rate.
Updated•5 years ago
|
Comment 12•5 years ago
|
||
I'm not seeing any obvious improvement in the rates for Beta73 vs. Beta72. Should we reopen?
Assignee | ||
Comment 13•5 years ago
|
||
The vast majority of these crashes are on x86. OMTP is known to have a higher memory footprint than non-OMTP due to how capturing off the main thread works, and OOM is precisely the problem. Our current plan is just to disable OMTP on that architecture.
Assignee | ||
Comment 14•5 years ago
|
||
Assignee | ||
Comment 15•5 years ago
|
||
Disabling the pref by default allows users who really want it to still get it, rather than forcing it off in gfxPlatform::InitOMTPConfig.
Assignee | ||
Updated•5 years ago
|
Updated•5 years ago
|
Comment 16•5 years ago
|
||
Comment 17•5 years ago
|
||
Backed out changeset f36e74d8c86c (Bug 1604535) for causing windows 7 reftest failures.
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=285411003&repo=autoland
https://treeherder.mozilla.org/logviewer.html#?job_id=285402782&repo=autoland
https://treeherder.mozilla.org/logviewer.html#?job_id=285402737&repo=autoland
Backout link: https://hg.mozilla.org/integration/autoland/rev/0ed55d3d6cc4380e708cf4a1249b48edb6b2a372
Assignee | ||
Comment 18•5 years ago
|
||
try with fuzz: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d90b7a892d9e17d2fe4731864a862886d21b71eb
Comment 19•5 years ago
|
||
Updated•5 years ago
|
Assignee | ||
Comment 20•5 years ago
|
||
Looking good for uplift potential assuming it sticks on nightly.
try on beta: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f113cbedf075df9ec4fa0ac1f70c1d83ec9fdb40
Comment 21•5 years ago
|
||
bugherder |
Assignee | ||
Comment 22•5 years ago
|
||
Comment 23•5 years ago
|
||
Assignee | ||
Comment 24•5 years ago
|
||
Let's see if the images missing fixes related to OMTP I've landed somehow are related to these OOMs. The reports came in around the same time. If not, I will reland near the start of the cycle, and also limiting it by CPU count (<= 2 && 32-bit, no OMTP). Our CI infrastructure has more cores than that, so won't trip the perf regressions for most users with a higher end machine that just happened to be using the 32-bit binary, and there is unlikely to be much benefit to OMTP with 2 or fewer cores given how many threads/processes we already have. This would represent 75% of the crashes.
Assignee | ||
Updated•5 years ago
|
Updated•5 years ago
|
Comment 25•5 years ago
|
||
Backout got merged: https://hg.mozilla.org/mozilla-central/rev/f97c48da9cee
Comment 26•5 years ago
|
||
== Change summary for alert #24827 (as of Fri, 31 Jan 2020 18:18:53 GMT) ==
Regressions:
4% tscrollx windows7-32-shippable opt e10s stylo 0.69 -> 0.72
Improvements:
37% tsvgx windows7-32-shippable opt e10s stylo 196.87 -> 123.82
25% tsvgr_opacity windows7-32-shippable opt e10s stylo 159.90 -> 119.59
17% tsvg_static windows7-32-shippable opt e10s stylo 62.45 -> 51.62
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=24827
Comment 27•5 years ago
|
||
== Change summary for alert #24834 (as of Mon, 03 Feb 2020 09:39:09 GMT) ==
Regressions:
7% Heap Unclassified windows7-32 opt 34,915,886.27 -> 37,403,325.66
7% Heap Unclassified windows7-32-shippable opt 34,979,821.17 -> 37,365,810.32
5% Heap Unclassified windows7-32 opt tp6 51,462,141.78 -> 53,909,844.11
4% Heap Unclassified windows7-32-shippable opt tp6 51,327,485.14 -> 53,386,403.55
4% Resident Memory windows7-32-shippable opt 590,927,739.53 -> 614,376,924.49
4% Resident Memory windows7-32 opt 592,328,328.40 -> 614,356,772.50
4% Resident Memory windows7-32 opt tp6 656,602,956.46 -> 679,811,951.59
3% Resident Memory windows7-32-shippable opt tp6 661,003,009.91 -> 678,726,640.96
3% Resident Memory windows7-32 opt tp6 659,576,248.55 -> 676,985,456.29
Improvements:
23% Images windows7-32 opt 6,059,289.41 -> 4,661,818.94
21% Images windows7-32-shippable opt 6,109,727.16 -> 4,856,316.79
12% Images windows7-32-shippable opt tp6 7,786,851.52 -> 6,874,368.23
11% Images windows7-32 opt tp6 7,644,862.43 -> 6,805,141.18
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=24834
Updated•5 years ago
|
Assignee | ||
Comment 28•5 years ago
|
||
From 73 RC crash reports, we can see the images missing bugs I resolved related to OMTP do not appear related to this issue. I will reland disabling this, albeit for a much more selective population. This should not affect CI due to it having sufficient cores and memory available to pass the checks.
Comment 29•5 years ago
|
||
Comment 30•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Comment 31•4 years ago
|
||
This decision was likely for the best.
But couldn't you reword the warning a bit better?
"Not supported on 32-bit with <= 2 cores" made me think to a transient bug like that time dav1d was pulled in.
Though.. Shouldn't the main criteria just be memory?
Like, even more so if I was running 64 bit firefox, 2GB of total RAM would be very little.
Description
•