Closed Bug 1582954 Opened 5 years ago Closed 4 years ago

Avoid SIGBUS from shared memory allocation failures on Linux/BSD

Categories

(Core :: Graphics: WebRender, defect, P3)

Desktop
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla80
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- disabled
firefox78 --- wontfix
firefox79 --- wontfix
firefox80 --- fixed

People

(Reporter: gwarser, Assigned: aosmond)

References

(Blocks 3 open bugs)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files)

Attached file sse2bisect.txt (deleted)

This bug is for crash report bp-9ed48f43-c4ba-4dd1-8a2e-847cc0190921.

Top 10 frames of crashing thread:

0 libc-2.29.so __memcpy_sse2_unaligned_erms 
1 libxul.so mozilla::image::BlendAnimationFilter<mozilla::image::SurfaceSink>::DoAdvanceRow image/SurfaceFilters.h:683
2 libxul.so mozilla::image::nsGIFDecoder2::ReadLZWData image/decoders/nsGIFDecoder2.cpp:1003
3 libxul.so mozilla::Maybe<mozilla::Variant<mozilla::image::TerminalState, mozilla::image::Yield> > mozilla::image::StreamingLexer<mozilla::image::nsGIFDecoder2::State, 16ul>::ContinueUnbufferedRead<mozilla::image::nsGIFDecoder2::DoDecode image/StreamingLexer.h:554
4 libxul.so mozilla::image::nsGIFDecoder2::DoDecode image/decoders/nsGIFDecoder2.cpp:445
5 libxul.so mozilla::image::Decoder::Decode image/Decoder.cpp:133
6 libxul.so mozilla::image::AnimationSurfaceProvider::Run image/AnimationSurfaceProvider.cpp:210
7 libxul.so mozilla::image::DecodePoolWorker::Run image/DecodePool.cpp:271
8 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1225
9 libxul.so <name omitted> xpcom/threads/nsThreadUtils.cpp:486

Bisected to https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=afcc4787126167bc6c395bc11c97158724658704&tochange=5ad0fb1caddddb365936dc8e89ca85bba57c886f for now

Swizzle: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=ce76fa05c90f3f24f8db09950eadd4a8cdec9088&tochange=bc58681d24079ae0bdb65c9679455042dccf0e76

I see this very often, but today was the first time I got the crash report dialog. The crash may be more frequent than the reports suggest and simply go unnoticed.

What it looks like: my screen starts flashing black, I see some graphical distortions, the cursor freezes, and finally I only get a notification from KWin: "Desktop effects were restarted due to a graphics reset"

Flags: needinfo?(dmalyshau)

There are crashes before the bisection. They are slightly different because they have SIGSEGV instead of SIGBUS, but are otherwise very similar. They appear to have a related GPU crash, but I can't find them in crash stats.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: crash
Priority: -- → P3
Hardware: Unspecified → Desktop

> There are crashes before the bisection.

Yes: https://crash-stats.mozilla.org/signature/?product=Firefox&signature=__memcpy_sse2_unaligned_erms%20%7C%20mozilla%3A%3Aimage%3A%3ABlendAnimationFilter%3CT%3E%3A%3ADoAdvanceRow&date=%3E%3D2019-03-27T10%3A33%3A00.000Z&date=%3C2019-09-27T10%3A33%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1#reports

And not all crashes are reported(!) - nothing in about:crashes.

Here, Swizzle somehow increases the chance of triggering this, and it's easily reproducible on https://blog.google/products/chrome/get-more-done-with-google-chrome/ (I was bisecting with this URL).

It crashed from both the GIF decoder and the WebP decoder, but in both cases inside BlendAnimationFilter.

Anyway - unsupported platform/hardware, so not important.

Looks like swizzling is no longer suspected of causing this.

Flags: needinfo?(dmalyshau)

I cannot reproduce this anymore. Mozregression with --find-fix ends on 8th October, probably at https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=fbbdc8a6447094a7cc5ab2cf02eafc26eeeb2f03&tochange=ac3bcdd939b430cf82492c342f13038509d1387c
I still see crashes in "Crash data" above, so the crash is probably unrelated and this bug should be closed as invalid.

We still saw a crash on Oct 30th, so I don't think it can be duped against it. But it probably changed a code path and avoided the problem in some cases.

Around 25 of these crashes in the last week, all on Nightly.

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

The crash reason is SIGBUS and the crashing address is almost always a multiple of the page size, so this is very likely an OOM crash where the kernel could not find a free physical page to page in.
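
The triage heuristic above can be written down as a small check. This is only a sketch: the function name and the sample addresses are hypothetical and are not taken from any crash report in this bug. The idea is that a sequential copy into lazily backed memory first faults on the first byte of a page that has no backing, so the reported address tends to sit on a page boundary.

```python
import mmap


def looks_like_lazy_page_fault(signal_name, fault_address, page_size=mmap.PAGESIZE):
    """Heuristic: SIGBUS at a page-aligned address hints at a failed page-in.

    This does not prove an out-of-memory condition; it only flags reports
    that match the pattern described in the comment above.
    """
    return signal_name == "SIGBUS" and fault_address % page_size == 0


# Hypothetical addresses, for illustration only.
aligned = looks_like_lazy_page_fault("SIGBUS", 0x7F3A5C200000, page_size=4096)
unaligned = looks_like_lazy_page_fault("SIGBUS", 0x7F3A5C200010, page_size=4096)
```

A SIGSEGV at the same address would not match, which is in line with the earlier, slightly different crashes being treated separately.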

I just experienced this issue while browsing this page. At the time of the crash, I was reading the bottom of the page and the GIF was not visible.

The GIF which is played on this webpage is: https://media.giphy.com/media/JpG2A9P3dPHXaTYrwu/giphy.gif

Crash Signature: [@ __memcpy_sse2_unaligned_erms | mozilla::image::BlendAnimationFilter<T>::DoAdvanceRow] → [@ __memcpy_sse2_unaligned_erms | mozilla::image::BlendAnimationFilter<T>::DoAdvanceRow] [@ memcpy | mozilla::image::BlendAnimationFilter<T>::DoAdvanceRow ]

WebRender makes extensive use of shared memory buffers, particularly for
images decoded in the content process. These images can be arbitrarily
large, and an allocation that fails for lack of memory must be
handled gracefully.

On Linux, we will currently crash with a SIGBUS signal during image
decoding instead of just displaying the broken image tag. This is
because the pages backing the shared memory are only allocated when we
write to them. This blocks shipping WebRender on Linux.

This patch uses posix_fallocate to force the reservation of the pages,
and allows failing gracefully if they are unavailable.
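
The approach described in the patch can be sketched outside of Gecko as follows. This is not the code that landed (the real change is in Firefox's C++ shared memory layer); it is a minimal Python illustration, assuming a file-backed mapping stands in for the shared memory segment. The function name `create_shared_buffer` and its `directory` parameter are hypothetical.

```python
import mmap
import os
import tempfile


def create_shared_buffer(size, directory=None):
    """Map `size` bytes of shared memory, or return None if it cannot be reserved.

    `directory` is a hypothetical knob for this sketch; pointing it at a
    tmpfs mount such as /dev/shm mimics a POSIX shared memory segment.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.unlink(path)  # keep the segment anonymous; it lives on via fd/mapping
        try:
            # Reserve every backing page now. Without this step (for example
            # with a plain ftruncate) pages are only allocated on first write,
            # and a full filesystem shows up later as SIGBUS in the writer.
            os.posix_fallocate(fd, 0, size)
        except OSError:
            return None  # caller can fall back, e.g. show the image as broken
        return mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd)  # mmap keeps its own duplicate of the descriptor
```

A caller would request a buffer with something like `create_shared_buffer(64 * 1024 * 1024, "/dev/shm")` and treat a `None` result as an ordinary allocation failure instead of crashing partway through decoding.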

Assignee: nobody → aosmond
Status: NEW → ASSIGNED

Also [@ __memcpy_ssse3 | mozilla::image::BlendAnimationFilter<T>::DoAdvanceRow ] ? https://crash-stats.mozilla.org/report/index/055156b6-165c-4735-9b1b-740a00200626

I can reproduce a tab crash by opening four copies of https://blog.google/products/chrome/get-more-done-with-google-chrome/ (2GB /dev/shm)

This does not seem to be related to the screen flickering/distortions and KWin reporting a graphics reset, as I thought when I created this bug. However, the desktop micro-freezes while the page loads when /dev/shm is low on available space.
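
When trying to reproduce this, it can help to watch how much space is left on the filesystem that backs shared memory. The snippet below is a small sketch using `os.statvfs`; the helper name `free_bytes` is hypothetical, and the `/dev/shm` default assumes a typical Linux setup where that path is a tmpfs mount.

```python
import os


def free_bytes(path="/dev/shm"):
    """Return the number of bytes available to unprivileged users on `path`."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize
```

Calling `free_bytes()` before and after loading the page shows how quickly the decoded animation frames consume the mount.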

Blocks: 1640272
Summary: Crash in [@ __memcpy_sse2_unaligned_erms | mozilla::image::BlendAnimationFilter<T>::DoAdvanceRow] → Avoid SIGBUS from shared memory allocation failures on Linux/BSD
Blocks: 1245239
Pushed by aosmond@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6fd096e34c9f Use posix_fallocate if available to avoid lazy allocation for shared memory. r=jld
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla80