Open Bug 1421345 Opened 7 years ago Updated 2 years ago

Crash in OOM | small with BuildDisplayListForChild

Categories

(Core :: Web Painting, defect, P3)

58 Branch
All
Windows
defect

Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- affected
firefox57 --- unaffected
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- fix-optional

People

(Reporter: philipp, Unassigned, NeedInfo)

Details

(Keywords: crash, leave-open, regression, Whiteboard: [tbird crash])

Attachments

(1 file)

This bug was filed from the Socorro interface and is report bp-bc93d3d2-ea43-41d7-8bcf-6c7ad0171128.
=============================================================
Top 10 frames of crashing thread:

0 xul.dll NS_ABORT_OOM xpcom/base/nsDebugImpl.cpp:620
1 xul.dll nsPresArena::Allocate layout/base/nsPresArena.cpp:148
2 xul.dll nsDisplayListBuilder::CreateClipChainIntersection layout/painting/nsDisplayList.cpp:1614
3 xul.dll nsDisplayListBuilder::CopyWholeChain layout/painting/nsDisplayList.cpp:1622
4 xul.dll nsDisplayListBuilder::MarkOutOfFlowFrameForDisplay layout/painting/nsDisplayList.cpp:1200
5 xul.dll nsDisplayListBuilder::MarkFramesForDisplayList layout/painting/nsDisplayList.cpp:1477
6 xul.dll nsIFrame::MarkAbsoluteFramesForDisplayList layout/generic/nsFrame.cpp:3767
7 xul.dll nsIFrame::BuildDisplayListForChild layout/generic/nsFrame.cpp:3664
8 xul.dll nsFlexContainerFrame::BuildDisplayList layout/generic/nsFlexContainerFrame.cpp:2267
9 xul.dll nsIFrame::BuildDisplayListForChild layout/generic/nsFrame.cpp:3717

=============================================================

out of memory crashes on windows with BuildDisplayListForChild in their proto signature are rising in the 58 cycle. on 58.0b there are around 150 more daily reports of this than before:
https://crash-stats.mozilla.com/signature/?product=Firefox&submitted_from_infobar=%21__true__&proto_signature=~BuildDisplayListForChild&release_channel=beta&signature=OOM%20%7C%20small&date=%3E%3D2017-08-01#graphs

during the 58 nightly cycle these crashes started spiking around 2017-10-30:
https://crash-stats.mozilla.com/signature/?product=Firefox&submitted_from_infobar=%21__true__&proto_signature=~BuildDisplayListForChild&release_channel=nightly&signature=OOM%20%7C%20small&date=%3E%3D2017-08-01#graphs
Jet, can you find someone to take a look?
Flags: needinfo?(bugs)
(wondering if we started experiments with RDL on beta yet...)
Bug 1411881 landed around the time this spiked on Nightly. Maybe related? Otherwise, below is a very rough pushlog range for around the time this regressed. A few other display list changes in there too. https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2017-10-29&enddate=2017-10-31
Component: Layout → Layout: Web Painting
Flags: needinfo?(bugs) → needinfo?(matt.woodrow)
Marking this as blocking 58 because the volume of crashes is very high.
(In reply to Mike Taylor [:miketaylr] (58 Regression Engineering Owner) from comment #2)
> (wondering if we started experiments with RDL on beta yet...)

layout.display-list.retain is still false in 59.0b8. Matt, could this correlate with refactoring in the surrounding code that may not be guarded by the pref?
Matt, any ideas?
Flags: needinfo?(matt.woodrow)
(oops, accidentally cleared ni?)
Flags: needinfo?(matt.woodrow)
I'm working on trying to narrow this down. Crash reports themselves don't show a lot, most look like normal OOM, though some seem to be OOM crashes with fairly low memory usage. It's possible that there are multiple issues causing this. Do we know if any other OOM signatures dropped around this time?
Assignee: nobody → matt.woodrow
on the beta channel the generic [@ OOM|small] signature seems to be rising in 58: https://crash-stats.mozilla.com/signature/?submitted_from_infobar=%21__true__&release_channel=beta&product=Firefox&signature=OOM%20%7C%20small&date=%3E%3D2017-06-08T12%3A14%3A13.000Z&date=%3C2017-12-08T11%3A14%3A13.000Z#graphs (when looking into that i stumbled upon the crashes described in this bug report) a signature related to memory pressure that has dropped in 58 is [@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER] - but since those reports don't contain much information it's hard to attribute the decline to something in particular.
Ok, that does sound like a real memory usage increase.

Looking at the crash stats, it looks like this happened no more than once per Nightly until the 1022 and 1023 builds where we had two crashes, 4 in the 1027 build, 6 and 11 in the two 1029 builds, and then 14 in the 1030 build (taken from build id aggregations).

It's really hard to know exactly when it started because of the amount of variance there. The 4 crashes in 1027 suggest that build likely had the bug, but we went back to 1 crash in 1028.

Do we have any way of measuring uptake/usage of each Nightly build? I assume Nightlies from certain days of the week get more usage than others, but I don't know how to quantify that and apply it to these results.

Wider regression range:
https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2017-10-21&enddate=2017-10-30

If we assume the 1022 and 1023 results are significant, then the regression must have happened on 1021/1022, and there's nothing interesting (relating to display lists) there.

The main retained-dl code landed on the 23rd.

Bug 1405146 landed on the 25th; that one seems like it could increase the total memory used during painting (probably not a huge amount, but it depends on the page). That doesn't fit the timing perfectly, but it's possible.

I've set up ASAN and DMD builds on my local Windows machine, but haven't been able to reproduce any leaking or corruption.
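To illustrate why the start of the regression is ambiguous, here is a minimal sketch that applies two different rules to the per-build counts quoted above. It assumes a baseline of one crash per Nightly, and the counts cover only the builds named in this comment (1022, 1023, 1027, 1028, the two 1029 builds, 1030). The function names are hypothetical and nothing here is normalized by build usage, which is the missing piece.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the first build whose crash count exceeds the baseline,
// or counts.size() if none does.
std::size_t FirstAbove(const std::vector<int>& counts, int baseline) {
  for (std::size_t i = 0; i < counts.size(); ++i) {
    if (counts[i] > baseline) {
      return i;
    }
  }
  return counts.size();
}

// Index of the first build from which every later build also stays above
// the baseline, or counts.size() if the last build is not above it.
std::size_t FirstSustainedAbove(const std::vector<int>& counts, int baseline) {
  std::size_t start = counts.size();
  for (std::size_t i = counts.size(); i-- > 0;) {
    if (counts[i] <= baseline) {
      break;  // a quiet build ends the sustained run
    }
    start = i;
  }
  return start;
}
```

With the counts {2, 2, 4, 1, 6, 11, 14} and a baseline of 1, the first rule points at the 1022 build and the second at the first 1029 build, which is the same spread as the candidate ranges discussed above.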
The daily crash volume is still high (over 100 reports per day). Did we see any memory regression in the 1021/1022/1023 builds?
Flags: needinfo?(matt.woodrow)
I'm struggling to make progress with this. A lot of the reports really don't seem to be in a particularly low memory state at all, like this one: https://crash-stats.mozilla.com/report/index/3d6da33d-850b-49a0-8d15-afbda0180105 8.79TB left of virtual memory, 2.32GB left of physical memory, and 6.46GB left of the page file. Crashing with an OOM with those numbers seems really suspicious to me. Still trying to figure out more.
Nathan, do you have any ideas why we'd fail to allocate memory when there's still so much left? The only alternative I can think of is that ArenaChunk::header::offset is 0 (which has been seen before, in bug 1406727 comment 36) which makes ArenaChunk::Allocate return nullptr despite not really being OOM. That seems like it should be much too rare to cause this volume of crashes though.
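As a rough illustration of that failure mode, the sketch below is a simplified bump allocator, not the real mozilla::ArenaAllocator code. `SketchChunk` and its fields are hypothetical names. It assumes the chunk hands out whatever address is stored in `offset`, so a zeroed `offset` produces a null pointer even though the chunk still appears to have room.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified chunk for illustration only.
struct SketchChunk {
  std::uintptr_t offset;  // address of the next free byte
  std::uintptr_t tail;    // one past the last usable byte

  SketchChunk(char* aBuffer, std::size_t aCapacity)
      : offset(reinterpret_cast<std::uintptr_t>(aBuffer)),
        tail(offset + aCapacity) {}

  // Unsigned subtraction: a zeroed offset makes this look very large.
  std::size_t Available() const { return tail - offset; }

  void* Allocate(std::size_t aSize) {
    if (aSize > Available()) {
      return nullptr;  // the chunk really is full
    }
    void* p = reinterpret_cast<void*>(offset);
    offset += aSize;
    return p;  // null when offset was clobbered to 0
  }
};
```

A caller that treats every null return as OOM would then abort with an OOM signature while the system still has plenty of memory, which would match reports like the one in comment 13.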
Flags: needinfo?(matt.woodrow) → needinfo?(nfroyd)
Skimming through, some of them have a very small amount of contiguous free memory left, e.g.
https://crash-stats.mozilla.com/report/index/401dc51d-8b97-4ed7-bf34-473570180109
https://crash-stats.mozilla.com/report/index/98afc26e-5a76-4ec2-89ea-0bbe60180109
https://crash-stats.mozilla.com/report/index/1ba9339b-80e3-407a-9f61-42d8b0180109
all have a largest contiguous VM block of < 2MB. If the allocation winds up requesting blocks of memory from the OS, and requests 2MB chunks when it does so (to carve larger blocks out of), you're going to be out of luck. That's just life.

But that doesn't explain the crash in comment 13, or one like:
https://crash-stats.mozilla.com/report/index/0020ff4d-f4af-4988-97f7-667cd0180109
which both have tons of space--total virtual/physical and large chunks of VM--unless the largest contiguous VM block measurements (see the "largest_free_vm_block" field in the Raw Dump tab) are completely out of whack. But then that'd be some massive fragmentation, given that there's so much space left.

There's also things like:
https://crash-stats.mozilla.com/report/index/1c1fd1a7-203a-49cf-9f1a-a7e750180109
which has ~4MB of contiguous VM space left, but still OOMs.

ArenaChunk::header::offset being busted seems reasonable to me, but then I don't know what made it that way. More canaries are in order? =/
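The fragmentation case can be shown with a toy model. This is a sketch under assumptions: free address space is represented as a list of free block sizes, and the 2MB request size is the hypothetical figure from the comment above rather than a confirmed allocator setting. `CanReserve` and `TotalFree` are made-up names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// True if any single free block is large enough for a contiguous request.
bool CanReserve(const std::vector<std::size_t>& aFreeBlocks,
                std::size_t aRequest) {
  auto largest = std::max_element(aFreeBlocks.begin(), aFreeBlocks.end());
  return largest != aFreeBlocks.end() && *largest >= aRequest;
}

// Sum of all free blocks, regardless of whether they are adjacent.
std::size_t TotalFree(const std::vector<std::size_t>& aFreeBlocks) {
  return std::accumulate(aFreeBlocks.begin(), aFreeBlocks.end(),
                         std::size_t{0});
}
```

For 512 separate free blocks of 1MB each, `TotalFree` reports 512MB while `CanReserve` rejects a 2MB request. Total free memory alone therefore cannot rule out a genuine allocation failure, but a large `largest_free_vm_block` should.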
Flags: needinfo?(nfroyd)
Attached patch: Check the canary during allocations (deleted)
Worth a shot at least!
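The attached patch itself is deleted, so the following is only a sketch of the general idea and not the patch. It assumes a known value stored next to the allocator bookkeeping and verified before each allocation, so that a corrupted header can be told apart from a genuinely full arena. `SketchHeader`, `kCanaryValue` and `CheckedBump` are hypothetical names, and the canary value is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Arbitrary marker value chosen for this sketch.
constexpr std::uintptr_t kCanaryValue = 0x5AFEC0DE;

struct SketchHeader {
  std::uintptr_t canary;  // expected to stay equal to kCanaryValue
  std::uintptr_t offset;  // address of the next free byte
};

inline SketchHeader MakeHeader(std::uintptr_t aOffset) {
  return SketchHeader{kCanaryValue, aOffset};
}

enum class AllocResult { Ok, Corrupted, Full };

// Checks the canary first, then bumps the offset if the request fits.
AllocResult CheckedBump(SketchHeader& aHeader, std::uintptr_t aTail,
                        std::size_t aSize) {
  if (aHeader.canary != kCanaryValue) {
    return AllocResult::Corrupted;  // header was overwritten; not an OOM
  }
  if (aHeader.offset > aTail || aSize > aTail - aHeader.offset) {
    return AllocResult::Full;
  }
  aHeader.offset += aSize;
  return AllocResult::Ok;
}
```

A diagnostic build would presumably crash with a distinct message in the `Corrupted` case, so those reports would stop being filed under the generic OOM signature.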
Attachment #8942598 - Flags: review?(nfroyd)
Comment on attachment 8942598
Check the canary during allocations

r+ to get this on to Beta and get crash reports back ASAP. Nathan: I'll leave a NI on you to have a look at the first reports that come back. Thx!
Flags: needinfo?(nfroyd)
Attachment #8942598 - Flags: review?(nfroyd) → review+
Comment on attachment 8942598
Check the canary during allocations

Approval Request Comment
[Feature/Bug causing the regression]: See bug 1421345.
[User impact if declined]: Undiagnosed OOM crashes.
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: This is not a fix. It's diagnostic code to help identify a root cause for OOM crashes when there's still available memory.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: No.
[Is the change risky?]: Low risk
[Why is the change risky/not risky?]: Diagnostic code that we'll pull out before we ship to Release.
[String changes made/needed]: None.
Attachment #8942598 - Flags: approval-mozilla-beta?
Comment on attachment 8942598
Check the canary during allocations

For debugging purposes. Beta58+.
Attachment #8942598 - Flags: approval-mozilla-release+
Attachment #8942598 - Flags: approval-mozilla-beta?
Attachment #8942598 - Flags: approval-mozilla-beta+
Backed out for bustage at dist/include/mozilla/ArenaAllocator.h:180:7: 'canary' was not declared in this scope:
https://hg.mozilla.org/releases/mozilla-beta/rev/9579dad4492b9ce9e2be0379bae320e1f6327394
https://hg.mozilla.org/releases/mozilla-release/rev/fae7c41d40fd8ddb4d6d0ade34af7c75fef0e4d5

Push with bustage: https://treeherder.mozilla.org/#/jobs?repo=mozilla-release&revision=814254bd1eb76533621eea0700d0182aa3121350&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Build log: https://treeherder.mozilla.org/logviewer.html#?job_id=156352910&repo=mozilla-release

[task 2018-01-15T11:49:56.529Z] 11:49:56 INFO - gmake[5]: Entering directory '/builds/worker/workspace/build/src/obj-firefox/xpcom/base'
[task 2018-01-15T11:49:56.530Z] 11:49:56 INFO - /usr/bin/ccache /builds/worker/workspace/build/src/gcc/bin/g++ -std=gnu++11 -o Unified_cpp_xpcom_base0.o -c -I/builds/worker/workspace/build/src/obj-firefox/dist/stl_wrappers -I/builds/worker/workspace/build/src/obj-firefox/dist/system_wrappers -include /builds/worker/workspace/build/src/config/gcc_hidden.h -DNDEBUG=1 -DTRIMMED=1 -DOS_POSIX=1 -DOS_LINUX=1 -DSTATIC_EXPORTABLE_JS_API -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -I/builds/worker/workspace/build/src/xpcom/base -I/builds/worker/workspace/build/src/obj-firefox/xpcom/base -I/builds/worker/workspace/build/src/obj-firefox/ipc/ipdl/_ipdlheaders -I/builds/worker/workspace/build/src/ipc/chromium/src -I/builds/worker/workspace/build/src/ipc/glue -I/builds/worker/workspace/build/src/xpcom/build -I/builds/worker/workspace/build/src/dom/base -I/builds/worker/workspace/build/src/xpcom/ds -I/builds/worker/workspace/build/src/obj-firefox/dist/include -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nspr -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nss -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/build/src/obj-firefox/mozilla-config.h -Wall -Wc++11-compat -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wwrite-strings -Wno-invalid-offsetof -Wc++14-compat -Wduplicated-cond -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -Wformat -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -D_GLIBCXX_USE_CXX11_ABI=0 -pipe -g -O3 -fomit-frame-pointer -Werror -I/builds/worker/workspace/build/src/widget/gtk/compat-gtk3 -pthread -I/builds/worker/workspace/build/src/gtk3/usr/local/include/gtk-3.0/unix-print -I/builds/worker/workspace/build/src/gtk3/usr/local/include/gtk-3.0 -I/builds/worker/workspace/build/src/gtk3/usr/local/include/gio-unix-2.0/ -I/builds/worker/workspace/build/src/gtk3/usr/local/include/cairo -I/builds/worker/workspace/build/src/gtk3/usr/local/include/pango-1.0 -I/builds/worker/workspace/build/src/gtk3/usr/local/include/atk-1.0 -I/builds/worker/workspace/build/src/gtk3/usr/local/include/cairo -I/builds/worker/workspace/build/src/gtk3/usr/local/include/pixman-1 -I/builds/worker/workspace/build/src/gtk3/usr/local/include -I/builds/worker/workspace/build/src/gtk3/usr/local/include/gdk-pixbuf-2.0 -I/builds/worker/workspace/build/src/gtk3/usr/local/include/glib-2.0 -I/builds/worker/workspace/build/src/gtk3/usr/local/lib/glib-2.0/include -I/builds/worker/workspace/build/src/gtk3/usr/include/freetype2 -I/builds/worker/workspace/build/src/gtk3/usr/include/libpng12 -fprofile-generate -MD -MP -MF .deps/Unified_cpp_xpcom_base0.o.pp /builds/worker/workspace/build/src/obj-firefox/xpcom/base/Unified_cpp_xpcom_base0.cpp
[task 2018-01-15T11:49:56.531Z] 11:49:56 INFO - In file included from /builds/worker/workspace/build/src/obj-firefox/dist/include/nsPresArena.h:13:0,
[task 2018-01-15T11:49:56.531Z] 11:49:56 INFO - from /builds/worker/workspace/build/src/obj-firefox/dist/include/nsIPresShell.h:38,
[task 2018-01-15T11:49:56.531Z] 11:49:56 INFO - from /builds/worker/workspace/build/src/obj-firefox/dist/include/nsPresContext.h:19,
[task 2018-01-15T11:49:56.531Z] 11:49:56 INFO - from /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/dom/Element.h:28,
[task 2018-01-15T11:49:56.531Z] 11:49:56 INFO - from /builds/worker/workspace/build/src/dom/base/nsDOMMutationObserver.h:20,
[task 2018-01-15T11:49:56.531Z] 11:49:56 INFO - from /builds/worker/workspace/build/src/xpcom/base/CycleCollectedJSContext.cpp:35,
[task 2018-01-15T11:49:56.531Z] 11:49:56 INFO - from /builds/worker/workspace/build/src/obj-firefox/xpcom/base/Unified_cpp_xpcom_base0.cpp:20:
[task 2018-01-15T11:49:56.532Z] 11:49:56 INFO - /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/ArenaAllocator.h: In member function 'void* mozilla::ArenaAllocator<ArenaSize, Alignment>::ArenaChunk::Allocate(size_t)':
[task 2018-01-15T11:49:56.532Z] 11:49:56 INFO - /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/ArenaAllocator.h:180:7: error: 'canary' was not declared in this scope
[task 2018-01-15T11:49:56.532Z] 11:49:56 INFO - canary.Check();
[task 2018-01-15T11:49:56.532Z] 11:49:56 INFO - ^~~~~~
[task 2018-01-15T11:49:56.534Z] 11:49:56 INFO - /builds/worker/workspace/build/src/config/rules.mk:1028: recipe for target 'Unified_cpp_xpcom_base0.o' failed
[task 2018-01-15T11:49:56.534Z] 11:49:56 INFO - gmake[5]: *** [Unified_cpp_xpcom_base0.o] Error 1
[task 2018-01-15T11:49:56.535Z] 11:49:56 INFO - gmake[5]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/xpcom/base'
[task 2018-01-15T11:49:56.535Z] 11:49:56 INFO - /builds/worker/workspace/build/src/config/recurse.mk:73: recipe for target 'xpcom/base/target' failed
[task 2018-01-15T11:49:56.535Z] 11:49:56 INFO - gmake[4]: *** [xpcom/base/target] Error 2
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(bugs)
Looks like we'll also need to uplift at least the patch for bug 1406727 comment 46. I'll let Matt request the required uplifts for that one.
Flags: needinfo?(bugs)
Whiteboard: [leave open]
Keywords: leave-open
Whiteboard: [leave open]
Thanks for jumping on this! I've made the extra uplift request in bug 1406727.
Flags: needinfo?(matt.woodrow)
Priority: -- → P1
Depends on: 1430962
Per email thread, this is not going to block the 58 release.
Comment on attachment 8942598
Check the canary during allocations

this isn't going to be on 58 after all.
Attachment #8942598 - Flags: approval-mozilla-release+
Attachment #8942598 - Flags: approval-mozilla-beta+
Moving to p3 because no activity for at least 24 weeks. See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P1 → P3

The leave-open keyword is there and there is no activity for 6 months.
:mattwoodrow, maybe it's time to close this bug?

Flags: needinfo?(matt.woodrow)
Flags: needinfo?(matt.woodrow)


With 510 crashes in the past week, this ranks in the top 10 crashes for https://crash-stats.mozilla.org/topcrashers/?product=Firefox&version=92.0.1 - 50-50 split between x86 and amd64

There are also Thunderbird crashes, for example bp-9b5fbfa3-e6a6-4226-a1c7-aed0f0211004

Whiteboard: [tbird crash]
Assignee: matt.woodrow → nobody
Blocks: gfx-triage
Severity: critical → S3

This looks like it continues to be a massive crash fest, even years later. Miko, since this is categorized under "Web Painting", can you provide any insights or fixes?

Flags: needinfo?(mikokm)

Note that the crash signature shown in the bugzilla UI is shared with all small OOMs.

In 100.0a1 we seem to have gotten 42 crashes (with very small population) where the stack includes BuildDisplayList. I took a look at a handful of recent crashes and it seems that most of them show either ERROR_COMMITMENT_LIMIT or ERROR_NOT_ENOUGH_MEMORY errors. The allocations are generally 32KB or less.

Some of these crashes make very little sense, for example this:

Last Error Value              ERROR_COMMITMENT_LIMIT
Total Virtual Memory          140,737,488,224,256 bytes (140.74 TB)
Available Virtual Memory      138,530,068,635,648 bytes (138.53 TB)
Available Page File           3,600,437,248 bytes (3.6 GB)
Available Physical Memory     1,042,993,152 bytes (1.04 GB)
System Memory Use Percentage  89
OOM Allocation Size           32,768 bytes (32.77 KB)

No insights unfortunately.
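To make "very little sense" concrete, here is a minimal sketch of a sanity check that could be run over such reports. It assumes the reported available page file and physical memory roughly approximate the headroom at the time of the failure, which may not hold since they are snapshots. `OomReport`, `LooksInconsistentWithOom` and the default factor of 1000 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Subset of the memory fields shown in the report above, in bytes.
struct OomReport {
  std::uint64_t availablePageFile;
  std::uint64_t availablePhysical;
  std::uint64_t allocationSize;
};

// Flags reports where the failed allocation is tiny compared with both
// the available page file and the available physical memory.
bool LooksInconsistentWithOom(const OomReport& aReport,
                              std::uint64_t aHeadroomFactor = 1000) {
  if (aReport.allocationSize == 0) {
    return false;  // nothing to compare against
  }
  return aReport.availablePageFile / aReport.allocationSize >=
             aHeadroomFactor &&
         aReport.availablePhysical / aReport.allocationSize >=
             aHeadroomFactor;
}
```

With the values from the report above (3,600,437,248 bytes of page file, 1,042,993,152 bytes of physical memory, a 32,768 byte allocation), both ratios are well above 1000, so the report is flagged as inconsistent with a plain out-of-memory condition.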

Flags: needinfo?(mikokm)
Flags: needinfo?(jmuizelaar)
No longer blocks: gfx-triage
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(jmuizelaar)