Open Bug 1541092 Opened 6 years ago Updated 2 years ago

Crash in [@ OOM | small] in nsDisplayBackgroundImage::AppendBackgroundItemsToTop

Categories

(Core :: Web Painting, defect, P5)

66 Branch
All
Windows
defect

Tracking Status
firefox-esr60 --- unaffected
firefox66 --- wontfix
firefox67 - wontfix
firefox68 - affected

People

(Reporter: philipp, Unassigned)

References

(Regression)

Details

(Keywords: crash, regression, Whiteboard: [tbird crash])

Crash Data

This bug is for crash report bp-e4e5db17-4841-49bf-b708-d43ae0190402.

Top 10 frames of crashing thread:

0 xul.dll NS_ABORT_OOM xpcom/base/nsDebugImpl.cpp:603
1 xul.dll nsDisplayBackgroundImage::AppendBackgroundItemsToTop layout/painting/nsDisplayList.cpp:3751
2 xul.dll bool nsFrame::DisplayBackgroundUnconditional layout/generic/nsFrame.cpp:2313
3 xul.dll nsFrame::DisplayBorderBackgroundOutline layout/generic/nsFrame.cpp:2335
4 xul.dll nsLeafBoxFrame::BuildDisplayList layout/xul/nsLeafBoxFrame.cpp:93
5 xul.dll nsImageBoxFrame::BuildDisplayList layout/xul/nsImageBoxFrame.cpp:289
6 xul.dll nsIFrame::BuildDisplayListForStackingContext layout/generic/nsFrame.cpp:3085
7 xul.dll nsIFrame::BuildDisplayListForChild layout/generic/nsFrame.cpp:3772
8 xul.dll static void DisplayLine layout/generic/nsBlockFrame.cpp:6423
9 xul.dll nsBlockFrame::BuildDisplayList layout/generic/nsBlockFrame.cpp:6514

[@ OOM | small] is a catch-all signature for out-of-memory crashes.
I'm filing this bug specifically for reports with nsDisplayBackgroundImage::AppendBackgroundItemsToTop in the stack, as they have become more prevalent in Firefox 66 (~200 daily reports):
https://crash-stats.mozilla.com/signature/?release_channel=release&proto_signature=~nsDisplayBackgroundImage%3A%3AAppendBackgroundItemsToTop&product=Firefox&signature=OOM%20%7C%20small&date=%3E%3D2019-02-01#graphs

200 crashes a day on release is high enough that I'd like to keep this open in case we come up with a fix in time for a dot release.

I can see the equivalent spike in beta, when 66 merged. Nightly is harder to read, since the volume is so much lower, but it looks like it picked up around the 16th of December.

https://crash-stats.mozilla.com/signature/?release_channel=nightly&proto_signature=~nsDisplayBackgroundImage%3A%3AAppendBackgroundItemsToTop&product=Firefox&signature=OOM%20%7C%20small&date=%3E%3D2018-12-01T21%3A38%3A00.000Z&date=%3C2018-12-20T21%3A38%3A00.000Z#graphs

Changes from around that date: https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2018-12-12&enddate=2018-12-17

The most likely culprit in that range is bug 1512213 (https://hg.mozilla.org/mozilla-central/rev/2cfb0caec309), since it switched the Display List arena to use 32768-byte chunks, which is the allocation size we're crashing on.
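To make the chunk-size point concrete, here is a minimal sketch of a bump-pointer arena, assuming a fixed chunk size passed at construction. `SketchArena` and its methods are hypothetical names used only for illustration; this is not Gecko's ArenaAllocator or nsPresArena. It only shows that such an arena asks the system allocator for memory one chunk at a time, so an out-of-memory failure inside it would be for a chunk-sized request (32768 bytes after the change), however small the individual display item is.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical bump-pointer arena for illustration only.
// It is NOT the allocator used by the display list code.
class SketchArena {
 public:
  explicit SketchArena(std::size_t chunkSize) : mChunkSize(chunkSize) {}

  // Hands out `size` bytes from the current chunk, starting a new
  // chunk when the current one cannot hold the request.
  void* Allocate(std::size_t size) {
    // Round up to a multiple of 8 so returned pointers stay aligned.
    size = (size + 7) & ~static_cast<std::size_t>(7);
    // This sketch does not support requests larger than one chunk.
    assert(size <= mChunkSize);

    if (mChunks.empty() || mOffset + size > mChunkSize) {
      // The only place the arena talks to the system allocator.
      // If memory is exhausted, this chunk-sized request is what
      // fails (here `new` would throw std::bad_alloc).
      mChunks.push_back(std::unique_ptr<char[]>(new char[mChunkSize]));
      mOffset = 0;
    }

    void* result = mChunks.back().get() + mOffset;
    mOffset += size;
    return result;
  }

  // Number of chunk-sized requests made to the system allocator.
  std::size_t ChunkCount() const { return mChunks.size(); }

  // Total bytes obtained from the system allocator.
  std::size_t BytesReserved() const { return mChunks.size() * mChunkSize; }

 private:
  std::size_t mChunkSize;
  std::size_t mOffset = 0;
  std::vector<std::unique_ptr<char[]>> mChunks;
};
```

With 1000 requests of 100 bytes each, a 32768-byte chunk size results in 4 chunk requests, whereas an 8192-byte chunk size would result in 13 smaller ones.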

Has the overall crash rate increased, or is it just the rate of this signature?

I can't see any interesting correlations in the crash reports (including URLs), nor can I reproduce any leaking display list memory.

It seems that the larger chunk size just increases the likelihood of this being the call site that fails when we run out of memory, and isn't indicative of a painting-specific bug.

Sylvestre, this bug is not about [@ OOM | small] as a whole.

(In reply to Matt Woodrow (:mattwoodrow) from comment #2)

Has the overall crash rate increased, or is it just the rate of this signature?

Yes, we started seeing an increase of [@ OOM | small] reports in Firefox 66: roughly 1k more reports per day than in 65. Crashes with nsDisplayBackgroundImage::AppendBackgroundItemsToTop would only explain a subset of this spike, though; I didn't find a clear cause for the rest of them.

Matt, or Miko, anything you can do to improve the crash rate for this in 67 or 68?

Flags: needinfo?(mikokm)
Flags: needinfo?(matt.woodrow)

(In reply to Liz Henry (:lizzard) (use needinfo) from comment #5)

Matt, or Miko, anything you can do to improve the crash rate for this in 67 or 68?

This is a bit of an awkward situation. In the crash report in the summary, the reporter has 4.72GB / RAM (44% used) and 117MB swap available. Because the reporter is running 64-bit Windows 10, the maximum memory available for the process should be at least 128GB. Despite this, a 32 KB memory allocation is failing.
In the crash report, there are also GraphicsCriticalErrors about failed allocations, so this seems like a more general memory allocation problem, at least in this case.

We could halve the allocator chunk size to 16KB, or go back to the previous 8KB. But it is likely that this would just change the crash signature, with the crash happening a little later.
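A rough way to see why a smaller chunk size is unlikely to help: the sketch below computes how many chunks, and how many bytes in total, an arena would reserve for a given amount of live data. `ChunkPlan` and `PlanChunks` are hypothetical helpers, and the model assumes data packs perfectly into chunks, which a real arena would not achieve. Under that assumption the total reserved memory is nearly the same for every chunk size; only the size and number of the individual requests change.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type: how an arena would cover `liveBytes`.
struct ChunkPlan {
  std::size_t chunks;         // number of chunk-sized requests
  std::size_t reservedBytes;  // total bytes requested from the system
};

// Computes the chunk count by ceiling division, assuming perfect
// packing of the live data into fixed-size chunks.
ChunkPlan PlanChunks(std::size_t liveBytes, std::size_t chunkSize) {
  assert(chunkSize > 0);
  std::size_t chunks = (liveBytes + chunkSize - 1) / chunkSize;
  return ChunkPlan{chunks, chunks * chunkSize};
}
```

For a hypothetical 1,000,000 bytes of display list data, this gives 31 chunks at 32KB, 62 at 16KB and 123 at 8KB, each reserving roughly 1MB in total.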

Looking at the aggregations, it seems that crashes with this signature happen 99.95% of the time on Windows. Maybe there is a platform-specific bug in the memory allocators (nsPresArena, ArenaAllocator, jemalloc) that we are using?

Flags: needinfo?(mikokm)

Hi Matt -- Since this is tracking-beta, relman is looking for an owner, and you are the best choice. But IIUC, the problem here is not actionable, practically speaking. Please chime in if you think there is a practical action we can take before 67 goes to Release (or during the 68 timeframe). Thanks!

Assignee: nobody → matt.woodrow

I don't think there's anything actionable here. The underlying change that caused it just changed our allocation chunking size, and shouldn't have affected the total memory use.

I don't think we need to track beta for this. What do you think, Liz?

Flags: needinfo?(matt.woodrow) → needinfo?(lhenry)

(In reply to Matt Woodrow (:mattwoodrow) from comment #8)

I don't think there's anything actionable here. The underlying change that caused it just changed our allocation chunking size, and shouldn't have affected the total memory use.

I don't think we need to track beta for this. What do you think, Liz?

Untracking for beta and marking as wontfix for 66 and 67 since the bug is not actionable at the moment.

The priority flag is not set for this bug.
:mattwoodrow, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(matt.woodrow)
Priority: -- → P5

Changing the priority to P2 as the bug is tracked by a release manager for the current nightly.
See What Do You Triage for more information.

Priority: P5 → P2

Our bots are a little bit opinionated! This bug should stay quiet now.

Priority: P2 → P5
QA Whiteboard: qa-not-actionable

There are also Thunderbird crashes, e.g. bp-602044bb-ed93-4b8a-a98e-e6c8a0211003, although the numbers are very low.

Whiteboard: [tbird crash]
Has Regression Range: --- → yes

The bug assignee hasn't logged in to Bugzilla in the last 7 months and this bug has severity 'critical'.
:miko, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: matt.woodrow → nobody
Flags: needinfo?(mikokm)
Severity: critical → S2

Hmm, this bug is tagged with the [@ OOM | small ] signature, so it tracks all those crashes, but there are only a handful of crashes in AppendBackgroundItemsToTop. I'm lowering the severity based on that. If we want a bug to track all small OOMs then this isn't it.

Severity: S2 → S3