Closed
Bug 1081790
Opened 10 years ago
Closed 10 years ago
Large spike in fake OOM|small (via PushNewDT) in 20141011030203
Categories
(Core :: Graphics, defect)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
firefox34 | --- | unaffected |
firefox35 | --- | affected |
People
(Reporter: away, Unassigned)
References
Details
(Keywords: crash)
Crash Data
This bug was filed from the Socorro interface and is
report bp-57d7bf8b-3e8d-4622-9e60-d24282141011.
=============================================================
OOM|small rates by build on the nightly channel:
Rank | Build ID | Count | % |
---|---|---|---|
10 | 20141008030202 | 13 | 0.47 % |
8 | 20141008065430 | 24 | 0.86 % |
7 | 20141009030201 | 41 | 1.48 % |
6 | 20141010030201 | 42 | 1.51 % |
1 | 20141011030203 | 1748 | 62.95 % |
2 | 20141012030203 | 729 | 26.25 % |
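The size of the jump can be checked directly from the figures above. This is a minimal sketch in Python, assuming each row gives a rank, a build ID, a crash count, and a percentage share. The helper name `spiking_builds`, the choice of the first four builds as a baseline, and the 5x threshold are illustrative assumptions, not something Socorro computes.

```python
from statistics import median

# (build ID, crash count, share in %) copied from the rows above.
rates = [
    ("20141008030202", 13, 0.47),
    ("20141008065430", 24, 0.86),
    ("20141009030201", 41, 1.48),
    ("20141010030201", 42, 1.51),
    ("20141011030203", 1748, 62.95),
    ("20141012030203", 729, 26.25),
]


def spiking_builds(rows, baseline_count=4, factor=5.0):
    """Return build IDs whose share exceeds `factor` times the median
    share of the first `baseline_count` builds (an arbitrary heuristic)."""
    baseline = median(share for _, _, share in rows[:baseline_count])
    return [build for build, _, share in rows if share > factor * baseline]


print(spiking_builds(rates))
```

With these numbers the baseline median is 1.17 %, so only the 20141011030203 and 20141012030203 builds are flagged.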
Comment 1•10 years ago
Comment 2•10 years ago
5c6980f9caff Nicolas Silva — Bug 1064107 - Ensure that gfxPlatform is initialized by the time we create the compositor. r=Bas
367b155c5b5e Benoit Jacob — Bug 1080137 - WebGL2: misc fixes to make new tex formats and sized internalformats actually work - r=jgilbert
0a69bc9e746c Bas Schouten — Bug 1078693: Correctly indicate validity of a SourceSurfaceD2D1 and deal with failed surface creation. r=jrmuizel
216915390f9b jdashg — Bug 1066280 - Handle dirtying in BasicCanvasLayer. - r=mattwoodrow
I don't think we should let 35 go to Aurora like this. nical/bas, is there an obvious culprit here?
Logs: Failed to create similar cairo surface! Size: Size(35,15) Status: 1
status-firefox34:
--- → unaffected
status-firefox35:
--- → affected
tracking-firefox35:
--- → +
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(bas)
Comment 3•10 years ago
My guess would be that it comes from bug 1066280 which has a lot of changes. Although it looks like that bug affects basic layers specifically and only 5 out of 15 crashes are using basic compositing so it doesn't explain everything.
I am certain that Bug 1064107 can't have caused this.
I would be surprised that bug 1078693 be the cause, because I would expect "Failed to create software bitmap" or "Failed to readback into software bitmap" to appear in the app notes if it was the case.
Flags: needinfo?(nical.bugzilla)
Comment 4•10 years ago
Status: 1 is CAIRO_STATUS_NO_MEMORY, which makes this seem like a real OOM. Has there been a spike in other OOM crash sites too?
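For anyone triaging these app notes in bulk, the status number can be decoded mechanically. The sketch below is Python and assumes the note always has the shape quoted in this bug. `parse_app_note` and `CAIRO_STATUS_NAMES` are hypothetical names, and the table lists only the first few `cairo_status_t` values rather than the whole enum.

```python
import re

# First few values of cairo_status_t from cairo.h; the enum continues past these.
CAIRO_STATUS_NAMES = {
    0: "CAIRO_STATUS_SUCCESS",
    1: "CAIRO_STATUS_NO_MEMORY",
    2: "CAIRO_STATUS_INVALID_RESTORE",
    3: "CAIRO_STATUS_INVALID_POP_GROUP",
}

# Matches the note text seen in this bug; other wordings are not handled.
NOTE_RE = re.compile(
    r"Failed to create similar cairo surface! "
    r"Size: Size\((\d+),(\d+)\) Status: (\d+)"
)


def parse_app_note(note):
    """Hypothetical helper: return (width, height, status name) or None."""
    match = NOTE_RE.search(note)
    if match is None:
        return None
    width, height, status = (int(group) for group in match.groups())
    return width, height, CAIRO_STATUS_NAMES.get(status, f"unknown ({status})")


note = "Logs: Failed to create similar cairo surface! Size: Size(35,15) Status: 1"
print(parse_app_note(note))
```

Assuming 4 bytes per pixel, a 35x15 surface is only about 2 KB, so the size of the request alone would not explain a failed allocation.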
No, there hasn't been such a spike in other OOM signatures, and the OOM|small signature is essentially all PushNewDT.
Here's the 'app notes' facet (it's strange to split by word, but you get the idea):
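Rank | Term | Count | % |
---|---|---|---|
8 | size | 6381 | 98.56 % |
9 | create | 6341 | 97.95 % |
10 | surface | 6335 | 97.85 % |
11 | status | 6335 | 97.85 % |
12 | cairo | 6335 | 97.85 % |
13 | 1 | 6335 | 97.85 % |
14 | failed | 6329 | 97.76 % |
15 | similar | 6323 | 97.67 % |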
Available virtual/physical/page memory numbers aren't low, so I don't think this is a 'regular' OOM. Could that error code also indicate lack of video ram?
Comment 6•10 years ago
Is this crash spike specific to windows users that don't have direct2d?
There are a few other reasons a CAIRO_STATUS_NO_MEMORY could come from cairo-win32-surface.c, but none of them really stand out.
One is a GDI error (which should have been printed to stderr), though I'm not sure how Jeff's patches could have triggered GDI errors when drawing content (not WebGL).
We try to allocate in video memory (GDI DDB), but if that fails we fall back to system memory (DIB), so that shouldn't be the issue here.
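The fallback order described above can be sketched as follows. This is Python pseudologic, not the cairo-win32-surface.c code: `create_surface`, `try_ddb`, and `try_dib` are hypothetical stand-ins for the two allocation paths, each assumed to return a surface object or `None` on failure.

```python
def create_surface(width, height, try_ddb, try_dib):
    """Try the video-memory path first, then fall back to system memory.

    `try_ddb` and `try_dib` are hypothetical callables standing in for the
    DDB and DIB allocation paths; each returns a surface or None.
    """
    surface = try_ddb(width, height)
    if surface is not None:
        return surface, "DDB"
    surface = try_dib(width, height)
    if surface is not None:
        return surface, "DIB"
    # Only when both paths fail would the caller see an out-of-memory status.
    return None, "NO_MEMORY"


def failing(width, height):
    """Simulates an allocation path that always fails."""
    return None


def working(width, height):
    """Simulates a successful allocation, assuming 4 bytes per pixel."""
    return bytearray(width * height * 4)


print(create_surface(35, 15, failing, working)[1])
```

Under this model, running out of video memory alone would not surface as a failure, because the second path still succeeds. That matches the reasoning that video memory exhaustion shouldn't be the issue.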
(In reply to Matt Woodrow (:mattwoodrow) from comment #6)
> Is this crash spike specific to windows users that don't have direct2d?
Yes, these are all Windows and >90% are "D3D11 Layers-".
Comment 8•10 years ago
I guess it's possible that we're exhausting GDI handles or similar, but I don't see any changes that would cause us to leak that sort of thing.
Comment 9•10 years ago
(In reply to Nicolas Silva [:nical] from comment #3)
> My guess would be that it comes from bug 1066280 which has a lot of changes.
> Although it looks like that bug affects basic layers specifically and only 5
> out of 15 crashes are using basic compositing so it doesn't explain
> everything.
>
> I am certain that Bug 1064107 can't have caused this.
>
> I would be surprised that bug 1078693 be the cause, because I would expect
> "Failed to create software bitmap" or "Failed to readback into software
> bitmap" to appear in the app notes if it was the case.
So is the appnote "Logs: Failed to create similar cairo surface! Size: Size(35,15) Status: 1" unrelated to this?
I would really like to avoid backing the ShSurf changes out. It seems like it should be simpler to try backing bug 1078693 out, and that bug does touch moz2d failure cases.
I could see this being caused by something in the ShSurf patches, but I really don't touch content, and content is what's failing.
Flags: needinfo?(nical.bugzilla)
Comment 10•10 years ago
(In reply to Jeff Gilbert [:jgilbert] from comment #9)
> So is the appnote "Logs: Failed to create similar cairo surface! Size:
> Size(35,15) Status: 1" unrelated to this?
This is the place where we fail to allocate a surface, log the error in the app notes and return null, just before crashing in the caller (gfxContext::PushClip or friend). So this appnote is a symptom of failure to allocate memory, but not a cause.
>
> I would really like to avoid backing the ShSurf changes out. It seems like
> it should be simpler to try backing bug 1078693 out, and that bug does touch
> moz2d failure cases.
We can try, Bas what do you think?
>
> I could see this being caused by something in the ShSurf patches, but I
> really don't touch content, and content is what's failing.
Allocating memory is what's failing (Status: 1 means the error was CAIRO_STATUS_NO_MEMORY). Statistically this happens a lot in gfxContext::PushClip because that's where we tend to do a lot of allocations and, more importantly, often large ones. But anything in the browser could be eating up memory before we end up crashing in here.
Flags: needinfo?(nical.bugzilla)
Comment 11•10 years ago
(In reply to Jeff Gilbert [:jgilbert] from comment #9)
> (In reply to Nicolas Silva [:nical] from comment #3)
> > My guess would be that it comes from bug 1066280 which has a lot of changes.
> > Although it looks like that bug affects basic layers specifically and only 5
> > out of 15 crashes are using basic compositing so it doesn't explain
> > everything.
> >
> > I am certain that Bug 1064107 can't have caused this.
> >
> > I would be surprised that bug 1078693 be the cause, because I would expect
> > "Failed to create software bitmap" or "Failed to readback into software
> > bitmap" to appear in the app notes if it was the case.
>
> So is the appnote "Logs: Failed to create similar cairo surface! Size:
> Size(35,15) Status: 1" unrelated to this?
>
> I would really like to avoid backing the ShSurf changes out. It seems like
> it should be simpler to try backing bug 1078693 out, and that bug does touch
> moz2d failure cases.
>
> I could see this being caused by something in the ShSurf patches, but I
> really don't touch content, and content is what's failing.
Error 1 is an actual OOM error in cairo, I believe. So I'd look for the cause of this in something that can actually cause runaway memory usage. Backing out 1078693 could be done as an experiment, but it seems unlikely to me that it would be the cause of the problem. Never say never though.
Flags: needinfo?(bas)
Comment 12•10 years ago
I backed out bug 1066280 from Aurora. It's still on Trunk while I nail down some known regressions.
Comment 13•10 years ago
This might be bug 1084696.
Reporter | ||
Comment 14•10 years ago
> This might be bug 1084696.
That fix hasn't reached a nightly yet, but the OOMs are back to normal volume beginning with build 20141017030201. Perhaps it was fixed by bug 1081363.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•10 years ago
tracking-firefox35:
+ → ---