Closed
Bug 837835
Opened 12 years ago
Closed 11 years ago
increase in crashes with EMPTY dumps in Firefox 19 and 20 cycles
Categories
(Core :: General, defect)
Tracking
()
People
(Reporter: kairo, Assigned: benjamin)
References
(Depends on 1 open bug)
Details
(Keywords: crash, Whiteboard: [native-crash])
Crash Data
Attachments
(13 files, 1 obsolete file)
This bug was filed from the Socorro interface and is
report bp-6f488a29-cfdd-4dca-8430-68b2b2130203 .
=============================================================
I'm filing this in Core::General for now, as we as of yet have no clue what could be behind those crashes. This also may be related to bug 830808.
Over the last versions of Firefox, crashes with the "EMPTY: no crashing thread identified; corrupt dump" signature have increased. They usually were at 4-5% of all crashes, now they're at 10-20% depending on channel, on ESR even at 45%.
We need to investigate whether we can find when the volume increased on the different channels and whether we can find a potential cause. Also, analyzing the metadata we have could give us some insight (annotations are there; things like OS or modules are not, as those live in the dump).
Comment 1•12 years ago
|
||
STR added in Bug 830808
Reporter | ||
Comment 2•12 years ago
|
||
This graph shows the numbers of EMPTY crashes from all builds on the nightly channel by crash day (I don't have access to the by-build-day stuff right now) since Jan 1, 2012.
Unfortunately, it doesn't really paint a bull's eye on anything as there's an up and down here. There was definitely a regression in late May and one in September/October, and both were fixed again as well, but it is hard to make out why current numbers are between 100 and 200 per day when they were between 50 and 100 in the first few months of 2012.
I'll attach a text file with the queries and raw data.
Reporter | ||
Comment 3•12 years ago
|
||
Comment 4•12 years ago
|
||
(In reply to MarioMi (:MarioMi) from comment #1)
> STR added in Bug 830808
Can you try your STR in a debugger and file a new bug with the crash signature and mark it as dependent of this one?
Updated•12 years ago
|
Flags: needinfo?(mariomihai22)
Comment 5•12 years ago
|
||
Sorry for the delay, Scoobi. I will try tomorrow morning and get back with results.
Flags: needinfo?(mariomihai22)
Comment 6•12 years ago
|
||
(In reply to Scoobidiver from comment #4)
> (In reply to MarioMi (:MarioMi) from comment #1)
> > STR added in Bug 830808
> Can you try your STR in a debugger and file a new bug with the crash
> signature and mark it as dependent of this one?
I tried my STR from Bug 830808 Comment 9 in a debugger but nothing weird happened. I only got one error in the Error Console: "Permission denied to access property 'toString'". I did these investigations on Nightly (2013-02-04).
Comment 7•12 years ago
|
||
It also spiked in absolute values for Fennec:
* 17.0 (latest week): 2.8% 0.04 crashes/100 ADU
* 18.0.2 (latest week): 6.8% 0.11 crashes/100 ADU
* 19.0 (current week): 4.8% 0.14 crashes/100 ADU
Whiteboard: [native-crash]
Comment 8•12 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #2)
> I don't have access to the by-build-day stuff right now
Do you have access now? It's interesting information to find regressions.
Updated•12 years ago
|
Flags: needinfo?(kairo)
Comment 10•12 years ago
|
||
Updated•12 years ago
|
Attachment #722130 -
Attachment is obsolete: true
Comment 11•12 years ago
|
||
Comment 12•12 years ago
|
||
The Android Nightly channel chart is misleading because it doesn't take into account fixes in Aurora and Beta channels.
The increase between 19 and 20 is low:
* 19.0 Beta (last week): 5.9% 0.12 crashes/100 ADU
* 20.0 Beta (current week): 5.5% 0.15 crashes/100 ADU
Comment 13•12 years ago
|
||
My Firefox (22.0a1 (2013-03-11)) has crashed every morning for the last 3 days with empty data:
https://crash-stats.mozilla.com/report/index/bp-7d532715-cd12-4587-9326-3ebd92130312
Comment 14•12 years ago
|
||
(In reply to Henrik Gemal from comment #13)
> My Firefox (22.0a1 (2013-03-11)) crash every mornings the last 3 days with
> empty data:
> https://crash-stats.mozilla.com/report/index/bp-7d532715-cd12-4587-9326-
> 3ebd92130312
If it's a recent issue in Nightly, it's not this bug which is about a slight increase between two Release versions.
Please file a new bug after getting a valid stack trace (see https://developer.mozilla.org/docs/How_to_get_a_stacktrace_with_WinDbg). Try Safe Mode first to check for a faulty extension (see https://support.mozilla.org/kb/troubleshoot-firefox-issues-using-safe-mode).
Comment 15•12 years ago
|
||
The only potential smoking gun I see in Henrik's crash report is: "IsGarbageCollecting": "1"
Henrik: if you can catch your crash in a debugger and get a stack (using the link Scoobidiver gave above) that would be really helpful.
Reporter | ||
Comment 16•12 years ago
|
||
I now have access to the by-build-date numbers from Socorro. Unfortunately, those aren't available as far back as I'd like, as that feature only came around in August. In the available range, we don't really see the regression nicely. :(
Reporter | ||
Comment 17•12 years ago
|
||
Reporter | ||
Comment 18•12 years ago
|
||
So, Nightly and Aurora don't show the regression as nicely apparently. Beta and Release do, so I'm attaching data/graphs for "all of the Beta channel per crash date" and "all of the Release channel per crash date" as well.
Reporter | ||
Comment 19•12 years ago
|
||
Reporter | ||
Comment 20•12 years ago
|
||
Reporter | ||
Comment 21•12 years ago
|
||
Reporter | ||
Comment 22•12 years ago
|
||
And actually, both Beta and Release data point to a definite increase of EMPTY crashes in the 19 cycle (went to Beta in the second week of January 2013, released on February 19).
Reporter | ||
Comment 23•12 years ago
|
||
Oh, and 20 on Beta seems to be even worse. Nightly 20 started on Nov 19, and it looks like an external issue made us spike across most channels from the start of September to the end of October; after that, a few Nightly values are back down to the level predating the spike. So I think we need to investigate the Nightly time frame between Nov 1 and Nov 20 for the 19 regression; from the attachment 711629 [details] graph we can actually narrow that down to between Nov 5 and the Aurora uplift of 19.
For the additional 20 regression, the same Nightly graph makes me suspect somewhere between Nov 25 and Dec 10.
Reporter | ||
Comment 24•12 years ago
|
||
Oh, and from those ranges, we can try to narrow down further by using the by-build-date attachment 725214 [details] but I'm too tired to do that today.
Comment 25•12 years ago
|
||
It's odd that these crashes, which are likely OOM (e.g. bug 834667 was the cause of a spike in 17.0.2esr), increase while MemShrink reduces memory usage.
Reporter | ||
Comment 26•12 years ago
|
||
Scoobidiver, what has MemShrink to do with the bug here?
Comment 27•12 years ago
|
||
Is this memory spike due to ongoing effort of Paris Bindings?
Comment 28•12 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #26)
> Scoobidiver, what has MemShrink to do with the bug here?
OOM = Out-of-memory.
Reporter | ||
Comment 29•12 years ago
|
||
Scoobidiver, I know what OOM is, and that's not what I asked about.
It's not even clear that this increase in bugs with empty dumps is an increase of OOM crashes, as there are clearly cases where we are not OOM where we hit empty dumps (even if we don't really know how that happens). And there is absolutely no correlation with MemShrink from what I can see, unless some very significant MemShrink work landed in the ranges I found in comment #23 and you can somehow correlate that work with a higher likelihood of OOM (though I'd suspect the reverse), or if you can otherwise paint a clear picture of how that work would relate to those empty dump crash increases.
henryfhchan:
Which memory spike? This is about crashes with empty dumps, not about memory per se. It's unclear if there is any relation to memory at all.
Comment 30•12 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #29)
> It's not even clear that this increase in bugs with empty dumps is an
> increase of OOM crashes, as there are clearly cases where we are not OOM
> where we hit empty dumps (even if we don't really know how that happens).
It's only a theory and I thought the MemShrink team should be aware of this issue in case it rings a bell for them. They can also find new variables to monitor in Telemetry based on that.
You can know the ratio of real OOMs with the OOMAllocationSize field in crash headers.
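As a rough illustration of that ratio, here is a minimal Python sketch. It assumes each crash report's annotations are available as a dictionary and that a real OOM is marked by the presence of the OOMAllocationSize field; the sample reports and the function name are hypothetical, not Socorro data or a Socorro API.

```python
def oom_ratio(reports):
    """Return the fraction of reports that carry an OOMAllocationSize annotation.

    `reports` is a list of dicts of crash annotations (hypothetical shape).
    """
    if not reports:
        return 0.0
    with_oom = sum(1 for r in reports if "OOMAllocationSize" in r)
    return with_oom / len(reports)


# Hypothetical sample: four EMPTY-signature reports, one annotated as OOM.
sample_reports = [
    {"ProductName": "Firefox", "OOMAllocationSize": "1048576"},
    {"ProductName": "Firefox"},
    {"ProductName": "Firefox"},
    {"ProductName": "Firefox"},
]

print(f"{oom_ratio(sample_reports):.0%} of sampled reports are annotated as OOM")
```

A low ratio would suggest that most empty dumps are not plain allocation failures, which is the question being argued in the comments above.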
Comment 31•12 years ago
|
||
As Kairo said in channel meeting it's probably too late here for FF20 cycle given how late we are into it but perhaps we want to keep this on our radar for FF21.
tracking-firefox21:
--- → ?
Updated•12 years ago
|
Reporter | ||
Comment 32•12 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #23)
> the 19 regression, from the attachment 711629 [details] graph we can
> actually narrow down to Nov 5 and Aurora uplift of 19.
Given attachment 725214 [details] I would look at the start of that period, even at what landed for the Nightly build of Nov 4, as that looks high already there.
> For the additional 20 regression, the same Nightly graph makes me suspect
> somewhere between Nov 25 and Dec 10.
While the Nov 27 build has something, it looks more likely to be between Dec 2 and 9 in builds; I'd actually look at what landed for the Dec 2 one first.
Reporter | ||
Comment 33•12 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #32)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #23)
> > the 19 regression, from the attachment 711629 [details] graph we can
> > actually narrow down to Nov 5 and Aurora uplift of 19.
>
> Given attachment 725214 [details] I would look at the start of that period,
> even at what landed for the Nightly build of Nov 4, as that looks high
> already there.
Not that http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2012-11-03+00%3A00%3A00&enddate=2012-11-06+04%3A00%3A00 would point to anything particularly bad/big, though.
> > For the additional 20 regression, the same Nightly graph makes me suspect
> > somewhere between Nov 25 and Dec 10.
>
> While the Nov 27 build have something, it looks more likely to be between
> Dec 2 and 9 in builds, I'd actually look at what landed for the Dec 2 one
> first.
http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2012-12-01+00%3A00%3A00&enddate=2012-12-02+04%3A00%3A00 already has quite a list of checkins, an NSS update, CC cleanup, and the following days all kinds of additional fun, like JIT fixes, and a ton of other things. Hard to point to anything specific there.
Comment 34•12 years ago
|
||
Assigning to bsmedberg as engineering POC given his thoughts on a path forward:
"Short-term, there are some things we could try:
* reduce the minidump data request even further: don't ask for the stack memory, just ask for a list of loaded modules and memory mappings.
* annotate just the crash reason and address separately from the minidump"
Firefox 21 would be the first version where we'd make the crash changes, since we're only a week from FF20's release.
Reporter | ||
Comment 35•12 years ago
|
||
Ted commented on bug 724046, which might help us get better data from those crashes.
Comment 36•12 years ago
|
||
Hi. I don't know if what I have to say is new, or has some value.
I have a lot of these crashes. I would say more than 50% of my crashes have empty dumps. BUT my Firefox rarely crashes like 'BOOM' (when everything closes and I get that window to send the crash report). This 'BOOM' type of crash never has an empty dump (I think).
I don't know when I get these empty-dump crashes. Sometimes I just type 'about:crashes' and I see that there's a crash that I didn't submit (and I don't remember having a 'BOOM' crash). Then I try to submit it, and it's an empty one.
Comment 37•12 years ago
|
||
(In reply to Guilherme Lima from comment #36)
> I have a lot of this crashes. I would say more than 50% of my crashes have
> empty dumps. BUT my Firefox rarely crashes like 'BOOM' (when everything
> closes and I get that window to send the crash report). This 'BOOM' type of
> crash never has empty dump (I think).
They are probably Flash crashes or hangs.
https://crash-analysis.mozilla.com/bsmedberg/flash-summary.html shows an increase of crashes and hangs by about 15% between November 2012 (Flash 11.4.402.287, Firefox 16) and January 2013 (Flash 11.5.502.135, Firefox 18). That might explain a part of the empty dump increase.
Comment 38•12 years ago
|
||
The hangs that don't automatically submit are usually hangs with the plugins. They are denoted with bp-hr- and always return a 404 when I click them. Therefore I doubt that these are the cause of the increase in the empty dump number.
FYI, the crashes I have that are empty do not seem to be caused by Flash (e.g. clicking on non-Flash websites).
Comment 39•12 years ago
|
||
get a stacktrace with windbg (comment 14), and you may have something useful.
Reporter | ||
Comment 40•12 years ago
|
||
(In reply to Scoobidiver from comment #37)
> They are probably Flash crashes or hangs.
> [...]
> That might explain a part of the empty dump increase.
No, I heavily disagree. My analysis in here cleanly demonstrates that there were two regressions in our code, in the 19 and 20 cycles. And those "empty dump" crashes are actually not plugin crashes. Flash crashes and hangs regressed, but that's something completely different, and even in different time periods.
Comment 41•12 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #33)
> Not that
> http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2012-11-
> 03+00%3A00%3A00&enddate=2012-11-06+04%3A00%3A00 would point to anything
> particularly bad/big, though.
I suspect bug 778993 in that range.
Comment 42•12 years ago
|
||
(In reply to Scoobidiver from comment #41)
> I suspect bug 778993 in that range.
As far as I know, that code isn't present on any branch. It was backed out everywhere.
Comment 43•12 years ago
|
||
Benjamin asked a few questions in email:
"Have you looked through the URLs and comments to see if there is anything interesting? Especially limited to nightly and aurora, where the comments are often much more helpful. Do these reports typically have a list of extensions or not? If so have we run extension correlation reports?"
Flags: needinfo?(kairo)
Reporter | ||
Comment 44•12 years ago
|
||
Top URLs:
3963 about:blank
3330 https://www.facebook.com/
3127 http://www.facebook.com/
899 about:sessionrestore
885 http://www.tumblr.com/dashboard
728 about:home
549 http://assets.tumblr.com/analytics.html?748b075014045cae7cd6ac4429aded74
348 http://www.facebook.com/?ref=tn_tnmn
273 http://vk.com/feed
253 https://mail.google.com/mail/u/0/?shva=1#inbox
242 about:newtab
215 https://www.facebook.com/?ref=tn_tnmn
193 http://vk.com/audio
182 https://twitter.com/
142 http://www.facebook.com/?ref=logo
138 https://www.facebook.com/dialog/oauth?client_id=130402594779&response_type=token%2Csigned_request%2Ccode&display=none&domain=www.kingdomsofcamelot.com&origin=5&redirect_uri=https%3A%2F%2Fs-static.ak.facebook.com%2Fconnect%2Fxd_arbiter.php%3Fversion%3D19%2
137 https://mail.google.com/mail/?shva=1#inbox
119 https://www.facebook.com/?ref=logo
105 http://www.facebook.com/home.php
104 https://www.facebook.com/login.php?login_attempt=1
104 https://www.google.com/
Comments are not painting anything near a clear picture: most users are confused about why it crashes, and many complain that they are seeing a lot of crashes.
That's for the general population of all versions.
For Nightly, a lot of comments talk about OOM, some about loading PDFs or image-heavy pages. See comments tab on https://crash-stats.mozilla.com/report/list?range_value=7&range_unit=days&date=2013-03-28&signature=EMPTY%3A%20no%20crashing%20thread%20identified%3B%20corrupt%20dump&version=Firefox%3A22.0a1
Top URLs for Nightly:
80 about:blank
34 http://www.tumblr.com/dashboard
31 http://www.facebook.com/
26 https://www.facebook.com/
24 http://www.songbanc.com/photos
11 http://planet.mozilla.org/
9 about:newtab
8 http://www.icefilms.info/
8 http://www.dpreview.com/
8 http://movies.netflix.com/WiHome
We don't have many reports with modules, apparently, as correlation reports are only available for beta and release, and it lists very few reports that data is being taken from:
2013-03-25_Firefox_20.0-interesting-modules:
EMPTY: no crashing thread identified|EXCEPTION_ACCESS_VIOLATION_READ (22 crashes)
91% (20/22) vs. 6% (2271/40644) credssp.dll
95% (21/22) vs. 14% (5550/40644) schannel.dll
82% (18/22) vs. 5% (2026/40644) FlashPlayerPlugin_11_6_602_180.exe
95% (21/22) vs. 32% (13001/40644) Wldap32.dll
100% (22/22) vs. 44% (17827/40644) mpr.dll
100% (22/22) vs. 44% (17937/40644) sspicli.dll
100% (22/22) vs. 44% (18054/40644) ntmarta.dll
100% (22/22) vs. 49% (20002/40644) comdlg32.dll
100% (22/22) vs. 54% (22043/40644) profapi.dll
100% (22/22) vs. 57% (23046/40644) sechost.dll
100% (22/22) vs. 57% (23047/40644) CRYPTBASE.dll
100% (22/22) vs. 57% (23047/40644) KERNELBASE.dll
45% (10/22) vs. 9% (3695/40644) BrowserProtect.dll
100% (22/22) vs. 64% (25884/40644) secur32.dll
91% (20/22) vs. 56% (22759/40644) apphelp.dll
95% (21/22) vs. 61% (24949/40644) dwmapi.dll
100% (22/22) vs. 67% (27186/40644) msctf.dll
100% (22/22) vs. 71% (28811/40644) winspool.drv
95% (21/22) vs. 69% (28054/40644) lpk.dll
100% (22/22) vs. 78% (31799/40644) iertutil.dll
100% (22/22) vs. 82% (33226/40644) urlmon.dll
36% (8/22) vs. 19% (7712/40644) snxhk.dll
82% (18/22) vs. 68% (27482/40644) normaliz.dll
18% (4/22) vs. 6% (2489/40644) api-ms-win-downlevel-ole32-l1-1-0.dll
18% (4/22) vs. 7% (2651/40644) api-ms-win-downlevel-normaliz-l1-1-0.dll
18% (4/22) vs. 7% (2651/40644) api-ms-win-downlevel-version-l1-1-0.dll
18% (4/22) vs. 7% (2651/40644) api-ms-win-downlevel-user32-l1-1-0.dll
18% (4/22) vs. 7% (2654/40644) api-ms-win-downlevel-shlwapi-l1-1-0.dll
18% (4/22) vs. 7% (2655/40644) api-ms-win-downlevel-advapi32-l1-1-0.dll
9% (2/22) vs. 0% (29/40644) FlashPlayerPlugin_11_5_502_135.exe
100% (22/22) vs. 92% (37235/40644) wininet.dll
109% (24/22) vs. 101% (40993/40644) comctl32.dll
9% (2/22) vs. 2% (872/40644) RocketDock.dll
2013-03-25_Firefox_19.0.2-interesting-modules.txt.gz
EMPTY: no crashing thread identified|EXCEPTION_ACCESS_VIOLATION_READ (66 crashes)
98% (65/66) vs. 14% (21149/147984) schannel.dll
89% (59/66) vs. 5% (8107/147984) credssp.dll
80% (53/66) vs. 4% (6515/147984) FlashPlayerPlugin_11_6_602_180.exe
95% (63/66) vs. 36% (52664/147984) mpr.dll
91% (60/66) vs. 33% (48215/147984) Wldap32.dll
95% (63/66) vs. 41% (61086/147984) ntmarta.dll
100% (66/66) vs. 57% (83692/147984) comdlg32.dll
86% (57/66) vs. 47% (69849/147984) sspicli.dll
100% (66/66) vs. 62% (91926/147984) secur32.dll
86% (57/66) vs. 56% (83287/147984) profapi.dll
91% (60/66) vs. 61% (90964/147984) apphelp.dll
86% (57/66) vs. 59% (86759/147984) CRYPTBASE.dll
86% (57/66) vs. 59% (86759/147984) sechost.dll
86% (57/66) vs. 59% (86759/147984) KERNELBASE.dll
100% (66/66) vs. 74% (109111/147984) winspool.drv
98% (65/66) vs. 77% (113648/147984) msctf.dll
91% (60/66) vs. 71% (105686/147984) lpk.dll
118% (78/66) vs. 101% (149611/147984) comctl32.dll
100% (66/66) vs. 85% (126405/147984) urlmon.dll
21% (14/66) vs. 7% (10166/147984) api-ms-win-downlevel-ole32-l1-1-0.dll
21% (14/66) vs. 7% (10543/147984) api-ms-win-downlevel-normaliz-l1-1-0.dll
21% (14/66) vs. 7% (10543/147984) api-ms-win-downlevel-version-l1-1-0.dll
21% (14/66) vs. 7% (10543/147984) api-ms-win-downlevel-user32-l1-1-0.dll
21% (14/66) vs. 7% (10549/147984) api-ms-win-downlevel-shlwapi-l1-1-0.dll
21% (14/66) vs. 7% (10550/147984) api-ms-win-downlevel-advapi32-l1-1-0.dll
98% (65/66) vs. 85% (125868/147984) iertutil.dll
20% (13/66) vs. 7% (10911/147984) BrowserProtect.dll
80% (53/66) vs. 68% (100908/147984) dwmapi.dll
83% (55/66) vs. 75% (110959/147984) normaliz.dll
12% (8/66) vs. 4% (6292/147984) sahook.dll
100% (66/66) vs. 93% (138136/147984) wininet.dll
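To make the correlation lines above easier to read, here is a small Python sketch that parses one line into its counts and ranks modules by how much more often they appear in the EMPTY signature than in all crashes. The line format is taken from the report text above; the function names and the ranking by percentage-point difference are my own assumptions (the report itself appears to be ordered roughly that way), not the actual correlation script.

```python
import re

# Matches lines such as: "91% (20/22) vs. 6% (2271/40644) credssp.dll"
LINE_RE = re.compile(
    r"\d+% \((\d+)/(\d+)\) vs\.\s+\d+% \((\d+)/(\d+)\) (\S+)"
)


def parse_line(line):
    """Return (module, in_sig, sig_total, in_all, all_total) or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    in_sig, sig_total, in_all, all_total, module = m.groups()
    return module, int(in_sig), int(sig_total), int(in_all), int(all_total)


def rank_by_difference(lines):
    """Sort modules by (share in signature) - (share in all crashes), descending."""
    rows = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        module, in_sig, sig_total, in_all, all_total = parsed
        diff = in_sig / sig_total - in_all / all_total
        rows.append((module, diff))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Three lines copied from the Firefox 20.0 block above.
lines = [
    "100% (22/22) vs. 92% (37235/40644) wininet.dll",
    "91% (20/22) vs. 6% (2271/40644) credssp.dll",
    "95% (21/22) vs. 14% (5550/40644) schannel.dll",
]

for module, diff in rank_by_difference(lines):
    print(f"{module}: {diff:+.1%}")
```

With only 22 and 66 reports in the signature groups, any such ranking is weak evidence at best, which matches the caveat about how few reports carry module data.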
I also did gather different installations affected:
breakpad=> SELECT version,
                  COUNT(*) AS crashes,
                  COUNT(DISTINCT client_crash_date - install_age * interval '1 second') AS installations
           FROM reports
           WHERE product='Firefox'
             AND signature='EMPTY: no crashing thread identified; corrupt dump'
             AND utc_day_is(date_processed, '2013-03-27')
           GROUP BY version;
version | crashes | installations
------------+---------+---------------
[...]
18.0 | 323 | 206
18.0.1 | 208 | 184
18.0.2 | 248 | 230
18.0a1 | 3 | 3
18.0a2 | 4 | 4
19.0 | 733 | 566
19.0.1 | 17 | 16
19.0.2 | 20021 | 18061
19.0a1 | 4 | 2
19.0a2 | 5 | 5
20.0 | 7198 | 5271
20.0a1 | 2 | 2
20.0a2 | 23 | 13
21.0a1 | 9 | 7
21.0a2 | 504 | 388
22.0a1 | 1256 | 826
[...]
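The query above approximates "installations" by counting distinct install times, where the install time is the client crash date minus the install age in seconds. A minimal Python sketch of the same idea follows; the tuple layout and the sample rows are hypothetical and only mirror the three columns the query uses.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def crashes_and_installations(reports):
    """Map version -> (crash count, distinct install-time count).

    Each report is (version, client_crash_date, install_age_seconds).
    Two crashes are treated as the same installation when
    client_crash_date - install_age gives the same timestamp.
    """
    crashes = defaultdict(int)
    install_times = defaultdict(set)
    for version, crash_date, install_age in reports:
        crashes[version] += 1
        install_times[version].add(crash_date - timedelta(seconds=install_age))
    return {v: (crashes[v], len(install_times[v])) for v in crashes}


# Hypothetical rows, not real Socorro data.
sample = [
    ("19.0.2", datetime(2013, 3, 27, 10, 0, 0), 3600),  # installed 09:00
    ("19.0.2", datetime(2013, 3, 27, 11, 0, 0), 7200),  # installed 09:00 (same install)
    ("19.0.2", datetime(2013, 3, 27, 12, 0, 0), 600),   # installed 11:50
    ("20.0", datetime(2013, 3, 27, 9, 30, 0), 1800),    # installed 09:00
]

for version, (n_crashes, n_installs) in sorted(crashes_and_installations(sample).items()):
    print(version, n_crashes, n_installs)
```

This is only a proxy: two different machines installed in the same second would be counted once, and clock changes on one machine could split it into several.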
Flags: needinfo?(kairo)
Comment 45•12 years ago
|
||
This isn't surprising, modules come from the minidump, and these reports have empty minidumps.
Comment 46•12 years ago
|
||
What do I do now if I have a 40 MB log from WinDbg?
Assignee | ||
Comment 47•12 years ago
|
||
henryfhchan, please put it up on dropbox or google drive, so I can read it?
Assignee | ||
Comment 48•12 years ago
|
||
Attachment #736953 -
Flags: review?(ted)
Comment 49•12 years ago
|
||
Comment on attachment 736953 [details] [diff] [review]
Reserve VM space for breakpad, rev. 1
Review of attachment 736953 [details] [diff] [review]:
-----------------------------------------------------------------
Have you tested that this is sufficient to fix the issue when we run out of VM space? I assume it's not terribly hard to write something to exhaust VM.
::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +734,5 @@
> +
> +/**
> + * Reserve some VM space. In the event that we crash because VM space is
> + * being leaked without leaking memory, freeing this space before taking
> + * the minidump will allow us to collect a minidump.
So we don't expect this to help in real OOM, right? Just out-of-VM-space?
Attachment #736953 -
Flags: review?(ted) → review+
Assignee | ||
Comment 50•12 years ago
|
||
Correct, I don't want to commit actual memory because 12MB seems like a lot, and if we're running out of actual memory we can know it from the crash metadata. This is only going to help the cases where we're running out of VM space.
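The reserve-then-release idea can be sketched as follows. This is only an analogy in Python, not the patch to nsExceptionHandler.cpp: it assumes an anonymous mmap stands in for the reserved region (on Linux the pages are not backed until touched), and the class and function names are hypothetical.

```python
import mmap

RESERVE_BYTES = 12 * 1024 * 1024  # 12MB, the size mentioned in this comment


class AddressSpaceReserve:
    """Hold a block of address space that can be given back in an emergency."""

    def __init__(self, size=RESERVE_BYTES):
        self.size = size
        # Anonymous mapping: takes address space; assumed not to be
        # committed until the pages are actually touched (platform dependent).
        self._region = mmap.mmap(-1, size)

    @property
    def held(self):
        return self._region is not None

    def release(self):
        """Free the reserved region; safe to call more than once."""
        if self._region is not None:
            self._region.close()
            self._region = None


def write_dump_after_release(reserve, write_dump):
    """Free the reserve first so the dump writer has room to work, then dump."""
    reserve.release()
    return write_dump()


reserve = AddressSpaceReserve()
# ... later, inside a (hypothetical) crash handler:
status = write_dump_after_release(reserve, lambda: "dump written")
print(status, "| reserve still held:", reserve.held)
```

The design choice discussed in the review is that the region is reserved but not committed, so it costs address space rather than memory, and it only helps when address space, not physical memory, is what ran out.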
Assignee | ||
Comment 51•12 years ago
|
||
Whiteboard: [native-crash] → [native-crash][leave open]
Reporter | ||
Comment 52•12 years ago
|
||
I guess that means that this patch is different from what bug 724046 would be targeting for?
Assignee | ||
Comment 53•12 years ago
|
||
Kinda, yes. Although I'd say that bug may be WONTFIX if this one shows us that many/most of the existing EMPTY DUMP crashes are in fact the VM-exhaustion thing I'm seeing.
Comment 54•12 years ago
|
||
Comment 55•12 years ago
|
||
I don't see any improvement compared to previous Nightly builds at the same time of the day, about 25 crashes.
Assignee | ||
Comment 56•12 years ago
|
||
Seth, given that bug 859377 part 4 so dramatically improved the empty dump situation on Nightly, does any of what you did there apply to the situation on Aurora?
Assignee | ||
Updated•12 years ago
|
Flags: needinfo?(seth)
Comment 57•12 years ago
|
||
By the way, has anyone considered the possibility that these empty crash reports are caused by stack exhaustion crashes (like you get from infinite recursion)?
The crashes André Reinard's been seeing at bug 865702, which triggered empty crash dumps, are stack exhaustion crashes.
Comment 58•12 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #56)
> Seth, given that bug 859377 part 4 so dramatically improved the empty dump
> situation on Nightly, does any of what you did there apply to the situation
> on Aurora?
That's a good question, and I don't know the answer. There may be more than one cause at work here.
Just to document it clearly, the change I made in bug 859377 part 4 was to stop using hardware surfaces on Windows. Allocating hardware surfaces seems to be extremely wasteful for small images on Windows and my code was allocating a lot of them.
If the situation on Aurora is caused by the same thing, it couldn't be originating from the ClippedImage code being worked on in bug 859377 (since that isn't in Aurora), but it could be caused by other things that use hardware surfaces on Windows. These include, to my knowledge, raster images of other sorts and the layers subsystem.
Flags: needinfo?(seth)
Comment 59•12 years ago
|
||
I don't know if this will help you guys out but I am able to replicate this crash 100% of the time on Win7 x64 SP1 using the following method:
1. Visit www.speed-battle.com
2. Download and open Autoclicker 1.0.0.2 and set "Miliseconds" to at least 3000, "Number of Clicks to Automate" to 0 (which is infinity), and click the "L" button on the far right, then click "Pick Location" and click the location on speed-battle to be "Repeat Test", then click "Ok" to continue.
3. Click the "Start Clicking" button and navigate back to the already loaded speed-battle page and Autoclicker should begin clicking on the "Repeat Test" link every 3 seconds.
It crashes for me after around 200-400 clicks.
Comment 60•12 years ago
|
||
(In reply to Arthur K. from comment #59)
> I don't know if this will help you guys out but I am able to replicate this
> crash 100% of the time on Win7 x64 SP1 using the following method:
Crashes with empty dump are a collection of hundreds of unrelated bugs. This bug is only to figure out why it has increased. So please file a new bug with a valid stack trace (see https://developer.mozilla.org/docs/How_to_get_a_stacktrace_for_a_bug_report)
Comment 61•11 years ago
|
||
There has been a big improvement since 21.0: 20.5% in 20.0.1, 13.6% in 21.0, 12.2% in 22.0b1, 9.8% in 23.0a2, and 7.6% in 24.0a1.
Comment 62•11 years ago
|
||
Finally after a few days, it's not as good as in comment 61, 17.4% in 21.0 (62% > 1H), 14.2% in 22.0b1 (65% > 1H), 12.5% in 23.0a2 (71% > 1 H), and 7% in 24.0a1 (70% > 1H), but still promising.
Comment 63•11 years ago
|
||
There's no silver bullet for this bug, or any clear regression between releases. This will have to be investigated in an ongoing fashion.
Comment 64•11 years ago
|
||
Someone should follow up what I said in comment #57.
Whether or not all the empty dumps happen with infinite-recursion crashes, all the infinite-recursion crashes I've had since I made that comment have had empty dumps.
So this bug appears to be reproducible.
Assignee | ||
Comment 65•11 years ago
|
||
Since these crashes are not primarily on mac, I don't think we need to immediately follow up on the mac issue.
Comment 66•11 years ago
|
||
Here's a bunch of crash reports from a user that had both a bunch of empty crashes as well as some in nsSupportsStringImpl::SetData.
https://support.mozilla.org/en-US/questions/953285
(Originally posted by :John99 in bug 767343)
Comment 67•11 years ago
|
||
> Since these crashes are not primarily on mac,
I wasn't speaking specifically of the Mac, though so far that's the only platform I've tested on.
I expect an infinite-recursion crash will lead to an empty dump on all platforms.
Comment 68•11 years ago
|
||
We handle stack overflow crashes fine on Windows, AFAIK. I've tested this in the past, and seen a number of them in crash-stats.
Comment 69•11 years ago
|
||
> I've tested this in the past
How far in the past? :-)
Comment 70•11 years ago
|
||
Comment 71•11 years ago
|
||
Fair enough.
Is this with the current version of your crashme extension? If so I'll try it on the Mac and let you know my results. (It's possible that, even on the Mac, only *some* infinite recursion crashes trigger an empty dump.)
Comment 72•11 years ago
|
||
Yes, although there's some other bug with crashme on Mac that makes the UI non-functional. :-( You can manually crash after installing it by opening the browser console and executing:
Cu.import("resource://crashme/modules/Crasher.jsm");
Crasher.crash(Crasher.CRASH_STACK_OVERFLOW);
Comment 73•11 years ago
|
||
Thanks!
Using the same version (0.4) of your crashme extension on OS X 10.7.5 in today's mozilla-central nightly and your STR from comment #72, I also don't get an empty dump -- though the main thread isn't displayed nearly as nicely as in your Windows example:
bp-709018a7-a4c0-4490-9d5e-181742130604
I'm still convinced that infinite recursion crashes are likely to be the key to figuring out how to reproduce this bug. But I don't know when I'm going to have the time to do the work to confirm or deny this.
Comment 74•11 years ago
|
||
Bug 865702, which was fixed in the 2013-05-24 mozilla-central nightly, had crashes that (for me) always resulted in empty Socorro dumps. These were infinite recursion crashes, and were reproducible (in 2013-05-23 and earlier m-c nightlies) using the following STR (from bug 865702 comment #41):
1) Plug in an external monitor to your MacBook Pro and arrange it on top of your laptop's display.
2) Visit a page in bugzilla (this bug will do).
3) Move that page to the external monitor, if it doesn't open there.
4) Make the page just narrow enough for the horizontal scrollbar to disappear.
5) Scroll down to the bottom of the page.
6) Press Cmd-b to open the Bookmarks sidebar.
7) Click on the Status button -- its combobox should open.
8) Press Cmd-b again to close the Bookmarks sidebar.
9) Press Cmd-b again to open the Bookmarks sidebar.
10) Click on the Status button again ... and crash.
Comment 75•11 years ago
|
||
(In reply to Steven Michaud from comment #73)
> I'm still convinced that infinite recursion crashes are likely to be the key
> to figuring out how to reproduce this bug. But I don't know when I'm going
> to have the time to do the work to confirm or deny this.
As Benjamin said, the vast majority of empty dump crashes are on Windows (as are the vast majority of most of our crashes). We don't have any evidence to show that it's a huge problem on OS X.
Comment 76•11 years ago
|
||
Nonetheless, if we can reliably reproduce the problem on OS X (or any specific platform, for that matter), it will likely be a lot easier to figure out the problem on all platforms.
Comment 77•11 years ago
|
||
We know what the problem is on Windows: crashes as a result of OOM (or virtual memory fragmentation causing OOM) frequently cause minidump writing to fail because Microsoft's minidump writer is not memory-safe.
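To illustrate why fragmentation alone can behave like OOM, here is a toy Python model. It assumes the free address space is described by a list of free region sizes and that an allocation needs one contiguous region; the numbers are made up for illustration and do not come from any crash report.

```python
MB = 1024 * 1024


def can_allocate(free_regions, request):
    """An allocation succeeds only if a single free region is large enough."""
    return any(size >= request for size in free_regions)


# Hypothetical fragmented address space: 40 free holes of 8MB each.
free_regions = [8 * MB] * 40
total_free = sum(free_regions)

print("total free:", total_free // MB, "MB")
print("largest hole:", max(free_regions) // MB, "MB")
print("4MB request fits:", can_allocate(free_regions, 4 * MB))
print("12MB request fits:", can_allocate(free_regions, 12 * MB))
```

In this model 320MB is free in total, yet a 12MB contiguous request fails, which is the kind of situation in which a dump writer that needs a large buffer could fail even though memory is not exhausted.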
Comment 78•11 years ago
|
||
> We know what the problem is on Windows: crashes as a result of OOM
> (or virtual memory fragmentation causing OOM) ...
Then why didn't the "Reserve VM space for breakpad" patch fix it?
Reporter | ||
Comment 79•11 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #65)
> Since these crashes are not primarily on mac
To be fair, we don't even know how many are on which OS, even though we are pretty sure that most are on Windows. Bug 838061 would help a lot to shed more light on which actual OS those crashes happen with.
Comment 80•11 years ago
|
||
(In reply to Seth Fowler [:seth] from comment #58)
> (In reply to Benjamin Smedberg [:bsmedberg] from comment #56)
> > Seth, given that bug 859377 part 4 so dramatically improved the empty dump
> > situation on Nightly, does any of what you did there apply to the situation
> > on Aurora?
>
> That's a good question, and I don't know the answer. There may be more than
> one cause at work here.
>
> Just to document it clearly, the change I made in bug 859377 part 4 was to
> stop using hardware surfaces on Windows. Allocating hardware surfaces seems
> to be extremely wasteful for small images on Windows and my code was
> allocating a lot of them.
>
> If the situation on Aurora is caused by the same thing, it couldn't be
> originating from the ClippedImage code being worked on in bug 859377 (since
> that isn't in Aurora), but it could be caused by other things that use
> hardware surfaces on Windows. These include, to my knowledge, raster images
> of other sorts and the layers subsystem.
In my experience, my daily crashes have been during periods of long usage or lots of image activity. I finally put FF under windbg and found that the crash appears to be, indeed, related to hardware surfaces (at least that time) combined with memory pressure.
The stack trace follows; hopefully I stripped out enough, since windbg "helpfully" pulled symbols from our servers...
# ChildEBP RetAddr
00 00feba80 76f0d85e KERNELBASE!RaiseException(
unsigned long dwExceptionCode = 0xe06d7363,
unsigned long dwExceptionFlags = 1,
unsigned long nNumberOfArguments = 3,
unsigned long * lpArguments = 0x00febaac)+0x6c [windows file]
01 00febab8 735b0687 msvcrt!_CxxThrowException(
void * pExceptionObject = 0x00febac8,
struct _s__ThrowInfo * pThrowInfo = 0x73567f04)+0x48 [windows file]
02 00febadc 735c05a0 d3d11!ThrowFailure+0x7ba4a [windows file]
03 00febb38 69e47ca6 d3d11!NDXGI::CDevice::DeallocateCB+0x8e483
WARNING: Stack unwind information not available. Following frames may be wrong.
04 00febb80 69e6934e igd10umd32!OpenAdapter10+0xb046
05 00febbcc 69c4ac4e igd10umd32!OpenAdapter10+0x2c6ee
06 00febbf0 69c48843 igd10umd32+0xac4e
07 00febc04 69c43f3b igd10umd32+0x8843
08 00febc30 69c42eb3 igd10umd32+0x3f3b
09 00febc4c 69e4f0a4 igd10umd32+0x2eb3
0a 00febc70 69e40dd5 igd10umd32!OpenAdapter10+0x12444
0b 00febc94 73531286 igd10umd32!OpenAdapter10+0x4175
0c 00febd6c 7352f685 d3d11!CResource<ID3D11Texture2D>::CLS::FinalConstruct(
class CContext * pC = 0x00000000,
struct D3D11DDIARG_CREATERESOURCE * pDDICreateResource = 0x00feca58,
struct SD3D11SharedResourceCreationArgs * pShared = 0x00fecfc4,
struct SD3D11CrossLayerData * pCrossLayerData = <Value unavailable error>,
struct D3D10DDI_HRTRESOURCE pRtHandle = struct D3D10DDI_HRTRESOURCE)+0x189 [windows file]
0d (Inline) -------- d3d11!CTexture2D::CLS::FinalConstruct+0x33 [windows file]
0e 00febd90 7352fe1c d3d11!TCLSWrappers<CTexture2D>::CLSFinalConstructFn(
struct CTexture2D::CLS * pCLS = 0x2b23a73c,
class CContext * pContext = 0x00000000,
struct CTexture2D::TConstructorArgs * pArgs = 0x00fec15c)+0x38 [windows file]
0f (Inline) -------- d3d11!CLayeredObjectWithCLS<CTexture2D>::FinalConstruct+0x8c [windows file]
10 (Inline) -------- d3d11!CLayeredObjectWithCLS<CTexture2D>::{ctor}+0x100 [windows file]
11 (Inline) -------- d3d11!CLayeredObjectWithCLS<CTexture2D>::CreateInstance+0x142 [windows file]
12 00fecce8 7352b618 d3d11!CDevice::CreateLayeredChild(
unsigned int ChildType = 2,
void * pLayeredChildArgs = 0x00fecda4,
struct ID3D11LayeredUseCounted * pOuterUnk = 0x2b23a630,
struct _GUID * iid = 0x735214b8 {8b21025d-3e48-4fe9-9800-acc2fd8d6541},
void ** ppUnk = 0x2b23a660)+0x645 [windows file]
13 00fecd00 735306e9 d3d11!CBridgeImpl<ID3D11LayeredDevice,ID3D11LayeredDevice,CLayeredObject<CDevice> >::CreateLayeredChild(
unsigned int a = 2,
void * b = 0x00fecda4,
unsigned long c = 0x30,
struct ID3D11LayeredUseCounted * d = 0x2b23a630,
struct _GUID * e = 0x735214b8 {8b21025d-3e48-4fe9-9800-acc2fd8d6541},
void ** f = 0x2b23a660)+0x1f [windows file]
14 (Inline) -------- d3d11!CD3D11LayeredChild<ID3D11DeviceChild,NDXGI::CDevice,64>::FinalConstruct+0x21 [windows file]
15 00fecd3c 7353060e d3d11!NDXGI::CDeviceChild<IDXGIResource1>::FinalConstruct(
ED3D11DeviceChildType eDeviceChildType = e_D3D11Texture2D (0n2),
struct SLayeredArgs * pLArgs = 0x00fecda4,
unsigned long uiArgSize = 0x30,
struct ID3D11LayeredUseCounted * pOutmstLyrIface = 0x2b23a630)+0x2d [windows file]
16 00fecd84 73530494 d3d11!NDXGI::CResource::FinalConstruct(
struct NDXGI::CResource::TConstructorArgs * args = 0x00fecda0)+0x29 [windows file]
17 (Inline) -------- d3d11!CLayeredObject<NDXGI::CResource>::{ctor}+0x49fa [windows file]
18 (Inline) -------- d3d11!CLayeredObject<NDXGI::CResource>::CreateInstance+0x49fa [windows file]
19 00fece2c 7352b254 d3d11!NDXGI::CDevice::CreateLayeredChild(
unsigned int ChildType = <Value unavailable error>,
void * pLayeredChildArgs = 0x00fece50,
unsigned long uiArgSize = <Value unavailable error>,
struct ID3D11LayeredUseCounted * pOuterUnk = 0x2b23a630,
struct _GUID * iid = 0x735214b8 {8b21025d-3e48-4fe9-9800-acc2fd8d6541},
void ** ppUnk = 0x2b23a648)+0x2ea [windows file]
1a (Inline) -------- d3d11!CBridgeImpl<ID3D11LayeredDevice,ID3D11LayeredDevice,CLayeredObject<NDXGI::CDevice> >::CreateLayeredChild+0x24 [windows file]
1b (Inline) -------- d3d11!NOutermost::CDeviceChild::FinalConstruct+0x32 [windows file]
1c (Inline) -------- d3d11!CUseCountedObject<NOutermost::CDeviceChild>::{ctor}+0x9b [windows file]
1d (Inline) -------- d3d11!CUseCountedObject<NOutermost::CDeviceChild>::CreateInstance+0xa9 [windows file]
1e 00fecf30 7353092e d3d11!NOutermost::CDevice::CreateLayeredChild(
unsigned int ChildType = 2,
void * pLayeredChildArgs = 0x00fecf78,
unsigned long uiArgSize = 0x30,
struct ID3D11LayeredUseCounted * pOuterUnk = 0x00000000,
struct _GUID * iid = 0x73522f58 {6f15aaf2-d208-4e89-9ab4-489535d34f9c},
void ** ppUnk = 0x00fed0fc)+0x1e2 [windows file]
1f (Inline) -------- d3d11!CDevice::CreateAndRecreateLayeredChild+0x90 [windows file]
20 00fed0a0 73532ec1 d3d11!CDevice::CreateTexture2D_Worker(
struct D3D11_TEXTURE2D_DESC * pDesc = 0x00fed100,
struct D3D11_SUBRESOURCE_DATA * pInitialData = 0x00000000,
int DWMException = 0n0,
struct ID3D11Texture2D ** ppTexture2D = 0x00fed0fc,
struct SD3D11SharedResourceCreationArgs * pSResArgs = 0x00000000,
bool bCalledFromD3D10 = true)+0x21a [windows file]
21 00fed130 0362cf60 d3d11!CDevice::ID3D10Device1_CreateTexture2D_(
struct ID3D10Device1 * pIFace = 0x08f0a73c,
struct D3D10_TEXTURE2D_DESC * pDesc = 0x00fed15c,
struct D3D10_SUBRESOURCE_DATA * pInitialData = 0x00000000,
struct ID3D10Texture2D ** ppTexture2D = 0x21db0d70)+0xc7 [windows file]
22 00fed18c 038ac8a1 xul!mozilla::services::_external_GetChromeRegistryService+0x5daac
23 00fed2b8 038ad227 xul!XRE_InitEmbedding2+0x2699f
24 00fed328 038ad231 xul!XRE_InitEmbedding2+0x27325
25 00fed390 038acc05 xul!XRE_InitEmbedding2+0x2732f
26 00fedd50 038ad102 xul!XRE_InitEmbedding2+0x26d03
27 00feddac 02d94765 xul!XRE_InitEmbedding2+0x27200
28 00fedec4 02cfa2c9 xul!webrtc::VoEAudioProcessing::DriftCompensationSupported+0x6395
29 00fee37c 02da4206 xul!mozilla::scache::PathifyURI+0x256b9
2a 00fee4d4 02da453c xul!webrtc::VoEAudioProcessing::DriftCompensationSupported+0x15e36
2b 00fee530 02cea611 xul!webrtc::VoEAudioProcessing::DriftCompensationSupported+0x1616c
2c 00fee53c 02cea727 xul!mozilla::scache::PathifyURI+0x15a01
2d 00fee548 02d36f1e xul!mozilla::scache::PathifyURI+0x15b17
2e 00fee550 02ce1365 xul!NS_CycleCollectorSuspect2_P+0x909e
2f 00fee560 02cfedc8 xul!mozilla::scache::PathifyURI+0xc755
30 00fee680 02db109d xul!mozilla::scache::PathifyURI+0x2a1b8
31 00fee6e0 02d08256 xul!webrtc::VoEAudioProcessing::DriftCompensationSupported+0x22ccd
32 00fee7c8 02cff64e xul!mozilla::scache::PathifyURI+0x33646
33 00fee7d0 02d2777a xul!mozilla::scache::PathifyURI+0x2aa3e
34 00fee7ec 77a3a643 xul!mozilla::scache::PathifyURI+0x52b6a
35 00fee818 77a3a593 ntdll!RtlpEnterCriticalSectionContended(
struct _RTL_CRITICAL_SECTION * CriticalSection = 0x00000000)+0x148 [windows file]
36 00fee824 67c02a6e ntdll!RtlEnterCriticalSection(
struct _RTL_CRITICAL_SECTION * CriticalSection = 0x02d64c0d)+0x43 [windows file]
37 00fee83c 67c02ba8 nspr4!PR_Lock+0x2e
38 00fee88c 0306d551 nspr4!PR_Unlock+0x38
39 00fee8c4 0306d605 xul!NS_InvokeByIndex_P+0x5e36
3a 00fee8e4 02e3fc71 xul!NS_InvokeByIndex_P+0x5eea
3b 00fee9d8 02bbc04e xul!XRE_main+0x53a5
3c 00fee9fc 02e3a8fc xul!xpc::Base64Decode+0x43df
3d 00feeb14 0106157e xul!XRE_main+0x30
3e 01064230 2f2f3a73 firefox+0x157e
3f 01064234 73617263 0x2f2f3a73
40 (Inline) -------- d3d11!CLayeredObjectRoot<ID3D11LayeredDevice>::CondObjectLock::{dtor}+0x9 [windows file]
41 (Inline) -------- d3d11!CDevice::CondObjectLock::{dtor}+0x9 [windows file]
42 01064238 65722d68 d3d11!CContext::ID3D11DeviceContext1_Map_<2>(
struct ID3D11DeviceContext1 * pIFace = 0x045300e0,
struct ID3D11Resource * pResource = <Memory access error>,
unsigned int Subresource = <Memory access error>,
D3D11_MAP MapType = <Memory access error>,
unsigned int MapFlags = <Memory access error>,
struct D3D11_MAPPED_SUBRESOURCE * pMappedSubresource = <Memory access error>)+0x52 [windows file]
43 01064254 3d64693f 0x65722d68
44 01064258 3863657b 0x3d64693f
45 0106425c 66303330 0x3863657b
46 01064260 32632d37 explorerframe!`string'+0x4
47 01064264 342d6130 0x32632d37
48 01064268 2d663436 0x342d6130
49 0106426c 65306239 0x2d663436
4a 01064270 6133312d 0x65306239
4b 01064274 65396133 0x6133312d
4c 01064278 38333739 0x65396133
4d 0106427c 76267d34 0x38333739
4e 01064280 69737265 shell32!ntdll_NULL_THUNK_DATA+0x1a08
4f 01064284 323d6e6f 0x69737265
50 01064288 26302e31 0x323d6e6f
51 0106428c 6c697562 0x26302e31
52 01064290 3d646964 0x6c697562
53 01064294 33313032 0x3d646964
54 01064298 31313530 0x33313032
55 0106429c 38303231 0x31313530
56 010642a0 00000000 0x38303231
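A note on reading the trace above, offered as a minimal sketch rather than as part of the original comment: frames 3e onward do not look like real return addresses. Read as little-endian ASCII, the RetAddr values spell out text, which suggests the stack walker ran off the end of the real stack into a string buffer. Likewise, the exception code 0xe06d7363 in frame 00 is the code the Microsoft C++ runtime uses when throwing a C++ exception, and its low three bytes are ASCII too. The helper name `as_ascii` is made up for this illustration.

```python
# RetAddr column of frames 3e-55 in the trace above
# (the "(Inline)" frames carry no RetAddr and are skipped).
RET_ADDRS = [
    0x2f2f3a73, 0x73617263, 0x65722d68, 0x3d64693f, 0x3863657b, 0x66303330,
    0x32632d37, 0x342d6130, 0x2d663436, 0x65306239, 0x6133312d, 0x65396133,
    0x38333739, 0x76267d34, 0x69737265, 0x323d6e6f, 0x26302e31, 0x6c697562,
    0x3d646964, 0x33313032, 0x31313530, 0x38303231,
]


def as_ascii(value: int) -> str:
    """Interpret a 32-bit value as four little-endian ASCII bytes."""
    return value.to_bytes(4, "little").decode("ascii")


# Joining the decoded words shows what the "return addresses" really are.
decoded = "".join(as_ascii(v) for v in RET_ADDRS)

# Low three bytes of the exception code from frame 00, read as ASCII.
EXCEPTION_CODE = 0xE06D7363
exception_tag = (EXCEPTION_CODE & 0x00FFFFFF).to_bytes(3, "big").decode("ascii")

if __name__ == "__main__":
    print(decoded)
    print(exception_tag)
```

The decoded text is a fragment of a URL query string that includes `version=21.0` and a `buildid=` value, so those bottom frames can be ignored when looking for the cause; the meaningful part of the stack ends around the `firefox` frame.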
Comment 81•11 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #56)
> given that bug 859377 part 4 so dramatically improved the empty dump situation on
> Nightly
It was a red herring. The crash ratio is 13.3% in 22.0b6 and 16.5% in 23.0a2.
Comment 82•11 years ago
|
||
If 5% of OOM crashes have a signature and 95% have the empty dump crash signature then we should fix OOM crashes like bug 767343 (1422 crashes in 22.0) and bug 764342 (927 crashes in 22.0). These two bugs would account for 43% of empty dump crashes in 22.0.
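The arithmetic behind that estimate can be sketched as follows. The 5%/95% split is the hypothesis stated in the comment, not a measured value, and `estimated_empty` is a name invented for this illustration.

```python
# Crash counts in 22.0 for the two OOM bugs named above.
SIGNED_CRASHES = {"bug 767343": 1422, "bug 764342": 927}


def estimated_empty(signed: int, signed_share: int = 5, empty_share: int = 95) -> int:
    """Scale crashes that kept a signature up to the empty dumps hidden behind them.

    If only `signed_share` percent of OOM crashes keep a signature, every
    signed crash stands for empty_share / signed_share empty dumps.
    """
    return signed * empty_share // signed_share


total_signed = sum(SIGNED_CRASHES.values())
hidden_empty = estimated_empty(total_signed)

# The comment puts this at 43% of empty dumps; the total that implies is
# derived here, it is not a figure stated in the comment.
implied_total_empty = hidden_empty / 0.43

if __name__ == "__main__":
    print(total_signed, hidden_empty, round(implied_total_empty))
```

Under that hypothesis the two bugs would stand for 2349 * 19 = 44631 empty dumps, which matches the 43% figure only if the total number of empty dumps in 22.0 over the same period was roughly 104 thousand.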
Comment 83•11 years ago
|
||
I crashed twice with an empty stack today; both times I was creating routes or using street view in the new Google Maps. I'm not sure it was an OOM crash though.
These are my non-stacks:
https://crash-stats.mozilla.com/report/index/39ee4d84-2fd4-430c-9c4c-115d52130719
https://crash-stats.mozilla.com/report/index/8b12a70b-b213-4582-862f-b002e2130720
Comment 84•11 years ago
|
||
(In reply to Marco Bonardo [:mak] from comment #83)
> https://crash-stats.mozilla.com/report/index/39ee4d84-2fd4-430c-9c4c-
> 115d52130719
This one is bug 802152 based on the abort message in App Notes.
Comment 85•11 years ago
|
||
It accounts for 18% in 22.0, 14.7% in 23.0b6, 15.8% in 24.0a2, and 10.2% in 25.0a1.
(In reply to Scoobidiver from comment #82)
> If 5% of OOM crashes have a signature and 95% have the empty dump crash
> signature
Instead of assuming, I used https://crash-analysis.mozilla.com/crash_analysis/20130719/20130719-pub-crashdata.csv.gz. The breakdown per abort or error message in 22.0 is as follows:
Abort or error message | Bug | Total: 16787 | Share of total
OOM: file e:\builds\...\xpcom\string\src\nsTSubstring.cpp, line 533 | Bug 802152 | 1675 | 9.98%
OOM: file e:\builds\...\xpcom\string\src\nsTSubstring.cpp, line 348 | Bug 767343 | 1398 | 8.33%
Failed to create temporary texture in system memory. Error code: 2147942414 | Bug 793126 | 1224 | 7.29%
ThebesLayerD3D10::Validate(): Failed to create texture Error code: 2147942414 | - | 527 | 3.14%
OOM: file e:\builds\...\xpcom\string\src\nsTSubstring.cpp, line 291 | Bug 869294 | 333 | 1.98%
Attempt to create unsupported SourceSurface fromnon-image surface.: file e:/builds/.../gfx/thebes/gfxPlatform.cpp, line 655 | Bug 844819 | 204 | 1.22%
out of memory: file e:/builds/.../layout/base/nsDisplayList.cpp, line 867 | - | 127 | 0.76%
OOM: file e:/builds/.../xpcom/string/src/nsReadableUtils.cpp, line 160 | Bug 858791 | 111 | 0.66%
out of memory: file e:/builds/.../layout/base/nsPresArena.cpp, line 362 | - | 89 | 0.53%
bug836263: file e:/builds/.../modules/libpref/src/nsPrefBranch.cpp, line 330 | Bug 836263 | 52 | 0.31%
OOM: file e:\builds\...\xpcom\string\src\nsTSubstring.cpp, line 393 | - | 37 | 0.22%
OOM: file e:\builds\...\obj-firefox\dist\include\nsTHashtable.h, line 99 | - | 27 | 0.16%
OOM: file e:/builds/.../layout/generic/nsLineLayout.cpp, line 584 | - | 18 | 0.11%
file e:/builds/.../build/ipc/ch/src/base/pickle.cc, line 60 | - | 18 | 0.11%
Can't allocate mozilla::ReentrantMonitor: file e:\builds\...\obj-firefox\dist\include\mozilla/ReentrantMonitor.h, line 49 | - | 14 | 0.08%
Can't allocate mozilla::Mutex: file e:\builds\...\obj-firefox\dist\include\mozilla/Mutex.h, line 51 | - | 8 | 0.05%
OOM: file e:\builds\...\obj-firefox\dist\include\nsTSubstring.h, line 132 | - | 6 | 0.04%
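As a quick check of the percentages in the breakdown above, here is a minimal sketch that recomputes each share from its count and the total of 16787. Only the counts are taken from the comment; the variable and function names are made up for the illustration.

```python
TOTAL_EMPTY = 16787  # total from the header of the breakdown

# Counts in the order they are listed in the breakdown.
COUNTS = [1675, 1398, 1224, 527, 333, 204, 127, 111, 89, 52, 37, 27, 18, 18, 14, 8, 6]


def share(count: int, total: int = TOTAL_EMPTY) -> float:
    """Percentage of all empty-dump crashes, rounded to two decimals."""
    return round(100 * count / total, 2)


annotated = sum(COUNTS)              # empty dumps that carry an abort/error message
annotated_share = share(annotated)   # how much of the total the list explains

if __name__ == "__main__":
    for count in COUNTS:
        print(count, share(count))
    print(annotated, annotated_share)
```

The listed messages add up to 5868 crashes, so they cover a bit over a third of the empty dumps in that day's data; the rest carry no abort or error message in this breakdown.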
Comment 86•11 years ago
|
||
(In reply to Scoobidiver from comment #84)
> (In reply to Marco Bonardo [:mak] from comment #83)
> > https://crash-stats.mozilla.com/report/index/39ee4d84-2fd4-430c-9c4c-
> > 115d52130719
> This one is bug 802152 based on the abort message in App Notes.
It would probably be interesting to make this code path annotate the OOMAllocationSize like the mozalloc-OOM case does, so we could see how much memory is being allocated here.
mak: if you can reproduce this, it would be pretty interesting to attach a debugger and find out where this is actually crashing.
Comment 87•11 years ago
|
||
Also, from one of mak's crashes:
Available Virtual Memory
370790400
That's not particularly large, and if it's fragmented I could certainly see him hitting an OOM.
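To illustrate why fragmentation matters here, a minimal sketch: a single allocation needs one contiguous free region, so the total available figure can be far larger than the biggest request that will actually succeed. The free-region layout and the 2 MiB request below are hypothetical; only the 370790400 figure comes from the crash report.

```python
MIB = 1024 * 1024
AVAILABLE_VIRTUAL_MEMORY = 370790400  # bytes, from the crash report quoted above


def can_allocate(free_regions, request):
    """A contiguous allocation succeeds only if one free region is big enough."""
    return any(region >= request for region in free_regions)


# Hypothetical worst case: the same amount of free address space, chopped
# into 1 MiB holes plus one smaller leftover hole.
full_holes, leftover = divmod(AVAILABLE_VIRTUAL_MEMORY, MIB)
fragmented = [MIB] * full_holes + [leftover]

# The same free space as a single contiguous region.
unfragmented = [AVAILABLE_VIRTUAL_MEMORY]

request = 2 * MIB  # hypothetical allocation size

if __name__ == "__main__":
    print(AVAILABLE_VIRTUAL_MEMORY / MIB)       # about 353.6 MiB in total
    print(can_allocate(unfragmented, request))  # True
    print(can_allocate(fragmented, request))    # False
```

With about 353 MiB free in total, the fragmented layout still cannot satisfy a 2 MiB request, which is the kind of situation described above.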
Comment 88•11 years ago
|
||
I'm a simple user and I come with this information, maybe it helps.
The following crash report was sent by me:
https://crash-stats.mozilla.com/report/index/47d26d8a-aa3b-4f22-99b9-581a42130914
I have seen the link of this page in the "Related bugs" under the description.
All I can say is that Firefox often crashes after I open too many tabs. I have 2 GB of memory and it crashes when it's about 70-80% full. This is the first time I could send the crash report; before, the crash reporter dialog appeared, but after I clicked "Send report" it said that the report could not be sent, even though I was connected to the internet.
Comment 89•11 years ago
|
||
One of my empty crashes:
http://crash-stats.mozilla.com/report/index/b5105c81-5298-468b-b25f-717ac2131016
How could this possibly be debugged if there is no dump?
Comment 90•11 years ago
|
||
User Dderss - you have to launch Firefox in a debugger and reproduce. Since every one of these crashes I've seen is due to memory pressure (OOM in physical space or fragmentation), that means you have to run in the debugger for a while. It can be a pain.
Comment 91•11 years ago
|
||
Thanks for the reply, Timothy. Do you mean "safe" mode? The problem with it is that the crash might be core/video related (my guess), and it happens because I watch a lot of YouTube videos. In safe mode all extensions are turned off, so I will not be able to watch YouTube videos and will not be able to build up the pressure on FF resources needed to actually generate the crash.
Or is there an actual debugger that I can download and install, which would run in real time in parallel with the FF process and capture what goes on, writing it to disk on the fly, so that no matter how abrupt the crash is, and no matter whether the dump is corrupt or not, something could still be traced?
Comment 92•11 years ago
|
||
(In reply to Tim from comment #90)
> User Dderss - you have to launch firefox in debugger and reproduce. Since
> every one of these crashes I've seen is due to memory pressure (OOM in
> physical space or fragmentation), that means you have to run in debugger for
> a while. It can be a pain.
If he were to run, say Fx 25.0b8, wouldn't debug mode be enabled by default?
Comment 93•11 years ago
|
||
Comment 94•11 years ago
|
||
Thanks; I finally managed to catch the crash, though it took two days of slow browsing under the debugger. However, because it took so long, the log file is giant: 565 MB. How much can I cut from it so that the log is light enough to upload easily and developers can see what happened?
(The way it goes is I open a bunch of YouTube videos, which becomes unbearable for FF so it dies.)
Also, just in case it is needed, I have made a "minidump", which is not really mini since it takes 3.5 GB. But I would prefer not to upload it -- unless it becomes absolutely necessary -- since it contains personal information.
Comment 95•11 years ago
|
||
The personal information is minimally useful in a dump, though if someone were interested I'm sure an ill-intentioned person could get something from it. You could start with just getting the stack trace following the above instructions.
Comment 96•11 years ago
|
||
If the part before the exception is needed, please let me know how much, so that I can cut out an appropriately sized chunk of the log.
Attachment #819129 -
Flags: review+
Attachment #819129 -
Flags: feedback+
Updated•11 years ago
|
Attachment #819129 -
Attachment mime type: text/x-log → text/plain
Comment 97•11 years ago
|
||
Thanks, this is useful! The stack of the crashing thread here is:
#136 Id: 51e0.74a4 Suspend: 1 Teb: ffe20000 Unfrozen "Media Decode"
ChildEBP RetAddr
ea1df480 583e1218 mozalloc!mozalloc_abort(char * msg = 0xea1df498 "out of memory: 0x0000000000151800 bytes requested")+0x2a [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\memory\mozalloc\mozalloc_abort.cpp @ 30]
ea1df4d0 583e10a2 mozalloc!mozalloc_handle_oom(unsigned int size = 0x151800)+0x5f [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\memory\mozalloc\mozalloc_oom.cpp @ 50]
ea1df4e0 105e28ab mozalloc!moz_xmalloc(unsigned int size = 0x151800)+0x1b [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\memory\mozalloc\mozalloc.cpp @ 56]
ea1df4f8 106235c9 xul!mozilla::layers::BufferRecycleBin::GetBuffer(unsigned int aSize = 0x9c4cf9a4)+0x52 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\gfx\layers\imagecontainer.cpp @ 111]
ea1df504 10623536 xul!mozilla::layers::PlanarYCbCrImage::AllocateBuffer(unsigned int aSize = 0x151800)+0x10 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\gfx\layers\imagecontainer.cpp @ 427]
ea1df518 1064b315 xul!mozilla::layers::PlanarYCbCrImage::CopyData(struct mozilla::layers::PlanarYCbCrImage::Data * aData = 0xea1df53c)+0x2f [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\gfx\layers\imagecontainer.cpp @ 463]
ea1df524 10a44a41 xul!mozilla::layers::PlanarYCbCrImage::SetData(struct mozilla::layers::PlanarYCbCrImage::Data * aData = 0xea1df53c)+0xa [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\gfx\layers\imagecontainer.cpp @ 485]
ea1df598 10a44e5f xul!mozilla::VideoData::Create(class mozilla::VideoInfo * aInfo = 0x3a0feae0, class mozilla::layers::ImageContainer * aContainer = 0x554c6fb0, class mozilla::layers::Image * aImage = 0xea1df5e0, int64 aOffset = 0n487161, int64 aTime = 0n300300, int64 aEndTime = 0n333666, struct mozilla::VideoData::YCbCrBuffer * aBuffer = 0xea1df668, bool aKeyframe = false, int64 aTimecode = 0n-1, struct nsIntRect aPicture = struct nsIntRect)+0x266 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\mediadecoderreader.cpp @ 252]
ea1df5e0 1007ad2a xul!mozilla::VideoData::Create(class mozilla::VideoInfo * aInfo = 0x3a0feae0, class mozilla::layers::ImageContainer * aContainer = 0x554c6fb0, int64 aOffset = 0n487161, int64 aTime = 0n300300, int64 aEndTime = 0n333666, struct mozilla::VideoData::YCbCrBuffer * aBuffer = 0xea1df668, bool aKeyframe = false, int64 aTimecode = 0n-1, struct nsIntRect aPicture = struct nsIntRect)+0x3a [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\mediadecoderreader.cpp @ 266]
ea1df6b4 1007b06f xul!mozilla::WMFReader::CreateBasicVideoFrame(struct IMFSample * aSample = 0x264b0208, int64 aTimestampUsecs = 0n300300, int64 aDurationUsecs = 0n33366, int64 aOffsetBytes = 0n487161, class mozilla::VideoData ** aOutVideoData = 0xea1df724)+0x1d1 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\wmf\wmfreader.cpp @ 856]
ea1df714 10a47885 xul!mozilla::WMFReader::DecodeVideoFrame(bool * aKeyframeSkip = 0xea1df7d3, int64 aTimeThreshold = 0n333666)+0x1e1 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\wmf\wmfreader.cpp @ 987]
ea1df7dc 10a489e2 xul!mozilla::MediaDecoderStateMachine::DecodeLoop(void)+0x248 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\mediadecoderstatemachine.cpp @ 905]
ea1df7f4 10589423 xul!mozilla::MediaDecoderStateMachine::DecodeThreadRun(void)+0x9f [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\mediadecoderstatemachine.cpp @ 507]
ea1df7f8 0fdc7a51 xul!nsRunnableMethodImpl<void (void)+0xe [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\obj-firefox\dist\include\nsthreadutils.h @ 351]
ea1df86c 0fe1b1b8 xul!nsThread::ProcessNextEvent(bool mayWait = true, bool * result = 0xea1df89c)+0x221 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\xpcom\threads\nsthread.cpp @ 632]
ea1df894 5176e927 xul!nsThread::ThreadFunc(void * arg = 0x532d0201)+0x98 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\xpcom\threads\nsthread.cpp @ 264]
ea1df8b4 5177329d nss3!_PR_NativeRunThread(void * arg = 0x223a5860)+0x167 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\nsprpub\pr\src\threads\combined\pruthr.c @ 419]
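For scale, the failed request in the abort message can be decoded as below. This is only a sketch: the hex value comes from the stack above, but the reading of it as one 1280x720 frame in planar 4:2:0 layout is an inference from the size, not something the log states, and `ycbcr420_size` is a made-up helper.

```python
MIB = 1024 * 1024

# "out of memory: 0x0000000000151800 bytes requested"
requested = int("0x0000000000151800", 16)


def ycbcr420_size(width: int, height: int) -> int:
    """Bytes for one 8-bit planar YCbCr 4:2:0 frame: full-size Y plus quarter-size Cb and Cr."""
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return luma + 2 * chroma


# Assumption: a 720p video frame; this happens to match the request exactly.
frame_720p = ycbcr420_size(1280, 720)

if __name__ == "__main__":
    print(requested, requested / MIB)
    print(frame_720p == requested)
```

So the allocation that failed was only about 1.3 MiB, which fits the picture of an exhausted or fragmented address space rather than a single oversized request.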
Comment 98•11 years ago
|
||
Does this stack mean that this crash/bug might be related to this one?
http://bugzilla.mozilla.org/show_bug.cgi?id=887968
Comment 99•11 years ago
|
||
Dderss, no, bug 887968 is unlikely to be your issue. I opened a clean bug for your specific case. The number is bug 930797.
Comment 100•11 years ago
|
||
Another WinDbg trace, in case it helps. This is on Firefox 25. The steps to reproduce are to load multiple tabs simultaneously (~200). The attached file is the trace from the point where the Access Violation occurred. I also have a minidump and the full trace saved. Let me know if those are needed too.
Comment 101•11 years ago
|
||
(In reply to hitesh.seth@yahoo.co.in from comment #100)
> Created attachment 826311 [details]
> WinDbg trace. I have cut the trace post access violation. If previous
> content is needed then let me know. It is more than 100MB in size.
>
> Another Windbg trace see if it helps. This is on Firefox 25. Steps to
> reproduce is to load multiple tabs simultaneously(~200) The attached file is
> trace since Access Violation occurred. I also have minidump and full trace
> saved. Let me know if that is needed too.
There's something wrong with this log, it doesn't have symbols loaded for xul.dll, which makes it very hard to get useful info out. Also, when you first hit an exception in WinDBG, you can just enter the command to get the stack. Trying to continue at that point doesn't help much.
Comment 102•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #101)
> There's something wrong with this log, it doesn't have symbols loaded for
> xul.dll
Maybe because http://symbols.mozilla.org/firefox is currently giving error 404.
Comment 103•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #101)
> (In reply to hitesh.seth@yahoo.co.in from comment #100)
> > Created attachment 826311 [details]
> > WinDbg trace. I have cut the trace post access violation. If previous
> > content is needed then let me know. It is more than 100MB in size.
> >
> > Another Windbg trace see if it helps. This is on Firefox 25. Steps to
> > reproduce is to load multiple tabs simultaneously(~200) The attached file is
> > trace since Access Violation occurred. I also have minidump and full trace
> > saved. Let me know if that is needed too.
>
> There's something wrong with this log, it doesn't have symbols loaded for
> xul.dll, which makes it very hard to get useful info out. Also, when you
> first hit an exception in WinDBG, you can just enter the command to get the
> stack. Trying to continue at that point doesn't help much.
For loading symbols I followed instructions given at :
https://developer.mozilla.org/en/docs/How_to_get_a_stacktrace_with_WinDbg
If there are new instructions available to give a stack trace then please let me know.
Thanks for that tip about WinDbg. I have a question there: how should I differentiate between an exception that can be handled (by continuing) and an exception that will lead to a crash in Firefox?
Comment 104•11 years ago
|
||
Here's a log of a crash after the usual browsing I do, Firefox 24.1.0ESR.
Just browsing on image-heavy sites, and other sites known to be big on RAM (Slashdot, Amazon).
Is it normal that the crash-reporter isn't showing when WinDbg is used?
I hope I have done everything properly, at least I saw WinDbg loading xul.pdb, so I think that's covered.
Comment 105•11 years ago
|
||
(In reply to elbart from comment #102)
> Maybe because http://symbols.mozilla.org/firefox is currently giving error
> 404.
It always gives a 404, it's not designed to be human-browsable. It should return proper responses for paths to symbols.
(In reply to elbart from comment #104)
> Is it normal that the crash-reporter isn't showing when WinDbg is used?
Yes, the Mozilla crash reporter is fired from a last-chance exception handler, which doesn't get invoked when you have a debugger attached.
Comment 106•11 years ago
|
||
(In reply to hitesh.seth@yahoo.co.in from comment #103)
> (In reply to Ted Mielczarek [:ted.mielczarek] from comment #101)
> > (In reply to hitesh.seth@yahoo.co.in from comment #100)
> > > Created attachment 826311 [details]
> > > WinDbg trace. I have cut the trace post access violation. If previous
> > > content is needed then let me know. It is more than 100MB in size.
> > >
> > > Another Windbg trace see if it helps. This is on Firefox 25. Steps to
> > > reproduce is to load multiple tabs simultaneously(~200) The attached file is
> > > trace since Access Violation occurred. I also have minidump and full trace
> > > saved. Let me know if that is needed too.
> >
> > There's something wrong with this log, it doesn't have symbols loaded for
> > xul.dll, which makes it very hard to get useful info out. Also, when you
> > first hit an exception in WinDBG, you can just enter the command to get the
> > stack. Trying to continue at that point doesn't help much.
>
> For loading symbols I followed instructions given at :
> https://developer.mozilla.org/en/docs/How_to_get_a_stacktrace_with_WinDbg
>
> If there are new instructions available to give a stack trace then please
> let me know.
>
> Thanks for that tip about WinDbg. I had a doubt there, how should I
> differentiate between exception that can be handled(by continuing) versus
> exception that will lead to crash in firefox?
I was going through WinDbg log from start and I found this:
"
0:000> .sympath SRV*c:\symbols*http://symbols.mozilla.org/firefox;SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Symbol search path is: SRV*c:\symbols*http://symbols.mozilla.org/firefox;SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Expanded Symbol search path is: srv*c:\symbols*http://symbols.mozilla.org/firefox;srv*c:\symbols*http://msdl.microsoft.com/download/symbols
0:000> .symfix+ c:\symbols
0:000> .reload /f
Reloading current modules
...*** WARNING: Unable to verify checksum for G:\Program Files\Alwil Software\Avast5\snxhk.dll
*** ERROR: Symbol file could not be found. Defaulted to export symbols for G:\Program Files\Alwil Software\Avast5\snxhk.dll -
........
"
From the warning it seems symbol files were not found at that location. But it is giving the error for one of the DLL files of my anti-virus and not for Firefox. Is that normal? Or were the symbol files truly not found, and is that why the stack has no symbols? Please advise.
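For reference, the .sympath value in the log above is a list of SRV*<local cache>*<server URL> entries joined with semicolons. A minimal sketch of how that string is put together; `build_sympath` is a made-up helper name, not a WinDbg command or API.

```python
def build_sympath(cache_dir: str, servers: list) -> str:
    """Join one SRV*<cache>*<url> entry per symbol server with semicolons."""
    return ";".join(f"SRV*{cache_dir}*{url}" for url in servers)


# The two servers and the cache directory used in the log above.
SERVERS = [
    "http://symbols.mozilla.org/firefox",
    "http://msdl.microsoft.com/download/symbols",
]

sympath = build_sympath(r"c:\symbols", SERVERS)

if __name__ == "__main__":
    # Paste the result after ".sympath " in the WinDbg command window.
    print(sympath)
```

Each entry tells the debugger to look in the local cache first and fetch from that server on a miss, so a module with no symbols on any listed server (such as a third-party DLL) produces a warning without affecting modules that do have symbols.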
Comment 107•11 years ago
|
||
I got another crash just now. Interestingly it is not an EMPTY crash, but the steps to reproduce were the same: load multiple tabs (~200). Also, this crash refers to xul.dll, the same DLL referred to by the WinDbg trace.
https://crash-stats.mozilla.com/report/index/3a99d4ff-0ea2-4a22-8077-ffd402131102
Comment 108•11 years ago
|
||
Can somebody look at my bug to see whether this could cause some of them?
Bug 937651 - Replace the sessionstore.js with an sessionstore.sqlite
Comment 109•11 years ago
|
||
(In reply to hitesh.seth@yahoo.co.in from comment #107)
> I got another crash just now. Interestingly it is not Empty crash but step
> to reproduce it was same-- load multiple tabs (~200) Also, this crash also
> refers to xul.dll, the same dll referred by WinDbg trace.
>
> https://crash-stats.mozilla.com/report/index/3a99d4ff-0ea2-4a22-8077-
> ffd402131102
The stack here is:
Thread 0 (crashed)
0 xul.dll!mozilla::WebGLContext::PresentScreenBuffer() [WebGLContext.cpp:d86ad7db1de3 : 1379 + 0x3]
eip = 0x10c89796 esp = 0x001cbec8 ebp = 0x001cbf08 ebx = 0x00000000
esi = 0x28554ec0 edi = 0x308c6900 eax = 0x28554ec4 ecx = 0x00000000
edx = 0x05100048 efl = 0x00210202
Found by: recovered by external stack walker
...
This seems like it's already filed as bug 881311.
Reporter | ||
Updated•11 years ago
|
Crash Signature: [@ EMPTY: no crashing thread identified; corrupt dump] → [@ EMPTY: no crashing thread identified; corrupt dump]
[@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER]
Comment 110•11 years ago
|
||
The same crash as Robert's:
https://crash-stats.mozilla.com/report/index/75b6fe4d-28bf-416c-ae60-a0cfe2131120
I debugged a similar crash recently:
http://bugzilla.mozilla.org/show_bug.cgi?id=930797
Reporter | ||
Comment 111•11 years ago
|
||
(In reply to User Dderss from comment #110)
> The same crash as Robert's:
> https://crash-stats.mozilla.com/report/index/75b6fe4d-28bf-416c-ae60-
> a0cfe2131120
I didn't have one of these; I only added that new signature here because reporting has changed to put it on those reports. That is purely a change in our tools that process crash reports from users, not a change in what those crashes would be.
> I debugged similar crash recently:
> http://bugzilla.mozilla.org/show_bug.cgi?id=930797
Thanks. Please keep the individual debugging of that in the bug there. It would be good if we found specific cases of how one can reproducibly run into those issues, as that may help developers find out what code causes them and possibly how to improve the situation.
Comment 112•11 years ago
|
||
The debugging information is already there; thanks.
Is this bug the same as https://bugzilla.mozilla.org/show_bug.cgi?id=711568?
Reporter | ||
Comment 113•11 years ago
|
||
(In reply to User Dderss from comment #112)
> Is this bug is the same as
> https://bugzilla.mozilla.org/show_bug.cgi?id=711568?
That one is the generic meta bug for those crashes; the one here is specifically about the regression in volume we have seen with those in the Firefox 19 and 20 cycles.
Summary: increase in crashes with EMPTY dumps → increase in crashes with EMPTY dumps in Firefox 19 and 20 cycles
Comment 114•11 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #111)
> (In reply to User Dderss from comment #110)
> > The same crash as Robert's:
> > https://crash-stats.mozilla.com/report/index/75b6fe4d-28bf-416c-ae60-
> > a0cfe2131120
>
> I didn't have one of these, I only added that new signature here as
> reporting has changed to put that on on those reports - purely a change in
> our tools that process crash reports from users, not a change in what those
> rashes would be
>
> > I debugged similar crash recently:
> > http://bugzilla.mozilla.org/show_bug.cgi?id=930797
>
> Thanks, please leave individual debugging of that in the bug there, would be
> good if we find specific cases of how one can reproducibly run into those
> issues, as that may help developers find out what code causes it and
> possibly how to improve the situation.
Comment 115•11 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #111)
> Thanks, please leave individual debugging of that in the bug there, would be
> good if we find specific cases of how one can reproducibly run into those
> issues, as that may help developers find out what code causes it and
> possibly how to improve the situation.
OK, I think I can easily give you a scenario where FF does a lot of I/O, needs a lot of memory, slows down and crashes a lot.
At the moment there is Bug 934935, so it is really easy to make sessionstore.js big ... ;-)
I think this 'bug' is caused by the Facebook 'switching advertisement system', so you don't have to write much on the message page ... ;-)
- First, configure your FF to use no plugins and to restore the previously open tabs when you start FF again.
- Now open a page that doesn't take up much space in sessionstore.js, e.g. Bugzilla.
- Now log in to FB and open some message pages. Let's say 10.
- Wait a few minutes and watch how your sessionstore.js (ss.js) grows.
- Let's say you make the first test with an ss.js of 15 MB.
- So now go back to your Bugzilla page and close FF.
- Restart FF and don't load the message pages.
- Try to work.
-> You will now see that it slows down, does a lot of I/O, ... Maybe it starts to become unstable ...
- Let's open some more message pages. (Don't reload the old ones!)
- Grow your ss.js to 30 MB.
- Restart.
- Don't reload the message pages !!!
- Try to work.
-> Slower, more I/O, more unstable.
- Now do the same again and go to 50 MB.
- ...
-> FF crashes more and more!
Please look at Bug 937651 for more information.
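To follow the steps above without checking the file by hand, something like the following rough Python sketch could report how big ss.js has become and which of the test stages (15, 30, 50 MB) it has reached. The function names, the polling interval and the way the path is passed in are all my own assumptions for illustration; nothing here is part of Firefox or of any Mozilla tool, and you have to point it at the sessionstore.js in your own profile directory.

```python
import os
import time

MB = 1024 * 1024


def size_mb(path):
    """Return the size of the file at `path` in megabytes."""
    return os.path.getsize(path) / MB


def stage_reached(size_in_mb, stages=(15, 30, 50)):
    """Return the largest test stage (in MB) the file has reached, or None.

    The default stages are the 15/30/50 MB sizes used in the steps above.
    """
    reached = [s for s in stages if size_in_mb >= s]
    return max(reached) if reached else None


def watch(path, interval_s=60, samples=10):
    """Print the file size every `interval_s` seconds, `samples` times.

    `path` is a hypothetical location; pass the sessionstore.js of your
    own profile. Interval and sample count are arbitrary defaults.
    """
    for _ in range(samples):
        current = size_mb(path)
        print(f"{path}: {current:.1f} MB, stage reached: {stage_reached(current)}")
        time.sleep(interval_s)
```

Call watch() with the path to your ss.js while the message pages are open, and close FF once the stage you want to test has been reached.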
Reporter | ||
Comment 116•11 years ago
|
||
Yes, the Facebook thing is known, but it did not ship partly with Firefox 19 and partly with Firefox 20, and this bug is explicitly about why the number of OOM crashes of this kind increased in those two cycles, as depicted by the graphs in the attachments.
BTW, the patch from comment #48 seems to not have helped significantly here.
Comment 117•11 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #116)
> Yes, the Facebook thing is known, but it was not shipped to some part with
> Firefox 19 and to some part with Firefox 20, and this bug is explicitly
> about why we have increased the number of OOM crashes of this kind in those
> two cycles, as depicted by the graphs in the attachments.
>
> BTW, the patch from comment #48 seems to not have helped significantly here.
Sorry, I thought you needed a case of an EMPTY crash.
FB (seems fixed now) was just an example of how to grow the sessionstore ...
... but this is not limited to FF19 & FF20.
Sorry!
Comment 118•11 years ago
|
||
I believe bug #903842 is related to this one, because the symptoms are the same for me: many open tabs, then browser windows start turning black. If you create a new window from such a tab, the whole window turns white and the browser soon crashes.
Comment 120•11 years ago
|
||
Today I again got a crash with the same symptoms as I described in bug #903842, but with no crashing thread identified: https://crash-stats.mozilla.com/report/index/d566010c-15a0-40a9-8493-6867a2131210
Assignee | ||
Comment 121•11 years ago
|
||
I'm working on spinning up data collection around memory usage, but this bug is no longer useful.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INCOMPLETE
Updated•11 years ago
|
Whiteboard: [native-crash][leave open] → [native-crash]
Updated•6 years ago
|
Restrict Comments: true