Closed
Bug 1093278
Opened 10 years ago
Closed 10 years ago
b2g has crashed in libxul.so!CanonicalizeXPCOMParticipant [nsCycleCollector.cpp : 934 + 0x6]
Categories
(Firefox OS Graveyard :: Stability, defect, P1)
Tracking
(blocking-b2g:2.1+, firefox34 wontfix, firefox35 fixed, firefox36 fixed, firefox-esr31 fixed, b2g-v2.1 fixed, b2g-v2.2 fixed)
People
(Reporter: tkundu, Assigned: mccr8)
References
Details
(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 394][caf priority: p1][CR 749784])
Crash Data
Attachments
(4 files, 2 obsolete files)
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
smaug
:
review+
lmandel
:
approval-mozilla-aurora+
lmandel
:
approval-mozilla-beta-
bkerensa
:
approval-mozilla-esr31+
bajaj
:
approval-mozilla-b2g34+
|
Details | Diff | Splinter Review |
(deleted),
text/plain
|
Details |
STR: Random stability test for a long time.
Reporter | ||
Comment 1•10 years ago
|
||
Reporter | ||
Comment 2•10 years ago
|
||
[Blocking Requested - why for this release]: b2g has crashed
Please note that we are seeing following memleak at time of crash :
==- 00051712, 1414999732, 2014-11-03 12:58:52, 2014-11-03 07:28:52 UTC, up 232334 -==
| megabytes |
NAME PID PPID CPU(s) NICE USS PSS RSS SWAP VSIZE OOM_ADJ USER
b2g 2751 1 146854.6 0 248.2 259.2 279.2 0.0 488.2 0 root
(Nuwa) 2757 2751 924.5 0 1.9 4.8 13.6 0.0 53.8 -16 root
Built-in Keyboa 2953 2757 257.9 18 6.7 10.4 21.3 0.0 64.1 10 u0_a2953
Homescreen 3125 2751 4829.2 18 21.0 27.1 40.5 0.0 99.1 8 u0_a3125
(Preallocated a 32701 2757 0.2 18 5.3 8.6 18.9 0.0 60.7 1 u0_a32701
System memory info:
Total 847.5 MB
SwapTotal 0.0 MB
Used - cache 489.1 MB
B2G procs (PSS) 310.2 MB
Non-B2G procs 178.9 MB
Free + cache 358.4 MB
Free 132.2 MB
Cache 226.1 MB
SwapFree 0.0 MB
and /sys/class/kgsl/kgsl/page_alloc is 69472256 bytes = 69MB which clearly says kgsl memleak ny b2g process. unfortunately we don't have DMD report for now..
one more point is that
1) Crash has happened inside nsCycleCollector.cpp and system has enough memory to continue although b2g is leaking kgsl memory.
I will create a separate bug for memleak issue.
For this crash, we need a logging patch to confirm what exactly is happening inside gecko.
blocking-b2g: --- → 2.1?
Flags: needinfo?(khuey)
Whiteboard: [CR 749784]
Reporter | ||
Comment 3•10 years ago
|
||
b2g memleak bug 1093281 for comment 2
And here is the full logcat logs :
https://drive.google.com/file/d/0B1cSMS8_GuAETGFQV0piQnI4eFE/view?usp=sharing
crash timestamp is :
"11-03 12:58:54.887 2752 2789 E AudioFlinger: no wake lock to update!" in full logcat logs.
I'll direct this to Andrew for now since I'm swamped.
Flags: needinfo?(khuey) → needinfo?(continuation)
Assignee | ||
Updated•10 years ago
|
Flags: needinfo?(continuation)
Assignee | ||
Updated•10 years ago
|
Flags: needinfo?(continuation)
Assignee | ||
Comment 5•10 years ago
|
||
Are these non-DEBUG builds? There are some checks we do in debug builds we could try in a non-debug build.
Flags: needinfo?(tkundu)
Updated•10 years ago
|
Whiteboard: [CR 749784] → [caf priority: p1][CR 749784]
Updated•10 years ago
|
Whiteboard: [caf priority: p1][CR 749784] → [b2g-crash][caf-crash 394][caf priority: p1][CR 749784]
Comment 7•10 years ago
|
||
Observed on:
Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_2.0.01.04.00.114.127
Moz BuildID: 20141029001202
B2G Version: 2.1
Gecko Version: 34.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=eb0aab0f13c78c7ac378ad860e865c4b6eaf669f
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=e16ae12c5ab365be13fb2c98d3811b93b25ded6d
Patches: bug 1070431, bug 1083449
Assignee | ||
Comment 8•10 years ago
|
||
Olli, what do you think this crash stack means? Maybe that the argument to CanonicalizeXPCOMParticipant is null?
Assignee: nobody → continuation
Flags: needinfo?(continuation) → needinfo?(bugs)
Comment 9•10 years ago
|
||
http://git.mozilla.org/?p=releases/gecko.git;a=blob;f=xpcom/base/nsCycleCollector.cpp;h=aecc0a265ff926dd7a8db0b14c6bb112710ac7b5;hb=e16ae12c5ab365be13fb2c98d3811b93b25ded6d#l933
Yeah, aIn is null, as far as I see.
Flags: needinfo?(bugs)
Updated•10 years ago
|
blocking-b2g: 2.1? → 2.1+
Assignee | ||
Comment 10•10 years ago
|
||
My guess here is that there's some kind of bug in UnmarkRemainingPurple, because I think we rarely actually call it, and it sets the mObject pointer to null. In fact, looking at the code now, if mCount > 0, in FreeBlocks(), then we null out pointers in mFirstBlock, but of course we don't actually delete it. Then if we trigger something that iterates over the purple buffer before the first block is filled, I think we'll hit null pointers. So maybe that's the issue? But it would probably be safer to just null check in VisitEntries(), if in fact it is always null like this.
Assignee | ||
Comment 11•10 years ago
|
||
Eh, it looks like UnmarkRemainingPurple() is okay, thanks to the InitBlocks() call after, so I'm not sure what is going on. We can try the null check I guess.
Assignee | ||
Comment 12•10 years ago
|
||
This should be pretty safe. I checked and everything that reads mObject from the purple buffer goes through VisitEntries so this shouldn't move the crash elsewhere.
Attachment #8516348 -
Flags: review?(bugs)
Assignee | ||
Comment 13•10 years ago
|
||
Rebased on top of b2g34, though the patch looks identical, I think.
Attachment #8516348 -
Attachment is obsolete: true
Attachment #8516348 -
Flags: review?(bugs)
Attachment #8516354 -
Flags: review?(bugs)
Assignee | ||
Comment 14•10 years ago
|
||
Olli pointed out that an assert would be good.
Attachment #8516354 -
Attachment is obsolete: true
Attachment #8516354 -
Flags: review?(bugs)
Attachment #8516365 -
Flags: review?(bugs)
Assignee | ||
Comment 15•10 years ago
|
||
Tapas, you could try this patch. It should fix the immediate cause of the crash, assuming they are all null like this. I think we won't end up crashing elsewhere, but I can't tell for sure.
Flags: needinfo?(tkundu)
Updated•10 years ago
|
Attachment #8516365 -
Flags: review?(bugs) → review+
Assignee | ||
Comment 16•10 years ago
|
||
I'm waiting on landing this until we think it actually solves the 2.1 issue.
The try run against trunk is green:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=e42b6404cb16
Reporter | ||
Comment 17•10 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #15)
> Tapas, you could try this patch. It should fix the immediate cause of the
> crash, assuming they are all null like this. I think we won't end up
> crashing elsewhere, but I can't tell for sure.
We landed this patch internally and we are waiting for feedback from our test team. I will update here asap
Reporter | ||
Comment 18•10 years ago
|
||
(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #17)
> (In reply to Andrew McCreight [:mccr8] from comment #15)
> > Tapas, you could try this patch. It should fix the immediate cause of the
> > crash, assuming they are all null like this. I think we won't end up
> > crashing elsewhere, but I can't tell for sure.
>
> We landed this patch internally and we are waiting for feedback from our
> test team. I will update here asap
We are not seeing this again with your patch. It seems to me that this bug is very difficult to reproduce (we have seen only once) and I am not sure this patch really fixed this bug or not.
Please go ahead and land this patch if you are ok with this :) Thanks for your help .
Flags: needinfo?(tkundu) → needinfo?(continuation)
Assignee | ||
Comment 19•10 years ago
|
||
Alright, thanks!
Flags: needinfo?(continuation)
Keywords: checkin-needed
Comment 20•10 years ago
|
||
Keywords: checkin-needed
Comment 21•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2.1 S9 (21Nov)
Assignee | ||
Comment 22•10 years ago
|
||
Comment on attachment 8516365 [details] [diff] [review]
Null check in nsPurpleBuffer::VisitEntries.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): unknown
User impact if declined: 2.1 blocker. I want to get this on non-B2G branches, too, because we run this code on desktop, and it will get us more testing.
Testing completed: TBPL
Risk to taking this patch (and alternatives if risky): very low, just a null check.
String or UUID changes made by this patch: none
Attachment #8516365 -
Flags: approval-mozilla-beta?
Attachment #8516365 -
Flags: approval-mozilla-b2g34?
Attachment #8516365 -
Flags: approval-mozilla-aurora?
Updated•10 years ago
|
Attachment #8516365 -
Flags: approval-mozilla-b2g34? → approval-mozilla-b2g34+
Comment 23•10 years ago
|
||
Comment on attachment 8516365 [details] [diff] [review]
Null check in nsPurpleBuffer::VisitEntries.
I spoke with mccr8 about this request. It's not worth taking on Beta but can provide some value on Aurora.
Beta-
Aurora+
Attachment #8516365 -
Flags: approval-mozilla-beta?
Attachment #8516365 -
Flags: approval-mozilla-beta-
Attachment #8516365 -
Flags: approval-mozilla-aurora?
Attachment #8516365 -
Flags: approval-mozilla-aurora+
Comment 24•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/fba588b0173c
https://hg.mozilla.org/releases/mozilla-b2g34_v2_1/rev/25b39ad03fb8
status-b2g-v2.1:
--- → fixed
status-b2g-v2.2:
--- → fixed
status-firefox34:
--- → wontfix
status-firefox35:
--- → fixed
status-firefox36:
--- → fixed
Comment 25•10 years ago
|
||
We see that crash on ESR 31.3. so would be worth to backport it there too.
Comment 26•10 years ago
|
||
Martin, any chance you have STR for the esr crash?
Flags: needinfo?(stransky)
Assignee | ||
Comment 27•10 years ago
|
||
Comment on attachment 8516365 [details] [diff] [review]
Null check in nsPurpleBuffer::VisitEntries.
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: Crashes, see comment 25. I don't know how common these are.
User impact if declined: crashes
Fix Landed on Version: 37 maybe? recent
Risk to taking this patch (and alternatives if risky): very low
String or UUID changes made by this patch: none
Attachment #8516365 -
Flags: approval-mozilla-esr31?
Comment 28•10 years ago
|
||
There's the stracktrace. Catched on Linux/RHEL5.11.
Flags: needinfo?(stransky)
Comment 29•10 years ago
|
||
Unfortunately stack trace isn't enough here, since that is too generic. We just have a null pointer there, but the question is why.
Comment 30•10 years ago
|
||
I see a crash where also aEntry->mParticipant in VisitEntries() is null.
Comment 31•10 years ago
|
||
If I add the aEntry->mParticipant on the same place the app crashes in GCGraphBuilder::DescribeRefCountedNode():
2051 if (refCount == UINT32_MAX) ?
>2052 Fault("overflowing refcount", mCurrPi)
Assignee | ||
Comment 32•10 years ago
|
||
It is normal for mParticipant to be null in VisitEntries(). Please file a new bug for your issue if the patch here is insufficient. Thanks.
Updated•10 years ago
|
Attachment #8516365 -
Flags: approval-mozilla-esr31? → approval-mozilla-esr31+
Comment 33•10 years ago
|
||
status-firefox-esr31:
--- → fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•