Closed
Bug 822398
Opened 12 years ago
Closed 12 years ago
Crash when trying to add contact photo during MW2
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(blocking-basecamp:+, firefox19 fixed, firefox20 fixed, b2g18 fixed)
People
(Reporter: cjones, Assigned: bechen)
References
Details
(Keywords: regression)
Attachments
(3 files, 1 obsolete file)
STR
(1) Follow the steps in https://wiki.mozilla.org/B2G/Memory_acceptance_criteria#MW2:_Active_call_stays_active
During step 19, the b2g process crashes.
Can we please run these tests as part of our smoketests?
Comment 1•12 years ago
|
||
Over to tchung to determine if we should add this as a smoketest.
Flags: needinfo?(tchung)
Updated•12 years ago
|
blocking-basecamp: ? → +
Andrea: Can you have a look at this? I think this is crashing in webactivities code.
Assignee: nobody → amarchesini
Updated•12 years ago
|
Target Milestone: --- → B2G C3 (12dec-1jan)
Comment 3•12 years ago
|
||
I cannot reproduce this bug.
Can you tell me which release you are using?
Updated•12 years ago
|
Flags: needinfo?(jones.chris.g)
Reporter | ||
Comment 4•12 years ago
|
||
https://wiki.mozilla.org/index.php?title=B2G/Memory_acceptance_criteria&oldid=494353
http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=7b93aab393cd0ea45f2b1894c969f6c5b55ef083
http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=df14f48088e107a180811ff73357e76e95acf776
Flags: needinfo?(jones.chris.g)
Updated•12 years ago
|
Assignee: amarchesini → nobody
Assignee | ||
Comment 6•12 years ago
|
||
Looks like a duplication to Bug 823951
Assignee | ||
Comment 7•12 years ago
|
||
I have got this twice by gdb and both the strings in frame6 are "screen".
#6 0x411cef8a in nsStringBuffer::Release (this=0x49acfb60) at /home/benjamin/Documents/git/unagi/B2G/gecko/xpcom/string/src/nsSubstring.cpp:161
0000000: 00 00 00 00 52 00 61 00 64 00 69 00 6f 00 20 00 ....R.a.d.i.o. .
0000010: 46 00 4d 00 20 00 47 00 61 00 69 00 61 00 00 00 F.M. .G.a.i.a...
0000020: 04 00 ff ff 00 00 00 00 0a 00 00 00 69 6d 61 67 ............imag
0000030: 65 2f 67 69 66 00 00 20 63 6c 6f 73 65 00 00 00 e/gif.. close...
0000040: 00 00 9d 47 00 00 00 00 0b 00 00 00 6b 65 65 70 ...G........keep
0000050: 2d 61 6c 69 76 65 00 00 e0 98 30 40 d0 81 b7 43 -alive....0@...C
0000060: c7 75 23 40 00 00 00 00 0e 00 00 00 73 00 63 00 .u#@........s.c.
0000070: 72 00 65 00 65 00 6e 00 0a r.e.e.n
Comment 8•12 years ago
|
||
In frame 1, this assertion failed:
RELEASE_ASSERT((run->regs_mask[elm] & (1U << bit)) == 0);
It looks like a double free.
In frame 6:
int32_t count = PR_ATOMIC_DECREMENT(&mRefCount);
NS_LOG_RELEASE(this, count, "nsStringBuffer");
if (count == 0)
{
STRING_STAT_INCREMENT(Free);
free(this); // we were allocated with |malloc|
}
It is possible that more than one threads free the same thing simultaneously.
The "if (count ==0)" check should be combined with the decrement.
Comment 9•12 years ago
|
||
Sorry! The protection should be fine! So it may be that it is freed in other sites.
Assignee | ||
Comment 10•12 years ago
|
||
It's double free.
Reporter | ||
Comment 11•12 years ago
|
||
Nice! :)
Are you able to reproduce this in a DEBUG build? It looks like bug 823474 just maybe could cause this, but I'm not 100% sure. If it's bug 823474, then assertions will fail in DEBUG builds.
Reporter | ||
Comment 12•12 years ago
|
||
What the backtraces seem to show is that
- we're actually destroying a string here [1]; maybe that string is |aKey|? o_O
- we hand out that string reference to an hal:: notification [2], which is probably calling into JS
- then we later die trying to free the same string JSExternalString::finalize
This looks like it might maybe be violating some pldhash invariant.
Also looks like it might be the Great White Whale of random crashes :D.
[1] http://mxr.mozilla.org/mozilla-central/source/hal/HalWakeLock.cpp#81
[2]http://mxr.mozilla.org/mozilla-central/source/hal/HalWakeLock.cpp#85
Reporter | ||
Comment 13•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #12)
> - we hand out that string reference to an hal:: notification [2], which is
> probably calling into JS
Confirmed that this will result in JS callbacks being invoked.
Comment 14•12 years ago
|
||
Hmm, I didn't notice that IPDL type doesn't copy values. We should copy the key. Nice catch Benjamin.
Reporter | ||
Comment 15•12 years ago
|
||
Looking back over the memory-test history, things started getting really crash between 12/6 and 12/12. With I had more resolution there, but bug 811740 is in that history. That seems like the kind of change that might exacerbate a pre-existing problem.
Reporter | ||
Comment 16•12 years ago
|
||
Bug 814822 merged on 11/27, and bug 817946 was filed on 12/3. Doesn't say a whole lot, but things are in range.
Reporter | ||
Comment 17•12 years ago
|
||
(In reply to Kan-Ru Chen [:kanru] from comment #14)
> Hmm, I didn't notice that IPDL type doesn't copy values. We should copy the
> key. Nice catch Benjamin.
The IPC layer copies the incoming key. I think the problem is
sLockTable->Remove(aKey);
while we're enumerating |aKey|. I think we need to return PL_DHASH_REMOVE and not remove the key there, but I need to double check the dhash API. (It's not very friendly :/.)
Reporter | ||
Comment 18•12 years ago
|
||
Comment 19•12 years ago
|
||
Use Enumerate instead EnumerateRead?
Comment 20•12 years ago
|
||
Comment on attachment 695954 [details] [diff] [review]
Fix
Review of attachment 695954 [details] [diff] [review]:
-----------------------------------------------------------------
::: hal/HalWakeLock.cpp
@@ +78,5 @@
> LockCount totalCount;
> WakeLockInformation info;
> aTable->EnumerateRead(CountWakeLocks, &totalCount);
> if (!totalCount.numLocks) {
> + op = PL_DHASH_REMOVE;
op |= PL_DHASH_REMOVE;
We need to search the whole table.
I'm not sure if EnumerateRead() is allowed to return PL_DHASH_REMOVE. If not we should change the enumerator to Enumerate().
Reporter | ||
Comment 21•12 years ago
|
||
Yep you're right. Now trying to sort through the also not-very-friendly nsBaseHashtable API :/.
Comment 22•12 years ago
|
||
Reporter | ||
Comment 23•12 years ago
|
||
I think this is right. The PLDHashOperator isn't really a bitfield when enumerating.
Reporter | ||
Comment 24•12 years ago
|
||
Comment on attachment 695957 [details] [diff] [review]
Fix v2
I was able to complete the memory acceptance tests without any crashes! Looks promising. That's only happened once in the last 14 days.
Kan-Ru, do you have tests that would check that we're removing the wake-lock tables properly?
Attachment #695957 -
Flags: review?(kchen)
Reporter | ||
Updated•12 years ago
|
Attachment #695954 -
Attachment is obsolete: true
Reporter | ||
Comment 25•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #23)
> Created attachment 695957 [details] [diff] [review]
> Fix v2
>
> I think this is right. The PLDHashOperator isn't really a bitfield when
> enumerating.
Specifically,
http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/pldhash.cpp#717
Comment 26•12 years ago
|
||
Comment on attachment 695957 [details] [diff] [review]
Fix v2
Review of attachment 695957 [details] [diff] [review]:
-----------------------------------------------------------------
The existing test doesn't cover multiple wake lock with different names.
Attachment #695957 -
Flags: review?(kchen) → review+
Reporter | ||
Comment 27•12 years ago
|
||
I got a second consecutive non-crash memory acceptance run, so this is promising indeed.
Reporter | ||
Comment 28•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/de53cfa4b491
https://hg.mozilla.org/releases/mozilla-aurora/rev/e28578533fae
https://hg.mozilla.org/releases/mozilla-b2g18/rev/6d8cdf8cb5c3
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 33•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #27)
> I got a second consecutive non-crash memory acceptance run, so this is
> promising indeed.
Three in a row. Unheard of!
Assignee | ||
Comment 34•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #12)
> What the backtraces seem to show is that
> - we're actually destroying a string here [1]; maybe that string is |aKey|?
> o_O
> - we hand out that string reference to an hal:: notification [2], which is
> probably calling into JS
> - then we later die trying to free the same string
> JSExternalString::finalize
>
> This looks like it might maybe be violating some pldhash invariant.
>
> Also looks like it might be the Great White Whale of random crashes :D.
>
> [1] http://mxr.mozilla.org/mozilla-central/source/hal/HalWakeLock.cpp#81
> [2]http://mxr.mozilla.org/mozilla-central/source/hal/HalWakeLock.cpp#85
I have a question about the nsStringBuffer's ref count.
From my attachment 695937 [details], the first and the second free come from
nsStringBuffer::Release.
There is a mRefCount for the nsStringBuffer, only free when mRefCount == 0
That means someone keep the reference and add the mRefCount after the first free?
Comment 35•12 years ago
|
||
As bugs with verified seen-in-the-wild signatures were blindly duped here, can people who could reproduce the exact crash scenario fixed here please give us some crash IDs? Otherwise I think I'll probably need to reopen the duped bugs unless we can verify otherwise that those top crash signatures from the other bugs were indeed this issue.
Reporter | ||
Comment 37•12 years ago
|
||
(In reply to Benjamin Chen from comment #34)
> I have a question about the nsStringBuffer's ref count.
> From my attachment 695937 [details], the first and the second free come from
> nsStringBuffer::Release.
> There is a mRefCount for the nsStringBuffer, only free when mRefCount == 0
>
> That means someone keep the reference and add the mRefCount after the first
> free?
The problem is that the enumerator is passed a const reference to the string element in the hash set, and then frees the item in the hash set that the reference points to.
We would have caught this with nasty assertion failures if we had a test for this running in debug builds (v. bug 824526).
Comment 38•12 years ago
|
||
Updated•12 years ago
|
Reporter | ||
Comment 39•12 years ago
|
||
This is a huge improvement in stability :). bechen++ for the analysis!
Updated•12 years ago
|
Flags: needinfo?(tchung)
You need to log in
before you can comment on or make changes to this bug.
Description
•