Closed
Bug 774622
Opened 12 years ago
Closed 12 years ago
crash with abort message: "mismatched CxxStackFrame ctor/dtors: file /builds/slave/m-cen-andrd-ntly/build/ipc/glue/RPCChannel.cpp, line 656" on quitting Nightly
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
VERIFIED
FIXED
mozilla18
People
(Reporter: AdrianT, Assigned: nical)
References
Details
(4 keywords, Whiteboard: [native-crash])
Crash Data
Attachments
(1 file, 2 obsolete files)
(deleted),
patch
|
cjones
:
review+
akeybl
:
approval-mozilla-aurora+
akeybl
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-2b3ea40a-be14-424e-b07d-6145d2120717 .
Other crashes:
https://crash-stats.mozilla.com/report/index/bp-6337ba57-4ad8-450b-9737-bd7772120717
https://crash-stats.mozilla.com/report/index/bp-ace5405c-2498-4f42-a283-ebe1c2120717
=============================================================
Nightly 16.0a1 2012-07-16
Google Nexus (Android 4.0.4)
Steps to reproduce:
1) Load planet.mozilla.org.
2) Wait for the page to load.
3) Quit Firefox
Actual result:
A crash happens. I got 3 different crashes from 4 tries. Adding the crash signatures to the same bug as they may be related since the same steps to reproduce apply.
Issue is not reproducible on Aurora
=============================================================
0 libmozalloc.so mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:19
1 libc.so libc.so@0x11f6e
2 dalvik-heap (deleted) dalvik-heap @0x7b511f
3 dalvik-mark-stack (deleted) dalvik-mark-stack @0x30eaf40
4 org.mozilla.fennec-2.apk org.mozilla.fennec-2.apk@0x124556b
5 org.mozilla.fennec-2.apk org.mozilla.fennec-2.apk@0x33696a
6 dalvik-heap (deleted) dalvik-heap @0xfad5f63
7 org.mozilla.fennec-2.apk org.mozilla.fennec-2.apk@0x703633
8 libxul.so XPCJSRuntime::Get js/xpconnect/src/xpcprivate.h:645
9 libxul.so XPCCallContext::~XPCCallContext js/xpconnect/src/XPCCallContext.cpp:302
10 libxul.so nsXPCWrappedJSClass::DelegatedQueryInterface js/xpconnect/src/XPCWrappedJSClass.cpp:745
11 @0xffffffff
12 libxul.so XPCCallContext::~XPCCallContext js/xpconnect/src/XPCCallContext.cpp:334
13 libxul.so nsXPCWrappedJSClass::DelegatedQueryInterface js/xpconnect/src/XPCWrappedJSClass.cpp:745
14 @0x5f4c5d3e
15 libmozglue.so libmozglue.so@0x7041
16 libmozglue.so libmozglue.so@0x8fbd
17 libmozglue.so libmozglue.so@0x90fd
18 libmozglue.so libmozglue.so@0x9c7b
19 libmozglue.so libmozglue.so@0x20ffa
20 libmozglue.so libmozglue.so@0x199d7
21 libmozglue.so libmozglue.so@0x7041
22 libmozglue.so libmozglue.so@0x8fbd
23 libc.so libc.so@0x12132
24 libc.so libc.so@0x436ae
25 libmozglue.so libmozglue.so@0x19b29
26 libmozglue.so libmozglue.so@0x19b41
27 libxul.so std::__node_alloc::allocate _alloc.h:158
28 libxul.so std::priv::_STLP_alloc_proxy<unsigned int, IPC::Message, std::allocator<IPC::Mes _alloc.h:307
29 @0x3
30 libxul.so mozilla::ipc::RPCChannel::DebugAbort ipc/glue/RPCChannel.cpp:656
31 libxul.so mozilla::ipc::RPCChannel::~RPCChannel ipc/glue/RPCChannel.cpp:80
32 libxul.so mozilla::layers::PCompositorParent::~PCompositorParent obj-firefox/ipc/ipdl/PCompositorParent.cpp:88
33 libxul.so mozilla::layers::CompositorParent::~CompositorParent gfx/layers/ipc/CompositorParent.cpp:106
34 libxul.so mozilla::layers::CompositorParent::~CompositorParent gfx/layers/ipc/CompositorParent.cpp:106
35 libxul.so mozilla::layers::CompositorParent::Release CompositorParent.h:58
36 libxul.so RunnableFunction<void , Tuple2<nsRefPtr<mozilla::layers::CompositorParent>, nsRe nsAutoPtr.h:874
37 libxul.so RunnableFunction<void , Tuple2<nsRefPtr<mozilla::layers::CompositorParent>, nsRe ipc/chromium/src/base/task.h:411
38 libxul.so MessageLoop::RunTask ipc/chromium/src/base/message_loop.cc:327
39 libxul.so MessageLoop::DeferOrRunPendingTask ipc/chromium/src/base/message_loop.cc:334
40 libxul.so MessageLoop::DoWork ipc/chromium/src/base/message_loop.cc:434
41 libxul.so mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:86
42 libxul.so MessageLoop::RunInternal ipc/chromium/src/base/message_loop.cc:208
43 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:201
44 libxul.so nsBaseAppShell::Run widget/xpwidgets/nsBaseAppShell.cpp:163
45 libxul.so nsAppStartup::Run toolkit/components/startup/nsAppStartup.cpp:257
46 libxul.so XREMain::XRE_mainRun toolkit/xre/nsAppRunner.cpp:3787
47 libxul.so XREMain::XRE_main toolkit/xre/nsAppRunner.cpp:3864
48 libxul.so XRE_main toolkit/xre/nsAppRunner.cpp:3940
49 libxul.so GeckoStart toolkit/xre/nsAndroidStartup.cpp:73
50 libmozglue.so libmozglue.so@0x10aad
51 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29918e
52 libdvm.so libdvm.so@0x1ec32
53 dalvik-heap (deleted) dalvik-heap @0xcb3bae
54 libdvm.so libdvm.so@0x58eed
55 data@app@org.mozilla.fennec-2.apk@classes.dex data@app@org.mozilla.fennec-2.apk@classes.dex@0x134e61
56 libmozglue.so libmozglue.so@0x10a5b
57 data@app@org.mozilla.fennec-2.apk@classes.dex data@app@org.mozilla.fennec-2.apk@classes.dex@0x11d4b9
58 libc.so libc.so@0x14a43
59 libdvm.so libdvm.so@0x98e3d
60 libc.so libc.so@0x158a7
61 libmozglue.so libmozglue.so@0x10a5b
62 data@app@org.mozilla.fennec-2.apk@classes.dex data@app@org.mozilla.fennec-2.apk@classes.dex@0x11d4b9
63 libc.so libc.so@0x158a7
64 libmozglue.so libmozglue.so@0x10a5b
65 data@app@org.mozilla.fennec-2.apk@classes.dex data@app@org.mozilla.fennec-2.apk@classes.dex@0x11d4b9
66 libc.so libc.so@0x15f09
67 libdvm.so libdvm.so@0x5ae8d
68 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x347e
69 dalvik-heap (deleted) dalvik-heap @0xddde1e
70 dalvik-heap (deleted) dalvik-heap @0xddde1e
71 libdvm.so libdvm.so@0xa2532
72 dalvik-heap (deleted) dalvik-heap @0xddde1e
73 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x2991a2
74 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29918e
75 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29918e
76 data@app@org.mozilla.fennec-2.apk@classes.dex data@app@org.mozilla.fennec-2.apk@classes.dex@0x148534
77 libdvm.so libdvm.so@0x8e8ab
78 libdvm.so libdvm.so@0x74897
79 dalvik-heap (deleted) dalvik-heap @0xcb3bae
80 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29918e
81 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29918e
82 libdvm.so libdvm.so@0x58d43
83 dalvik-heap (deleted) dalvik-heap @0xcb3bae
84 libdvm.so libdvm.so@0x5ac1d
85 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29918e
86 data@app@org.mozilla.fennec-2.apk@classes.dex data@app@org.mozilla.fennec-2.apk@classes.dex@0x9ccb8
87 dalvik-heap (deleted) dalvik-heap @0xcb3bae
88 libdvm.so libdvm.so@0x1edbe
89 libdvm.so libdvm.so@0x30a8e
90 libdvm.so libdvm.so@0xb2f8e
91 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29b4b6
92 libdvm.so libdvm.so@0x34272
93 libdvm.so libdvm.so@0x5f987
94 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29b4b6
95 dalvik-heap (deleted) dalvik-heap @0xc9b946
96 data@app@org.mozilla.fennec-2.apk@classes.dex data@app@org.mozilla.fennec-2.apk@classes.dex@0x134a9c
97 libdvm.so libdvm.so@0x6c8ff
98 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x29b4b6
99 dalvik-heap (deleted) dalvik-heap @0xc9b946
100 libdvm.so libdvm.so@0xb7c56
101 libdvm.so libdvm.so@0x5f987
102 libdvm.so libdvm.so@0x6c923
103 libdvm.so libdvm.so@0xb7c56
104 libdvm.so libdvm.so@0x5f987
105 libdvm.so libdvm.so@0xb2f8e
106 libdvm.so libdvm.so@0x5fa37
107 libdvm.so libdvm.so@0x5f987
108 libc.so libc.so@0x12c4e
109 libc.so libc.so@0x127a2
Comment 1•12 years ago
|
||
(In reply to adrian tamas from comment #0)
> https://crash-stats.mozilla.com/report/index/bp-6337ba57-4ad8-450b-9737-bd7772120717
This one looks like bug 763805/bug 763813.
Keywords: regression
Summary: crash in mozalloc_abort on quitting Nightly → crash with abort message: "mismatched CxxStackFrame ctor/dtors: file /builds/slave/m-cen-andrd-ntly/build/ipc/glue/RPCChannel.cpp, line 656" on quitting Nightly
Whiteboard: [native-crash]
Comment 2•12 years ago
|
||
Based on crash stats, it first appeared in 16.0a1/20120715. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0602e44ac248&tochange=57abb5f70e01
It might be a regression from bug 745148.
Comment 4•12 years ago
|
||
Chris looks like a regression from your bug, can you take a look?
Assignee: nobody → jones.chris.g
tracking-fennec: ? → 16+
tracking-firefox16:
--- → ?
Keywords: regressionwindow-wanted
Updated•12 years ago
|
Crash Signature: [@ mozalloc_abort | XPCJSRuntime::Get]
[@ mozalloc_abort | nsAString_internal::Assign ]
[@ mozalloc_abort | nsSVGUtils::GetFontXHeight ] → [@ mozalloc_abort | XPCJSRuntime::Get]
[@ mozalloc_abort | nsAString_internal::Assign]
[@ mozalloc_abort | nsSVGUtils::GetFontXHeight]
[@ mozalloc_abort | NS_IsMainThread_P]
[@ mozalloc_abort | libwebcore.so@0x3acf63]
[@ mozalloc_abort | dalvik-bitm…
Updated•12 years ago
|
Crash Signature: dalvik-bitmap-2 (deleted)@0x14d11f]
[@ mozalloc_abort] → dalvik-bitmap-2 (deleted)@0x14d11f]
[@ mozalloc_abort]
[@ libmozalloc.so@0x1704 ]
Comment 6•12 years ago
|
||
The mozalloc_abort | XPCJSRuntime::Get signature is #7 top crasher in 16.0a2 and #5 in 17.0a1. This bug ranks probably higher with its other signatures that are shared with other bugs.
Keywords: topcrash
Reporter | ||
Comment 7•12 years ago
|
||
Good build: 2012-07-14
Bad build: 2012-07-15
Possible regression range:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0602e44ac248&tochange=57abb5f70e01
Keywords: regressionwindow-wanted
Comment 8•12 years ago
|
||
That range has a lot of changesets in it, please use tinderbox-builds.
Keywords: regressionwindow-wanted
Comment 9•12 years ago
|
||
The presence of bug 767775 and bug 745148 in that list looks suspicious. Any thoughts, Chris?
Updated•12 years ago
|
Updated•12 years ago
|
Crash Signature: dalvik-bitmap-2 (deleted)@0x14d11f]
[@ mozalloc_abort]
[@ libmozalloc.so@0x1704 ] → dalvik-bitmap-2 (deleted)@0x14d11f]
[@ mozalloc_abort | nsThreadManager::GetIsMainThread]
[@ mozalloc_abort | arena_avail_tree_insert]
[@ mozalloc_abort | js::ObjectImpl::markChildren]
[@ mozalloc_abort]
[@ libmozalloc.so@0x1704 ]
Comment 10•12 years ago
|
||
Crashes on abort account for 4.4% of all crashes in 15.0b3 and 17.7% in 16.0a2.
Guessing this bug is the only crash regression in 16.0, it accounts for around 13% of all crashes in 16.0a2, making it #1 top crasher in 16.0a2.
Comment 11•12 years ago
|
||
I've hit this crash a couple of times on the Galaxy Nexus, Android 4.1 while quitting while I was on http://buienradar.nl/ .
Adrian, could you try and narrow the regression range further?
Reporter | ||
Comment 12•12 years ago
|
||
Testing tinderbox central builds:
no crash: http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android/1342280061/
crash: http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android/1342285161/
I will continue to try to narrow the regression range further.
Comment 13•12 years ago
|
||
Here is the regression range of comment 12:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=011147826130&tochange=9046ecf4db8f
Neither bug 767775, nor bug 745148 belong to it.
I think it's a regression from bug 763234.
Reporter | ||
Comment 14•12 years ago
|
||
In inbound tinderbox builds:
The good build: 2012-07-13 73e6fad01985
The bad build: 2012-07-13 f9723f26c8a9
Pushlog: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=73e6fad01985&tochange=f9723f26c8a9
Updated•12 years ago
|
Keywords: regressionwindow-wanted
Assignee | ||
Comment 16•12 years ago
|
||
Bug 763234 expects that no one uses the CompositorParents or whatever is living on the compositor thread after gfxPlatform has been destroyed. Could it be related to this?
Updated•12 years ago
|
Crash Signature: dalvik-bitmap-2 (deleted)@0x14d11f]
[@ mozalloc_abort | nsThreadManager::GetIsMainThread]
[@ mozalloc_abort | arena_avail_tree_insert]
[@ mozalloc_abort | js::ObjectImpl::markChildren]
[@ mozalloc_abort]
[@ libmozalloc.so@0x1704 ] → dalvik-bitmap-2 (deleted)@0x14d11f]
[@ mozalloc_abort | dalvik-bitmap-2 (deleted)@0x12011f]
[@ mozalloc_abort | 0 (deleted)@0x11f911f]
[@ mozalloc_abort | nsThreadManager::GetIsMainThread]
[@ mozalloc_abort | arena_avail_tree_insert]
[@ mozalloc_abor…
Comment 20•12 years ago
|
||
In 16.0a2/20120819, it accounts for 46% of all crashes.
Comment 21•12 years ago
|
||
(In reply to Scoobidiver from comment #20)
> In 16.0a2/20120819, it accounts for 46% of all crashes.
I'll ping cjones now and see if anybody else may have more time to look at this.
Assignee | ||
Comment 22•12 years ago
|
||
I am looking at it.
I had troubles debugging this until recently because I couldn't get symbols to show up in gdb (see bug 781668).
If it is very important that we get a temporary hack to remove the crash ASAP, one (very bad) possibility is to leak CompositorParent. CompositorParent is created a startup and destroyed when exiting firefox. The crash happens in its destructor.
Has anyone been able to create reliable STR for this bug?
(In reply to Nicolas Silva [:nical] from comment #22)
> If it is very important that we get a temporary hack to remove the crash
> ASAP, one (very bad) possibility is to leak CompositorParent.
> CompositorParent is created a startup and destroyed when exiting firefox.
> The crash happens in its destructor.
Shutdown crashes are also extremely frustrating because that's exactly what we want to do at shutdown. Another workaround is determine after what point in shutdown this crash is occurring, and then _exit(0) instead of aborting if we hit this assertion at that point.
STR definitely the way to start here.
Comment 24•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #23)
> STR definitely the way to start here.
Adding needURLs and qawanted.
Comment 25•12 years ago
|
||
Sorry, I cannot draw URLs for all abort crashes, and this looks like it's becoming just a collection of all abort crashes, which probably can have any number of different causes. I doubt there's any use in pulling URLs for all of those.
I also don't know how to pull URLs specifically for one abort message, which may be what could be reasonable here.
Comment 26•12 years ago
|
||
Actually, I remembered that with our old method of getting URLs off some CSV files, we can query the app notes, and so I did that. I'm leaving out those with only one hit in the last 12 days.
[rkaiser@fs1 ~]$ gunzip --stdout /data/security_group/crash_urls/201208{1,2}*-crashdata.csv.gz | awk -W compat -F\t '$7 ~ /^FennecAndroid$/ && $26 ~ /mismatched CxxStackFrame ctor\/dtors/ {print $2}' | sort | uniq -c | sort -nr
1449 \N
10 http://moztw.org/foxmosa/game/pairs/
6 about:
4 http://lenta.ru/news/2012/08/17/srok/
3 http://www.telegraph.co.uk/health/healthnews/9485807/Baldness-cure-could-be-on-shelves-in-two-years.html
3 http://www.bbc.co.uk/news/uk-england-sussex-19329653
3 https://touch.facebook.com/?_rdr
3 about:home
3 about:config
3 about:apps
2 http://www.northamptonsaints.co.uk/news/6331.php
2 http://www.hkepc.com/forum/viewthread.php?tid=1845400&extra=page%3D1
2 http://www.bbc.co.uk/sport/0/mobile/cricket/19309373
2 http://videojs.com/
2 https://signin.ebay.com/ws/eBayISAPI.dll?SignIn&UsingSSL=1&pUserId=&co_partnerId=2&siteid=0&ru=http%3A%2F%2Fcgi5.ebay.com%2Fws%2FeBayISAPI.dll%3FNewListing%26%26np_id%3D0%26cpg%3D2%26aid%3D3%26tid%3D8453930017%26guest%3D1%26userid%3D&pageType=1144
2 https://air.mozilla.org/
2 http://m.forums.theregister.co.uk/forum/1/2012/08/17/twitter_api_rules/
2 http://m.bbc.co.uk/news/entertainment_and_arts
2 about:firefox
2 about:blank
Keywords: needURLs
Updated•12 years ago
|
Crash Signature: mozalloc_abort | js::ObjectImpl::markChildren]
[@ mozalloc_abort | js_NewStringCopyZ]
[@ mozalloc_abort | js_NewString]
[@ mozalloc_abort | js::TraceChildren]
[@ mozalloc_abort]
[@ libmozalloc.so@0x1704 ] → mozalloc_abort | js::ObjectImpl::markChildren]
[@ mozalloc_abort | js_NewStringCopyZ]
[@ mozalloc_abort | js_NewString]
[@ mozalloc_abort | js::TraceChildren]
[@ mozalloc_abort | __wrap_malloc]
[@ mozalloc_abort]
[@ libmozalloc.so@0x1704 ]
Reporter | ||
Comment 28•12 years ago
|
||
The steps from Comment 0 can still be used to reproduce the issue using Nightly 2012-08-27 on Galaxy Note running Android 4.0.4.
Steps to reproduce:
1) Load planet.mozilla.org.
2) Wait for the page to load.
3) Quit Firefox
Reproducibility rate is about 40-50% of the cases. From ~10 tries I got 4 crashes.
Keywords: qawanted
Assignee | ||
Comment 29•12 years ago
|
||
FWIW, I don't manage to reproduce it with debug builds while I can reproduce it on optimized builds (on a galaxy nexus and a nexus 7).
Comment 30•12 years ago
|
||
I can reproduce with the suggested steps in comment #28 pretty easily (08/28, GN:4.1.1)
I/GeckoApp( 2680): pause
I/GeckoApp( 2680): application paused
I/Gecko ( 2680): ###!!! ABORT: mismatched CxxStackFrame ctor/dtors: file ../../../ipc/glue/RPCChannel.cpp, line 656
Updated•12 years ago
|
Keywords: reproducible
Assignee | ||
Comment 32•12 years ago
|
||
I see that we are sometimes deleting the compositor thread before we delete CompositorParent. When this happens the nsWindow are already deleted, so it is likely that something dispatches a NewRunnableFunction or NewRunnableMethod that takes the CompositorParent in parameter (this uses a ref ptr internally) which keeps it alive for too long.
I'm almost there
Target Milestone: --- → Firefox 17
Updated•12 years ago
|
Crash Signature: mozalloc_abort | js::ObjectImpl::markChildren]
[@ mozalloc_abort | js_NewStringCopyZ]
[@ mozalloc_abort | js_NewString]
[@ mozalloc_abort | js::TraceChildren]
[@ mozalloc_abort | __wrap_malloc]
[@ mozalloc_abort]
[@ libmozalloc.so@0x1704 ] → mozalloc_abort | js::ObjectImpl::markChildren]
[@ mozalloc_abort | js_NewStringCopyZ]
[@ mozalloc_abort | js_NewString]
[@ mozalloc_abort | js::TraceChildren]
[@ mozalloc_abort | __wrap_malloc]
[@ mozalloc_abort]
[@ libmozalloc.so@0x1704 ]
[@ mozal…
status-firefox18:
--- → affected
Version: Firefox 16 → Firefox 18
Updated•12 years ago
|
Crash Signature: mozalloc_abort | ProcessGeneric]
[@ mozalloc_abort | mozilla::dom::CSS2PropertiesBinding::set_borderImageRepeat]
[@ mozalloc_abort | exn_toSource]
[@ mozalloc_abort | browser.db (deleted)@0x156451]
[@ mozalloc_abort | browser.db (deleted)@0x356e6c]
[… → mozalloc_abort | ProcessGeneric ]
[@ mozalloc_abort | mozilla::dom::CSS2PropertiesBinding::set_borderImageRepeat ]
[@ mozalloc_abort | exn_toSource ]
[@ mozalloc_abort | browser.db (deleted)@0x156451 ]
[@ mozalloc_abort | browser.db (deleted)@0x356e6c…
Comment 33•12 years ago
|
||
Christian, it's a regression from 16.0, not from 18.0.
Version: Firefox 18 → Firefox 16
Assignee | ||
Comment 34•12 years ago
|
||
It seems that the problem is coming from the CompositorParent being kept alive longer than expected because of a dispatched Task, as I said earlier.
This patch fixes the crash on my nexus 7. I just pushed it to try.
What it does is it manages a reference count for the thread, so that when CompositorParent::DestroyThread is called it only decrements the ref count, and only when the ref count reaches zero we delete the thread. It turned out that managing a static int counter by hand required less code than creating a reference counted subclass of base::Thread so I did it by hand.
Assignee: jones.chris.g → nsilva
Attachment #656574 -
Flags: review?(jones.chris.g)
Reporter | ||
Updated•12 years ago
|
Crash Signature: (deleted)@0x356e6c ]
[@ mozalloc_abort | JSCompartment::sweepBreakpoints ] → (deleted)@0x356e6c ]
[@ mozalloc_abort | JSCompartment::sweepBreakpoints ]
[@ mozalloc_abort | libfreebl3.so (deleted)@0x20341 | XPCJSRuntime::Get]
Assignee | ||
Comment 35•12 years ago
|
||
oh well, ran into the cxx stackframe abort again with the patch applied, so I probably just made it less likely to happen :(
Chris, do you know if it is important that CompositorParent's destructor gets called before the compositor thread is destroyed? If so my patch still has some value...
Assignee | ||
Comment 36•12 years ago
|
||
I have (yet another) theory:
The abort that causes the crash is a piece of ipdl code that checks that a stack mCxxStackFrame is empty.
this stack is pusehd and poped only through RAII with CxxStackFrame (see RPCChannel.h) when sending and receiving messages, like this
RPCChannel::Send(Message* msg)
{
Message copy = *msg;
CxxStackFrame f(*this, OUT_MESSAGE, ©); // push
return AsyncChannel::Send(msg);
} // (RAII) pop
with a sync message we expect to see the following sequence (not always true actually):
Sender.RPCChannel.mCxxStackFrame.push(OUT_MESSAGE)
Receiver.RPCChannel.mCxxStackFrame.push(IN_MESSAGE)
// the message is handled on the receiver's thread
Receiver.RPCChannel.mCxxStackFrame.pop(IN_MESSAGE)
Sender.RPCChannel.mCxxStackFrame.pop(OUT_MESSAGE)
However, if the stack is poped after sending back to the sender the signal that the message has beene handled, it is actually possible that the sender thread start running again before the receiver thread reaches the end of the scope and pops its stack.
In our case the crash happens always in the destructor of CompositorParent very shortly after PCompositorParent::SendStop, because we dispatch a task on the main thread that holds a refPtr to the compositor parent, that calls SendStop and then returns. When the task returns CompositorParent is deleted because the task's RefPtr is destroyed.
so ~CompositorParent happens very shortly after SendStop.
if i printf_stderr something in ~CxxStackFrame just before poping the stack this makes it way easier to reproduce.
Chris, does it make sense? You know ipdl way better than me.
Is the CompositorParent setting itself to be destroyed during ActorDestroy? What's the expected destruction sequence.
Comment on attachment 656574 [details] [diff] [review]
Fixes fennec exit crash by reference counting the compositor thread.
Clearing review request since doesn't seem the entire fix.
Attachment #656574 -
Flags: review?(jones.chris.g)
Comment 39•12 years ago
|
||
Still crashing on the latest Nightly: https://crash-stats.mozilla.com/report/index/bp-a2f48d7a-8ba0-4eb2-8257-b07ed2120831
Crash Signature: (deleted)@0x356e6c ]
[@ mozalloc_abort | JSCompartment::sweepBreakpoints ]
[@ mozalloc_abort | libfreebl3.so (deleted)@0x20341 | XPCJSRuntime::Get] → (deleted)@0x356e6c ]
[@ mozalloc_abort | JSCompartment::sweepBreakpoints ]
[@ mozalloc_abort | libfreebl3.so (deleted)@0x20341 | XPCJSRuntime::Get]
[@ mozalloc_abort | UnmarkGrayChildren ]
Assignee | ||
Comment 40•12 years ago
|
||
When the widget is destroyed (in its dtor), CompositorChild first sends a WillStop message, telling the parent side that it has to get rid of its resources now. After SendWillStop, a task is dispatched on the main thread to defer the second phase of the destruction (it's the same 2-steps destruction sequence principle as ImageBridge's). This task is a RunnableFunction that executes DeferredDestroyCompositor (nsBaseWdget.cpp) which calls compositorChild->Destroy() and immediately after, loses the last reference to the CompositorParent and the CompositorChild which deletes them. Destroy() calles SendStop() which synchronously finishes the destruction sequence.
in short:
nsBaseWidget::dtor
| CompositorChild::SendWillStop [sync]
| | CompositorParent::RecvWillStop
| Dispatch DeferredDestroyCompositor
...
(on fennec it appears that we run into gfxPlatform::ShutDown here, in which case:)
gfxPlatform::ShutDown
| delete compositorThread
...
DeferredDestroyCompositor
| CompositorChild::SendStop [sync]
| | compsitorParent::RecvStop
| CompositorChild::dtor
| CompositorParent::dtor
I think we have at least two bugs here, the first being that the compositor thread can be destroyed way too early, so my first patch fixes this.
The second bug is that since CompositorParent::dtor (and therefore RPCChannel's dtor) is called immediately after SendStop, we have a race between the CompositorParent's destructor and the RPCChannel's poping its mCxxStackFrames on the other side.
Updated•12 years ago
|
Crash Signature: (deleted)@0x356e6c ]
[@ mozalloc_abort | JSCompartment::sweepBreakpoints ]
[@ mozalloc_abort | libfreebl3.so (deleted)@0x20341 | XPCJSRuntime::Get]
[@ mozalloc_abort | UnmarkGrayChildren ] → (deleted)@0x356e6c ]
[@ mozalloc_abort | JSCompartment::sweepBreakpoints ]
[@ mozalloc_abort | libfreebl3.so (deleted)@0x20341 | XPCJSRuntime::Get]
[@ mozalloc_abort | UnmarkGrayChildren ]
[@ mozalloc_abort | __swrite | libxul.so@0x10d3d37]
[@ mozall…
The parent and child worker threads each get their own set of stack frames. If we're sharing them somehow, that would be a fantastically bad bug! :/
Assignee | ||
Comment 42•12 years ago
|
||
They're not sharing their stack frames, but they both get deleted in the main thread at the end of DeferredDestroyCompositor. In our case we are having a problem with CompositorParent being destroyed on the main thread while its CxxStackFrames has not yet be poped on the compositor thread after a sync message from the main thread to the compositor thread.
Oh, so we're trying to delete the CompositorParent on the main thread while it's still active on the compositor thread? That's also very bad! :)
Comment 44•12 years ago
|
||
Linux is affected and the stack trace is not buggy. See https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort+|+NS_DebugBreak_P+|+mozilla%3A%3Aipc%3A%3ARPCChannel%3A%3ADebugAbort
Component: General → Graphics: Layers
OS: Android → All
Product: Firefox for Android → Core
Hardware: ARM → All
Target Milestone: Firefox 17 → ---
Version: Firefox 16 → 16 Branch
Assignee | ||
Comment 45•12 years ago
|
||
I have a patch in whice we dispatch a task on the Compositor thread from within CompositorParent::RecvStop in order to make sure that a nsRefPtr of the compositor parent stays there and keeps the compositor parent alive until the end of the ipdl code around RecvStop.
So this solves the problem of deleting the CompositorParent while it is still active on the compositor thread.
With this patch we then have the compositor thread's ref count that reaches zero within its own thread, so we end up having to dispatch a task on the main thread so that later deletes the compositor thread.
The destruction sequence is becoming quite messy:
When a widget is destroyed it triggers most of the compositor parent's clean up, and dispatches a task on the main thread to terminate the cleanup.
This is not extremely nice because we then have part of the compositor stuff being destroyed before the destruction of gfxPlatform, and the rest being destroyed after.
When the second half of the destruction happens, we need to have the compositor thread dispatch a task to himself to avoid deleting the compositor parent too soon. When we reach this task on the compositor thread we then need to dispatch a task from the compositor thread to the main thread so that the main thread deletes the compositor thread.
The way we end up dispatching tasks all over the place makes it hard for us too have guarantees that we destroy the modules in the right order. For instance, Chris, do we have a constraint on how soon we must have deleted the last object that uses ipdl?
Comment 46•12 years ago
|
||
(In reply to Nicolas Silva [:nical] from comment #45)
> When we reach this task on the compositor thread
> we then need to dispatch a task from the compositor thread to the main
> thread so that the main thread deletes the compositor thread.
I'm concerned with dispatching a task from the compositor thread to the main thread after we've destroyed the channel and at which points the main thread is tearing down the browser. We have no guarantee that the main thread will always be in a proper state to respond to this event and wont be far along it's shutdown sequence.
I think the right thing to do here is to destroy the associated compositor parent before the widget. To do this we need to find a way to make sure all the pending messages are processed.
Assignee | ||
Comment 48•12 years ago
|
||
This patch seems to fix it for me on fennec debug and optimized.
I agree with BenWa that fixing the problem at this level is probably not the best solution, yet we have to do something about this crash in the short term and fixing the entire destruction sequence may be very tricky.
Attachment #656574 -
Attachment is obsolete: true
Attachment #658588 -
Flags: review?(jones.chris.g)
Updated•12 years ago
|
Crash Signature: libxul.so@0x1148aa3 | NS_IsMainThread_P]
[@ mozalloc_abort | org.mozilla.fennec_aurora-2.apk@0xf24f40]
[@ mozalloc_abort | __swrite | libxul.so@0x1148aa3 | 0 (deleted)@0x11f911f] → libxul.so@0x1148aa3 | NS_IsMainThread_P]
[@ mozalloc_abort | org.mozilla.fennec_aurora-2.apk@0xf24f40]
[@ mozalloc_abort | __swrite | libxul.so@0x1148aa3 | 0 (deleted)@0x11f911f]
[@ mozalloc_abort | __swrite | libxul.so@0x1148b03]
[@ mozalloc_abort | …
Updated•12 years ago
|
Crash Signature: dalvik-bitmap-2 (deleted)@0x111f63]
[@ mozalloc_abort | org.mozilla.firefox_beta-1.apk@0x6d2038]
[@ mozalloc_abort | NS_DebugBreak_P | mozilla::ipc::RPCChannel::DebugAbort] → dalvik-bitmap-2 (deleted)@0x111f63]
[@ mozalloc_abort | org.mozilla.firefox_beta-1.apk@0x6d2038]
[@ mozalloc_abort | __swrite | libxul.so@0x10d3cf7]
[@ mozalloc_abort | org.mozilla.firefox_beta-2.apk@0x6d2038]
[@ mozalloc_abort | org.mozilla.firefox_b…
Comment on attachment 658588 [details] [diff] [review]
Fixes fennec exit crash by reference counting the compositor thread and making sure a CompositorParent is not deleted while running code on the compositor thread.
>diff --git a/gfx/layers/ipc/CompositorParent.cpp b/gfx/layers/ipc/CompositorParent.cpp
>+static int sCompositorThreadRefCount = 0;
Document the semantics of this.
>+static void KeepCompositorParentAlive(CompositorParent* aNowReadyToDie)
Call this DeferredDelete.
>+{
>+ aNowReadyToDie->Release();
>+}
>+
>+static void DeleteCompositorThread()
>+{
>+ if (NS_IsMainThread()){
>+ if (sCompositorThread) {
>+ delete sCompositorThread;
No need to null check, |delete| does the right thing.
> void CompositorParent::DestroyThread()
> {
>+ --sCompositorThreadRefCount;
>+ if(sCompositorThreadRefCount == 0) {
Nit: |if (0 == --sRefCount)|
> CompositorParent::~CompositorParent()
> {
>+ --sCompositorThreadRefCount;
>+ if (sCompositorThreadRefCount == 0) {
>+ DeleteCompositorThread();
>+ }
Don't duplicate this code. Refactor it into a helper function.
Attachment #658588 -
Flags: review?(jones.chris.g)
Assignee | ||
Comment 50•12 years ago
|
||
Attachment #658588 -
Attachment is obsolete: true
Attachment #660164 -
Flags: review?(jones.chris.g)
Comment on attachment 660164 [details] [diff] [review]
Fixes fennec exit crash by reference counting the compositor thread and making sure a CompositorParent is not deleted while running code on the compositor thread.
No need to re-request review for small changes like these.
Attachment #660164 -
Flags: review?(jones.chris.g) → review+
Assignee | ||
Comment 52•12 years ago
|
||
Comment 53•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
Comment 54•12 years ago
|
||
Please nominate for Aurora/Beta asap (when you're comfortable with the bake time on Nightly).
Updated•12 years ago
|
status-firefox18:
affected → ---
Comment 55•12 years ago
|
||
I cannot reproduce any @mozalloc_abort crashes on the latest Nightly build.
(In reply to Alex Keybl [:akeybl] from comment #54)
> Please nominate for Aurora/Beta asap (when you're comfortable with the bake
> time on Nightly).
This crashes are annoying because occur almost for each Firefox quit. Due to the high number of crashes both on m-a and m-b, I nominate this to Aurora and Beta. We cannot deliver an app which crashes much more frequently than it does any other released Betas so far.
status-firefox17:
--- → affected
status-firefox18:
--- → fixed
Updated•12 years ago
|
Updated•12 years ago
|
Attachment #660164 -
Flags: approval-mozilla-beta?
Attachment #660164 -
Flags: approval-mozilla-aurora?
Comment 56•12 years ago
|
||
(In reply to Cristian Nicolae (:xti) from comment #55)
> Due to the high number of crashes both on m-a and m-b, I nominate this to Aurora
> and Beta.
That is not exactly what Alex asked. It's instead: does this patch fix the issue AND not add regressions that are worse than the benefit?
You answered yes to the first part, what about the second part?
Status: RESOLVED → VERIFIED
Comment 57•12 years ago
|
||
(In reply to Scoobidiver from comment #56)
what about the second part?
So far the app is working fine. I tried different scenarios to reproduce this bug, but no crash occurred. Atm there isn't any noticeable regression, so I guess that this patch is great benefit.
Assignee | ||
Comment 58•12 years ago
|
||
I haven't had any problem since this patch so I feel comfortable having it landed on aurora and beta. What would worry me most is having another problem in the destruction sequence that we have not seen yet (the tear-down sequence of omtc is a mess), but it would have to be fixed on top of this anyway.
Comment 59•12 years ago
|
||
Comment on attachment 660164 [details] [diff] [review]
Fixes fennec exit crash by reference counting the compositor thread and making sure a CompositorParent is not deleted while running code on the compositor thread.
[Triage Comment]
Fix for a new top crash in FF16, which we believe is low risk based upon current testing. Approving for branches.
Attachment #660164 -
Flags: approval-mozilla-beta?
Attachment #660164 -
Flags: approval-mozilla-beta+
Attachment #660164 -
Flags: approval-mozilla-aurora?
Attachment #660164 -
Flags: approval-mozilla-aurora+
Updated•12 years ago
|
Crash Signature: org.mozilla.firefox_beta-1.apk@0x3d2038]
[@ mozalloc_abort | NS_DebugBreak_P | mozilla::ipc::RPCChannel::DebugAbort] → org.mozilla.firefox_beta-1.apk@0x3d2038]
[@ mozalloc_abort | NS_DebugBreak_P | mozilla::ipc::RPCChannel::DebugAbort]
[@ mozalloc_abort | __swrite | libxul.so@0x10d56b7]
[@ mozalloc_abort | __swrite | libxul.so@0x10d56b7 | 0 (deleted)@0x11f911f]
[@ moz…
Comment 60•12 years ago
|
||
Comment on attachment 660164 [details] [diff] [review]
Fixes fennec exit crash by reference counting the compositor thread and making sure a CompositorParent is not deleted while running code on the compositor thread.
Review of attachment 660164 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/ipc/CompositorParent.cpp
@@ +84,5 @@
> +
> +static void DeleteCompositorThread()
> +{
> + if (NS_IsMainThread()){
> + delete sCompositorThread;
Trailing whitespace
@@ +231,5 @@
> + // after this function returns while some ipdl code still needs to run on
> + // this thread.
> + // We must keep the compositor parent alive untill the code handling message
> + // reception is finished on this thread.
> + this->AddRef(); // Corresponds to DeferredDeleteCompositorParent's Release
NS_ADDREF_THIS() is more common.
@@ +232,5 @@
> + // this thread.
> + // We must keep the compositor parent alive untill the code handling message
> + // reception is finished on this thread.
> + this->AddRef(); // Corresponds to DeferredDeleteCompositorParent's Release
> + CompositorLoop()->PostTask(FROM_HERE,
More trailing whitespace
@@ +233,5 @@
> + // We must keep the compositor parent alive untill the code handling message
> + // reception is finished on this thread.
> + this->AddRef(); // Corresponds to DeferredDeleteCompositorParent's Release
> + CompositorLoop()->PostTask(FROM_HERE,
> + NewRunnableFunction(&DeferredDeleteCompositorParent,
Weird indentation
Assignee | ||
Comment 61•12 years ago
|
||
Assignee | ||
Comment 62•12 years ago
|
||
Updated•12 years ago
|
Comment 63•12 years ago
|
||
This crash doesn't occur anymore on the latest Aurora and Beta builds.
--
Device: Galaxy Note
OS: Android 4.0.4
You need to log in
before you can comment on or make changes to this bug.
Description
•