Closed
Bug 1217373
Opened 9 years ago
Closed 9 years ago
[Presentation WebAPI] Avoid B2G crash due to potential excessive releases in PresentationSessionTransport
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(blocking-b2g:2.5+, firefox45 fixed, b2g-v2.5 fixed)
People
(Reporter: selin, Assigned: selin)
References
Details
(Whiteboard: [ft:conndevices][partner-blocker])
Attachments
(2 files, 1 obsolete file)
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
jdm
:
review+
|
Details | Diff | Splinter Review |
When it comes to establish a new presentation connection to a receiving device (Nexus) while an existing one is still connected, somehow the B2G process on the receiving device crashes due to the following cause:
1. |PresentationSessionTransport| has some attributes [1] which may hold references to |PresentationSessionTransport| as listeners and thus result in reference cycles.
2. When one of these objects (for example, A) is about to release, it calls the destructor of the |PresentationSessionTransport| object. Then the |PresentationSessionTransport| object follows the reference cycle and calls A's destructor again. That eventually results in an attempt to destroy a lock which has already been destroyed. [2]
[1] https://dxr.mozilla.org/mozilla-central/source/dom/presentation/PresentationSessionTransport.h#86-97
[2] https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/Mutex.h#60
Assignee | ||
Updated•9 years ago
|
Blocks: 2-UA_Presentation_API
Whiteboard: [ft:conndevices]
Assignee | ||
Comment 1•9 years ago
|
||
Update |PresentationSessionTransport|'s destructor to break potential reference cycles.
Comment 2•9 years ago
|
||
Is PresentationSessionTransport's destructor being called manually? I don't understand how this can happen otherwise, if it's participating in a reference cycle.
Assignee | ||
Comment 3•9 years ago
|
||
(In reply to Josh Matthews [:jdm] from comment #2)
> Is PresentationSessionTransport's destructor being called manually? I don't
> understand how this can happen otherwise, if it's participating in a
> reference cycle.
|NS_IMPL_RELEASE| macro appears to trigger the destructor [1].
For example, |PresentationSessionTransport| object has a member |mInputStreamPump| [2], which also hold the |PresentationSessionTransport| object as its listener. And when |nsInputStreamPump::OnStateStop| [3] is called (according to the callstack [4]), somehow these objects, possibly also involved with some other members in [2], probably indirectly reference to the same lock. So b2g crashes when the lock is about to be destroyed at the second time (according to the attached backtrace.txt) in a series of nested object release calls.
So the patch is to avoid the nested destroying calls by using local smart pointers to temporarily hold these objects and then unset the member variables in |PresentationSessionTransport|'s destructor.
[1] https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/nsISupportsImpl.h#642
[2] https://dxr.mozilla.org/mozilla-central/source/dom/presentation/PresentationSessionTransport.h#86-97
[3] https://dxr.mozilla.org/mozilla-central/source/netwerk/base/nsInputStreamPump.cpp#672
[4] The callstack:
#0 nsAsyncStreamCopier::~nsAsyncStreamCopier (this=this@entry=0xa51f1f10, __in_chrg=<optimized out>) at /home/rexboy/B2G-nexus/gecko/netwerk/base/nsAsyncStreamCopier.cpp:78
#1 0xb27d2dc2 in nsAsyncStreamCopier::Release (this=0xa51f1f10) at /home/rexboy/B2G-nexus/gecko/netwerk/base/nsAsyncStreamCopier.cpp:140
#2 0xb2724212 in nsCOMPtr<nsIStackFrame>::~nsCOMPtr (this=0xa51f4500, __in_chrg=<optimized out>) at ../../../../dist/include/nsCOMPtr.h:405
#3 0xb390c184 in mozilla::dom::PresentationSessionTransport::~PresentationSessionTransport (this=0xa51f44c0, __in_chrg=<optimized out>)
at /home/rexboy/B2G-nexus/gecko/dom/presentation/PresentationSessionTransport.cpp:79
#4 0xb390c24a in mozilla::dom::PresentationSessionTransport::Release (this=0xa51f44c0) at /home/rexboy/B2G-nexus/gecko/dom/presentation/PresentationSessionTransport.cpp:64
#5 0xb2724284 in nsCOMPtr<nsIStackFrame>::assign_assuming_AddRef (this=0xa97bc41c, aNewPtr=<optimized out>) at ../../../../dist/include/nsCOMPtr.h:380
#6 0xb27e6e0c in assign_with_AddRef (aRawPtr=0x0, this=0xa97bc41c) at ../../dist/include/nsCOMPtr.h:1066
#7 operator= (aRhs=0x0, this=0xa97bc41c) at ../../dist/include/nsCOMPtr.h:578
#8 nsInputStreamPump::OnStateStop (this=this@entry=0xa97bc400) at /home/rexboy/B2G-nexus/gecko/netwerk/base/nsInputStreamPump.cpp:720
#9 0xb27ea356 in nsInputStreamPump::OnInputStreamReady (this=0xa97bc400, stream=<optimized out>) at /home/rexboy/B2G-nexus/gecko/netwerk/base/nsInputStreamPump.cpp:436
#10 0xb276d8c8 in nsInputStreamReadyEvent::Run (this=0xa4bff920) at /home/rexboy/B2G-nexus/gecko/xpcom/io/nsStreamUtils.cpp:91
#11 0xb277e870 in nsThread::ProcessNextEvent (this=0xb606e420, aMayWait=<optimized out>, aResult=0xbee365e7) at /home/rexboy/B2G-nexus/gecko/xpcom/threads/nsThread.cpp:972
#12 0xb279ad46 in NS_ProcessNextEvent (aThread=0xb606e420, aMayWait=aMayWait@entry=false) at /home/rexboy/B2G-nexus/gecko/xpcom/glue/nsThreadUtils.cpp:297
#13 0xb297ebb8 in mozilla::ipc::MessagePump::Run (this=0xb6038650, aDelegate=0xb6071280) at /home/rexboy/B2G-nexus/gecko/ipc/glue/MessagePump.cpp:95
#14 0xb2963dd8 in MessageLoop::RunInternal (this=this@entry=0xb6071280) at /home/rexboy/B2G-nexus/gecko/ipc/chromium/src/base/message_loop.cc:234
#15 0xb2963df2 in RunHandler (this=0xb6071280) at /home/rexboy/B2G-nexus/gecko/ipc/chromium/src/base/message_loop.cc:227
#16 MessageLoop::Run (this=0xb6071280) at /home/rexboy/B2G-nexus/gecko/ipc/chromium/src/base/message_loop.cc:201
#17 0xb392b66a in nsBaseAppShell::Run (this=0xae85f9a0) at /home/rexboy/B2G-nexus/gecko/widget/nsBaseAppShell.cpp:156
#18 0xb3e0674c in nsAppStartup::Run (this=0xae883100) at /home/rexboy/B2G-nexus/gecko/toolkit/components/startup/nsAppStartup.cpp:281
#19 0xb3e3b1b8 in XREMain::XRE_mainRun (this=this@entry=0xbee367a0) at ../../../gecko/toolkit/xre/nsAppRunner.cpp:4298
#20 0xb3e3b5a2 in XREMain::XRE_main (this=this@entry=0xbee367a0, argc=argc@entry=1, argv=argv@entry=0xb6003320, aAppData=aAppData@entry=0xb6f76b88 <_ZL8sAppData>)
at ../../../gecko/toolkit/xre/nsAppRunner.cpp:4391
#21 0xb3e3b786 in XRE_main (argc=1, argv=0xb6003320, aAppData=0xb6f76b88 <_ZL8sAppData>, aFlags=<optimized out>) at ../../../gecko/toolkit/xre/nsAppRunner.cpp:4493
#22 0xb6f57f26 in do_main (argc=argc@entry=1, argv=argv@entry=0xb6003320) at ../../../gecko/b2g/app/nsBrowserApp.cpp:167
#23 0xb6f58098 in b2g_main (argc=1, argv=<optimized out>) at ../../../gecko/b2g/app/nsBrowserApp.cpp:299
#24 0xb6f57da6 in RunProcesses (aReservedFds=..., argv=0xbee37a84, argc=1) at ../../../gecko/b2g/app/B2GLoader.cpp:232
#25 main (argc=1, argv=0xbee37a84) at ../../../gecko/b2g/app/B2GLoader.cpp:297
Comment 4•9 years ago
|
||
Comment on attachment 8677406 [details] [diff] [review]
Patch, v1
This is not the right fix, and I think it only works by accident. Based on the backtrace from the crash, we're crashing when proxying the final release of PresentationSessionTransport to the main thread (due to the use of SetEventSink with a non-null target in PresentationSessionTransport::InitWithChannelDescription). As seen in the callstack in the previous comment, this is a problem because we've already run the destructor during that OnStreamStop call. I still don't understand why this is occurring, however - nsTransportEventSinkProxy (which is created in SetEventSink) correctly holds a reference to the sink (PresentationSessionTransport, in this case), so there's no reason that PresentationSessionTransport's destructor should be called before the nsAsyncStreamCopier's destructor.
Are there any non-smart pointer reference counting operations being performed on PresentationSessionTransport (ie. ptr->AddRef()/ptr->Release() or NS_ADDREF(ptr)/NS_RELEASE(ptr))? The parts I've looked and you've explained don't add to me. There are definitely memory cycles, but those should be causing leaks instead of crashes. The correct way to fix the cycles is to make PresentationSessionTransport cycle-collected, instead of trying to manually break cycles in the destructor (see http://mxr.mozilla.org/mozilla-central/source/dom/network/TCPSocket.cpp?force=1#100 for how TCPSocket does it).
Attachment #8677406 -
Flags: review?(josh) → review-
Updated•9 years ago
|
Whiteboard: [ft:conndevices] → [ft:conndevices][partner-blocker]
Comment 5•9 years ago
|
||
Here is the STR for this issue:
0. Fennec with presentation API video sharing patch and Firefox OS debug build + TV Gaia
1. initiate video sharing from Fennec to Firefox OS
2. reload web page on Fennec
3. initiate video sharing again
Assignee | ||
Comment 6•9 years ago
|
||
Make PresentationSessionTransport cycle collectable.
Attachment #8677406 -
Attachment is obsolete: true
Attachment #8680515 -
Flags: review?(josh)
Comment 7•9 years ago
|
||
Comment on attachment 8680515 [details] [diff] [review]
Patch, v2
Review of attachment 8680515 [details] [diff] [review]:
-----------------------------------------------------------------
Does this fix the crash?
Attachment #8680515 -
Flags: review?(josh) → review+
Assignee | ||
Comment 8•9 years ago
|
||
(In reply to Josh Matthews [:jdm] from comment #7)
> Comment on attachment 8680515 [details] [diff] [review]
> Patch, v2
>
> Review of attachment 8680515 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Does this fix the crash?
Yeah. We've verified that the crash no longer happens on the real device where the issue was observed.
Assignee | ||
Comment 9•9 years ago
|
||
The latest try-run. FYI.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=759123a196d1
Keywords: checkin-needed
Comment 10•9 years ago
|
||
Keywords: checkin-needed
Comment 11•9 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox45:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → FxOS-S11 (13Nov)
Updated•9 years ago
|
blocking-b2g: --- → 2.5+
Comment 12•9 years ago
|
||
bugherder uplift |
status-b2g-v2.5:
--- → fixed
Comment 13•9 years ago
|
||
removing the b2g 2.5 flag since this commit has been reverted due to an incorrect merge, sorry for the confusion
status-b2g-v2.5:
fixed → ---
Comment 14•9 years ago
|
||
bugherder uplift |
status-b2g-v2.5:
--- → fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•