Closed Bug 860075 Opened 12 years ago Closed 12 years ago

[Inari] crash in mozilla::dom::bluetooth::BluetoothService::DistributeSignal

Categories

(Firefox OS Graveyard :: Bluetooth, defect)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

(blocking-b2g:tef+, firefox21 wontfix, firefox22 wontfix, firefox23 fixed, b2g18+ verified, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 verified)

RESOLVED FIXED
blocking-b2g tef+
Tracking Status
firefox21 --- wontfix
firefox22 --- wontfix
firefox23 --- fixed
b2g18 + verified
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- verified

People

(Reporter: marcia, Assigned: jhlin)

References

Details

(Keywords: crash, regression, Whiteboard: [b2g-crash][CR 476080][status: needs ETA][fixed-in-birch] QARegressExclude )

Crash Data

Attachments

(3 files, 2 obsolete files)

This bug was filed from the Socorro interface and is report bp-66798839-f9eb-45f8-a361-3c77f2130409 . ============================================================= Seen while running on Inari, using: Gecko http://hg.mozilla.org/releases/mozilla-b2g18/rev/cfef36c0c8bc Gaia a80be95f553de517e5d8a159e04511cda5e38be4 BuildID 20130409070205 Version 18.0 Not sure of the STR, but I was pairing and unpairing devices at the time of the crash. Will try to reproduce.
Component: General → Bluetooth
Whiteboard: [b2g-crash]
Assignee: nobody → echou
Crash Signature: [@ mozilla::dom::bluetooth::BluetoothService::DistributeSignal] → [@ mozilla::dom::bluetooth::BluetoothService::DistributeSignal] [@ DistributeBluetoothSignalTask::Run]
(In reply to Marcia Knous [:marcia] from comment #0) > This bug was filed from the Socorro interface and is > report bp-66798839-f9eb-45f8-a361-3c77f2130409 . > ============================================================= > > Seen while running on Inari, using: > > Gecko http://hg.mozilla.org/releases/mozilla-b2g18/rev/cfef36c0c8bc > Gaia a80be95f553de517e5d8a159e04511cda5e38be4 > BuildID 20130409070205 > Version 18.0 > > Not sure of the STR, but I was pairing and unpairing devices at the time of > the crash. Will try to reproduce. Problem confirmed, I saw this as well. I'll try to reproduce and fix it.
Not sure if this should be tef? or leo?, mark tef? first.
blocking-b2g: --- → tef?
Whiteboard: [b2g-crash] → [b2g-crash][CR 476080]
backtrace: #0 0x40fed2bc in mozilla::ObserverList<mozilla::dom::bluetooth::BluetoothSignal>::Broadcast ( this=<value optimized out>, aSignal=...) at ../../dist/include/mozilla/Observer.h:69 #1 mozilla::dom::bluetooth::BluetoothService::DistributeSignal (this=<value optimized out>, aSignal=...) at /home/eric30/Mozilla/mercurial/mozilla-b2g18/dom/bluetooth/BluetoothService.cpp:435 #2 0x40ff7c9a in DistributeBluetoothSignalTask::Run (this=0x48c891c0) at /home/eric30/Mozilla/mercurial/mozilla-b2g18/dom/bluetooth/linux/BluetoothDBusService.cpp:264 (gdb) l 64 65 void Broadcast(const T& aParam) { 66 uint32_t size = mObservers.Length(); 67 for (uint32_t i=0; i<size; ++i) { 68 mObservers[i]->Notify(aParam); 69 } 70 } 71 72 protected: (gdb) p mObservers Cannot access memory at address 0x0 (gdb) f 1 #1 mozilla::dom::bluetooth::BluetoothService::DistributeSignal (this=<value optimized out>, aSignal=...) at /home/eric30/Mozilla/mercurial/mozilla-b2g18/dom/bluetooth/BluetoothService.cpp:435 435 ol->Broadcast(aSignal); (gdb) p ol.mObservers $3 = {<nsTArray_base<nsTArrayDefaultAllocator>> = { mHdr = 0x48a78b20}, <nsTArray_SafeElementAtHelper<mozilla::Observer<mozilla::dom::bluetooth::BluetoothSignal>*, nsTArray<mozilla::Observer<mozilla::dom::bluetooth::BluetoothSignal>*, nsTArrayDefaultAllocator> >> = {<No data fields>}, <No data fields>}
I've found a series of steps which could reproduce this bug: (1) Go to Settings (2) Goto Bluetooth page very quickly (before the name of paired devices appears) (3) Turn on Bluetooth (4) Press Home key to be back to homescreen (5) Longpress Home key and kill Settings app (6) Go to Settings/Bluetooth again, crash happens. Do these steps as fast as you can.
mrbkap and I looked at this last week. Seems like mObservers[i]->Notify(aParam); can change mObservers[i]
(In reply to Eric Chou [:ericchou] [:echou] from comment #8) > I've found a series of steps which could reproduce this bug: > > (1) Go to Settings > (2) Goto Bluetooth page very quickly (before the name of paired devices > appears) > (3) Turn on Bluetooth > (4) Press Home key to be back to homescreen > (5) Longpress Home key and kill Settings app > (6) Go to Settings/Bluetooth again, crash happens. > > Do these steps as fast as you can. I recorded a video: https://www.dropbox.com/s/inutdrc518ett9a/Bug-860075.mp4
(In reply to Gregor Wagner [:gwagner] from comment #9) > mrbkap and I looked at this last week. Seems like > mObservers[i]->Notify(aParam); can change mObservers[i] Cool, lemme take a look. Thanks, Gregor.
Given the fact that this crash only happens when the user performs the STR "very quickly", we'll call this an edge case. Please nominate for uplift to v1.1 with a risk evaluation once completed.
blocking-b2g: tef? → -
tracking-b2g18: --- → +
(In reply to Alex Keybl [:akeybl] from comment #12) > Given the fact that this crash only happens when the user performs the STR > "very quickly", we'll call this an edge case. Please nominate for uplift to > v1.1 with a risk evaluation once completed. I renamed the device during receiving a file and after enabling/disabling it usually crashed so this had nothing todo with doing something quickly.
(In reply to Gregor Wagner [:gwagner] from comment #13) > (In reply to Alex Keybl [:akeybl] from comment #12) > > Given the fact that this crash only happens when the user performs the STR > > "very quickly", we'll call this an edge case. Please nominate for uplift to > > v1.1 with a risk evaluation once completed. > > I renamed the device during receiving a file and after enabling/disabling it > usually crashed so this had nothing todo with doing something quickly. Gregor is right. The STR I found is just a way relatively easy to reproduce, but it's not neccesary to do every steps quickly to make this happen. See bug 861008, bug 860732, bug 861762, bug 861856 for more information. These are dups. Re-nom as tef+.
blocking-b2g: - → tef?
The set of dupes makes clear that this recent crash regression can be triggered easily across many common usage scenarios. Blocking+.
blocking-b2g: tef? → tef+
Keywords: regression
Eric, how long for a fix for this?
Whiteboard: [b2g-crash][CR 476080] → [b2g-crash][CR 476080][status: needs ETA]
(In reply to Dietrich Ayala (:dietrich) from comment #16) > Eric, how long for a fix for this? I investigated yseterday but didn't find the root cause. Now I'm on bug 860166 because of my director CJ's request. I'll send a patch for 860166 today and back to this bug tomorrow, so feel free to assign this bug to others if necessary.
(In reply to Eric Chou [:ericchou] [:echou] from comment #10) > (In reply to Eric Chou [:ericchou] [:echou] from comment #8) > > I've found a series of steps which could reproduce this bug: > > > > (1) Go to Settings > > (2) Goto Bluetooth page very quickly (before the name of paired devices > > appears) > > (3) Turn on Bluetooth > > (4) Press Home key to be back to homescreen > > (5) Longpress Home key and kill Settings app > > (6) Go to Settings/Bluetooth again, crash happens. > > > > Do these steps as fast as you can. > > I recorded a video: https://www.dropbox.com/s/inutdrc518ett9a/Bug-860075.mp4 For this specific case, the root cause is that Settings/Bluetooth app registers for same Bluetooth signal twice but when it's killed only one record is removed, and the dangling one causes SIGSEGV later. code: http://mxr.mozilla.org/mozilla-b2g18/source/dom/bluetooth/BluetoothService.cpp 380 PLDHashOperator 381 RemoveAllSignalHandlers(const nsAString& aKey, 382 nsAutoPtr<BluetoothSignalObserverList>& aData, 383 void* aUserArg) 384 { // Same observer(BluetoothParent instance) could have been registered more than once, // but only one will be removed in next line. 385 aData->RemoveObserver(static_cast<BluetoothSignalObserver*>(aUserArg)); 386 return aData->Length() ? PL_DHASH_NEXT : PL_DHASH_REMOVE; 387 } 388 // The following function is called to remove -all- records of given observer // when a child(in this case, Settings/Bluetooth app) is killed. 389 void 390 BluetoothService::UnregisterAllSignalHandlers(BluetoothSignalObserver* aHandler) 391 { 392 MOZ_ASSERT(NS_IsMainThread()); 393 MOZ_ASSERT(aHandler); 394 395 mBluetoothSignalObserverTable.Enumerate(RemoveAllSignalHandlers, aHandler); 396 }
tracking-b2g18: + → ---
Good catch, John!
Attachment #738878 - Flags: review?(gyeh)
Attachment #738878 - Flags: review?(anygregor)
Attachment #738878 - Flags: feedback?(jolin)
(In reply to Eric Chou [:ericchou] [:echou] from comment #19) > Created attachment 738878 [details] [diff] [review] > patch 1: v1: Fix the crash caused by accessing invalid memory > > Good catch, John! Forgot to mention, I've done monkey test for about half hour this morning and no crash happened.
Hm that looks like a hack :( Could we avoid to register the same observers? Blake also had a good point: Do we notify these observers twice?
I checked the implementation and found we did notify these observers twice :( Discussed with John and Eric, try to figure out a solution.
(In reply to Gregor Wagner [:gwagner] from comment #21) > Hm that looks like a hack :( Could we avoid to register the same observers? > Blake also had a good point: Do we notify these observers twice? Well, it mighe be a kind of workaround, but it's simple and safe. :| BluetoothParent instance would be registerd more than once when a child process has more than one BluetoothAdapter(or BluetoothDevice). And, you guys are right. We should be able to avoid this by modifying BluetoothParent or BluetoothServiceChildProcess. John will take over this.
Assignee: echou → jolin
Attached patch Fix SIGSEGV crash (obsolete) (deleted) — Splinter Review
Attachment #738878 - Attachment is obsolete: true
Attachment #738878 - Flags: review?(gyeh)
Attachment #738878 - Flags: review?(anygregor)
Attachment #738878 - Flags: feedback?(jolin)
Attachment #738989 - Flags: review?(gyeh)
Attachment #738989 - Flags: review?(echou)
Attachment #738989 - Flags: review?(anygregor)
(In reply to John Lin[:jolin][:jhlin] from comment #24) > Created attachment 738989 [details] [diff] [review] > Fix SIGSEGV crash About the patch: the changes in BluetoothServiceChildProcess address the dup registration and the correctness of unregistration using (kind of) ref counting approach. Some assertions were also added in BluetoothService to make sure there was no dup. To be honest I think ObserverList should provide some way to detect this kind of problem (is there?), or prevent it from even happening.
Comment on attachment 738989 [details] [diff] [review] Fix SIGSEGV crash Review of attachment 738989 [details] [diff] [review]: ----------------------------------------------------------------- Looks good to me but I don't know the code well enough. ::: dom/bluetooth/ipc/BluetoothServiceChildProcess.cpp @@ +93,2 @@ > gBluetoothChild->SendUnregisterSignalHandler(nsString(aNodeName)); > } Why does the order matter here?
(In reply to Gregor Wagner [:gwagner] from comment #26) > Comment on attachment 738989 [details] [diff] [review] > Fix SIGSEGV crash > > Review of attachment 738989 [details] [diff] [review]: > ----------------------------------------------------------------- > > Looks good to me but I don't know the code well enough. > > ::: dom/bluetooth/ipc/BluetoothServiceChildProcess.cpp > @@ +93,2 @@ > > gBluetoothChild->SendUnregisterSignalHandler(nsString(aNodeName)); > > } > > Why does the order matter here? Do you mean why moving "BluetoothService::Unregister..." above "if..."? It's in that order to make IsSignalRegistered() returns false after the last handler is removed.
Comment on attachment 738989 [details] [diff] [review] Fix SIGSEGV crash Review of attachment 738989 [details] [diff] [review]: ----------------------------------------------------------------- This should work, r+ with nits addressed. ::: dom/bluetooth/BluetoothService.cpp @@ +367,5 @@ > > BluetoothSignalObserverList* ol; > if (mBluetoothSignalObserverTable.Get(aNodeName, &ol)) { > ol->RemoveObserver(aHandler); > + // It would be better if we can do duplication check when registering, but ObserverList doesn't support that. nit: 80 chars per line, please. Here and below. ::: dom/bluetooth/ipc/BluetoothServiceChildProcess.h @@ +187,5 @@ > PrepareAdapterInternal(const nsAString& aPath) MOZ_OVERRIDE; > + > + bool > + IsSignalRegistered(const nsAString& aNodeName) { > + return !!mBluetoothSignalObserverTable.Get(aNodeName); nit: since it's an one-line function, please make it inline, or could we simply use mBluetoothSignalObserverTable.Get(aNodeName) in BluetoothService.cpp?
Attachment #738989 - Flags: review?(echou) → review+
Comment on attachment 738989 [details] [diff] [review] Fix SIGSEGV crash Review of attachment 738989 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/bluetooth/BluetoothService.cpp @@ +368,5 @@ > BluetoothSignalObserverList* ol; > if (mBluetoothSignalObserverTable.Get(aNodeName, &ol)) { > ol->RemoveObserver(aHandler); > + // It would be better if we can do duplication check when registering, but ObserverList doesn't support that. > + MOZ_ASSERT(!ol->RemoveObserver(aHandler)); The comment is a bit unclear to me. Here's my understanding for this assertion: We shouldn't have duplicate instances in the ObserverList, but there's no appropriate way to do duplication check while registering, so assertions are added here. Am I right? @@ +385,5 @@ > void* aUserArg) > { > aData->RemoveObserver(static_cast<BluetoothSignalObserver*>(aUserArg)); > + // It would be better if we can do duplication check when registering, but ObserverList doesn't support that. > + MOZ_ASSERT(!aData->RemoveObserver(static_cast<BluetoothSignalObserver*>(aUserArg))); How about making a variable for |static_cast<BluetoothSignalObserver*>|? Then we can re-use it in the assertion.
(In reply to Eric Chou [:ericchou] [:echou] from comment #28) > Comment on attachment 738989 [details] [diff] [review] > Fix SIGSEGV crash > > Review of attachment 738989 [details] [diff] [review]: > ----------------------------------------------------------------- > > This should work, r+ with nits addressed. > > ::: dom/bluetooth/BluetoothService.cpp > @@ +367,5 @@ > > > > BluetoothSignalObserverList* ol; > > if (mBluetoothSignalObserverTable.Get(aNodeName, &ol)) { > > ol->RemoveObserver(aHandler); > > + // It would be better if we can do duplication check when registering, but ObserverList doesn't support that. > > nit: 80 chars per line, please. Here and below. > Okay, I will fix it. > ::: dom/bluetooth/ipc/BluetoothServiceChildProcess.h > @@ +187,5 @@ > > PrepareAdapterInternal(const nsAString& aPath) MOZ_OVERRIDE; > > + > > + bool > > + IsSignalRegistered(const nsAString& aNodeName) { > > + return !!mBluetoothSignalObserverTable.Get(aNodeName); > > nit: since it's an one-line function, please make it inline, or could we > simply use mBluetoothSignalObserverTable.Get(aNodeName) in > BluetoothService.cpp? I thought defining a function inside the class definition makes it implicitly inline?
> > ::: dom/bluetooth/ipc/BluetoothServiceChildProcess.h > > @@ +187,5 @@ > > > PrepareAdapterInternal(const nsAString& aPath) MOZ_OVERRIDE; > > > + > > > + bool > > > + IsSignalRegistered(const nsAString& aNodeName) { > > > + return !!mBluetoothSignalObserverTable.Get(aNodeName); > > > > nit: since it's an one-line function, please make it inline, or could we > > simply use mBluetoothSignalObserverTable.Get(aNodeName) in > > BluetoothService.cpp? > > I thought defining a function inside the class definition makes it > implicitly inline? Oh, you're right. :)
Comment on attachment 738989 [details] [diff] [review] Fix SIGSEGV crash Review of attachment 738989 [details] [diff] [review]: ----------------------------------------------------------------- r+ with nits addressed.
Attachment #738989 - Flags: review?(gyeh) → review+
(In reply to Gina Yeh [:gyeh] [:ginayeh] from comment #29) > Comment on attachment 738989 [details] [diff] [review] > Fix SIGSEGV crash > > Review of attachment 738989 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: dom/bluetooth/BluetoothService.cpp > @@ +368,5 @@ > > BluetoothSignalObserverList* ol; > > if (mBluetoothSignalObserverTable.Get(aNodeName, &ol)) { > > ol->RemoveObserver(aHandler); > > + // It would be better if we can do duplication check when registering, but ObserverList doesn't support that. > > + MOZ_ASSERT(!ol->RemoveObserver(aHandler)); > > The comment is a bit unclear to me. > > Here's my understanding for this assertion: We shouldn't have duplicate > instances in the ObserverList, but there's no appropriate way to do > duplication check while registering, so assertions are added here. > > Am I right? Yes. That's much more clear than my comments. I will update them using your words (with your permission, of course). > > @@ +385,5 @@ > > void* aUserArg) > > { > > aData->RemoveObserver(static_cast<BluetoothSignalObserver*>(aUserArg)); > > + // It would be better if we can do duplication check when registering, but ObserverList doesn't support that. > > + MOZ_ASSERT(!aData->RemoveObserver(static_cast<BluetoothSignalObserver*>(aUserArg))); > > How about making a variable for |static_cast<BluetoothSignalObserver*>|? > Then we can re-use it in the assertion. Hmm... Don't really know about the benifit here (enlighten me please?). I'll do it anyway to please you. :)
Attachment #738989 - Flags: review?(anygregor)
Attachment #738989 - Attachment is obsolete: true
Attached patch Patch for m-c. r=echou, r=gyeh (deleted) — Splinter Review
Whiteboard: [b2g-crash][CR 476080][status: needs ETA] → [b2g-crash][CR 476080][status: needs ETA][fixed-in-birch]
This is the patch for v1_0_1
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Verified fixed on Inari the ability to successfully execute the STR in comment #8 without any crashes/issues. Inari Build ID: 20130429070204 Kernel Date: Feb 21 Gecko: http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/45aa5ba0ed53 Gaia: cf2d4136f0ebc66039637fdbeb72ed184dfbc0f2 Verified fixed on Leo the ability to successfully execute the STR in comment #8 without any crashes/issues. Leo Build ID: 20130426070204 Kernel Date: Mar 15 Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/6c2493de1441 Gaia: c9046a7acef33328977840176ff5574720d2c74c
Marking as QARegressExclude. Can not verify for firefox23.
Whiteboard: [b2g-crash][CR 476080][status: needs ETA][fixed-in-birch] → [b2g-crash][CR 476080][status: needs ETA][fixed-in-birch] QARegressExclude
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: