Closed Bug 1020227 Opened 10 years ago Closed 10 years ago

Deadlock in opensl_stream_destroy

Categories

(Core :: Audio/Video, defect)

ARM
Android
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: jwwang, Assigned: snorp)

References

Details

Attachments

(1 file, 4 obsolete files)

https://tbpl.mozilla.org/php/getParsedLog.php?id=41017124&tree=Try&full=1#error0 thread 43 (state machine thread): MediaDecoderStateMachine waiting for audio loop thread to finish. thread 46 (audio loop thread): opensl_stream_destroy stuck in |(*stm->playerObj)->Destroy(stm->playerObj)| (https://hg.mozilla.org/try/file/0b3d7df6ae30/media/libcubeb/src/cubeb_opensl.c#l570) thread 47 (opensl thread): bufferqueue_callback stuck in |(*stm->bufq)->Enqueue(stm->bufq, buf, written * stm->framesize)| (https://hg.mozilla.org/try/file/0b3d7df6ae30/media/libcubeb/src/cubeb_opensl.c#l105) It looks like a deadlock for |libc.so + 0xc738| is the address PR_WaitCondVar will call into. (see thread 15 and 16). When this deadlock happens, the state machine thread will wait indefinitely for the audio loop thread to finish. Since all MediaDecoderStateMachines share the same state machine thread, all subsequent tests will get stuck too. This might account for bug 904231 where 4 test timeouts happened in a row. Btw, I tried to wait for the buffer queue to finish before destroying the player object as recommended in the sample code of OpenSL_ES_Specification_1.0.1.pdf, timeouts still happened only with much lower chance (one in 200 runs). (see: https://tbpl.mozilla.org/?tree=Try&rev=de1e6abaf2ab)
Hi Star, Is it possible to enable debug symbols to know where it actually gets stuck in libmedia.so?
Flags: needinfo?(scheng)
Blocks: 904231
(In reply to JW Wang [:jwwang] from comment #1) > Hi Star, > > Is it possible to enable debug symbols to know where it actually gets stuck > in libmedia.so? I am not sure because I have never done that. Maybe you can try to use "SL_LOGD" [1] to dump log in the Enqueue() function [2]. [1] for example : SL_LOGD("Leaving %.*s::%s", (int) (underscore - function), function, &underscore[1]); [2] http://androidxref.com/4.3_r2.1/xref/frameworks/wilhelm/src/itf/IBufferQueue.c#46
Flags: needinfo?(scheng)
https://tbpl.mozilla.org/?tree=Try&rev=23d896405f59 One failure in 300 runs. https://tbpl.mozilla.org/php/getParsedLog.php?id=41096832&tree=Try&full=1 However, we still have the deadlock. Thread 55 was still doing something about libOpenSLES while we were destroying the player in thread Thread 53. I wonder if "SL_ENGINEOPTION_THREADSAFE" really takes effect. Hi Star, Can you help check what is the correct way to destroy a player object?
Flags: needinfo?(scheng)
Hi Matthew, Per described in comment 0, do you have any idea about the deadlock?
Flags: needinfo?(kinetik)
I'm not super familiar with OpenSL, sorry. For the patch you've pushed to try, we don't want to be forced to play whatever audio has been queued before destroying a stream. Can we use Clear() on the BufferQueue instead?
Flags: needinfo?(kinetik)
(In reply to Matthew Gregan [:kinetik] from comment #5) > For the patch you've pushed to try, we don't want to be forced to play > whatever audio has been queued before destroying a stream. Can we use > Clear() on the BufferQueue instead? Good idea. I was worrying the client might try to read stream position while shutting down and clearing the buffer queue will reset the stream position. However MediaDecoderStateMachine will hold the lock before calling AudioStream::Shutdown, I think it is fine to clear the buffer queue.
What are the thoughts around disabling the OpenSL backend on 2.3 and falling back to audiotrack
Flags: needinfo?(jwwang)
Good idea. I'm fine with doing this, FWIW.
https://groups.google.com/forum/#!searchin/android-ndk/opensl/android-ndk/G7dLKAGGL28/AndXjfKPjc4J Someone hit this issue on KitKat. I am not sure if we are just lucky not having this problem on Fennec for Android 4.0+. I am trying to sleep for 50ms before destroying the player object. If that does work, we might be able to get around this problem by destroying the player object in another thread where we can sleep for a while without blocking opensl_stream_destroy(). Any thoughts?
Flags: needinfo?(jwwang) → needinfo?(kinetik)
Comment on attachment 8445443 [details] [diff] [review] Disable OpenSL on Android 2.3 and lower Review of attachment 8445443 [details] [diff] [review]: ----------------------------------------------------------------- ::: media/libcubeb/src/cubeb_opensl.c @@ +28,5 @@ > +{ > + static int (*property_get)(const char*, char*, const char*) = NULL; > + > + if (!property_get) { > + void* library = dlopen("libcutils.so", RTLD_LAZY); Where's the dlclose? @@ +172,5 @@ > > +static int > +get_android_version(void) > +{ > + char version_string[4] = { 0, 0, 0, 0 }; Maybe use PROPERTY_VALUE_MAX just in case something silly happens to these numbers in the future. @@ +177,5 @@ > + > + int len = android_property_get("ro.build.version.sdk", version_string, 0); > + if (len == 0) { > + LOG("Failed to get Android version!\n"); > + return 0; Is zero the right value here? It'll be treated as a valid version, which isn't logical.
Attachment #8445443 - Flags: review-
(In reply to Gian-Carlo Pascutto [:gcp] from comment #14) > Comment on attachment 8445443 [details] [diff] [review] > Disable OpenSL on Android 2.3 and lower > > Review of attachment 8445443 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: media/libcubeb/src/cubeb_opensl.c > @@ +28,5 @@ > > +{ > > + static int (*property_get)(const char*, char*, const char*) = NULL; > > + > > + if (!property_get) { > > + void* library = dlopen("libcutils.so", RTLD_LAZY); > > Where's the dlclose? I think we can "leak" it here. We're only doing this once, ever, and really don't have a good place to close it. I guess we could open/close it in each cubeb_opensl instance, but that seems unnecessary to me. > > @@ +172,5 @@ > > > > +static int > > +get_android_version(void) > > +{ > > + char version_string[4] = { 0, 0, 0, 0 }; > > Maybe use PROPERTY_VALUE_MAX just in case something silly happens to these > numbers in the future. Yeah, definitely. > > @@ +177,5 @@ > > + > > + int len = android_property_get("ro.build.version.sdk", version_string, 0); > > + if (len == 0) { > > + LOG("Failed to get Android version!\n"); > > + return 0; > > Is zero the right value here? It'll be treated as a valid version, which > isn't logical. It's lazy, I guess, but I think it's fine to make the caller do some validation on the number. In this case it must be > 0 to be valid. I did notice that the caller to this checks for >= 0, which is of course wrong. Fixed that.
Attached patch Disable OpenSL on Android 2.3 and lower (obsolete) (deleted) — Splinter Review
Attachment #8445443 - Attachment is obsolete: true
Attachment #8445443 - Flags: feedback?(jwwang)
Attachment #8445957 - Flags: review?(gpascutto)
Comment on attachment 8445957 [details] [diff] [review] Disable OpenSL on Android 2.3 and lower Review of attachment 8445957 [details] [diff] [review]: ----------------------------------------------------------------- I'd dlclose in the get_properties function. By your own logic it's only called once. Then again, we only lose a tiny bit of address space due to the superfluous dlopen, I guess. ::: media/libcubeb/src/cubeb_opensl.c @@ +22,5 @@ > +#define ANDROID_VERSION_GINGERBREAD 10 > + > +#ifdef MOZ_WIDGET_ANDROID > + > +// Obtained from Android's properties.h y u no include sys/system_properties.h from the NDK which has PROP_VALUE_MAX as properties.h explains @@ +177,5 @@ > +get_android_version(void) > +{ > + char version_string[PROPERTY_VALUE_MAX]; > + > + bzero(version_string, PROPERTY_VALUE_MAX); nit: bzero is deprecated and on Android just calls memset
Attachment #8445957 - Flags: review?(gpascutto) → review+
Attached patch Disable OpenSL on Android 2.3 and lower (obsolete) (deleted) — Splinter Review
I suck.
Attachment #8445957 - Attachment is obsolete: true
Attachment #8445977 - Flags: review?(gpascutto)
Attached patch Disable OpenSL on Android 2.3 and lower (obsolete) (deleted) — Splinter Review
I didn't even know system_properties.h was in the NDK! We can remove all the dlopen junk and (hopefully) use the same code on b2g. Not sure if b2g has that header, though. Try run here: https://tbpl.mozilla.org/?tree=Try&rev=0c0bbba455ea
Attachment #8445977 - Attachment is obsolete: true
Attachment #8445977 - Flags: review?(gpascutto)
Attachment #8445991 - Flags: review?(gpascutto)
(In reply to JW Wang [:jwwang] from comment #13) > https://groups.google.com/forum/#!searchin/android-ndk/opensl/android-ndk/ > G7dLKAGGL28/AndXjfKPjc4J > Someone hit this issue on KitKat. > > I am not sure if we are just lucky not having this problem on Fennec for > Android 4.0+. I am trying to sleep for 50ms before destroying the player > object. If that does work, we might be able to get around this problem by > destroying the player object in another thread where we can sleep for a > while without blocking opensl_stream_destroy(). Any thoughts? I'd prefer not to go down this route if we can avoid it, but I don't have any better suggestions. If we're not hitting this on 4.x at the moment, let's take snorp's patch and see how we go?
Flags: needinfo?(kinetik)
Comment on attachment 8445991 [details] [diff] [review] Disable OpenSL on Android 2.3 and lower Review of attachment 8445991 [details] [diff] [review]: ----------------------------------------------------------------- ::: media/libcubeb/src/cubeb_opensl.c @@ +8,5 @@ > #include <assert.h> > #include <dlfcn.h> > #include <stdlib.h> > #include <SLES/OpenSLES.h> > +#include <sys/system_properties.h> Why is this *outside* the defined(__ANDROID__)? @@ +19,5 @@ > #include "cubeb/cubeb.h" > #include "cubeb-internal.h" > #include "cubeb_resampler.h" > > +#define ANDROID_VERSION_GINGERBREAD 10 Gingerbread is API Level 9. Level 10 is GINGERBREAD_MR1. Rename it to ANDROID_VERSION_GINGERBREAD_MR1.
Attachment #8445991 - Flags: review?(gpascutto) → review+
(In reply to Gian-Carlo Pascutto [:gcp] from comment #21) > Comment on attachment 8445991 [details] [diff] [review] > Disable OpenSL on Android 2.3 and lower > > Review of attachment 8445991 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: media/libcubeb/src/cubeb_opensl.c > @@ +8,5 @@ > > #include <assert.h> > > #include <dlfcn.h> > > #include <stdlib.h> > > #include <SLES/OpenSLES.h> > > +#include <sys/system_properties.h> > > Why is this *outside* the defined(__ANDROID__)? Oops. For some reason I was thinking that really meant gonk, but that's not true, AFAICT. > > @@ +19,5 @@ > > #include "cubeb/cubeb.h" > > #include "cubeb-internal.h" > > #include "cubeb_resampler.h" > > > > +#define ANDROID_VERSION_GINGERBREAD 10 > > Gingerbread is API Level 9. Level 10 is GINGERBREAD_MR1. Rename it to > ANDROID_VERSION_GINGERBREAD_MR1. Fair enough. All of this android stuff should also be behind an ifdef, I suppose.
Attachment #8445991 - Attachment is obsolete: true
Attachment #8448752 - Flags: review?(gpascutto)
Attachment #8448752 - Flags: review?(gpascutto) → review+
Assignee: nobody → snorp
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Blocks: 981898
Blocks: 981889
Flags: needinfo?(scheng)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: