Closed
Bug 1020227
Opened 10 years ago
Closed 10 years ago
Deadlock in opensl_stream_destroy
Categories
(Core :: Audio/Video, defect)
Tracking
()
RESOLVED
FIXED
mozilla33
People
(Reporter: jwwang, Assigned: snorp)
References
Details
Attachments
(1 file, 4 obsolete files)
(deleted),
patch
|
gcp
:
review+
|
Details | Diff | Splinter Review |
https://tbpl.mozilla.org/php/getParsedLog.php?id=41017124&tree=Try&full=1#error0
thread 43 (state machine thread): MediaDecoderStateMachine waiting for audio loop thread to finish.
thread 46 (audio loop thread): opensl_stream_destroy stuck in |(*stm->playerObj)->Destroy(stm->playerObj)| (https://hg.mozilla.org/try/file/0b3d7df6ae30/media/libcubeb/src/cubeb_opensl.c#l570)
thread 47 (opensl thread): bufferqueue_callback stuck in |(*stm->bufq)->Enqueue(stm->bufq, buf, written * stm->framesize)| (https://hg.mozilla.org/try/file/0b3d7df6ae30/media/libcubeb/src/cubeb_opensl.c#l105)
It looks like a deadlock for |libc.so + 0xc738| is the address PR_WaitCondVar will call into. (see thread 15 and 16).
When this deadlock happens, the state machine thread will wait indefinitely for the audio loop thread to finish. Since all MediaDecoderStateMachines share the same state machine thread, all subsequent tests will get stuck too. This might account for bug 904231 where 4 test timeouts happened in a row.
Btw, I tried to wait for the buffer queue to finish before destroying the player object as recommended in the sample code of OpenSL_ES_Specification_1.0.1.pdf, timeouts still happened only with much lower chance (one in 200 runs).
(see: https://tbpl.mozilla.org/?tree=Try&rev=de1e6abaf2ab)
Reporter | ||
Comment 1•10 years ago
|
||
Hi Star,
Is it possible to enable debug symbols to know where it actually gets stuck in libmedia.so?
Flags: needinfo?(scheng)
Comment 2•10 years ago
|
||
(In reply to JW Wang [:jwwang] from comment #1)
> Hi Star,
>
> Is it possible to enable debug symbols to know where it actually gets stuck
> in libmedia.so?
I am not sure because I have never done that. Maybe you can try to use "SL_LOGD" [1] to dump log in the Enqueue() function [2].
[1] for example : SL_LOGD("Leaving %.*s::%s", (int) (underscore - function), function, &underscore[1]);
[2] http://androidxref.com/4.3_r2.1/xref/frameworks/wilhelm/src/itf/IBufferQueue.c#46
Flags: needinfo?(scheng)
Reporter | ||
Comment 3•10 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=23d896405f59
One failure in 300 runs.
https://tbpl.mozilla.org/php/getParsedLog.php?id=41096832&tree=Try&full=1
However, we still have the deadlock.
Thread 55 was still doing something about libOpenSLES while we were destroying the player in thread Thread 53. I wonder if "SL_ENGINEOPTION_THREADSAFE" really takes effect.
Hi Star,
Can you help check what is the correct way to destroy a player object?
Flags: needinfo?(scheng)
Reporter | ||
Comment 4•10 years ago
|
||
Hi Matthew,
Per described in comment 0, do you have any idea about the deadlock?
Flags: needinfo?(kinetik)
Comment 5•10 years ago
|
||
I'm not super familiar with OpenSL, sorry.
For the patch you've pushed to try, we don't want to be forced to play whatever audio has been queued before destroying a stream. Can we use Clear() on the BufferQueue instead?
Flags: needinfo?(kinetik)
Reporter | ||
Comment 6•10 years ago
|
||
(In reply to Matthew Gregan [:kinetik] from comment #5)
> For the patch you've pushed to try, we don't want to be forced to play
> whatever audio has been queued before destroying a stream. Can we use
> Clear() on the BufferQueue instead?
Good idea. I was worrying the client might try to read stream position while shutting down and clearing the buffer queue will reset the stream position. However MediaDecoderStateMachine will hold the lock before calling AudioStream::Shutdown, I think it is fine to clear the buffer queue.
Reporter | ||
Comment 7•10 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=41504711&tree=Try&full=1#error0
still no luck with clearing buffer queue...
Reporter | ||
Comment 8•10 years ago
|
||
https://groups.google.com/forum/#!searchin/android-ndk/opensl/android-ndk/G7dLKAGGL28/AndXjfKPjc4J
There are also other victims...
Assignee | ||
Comment 9•10 years ago
|
||
What are the thoughts around disabling the OpenSL backend on 2.3 and falling back to audiotrack
Assignee | ||
Comment 10•10 years ago
|
||
Flags: needinfo?(jwwang)
Assignee | ||
Comment 11•10 years ago
|
||
Try run here: https://tbpl.mozilla.org/?tree=Try&rev=16232f864c12
Attachment #8445443 -
Flags: feedback?(jwwang)
Comment 12•10 years ago
|
||
Good idea. I'm fine with doing this, FWIW.
Reporter | ||
Comment 13•10 years ago
|
||
https://groups.google.com/forum/#!searchin/android-ndk/opensl/android-ndk/G7dLKAGGL28/AndXjfKPjc4J
Someone hit this issue on KitKat.
I am not sure if we are just lucky not having this problem on Fennec for Android 4.0+. I am trying to sleep for 50ms before destroying the player object. If that does work, we might be able to get around this problem by destroying the player object in another thread where we can sleep for a while without blocking opensl_stream_destroy(). Any thoughts?
Flags: needinfo?(jwwang) → needinfo?(kinetik)
Comment 14•10 years ago
|
||
Comment on attachment 8445443 [details] [diff] [review]
Disable OpenSL on Android 2.3 and lower
Review of attachment 8445443 [details] [diff] [review]:
-----------------------------------------------------------------
::: media/libcubeb/src/cubeb_opensl.c
@@ +28,5 @@
> +{
> + static int (*property_get)(const char*, char*, const char*) = NULL;
> +
> + if (!property_get) {
> + void* library = dlopen("libcutils.so", RTLD_LAZY);
Where's the dlclose?
@@ +172,5 @@
>
> +static int
> +get_android_version(void)
> +{
> + char version_string[4] = { 0, 0, 0, 0 };
Maybe use PROPERTY_VALUE_MAX just in case something silly happens to these numbers in the future.
@@ +177,5 @@
> +
> + int len = android_property_get("ro.build.version.sdk", version_string, 0);
> + if (len == 0) {
> + LOG("Failed to get Android version!\n");
> + return 0;
Is zero the right value here? It'll be treated as a valid version, which isn't logical.
Attachment #8445443 -
Flags: review-
Assignee | ||
Comment 15•10 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #14)
> Comment on attachment 8445443 [details] [diff] [review]
> Disable OpenSL on Android 2.3 and lower
>
> Review of attachment 8445443 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: media/libcubeb/src/cubeb_opensl.c
> @@ +28,5 @@
> > +{
> > + static int (*property_get)(const char*, char*, const char*) = NULL;
> > +
> > + if (!property_get) {
> > + void* library = dlopen("libcutils.so", RTLD_LAZY);
>
> Where's the dlclose?
I think we can "leak" it here. We're only doing this once, ever, and really don't have a good place to close it. I guess we could open/close it in each cubeb_opensl instance, but that seems unnecessary to me.
>
> @@ +172,5 @@
> >
> > +static int
> > +get_android_version(void)
> > +{
> > + char version_string[4] = { 0, 0, 0, 0 };
>
> Maybe use PROPERTY_VALUE_MAX just in case something silly happens to these
> numbers in the future.
Yeah, definitely.
>
> @@ +177,5 @@
> > +
> > + int len = android_property_get("ro.build.version.sdk", version_string, 0);
> > + if (len == 0) {
> > + LOG("Failed to get Android version!\n");
> > + return 0;
>
> Is zero the right value here? It'll be treated as a valid version, which
> isn't logical.
It's lazy, I guess, but I think it's fine to make the caller do some validation on the number. In this case it must be > 0 to be valid. I did notice that the caller to this checks for >= 0, which is of course wrong. Fixed that.
Assignee | ||
Comment 16•10 years ago
|
||
Attachment #8445443 -
Attachment is obsolete: true
Attachment #8445443 -
Flags: feedback?(jwwang)
Attachment #8445957 -
Flags: review?(gpascutto)
Comment 17•10 years ago
|
||
Comment on attachment 8445957 [details] [diff] [review]
Disable OpenSL on Android 2.3 and lower
Review of attachment 8445957 [details] [diff] [review]:
-----------------------------------------------------------------
I'd dlclose in the get_properties function. By your own logic it's only called once. Then again, we only lose a tiny bit of address space due to the superfluous dlopen, I guess.
::: media/libcubeb/src/cubeb_opensl.c
@@ +22,5 @@
> +#define ANDROID_VERSION_GINGERBREAD 10
> +
> +#ifdef MOZ_WIDGET_ANDROID
> +
> +// Obtained from Android's properties.h
y u no include sys/system_properties.h from the NDK which has PROP_VALUE_MAX as properties.h explains
@@ +177,5 @@
> +get_android_version(void)
> +{
> + char version_string[PROPERTY_VALUE_MAX];
> +
> + bzero(version_string, PROPERTY_VALUE_MAX);
nit: bzero is deprecated and on Android just calls memset
Attachment #8445957 -
Flags: review?(gpascutto) → review+
Assignee | ||
Comment 18•10 years ago
|
||
I suck.
Attachment #8445957 -
Attachment is obsolete: true
Attachment #8445977 -
Flags: review?(gpascutto)
Assignee | ||
Comment 19•10 years ago
|
||
I didn't even know system_properties.h was in the NDK! We can remove all the dlopen junk and (hopefully) use the same code on b2g. Not sure if b2g has that header, though.
Try run here: https://tbpl.mozilla.org/?tree=Try&rev=0c0bbba455ea
Attachment #8445977 -
Attachment is obsolete: true
Attachment #8445977 -
Flags: review?(gpascutto)
Attachment #8445991 -
Flags: review?(gpascutto)
Comment 20•10 years ago
|
||
(In reply to JW Wang [:jwwang] from comment #13)
> https://groups.google.com/forum/#!searchin/android-ndk/opensl/android-ndk/
> G7dLKAGGL28/AndXjfKPjc4J
> Someone hit this issue on KitKat.
>
> I am not sure if we are just lucky not having this problem on Fennec for
> Android 4.0+. I am trying to sleep for 50ms before destroying the player
> object. If that does work, we might be able to get around this problem by
> destroying the player object in another thread where we can sleep for a
> while without blocking opensl_stream_destroy(). Any thoughts?
I'd prefer not to go down this route if we can avoid it, but I don't have any better suggestions. If we're not hitting this on 4.x at the moment, let's take snorp's patch and see how we go?
Flags: needinfo?(kinetik)
Comment 21•10 years ago
|
||
Comment on attachment 8445991 [details] [diff] [review]
Disable OpenSL on Android 2.3 and lower
Review of attachment 8445991 [details] [diff] [review]:
-----------------------------------------------------------------
::: media/libcubeb/src/cubeb_opensl.c
@@ +8,5 @@
> #include <assert.h>
> #include <dlfcn.h>
> #include <stdlib.h>
> #include <SLES/OpenSLES.h>
> +#include <sys/system_properties.h>
Why is this *outside* the defined(__ANDROID__)?
@@ +19,5 @@
> #include "cubeb/cubeb.h"
> #include "cubeb-internal.h"
> #include "cubeb_resampler.h"
>
> +#define ANDROID_VERSION_GINGERBREAD 10
Gingerbread is API Level 9. Level 10 is GINGERBREAD_MR1. Rename it to ANDROID_VERSION_GINGERBREAD_MR1.
Attachment #8445991 -
Flags: review?(gpascutto) → review+
Assignee | ||
Comment 22•10 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #21)
> Comment on attachment 8445991 [details] [diff] [review]
> Disable OpenSL on Android 2.3 and lower
>
> Review of attachment 8445991 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: media/libcubeb/src/cubeb_opensl.c
> @@ +8,5 @@
> > #include <assert.h>
> > #include <dlfcn.h>
> > #include <stdlib.h>
> > #include <SLES/OpenSLES.h>
> > +#include <sys/system_properties.h>
>
> Why is this *outside* the defined(__ANDROID__)?
Oops. For some reason I was thinking that really meant gonk, but that's not true, AFAICT.
>
> @@ +19,5 @@
> > #include "cubeb/cubeb.h"
> > #include "cubeb-internal.h"
> > #include "cubeb_resampler.h"
> >
> > +#define ANDROID_VERSION_GINGERBREAD 10
>
> Gingerbread is API Level 9. Level 10 is GINGERBREAD_MR1. Rename it to
> ANDROID_VERSION_GINGERBREAD_MR1.
Fair enough. All of this android stuff should also be behind an ifdef, I suppose.
Assignee | ||
Comment 23•10 years ago
|
||
Attachment #8445991 -
Attachment is obsolete: true
Attachment #8448752 -
Flags: review?(gpascutto)
Updated•10 years ago
|
Attachment #8448752 -
Flags: review?(gpascutto) → review+
Assignee | ||
Comment 24•10 years ago
|
||
Assignee: nobody → snorp
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Updated•10 years ago
|
Updated•10 years ago
|
Flags: needinfo?(scheng)
You need to log in
before you can comment on or make changes to this bug.
Description
•