Closed
Bug 860483
Opened 12 years ago
Closed 12 years ago
[Layers][unagi] B2G18 leaking EGLImages attached to GL_TEXTURE_EXTERNAL textures
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(blocking-b2g:tef+)
People
(Reporter: mikeh, Assigned: mikeh)
References
Details
(Keywords: unagi, Whiteboard: [MemShrink:P1][status: needs patch, progress happening][madrid])
Attachments
(1 file, 15 obsolete files)
(deleted),
patch
|
jgilbert
:
review-
|
Details | Diff | Splinter Review |
Investigation of bug 846903 shows the following dark matter:
Unreported: ~1,521 blocks in stack trace record 1 of 538
~6,225,453 bytes (~6,225,453 requested / ~0 slop)
12.10% of the heap (12.10% cumulative); 17.81% of unreported (17.81% cumulative)
Allocated at
malloc /home/mikeh/dev/mozilla/m-c/b2g18/memory/build/replace_malloc.c:152
os_malloc_ext (0x42a92134 libgsl.so+0x4134) (no addr2line)
qeglDrvAPI_eglCreateImageKHR (0x42dd0d10 libEGL_adreno200.so+0xbd10) (no addr2line)
eglCreateImageKHR (0x42dca6f4 libEGL_adreno200.so+0x56f4) (no addr2line)
eglCreateImageKHR /home/mikeh/dev/mozilla/btg019/frameworks/base/opengl/libs/EGL/eglApi.cpp:1276
mozilla::gl::GLLibraryEGL::fCreateImage(void*, void*, unsigned int, void*, int const*) /home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLLibraryEGL.h:346
mozilla::gl::GLContextEGL::BindExternalBuffer(unsigned int, void*) /home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContextProviderEGL.cpp:439
mozilla::layers::ShadowImageLayerOGL::RenderLayer(int, nsIntPoint const&) /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ImageLayerOGL.cpp:969
ContainerRender<mozilla::layers::ShadowContainerLayerOGL> /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:264
ContainerRender<mozilla::layers::ShadowContainerLayerOGL> /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:264
ContainerRender<mozilla::layers::ShadowRefLayerOGL> /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:264
ContainerRender<mozilla::layers::ShadowContainerLayerOGL> /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:264
ContainerRender<mozilla::layers::ShadowContainerLayerOGL> /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:264
ContainerRender<mozilla::layers::ShadowContainerLayerOGL> /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:264
ContainerRender<mozilla::layers::ShadowContainerLayerOGL> /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:264
mozilla::layers::LayerManagerOGL::Render() /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/LayerManagerOGL.cpp:1031
mozilla::layers::LayerManagerOGL::EndTransaction(void (*)(mozilla::layers::ThebesLayer*, gfxContext*, nsIntRegion const&, nsIntRegion const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/LayerManagerOGL.cpp:700 mozilla::layers::LayerManagerOGL::EndEmptyTransaction(mozilla::layers::LayerManager::EndTransactionFlags) /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/LayerManagerOGL.cpp:642
~AutoResolveRefLayers /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/ipc/CompositorParent.cpp:458
RunnableMethod<mozilla::layers::ImageContainerChild, void (mozilla::layers::ImageContainerChild::*)(), Tuple0>::Run() /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/task.h:308
MessageLoop::RunTask(Task*) /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/message_loop.cc:335
MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/message_loop.cc:345
MessageLoop::DoWork() /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/message_loop.cc:442
base::MessagePumpDefault::Run(base::MessagePump::Delegate*) /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/message_pump_default.cc:24
Spoke to bjacob and he suggested that the mismatch in types, LOCAL_GL_TEXTURE_EXTERNAL vs LOCAL_GL_TEXTURE_2D, passed to fBindTexture() in BindExternalBuffer() and UnbindExternalBuffer(), respectively, might be the cause. The mismatch would cause the call in the latter to fail and not free the EGLImage created in BindExternalBuffer().
Comment 1•12 years ago
|
||
Right. What's happening here is UnbindExternalBuffer is supposed to undo the effect of BindExternalBuffer, but is actually a no-operation because it tries to bind to the TEXTURE_2D target a texture that had been bound to the TEXTURE_EXTERNAL target.
So Bind/UnbindExternalBuffer functions are _inherently_ broken --- it's not a bug in the caller.
To fix this, we have to find a way to actually un-bind a gralloc buffer from a TEXTURE_EXTERNAL texture. We could delete and recreate GL texture objects everytime, but that would presumably be slow. It seems more reasonable to have a singleton dummy EGLImage here and bind it in UnbindExternalBuffer as a means of detaching any EGLImage that was previously attached there.
I'm also CC'ing jgilbert here as ISTR he had a suggestion on this topic recently but I don't remember what he said or where...
Assignee | ||
Comment 2•12 years ago
|
||
Benoit, something like this?
(With this change, I see RSS for the b2g parent process fluctuate between ~91 and 95MiB; it will take more time to see if there's a steady leak.)
Assignee | ||
Updated•12 years ago
|
blocking-b2g: --- → tef?
Comment 3•12 years ago
|
||
as discussed with jgilbert on IRC--- creating a EGLImage wrapping a null buffer is maybe a little bit too wonky; instead you could create a real legitimate android::GraphicBuffer of size 1x1 and get its native buffer.
Comment 4•12 years ago
|
||
Comment on attachment 735975 [details] [diff] [review]
fix texture-type mismatch causing EGLImages to not get freed
No, this creates a new EGLImage everytime and leaves it attached to a texture, like BindExternalBuffer does, so I think this will leak in exactly the same way, right?
I had in mind a patch where the EGLImage would be a singleton (a static local var here) and would be attached to _every_ texture you call UnbindExternalBuffer on. That _should_ be OK and at least would not leak.
Updated•12 years ago
|
Attachment #735975 -
Flags: feedback?(bjacob) → feedback-
Assignee | ||
Comment 5•12 years ago
|
||
This works, and the leak is gone!
Attachment #735975 -
Attachment is obsolete: true
Attachment #735982 -
Flags: review?(bjacob)
Comment 6•12 years ago
|
||
I've had this floating around for a while, but was waiting for the refactor to land before perturbing things.
Attachment #735987 -
Flags: review?(bjacob)
Attachment #735987 -
Flags: feedback?(mhabicher)
Assignee | ||
Comment 7•12 years ago
|
||
Post the right patch this time.
I'm going to let the test run for a while to gather more data. Will r? this patch if things look good.
Attachment #735982 -
Attachment is obsolete: true
Attachment #735982 -
Flags: review?(bjacob)
Assignee | ||
Comment 8•12 years ago
|
||
jgilbert, it looks like the leak might still be there. I'll try your patch next (later tonight).
Comment 9•12 years ago
|
||
Comment on attachment 735982 [details] [diff] [review]
fix texture-type mismatch causing EGLImages to not get freed
Review of attachment 735982 [details] [diff] [review]:
-----------------------------------------------------------------
This patch seems to be two patches stuck together.
Also:
fEGLImageTargetTexture2D(LOCAL_GL_TEXTURE_EXTERNAL, nullptr)
`nullptr` isn't a valid EGLImage, so this emits an error by spec. (INVALID_VALUE? I forget which)
sEGLLibrary.fCreateImage(EGL_DISPLAY(),
EGL_NO_CONTEXT,
EGL_NATIVE_BUFFER_ANDROID,
nullptr, attrs);
The spec for this also requires that `buffer` is a valid ANativeWindowBuffer, or generates EGL_BAD_PARAMETER.
Attachment #735982 -
Flags: feedback-
Assignee | ||
Comment 10•12 years ago
|
||
(In reply to Jeff Gilbert [:jgilbert] from comment #6)
> Created attachment 735987 [details] [diff] [review]
> patch: Bind to a lazy singleton 'null' EGLImage to Unbind.
>
> I've had this floating around for a while, but was waiting for the refactor
> to land before perturbing things.
This patch doesn't apply:
/home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContextProviderEGL.cpp: In member function 'virtual bool mozilla::gl::GLContextEGL::UnbindExternalBuffer(GLuint)':
/home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContextProviderEGL.cpp:462: error: 'ScopedBindTexture' was not declared in this scope
/home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContextProviderEGL.cpp:462: error: expected ';' before 'autoTex'
/home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContextProviderEGL.cpp:476: error: expected primary-expression before '->' token
/home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContextProviderEGL.cpp:492: error: 'LOCAL_GL_TEXTURE_BINDING_EXTERNAL' was not declared in this scope
Assignee | ||
Comment 11•12 years ago
|
||
And to confirm: the leak still exists with attachment 735988 [details] [diff] [review]. jgilbert, if you can unbitrot attachment 735987 [details] [diff] [review], I can test it.
Assignee | ||
Comment 12•12 years ago
|
||
So it looks like 'ScopedBindTexture' needs to be 'TextureImage::ScopedBindTexture' and 'GLuint tempTex' needs to be 'GLenum tempTex'; with those changes, I get the following build errors:
/home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContextProviderEGL.cpp: In member function 'virtual bool mozilla::gl::GLContextEGL::UnbindExternalBuffer(GLuint)':
/home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContextProviderEGL.cpp:466: error: no matching function for call to 'mozilla::gl::TextureImage::ScopedBindTexture::ScopedBindTexture(mozilla::gl::GLContextEGL* const, GLenum&)'
/home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContext.h:231: note: candidates are: mozilla::gl::TextureImage::ScopedBindTexture::ScopedBindTexture(mozilla::gl::TextureImage*, GLenum)
/home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContext.h:229: note: mozilla::gl::TextureImage::ScopedBindTexture::ScopedBindTexture(const mozilla::gl::TextureImage::ScopedBindTexture&)
/home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContextProviderEGL.cpp:480: error: expected primary-expression before '->' token
Flags: needinfo?(jgilbert)
Comment 13•12 years ago
|
||
I don't understand how this:
(In reply to Mike Habicher [:mikeh] from comment #5)
> Created attachment 735982 [details] [diff] [review]
> fix texture-type mismatch causing EGLImages to not get freed
>
> This works, and the leak is gone!
reconciles with this:
(In reply to Mike Habicher [:mikeh] from comment #11)
> And to confirm: the leak still exists with attachment 735988 [details] [diff] [review]
Both 735982 and 735988 seem to use the approach of attaching a nullptr EGLImage to replace the existing attachment, so does this not work after all?
Assuming it doesn't: let's wait for Jeff to update his patch.
Updated•12 years ago
|
Attachment #735987 -
Flags: review?(bjacob)
Attachment #735987 -
Flags: feedback?(mhabicher)
Assignee | ||
Comment 14•12 years ago
|
||
As I discussed with bjacob in IRC, it doesn't. On my particular test run, RSS initially stayed stable, but then started climbing again. DMD showed continuing leakage through fCreateImage().
Comment 15•12 years ago
|
||
We want mozilla::gl::ScopedBindTexture. I won't have access to a b2g building machine for another hour, but that should be the issue.
Flags: needinfo?(jgilbert)
Comment 16•12 years ago
|
||
Attachment #736532 -
Flags: review?(bjacob)
Comment 17•12 years ago
|
||
Requires the prereq to give us a define enum we need.
Attachment #735987 -
Attachment is obsolete: true
Attachment #736533 -
Flags: review?(bjacob)
Attachment #736533 -
Flags: feedback?(mhabicher)
Assignee | ||
Comment 19•12 years ago
|
||
Comment on attachment 735988 [details] [diff] [review]
fix texture-type mismatch causing EGLImages to not get freed, v3
Obsoleting my naïve solution.
Attachment #735988 -
Attachment is obsolete: true
Assignee | ||
Comment 20•12 years ago
|
||
Comment on attachment 736532 [details] [diff] [review]
patch prereq: Finish adding enum defines for the extension we use
This patch doesn't cleanly apply to the b2g18 tree.
Attachment #736532 -
Flags: feedback-
Assignee | ||
Comment 21•12 years ago
|
||
Comment on attachment 736533 [details] [diff] [review]
patch: bind to a lazy singleton 'null' EGLImage in order to 'unbind'.
I changed 'mozilla::gl::ScopedBindTexture autoTex(this, tempTex);' to 'TextureImage::ScopedBindTexture autoTex(this, tempTex);' but still get the following build errors:
/home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContextProviderEGL.cpp: In member function 'virtual bool mozilla::gl::GLContextEGL::UnbindExternalBuffer(GLuint)':
/home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContextProviderEGL.cpp:462: error: no matching function for call to 'mozilla::gl::TextureImage::ScopedBindTexture::ScopedBindTexture(mozilla::gl::GLContextEGL* const, GLuint&)'
/home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContext.h:231: note: candidates are: mozilla::gl::TextureImage::ScopedBindTexture::ScopedBindTexture(mozilla::gl::TextureImage*, GLenum)
/home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContext.h:229: note: mozilla::gl::TextureImage::ScopedBindTexture::ScopedBindTexture(const mozilla::gl::TextureImage::ScopedBindTexture&)
/home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContextProviderEGL.cpp:476: error: 'GetEGLContext' was not declared in this scope
It looks like the constructor is being called with the wrong type of parameters, and a grep for 'GetEGLContext' doesn't find any matches in the b2g18 tree aside from the applied patch.
Attachment #736533 -
Flags: feedback?(mhabicher) → feedback-
Comment 22•12 years ago
|
||
(In reply to Mike Habicher [:mikeh] from comment #20)
> Comment on attachment 736532 [details] [diff] [review]
> patch prereq: Finish adding enum defines for the extension we use
>
> This patch doesn't cleanly apply to the b2g18 tree.
Oooh, you want it against b2g18. Yeah, this is against central.
(In reply to Mike Habicher [:mikeh] from comment #21)
> Comment on attachment 736533 [details] [diff] [review]
> patch: bind to a lazy singleton 'null' EGLImage in order to 'unbind'.
>
> I changed 'mozilla::gl::ScopedBindTexture autoTex(this, tempTex);' to
> 'TextureImage::ScopedBindTexture autoTex(this, tempTex);' but still get the
> following build errors:
That's because TextureImage::ScopedBindTexture is completely different, and not what we want. Evidently b2g18 doesn't have gl::ScopedBindTexture, which *is* what we want.
I'll pull 18 and update the patch.
Comment 23•12 years ago
|
||
That said, bjacob should still review the patches for landing on central.
Comment 24•12 years ago
|
||
Attachment #736997 -
Flags: review?(bjacob)
Comment 25•12 years ago
|
||
Attachment #736998 -
Flags: review?(bjacob)
Attachment #736998 -
Flags: feedback?(mhabicher)
Updated•12 years ago
|
Attachment #736532 -
Flags: review?(bjacob) → review+
Comment 26•12 years ago
|
||
Comment on attachment 736533 [details] [diff] [review]
patch: bind to a lazy singleton 'null' EGLImage in order to 'unbind'.
Review of attachment 736533 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/gl/GLContextProviderEGL.cpp
@@ +489,5 @@
> +
> + mNullEGLImage = sEGLLibrary.fCreateImage(sEGLLibrary.Display(),
> + GetEGLContext(),
> + LOCAL_EGL_GL_TEXTURE_2D,
> + (EGLClientBuffer)tempTex,
How do you know that casting a GL texture object to a EGLClientBuffer works? Is that guaranteed by EGL?
@@ +495,5 @@
> + fTexSubImage2D(LOCAL_GL_TEXTURE_2D, 0,
> + 0, 0,
> + 2, 2,
> + LOCAL_GL_RGBA, LOCAL_GL_UNSIGNED_BYTE,
> + blankData);
Why do a texImage2D + texSubImage2D instead of a texImage2D?
I guess what I don't understand is why you don't write blankData to a android::GraphicBuffer, construct a EGLImage from it (we have a function for doing this here) and tie it to a GL texture with fEGLImageTargetTexture2D. In this way you don't need to assume that casting to EGLClientBuffer works, you don't need to create a new GL texture object and you don't need to texSubImage2D.
Attachment #736533 -
Flags: review?(bjacob) → review-
Updated•12 years ago
|
Attachment #736997 -
Flags: review?(bjacob) → review+
Comment 27•12 years ago
|
||
Comment on attachment 736998 [details] [diff] [review]
[b2g18] patch
Review of attachment 736998 [details] [diff] [review]:
-----------------------------------------------------------------
OK after IRC conversation with jgilbert.
Attachment #736998 -
Flags: review?(bjacob) → review+
Comment 28•12 years ago
|
||
Comment on attachment 736533 [details] [diff] [review]
patch: bind to a lazy singleton 'null' EGLImage in order to 'unbind'.
Review of attachment 736533 [details] [diff] [review]:
-----------------------------------------------------------------
OK after IRC conversation.
Attachment #736533 -
Flags: review- → review+
Assignee | ||
Comment 29•12 years ago
|
||
Comment on attachment 736998 [details] [diff] [review]
[b2g18] patch
Unfortunately this didn't fix the leak. The b2g parent RSS still rises to ~155MiB, and causes all other processes to get killed. The DMD traces still show:
- mozilla::gl::GLContextEGL::BindExternalBuffer()
- mozilla::gl::GLLibraryEGL::fCreateImage()
I'll attach the complete DMD report next.
Attachment #736998 -
Flags: feedback?(mhabicher) → feedback-
Assignee | ||
Comment 30•12 years ago
|
||
Comment 31•12 years ago
|
||
That is very worrying. Can you verify that GLContextEGL::UnbindExternalBuffer is called as many times as GLContextEGL::BindExternalBuffer is?
If it is, then that means that we don't know yet how to avoid this leak on Qualcomm drivers. We would then have to ask for help from Qualcomm.
Updated•12 years ago
|
Assignee | ||
Comment 32•12 years ago
|
||
FYI, when I apply the above [b2g18] patches, I get a crash trying to move icons on the b2g homescreen. I haven't dug into this yet, but the logcat shows:
E( 445:0x1cc) <qeglDrvAPI_eglCreateImageKHR:4047>: EGL_BAD_PARAMETER
F( 445:0x1cc) Assertion failure: mNullEGLImage, at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContextProviderEGL.cpp:496
F( 445:0x1cc) Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)
Assignee | ||
Comment 33•12 years ago
|
||
bjacob, good idea--when I apply this patch, and do "adb logcat | grep mikeh" I *never* see any UNBIND lines, and the mBindUnbindCount value climbs monotonically.
So the textures are never getting unbound.
Attachment #737439 -
Flags: feedback?(bjacob)
Comment 34•12 years ago
|
||
Let's look into this in the office today.
Assignee | ||
Comment 35•12 years ago
|
||
Some more information: UnbindExternalBuffer() is only ever called from GetLockSurface(), inside a test for mGraphicBuffer != nullptr. I added a printf_stderr() to the beginning of this function, and it never appears in my log.
Moreover, GetLockSurface() is called from BeginUpdate() and DirectUpdate(). I've instrumented both functions and neither of them is ever called while the camera viewfinder is running.
Assignee | ||
Comment 36•12 years ago
|
||
bjacob, the call to BindExternalBuffer() is in ShadowImageLayerOGL::RenderLayer(), line 975. (This is another call on line 437, but I never see it getting called.)
Grepping shows there are _no_ calls to UnbindExternalBuffer() outside of GetLockBuffer().
Assignee | ||
Comment 37•12 years ago
|
||
Here's what I know:
- in GLContextProviderEGL.cpp, UnbindExternalBuffer() is _only_ called from GetLockSurface()
- GetLockSurface() is only called from two places, both in GLContextProviderEGL.cpp:
- BeginUpdate() (2 places)
- DirectUpdate()
I have instrumented both BeginUpdate() and DirectUpdate(), and neither is ever called while the camera viewfinder is active.
BeginUpdate() is called from the following places in GLContext.cpp:
- BasicTextureImage::BeginUpdate()
- TiledTextureImage::BeginUpdate()
- TiledTextureImage::EndUpdate()
...from BasicBufferOGL::BeginPaint() in ThebesLayerOGL.cpp, and possibly from TextureImage::Resize() in GLContext.h.
Except for Resize(), I have instrumented these functions and none of them are ever called. (There is no need to instrument Resize(), since it doesn't contain any conditional paths to BeginUpdate().)
Resize() is called from:
- in GLContextProviderEGL.cpp:
- TextureImageEGL::TextureImageEGL(), but only if gUseBackingSurface is true
- BindTexture()
- GetTextureID()
- in GLContext.cpp:
- TiledTextureImage::TiledTextureImage()
- in ThebesLayerOGL.cpp:
- BasicBufferOGL::BeginPaint()
Only BindTexture() is ever called, but never (apparently) for our viewfinder frames (though for other elements when updating the overall viewfinder).
Assignee | ||
Comment 38•12 years ago
|
||
With "b mozilla::gl::GLContextEGL::BindExternalBuffer", here is the backtrace:
#0 mozilla::gl::GLContextEGL::BindExternalBuffer (this=0x48a98800, texture=135, buffer=0x496f7084) at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContextProviderEGL.cpp:437
#1 0x41a74824 in mozilla::layers::ShadowImageLayerOGL::RenderLayer (this=0x49cd8000, aPreviousFrameBuffer=<value optimized out>, aOffset=...)
at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ImageLayerOGL.cpp:976
#2 0x41a71d82 in ContainerRender<mozilla::layers::ShadowContainerLayerOGL> (this=0x44fb7800, aPreviousFrameBuffer=0, aOffset=<value optimized out>)
at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:264
#3 mozilla::layers::ShadowContainerLayerOGL::RenderLayer (this=0x44fb7800, aPreviousFrameBuffer=0, aOffset=<value optimized out>) at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:450
#4 0x41a71d82 in ContainerRender<mozilla::layers::ShadowContainerLayerOGL> (this=0x4987fc00, aPreviousFrameBuffer=0, aOffset=<value optimized out>)
at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:264
#5 mozilla::layers::ShadowContainerLayerOGL::RenderLayer (this=0x4987fc00, aPreviousFrameBuffer=0, aOffset=<value optimized out>) at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:450
#6 0x41a71962 in ContainerRender<mozilla::layers::ShadowRefLayerOGL> (this=0x4987d000, aPreviousFrameBuffer=0, aOffset=<value optimized out>)
at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:264
#7 mozilla::layers::ShadowRefLayerOGL::RenderLayer (this=0x4987d000, aPreviousFrameBuffer=0, aOffset=<value optimized out>) at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:491
#8 0x41a71d82 in ContainerRender<mozilla::layers::ShadowContainerLayerOGL> (this=0x4987c800, aPreviousFrameBuffer=0, aOffset=<value optimized out>)
at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:264
#9 mozilla::layers::ShadowContainerLayerOGL::RenderLayer (this=0x4987c800, aPreviousFrameBuffer=0, aOffset=<value optimized out>) at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:450
#10 0x41a71d82 in ContainerRender<mozilla::layers::ShadowContainerLayerOGL> (this=0x4987f000, aPreviousFrameBuffer=0, aOffset=<value optimized out>)
at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:264
#11 mozilla::layers::ShadowContainerLayerOGL::RenderLayer (this=0x4987f000, aPreviousFrameBuffer=0, aOffset=<value optimized out>) at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:450
#12 0x41a71d82 in ContainerRender<mozilla::layers::ShadowContainerLayerOGL> (this=0x495df800, aPreviousFrameBuffer=0, aOffset=<value optimized out>)
at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:264
#13 mozilla::layers::ShadowContainerLayerOGL::RenderLayer (this=0x495df800, aPreviousFrameBuffer=0, aOffset=<value optimized out>) at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:450
#14 0x41a71d82 in ContainerRender<mozilla::layers::ShadowContainerLayerOGL> (this=0x495df000, aPreviousFrameBuffer=0, aOffset=<value optimized out>)
at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:264
#15 mozilla::layers::ShadowContainerLayerOGL::RenderLayer (this=0x495df000, aPreviousFrameBuffer=0, aOffset=<value optimized out>) at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/ContainerLayerOGL.cpp:450
#16 0x41a77d1a in mozilla::layers::LayerManagerOGL::Render (this=0x48af2ef0) at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/LayerManagerOGL.cpp:1028
#17 0x41a78390 in mozilla::layers::LayerManagerOGL::EndTransaction (this=0x48af2ef0, aCallback=0, aCallbackData=0x0, aFlags=<value optimized out>)
at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/LayerManagerOGL.cpp:700
#18 0x41a75854 in mozilla::layers::LayerManagerOGL::EndEmptyTransaction (this=0x48a98800, aFlags=1114578632) at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/opengl/LayerManagerOGL.cpp:641
#19 0x41a8a602 in mozilla::layers::CompositorParent::Composite (this=<value optimized out>) at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/layers/ipc/CompositorParent.cpp:555
#20 0x417dd5b0 in DispatchToMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)()> (this=<value optimized out>) at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/tuple.h:383
#21 RunnableMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)(), Tuple0>::Run (this=<value optimized out>) at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/task.h:307
#22 0x41a04cee in MessageLoop::RunTask (this=0x44affdd0, task=0x4925d120) at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/message_loop.cc:334
#23 0x41a05518 in MessageLoop::DeferOrRunPendingTask (this=0x48a98800, pending_task=<value optimized out>) at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/message_loop.cc:342
#24 0x41a0626e in MessageLoop::DoWork (this=0x44affdd0) at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/message_loop.cc:442
#25 0x41a065ea in base::MessagePumpDefault::Run (this=0x448e6120, delegate=0x44affdd0) at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/message_pump_default.cc:23
#26 0x41a052a2 in MessageLoop::RunInternal (this=0x44affdd0) at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/message_loop.cc:216
#27 0x41a05302 in MessageLoop::RunHandler (this=0x44affdd0) at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/message_loop.cc:209
#28 MessageLoop::Run (this=0x44affdd0) at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/message_loop.cc:183
#29 0x41a0f26c in base::Thread::ThreadMain (this=0x448e5460) at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/thread.cc:156
#30 0x41a1cc3e in ThreadFunc (closure=0x48a98800) at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/platform_thread_posix.cc:39
#31 0x4005de18 in __thread_entry (func=0x41a1cc35 <ThreadFunc>, arg=0x448e5460, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#32 0x4005d96c in pthread_create (thread_out=<value optimized out>, attr=0xbee91ef8, start_routine=0x41a1cc35 <ThreadFunc>, arg=0x448e5460) at bionic/libc/bionic/pthread.c:357
#33 0x00000000 in ?? ()
Assignee | ||
Comment 39•12 years ago
|
||
(Note that in comment 38, the line numbers include numerous printf()s.)
Comment 40•12 years ago
|
||
Hi mikeh, is the STR for this bug opening the camera and finding that UnbindExternalBuffer wasn't called?
Comment 41•12 years ago
|
||
So far there are two known reasons for this leak:
- UnbindExternalBuffer isn't called
- even if it were called, in its current form it's a no-operation as explained in the first few comments at the top.
Assignee | ||
Comment 42•12 years ago
|
||
Kan-Ru, STR is opening the camera and doing nothing for a few hours. Eventually the b2g parent process RSS will expand until everything is killed, and even the homescreen can't be loaded. Even if that doesn't happen, the only way to recover the leaked memory is to restart b2g.
Benoit, any idea where the Unbind should be happening?
Comment 43•12 years ago
|
||
(In reply to Mike Habicher [:mikeh] from comment #42)
> Kan-Ru, STR is opening the camera and doing nothing for a few hours.
> Eventually the b2g parent process RSS will expand until everything is
> killed, and even the homescreen can't be loaded. Even if that doesn't
> happen, the only way to recover the leaked memory is to restart b2g.
>
> Benoit, any idea where the Unbind should be happening?
Yes; well, I'm not authoritative as I hadn't see that code before, but the idea is that you really want to do the Bind right before you draw, and the Unbind right after. Here we are doing the Bind right before we draw, and so it really seems that we should just be unbinding right after:
gl()->fActiveTexture(LOCAL_GL_TEXTURE0);
gl()->BindExternalBuffer(data->mTexture.GetTextureID(), ioImage->GetNativeBuffer());
ShaderProgramOGL *program = mOGLManager->GetProgram(RGBAExternalLayerProgramType, GetMaskLayer());
gl()->ApplyFilterToBoundTexture(mFilter);
program->Activate();
program->SetLayerQuadRect(nsIntRect(0, 0,
ioImage->GetSize().width,
ioImage->GetSize().height));
program->SetLayerTransform(GetEffectiveTransform());
program->SetLayerOpacity(GetEffectiveOpacity());
program->SetRenderOffset(aOffset);
program->SetTextureUnit(0);
program->LoadMask(GetMaskLayer());
mOGLManager->BindAndDrawQuadWithTextureRect(program,
GetVisibleRegion().GetBounds(),
nsIntSize(ioImage->GetSize().width,
ioImage->GetSize().height));
^ should unbind here, right after the BindAndDrawQuadWithTextureRect.
Comment 44•12 years ago
|
||
Note: that was in ImageLayerOGL::RenderLayer, in ImageLayerOGL.cpp, as in frame #1 in your stack.
Comment 45•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #44)
> Note: that was in ImageLayerOGL::RenderLayer, in ImageLayerOGL.cpp, as in
> frame #1 in your stack.
Is it ShadowImageLayerOGL::RenderLayer()?
Comment 46•12 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #45)
> (In reply to Benoit Jacob [:bjacob] from comment #44)
> > Note: that was in ImageLayerOGL::RenderLayer, in ImageLayerOGL.cpp, as in
> > frame #1 in your stack.
>
> Is it ShadowImageLayerOGL::RenderLayer()?
IIRC, ImageLayerOGL::RenderLayer() is not used in gonk.
Comment 47•12 years ago
|
||
Hm. You know what, I was confused because I had done a BRANCH=master ./config.sh to get master Gaia, and didn't realize that it updated gecko.
Looking in b2g18 now... : so we have the Bind call here,
http://hg.mozilla.org/releases/mozilla-b2g18/file/5eb76dd7c2a4/gfx/layers/opengl/ImageLayerOGL.cpp#l969
... and the code there is very complicated so I can't claim to understand it in first reading. But the idea remains, that the pattern should be: bind, draw, unbind all in the same function. Here, we bind, draw, and don't unbind. So the right place to add the unbind call is inside of this ShadowImageLayerOGL::RenderLayer function, after any draw call that may require the gralloc buffer binding.
It's difficult to get this right as this function has error cases with early return statements. A RAII helper to bind/unbind a gralloc buffer would help here...
anyway, not worrying about the error cases for now, the right place to put the Unbind call seems to be at the end of ShadowImageLayerOGL::RenderLayer. Indeed, much of this function consists in a series of if...else if...else... with each a draw call that may require the gralloc buffer binding; the last one is there:
http://hg.mozilla.org/releases/mozilla-b2g18/file/5eb76dd7c2a4/gfx/layers/opengl/ImageLayerOGL.cpp#l1102
and that's just before the end of the function, so we have no choice but to do the unbinding at the very end of the function.
Make sure to do the unbinding in the same case as you did the binding before. I mean, the binding is in a
956 #ifdef MOZ_WIDGET_GONK
957 if (img
958 && (img->type() == SharedImage::TSurfaceDescriptor)
959 && (img->get_SurfaceDescriptor().type() == SurfaceDescriptor::TSurfaceDescriptorGralloc)) {
I think the safest is to set a bool boundGrallocBuffer there, and at the end of ShadowImageLayerOGL::RenderLayer, do
if (boundGrallocBuffer) {
UnbindExternalBuffer(...);
}
Even better, again, would be a gralloc helper.
Thank god we rewrote all this junk in the layers refactoring... once bug 860441 is fixed it'll be awesome.
Comment 48•12 years ago
|
||
> Even better, again, would be a gralloc helper.
Er, I mean a RAII helper.
Assignee | ||
Comment 49•12 years ago
|
||
With the patch in attachment 736998 [details] [diff] [review], I see the MOZ_ASSERT(mNullEGLImage) fail:
E( 593:0x260) <qeglDrvAPI_eglCreateImageKHR:4047>: EGL_BAD_PARAMETER
F( 593:0x260) Assertion failure: mNullEGLImage, at /home/mikeh/dev/mozilla/m-c/b2g18/gfx/gl/GLContextProviderEGL.cpp:498
F( 593:0x260) Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)
This is with the new call to UnbindExternalTexture() in ShadowImageLayerOGL::RenderLayer().
Flags: needinfo?(bjacob)
Comment 50•12 years ago
|
||
The spec allows `attrib_list` to be null, but it's possible that the implementation doesn't like this. Also, I didn't actually check if the EGL implementation has KHR_gl_texture_2D_image. If it doesn't, we'll need to do the gralloc approach.
If we have the source for the driver, that would make it easy to see what's up.
Flags: needinfo?(bjacob)
Assignee | ||
Comment 51•12 years ago
|
||
(In reply to Jeff Gilbert [:jgilbert] from comment #50)
>
> The spec allows `attrib_list` to be null, but it's possible that the
> implementation doesn't like this. Also, I didn't actually check if the EGL
> implementation has KHR_gl_texture_2D_image. If it doesn't, we'll need to do
> the gralloc approach.
I can easily test a non-null attrib_list--what value(s) should I stuff in there?
How can I check if KHR_gl_texture_2D_image is implemented?
> If we have the source for the driver, that would make it easy to see what's
> up.
Unfortunately we don't--we get the Adreno driver as a blob.
Flags: needinfo?(jgilbert)
Comment 52•12 years ago
|
||
(In reply to Mike Habicher [:mikeh] from comment #51)
> (In reply to Jeff Gilbert [:jgilbert] from comment #50)
> >
> > The spec allows `attrib_list` to be null, but it's possible that the
> > implementation doesn't like this. Also, I didn't actually check if the EGL
> > implementation has KHR_gl_texture_2D_image. If it doesn't, we'll need to do
> > the gralloc approach.
>
> I can easily test a non-null attrib_list--what value(s) should I stuff in
> there?
const GLint attrib_list[] = { LOCAL_EGL_NONE };
>
> How can I check if KHR_gl_texture_2D_image is implemented?
`sEGLLibrary.IsExtensionSupported(KHR_gl_texture_2D_image)`
>
> > If we have the source for the driver, that would make it easy to see what's
> > up.
>
> Unfortunately we don't--we get the Adreno driver as a blob.
Flags: needinfo?(jgilbert)
Comment 53•12 years ago
|
||
We are not using UnbindExternalBuffer but we do release the texture here
https://hg.mozilla.org/releases/mozilla-b2g18/file/5eb76dd7c2a4/gfx/layers/opengl/ImageLayerOGL.cpp#l1041
Assignee | ||
Comment 54•12 years ago
|
||
jgilbert:
if (!mNullEGLImage) {
+ if (sEGLLibrary.HasKHRImageTexture2D()) {
+ printf_stderr("mikeh: KHR_gl_texture_2D_image extension is supported\n");
+ } else {
+ printf_stderr("mikeh: KHR_gl_texture_2D_image extension is NOT supported\n");
+ }
GLuint tempTex = 0;
fGenTextures(1, &tempTex);
GLuint prevTex = 0;
GetUIntegerv(LOCAL_GL_TEXTURE_BINDING_2D, &prevTex);
fBindTexture(LOCAL_GL_TEXTURE_2D, tempTex);
I( 897:0x390) mikeh: KHR_gl_texture_2D_image extension is NOT supported
Flags: needinfo?(jgilbert)
Comment 55•12 years ago
|
||
Well that's unfortunate. We'll probably have to do it with gralloc, then. Can you dump out the EGL extensions it *does* have?
Flags: needinfo?(jgilbert)
Assignee | ||
Comment 56•12 years ago
|
||
jgilbert:
if (!mNullEGLImage) {
+ for (int i = 0; i < sEGLLibrary.Extensions_Max; i++) {
+ if (sEGLLibrary.IsExtensionSupported(static_cast<GLLibraryEGL::EGLExtensions>(i))) {
+ printf_stderr("mikeh: extension %d is supported\n", i);
+ } else {
+ printf_stderr("mikeh: extension %d is NOT supported\n", i);
+ }
+ }
GLuint tempTex = 0;
fGenTextures(1, &tempTex);
I( 1214:0x4cd) mikeh: extension 0 is supported
I( 1214:0x4cd) mikeh: extension 1 is NOT supported
I( 1214:0x4cd) mikeh: extension 2 is NOT supported
I( 1214:0x4cd) mikeh: extension 3 is NOT supported
I( 1214:0x4cd) mikeh: extension 4 is NOT supported
I( 1214:0x4cd) mikeh: extension 5 is NOT supported
I( 1214:0x4cd) mikeh: extension 6 is supported
I( 1214:0x4cd) mikeh: extension 7 is supported
Ref. https://mxr.mozilla.org/mozilla-central/source/gfx/gl/GLLibraryEGL.h#112
Comment 57•12 years ago
|
||
(In reply to Kan-Ru Chen [:kanru] from comment #53)
> We are not using UnbindExternalBuffer but we do release the texture here
> https://hg.mozilla.org/releases/mozilla-b2g18/file/5eb76dd7c2a4/gfx/layers/
> opengl/ImageLayerOGL.cpp#l1041
It would be interesting then, to step through GLTexture::Release to understand why the gralloc buffer is still being leaked. A side note though, the path in there that could actually release it calls glDeleteTextures; we should really try not to create and delete GL textures at every frame if we are to achieve good performance (though at this point we haven't established yet that we can avoid doing so).
(In reply to Mike Habicher [:mikeh] from comment #54)
> I( 897:0x390) mikeh: KHR_gl_texture_2D_image extension is NOT supported
So let's go back to the other approach, of using a dummy android::GraphicBuffer, as discussed in comment 3 and onwards.
Fortunately I needed something similar in bug 860441 so there is already code for that there, only it's for mozilla-central.
https://bug860441.bugzilla.mozilla.org/attachment.cgi?id=737891
+ EGLImage GetNullEGLImage() MOZ_OVERRIDE
+ {
+ if (!mNullEGLImage) {
+ android::GraphicBuffer *buffer
+ = new android::GraphicBuffer(
+ 1, 1,
+ PIXEL_FORMAT_RGB_565,
+ GRALLOC_USAGE_SW_READ_NEVER | GRALLOC_USAGE_SW_WRITE_NEVER);
+ EGLint attrs[] = {
+ LOCAL_EGL_NONE, LOCAL_EGL_NONE
+ };
+ mNullEGLImage = sEGLLibrary.fCreateImage(EGL_DISPLAY(),
+ EGL_NO_CONTEXT,
+ LOCAL_EGL_NATIVE_BUFFER_ANDROID,
+ buffer->getNativeBuffer(), attrs);
+ }
+ return mNullEGLImage;
+ }
Updated•12 years ago
|
Whiteboard: [MemShrink] → [MemShrink][status: needs patch, progress happening]
Comment 58•12 years ago
|
||
Comment on attachment 737439 [details] [diff] [review]
[b2g18] Bind/UnbindTexture call counting
Yup, that was the right way to do the counting.
Attachment #737439 -
Flags: feedback?(bjacob) → feedback+
Comment 59•12 years ago
|
||
Does this work for you? I haven't tried building so there may be trivial compile errors. Does this fix the leak?
Note, the _whole_ BindExternalBuffer/UnbindExternalBuffer API is inherently bad because it forces BindExternalBuffer to create and destroy a new EGLImage everytime... hopefully that's not too slow.
Attachment #737978 -
Flags: feedback?(mhabicher)
Comment 60•12 years ago
|
||
Fixed a couple of typos.
Attachment #737978 -
Attachment is obsolete: true
Attachment #737978 -
Flags: feedback?(mhabicher)
Updated•12 years ago
|
Attachment #737980 -
Attachment is patch: true
Attachment #737980 -
Flags: feedback?(mhabicher)
Comment 61•12 years ago
|
||
Note: this patch only takes care of fixing UnbindExternalBuffer (hopefully); it does not add the missing call to UnbindExternalBuffer. Have you guys figured that part yet?
Assignee | ||
Comment 62•12 years ago
|
||
Comment on attachment 737980 [details] [diff] [review]
fix UnbindExternalBuffer
With the missing symbols defined, as discussed in IRC, this builds and runs without crashing, but the viewfinder display is corrupted: it shows a repeating pattern of fine-pitch pink and black lines.
Attachment #737980 -
Flags: feedback?(mhabicher) → feedback-
Assignee | ||
Comment 63•12 years ago
|
||
This is my currently-applied patch.
Attachment #737992 -
Flags: feedback?(bjacob)
Comment 64•12 years ago
|
||
The name of UnbindExternalBuffer is wrong because it is used to unbind a GL_TEXTURE_2D gralloc buffer and was never used to unbind a GL_TEXTURE_EXTERNAL gralloc buffer. The user of BindExternalBuffer & UnbindExternalBuffer are different so it's very confusing.
Assignee | ||
Comment 65•12 years ago
|
||
(In reply to Kan-Ru Chen [:kanru] from comment #64)
>
> The name of UnbindExternalBuffer is wrong because it is used to unbind a
> GL_TEXTURE_2D gralloc buffer and was never used to unbind a
> GL_TEXTURE_EXTERNAL gralloc buffer. The user of BindExternalBuffer &
> UnbindExternalBuffer are different so it's very confusing.
100% agree!
Any idea how to unbind the texture bound in BindExternalBuffer() so that we can free the EGLImage?
Comment 66•12 years ago
|
||
(In reply to Mike Habicher [:mikeh] from comment #65)
> Any idea how to unbind the texture bound in BindExternalBuffer() so that we
> can free the EGLImage?
The right way to unbind the gralloc buffer bound in BindExternalBuffer() is (hopefully) as is done by the UnbindExternalBuffer function as modified by my patch, but reading Kan-Ru's comment, I now understand that that is not what existing callers of UnbindExternalBuffer expected!
So we should take this code into a new, separate function, and leave UnbindExternalBuffer unchanged.
Updated•12 years ago
|
Summary: [Layers] Texture type mismatch prevents freeing EGLImages → [Layers] B2G18 leaking gralloc buffers attached to GL_TEXTURE_EXTERNAL textures
Updated•12 years ago
|
Summary: [Layers] B2G18 leaking gralloc buffers attached to GL_TEXTURE_EXTERNAL textures → [Layers] B2G18 leaking EGLImages attached to GL_TEXTURE_EXTERNAL textures
Assignee | ||
Comment 67•12 years ago
|
||
I can certainly code up that change; but that doesn't address the (new) pink screen problem.
Comment 68•12 years ago
|
||
My hope is that the new pink screen problem was caused by messing with UnbindExternalBuffer, which as Kan-Ru explains is used elsewhere, and so I hope that making a new patch that leaves UnbindExternalBuffer untouched will avoid the problem.
Assignee | ||
Comment 69•12 years ago
|
||
I hope so too, but in my testing, I _never_ saw UnbindExternalBuffer() getting called, even once, even outside of the context of testing the camera app.
Comment 70•12 years ago
|
||
Oh, that's a good point.
To investigate this kind of rendering issues:
- 1. ensure that your build is a DEBUG one
- 2. add this to b2g.sh before the exec b2g line:
export MOZ_GL_DEBUG_ABORT_ON_ERROR=1
- 3. check in 'adb logcat' or in /proc/kmsg if there are any messages about 'genlock'. That's the system-wide lock mechanism controlling gralloc buffers.
Assignee | ||
Comment 71•12 years ago
|
||
Here's the latest change. With this patch, the camera viewfinder stays solid green.
Attachment #737992 -
Attachment is obsolete: true
Attachment #737992 -
Flags: feedback?(bjacob)
Assignee | ||
Comment 72•12 years ago
|
||
(The patch in comment 71 goes on top of the other [b2g18] patchen.)
Assignee | ||
Comment 73•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #70)
>
> To investigate this kind of rendering issues:
> - 1. ensure that your build is a DEBUG one
> - 2. add this to b2g.sh before the exec b2g line:
> export MOZ_GL_DEBUG_ABORT_ON_ERROR=1
> - 3. check in 'adb logcat' or in /proc/kmsg if there are any messages about
> 'genlock'. That's the system-wide lock mechanism controlling gralloc buffers.
With |export MOZ_GL_DEBUG_ABORT_ON_ERROR=1| set in b2g.sh, the camera viewfinder works properly; with it removed, the viewfinder goes solid green again.
Flags: needinfo?(bjacob)
Comment 74•12 years ago
|
||
Hm, this could explain the trouble:
@@ -1035,17 +1035,17 @@
program->SetRenderOffset(aOffset);
program->SetTextureUnit(0);
program->LoadMask(GetMaskLayer());
mOGLManager->BindAndDrawQuadWithTextureRect(program,
mPictureRect,
nsIntSize(mSize.width, mSize.height));
- gl()->UnbindExternalBuffer(mExternalBufferTexture.GetTextureID());
+ gl()->AlsoUnbindExternalBuffer(mExternalBufferTexture.GetTextureID());
As I tried to explain in comment 47, we've got to make sure that we Unbind if and only if we had Bound. Since the Bind call is inside a nontrivial if() block, we need to be careful there.
I got a b2g18 build now, testing a patch.
Flags: needinfo?(bjacob)
Comment 75•12 years ago
|
||
(In reply to Mike Habicher [:mikeh] from comment #73)
> (In reply to Benoit Jacob [:bjacob] from comment #70)
> >
> > To investigate this kind of rendering issues:
> > - 1. ensure that your build is a DEBUG one
> > - 2. add this to b2g.sh before the exec b2g line:
> > export MOZ_GL_DEBUG_ABORT_ON_ERROR=1
> > - 3. check in 'adb logcat' or in /proc/kmsg if there are any messages about
> > 'genlock'. That's the system-wide lock mechanism controlling gralloc buffers.
>
> With |export MOZ_GL_DEBUG_ABORT_ON_ERROR=1| set in b2g.sh, the camera
> viewfinder works properly; with it removed, the viewfinder goes solid green
> again.
That's interesting. The most likely way that MOZ_GL_DEBUG_ABORT_ON_ERROR could make this difference, is that it inserts glFinish() calls after every GL call, effectively turning GL into a synchronous API. So what you observe here is what you'd get if the green rendering bug were caused by a us failing to properly synchronize (and the glFinish calls would add that missing synchronization).
Assignee | ||
Comment 76•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #74)
>
> As I tried to explain in comment 47, we've got to make sure that we Unbind
> if and only if we had Bound. Since the Bind call is inside a nontrivial if()
> block, we need to be careful there.
My bad--the jetlag is hitting pretty hard today. I can exercise your patch too, if you like.
Comment 77•12 years ago
|
||
Mike, can you try this? I built b2g18 with it and everything seems to go smoothly, in particular the camera preview looks good, but I am not sure what the STR are for the visual glitch you mentioned.
Also, regarding the leak, normally I'd use tools/get_about_memory.py and check the gralloc metric, but it doesn't seem to be there in b2g18.
Attachment #738147 -
Flags: feedback?(mhabicher)
Assignee | ||
Comment 78•12 years ago
|
||
Comment on attachment 738147 [details] [diff] [review]
Combined diff, Unbind only if bound, works here
I just tried this patch (making sure MOZ_GL_DEBUG_ABORT_ON_ERROR=1 was not set) and the camera viewfinder is working properly.
I will let it run for a while and examine the DMD output.
(You can also see in gross terms if the EGLImage is getting freed by running |adb shell b2g-ps| and looking at the RSS numbers for the b2g process--it will keep increasing if there is a leak.)
Attachment #738147 -
Flags: feedback?(mhabicher) → feedback+
Comment 79•12 years ago
|
||
Mike says it's still leaking, and also confirms that the Bind/Unbind calls are balancing out.
So at this point we have a very strong indication that there is an EGLImage leak in the EGL libraries installed on Unagi phones.
Let's get the patch reviewed and landed as it makes things at least sane on our end, and then let's ask for help from Qualcomm.
Comment 80•12 years ago
|
||
In summary, this patch
- implements a ActuallyUnbindExternalBuffer function that really does what it says
- uses it to unbind buffers
So with that, I don't see another way that the leak could be caused by Gecko. The leaks seems to still be here but it is now apparently a driver bug.
Attachment #736532 -
Attachment is obsolete: true
Attachment #736533 -
Attachment is obsolete: true
Attachment #736997 -
Attachment is obsolete: true
Attachment #736998 -
Attachment is obsolete: true
Attachment #737134 -
Attachment is obsolete: true
Attachment #737439 -
Attachment is obsolete: true
Attachment #737980 -
Attachment is obsolete: true
Attachment #738109 -
Attachment is obsolete: true
Attachment #738147 -
Attachment is obsolete: true
Attachment #738225 -
Flags: review?(jgilbert)
Comment 81•12 years ago
|
||
Michael, we have some strong indication that there may be a leak of EGLImages on Unagi.
The leak is described in comment 0. With the patch here, it still exists, and our code now looks like this:
BindExternalBuffer(texture);
drawSomeStuff();
ActuallyUnbindExternalBuffer(texture);
where BindExternalBuffer and ActuallyUnbindExternalBuffer are like this:
void BindExternalBuffer(GLuint texture, void* buffer)
{
EGLint attrs[] = {
LOCAL_EGL_IMAGE_PRESERVED, LOCAL_EGL_TRUE,
LOCAL_EGL_NONE, LOCAL_EGL_NONE
};
/*** THIS IS WHERE THE LEAKED EGLIMAGE IS CREATED !!! ***/
EGLImage image = sEGLLibrary.fCreateImage(EGL_DISPLAY(),
EGL_NO_CONTEXT,
EGL_NATIVE_BUFFER_ANDROID,
buffer, attrs);
fBindTexture(LOCAL_GL_TEXTURE_EXTERNAL, texture);
fEGLImageTargetTexture2D(LOCAL_GL_TEXTURE_EXTERNAL, image);
sEGLLibrary.fDestroyImage(EGL_DISPLAY(), image);
}
void ActuallyUnbindExternalBuffer(GLuint texture)
{
fBindTexture(LOCAL_GL_TEXTURE_EXTERNAL, texture);
fEGLImageTargetTexture2D(LOCAL_GL_TEXTURE_EXTERNAL, GetNullEGLImage());
}
EGLImage GetNullEGLImage() MOZ_OVERRIDE
{
if (!mNullGraphicBuffer.get()) {
mNullGraphicBuffer
= new android::GraphicBuffer(
1, 1,
PIXEL_FORMAT_RGB_565,
GRALLOC_USAGE_SW_READ_NEVER | GRALLOC_USAGE_SW_WRITE_NEVER);
EGLint attrs[] = {
LOCAL_EGL_NONE, LOCAL_EGL_NONE
};
mNullEGLImage = sEGLLibrary.fCreateImage(
EGL_DISPLAY(),
EGL_NO_CONTEXT,
LOCAL_EGL_NATIVE_BUFFER_ANDROID,
mNullGraphicBuffer->getNativeBuffer(), attrs);
}
return mNullEGLImage;
}
Updated•12 years ago
|
Flags: needinfo?(mvines)
Comment 82•12 years ago
|
||
Side note: we have a mitigation path that consists in having ImageLayerOGL no longer use the crummy BindExternalBuffer API and instead maintain its own mEGLImage. In this way, even if the leak still exists, it would now be only one mEGLImage leaked per image layer object, instead of leaking over and over again at every rendered frame (see how BindExternalBuffer creates and destroys EGLImages everytime).
Comment 83•12 years ago
|
||
Michael: some more comments on this. See in comment 81 how BindExternalBuffer creates an EGLImage, passes it to EGLImageTargetTexture2D, and destroys it. This relies on the EGL implementation to get reference-counting right. So it may be a good approach to look for reference-counting issues around there in the EGL implementation.
Updated•12 years ago
|
Whiteboard: [MemShrink][status: needs patch, progress happening] → [MemShrink:P1][status: needs patch, progress happening]
Comment 84•12 years ago
|
||
Please retest on either a vendor *unmodified* Buri or Inari. I have no idea what random libraries you may have on Unagi.
Flags: needinfo?(mvines)
Comment 85•12 years ago
|
||
The only B2G development devices I have here in Toronto are Unagi's and Otoro's.
I'm going to implement the mitigation described in comment 82, then.
Updated•12 years ago
|
Whiteboard: [MemShrink:P1][status: needs patch, progress happening] → [MemShrink:P1][status: needs patch, progress happening][madrid]
Assignee | ||
Comment 86•12 years ago
|
||
(In reply to Michael Vines [:m1] [:evilmachines] from comment #84)
>
> Please retest on either a vendor *unmodified* Buri or Inari. I have no idea
> what random libraries you may have on Unagi.
This is an interesting result: running an unmodified Buri idling on the camera viewfinder, I can see the b2g parent process' RSS start at ~60MiB. Over time it rises to 62-64MiB at approximately the same rate I would expect for the leak we're trying to address above. But _then_ RSS drops back down to ~60MiB and restarts it climb. It's done this twice now over the past hour, as if something is waiting until it has collected enough "garbage" and is then disposing of it all at once.
Most importantly, this is running code supplied entirely by the vendor, without the above patch.
The test needs to run longer to be conclusive, but I wanted to share the preliminary results. I will provide an update later.
Comment 87•12 years ago
|
||
That sort of makes sense, as (see comment 83) given correct refcounting on the part of the EGL implementation, the current b2g18 code should actually not leak. (Earlier I thought that we had an actual bug on our part causing the leak, but I was wrong).
Assignee | ||
Comment 88•12 years ago
|
||
The test has now been running for >5 hours, and the b2g parent RSS has stayed within ~66MiB running the unmodified Buri build.
Does this mean Unagi has an old Adreno driver?
Assignee | ||
Updated•12 years ago
|
Flags: needinfo?(mvines)
Comment 89•12 years ago
|
||
I hope so :)
But I actually have no idea what blob bits are in a Moz Unagi build, which is why I was interested in the behaviour on a vendor build that has a well known origin and is fully supported by the team here.
Flags: needinfo?(mvines)
Assignee | ||
Comment 90•12 years ago
|
||
Any chance we can update the Unagi Adreno driver and retest?
If not, I think we should tag this bug as Unagi-specific and mark it as RESOLVED/WONTFIX.
Thoughts?
Updated•12 years ago
|
Whiteboard: [MemShrink:P1][status: needs patch, progress happening][madrid] → [MemShrink:P1][status: maybe invalid][madrid]
Assignee | ||
Updated•12 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Keywords: unagi
Resolution: --- → WONTFIX
Summary: [Layers] B2G18 leaking EGLImages attached to GL_TEXTURE_EXTERNAL textures → [Layers][unagi] B2G18 leaking EGLImages attached to GL_TEXTURE_EXTERNAL textures
Whiteboard: [MemShrink:P1][status: maybe invalid][madrid] → [MemShrink:P1][status: needs patch, progress happening][madrid]
Comment 91•12 years ago
|
||
This should be tested on hamachi or leo, which have newer adreno drivers.
Assignee | ||
Comment 92•12 years ago
|
||
(In reply to Michael Wu [:mwu] from comment #91)
>
> This should be tested on hamachi or leo, which have newer adreno drivers.
This was tested on Buri/Hamachi and the problem did not exist.
Comment 93•12 years ago
|
||
Comment on attachment 738225 [details] [diff] [review]
Fix any known possibility that the leak could be caused by Gecko
Review of attachment 738225 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/gl/GLContext.h
@@ +778,4 @@
> gfxPattern::GraphicsFilter aFilter);
>
> virtual bool BindExternalBuffer(GLuint texture, void* buffer) { return false; }
> + // UnbindExternalBuffer is not actually doing the converse of BindExternalBuffer
We're going to need a better explanation that this.
::: gfx/gl/GLContextProviderEGL.cpp
@@ +464,4 @@
> #endif
> }
>
> + bool ActuallyUnbindExternalBuffer(GLuint texture)
I'm going to need a good explanation why we have a 'UnbindExternalBuffer' that says "I'm broken" and then additionally a function for 'ActuallyUnbindExternalBuffer'.
@@ +702,5 @@
> + EGLImage GetNullEGLImage() MOZ_OVERRIDE
> + {
> + if (!mNullGraphicBuffer.get()) {
> + mNullGraphicBuffer
> + = new android::GraphicBuffer(
No use pulling `new andr[...]` to the next line.
@@ +704,5 @@
> + if (!mNullGraphicBuffer.get()) {
> + mNullGraphicBuffer
> + = new android::GraphicBuffer(
> + 1, 1,
> + PIXEL_FORMAT_RGB_565,
I would prefer this be 2x2 and RGBA. (simplest combo ever)
I'd also love if this was filled with pink, for an indication if we end up rendering from this accidentally.
@@ +706,5 @@
> + = new android::GraphicBuffer(
> + 1, 1,
> + PIXEL_FORMAT_RGB_565,
> + GRALLOC_USAGE_SW_READ_NEVER | GRALLOC_USAGE_SW_WRITE_NEVER);
> + EGLint attrs[] = {
Leave a newline here between the indented previous function call and this different bit of code.
@@ +784,5 @@
> return surface;
> }
> +
> +#ifdef MOZ_WIDGET_GONK
> + EGLImage mNullEGLImage;
Where is this destroyed?
::: gfx/gl/GLDefs.h
@@ +46,4 @@
>
> // OES_EGL_image (GLES)
> typedef void* GLeglImage;
> +typedef void* EGLImage;
This should technically go under the heading for EGL_KHR_image or EGL_KHR_image_base (iirc).
::: gfx/layers/opengl/ImageLayerOGL.cpp
@@ +936,5 @@
> return;
> }
> mOGLManager->MakeCurrent();
> +
> +#ifdef MOZ_WIDGET_GONK
Ifdefs are infecting the code again. :(
Attachment #738225 -
Flags: review?(jgilbert) → review-
Comment 94•12 years ago
|
||
(In reply to Mike Habicher [:mikeh] from comment #90)
> If not, I think we should tag this bug as Unagi-specific and mark it as
> RESOLVED/WONTFIX.
>
> Thoughts?
OK as far as I'm concerned.
Updated•12 years ago
|
Target Milestone: --- → Madrid (19apr)
You need to log in
before you can comment on or make changes to this bug.
Description
•