Closed Bug 1054929 Opened 10 years ago Closed 10 years ago

Black screen happens when writing in Canvas - GraphicBufferMapper failure: lock(...) failed -22 (Invalid argument)

Categories

(Core :: DOM: Content Processes, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla34
blocking-b2g 2.0+
Tracking Status
firefox32 --- wontfix
firefox33 --- wontfix
firefox34 --- fixed
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: wdeng, Assigned: khuey)

References

Details

(Keywords: regression)

Attachments

(7 files, 4 obsolete files)

Attached image blackpanel.png (deleted) —
produce steps:

1. Apply patch of bug 1020847.
   pull request url: https://github.com/mozilla-b2g/gaia/pull/22151
2. Install handwriting IME:
   GAIA_KEYBOARD_LAYOUTS=en,zh-Hans-Handwriting make install-gaia
3. Enable handwriting IME in [settings]->[Keyboards]->[Select Keyboards]->[手写简体]
4. Open message app, switch to handwriting IME, write in the blank area.
5. Press home button, change settings for handwriting IME in [settings]->[Keyboards]->[Built-in Keyboard]. Adjust input range sliders' positions (several times). 
6. Press home button and open message app again, focus in text input area, handwriting keyboard appears, write in the blank area, there will be black screen, just as attached.

If it does not happen, just retry steps 5 and 6.

When it happens, the log looks:

I/Gecko   ( 1622): SharedSurface_Gralloc::Create -------
I/Gecko   ( 1622): SharedSurface_Gralloc::Create: success -- surface 0xafc1fbb0, GraphicBuffer 0xb1194500.
E/GeckoConsole( 1622): Content JS WARN at app://keyboard.gaiamobile.org/js/keyboard/target_handlers_manager.js:74 in TargetHandlersManager.prototype._callTargetAction: TargetHandlersManager: calling targetHandler.activate() on existing handler.
E/GeckoConsole( 1622): Content JS WARN at app://keyboard.gaiamobile.org/js/keyboard/target_handlers_manager.js:74 in TargetHandlersManager.prototype._callTargetAction: TargetHandlersManager: calling targetHandler.activate() on existing handler.
E/qdmemalloc( 1622): clean_buffer: ION_IOC_IMPORT failed with error - Invalid argument
W/GraphicBufferMapper( 1622): lock(...) failed -22 (Invalid argument)
E/qdmemalloc( 1622): clean_buffer: ION_IOC_IMPORT failed with error - Invalid argument
W/GraphicBufferMapper( 1622): lock(...) failed -22 (Invalid argument)
E/GeckoConsole( 1438): Content JS LOG at app://system.gaiamobile.org/js/app_usage_metrics.js:92 in debug: [AppUsage] Saved app usage data
Summary: Black screen happens when writing in Canvas → Black screen happens when writing in Canvas - GraphicBufferMapper failure: lock(...) failed -22 (Invalid argument)
Component: General → Canvas: 2D
Product: Firefox OS → Core
Version: unspecified → Trunk
OS: Linux → Gonk (Firefox OS)
Hi Wei, I have some questions to the bug.
-(1) In which hardware did you produce the problem?
-(2) What is "Adjust input range sliders"?
  I installed the pull request in comment 0 and moved to "Built-in Keyboard" in Settings.
  I saw only "Stroke Width" slider and "Response Speed" slide.
  I did not saw "input range slider".
Flags: needinfo?(wdeng)
(In reply to Sotaro Ikeda [:sotaro] from comment #1)
> Hi Wei, I have some questions to the bug.
> -(1) In which hardware did you produce the problem?
> -(2) What is "Adjust input range sliders"?
>   I installed the pull request in comment 0 and moved to "Built-in Keyboard"
> in Settings.
>   I saw only "Stroke Width" slider and "Response Speed" slide.
>   I did not saw "input range slider".

And can you also provide the firefox os version?
Hi Sotaro,
> Hi Wei, I have some questions to the bug.
> -(1) In which hardware did you produce the problem?
       The device is flame and firefox OS version is 2.1, the latest master branch.
> -(2) What is "Adjust input range sliders"?
>   I installed the pull request in comment 0 and moved to "Built-in Keyboard"
> in Settings.
>   I saw only "Stroke Width" slider and "Response Speed" slide.
>   I did not saw "input range slider".
    Sorry, input range is the DOM element. So input range sliders mean the two sliders for stroke width and response speed.
    You can drag one slider, move from left to right or right to left several times before dropping it.
    The more times you move it, the more often black screen appear.
Flags: needinfo?(wdeng)
I can reproduce this with GDB, it seems not canvas bug, but Content Painting bug:
Breakpoint 4, android::GraphicBufferMapper::lock (this=0xb45376c4, handle=0xb1668470, usage=3, bounds=..., vaddr=0xb1694214) at frameworks/native/libs/ui/GraphicBufferMapper.cpp:83
83	    ALOGW_IF(err, "lock(...) failed %d (%s)", err, strerror(-err));
(gdb) bt
#0  android::GraphicBufferMapper::lock (this=0xb45376c4, handle=0xb1668470, usage=3, bounds=..., vaddr=0xb1694214) at frameworks/native/libs/ui/GraphicBufferMapper.cpp:83
#1  0xb4e11356 in android::GraphicBuffer::lock (this=0xb162ac80, usage=3, rect=..., vaddr=<optimized out>) at frameworks/native/libs/ui/GraphicBuffer.cpp:173
#2  0xb4e1137c in android::GraphicBuffer::lock (this=<optimized out>, usage=<optimized out>, vaddr=<optimized out>) at frameworks/native/libs/ui/GraphicBuffer.cpp:160
#3  0xb5729412 in Lock (aMode=mozilla::layers::OPEN_READ, this=0xb16941a0) at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/gfx/layers/opengl/GrallocTextureClient.cpp:145
#4  mozilla::layers::GrallocTextureClientOGL::Lock (this=0xb16941a0, aMode=<optimized out>) at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/gfx/layers/opengl/GrallocTextureClient.cpp:125
#5  0xb5709980 in mozilla::layers::ContentClientDoubleBuffered::FinalizeFrame (this=0xb22b89c0, aRegionToDraw=...) at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/gfx/layers/client/ContentClient.cpp:548
#6  0xb56f7f3c in mozilla::layers::RotatedContentBuffer::BeginPaint (this=0xb22b89d4, aLayer=0xb17a4c00, aFlags=5) at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/gfx/layers/RotatedBuffer.cpp:532
#7  0xb5707e06 in mozilla::layers::ContentClientRemoteBuffer::BeginPaintBuffer (this=<optimized out>, aLayer=<optimized out>, aFlags=<optimized out>)
    at ../../dist/include/mozilla/layers/ContentClient.h:214
#8  0xb570a818 in mozilla::layers::ClientThebesLayer::PaintThebes (this=0xb17a4c00) at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/gfx/layers/client/ClientThebesLayer.cpp:55
#9  0xb570d2a4 in mozilla::layers::ClientThebesLayer::RenderLayerWithReadback (this=0xb17a4c00, aReadback=0xbe9344a8)
    at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/gfx/layers/client/ClientThebesLayer.cpp:132
#10 0xb570d596 in mozilla::layers::ClientContainerLayer::RenderLayer (this=0xb17a4800) at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/gfx/layers/client/ClientContainerLayer.h:69
#11 0xb5708832 in mozilla::layers::ClientLayerManager::EndTransactionInternal (this=0xb2dd7d00, aCallback=<optimized out>, aCallbackData=<optimized out>)
    at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/gfx/layers/client/ClientLayerManager.cpp:220
#12 0xb570d800 in mozilla::layers::ClientLayerManager::EndTransaction (this=0xb2dd7d00, 
    aCallback=0xb5d913bd <mozilla::FrameLayerBuilder::DrawThebesLayer(mozilla::layers::ThebesLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*)>, 
    aCallbackData=0xbe934728, aFlags=mozilla::layers::LayerManager::END_DEFAULT) at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/gfx/layers/client/ClientLayerManager.cpp:246
#13 0xb5dbcf72 in nsDisplayList::PaintForFrame (this=0xb2a61800, aBuilder=0xbe934728, aCtx=<optimized out>, aForFrame=<optimized out>, aFlags=13)
    at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/layout/base/nsDisplayList.cpp:1336
#14 0xb5dbd0e0 in nsDisplayList::PaintRoot (this=0xbe93470c, aBuilder=0xbe934728, aCtx=0x0, aFlags=13) at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/layout/base/nsDisplayList.cpp:1183
#15 0xb5dbddc0 in nsLayoutUtils::PaintFrame (aRenderingContext=0x0, aFrame=0xb393e2b8, aDirtyRegion=<optimized out>, aBackstop=0, aFlags=772)
    at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/layout/base/nsLayoutUtils.cpp:3066
#16 0xb5d7eec0 in PresShell::Paint (this=0xb39aa920, aViewToPaint=0xb2a3cc90, aDirtyRegion=..., aFlags=1) at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/layout/base/nsPresShell.cpp:6212
#17 0xb5b57b74 in nsViewManager::ProcessPendingUpdatesPaint (this=0xb2d454f0, aWidget=0xb3949200) at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/view/nsViewManager.cpp:443
#18 0xb5b57c94 in nsViewManager::ProcessPendingUpdatesForView (this=<optimized out>, aView=<optimized out>, aFlushDirtyRegion=<optimized out>)
    at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/view/nsViewManager.cpp:384
#19 0xb5d83ab2 in nsRefreshDriver::Tick (this=0xb3a196e0, aNowEpoch=<optimized out>, aNowTime=...) at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/layout/base/nsRefreshDriver.cpp:1280
#20 0xb5d83f78 in TickDriver (now=<optimized out>, jsnow=1408607333246509, driver=<optimized out>) at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/layout/base/nsRefreshDriver.cpp:171
#21 mozilla::RefreshDriverTimer::Tick (this=<optimized out>) at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/layout/base/nsRefreshDriver.cpp:162
#22 0xb536e9e4 in nsTimerImpl::Fire (this=0xb2d26040) at ../../../gecko/xpcom/threads/nsTimerImpl.cpp:618
#23 0xb536ea9a in nsTimerEvent::Run (this=0xb2d4c0c0) at ../../../gecko/xpcom/threads/nsTimerImpl.cpp:711
#24 0xb536c602 in ProcessNextEvent (aResult=0xbe934e7f, aMayWait=<optimized out>, this=0xb4502550) at ../../../gecko/xpcom/threads/nsThread.cpp:770
#25 nsThread::ProcessNextEvent (this=0xb4502550, aMayWait=<optimized out>, aResult=0xbe934e7f) at ../../../gecko/xpcom/threads/nsThread.cpp:686
#26 0xb537a802 in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=<optimized out>) at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/xpcom/glue/nsThreadUtils.cpp:265
#27 0xb54b53bc in mozilla::ipc::MessagePump::Run (this=0xb4501c40, aDelegate=0xbe934f80) at ../../../gecko/ipc/glue/MessagePump.cpp:99
#28 0xb54aa1d6 in MessageLoop::RunInternal (this=<optimized out>) at ../../../gecko/ipc/chromium/src/base/message_loop.cc:229
#29 0xb54aa288 in RunHandler (this=0xbe934f80) at ../../../gecko/ipc/chromium/src/base/message_loop.cc:222
#30 MessageLoop::Run (this=0xbe934f80) at ../../../gecko/ipc/chromium/src/base/message_loop.cc:196
#31 0xb5b59cca in nsBaseAppShell::Run (this=0xb3934340) at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/widget/xpwidgets/nsBaseAppShell.cpp:164
#32 0xb5fc4a2a in XRE_RunAppShell () at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/toolkit/xre/nsEmbedFunctions.cpp:702
#33 0xb54aa1d6 in MessageLoop::RunInternal (this=<optimized out>) at ../../../gecko/ipc/chromium/src/base/message_loop.cc:229
#34 0xb54aa288 in RunHandler (this=0xbe934f80) at ../../../gecko/ipc/chromium/src/base/message_loop.cc:222
#35 MessageLoop::Run (this=0xbe934f80) at ../../../gecko/ipc/chromium/src/base/message_loop.cc:196
#36 0xb5fc4e58 in XRE_InitChildProcess (aArgc=<optimized out>, aArgv=<optimized out>) at /home/chiajung/FirefoxOS/flame-clean/B2G/gecko/toolkit/xre/nsEmbedFunctions.cpp:539
#37 0x0000910e in content_process_main (argc=7, argv=0xbe935904) at ../../../gecko/ipc/app/../contentproc/plugin-container.cpp:148
#38 0xb6f27d38 in __libc_init (raw_args=0xbe935900, onexit=<optimized out>, slingshot=0x912d <main(int, char**)>, structors=<optimized out>) at bionic/libc/bionic/libc_init_dynamic.cpp:112

It maybe related to bug 1055226.
This is a front buffer lock fail while bug 1055226 is a back buffer lock fail, I highly suspect it is a Fence/Gralloc related bug while swap buffer.
From kernel source and dmesg, the error is in fact !is_dma_buf_file(file) check fail, which is hard to image.

But I can not reproduce this bug after revert Bug 918595. Which may indicate that we close some fd at invalid timing.
By using attachment 8477910 [details] [diff] [review] and attachment 8477910 [details] [diff] [review] with little modification for master flame JB, I confirmed that an invalid fd close happens at RemoteOpenFileChild::~RemoteOpenFileChild().

stack was the following.

(gdb) info stack
#0  close (fd=38) at bionic/libc/bionic/close.c:49
#1  0xb6f56c34 in __wrap_close (aFd=38) at ../../../gecko/mozglue/build/Nuwa.cpp:1339
#2  0xb6e524a0 in pt_Close (fd=0xb2a3ff00) at ../../../../../gecko/nsprpub/pr/src/pthreads/ptio.c:1252
#3  0xb6e4361c in PR_Close (fd=<optimized out>) at ../../../../../gecko/nsprpub/pr/src/io/priometh.c:104
#4  0xb540201a in mozilla::net::RemoteOpenFileChild::~RemoteOpenFileChild (this=0xb2c5e6a0, __in_chrg=<optimized out>)
    at ../../../gecko/netwerk/ipc/RemoteOpenFileChild.cpp:141
#5  0xb5402064 in mozilla::net::RemoteOpenFileChild::~RemoteOpenFileChild (this=0xb2c5e6a0, __in_chrg=<optimized out>)
    at ../../../gecko/netwerk/ipc/RemoteOpenFileChild.cpp:143
#6  0xb54020a0 in Release (this=<optimized out>) at ../../../gecko/netwerk/ipc/RemoteOpenFileChild.cpp:72
#7  mozilla::net::RemoteOpenFileChild::Release (this=<optimized out>) at ../../../gecko/netwerk/ipc/RemoteOpenFileChild.cpp:72
#8  0xb54020c4 in ReleaseIPDLReference (this=<optimized out>) at ../../dist/include/mozilla/net/RemoteOpenFileChild.h:86
#9  mozilla::net::NeckoChild::DeallocPRemoteOpenFileChild (this=<optimized out>, aChild=<optimized out>)
    at ../../../gecko/netwerk/ipc/NeckoChild.cpp:289
#10 0xb54db29a in mozilla::net::PNeckoChild::RemoveManagee (this=0xb39d3880, aProtocolId=<optimized out>, aListener=0xb2c5e6a0)
    at PNeckoChild.cpp:1063
#11 0xb54f60ba in OnMessageReceived (__msg=<optimized out>, this=<optimized out>) at PRemoteOpenFileChild.cpp:168
#12 mozilla::net::PRemoteOpenFileChild::OnMessageReceived (this=<optimized out>, __msg=<optimized out>)
    at PRemoteOpenFileChild.cpp:134
#13 0xb5499870 in mozilla::dom::PContentChild::OnMessageReceived (this=0xb455b418, __msg=...) at PContentChild.cpp:3966
#14 0xb545603e in mozilla::ipc::MessageChannel::DispatchAsyncMessage (this=0xb455b448, aMsg=...)
    at ../../../gecko/ipc/glue/MessageChannel.cpp:1233
#15 0xb545b14a in mozilla::ipc::MessageChannel::OnMaybeDequeueOne (this=<optimized out>)
    at ../../../gecko/ipc/glue/MessageChannel.cpp:1098
#16 0xb52eb8a6 in DispatchToMethod<FdWatcher, void (FdWatcher::*)()> (method=
    (void (FdWatcher::*)(FdWatcher * const)) 0xb545b0d3 <mozilla::ipc::MessageChannel::OnMaybeDequeueOne()>, obj=<optimized out>, 
    arg=<optimized out>) at ../../../gecko/ipc/chromium/src/base/tuple.h:383
#17 RunnableMethod<FdWatcher, void (FdWatcher::*)(), Tuple0>::Run (this=<optimized out>)
This bug might cause Bug 1056893, Bug 1046084 or bug 1057220.
[Blocking Requested - why for this release]:
This bug did invalid fd close. It could cause Bug 1056893, Bug 1046084 or bug 1057220.
blocking-b2g: --- → 2.0?
Changed to correct component.
Component: Canvas: 2D → Networking
(In reply to Sotaro Ikeda [:sotaro] from comment #10)
> Changed to correct component.

It is based on comment 7.
Blocks: 1056893
Blocks: 1046084
(In reply to Sotaro Ikeda [:sotaro] from comment #9)
> [Blocking Requested - why for this release]:
> This bug did invalid fd close. It could cause Bug 1056893, Bug 1046084 or
> bug 1057220.

Have you verified that this bug exists on the 2.0 branch?
Assignee: nobody → sotaro.ikeda.g
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #12)
> (In reply to Sotaro Ikeda [:sotaro] from comment #9)
> > [Blocking Requested - why for this release]:
> > This bug did invalid fd close. It could cause Bug 1056893, Bug 1046084 or
> > bug 1057220.
> 
> Have you verified that this bug exists on the 2.0 branch?

I did not confirmed it on b2g-v2.0 yet. The STR in the fowling pull request is provided for master.

> pull request url: https://github.com/mozilla-b2g/gaia/pull/22151
The logout patch for close() fd.
At first the problem seems to be reproduced. But somehow I can not reproduce it now :-(
I added some additional logs to RemoteOpenFileChild's functions in and out. From the log, problematic fd seems 38 in the log. But it seeems to be deleted somewhere before RemoteOpenFileChild::Recv__delete__() is called.
Attachment #8478507 - Attachment is patch: false
We found that if we close ion driver fd by GDB, we can produce very similar lock fail condition, maybe we closed ion driver unexpectedly.
Assignee: sotaro.ikeda.g → nobody
No longer blocks: 1046084, 1056893
blocking-b2g: 2.0? → ---
Component: Networking → Canvas: 2D
[Blocking Requested - why for this release]:
Assignee: nobody → sotaro.ikeda.g
blocking-b2g: --- → 2.0?
Blocks: 1046084, 1056893
No longer blocks: 1020847
Component: Canvas: 2D → Networking
Attached patch fix v1 (obsolete) (deleted) — Splinter Review
I believe the issue here is that RemoteOpenFileChild's destructor is close()ing the fd unconditionally. It should only call close if the fd was never handed out to client code (via OpenNSPRFileDescriptor): if the client called that it's responsible for calling close.

If I'm right the issue here began with bug 988816, which changed the logic we used subtly.  We used to set mNSPRFileDesc to null when we handed out the fd, but in that bug we started keeping the fd so we could hand it out multiple times.

This patch will hopefully fix things.  

I'm delegating review to :schien since he knows this code fairly well at this point.
Assignee: sotaro.ikeda.g → jduell.mcbugs
Attachment #8478918 - Flags: review?(schien)
And in case it's confusing (it was to me, I had to look it up :)

dup() creates a new fd that shares the same OS-level file descriptor (i.e. seek/write/etc on different fd's wind up changing the file offset of all of them).  But close() is per-fd: the OS-level FD is not closed/deleted until the last of the dup()'d fd's is closed.  This means the fix is simpler than I thought--each RemoteOpenFileChild keeps it's own fd (we dup() when we clone()), so it is in charge of closing it if it's never handed out with OpenNSPRFileDesc().
Comment on attachment 8478918 [details] [diff] [review]
fix v1

Review of attachment 8478918 [details] [diff] [review]:
-----------------------------------------------------------------

I believe you're looking for @swu's review since he is the author of Bug 988816.
Attachment #8478918 - Flags: review?(schien) → review?(swu)
attachment 8478918 [details] [diff] [review] did not fix invalid fd close on my master flame.
Comment on attachment 8478918 [details] [diff] [review]
fix v1

I'm confused.  We dup the fd right before we hand it out, so why would we not want to call close on our internal fd here?
Attachment #8478918 - Flags: review-
(In reply to Jason Duell (:jduell) from comment #20)
> Created attachment 8478918 [details] [diff] [review]
> fix v1
> 
> I believe the issue here is that RemoteOpenFileChild's destructor is
> close()ing the fd unconditionally. It should only call close if the fd was
> never handed out to client code (via OpenNSPRFileDescriptor): if the client
> called that it's responsible for calling close.

A new fd returned by dup() in OpenNSPRFileDesc(), therefore even the ~RemoteOpenFileChild() unconditionally closes the original fd, it won't affect the new fd.  And if we don't close the original fd in ~RemoteOpenFileChild(), won't it cause fd leakage once it's been handed out?
Attachment #8478918 - Flags: review?(swu) → review-
According to comment 6, the bug is not reproducible by reverting bug 918595, we might see if any clues available from there.
Flags: needinfo?(bent.mozilla)
Sotaro can you test in a debug build?  I wonder if those assertions added in bug 918595 are failing?
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #27)
> Sotaro can you test in a debug build?  I wonder if those assertions added in
> bug 918595 are failing?

I did not tested it in debug build. This bug seems very timing sensitive. When I added a lot of logs, the problem can not be regenerated.

I am going to check the assertion.
Flags: needinfo?(sotaro.ikeda.g)
You could change those assertions to MOZ_RELEASE_ASSERT then.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #29)
> You could change those assertions to MOZ_RELEASE_ASSERT then.

I dit the above change but it did not hit.
Comment on attachment 8478918 [details] [diff] [review]
fix v1

> A new fd returned by dup() in OpenNSPRFileDesc(), therefore even the
> ~RemoteOpenFileChild() unconditionally closes the original fd, it won't affect
> the new fd. 

Yeah I missed the dup() in OpenNSPRFileDesc.  Bad patch!
Attachment #8478918 - Attachment is obsolete: true
Attached patch Patch (obsolete) (deleted) — Splinter Review
Sotaro, does this fix the problem?
Attachment #8479252 - Flags: review?(bent.mozilla)
Attachment #8479252 - Flags: feedback?(sotaro.ikeda.g)
Comment on attachment 8479252 [details] [diff] [review]
Patch

Review of attachment 8479252 [details] [diff] [review]:
-----------------------------------------------------------------

The patch did not fix invalid fd close() in my case.
Attachment #8479252 - Flags: feedback?(sotaro.ikeda.g)
Attached file logout log when invalid fd close happned (obsolete) (deleted) —
I got logcat log with some additional logs. On my phone, invalid fd close happen with 38.
- Invalid fd close happened with same fd number(38) in the STR on my flame.
- There seems no duplicate fd(38) close before invalid fd close failure.
- dup() was already failed before close.
  http://mxr.mozilla.org/mozilla-central/source/netwerk/ipc/RemoteOpenFileChild.cpp#88
Attachment #8478507 - Attachment is obsolete: true
(In reply to Sotaro Ikeda [:sotaro] from comment #34)
> Created attachment 8479294 [details]
> logout log when invalid fd close happned
> 
> I got logcat log with some additional logs. On my phone, invalid fd close
> happen with 38.
> - Invalid fd close happened with same fd number(38) in the STR on my flame.
> - There seems no duplicate fd(38) close before invalid fd close failure.
> - dup() was already failed before close.
>  
> http://mxr.mozilla.org/mozilla-central/source/netwerk/ipc/
> RemoteOpenFileChild.cpp#88

on additional thing.
FileDescriptorSet::SetDescriptors() did not hanbdle fd=38 before the invalid fd close.
http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc#109

The SetDescriptors() is called from Channel::ChannelImpl::ProcessIncomingMessages()
Attached patch Patch (deleted) — Splinter Review
Ok ... how about this?
Attachment #8479252 - Attachment is obsolete: true
Attachment #8479252 - Flags: review?(bent.mozilla)
Attachment #8479252 - Flags: feedback?(wdeng)
Attachment #8479298 - Flags: feedback?(sotaro.ikeda.g)
Alright, it sounds like something else is going on then.  Is this memory corruption?
I would test this myself but if I check out the PR my phone doesn't boot ...
blocking-b2g: 2.0? → 2.0+
(In reply to Sotaro Ikeda [:sotaro] from comment #35)
> (In reply to Sotaro Ikeda [:sotaro] from comment #34)
> > Created attachment 8479294 [details]
> > logout log when invalid fd close happned
> > 
> > I got logcat log with some additional logs. On my phone, invalid fd close
> > happen with 38.
> > - Invalid fd close happened with same fd number(38) in the STR on my flame.
> > - There seems no duplicate fd(38) close before invalid fd close failure.
> > - dup() was already failed before close.
> >  
> > http://mxr.mozilla.org/mozilla-central/source/netwerk/ipc/
> > RemoteOpenFileChild.cpp#88
> 
> on additional thing.
> FileDescriptorSet::SetDescriptors() did not hanbdle fd=38 before the invalid
> fd close.
> http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/chrome/common/
> file_descriptor_set_posix.cc#109
> 
> The SetDescriptors() is called from
> Channel::ChannelImpl::ProcessIncomingMessages()

The above comment was incorrect. Before one close(fd=38) call, the SetDescriptors() added fd=38.
Add more logout logs around invalid fd close.
Attachment #8479294 - Attachment is obsolete: true
From Comment 40, I seems to understand what happening when invalid fd close happened on my phone.

-[1] ParamTraits<MagicGrallocBufferHandle>::Read() recived 2 fds(37,38)
-[2] Create android::GraphicBuffer() with the above fds(37,38).
-[3] TabChild::DestroyWindow() close fds(include fd=38)
       + This close GraphicBuffer's fd(38)
       + This is a problem!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
-[4] RemoteOpenFileParent::OpenSendCloseDelete()
       + Send sd to child side
-[5] Child side recive fd(38)
-[6] The above GraphicBuffer is closed. Fds(37,38) are closed.
       + One closed fd(38) is a fd for RemoteOpenFileChild
-[7] RemoteOpenFileChild close fd(38)
       + fd=38 is already invalid fd.


From the above, TabChild::DestroyWindow() seems to close stale fd.
Comment on attachment 8479298 [details] [diff] [review]
Patch

Review of attachment 8479298 [details] [diff] [review]:
-----------------------------------------------------------------

The patch worked on my flame.
Attachment #8479298 - Flags: feedback?(sotaro.ikeda.g) → feedback+
Comment on attachment 8479298 [details] [diff] [review]
Patch

wdeng, can you confirm the patch fix the problem?
Attachment #8479298 - Flags: feedback?(wdeng)
Change component based on Comment 36 and Comment 41.
Component: Networking → DOM
Flags: needinfo?(bent.mozilla)
So you're saying that the first patch does not work but the second does?  That surprises me.
Flags: needinfo?(sotaro.ikeda.g)
Please test both of my patches.
Flags: needinfo?(wdeng)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #45)
> So you're saying that the first patch does not work but the second does? 
> That surprises me.

Yes. The following part seems to fix the problem. TabChild::DestroyWindow() was called twice in the STR. Therefore, in the second DestroyWindow() call, it closed stale fds.

> mCachedFileDescriptorInfos.Clear();
Flags: needinfo?(sotaro.ikeda.g)
TabChild::DestroyWindow() was called twice from the following.
- TabChild::RecvDestroy()
- TabChild::~TabChild()
I'm pretty sure DestroyWindow is always called twice though, so what's special about this testcase?
When I used some applications on flame device, I did not even see one TabChild::DestroyWindow() call. Therefore TabChild::DestroyWindow() is not called on normal b2g use case.

The child process is normally killed in the middle of the application shutdown. It seems to affect to this.
(In reply to Sotaro Ikeda [:sotaro] from comment #50)
> The child process is normally killed in the middle of the application
> shutdown. It seems to affect to this.

Oh, that's ... lovely.
So, on b2g, it seems not normal that TabChild::DestroyWindow() is called twice during TabChild's destroy.
Blocks: 918595
Keywords: regression
From the code, this bug seems regression of Bug 918595. b2g-v2.0 also have this code.
Blocks: 1058780
Attached patch Patch for b2g v2.0 (deleted) — Splinter Review
A patch for b2g v2.0 that is created from attachment 8479298 [details] [diff] [review].
khuey, is attachment 8479298 [details] [diff] [review] is a patch that can be reviewed? I am going to ask codeaurora do retesting Bug 1057220 with the patch.
Flags: needinfo?(khuey)
No longer blocks: 1058780
Blocks: 1055226
Attachment #8479298 - Flags: review?(bent.mozilla)
Flags: needinfo?(khuey)
Assignee: jduell.mcbugs → khuey
Comment on attachment 8479298 [details] [diff] [review]
Patch

Review of attachment 8479298 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/TabChild.cpp
@@ +1533,5 @@
>              runnable->Dispatch();
>          }
>      }
> +
> +    mCachedFileDescriptorInfos.Clear();

Rather than simply clear I think we should probably call the callbacks with an invalid fd first.
Blocks: 1058780
Assignee: khuey → bent.mozilla
We only do OpenFileAndSendFDRunnable for the application.zip, so we are definitely not guaranteed to call the callback today.

http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp#543
Comment on attachment 8479298 [details] [diff] [review]
Patch

Review of attachment 8479298 [details] [diff] [review]:
-----------------------------------------------------------------

Ok
Attachment #8479298 - Flags: review?(bent.mozilla) → review+
https://hg.mozilla.org/mozilla-central/rev/2aa1c0816eb6
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Component: DOM → DOM: Content Processes
Comment on attachment 8479298 [details] [diff] [review]
Patch

It seems good to me.
Attachment #8479298 - Flags: feedback?(wdeng) → feedback+
Flags: needinfo?(wdeng)
Blocks: 1060091
Blocks: 1057220
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: