Closed Bug 1031527 Opened 10 years ago Closed 10 years ago

b2g crashes during boot during receiving call

Categories

(Core :: Graphics: Layers, defect, P1)

32 Branch
ARM
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED FIXED
mozilla33
blocking-b2g 2.0+
Tracking Status
firefox31 --- wontfix
firefox32 --- fixed
firefox33 --- fixed
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: tkundu, Assigned: sotaro)

References

Details

(Keywords: crash, topcrash-b2g, Whiteboard: [caf priority: p1][CR 686674][b2g-crash])

Crash Data

Attachments

(1 file, 1 obsolete file)

Test steps:
1. Reboot the device for every 60 seconds and Use another phone to call test device continuously every 20 seconds
2. After night run, mini dumps are generated on the phone.


Logs(stack trace, logcat etc) : https://drive.google.com/file/d/0B1cSMS8_GuAEZ29LZE1Zd2hpRjQ/edit?usp=sharing

Exact string in logcat when this crash happens is : 

E Gecko   : mozalloc_abort: [Parent 4310] ###!!! ABORT: IPDL error [PSharedBufferManagerChild]: "Error deserializing 'MaybeMagicGrallocBufferHandle'". abort()ing as a result.: file ../../../../../../../../gecko/ipc/glue/ProtocolUtils.cpp, line 198


This crash is happening multiple times on FFOS 2.0, msm8610 256MB device.

gaia: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/tag/?h=0a25eff60fcc699687e45ba2ac8b9a3ab3782672&id=AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.016
gecko: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/tag/?h=0b83d5e6a54df9bfccdbb5ba662527bf657f5381&id=AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.016


I knew that STR will be very difficult to reproduce it again. But if you give us some direction in debugging then we can reproduce it with more logs and share with you for further debugging.
blocking-b2g: --- → 2.0?
Flags: needinfo?(msreckovic)
Whiteboard: [CR 686674]
Component: General → Graphics: Layers
Keywords: crash
Product: Firefox OS → Core
Whiteboard: [CR 686674] → [CR 686674][b2g-crash]
Version: unspecified → 32 Branch
Whiteboard: [CR 686674][b2g-crash] → [caf priority: p1][CR 686674][b2g-crash]
Crash observed on: 

Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.017
Moz BuildID: 20140624000201
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=9d2f7bd16a8dc0c74c97c5a40d2f0731f3dfff4b
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=32c226e5a7adbb95e9c4ee003dc9e64699da03e1
(We've seen this crash 91 times on v2.0 in the last week)
blocking-b2g: 2.0? → 2.0+
Keywords: topcrash-b2g
Observed on: 

Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.010
Moz BuildID: 20140614000202
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=5b1fdc6000d35962769e789b924b24e166a27759
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=f6c59193a0102d1b2ffb136b5d1d42fd30f93446
Flags: needinfo?(msreckovic) → needinfo?(milan)
Assignee: nobody → sotaro.ikeda.g
I found a strange code in ParamTraits<MagicGrallocBufferHandle>::Read(). The following code do file descriptor's dup only when IPC is between inter threads and index(key of buffer) is negative value.

>     int dupFd = sameProcess && index < 0 ? dup(fd.fd) : fd.fd;
http://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp#139

index(key) is defined as int and incremented for each gralloc buffer allocation. Therefore, index(key) could become negative when the value is wrapped.

> ref.mKey = ++sBufferKey;
http://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/SharedBufferManagerParent.cpp#192
(In reply to Sotaro Ikeda [:sotaro] from comment #4)
> I found a strange code in ParamTraits<MagicGrallocBufferHandle>::Read(). The
> following code do file descriptor's dup only when IPC is between inter
> threads and index(key of buffer) is negative value.
> 
> >     int dupFd = sameProcess && index < 0 ? dup(fd.fd) : fd.fd;
> http://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/
> ShadowLayerUtilsGralloc.cpp#139
> 

We do not need to dup fd in this case since SharedBufferManager is implemented.
Attachment #8448164 - Flags: review?(jmuizelaar)
As a gralloc buffer's key, "int" is used. 32 bit is too small for it. I am going to create a bug for it.
Depends on: 1032364
(In reply to Sotaro Ikeda [:sotaro] from comment #7)
> As a gralloc buffer's key, "int" is used. 32 bit is too small for it. I am
> going to create a bug for it.

Created bug 1032364.
Comment on attachment 8448164 [details] [diff] [review]
patch - remove dup fd from ParamTraits<MagicGrallocBufferHandle>::Read()

Review of attachment 8448164 [details] [diff] [review]:
-----------------------------------------------------------------

This seems reasonable.

::: gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp
@@ +139,5 @@
>      // the right thing and dup's the fd.  If it's shared cross-thread,
> +    // SCM_RIGHTS doesn't dup the fd.
> +    // But in shared cross-thread, dup fd is not necessary.
> +    // A pointer of the GraphicBuffer is directly got from SharedBufferManagerParent.
> +    fds[n] = fd.fd;

I'd change this sentence to be something like:

// But in shared cross-thread, dup fd is not necessary because we get a pointer to the GraphicBuffer directly from SharedBufferManagerParent and don't create a new GraphicBuffer around the fd.

As an aside, didn't we have problems with gralloc/genlock when using the same file descriptor and accessed from the same process but on different threads?
Attachment #8448164 - Flags: review?(jmuizelaar) → review+
(In reply to Jeff Muizelaar [:jrmuizel] from comment #9)
> 
> As an aside, didn't we have problems with gralloc/genlock when using the
> same file descriptor and accessed from the same process but on different
> threads?

Android basically expects one instance per process. When it is not like that, some implementations except qcom do not work correctly. There is a such explanation in the following link.

http://androidxref.com/4.4.3_r1.1/xref/hardware/libhardware/modules/gralloc/mapper.cpp#94

How to control gralloc buffer is almost same as android at Compositable level. Therefore, there seems no problem. Actually, the problem is not observed until now.
Apply the comment. Carry "r=jmuizelaar"
Attachment #8448164 - Attachment is obsolete: true
Attachment #8448371 - Flags: review+
Can you explain why it is too small?
https://hg.mozilla.org/mozilla-central/rev/05f283d784e0
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
(In reply to Andreas Gal :gal from comment #13)
> Can you explain why it is too small?

gralloc buffer key is incremented for each gralloc buffer allocation for each process. b2g process could exist very long time, therefore there is a risk the id could exceed the int max. I did not say that the problem happened by it.
Similar crashes exist in Bug 1021451, Bug 1032642. All crashes happens when PSharedBufferManagerChild exist in b2g process and failed to deserialize MaybeMagicGrallocBufferHandle. The deserialization happens at ParamTraits<MagicGrallocBufferHandle>::Read().

http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp#110

attachment 8448371 [details] [diff] [review] fixed one prolem in ParamTraits<MagicGrallocBufferHandle>::Read() and add logs.
It seems better to keep-open until we confirmed the problem fix.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [caf priority: p1][CR 686674][b2g-crash] → [caf priority: p1][CR 686674][b2g-crash][keep-open]
Tapas, can you test again with attachment 8448371 [details] [diff] [review] and with logcat log? It might not fix the problem, but add some logs to ParamTraits<MagicGrallocBufferHandle>::Read(). It might help to debug.
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Flags: needinfo?(tkundu)
Resolution: --- → FIXED
Whiteboard: [caf priority: p1][CR 686674][b2g-crash][keep-open] → [caf priority: p1][CR 686674][b2g-crash]
Marking as fixed in 2.1 via comment 14.  Has this landed for 2.0?
4 billion allocations is ... unlikely. Not sure I buy this story.
(In reply to Sotaro Ikeda [:sotaro] from comment #18)
> Tapas, can you test again with attachment 8448371 [details] [diff] [review]
> and with logcat log? It might not fix the problem, but add some logs to
> ParamTraits<MagicGrallocBufferHandle>::Read(). It might help to debug.

Thanks for your work. We are asking out internal team to confirm this. I will re-open if it does not fix this issue.
Observed on: 

Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.020
Moz BuildID: 20140628000201
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=91958625774ebe6c425adf322b09f1edf906578d
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=eab4f8ff82758f08f049e14d305e29fcc50cb292
Could you please upload a patch for 2.0 and land it in 2.0 ?
Flags: needinfo?(sotaro.ikeda.g)
Can we confirm the problem is fixed in 2.1 before uplifting?
Flags: needinfo?(milan)
Sorry, I already pushed by the time you commented :(
Observed on: 

Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.022
Moz BuildID: 20140630000201
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=c0c8ad187c0466285f2580531e09f8322996f561
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=ef0781044f85ee7f364236c074ab0b7644c38ff8
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #26)
> Sorry, I already pushed by the time you commented :(

No worries.
I also found another problems
- all SharedBufferManagerParent's static members access are not protected by Monitor.
    + SharedBufferManagerParent creates one thread for each content process. 
- Two SharedBufferManagerParent instances are created for each content process.
    + One instance is created by SharedBufferManagerParent::CloneToplevel(),
       another is created by ContentParent::AllocPSharedBufferManagerParent().
    + In ImageBridge case, only one instance seem to be created by ImageBridgeParent::CloneToplevel().
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Sotaro Ikeda [:sotaro] from comment #29)
> I also found another problems
> - all SharedBufferManagerParent's static members access are not protected by
> Monitor.
>     + SharedBufferManagerParent creates one thread for each content process. 
> - Two SharedBufferManagerParent instances are created for each content
> process.
>     + One instance is created by SharedBufferManagerParent::CloneToplevel(),
>        another is created by
> ContentParent::AllocPSharedBufferManagerParent().
>     + In ImageBridge case, only one instance seem to be created by
> ImageBridgeParent::CloneToplevel().

I am going to create a new bug for the problems.
Depends on: 1034294
Bug 1034294 could be an actual cause of this bug. SharedBufferManagerParent::RecvAllocateGrallocBuffer() does not handle a race condition correctly. It could cause that a gralloc buffer is stored to incorrect key. See Bug 1034294 Comment 7.
Observed on: 

Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.026
Moz BuildID: 20140707000200
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=ef67af27dff3130d41a9139d6ae7ed640c34d922
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=d3eae03cdf4e6944e646d05938a22dc1380a0d95
(In reply to Sotaro Ikeda [:sotaro] from comment #31)
> Bug 1034294 could be an actual cause of this bug.
> SharedBufferManagerParent::RecvAllocateGrallocBuffer() does not handle a
> race condition correctly. It could cause that a gralloc buffer is stored to
> incorrect key. See Bug 1034294 Comment 7.

This seems to be fixed this issue for us. Tbanks a lot for your quick help.
Flags: needinfo?(tkundu)
(In reply to Tapas Kumar Kundu from comment #33)
> (In reply to Sotaro Ikeda [:sotaro] from comment #31)
> > Bug 1034294 could be an actual cause of this bug.
> > SharedBufferManagerParent::RecvAllocateGrallocBuffer() does not handle a
> > race condition correctly. It could cause that a gralloc buffer is stored to
> > incorrect key. See Bug 1034294 Comment 7.
> 
> This seems to be fixed this issue for us. Tbanks a lot for your quick help.

Good news! Thanks for the confirmation :-)
No STR is present to create test case to address bug.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(srapanan)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(srapanan)
Flags: in-moztrap-
Unable to verify due to automation necessary in STR
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?][QAnalyst-verify-]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?][QAnalyst-verify-] → [QAnalyst-Triage+][QAnalyst-verify-]
Flags: needinfo?(ktucker)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: