Closed
Bug 1047149
Opened 10 years ago
Closed 10 years ago
Invalid acquire fence for FB layer.
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
People
(Reporter: sushilchauhan, Assigned: sushilchauhan)
References
Details
(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 310][caf priority: p1][CR 699312])
Attachments
(2 files, 2 obsolete files)
(deleted),
patch
|
sushilchauhan
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
sushilchauhan
:
review+
|
Details | Diff | Splinter Review |
In an automation run, it has been observed that HwcComposer2D received invalid acquire fence for FB layer. So when the fd is passed to HAL and display driver tries to get file or sync_fence data from that fd, it fails and reports invalid fd.
I will upload a patch to check for acquire fence validity for FB layer in FramebufferSurface, before passing it to HWC HAL.
Sotaro, do you know, in which condition, the FB layer's acquire fence can be invalid?
Flags: needinfo?(sotaro.ikeda.g)
Add check to validate acquire fence of FrameBuffer layer, before passing acquire fence fd to HAL.
Attachment #8466557 -
Flags: review?(sotaro.ikeda.g)
Comment 3•10 years ago
|
||
Comment on attachment 8466557 [details] [diff] [review]
Add check to validate acquire fence of FrameBuffer layer.
Review of attachment 8466557 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good!
Attachment #8466557 -
Flags: review?(sotaro.ikeda.g) → review+
Comment 4•10 years ago
|
||
(In reply to Sushil from comment #1)
> I will upload a patch to check for acquire fence validity for FB layer in
> FramebufferSurface, before passing it to HWC HAL.
>
> Sotaro, do you know, in which condition, the FB layer's acquire fence can be
> invalid?
If Surface::queueBuffer() or Surface::cancelBuffer() is called by EGLSurface without fence file descriptor, FramebufferSurface's fence becomes invalid.
Flags: needinfo?(sotaro.ikeda.g)
Comment 5•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #4)
>
> If Surface::queueBuffer() or Surface::cancelBuffer() is called by EGLSurface
> without fence file descriptor, FramebufferSurface's fence becomes invalid.
Sorry, in Surface::cancelBuffer()'s case, buffer is not delivered to FramebufferSurface.
Blocks: CAF-v2.0-CC-metabug
Sotaro,
"isValid" check is not sufficient because it only checks for -1 (Reference: include/ui/Fence.h):
- bool isValid() const { return mFenceFd != -1; }
The problem is that framework is sending bogus acquire fence fd (fd > 0) for FB layer to display HAL, that's why display driver returns this error:
- null fence! i=0 fd=150
So driver checks if file* is NULL or file pointer does not support sync fence operations i.e. it is not a sync fence fd. So it is hitting one of these 2 checks below.
struct file *file = fget(fd);
- if (file == NULL)
return NULL;
- if (file->f_op != &sync_fence_fops)
return NULL;
The patch in this bug is not sufficient. Please check, in which scenario, we can get a fd, which is not a sync fence fd (bogus fd). It is happening on acquire fence of FB layer (in automation runs).
Assignee: sushilchauhan → nobody
Flags: needinfo?
Comment 7•10 years ago
|
||
(In reply to Sushil from comment #6)
>
> The patch in this bug is not sufficient. Please check, in which scenario, we
> can get a fd, which is not a sync fence fd (bogus fd). It is happening on
> acquire fence of FB layer (in automation runs).
all fences except -1 are provided by EGL. Therefore, I assume that the bogus fds are provided by EGL. But EGL implementation is not open sourced. Therefore mozilla side can not analyze the scenarios...
Flags: needinfo?(sotaro.ikeda.g)
By EGL, do you mean GPU (KGSL) is providing bogus fd for FB layer in this case ?
Comment 9•10 years ago
|
||
android framework expects fence fds are valid except -1 and they are delivered exactly same way as android until FramebufferSurface::onFrameAvailable().
Comment 10•10 years ago
|
||
(In reply to Sushil from comment #8)
> By EGL, do you mean GPU (KGSL) is providing bogus fd for FB layer in this
> case ?
From the symptom, it seems most possible. EGL write fence fd to FramebufferSurface via ANativeWindow.
Flags: needinfo?(sotaro.ikeda.g)
Comment 11•10 years ago
|
||
Within android framework and gecko, fence fd is handled as android::Fence object. Therefore, there seems no mistake of passing fence fd from ANativeWindow to FramebufferSurface. If there is no possibility of that fence fd changes from valid to bogus, original fd that provided by EGL seems bogus.
Assignee | ||
Comment 13•10 years ago
|
||
Sotaro,
There can be 2 scenarios:
1. The original fd provided by EGL seems to be bogus (as you had mentioned).
2. Someone close fd and do not set the "variable" to -1. Meanwhile, this fd is occupied by some non sync fence operation, now same variable is passed to HwcComposer2D without initializing it to -1. Can you also double, there is no such code in framework (before coming to HwcComposer2D) ?
Flags: needinfo?(sotaro.ikeda.g)
Assignee | ||
Comment 14•10 years ago
|
||
Sotaro, I have updated the patch to take care of Scenario # 2 in HwcComposer2D, which I mentioned in previous comment.
Attachment #8466557 -
Attachment is obsolete: true
Attachment #8468953 -
Flags: review?(sotaro.ikeda.g)
Comment 15•10 years ago
|
||
(In reply to Sushil from comment #6)
>
> The problem is that framework is sending bogus acquire fence fd (fd > 0) for
> FB layer to display HAL, that's why display driver returns this error:
> - null fence! i=0 fd=150
Sushil, I have some questions.
- Which code return this error?
- Is it clear that the acquire fence is provided by FramebufferSurface?
- Does the problem happen when hw composer composition is disabled?
Because, WebGL and hw accelerated canvas 2d could also provide acquire fence. And they are consumed by HwComposer2d during HwComposer composition.
Flags: needinfo?(sotaro.ikeda.g) → needinfo?(sushilchauhan)
Updated•10 years ago
|
Assignee: nobody → sushilchauhan
blocking-b2g: 2.0? → 2.0+
Assignee | ||
Comment 16•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #15)
>
> Sushil, I have some questions.
>
> - Which code return this error?
Display driver code returns this error in BUFFER_SYNC_IOCTL which gets called from display HAL in hwc set. Do you have access to driver code ? Driver checks 2 things: "NULL and non sync fence operation" on file* from this fd, which I have mentioned in Comment 6. Bogus fd = 150 is failing in 1 of them.
> - Is it clear that the acquire fence is provided by FramebufferSurface?
Yes, I have checked logs of all instances of this error, bogus fd is the acquire fence fd of FB layer.
> - Does the problem happen when hw composer composition is disabled?
Yes, this problem will happen even when HWC composition is disabled because the problem is in acquire fence of FB layer.
Can you review my attached patch? Can you also double check all such code in Gecko framework before it comes to HwcComposer2D (all acquire fences coming to HwcComposer2D in GPU and HWC paths).
Flags: needinfo?(sushilchauhan)
Updated•10 years ago
|
Attachment #8468953 -
Flags: review?(sotaro.ikeda.g) → review+
Assignee | ||
Comment 17•10 years ago
|
||
Sotaro,
I further looked into ADB logs for the bogus acquire fence fd issue. I found sync_merge() failures. I have listed some of them below. Does it give you some hint? What FB acquire fence does framework send to HwcComposer2D when sync_merge() fails in case of FramebufferSurface, like below instances. The failure log is in Fence::merge() in frameworks/native/libs/ui/Fence.cpp. Please check and let me know.
08-05 20:19:32.870 239 882 E Fence : merge: sync_merge("FramebufferSurface:1", 100, 167) returned an error: Bad file number (-9)
....
08-05 20:19:40.110 239 882 E Fence : merge: sync_merge("TextureHostOGL", 209, 110) returned an error: No such file or directory (-2)
08-05 20:19:40.110 239 882 E Fence : merge: sync_merge("TextureHostOGL", 206, 110) returned an error: No such file or directory (-2)
08-05 20:19:40.110 239 882 E Fence : merge: sync_merge("TextureHostOGL", 188, 110) returned an error: No such file or directory (-2)
08-05 20:19:40.110 239 882 E Fence : merge: sync_merge("TextureHostOGL", 184, 110) returned an error: No such file or directory (-2)
08-05 20:19:40.110 239 882 E Fence : merge: sync_merge("TextureHostOGL", 175, 110) returned an error: No such file or directory (-2)
....
08-05 22:31:38.580 239 882 E Fence : merge: sync_merge("FramebufferSurface:0", 198, 247) returned an error: Bad file number (-9)
....
08-05 22:33:34.080 239 882 E Fence : merge: sync_merge("TextureHostOGL", 184, 93) returned an error: Not a typewriter (-25)
....
08-05 22:33:34.140 239 882 E Fence : merge: sync_merge("TextureHostOGL", 184, 138) returned an error: Bad file number (-9)
....
08-05 22:42:08.180 239 882 E Fence : merge: sync_merge("TextureHostOGL", 138, 100) returned an error: Bad file number (-9)
Meanwhile, I am planning to add logs in driver code to check which specific case we are hitting, which I mentioned in Comment 6. Do you have access to driver code?
Flags: needinfo?(sotaro.ikeda.g)
Comment 18•10 years ago
|
||
(In reply to Sushil from comment #16)
>
> Can you also double check all such code in
> Gecko framework before it comes to HwcComposer2D (all acquire fences coming
> to HwcComposer2D in GPU and HWC paths).
"fence fd of FB layer" does not pass gecko framework. As in Comment 9, From EGL to FramebufferSurface::onFrameAvailable() is exactly same as android. From FramebufferSurface::onFrameAvailable(), they are different. But they just store fd for next composition.
In android case, raw fd is stored in the function like the following.
FramebufferSurface::onFrameAvailable()
->HWComposer::fbPost()
->HWComposer::setFramebufferTarget()
->nextBuffer(buf, acquireFence);
->acquireFenceFd = acquireFence->dup();
->disp.framebufferTarget->acquireFenceFd = acquireFenceFd;
In gecko case, Fence object is stored in the function like the following.
FramebufferSurface::onFrameAvailable()
->nextBuffer(buf, acquireFence);
->mPrevFBAcquireFence = new Fence(acquireFence->dup());
https://github.com/sotaroikeda/firefox-diagrams/blob/master/widget/widget_HwcComposer2D__FirefoxOS_1_4.pdf?raw=true
Assignee | ||
Comment 19•10 years ago
|
||
Can you check the errors in Comment 17 ?
Comment 20•10 years ago
|
||
(In reply to Sushil from comment #17)
> Sotaro,
>
> I further looked into ADB logs for the bogus acquire fence fd issue. I found
> sync_merge() failures. I have listed some of them below. Does it give you
> some hint? What FB acquire fence does framework send to HwcComposer2D when
> sync_merge() fails in case of FramebufferSurface, like below instances. The
> failure log is in Fence::merge() in frameworks/native/libs/ui/Fence.cpp.
> Please check and let me know.
It says that FB layer's acquire fence is bogus. gecko re-uses FB layer's acquire fence as gfx layers' release fences.
By the way, Bug merge is already remove in master gecko by bug 1028532. because merge is not necessary in gecko's use case and it consumes time.
Comment 21•10 years ago
|
||
(In reply to Sushil from comment #19)
> Can you check the errors in Comment 17 ?
Fence::merge() returns Fence::NO_FENCE, if merge fails. Therefore error can be checked by checking if merged fence is valid.
Flags: needinfo?(sotaro.ikeda.g)
Assignee | ||
Comment 22•10 years ago
|
||
Should we pull patch from Bug 1028532, to see if it helps on v2.0?
Comment 23•10 years ago
|
||
(In reply to Sushil from comment #17)
>
> Meanwhile, I am planning to add logs in driver code to check which specific
> case we are hitting, which I mentioned in Comment 6. Do you have access to
> driver code?
Recently, Flame builds also build kernel. So, I think I have an access to the driver code. But current mozilla's flame is JB. It seems better to use KK. Since today, flame KK build setting seems to be added to build script.
Comment 24•10 years ago
|
||
(In reply to Sushil from comment #22)
> Should we pull patch from Bug 1028532, to see if it helps on v2.0?
Bug 1028532 is only for performance. I do not think it helps to fix this bug.
Comment 25•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #24)
> (In reply to Sushil from comment #22)
> > Should we pull patch from Bug 1028532, to see if it helps on v2.0?
>
> I do not think it helps to fix this bug.
The above is just my assumption, because I do not know about fence implementation in kernel side and EGL implementation.
If we could uplift just for performance. It seems OK to uplift. A patch iof Bug 1028532 is very simple and seem low risk of regression.
Assignee | ||
Comment 26•10 years ago
|
||
Sotaro, can you ask to uplift the patch in Bug 1028532 to v2.0 because we are not sure of consequence of sync_merge() failure later in display pipeline. I assume you already done sanity test with that patch.
Flags: needinfo?(sotaro.ikeda.g)
Comment 27•10 years ago
|
||
Nominated Bug 1028532 to b2g-v2.0.
Comment 28•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #23)
> (In reply to Sushil from comment #17)
>
> Recently, Flame builds also build kernel. So, I think I have an access to
> the driver code. But current mozilla's flame is JB. It seems better to use
> KK. Since today, flame KK build setting seems to be added to build script.
mozilla's kk flame build is very unstable now. :-(
Assignee | ||
Comment 29•10 years ago
|
||
I am uploading the HG friendly version of reviewed patch.
Patch details:
1. Add check to validate the acquire fence of FrameBuffer layer
2. Reset release fence fd of FB layer after passing it to FB Surface.
3. Reset acquire fence fd of HWC layers before getting it from Gecko.
Attachment #8468953 -
Attachment is obsolete: true
Attachment #8469620 -
Flags: review+
Assignee | ||
Comment 30•10 years ago
|
||
Sotaro, can you please push the patch in Comment 29 to Try Server? Sorry, I have forgotten the steps :(
Flags: needinfo?(sotaro.ikeda.g)
Comment 31•10 years ago
|
||
Flags: needinfo?(sotaro.ikeda.g)
Keywords: checkin-needed
Assignee | ||
Comment 32•10 years ago
|
||
Sotaro,
I need to create a separate HG patch for v2.0 branch because the code is different at: http://mxr.mozilla.org/mozilla-central/source/widget/gonk/libdisplay/FramebufferSurface.cpp#171 in master and v2.0.
To create an HG patch on b2g-inbound (master), I use the command:
hg clone http://hg.mozilla.org/integration/b2g-inbound
So I need to create an HG patch for v2.0 tip as well. What is the corresponding clone path for v2.0 tip?
Flags: needinfo?(sotaro.ikeda.g)
Comment 33•10 years ago
|
||
you can create a commit patch also from git repository. By the following command.
> git diff -p -U8
Flags: needinfo?(sotaro.ikeda.g)
Comment 34•10 years ago
|
||
current b2g-v2.0 gecko's repository seems like the following.
https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/
Assignee | ||
Comment 35•10 years ago
|
||
Uploading HG friendly version of reviewed patch for b2g v2.0 branch.
Attachment #8470177 -
Flags: review+
Comment 36•10 years ago
|
||
Keywords: checkin-needed
Comment 37•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Comment 38•10 years ago
|
||
status-b2g-v2.0:
--- → fixed
status-b2g-v2.1:
--- → fixed
status-firefox32:
--- → wontfix
status-firefox33:
--- → wontfix
status-firefox34:
--- → fixed
Updated•10 years ago
|
Whiteboard: [CR 699312] → [caf priority: p1][CR 699312]
Updated•10 years ago
|
Whiteboard: [caf priority: p1][CR 699312] → [b2g-crash][caf-crash 310][caf priority: p1][CR 699312]
Comment 39•10 years ago
|
||
Observed on:
Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.054
Moz BuildID: 20140804000204
B2G Version: 2.0
Gecko Version: 32.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=9e5907995c9327f14cb5d182cee5ff16b1743ed4
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=ed87f0a54baf646eb0b20b4debc090c8016d2104
Assignee | ||
Comment 40•10 years ago
|
||
The patch landed in AU 57.
Comment 41•10 years ago
|
||
This bug could be caused by Bug 1053204.
Updated•10 years ago
|
Flags: in-moztrap?(bzumwalt)
Comment 42•10 years ago
|
||
No STR is present to create test case to address bug.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Updated•10 years ago
|
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(bzumwalt)
Flags: in-moztrap-
Updated•10 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•