Closed Bug 1024025 Opened 10 years ago Closed 10 years ago

WebGL game's FPS is low when HwComposer is used

Categories

(Core :: Graphics: Layers, defect, P2)

ARM
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED WONTFIX
tracking-b2g backlog

People

(Reporter: sotaro, Assigned: sotaro)

References

Details

(Keywords: perf, Whiteboard: [c=progress p= s=2014.06.20.t u=])

Attachments

(3 files, 1 obsolete file)

When running WebGL game, HwComposer's composition is slower than OpenGL composition.
I found the following problems.
-[1] EGLImage binding is done even when OpenGL composition is not used.
-[2] EGLImage binding is done when gralloc buffer is delivered to Compositor side.
   We need to defer EGLImage binding. Because Fence wait need to be done before EGLImage binding.
-[3] Flame's EGLSurface uses Surface::hook_dequeueBuffer_DEPRECATED().
     It is sync function and block Composer thread.
     On latest hw platform does not use this function.
Assignee: nobody → sotaro.ikeda.g
Can you dump and share the HWC layer list sent to HWC hal. Add this line at [1]:

LOGE("HWC layer[%d]::%s: dst=[%d %d %d %d] src=[%f %f %f %f] Tr=%x Blend=0x%x Flags=0x%x", mList->numHwLayers, aLayer->Name(),
hwcLayer.displayFrame.left, hwcLayer.displayFrame.top, hwcLayer.displayFrame.right, hwcLayer.displayFrame.bottom,
hwcLayer.sourceCropf.left, hwcLayer.sourceCropf.top, hwcLayer.sourceCropf.right, hwcLayer.sourceCropf.bottom,
hwcLayer.transform, hwcLayer.blending, hwcLayer.flags);

[1]: http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#505
Grep for "HWC" in ADB logs.
Flags: needinfo?(sotaro.ikeda.g)
Also change LOGD to LOGE in HwcComposer2D::TryRender() to confirm GPU/HWC Composition. If possible, share Game app and steps to install it.
I already confirmed that the majority of wait happened by Acquire fence wait and Surface::hook_dequeueBuffer_DEPRECATED().
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Sushil from comment #2)
> Can you dump and share the HWC layer list sent to HWC hal. Add this line at
> [1]:
> 

I am going to provide it.
Sushil, do you know why the following code is necessary? If HwComposer do all layer's composition by blit, it seem not necessary except fence sync purpose.

----------------------
HwcComposer2D::TryHwComposition()
>>
        } else if (blitComposite) {
            // BLIT Composition, flip FB target
            GetGonkDisplay()->UpdateFBSurface(mDpy, mSur);
            FramebufferSurface* fbsurface = (FramebufferSurface*)(GetGonkDisplay()->GetFBSurface());
            if (!fbsurface) {
                LOGE("H/W Composition failed. NULL FBSurface.");
                return false;
            }
            mList->hwLayers[idx].handle = fbsurface->lastHandle;
            mList->hwLayers[idx].acquireFenceFd = fbsurface->GetPrevFBAcquireFd();
        }
Flags: needinfo?(sushilchauhan)
This is called HWC_BLIT Composition. The content of the frame, which has been composed by Copybit, gets drawn on FrameBuffer layer by driver. It avoids use of unnecessary intermediate full screen render buffers, which improves memory usage on device. Since the composed content is drawn on FB layer, that's why we need to call eglSwapBuffers() in HwcComposer2D to flip FB target before calling HWC set, even when the frame gets composed by HWC. See "HWC_BLIT" in HwcComposer2D.cpp. See the logic at:
[1]: http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#546
[2]: http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#560
[3]: http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#586

Since, you mentioned, majority of wait happens while waiting for the acquire fence, check the acquire fence fd code at: http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#594, to make sure if this is a valid fence. I guess you added this code.
Flags: needinfo?(sushilchauhan)
Attached patch patch - Deliver acquire fence to HwComposer (obsolete) (deleted) β€” β€” Splinter Review
(In reply to Sushil from comment #7)
> 
> Since, you mentioned, majority of wait happens while waiting for the acquire
> fence, check the acquire fence fd code at:
> http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.
> cpp#594, to make sure if this is a valid fence. I guess you added this code.

I recognized that Current gecko does not deliver acquire fence to HwComposer. The aquireFence wait happen incorrectly at the following.
http://mxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/GrallocTextureHost.cpp#448

attachment 8438673 [details] [diff] [review] fix this problem. But fps is still low. After applying patch, wait is moved to Release fence at Surface::hook_dequeueBuffer_DEPRECATED().
(In reply to Sushil from comment #2)
> Can you dump and share the HWC layer list sent to HWC hal. Add this line at
> [1]:
> 
> LOGE("HWC layer[%d]::%s: dst=[%d %d %d %d] src=[%f %f %f %f] Tr=%x
> Blend=0x%x Flags=0x%x", mList->numHwLayers, aLayer->Name(),
> hwcLayer.displayFrame.left, hwcLayer.displayFrame.top,
> hwcLayer.displayFrame.right, hwcLayer.displayFrame.bottom,
> hwcLayer.sourceCropf.left, hwcLayer.sourceCropf.top,
> hwcLayer.sourceCropf.right, hwcLayer.sourceCropf.bottom,
> hwcLayer.transform, hwcLayer.blending, hwcLayer.flags);

Flame uses JB, sourceCropf seems not defined yet. I am going to remove sourceCropf locally.
Attached file logcat during the app run (deleted) β€”
Logcat during the app running. Sorry, we can not share the app.
http://people.mozilla.org/~bgirard/cleopatra/#report=8dec5906370d8ebb17fc5cde78be403dc0b1ec41

profile of the Compositor during the app running. Almost time is waiting Fence::waitForever() in Surface::hook_dequeueBuffer_DEPRECATED()
There might be following way to the problem in Comment 12.
- [1] Increase Framebuffer number to 3.
- [2] Apply vsync implementation
- [3] Replace gonk to new one that does not use Surface::hook_dequeueBuffer_DEPRECATED().
Easier solution is [1]. But it require more gralloc buffer. It seems not fit to b2g. When applying [1], FPS becomes good.
(In reply to Sotaro Ikeda [:sotaro] from comment #14)
> Easier solution is [1]. But it require more gralloc buffer. It seems not fit
> to b2g. When applying [1], FPS becomes good.

Yes, I was about to suggest this one. We also enable triple frame buffer via BoardConfig file of the device. See Bug: https://bugzilla.mozilla.org/show_bug.cgi?id=988160. If you have confirmed that [1] improved FPS, then please make this change in the board config file of the device. We should not change it in FrameBufferSurface.cpp since it might not be needed on higher end devices.

BTW, [2] will not improve the FPS, it will make it worse. I am not aware about [3], you can give it a try.
1 suggestion to reduce load on HWC. In below layer configuration, layer[1] is totally hidden under layer[2] so it should not be present in layer tree (similar to Bug 1022164). It will lead to unnecessary solid fill operation in driver. Just for a quick check, can you hack HwcComposer2D code and do not include layer[1] and see if FPS improves ?

E HWComposer: HWC layer[0]::ColorLayerComposite: dst=[0 0 480 854] Tr=ff000000 Blend=0x100 Flags=0x8
E HWComposer: HWC layer[1]::ColorLayerComposite: dst=[1 2 479 852] Tr=ff000000 Blend=0x100 Flags=0x8
E HWComposer: HWC layer[2]::CanvasLayerComposite: dst=[1 2 479 852] Tr=2 Blend=0x100 Flags=0x0
qdhwcomposer: canUseCopybitForRGB:renderArea 406300, fbArea 409920
HWComposer: Frame rendered
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Sushil from comment #15)
> (In reply to Sotaro Ikeda [:sotaro] from comment #14)
> > Easier solution is [1]. But it require more gralloc buffer. It seems not fit
> > to b2g. When applying [1], FPS becomes good.
> 
> Yes, I was about to suggest this one. We also enable triple frame buffer via
> BoardConfig file of the device. See Bug:
> https://bugzilla.mozilla.org/show_bug.cgi?id=988160. If you have confirmed
> that [1] improved FPS, then please make this change in the board config file
> of the device. We should not change it in FrameBufferSurface.cpp since it
> might not be needed on higher end devices.
> 
> BTW, [2] will not improve the FPS, it will make it worse. I am not aware
> about [3], you can give it a try.

[1] makes a good result. The profile is the following. [2][3] are just possible workaournd. They are not realistic as b2g-v1.4.

http://people.mozilla.org/~bgirard/cleopatra/#report=8dec5906370d8ebb17fc5cde78be403dc0b1ec41
Flags: needinfo?(sotaro.ikeda.g)
Cool. You can go ahead with [1] then. Please do a sanity on device to make sure it does not break anything else. Also in layer config of frame, HWC layer[1] is unnecessary, can you check by hacking and dropping this layer in HwcComposer to check if it improves FPS (with & without triple FrameBuffer).
(In reply to Sushil from comment #16)
> 1 suggestion to reduce load on HWC. In below layer configuration, layer[1]
> is totally hidden under layer[2] so it should not be present in layer tree
> (similar to Bug 1022164). It will lead to unnecessary solid fill operation
> in driver. Just for a quick check, can you hack HwcComposer2D code and do
> not include layer[1] and see if FPS improves ?

From the profile in Comment 17, we do not need to handle about it in this bug, I think.
(In reply to Sushil from comment #18)
> Also in layer config of frame, HWC
> layer[1] is unnecessary, can you check by hacking and dropping this layer in
> HwcComposer to check if it improves FPS (with & without triple FrameBuffer).

It is better to handle as a different problem as a different bug to keep a bug simple.
Depends on: 1024144
Comment on attachment 8438673 [details] [diff] [review]
patch - Deliver acquire fence to HwComposer

The patch moved to Bug 1024144.
Attachment #8438673 - Attachment is obsolete: true
(In reply to Sotaro Ikeda [:sotaro] from comment #17)
> 
> [1] makes a good result. The profile is the following. [2][3] are just
> possible workaournd. They are not realistic as b2g-v1.4.
> 
> http://people.mozilla.org/~bgirard/cleopatra/
> #report=8dec5906370d8ebb17fc5cde78be403dc0b1ec41

Sorry, I pasted wrong profile data. It is same one to comment 12.
The following is the correct profile when [1] is applied.
http://people.mozilla.org/~bgirard/cleopatra/#report=5abf4e60431b2439498a3e95ec1b6ac36cb5bdea

> - [1] Increase Framebuffer number to 3.
Flame(msm8610) seems not have an enough graphics handling power. The profile in Comment 12, Compositor thread comsume 80% of time at eglSwapBuffers(). It is not an acceptable value as a product. But [1] mean that the device consumes more graphic buffer steadily than before. It is very bad thing for b2g.

I am going to increase the frame buffer count only on flame device. And after Bug 987529 is fixed, I am going to check if it is possible to reduce frame buffer number to 2.
VSYNC implementation will not improve the FPS, when GPU itself it taking too much time in eglSwapBuffers(). You can verify with a quick patch by adding vsync hook in HwcComposer2D in Flame code.
Status: NEW → ASSIGNED
Android has a mechanism to modify a number of frame buffer. It uses ifdef in FramebufferSurface.cpp. If we add "NUM_FRAMEBUFFER_SURFACE_BUFFERS := 3" to BoardConfig.mk. The config is delivered to FramebufferSurface.cpp in android.

http://androidxref.com/4.4.3_r1.1/xref/frameworks/native/services/surfaceflinger/DisplayHardware/FramebufferSurface.cpp#39

But FramebufferSurface.cpp is build as part of gecko now. IIRC, if we want to add device specific ifdef to gecko build. It seems to need to change "configure.in". It is not desirable. Instead, use system property is easier solution.
(In reply to Sushil from comment #25)
> VSYNC implementation will not improve the FPS, when GPU itself it taking too
> much time in eglSwapBuffers(). You can verify with a quick patch by adding
> vsync hook in HwcComposer2D in Flame code.

Possibility of reduction is not important here. Do evaluation after Bug 987529 fix is important. So, there is no necessity to check it urgently.
Blocks: 1014011
Nominate to "b2g-v1.4". This bug affect to WebGL game's performance and blocking 1.4+ bug.
blocking-b2g: --- → 1.4?
I confirmed that the patches change the frame buffer surface count to 3 on master flame.
Comment on attachment 8439485 [details] [diff] [review]
patch part 1 - Number of Framebuffer be changeable

mwu, can you review the patch? If there are a better way than this, can you give me an advice?
Attachment #8439485 - Flags: review?(mwu)
Attachment #8439486 - Flags: review?(mwu)
triage team, see comment 28 for why this is a blocker.
NI :elan, if we still need given we passed partner certification?
Flags: needinfo?(elancaster)
Investigating, see bug 1022823.
Whiteboard: [c=progress p= s= u=]
Priority: -- → P2
blocking for dependent bug being blocker.
blocking-b2g: 1.4? → 1.4+
Upon further review I'd like this in backlog.
Blocks: 1014011
blocking-b2g: 1.4+ → backlog
No longer blocks: 1014011
No longer blocks: 1014011
When flame is on 256MB config, "frame buffer surface count to 3" could cause oom problem. It seems better not to change the count in 256MB config.
On b2g v1.4, I got the following fps. I checked performance down in HwComposition on master flame. gfx layer's difference might cause it.
- b2g-v1.4, OpenGL composition: 41fps
- b2g-v1.4, HwComposer composition: 42fps
Attachment #8439485 - Flags: review?(mwu)
Attachment #8439486 - Flags: review?(mwu)
It might better to re-think about to increase frame buffer surface count. An effect of changing it is very big.
(In reply to Sotaro Ikeda [:sotaro] from comment #40)
> It might better to re-think about to increase frame buffer surface count. An
> effect of changing it is very big.

The main problem comes from flame's egl surface's implementation. It uses  Surface::hook_dequeueBuffer_DEPRECATED()v like in Comment 12. egl surface should not use the function. And newer platform does not use the function.
(In reply to Sotaro Ikeda [:sotaro] from comment #39)
> On b2g v1.4, I got the following fps. I checked performance down in
> HwComposition on master flame. gfx layer's difference might cause it.
> - b2g-v1.4, OpenGL composition: 41fps
> - b2g-v1.4, HwComposer composition: 42fps

Bug 1022164 is fixing some gfx layer's problems.
With On b2g v1.4 + latest app, the app's rendering seems very efficient. HwComposer compose only Canvas layer. In this case, flame's hw composer seems efficient.

> qdhwcomposer: canUseCopybitForRGB:renderArea 409920, fbArea 409920.
From Comment 41 and Comment 43, it seems better close this bug.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Sotaro,

So you are not increasing FB surfaces to 3. What is the conclusion of this Bug? Are you planning to fix Flame's egl surface's implementation ? Or Flame is being moved to b2g v1.4 ?
Flags: needinfo?(sotaro.ikeda.g)
Whiteboard: [c=progress p= s= u=] → [c=progress p= s=2014.06.20.t u=]
(In reply to Sushil from comment #45)
> Sotaro,
> 
> So you are not increasing FB surfaces to 3. What is the conclusion of this
> Bug? Are you planning to fix Flame's egl surface's implementation ? Or Flame
> is being moved to b2g v1.4 ?

On b2g-v1.4, mozilla will fix only very critical bugs now. "Fullscreen web gl game" use case, is not a critical problem anymore. Therefore I do not make a change to b2g-v1.4 about this problem.

To fix Surface::hook_dequeueBuffer_DEPRECATED() problem, we need to replace prebuilt binary to latest codeaurora's one. In my understanding, mozilla side does not have a right to do it. It is device vendor side's problem. The vendor already chose old JB platform as gonk.
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(elancaster)
blocking-b2g: backlog → ---
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: