Closed Bug 1036770 Opened 10 years ago Closed 1 year ago

[flame][webrtc] Compositor CPU usage is significantly higher when make the video call

Categories

(Core :: Layout, defect)

x86_64
Linux
defect

Tracking

()

RESOLVED INVALID
tracking-b2g backlog
Tracking Status
b2g-v1.4 --- unaffected
b2g-v2.0 --- affected
b2g-v2.1 --- affected

People

(Reporter: rlin, Unassigned)

References

Details

(Keywords: perf, regression)

Attachments

(2 files)

I found that the Compositor/b2g thread CPU usage is higher than before.
STR:
1. Flash the v122 Flame ROM.
2. Flash the PVT build for gecko/gaia.
I tested mozilla-central-flame/2014072014-07-03-04-02-09 and got the same result as with the latest build.
3. Go to http://nightly-gupshup.herokuapp.com/login and make a video call. (Note: use the front camera.)
4. Use adb shell top -m 30 -t to observe the CPU usage of the b2g/Compositor threads.

result: on 2014072014-07-03-04-02-09/
User 73%, System 23%, IOW 0%, IRQ 0%
User 238 + Nice 268 + Sys 166 + Idle 20 + IOW 0 + IRQ 0 + SIRQ 1 = 693

  PID   TID PR CPU% S     VSS     RSS PCY UID      Thread          Proc
  313   313  0  17% R 240576K  93740K     root     b2g             /system/b2g/b2g
 1275  1745  1  12% R 132432K  50612K     u0_a1275 opensl_rec_thre /system/b2g/plugin-container
  313   784  0  12% R 240684K  93856K     root     Compositor      /system/b2g/b2g
 1275  1803  1   9% R 132388K  50496K     u0_a1275 ViECaptureThrea /system/b2g/plugin-container
 1275  1742  1   7% S 132432K  50612K     u0_a1275 MediaStreamGrph /system/b2g/plugin-container
 1275  1275  0   4% S 132432K  50612K     u0_a1275 Browser         /system/b2g/plugin-container

After reverting gaia to 2014-06-30-04-02-01, the result is:
User 51%, System 22%, IOW 0%, IRQ 0%
User 126 + Nice 204 + Sys 143 + Idle 161 + IOW 0 + IRQ 0 + SIRQ 4 = 638

  PID   TID PR CPU% S     VSS     RSS PCY UID      Thread          Proc
 1453  1765  0  13% R 128616K  54036K     u0_a1453 opensl_rec_thre /system/b2g/plugin-container
 1453  1763  0   7% S 128544K  54036K     u0_a1453 MediaStreamGrph /system/b2g/plugin-container
  299   299  0   7% S 235108K  94808K     root     b2g             /system/b2g/b2g
  299   865  0   6% S 235108K  94808K     root     Compositor      /system/b2g/b2g
 1453  1824  0   5% S 128500K  53920K     u0_a1453 ViECaptureThrea /system/b2g/plugin-container
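
For anyone comparing the two samples above, here is a minimal Python sketch (not part of the original report) that parses the thread rows and shows the per-thread CPU% change. It assumes the column layout shown above, where the PCY column is empty, so the thread name is always the second-to-last field. The names `cpu_by_thread`, `NEW_GAIA`, and `OLD_GAIA` are made up for illustration.

```python
import re

# Rows copied from the two `adb shell top -m 30 -t` samples in this report.
NEW_GAIA = """\
  PID   TID PR CPU% S     VSS     RSS PCY UID      Thread          Proc
  313   313  0  17% R 240576K  93740K     root     b2g             /system/b2g/b2g
 1275  1745  1  12% R 132432K  50612K     u0_a1275 opensl_rec_thre /system/b2g/plugin-container
  313   784  0  12% R 240684K  93856K     root     Compositor      /system/b2g/b2g
 1275  1803  1   9% R 132388K  50496K     u0_a1275 ViECaptureThrea /system/b2g/plugin-container
 1275  1742  1   7% S 132432K  50612K     u0_a1275 MediaStreamGrph /system/b2g/plugin-container
 1275  1275  0   4% S 132432K  50612K     u0_a1275 Browser         /system/b2g/plugin-container
"""

OLD_GAIA = """\
  PID   TID PR CPU% S     VSS     RSS PCY UID      Thread          Proc
 1453  1765  0  13% R 128616K  54036K     u0_a1453 opensl_rec_thre /system/b2g/plugin-container
 1453  1763  0   7% S 128544K  54036K     u0_a1453 MediaStreamGrph /system/b2g/plugin-container
  299   299  0   7% S 235108K  94808K     root     b2g             /system/b2g/b2g
  299   865  0   6% S 235108K  94808K     root     Compositor      /system/b2g/b2g
 1453  1824  0   5% S 128500K  53920K     u0_a1453 ViECaptureThrea /system/b2g/plugin-container
"""


def cpu_by_thread(top_rows):
    """Map thread name -> CPU% for each data row of `top -t` output."""
    usage = {}
    for line in top_rows.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a thread row.
        if len(fields) < 10 or not re.fullmatch(r"\d+%", fields[3]):
            continue
        thread = fields[-2]  # assumption: thread names contain no spaces
        usage[thread] = usage.get(thread, 0) + int(fields[3].rstrip("%"))
    return usage


new = cpu_by_thread(NEW_GAIA)
old = cpu_by_thread(OLD_GAIA)

# Change in CPU% per thread; threads missing from one sample count as 0.
delta = {t: new.get(t, 0) - old.get(t, 0) for t in sorted(set(new) | set(old))}

for thread, change in delta.items():
    print(f"{thread:16s} {old.get(thread, 0):3d}% -> {new.get(thread, 0):3d}%  ({change:+d})")
```

A single `top` sample is noisy, so in practice several samples per build would need to be averaged before drawing conclusions.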

I tested on the aurora branch and found a similar problem, too.
Can we get performance profiles and a dump of the layer tree before and after?
Oh wait, you're seeing this difference from reverting gaia back by just a handful of days and leaving gecko unchanged.

In that case you should just bisect the change that introduces the regression.
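
As a rough illustration of the bisection idea (in a git checkout, `git bisect` does this for you), here is a minimal Python sketch. It assumes the commits are ordered oldest to newest, the first is known good, the last is known bad, and the regression persists once introduced. `first_bad` and `is_bad` are hypothetical names; on a device, `is_bad` would mean flashing that gaia revision, making the call, and checking the Compositor CPU usage by hand.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search for the first commit where is_bad() is true.

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


# Toy history with placeholder labels; the regression "lands" in c5.
history = [f"c{i}" for i in range(8)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 5)
print(culprit)
```

Each step halves the remaining range, so a window of a few dozen gaia commits needs only five or six flash-and-measure rounds.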
Keywords: perf
Whiteboard: ft:loop
I found that this commit affects this issue.
commit 7da24d97aced3264bd35f87be2990fb8f86491ab
Merge: d4c5090 b9bd5d1
Author: Michael Wu <mwu@mozilla.com>
Date:   Tue Jul 1 21:26:14 2014 +0800

    Merge pull request #21210 from michaelwu/bug-1032659
    
    Bug 1032659 - Replace font-weight: lighter with 300 r=timdream
I wouldn't expect to see an increase in composite times from such a patch.

We should get a dump of the display list before and after the patch.

Build with B2G_DUMP_PAINTING=1, flip layout.display-list.dump and collect the display lists from the relevant processes. Perhaps this property changes the layer tree.
Randy -- can you identify the regression window?

Or Jsmith -- can QA help us identify a regression window?  I added qawanted to the keywords.
Flags: needinfo?(rlin)
Flags: needinfo?(jsmith)
Keywords: qawanted
Sure - let's first do a branch check. After we do a branch check, we can look into getting a window.
blocking-b2g: --- → 2.1?
Flags: needinfo?(jsmith)
(In reply to Randy Lin [:rlin] from comment #3)
> Found this commit would affect this issue. 
> commit 7da24d97aced3264bd35f87be2990fb8f86491ab
> Merge: d4c5090 b9bd5d1
> Author: Michael Wu <mwu@mozilla.com>
> Date:   Tue Jul 1 21:26:14 2014 +0800
> 
>     Merge pull request #21210 from michaelwu/bug-1032659
>     
>     Bug 1032659 - Replace font-weight: lighter with 300 r=timdream
Using App Manager to change the CSS value doesn't affect the CPU usage.
I also retested this commit, and it isn't the key one.
====

I tested several more times and found that this one should be the suspect commit.
commit 33131cfb04992957039b12b9ffd3464f444e186c
Author: Pavel Ivanov <pivanov@mozilla.com>
Date:   Mon Jun 30 14:16:06 2014 +0300

    Bug 1021271 - Status Bar needs refinement for 1.5x Scale and New Homescreen
Flags: needinfo?(rlin)
Whiteboard: ft:loop
Pavel -- We think we have found a regression from your checkin of bug 1021271.  Can you look into this for us?

Sotaro -- Do you agree that this is likely the cause?
Depends on: 1021271
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(pivanov)
Can we get a layer tree before and after that change? It's quite possible that we're hitting bug 1033538. A layer tree from before and after should show different layers and/or a change in the opaque property. If it's bug 1033538, then we should expect to lose opaque layers and color layers. It might also be that the changes in bug 1021271 just lead to a more complex layer tree.

We can rule out bug 1021271 by patching the Flame to use content scale x1 in widget/gonk and seeing if the regression goes away.
Hey guys,
My patch for bug 1021271 contains only image and CSS changes (most of them are just the new background-positions and font-sizes that follow the new spec). If I can help with something, please ping me.
Flags: needinfo?(pivanov)
Sotaro and Benoit know this better than I do, but image and CSS changes can impact perf if things are not sized as multiples of 1.5x. See bug 1027231 for an example.
(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #8)
> Pavel -- We think we have found a regression from your checkin of bug
> 1021271.  Can you look into this for us?
> 
> Sotaro -- Do you agree that this is likely the cause?

I am not sure if bug 1021271 causes the problem. As in Comment 9, can we have a layer tree?
Flags: needinfo?(sotaro.ikeda.g)
Who can get us a layer tree before and after the change?  Randy -- Do you know how to do this?  If so, do you have the time to do this today?  Thanks.
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(rlin)
Flags: needinfo?(bgirard)
Attached file high cpu usage layer dump (deleted) —
Based on gecko = b7b20af4a4fb, total > 90%
Flags: needinfo?(rlin)
Attachment #8456631 - Attachment description: high cpu usage layer dump → lower cpu usage layer dump
Thanks, Randy! 

Sotaro, Benoit -- Do Randy's attachments get you the info you need?
Sorry for the late response.

The logs are incomplete because of bug 1030245, unfortunately. We did land a patch to avoid bug 1033538, so it would be worth retesting while getting the layer tree. It's possible that the problem is fixed. Otherwise, having the non-truncated layer dump would help.
Flags: needinfo?(bgirard)
I was able to reproduce this issue on the latest Flame 2.0, Flame 2.1, and Buri 2.1 builds.

Environmental Variables:
Device: Flame 2.0
BuildID: 20140718073930
Gaia: 155b71b5fb3e06d0f04020bc4e869ae180a820c5
Gecko: ea0e9e117349
Version: 32.0a2 (2.0) 
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

Environmental Variables:
Device: Flame Master
BuildID: 20140718061630
Gaia: Unknown
Gecko: 330ba968ed61
Version: 33.0a1 (Master) 
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:33.0) Gecko/33.0 Firefox/33.0

Environmental Variables:
Device: Buri Master
BuildID: 20140718061630
Gaia: Unknown
Gecko: 330ba968ed61
Version: 33.0a1 (Master) 
Firmware Version: v1.2device.cfg
User Agent: Mozilla/5.0 (Mobile; rv:33.0) Gecko/33.0 Firefox/33.0

Compositor CPU usage was 8% or more.


This issue does not occur on Flame 1.4

Environmental Variables:
Device: Flame 1.4
BuildID: 20140718081451
Gaia: 621d152f89347c79619aa909ad62cc2ac9d3ab5b
Gecko: 989db90b4457
Version: 30.0 (1.4) 
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:30.0) Gecko/30.0 Firefox/30.0

Compositor CPU usage was 2%.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
Keywords: qawanted
blocking-b2g: 2.1? → 2.0?
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
To get a regression window here, we need to know exactly what percentage to consider 'working' ('no repro') and what percentage to consider 'not working' ('repro'), as the tester is seeing a variety of ranges.
If you still think you can benefit from a regression window feel free to re-add the tag when posting this info.
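
One way to make the repro criterion explicit is sketched below in Python. The cutoffs are assumptions, not agreed values: they simply reuse the Compositor figures reported earlier in this bug (about 2% on Flame 1.4, 8% or more on 2.0/2.1). Anything in between is treated as inconclusive. `classify`, `NO_REPRO_MAX`, and `REPRO_MIN` are hypothetical names.

```python
from statistics import median

# Assumed cutoffs, taken from the QA observations in this bug.
NO_REPRO_MAX = 2  # Compositor CPU% at or below this counts as "no repro"
REPRO_MIN = 8     # Compositor CPU% at or above this counts as "repro"


def classify(compositor_samples):
    """Classify a build from several Compositor CPU% readings.

    The median is used so that one outlying `top` sample does not
    flip the verdict.
    """
    typical = median(compositor_samples)
    if typical >= REPRO_MIN:
        return "repro"
    if typical <= NO_REPRO_MAX:
        return "no repro"
    return "inconclusive"


print(classify([12, 9, 8]))  # readings at or above the upper cutoff
print(classify([2, 1, 2]))   # readings at or below the lower cutoff
print(classify([5, 6, 4]))   # readings between the two cutoffs
```

Builds that land in the inconclusive band would need more samples, or a decision from the developers on where the line should be.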
Flags: needinfo?(sotaro.ikeda.g)
Jon - Can you weigh in on whether you think this is a blocking issue from a power perspective?
Flags: needinfo?(jhylands)
Jason, I'm in Paris this week, but I brought my flame harness with me - I'll see if I can run the test, and I'll report here with the results. I assume the test uses wifi and does not require an active SIM card, correct?
So I've tried a couple times now to initiate a call between my laptop and my flame on that site - it shows the local view in both browsers, but nothing in either remote view.

Is there some specific way to set up a call in this system?
(In reply to Jon Hylands [:jhylands] from comment #22)
> So I've tried a couple times now to initiate a call between my laptop and my
> flame on that site - it shows the local view in both browsers, but nothing
> in either remote view.
> 
> Is there some specific way to set up a call in this system?

Nils - Can you advise Jon here on what he should do here?
Flags: needinfo?(drno)
(In reply to Jason Smith [:jsmith] from comment #23)
> (In reply to Jon Hylands [:jhylands] from comment #22)
> > So I've tried a couple times now to initiate a call between my laptop and my
> > flame on that site - it shows the local view in both browsers, but nothing
> > in either remote view.
> > 
> > Is there some specific way to set up a call in this system?
> 
> Nils - Can you advise Jon here on what he should do here?

Jon, your description sounds like the two devices failed to establish a connection. Are both devices on the same network/Wi-Fi? If not, you might encounter bug 1042345. Until we have a fix for that, the easiest solution is to have both devices on the same LAN/network, so they don't have to use a TURN relay.
Flags: needinfo?(drno)
With the two devices on the same Wifi network, I was able to successfully connect, and ran some power tests. Each test sampled power for 30 seconds, with the display on the phone zoomed out so I could see both video sources on screen.

2014-06-30-04-02-01
Run 1: 768 mA
Run 2: 777 mA
Run 3: 774 mA

Average: 773 mA

2014-07-20-16-02-02
Run 1: 773 mA
Run 2: 791 mA
Run 3: 779 mA

Average: 781 mA

Based on that (about 1% difference), I wouldn't block based on power...
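
The arithmetic behind the "about 1%" figure can be checked with a short Python sketch. It uses only the six readings listed above; the variable names are made up for illustration.

```python
from statistics import mean

# Power readings (mA) from the three 30-second runs on each build.
runs_ma = {
    "2014-06-30-04-02-01": [768, 777, 774],
    "2014-07-20-16-02-02": [773, 791, 779],
}

averages = {build: mean(values) for build, values in runs_ma.items()}
old_avg = averages["2014-06-30-04-02-01"]
new_avg = averages["2014-07-20-16-02-02"]

# Relative increase of the newer build over the older one, in percent.
pct_increase = (new_avg - old_avg) / old_avg * 100

print(f"old: {old_avg} mA, new: {new_avg} mA, increase: {pct_increase:.1f}%")
```

With only three runs per build and a run-to-run spread of up to 18 mA, an 8 mA difference in the averages is within the noise, which is consistent with the conclusion above.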
Flags: needinfo?(jhylands)
Thanks for the analysis Jon. Not blocking based on comment 25.
blocking-b2g: 2.0? → backlog
blocking-b2g: backlog → ---
Severity: normal → S3

Closing old B2G bugs

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → INVALID