Closed Bug 861050 Opened 12 years ago Closed 6 years ago

[meta] WebRTC performance issue on B2G

Status:

RESOLVED WONTFIX

People

(Reporter: slee, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: meta, Whiteboard: [WebRTC][blocking-webrtc-][b2g-webrtc-])

Attachments

(10 files, 8 obsolete files)

WIP 12 years ago StevenLee[:slee] (deleted), patch
Reduce timer overhead in webrtc threading, fix upstream int-conversion bug 12 years ago Randell Jesup [:jesup] (needinfo me) (deleted), patch
gzipped jprof profile of a video loopback call on Linux desktop 12 years ago Randell Jesup [:jesup] (needinfo me) (deleted), application/octet-stream
gzipped jprof profile of a video loopback call on Linux desktop (opt build) 12 years ago Randell Jesup [:jesup] (needinfo me) (deleted), application/octet-stream
Screenshot of CPU usage on unagi - the division function is not optimized 12 years ago StevenLee[:slee] (deleted), image/png
needed patches for testing 12 years ago StevenLee[:slee] (deleted), application/x-tar
per-thread profile on Linux desktop (gzipped) 12 years ago Randell Jesup [:jesup] (needinfo me) (deleted), application/octet-stream
perf data of browser app without any modification 12 years ago StevenLee[:slee] (deleted), text/plain
perf data of browser app with applying patch in comment 12 on B2G 12 years ago StevenLee[:slee] (deleted), text/plain
perf data of browser app with fixing timer thread and no colour space converting 12 years ago StevenLee[:slee] (deleted), text/plain
perf data of browser app with audio 12 years ago StevenLee[:slee] (deleted), text/plain
needed patches for testing with audio turns on 12 years ago StevenLee[:slee] (deleted), application/x-tar
needed patches for testing with audio turns on 12 years ago StevenLee[:slee] (deleted), application/x-tar
needed patches for testing 12 years ago StevenLee[:slee] (deleted), application/x-tar
needed patches for testing 12 years ago StevenLee[:slee] (deleted), application/x-tar
needed patches for testing 12 years ago StevenLee[:slee] (deleted), application/x-tar
perf data of browser app with audio only 12 years ago StevenLee[:slee] (deleted), text/plain
Prototype patch to insert video on a worker thread 12 years ago Eric Rescorla (:ekr) (deleted), patch

StevenLee[:slee]

Reporter

Description

•

12 years ago

With 127944:b0d842380959 of mozilla-central, the CPU usage on unagi is about 70%. Here are the settings:
* fake video only
* encode once every 25 frames, so the effective FPS is about 30/25 = 1.2
* do not decode video frames from the other peer

StevenLee[:slee]

Reporter

Comment 1

•

12 years ago

Here is the thread usage of the browser app.

User 58%, System 14%, IOW 0%, IRQ 0%
User 123 + Nice 68 + Sys 47 + Idle 91 + IOW 0 + IRQ 0 + SIRQ 0 = 329
  PID   TID PR CPU% S     VSS    RSS PCY UID   Thread          Proc
 1288  1317  0  25% S 147016K 73128K  fg root  Browser         /system/b2g/plugin-container
 1288  1319  0  13% S 147016K 73128K  fg root  ProcessThread   /system/b2g/plugin-container
 1288  1325  0  12% S 147016K 73128K  fg root  ViECaptureThrea /system/b2g/plugin-container
 1288  1324  0   6% S 147016K 73128K  fg root  ProcessThread   /system/b2g/plugin-container
 1288  1311  0   4% S 147016K 73128K  fg root  MediaManager    /system/b2g/plugin-container
 1359  1359  0   3% R   1120K   472K  fg root  top             top
 1288  1288  0   2% S 147016K 73128K  fg root  Browser         /system/b2g/plugin-container
 1288  1318  0   1% S 147016K 73128K  fg root  Trace           /system/b2g/plugin-container
 1288  1296  0   1% S 147016K 73128K  fg root  Socket Thread   /system/b2g/plugin-container

Thread "Browser" is the encoding thread.

Chiajung Hung [:chiajung]

Comment 2

•

12 years ago

This is quite weird to me... ProcessThread in WebRTC is used to handle Module events and is called at some time interval. I think it should be quite lightweight, unless the callbacks are doing something heavy. We can find out which modules those heavy threads are handling...

Jason Smith [:jsmith]

Updated

•

12 years ago

Whiteboard: [WebRTC] → [WebRTC][blocking-webrtc-]

Randell Jesup [:jesup] (needinfo me)

Comment 3

•

12 years ago

Not knowing b2g, I assume you can get function-level profiles of all threads. Could you upload those here, or email them if too large? If you need help getting a working profiler, BenWa and I can probably help: he handles the profiler that is built in by default, and I maintain (occasionally, when needed) jprof (--enable-jprof in builds; see the documentation in tools/jprof/README.html). jprof can profile all threads in the process and produce per-thread profiles. Note: it needs to write data to a file while running, which is then post-processed with the 'jprof' executable. Getting this to work in a cross-compiled setup like B2G or Android may take some work, but fundamentally it just needs a way to translate addresses into symbols (it dumps stack traces to a file to analyze later). Hopefully you already have working tools that do this or more.

StevenLee[:slee]

Reporter

Comment 4

•

12 years ago

Hi jesup, Thanks for your information. Bug 831611 ports perf to b2g. I used it to profile the browser on b2g. I will update the data when I find something interesting.:)

Randell Jesup [:jesup] (needinfo me)

Comment 5

•

12 years ago

Feel free to put some raw data up here; I'm very experienced at analyzing profiles and I know the main code paths in webrtc

Benoit Girard (:BenWa)

Comment 6

•

12 years ago

This might be a dupe of bug 860441.

StevenLee[:slee]

Reporter

Comment 7

•

12 years ago

(In reply to Benoit Girard (:BenWa) from comment #6)
> This might be a dupe of bug 860441.

Hi Benoit,

I think it's different from bug 860441. I don't use camera in the testing. I use fake video.

StevenLee[:slee]

Reporter

Comment 8

•

12 years ago

Attached patch WIP (obsolete) (deleted)

I found 2 places that make "ProcessThread" eat up a lot of CPU.
* [1] TimeUntilProcess is called more than 15000 times per second, and each call invokes clock_gettime. I don't think we need to update the local time that frequently.
* [2] The division is not optimized (on Android and B2G we build with "-Os -mthumb"; would "-marm -O2" be faster?).

After applying the patch, the CPU usage of the 2 "ProcessThread" threads drops from 10~13% to 3%.

[1] http://mxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/video_coding/main/source/video_coding_impl.cc#233
[2] http://mxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/system_wrappers/interface/tick_util.h#245
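The first point above (too many clock reads) can be illustrated with a small sketch. This is not the WebRTC ProcessThread code; `CountingClock`, `Module` and `poll_once` are hypothetical names, and the idea shown is only the general one of reading the clock once per polling pass and handing that timestamp to every module instead of letting each query re-read the clock.

```python
import time


class CountingClock:
    """Wraps a millisecond clock and counts how often it is read."""

    def __init__(self, now_ms=lambda: time.monotonic() * 1000.0):
        self._now_ms = now_ms
        self.reads = 0

    def now_ms(self):
        self.reads += 1
        return self._now_ms()


class Module:
    """Hypothetical module that wants process() called every interval_ms."""

    def __init__(self, interval_ms, start_ms):
        self.interval_ms = interval_ms
        self.next_due_ms = start_ms + interval_ms

    def time_until_process(self, now_ms):
        # Uses the caller's timestamp; never touches the clock itself.
        return max(0.0, self.next_due_ms - now_ms)

    def process(self, now_ms):
        self.next_due_ms = now_ms + self.interval_ms


def poll_once(modules, clock):
    """One polling pass: a single clock read, then run whatever is due.

    Returns the number of milliseconds the caller may sleep.
    """
    now = clock.now_ms()
    for module in modules:
        if module.time_until_process(now) == 0.0:
            module.process(now)
    return min(module.time_until_process(now) for module in modules)
```

With this shape the number of clock reads scales with the number of polling passes, not with the number of modules or the number of TimeUntilProcess-style queries.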

Benoit Girard (:BenWa)

Comment 9

•

12 years ago

(In reply to StevenLee from comment #7)
> (In reply to Benoit Girard (:BenWa) from comment #6)
> > This might be a dupe of bug 860441.
> Hi Benoit,
>
> I think it's different from bug 860441. I don't use camera in the testing. I
> use fake video.

I don't know how webRTC is being drawn but if it uses YUV images then they share drawing paths.

Randell Jesup [:jesup] (needinfo me)

Comment 10

•

12 years ago

WebRTC does use YUV images for everything. So Benoit may be right on that part. Looking at TimeUntilProcess... StevenLee: Was "the patch" changing from -Os -mthumb to -marm -O2?

Benoit Girard (:BenWa)

Comment 11

•

12 years ago

I'm tentatively putting bug 860441 as a depends. I've asked bjacob to look into this when he lands his patch. In the meantime we could confirm by seeing if the regression window includes bug 852734.

Depends on: 860441

Randell Jesup [:jesup] (needinfo me)

Comment 12

•

12 years ago

Attached patch Reduce timer overhead in webrtc threading, fix upstream int-conversion bug (deleted)

Turned off the division optimization: the system call is far more expensive, and that optimization was breaking on my Linux64 system (the concept is good, but I think some of the details are incorrect). Also fixed a significant bug that allowed bad values to be returned due to type problems.

I don't see a need to update the millisecond time in between Process() calls within any given update; it's one timeout at one time.

Also un-commented NACK mode; if you want to disable it, do so in VideoConduit.cpp. It runs a 10ms periodic timer looking to see if it needs to send NACKs. I can talk to the upstream people about a better solution that isn't timer-based, or that only starts timers when packet loss is visible in the jitter buffer.

Randell Jesup [:jesup] (needinfo me)

Updated

•

12 years ago

Attachment #738361 - Attachment is obsolete: true

StevenLee[:slee]

Reporter

Comment 13

•

12 years ago

(In reply to Randell Jesup [:jesup] from comment #10)
> WebRTC does use YUV images for everything. So Benoit may be right on that
> part.
>
> Looking at TimeUntilProcess...
>
> StevenLee: Was "the patch" changing from -Os -mthumb to -marm -O2?

No, it was not. What I mean is that the division operation should be optimized by the compiler. If we build with "-O2 -marm", the division should be compiled in the efficient way.

StevenLee[:slee]

Reporter

Comment 14

•

12 years ago

(In reply to Randell Jesup [:jesup] from comment #12)
> Created attachment 738741
> Reduce timer overhead in webrtc threading, fix upstream int-conversion bug
>
> Turned off the division-optimization - the system call is far more
> expensive, and that optimization was breaking on my Linux64 system (though
> the concept is good, but there are incorrect details I think). Also fixed a
> significant bug that allowed bad values to be returned due to type problems.

This is basically using multiplication to replace division: x/1000000 == x * 0.000001. But this version has an overflow problem.

> I don't see a need at any given update to update the millisecond time
> inbetween Process() calls - it's one timeout at one time. Also
> un-commented nack mode; if you want to disable it, do so in VideoConduit.cpp
> - it runs a 10ms periodic timer looking to see if it needs to send NACKs, I
> can talk to the upstream people about a better solution that isn't
> timer-based, or only starts timers on packet loss visible in the jitter
> buffer

Sorry, I didn't mean to comment out the NACK mode. That was only for testing.
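The multiply-instead-of-divide idea and its overflow risk can be sketched as follows. This is not the code from tick_util.h; the shift of 40 bits, the constant names and the function name are assumptions chosen for illustration. The sketch replaces `x // 1000000` by a multiply and a shift, and masks the product to 64 bits to mimic what a C `uint64_t` multiplication would keep.

```python
MASK64 = (1 << 64) - 1          # simulate C uint64_t wrap-around
SHIFT = 40                      # assumed fixed-point precision
DIVISOR = 1_000_000
# ceil(2**SHIFT / DIVISOR): the fixed-point reciprocal of DIVISOR.
MAGIC = -(-(1 << SHIFT) // DIVISOR)


def div_by_multiply_u64(x):
    """Approximate x // DIVISOR with a multiply and a shift.

    The intermediate product is truncated to 64 bits, as it would be
    in C, so large inputs silently give wrong answers.
    """
    product = (x * MAGIC) & MASK64
    return product >> SHIFT
```

For small inputs the result matches true division, but once `x * MAGIC` no longer fits in 64 bits the high bits are lost. With these particular constants that already happens for nanosecond counts corresponding to a few hours, which may be the kind of problem described above.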

Randell Jesup [:jesup] (needinfo me)

Comment 15

•

12 years ago

Attached file gzipped jprof profile of a video loopback call on Linux desktop (deleted)

Attached is a gzipped jprof profile of a desktop self-to-self video call (fake audio). NOTE: it's not a wonderful profile as it's a DEBUG build, but it does give a first-order approximation (and all the bits are likely to have similar relative profile numbers, if not exactly the same, especially in the codec bits).

The test is at http://mozilla.github.io/webrtc-landing/pc_test.html

I let it use fake video on one side. This is VGA@30fps, and we're encoding and decoding two sets of streams (two encodes, and two decodes).

Of the time it used (this wasn't a realtime prof, it was a process profile, so these are percentages of the CPU time used):

25% VP8 encode
7.5% VP8 decode + surrounding stuff
8% audio encode/process - 3% of that was send-side AEC, 4.4% Opus encode
2% receive audio
10% nsViewManager (outside of WebRTC)          \ these two do 13% in Paint()
6% nsWindow::OnExposeEvent (outside of WebRTC) / ~2.2% of Paint time is YUV->RGB conversion
~6% for memory allocation (mostly for paint) - may be high due to scribbling memory in Debug
3.5% everything in SocketTransportService (networking)
1.2% SRTP (encryption) 1/2 AEC, 1/2 HMAC
0.1% TimeUntilNextProcess (with my patch from this bug)
0.3% webrtc::ExtractBuffer() (copies incoming frames to pass to DeliverFrame)
<0.5% audio resampling

That accounts for around 70% of the CPU used; the rest is scattered in dribs and drabs (GC, JS stuff, etc)

Benoit Jacob [:bjacob] (mostly away)

Comment 16

•

12 years ago

Please retry now that bug 860441 has landed.

Randell Jesup [:jesup] (needinfo me)

Comment 17

•

12 years ago

On same linux box, opt build: (again, not a real-time jprof)

18% VP8 encode
6.9% VP8 decode
8.5% ViewManager
4.5% audio encode/process - 0.9% send-side AEC, 3% Opus
2.7% networking
0.5% TimeUntilProcess (without the patch to reduce the number of calls to it)

More details later...

Randell Jesup [:jesup] (needinfo me)

Comment 18

•

12 years ago

Attached file gzipped jprof profile of a video loopback call on Linux desktop (opt build) (deleted)

StevenLee[:slee]

Reporter

Comment 19

•

12 years ago

Attached image Screenshot of CPU usage on unagi - the division function is not optimized (obsolete) (deleted)

The attached file is the perf data. As you can see, the top entry is __udivsi3, which is division. I think the compiler is not optimizing it well. Here are the settings of both peers:
* FPS: encode once every 20 frames
* use fake video, no audio
* CPU usage by threads:

  PID   TID PR CPU% S     VSS    RSS PCY UID   Thread          Proc
 1698  1743  0  20% S 141688K 48660K  fg root  ProcessThread   /system/b2g/plugin-container
 1698  1749  0  14% S 141688K 48660K  fg root  ProcessThread   /system/b2g/plugin-container
 1698  1741  0   7% S 141688K 48660K  fg root  Browser         /system/b2g/plugin-container
 1698  1750  0   4% S 141688K 48660K  fg root  ViECaptureThrea /system/b2g/plugin-container
 1698  1698  0   3% S 141688K 48660K  fg root  Browser         /system/b2g/plugin-container
 1698  1742  0   2% S 141688K 48660K  fg root  Trace           /system/b2g/plugin-container
  508   508  0   2% R   1120K   372K  fg root  top             top
 1698  1705  0   2% S 141688K 48660K  fg root  Socket Thread   /system/b2g/plugin-container
 1563  1563  0   1% S 172212K 59096K  fg root  b2g             /system/b2g/b2g
  511   511  0   1% S   1060K   312K  fg root  top             top

1698 is the browser app, and I profiled thread 1743.

Benoit Jacob [:bjacob] (mostly away)

Comment 20

•

12 years ago

Again, are you using code pre or post-landing of bug 860441?

Benoit Jacob [:bjacob] (mostly away)

Comment 21

•

12 years ago

(In reply to StevenLee from comment #7)
> (In reply to Benoit Girard (:BenWa) from comment #6)
> > This might be a dupe of bug 860441.
> Hi Benoit,
>
> I think it's different from bug 860441. I don't use camera in the testing. I
> use fake video.

The fix in bug 860441 was not specific to the camera. It fixes a general regression whereby we had lost the ability to directly use a YUV texture in the compositor and were falling back to a slow read-back-and-convert-in-software path.

StevenLee[:slee]

Reporter

Comment 22

•

12 years ago

(In reply to Benoit Jacob [:bjacob] from comment #21)
> The fix in bug 860441 was not specific to the camera. It fixes a general
> regression whereby we had lost the ability to directly use a YUV texture in
> the compositor and were falling back to a slow
> read-back-and-convert-in-software path.

Hi Benoit,

I haven't updated to that revision yet. I will try it later. But in the profiling data, display does not seem to be in the top 5. Maybe that is because the fps is only about 1.x.

Chiajung Hung [:chiajung]

Comment 23

•

12 years ago

I think this is independent of bug 860441, too, since WebRTC now creates PlanarYCbCrImage rather than GonkIOSurfaceImage. See: http://mxr.mozilla.org/mozilla-central/source/media/webrtc/signaling/src/mediapipeline/MediaPipeline.cpp#1038 And I think the patch in bug 860441 has nothing to do with PlanarYCbCrImage, does it?

StevenLee[:slee]

Reporter

Comment 24

•

12 years ago

Attached file needed patches for testing (obsolete) (deleted)

Here are the patches that I used to test the performance

Benoit Jacob [:bjacob] (mostly away)

Comment 25

•

12 years ago

(In reply to Chiajung Hung [:chiajung] from comment #23)
> And I think the patch is 860441 has nothing to do with PlanarYCbCrImage,
> isn't it?

That's correct. Ping me if there is a display-related performance issue, I haven't looked into the compositing of PlanarYCbCrImage.

StevenLee[:slee]

Reporter

Updated

•

12 years ago

Blocks: b2g-webrtc

Benoit Jacob [:bjacob] (mostly away)

Comment 26

•

12 years ago

So, there actually is a graphics performance bug on mozilla-central now that could still possibly explain this. In gfx/layers/client/ImageClient.cpp, in ImageClientSingle::UpdateImage, there is only a path from PLANAR_YCBCR and no path for GRALLOC_PLANAR_YCBCR. But on B2G, we have a GRALLOC_PLANAR_YCBCR. So we don't take any fast path there and fall back to the slow path of calling GetAsSurface and doing things in software.

Randell Jesup [:jesup] (needinfo me)

Comment 27

•

12 years ago

Attached file per-thread profile on Linux desktop (gzipped) (deleted)

This is broken down per-thread (it's a normal usertime profile; I can't get JP_REALTIME working with threads other than mainthread...); each thread will have times that add up to 100% (of that thread's CPU use). You can use the hit-counts (left column) to figure out relative CPU use between threads. Only the first 6 threads have enough hits to care about.
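As a rough illustration of using hit counts to compare threads, the sketch below turns per-thread hit totals into shares of the whole process. The function name and the example numbers are hypothetical and are not taken from the attached profile.

```python
def thread_shares(hits_by_thread):
    """Convert per-thread profiler hit counts into percentages of all hits."""
    total = sum(hits_by_thread.values())
    if total == 0:
        return {name: 0.0 for name in hits_by_thread}
    return {name: 100.0 * hits / total for name, hits in hits_by_thread.items()}


# Made-up hit counts, for illustration only.
example = thread_shares({"encode": 300, "main": 100})
```

Within one thread the profile percentages add up to 100% of that thread, so multiplying a function's per-thread percentage by the thread's share gives an estimate of its share of the whole process.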

Maire Reavy [:mreavy]

Comment 28

•

12 years ago

(In reply to Benoit Jacob [:bjacob] from comment #26)
> So, there actually is a graphics performance bug on mozilla-central now that
> could still possibly explain this. In gfx/layers/client/ImageClient.cpp, in
> ImageClientSingle::UpdateImage, there is only a path from PLANAR_YCBCR and
> no path for GRALLOC_PLANAR_YCBCR. But on B2G, we have a
> GRALLOC_PLANAR_YCBCR. So we don't take any fast path there and fall back to
> the slow path of calling GetAsSurface and doing things in software.

Have we filed a bug for this in bugzilla? If not, can you file one and link it to this bug? Thanks for the info on this!

Flags: needinfo?(bjacob)

Benoit Jacob [:bjacob] (mostly away)

Updated

•

12 years ago

Depends on: 864017

Benoit Jacob [:bjacob] (mostly away)

Comment 29

•

12 years ago

Filed bug 864017.

Flags: needinfo?(bjacob)

Jason Smith [:jsmith]

Updated

•

12 years ago

Keywords: meta

Whiteboard: [WebRTC][blocking-webrtc-] → [WebRTC][blocking-webrtc-][b2g-webrtc-]

StevenLee[:slee]

Reporter

Comment 30

•

12 years ago

Attached file perf data of browser app without any modification (deleted)

The CPU usage of the browser is about 65~68%. Please note that on B2G the composition happens in the parent process (the b2g process). Here are the settings of both peers:
* FPS: 15
* video: 176x144
* audio: none

Attachment #739362 - Attachment is obsolete: true

StevenLee[:slee]

Reporter

Comment 31

•

12 years ago

Attached file perf data of browser app with applying patch in comment 12 on B2G (deleted)

The CPU usage is down to 59~61%. The top one is NV12 to I420.

StevenLee[:slee]

Reporter

Comment 32

•

12 years ago

Attached file perf data of browser app with fixing timer thread and no colour space converting (deleted)

The CPU usage now is 53~54%. The top one is I420Copy.

Timothy B. Terriberry (:derf)

Comment 33

•

12 years ago

(In reply to StevenLee from comment #31)
> The CPU usage is down to 59~61%. The top one is NV12 to I420.

It looks like it's using the C fallback (SplitUV_C). /proc/cpuinfo says this thing has NEON, though. So either we're not building libyuv correctly, the CPU detection is not working, or the alignment is wrong.

StevenLee[:slee]

Reporter

Comment 34

•

12 years ago

(In reply to Timothy B. Terriberry (:derf) from comment #33)
> It looks like its using the C fallback (SplitUV_C). /proc/cpuinfo says this
> thing had NEON, though. So either we're not building libyuv correctly, the
> CPU detection is not working, or the alignment is wrong.

That's because the video width is 176, so the width of the U and V planes is 176/2 = 88. libyuv detects the CPU correctly, but it takes the NEON path only if the width is a multiple of 16. Otherwise it uses the C version to do the colour space conversion. 88 is not a multiple of 16, so in this case it uses the C version.
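The width check described in this comment can be written out as a small sketch. It is not libyuv code: the function names and the `has_neon` flag are hypothetical, and only the rule stated above (NEON only when the chroma width is a multiple of 16) is modelled.

```python
NEON_BLOCK = 16  # pixels per SIMD block, as stated in the comment above


def chroma_width(luma_width):
    """Width of the U and V planes for 4:2:0 video (half the luma width, rounded up)."""
    return (luma_width + 1) // 2


def pick_split_uv_path(luma_width, has_neon=True):
    """Return which implementation the width rule would select."""
    if has_neon and chroma_width(luma_width) % NEON_BLOCK == 0:
        return "neon"
    return "c"
```

For the 176x144 test video the chroma width is 88, and 88 % 16 is 8, so the sketch selects the C path. A 640-pixel-wide frame gives a chroma width of 320, which is a multiple of 16.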

Timothy B. Terriberry (:derf)

Comment 35

•

12 years ago

(In reply to StevenLee from comment #34)
> libyuv, it detects the CPU correctly. It goes the neon path only if the
> width is multiple of 16. If not, it uses c version to do colour space
> convert. In this case, it uses the c version.

Yup. Upstream already has a SplitUVRow_Any_NEON that appears to work for any width.
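One common way to make a fixed-block kernel work for any width is to run it over the largest multiple of the block size and finish the remaining pixels with scalar code. The sketch below shows that pattern for splitting an interleaved UV row. It is an assumption about the general technique, not a description of how upstream libyuv implements its "Any" variants, and `split_uv_block` is only a stand-in for a SIMD kernel.

```python
def split_uv_block(src_uv, dst_u, dst_v, start, count):
    """Stand-in for a SIMD kernel that handles exactly `count` pixels."""
    for i in range(start, start + count):
        dst_u[i] = src_uv[2 * i]
        dst_v[i] = src_uv[2 * i + 1]


def split_uv_any(src_uv, width, block=16):
    """Split interleaved UV bytes into separate U and V rows for any width."""
    dst_u = bytearray(width)
    dst_v = bytearray(width)
    bulk = width - (width % block)
    # Fast path: whole blocks only.
    for start in range(0, bulk, block):
        split_uv_block(src_uv, dst_u, dst_v, start, block)
    # Scalar tail: the leftover pixels that do not fill a block.
    for i in range(bulk, width):
        dst_u[i] = src_uv[2 * i]
        dst_v[i] = src_uv[2 * i + 1]
    return bytes(dst_u), bytes(dst_v)
```

For a chroma width of 88 this would run five full blocks (80 pixels) through the fast path and only 8 pixels through the scalar tail.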

Randell Jesup [:jesup] (needinfo me)

Comment 36

•

12 years ago

Thanks for the perf data! On libyuv: I'm going to look at what our options are for updating webrtc.org for FF23 for this and other reasons anyways (we discussed this in our meeting today). On a separate note, I may look at moving libyuv out of media/webrtc, as it's generally useful.

StevenLee[:slee]

Reporter

Comment 37

•

12 years ago

Attached file perf data of browser app with audio (obsolete) (deleted)

cpu: 72~74

* top by process

User 75%, System 23%, IOW 0%, IRQ 0%
User 196 + Nice 51 + Sys 78 + Idle 0 + IOW 0 + IRQ 0 + SIRQ 2 = 327
  PID PR CPU% S #THR     VSS    RSS PCY UID   Name
 3657  0  73% S   37 146396K 48716K  fg root  /system/b2g/plugin-container
 3504  0  14% R   33 176240K 56676K  fg root  /system/b2g/b2g
 3210  0   3% S   20  36844K  2756K  fg media /system/bin/mediaserver
 2628  0   3% S    1   1124K   384K  fg root  top
 2521  0   3% S    1      0K     0K  fg root  ksdioirqd/mmc2

* top by thread

User 78%, System 20%, IOW 0%, IRQ 0%
User 221 + Nice 47 + Sys 71 + Idle 0 + IOW 0 + IRQ 0 + SIRQ 1 = 340
  PID   TID PR CPU% S     VSS    RSS PCY UID   Thread          Proc
 3657  3719  0  41% R 146396K 48992K  fg root  Browser         /system/b2g/plugin-container
 3657  3730  0   8% S 146396K 48992K  fg root  DecodingThread  /system/b2g/plugin-container
 3657  3664  0   7% S 146396K 48992K  fg root  Socket Thread   /system/b2g/plugin-container
 3657  3734  0   4% S 146396K 48992K  fg root  ViECaptureThrea /system/b2g/plugin-container
 3657  3699  0   3% S 146396K 48992K  fg root  webrtc_gonk_aud /system/b2g/plugin-container
 3657  3726  0   3% S 146396K 48992K  fg root  ProcessThread   /system/b2g/plugin-container
 3657  3733  0   2% S 146396K 48992K  fg root  ProcessThread   /system/b2g/plugin-container
 2628  2628  0   2% R   1124K   384K  fg root  top             top
 2521  2521  0   2% S      0K     0K  fg root  ksdioirqd/mmc2
 3504  3526  0   2% S 176176K 57144K  fg root  Compositor      /system/b2g/b2g

StevenLee[:slee]

Reporter

Comment 38

•

12 years ago

Hi derf, I found that in attachment 742214 audio uses a lot of CPU. I am not familiar with this. Would you please take a look? Thanks.

Timothy B. Terriberry (:derf)

Comment 39

•

12 years ago

(In reply to StevenLee from comment #38)
> I found that in attachment 742214 audio uses much cpu. I am not
> familiar with this. Would you please take a look?

If I'm reading that right, we're spending 2/3 of our audio processing time doing AEC and 1/3 encoding. It looks like we're using the desktop AEC module by default. We probably want to use the "mobile" aecm module instead (media/webrtc/trunk/webrtc/modules/audio_processing/aecm), which uses integer math and has NEON optimizations. It looks like the NEON is currently hard-coded to only be used if 'OS=="android"' in audio_processing.gypi.

GIPS only uses the module by default for Android and iOS: this is controlled from media/webrtc/trunk/webrtc/voice_engine/voe_audio_processing_impl.cc (look for kDefaultEcMode). You should be able to override the default by setting the media.peerconnection.aec pref to 4 (see the EcModes enum in media/webrtc/trunk/webrtc/common_types.h for the full list). This pref is passed in from the audio conduit: see WebrtcAudioConduit::ConfigureSendMediaCodec() in media/webrtc/signaling/src/media-conduit/AudioConduit.cpp.

StevenLee[:slee]

Reporter

Comment 40

•

12 years ago

Attached file needed patches for testing with audio turns on (obsolete) (deleted)

The current version captures and sends audio to the other peer, but with serious delay. I will figure it out.

Attachment #739521 - Attachment is obsolete: true

StevenLee[:slee]

Reporter

Comment 41

•

12 years ago

(In reply to StevenLee from comment #40)
> Created attachment 742279
> needed patches for testing with audio turns on
>
> The current version will capture and send audio to other peer but it has
> serious delay. I will figure it out.

The series of patches is:
AudioDevice.patch
Camera_part1.patch
Camera_part2.patch
Camera_part3.patch
SkipPermissionCheck.patch
ModifyVideoSettings.patch
mediaEngine.patch
nsComponent.patch
reduceTimerThread.patch
forceI420.patch

StevenLee[:slee]

Reporter

Comment 42

•

12 years ago

(In reply to Timothy B. Terriberry (:derf) from comment #39)
> (In reply to StevenLee from comment #38)
> If I'm reading that right, we're spending 2/3 of our audio processing time
> doing AEC and 1/3 encoding.

Thanks, I will try it. :)

StevenLee[:slee]

Reporter

Comment 43

•

12 years ago

Attached file needed patches for testing with audio turns on (obsolete) (deleted)

Rebased the camera and mic patches, since bug 825110 and bug 825112 have been updated. Here is the sequence for applying these patches:
825110_v11_part_webrtc.patch
825110_v5_part_camera.patch
825110_v6_part_webrtc_media.patch
825112_audio_device_impl_for_gonk.patch
825112_duplicate_audio_device_impl.patch
825112_v4_audio_device.patch
SkipPermissionCheck.patch
ModifyVideoSettings.patch
mediaEngine.patch
nsComponent.patch
reduceTimerThread.patch
forceI420.patch
FixAudioIntial.patch

Attachment #742279 - Attachment is obsolete: true

StevenLee[:slee]

Reporter

Comment 44

•

12 years ago

(In reply to Timothy B. Terriberry (:derf) from comment #39)
> (In reply to StevenLee from comment #38)
> You should be able to override the default by setting the
> media.peerconnection.aec pref to 4 (see the EcModes enum in
> media/webrtc/trunk/webrtc/common_types.h for the full list). This pref is
> passed in from the audio conduit: See
> WebrtcAudioConduit::ConfigureSendMediaCodec() in
> media/webrtc/signaling/src/media-conduit/AudioConduit.cpp.

Hi derf,

It seems to work. The CPU usage goes down by about 5% when I set the media.peerconnection.aec pref to 4. But the total CPU usage went up after I updated to the latest m-c. I will find out which parts cause the problem. Thanks for your suggestion.

StevenLee[:slee]

Reporter

Comment 45

•

12 years ago

Attached file needed patches for testing (obsolete) (deleted)

Fixed the gUM problem; the patch series is the same as in comment 43.

Attachment #743526 - Attachment is obsolete: true

StevenLee[:slee]

Reporter

Comment 46

•

12 years ago

Attached file needed patches for testing (obsolete) (deleted)

My bad: I did not remove the testing code for the timestamp, which caused no video to be displayed once the connection was established.

Attachment #744409 - Attachment is obsolete: true

StevenLee[:slee]

Reporter

Comment 47

•

12 years ago

Attached file needed patches for testing (deleted)

Update patch_series.txt

Attachment #748747 - Attachment is obsolete: true

StevenLee[:slee]

Reporter

Comment 48

•

12 years ago

I found 2 places that may hurt performance.
* NotifyPull and NotifyQueuedTrackChanges are called too often, about once every 10 ms. Audio may need to be encoded every 10 ms, but video does not. As a result, the camera may capture 30 frames per second while more than 30 frames per second are being encoded.
* The audio encoding uses a lot of CPU, about 80%. This may happen only on FFOS. derf, should we set any flags when compiling the Opus encoder on ARM?

One question: could we make the interval of NotifyPull and NotifyQueuedTrackChanges longer, e.g. 30 ms?

u459114

Comment 49

•

12 years ago

Steven, for the performance issue in the video encoding path, let's discuss it in bug 872887, which is more specific. Let's turn this bug into a meta bug that lists the perf bottlenecks we find across the whole pipeline.

u459114

Updated

•

12 years ago

Depends on: 872887

Randell Jesup [:jesup] (needinfo me)

Comment 50

•

12 years ago

NotifyPull is called to see if there's a new video or audio frame. Eventually we want to switch to a push-notification, but the way MediaStreams are currently defined makes this hard. Note that when called, it will return the same image as the previous call unless the video source has changed the image. Returning the same image ptr in gUM should not cause a new frame to be encoded.... However, it appears this is not the case (though likely the GIPS encode code is throwing away many of the dups). Putting in a dup detector shows we're getting around 90-95 dups per 30 real frames per second on desktop. I'll file a bug on this issue. Thanks!
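A duplicate detector of the kind mentioned here can be sketched in a few lines. This is not the gUM or MediaPipeline code; the class name and counters are hypothetical. The sketch only shows the idea of comparing the identity of each pulled image with the previous one and skipping the encode when nothing changed.

```python
class DupDetector:
    """Tracks whether a pulled image is the same object as the previous pull."""

    def __init__(self):
        self._last = None   # keeps the previous image alive for the identity check
        self.fresh = 0      # pulls that delivered a new image
        self.dups = 0       # pulls that returned the previous image again

    def is_new(self, image):
        if image is self._last:
            self.dups += 1
            return False
        self._last = image
        self.fresh += 1
        return True
```

A caller would only hand the frame to the encoder when `is_new()` returns True. The two counters give the kind of "dups per real frames" figure quoted above.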

Timothy B. Terriberry (:derf)

Comment 51

•

12 years ago

(In reply to StevenLee from comment #48)
> * This may happen only on FFOS. The audio encoding uses much CPU, about 80%.
> derf, should we set any flags when compiling opus encoder on ARM?

You can pass a setting to reduce the complexity of Opus encoding (with a drop in quality), but do you have updated profile data around this? The data in comment 44 had us spending only 37% on Opus encoding.

StevenLee[:slee]

Reporter

Comment 52

•

12 years ago

Attached file perf data of browser app with audio only (deleted)

Hi derf,

Here is the perf data when using "audio only". When turning on aecm, the CPU usage goes from 80% to 70%. The perf data in comment 44 is different from this one. I think it's because the test in comment 44 has both audio and video. If I run video only, the CPU usage is about 60%. Currently, the CPU may not be able to afford both audio and video without fixing some performance issues. Can you suggest settings for the Opus encoder? Thanks.

Attachment #742214 - Attachment is obsolete: true

Maire Reavy [:mreavy]

Comment 53

•

12 years ago

(In reply to StevenLee from comment #52)
> Created attachment 750904
> perf data of browser app with audio only
>
> Hi derf,
>
> Here is the perf data when using "audio only". When turing on aecm, the cpu
> usage is from 80% to 70%. The perf data in comment 44 is different from this
> one. I think it's because the test in comment 44 has audio and video. If I
> run video only the cpu usage is about 60%. Currently, the cpu may not
> afford both audio and video without fixing some performance issues. Can you
> suggest a settings for opus encoder? Thanks.

Steven -- Can you run the same tests with G.711 in place of Opus? Thanks.

Timothy B. Terriberry (:derf)

Comment 54

•

12 years ago

(In reply to StevenLee from comment #52)
> Here is the perf data when using "audio only". When turing on aecm, the cpu
> usage is from 80% to 70%. The perf data in comment 44 is different from this
> one. I think it's because the test in comment 44 has audio and video. If I
> run video only the cpu usage is about 60%. Currently, the cpu may not
> afford both audio and video without fixing some performance issues. Can you
> suggest a settings for opus encoder? Thanks.

So, from that profile it appears we're running in Hybrid mode, which uses both the CELT and SILK encoders (and thus the maximum CPU). I'm not actually sure why, because AFAIK the default bitrate (64 kbps) should be high enough to switch to CELT mode by itself, and I don't think we're overriding that default anywhere. Generally the CELT encoder is much faster than the SILK one at higher complexities. I'll investigate.

You can use opus_encoder_ctl(inst->encoder, OPUS_SET_COMPLEXITY(4)) in media/webrtc/trunk/webrtc/modules/audio_coding/codecs/opus/opus_interface.c to lower the complexity. The default is 10 (the maximum).

In other news, this weekend I committed some changes upstream which should reduce complexity on ARM in the neighborhood of 20%: <https://git.xiph.org/?p=opus.git;a=commit;h=972a34ec>.

Eric Rescorla (:ekr)

Comment 55

•

12 years ago

Attached patch Prototype patch to insert video on a worker thread (deleted)

We notice that a lot of time is being spent inserting the frame, and this may be delaying the MSG. This patch moves the insert to a different thread. I tested this once locally, but it shouldn't be considered done.

StevenLee[:slee]

Reporter

Updated

•

11 years ago

Depends on: 877518

StevenLee[:slee]

Reporter

Comment 56

•

11 years ago

Hi all, sorry for the late reply. I tried G.711 and the total CPU usage is down to about 50%. It's quite helpful. But the audio still has a latency problem. I found a duplicate audio encoding problem on both desktop and B2G (bug 877518). I am trying to track down the audio latency problem on B2G; I think it should be fixed first. Then I will try the Opus encoder with lower complexity and the new version of the Opus encoder.

StevenLee[:slee]

Reporter

Comment 57

•

11 years ago

I tested Opus with complexity 4 and 1. In both cases the CPU usage is about 60% with audio only. The audio also has a jitter problem.

Randell Jesup [:jesup] (needinfo me)

Updated

•

11 years ago

Depends on: 890419

Jason Smith [:jsmith]

Updated

•

11 years ago

No longer depends on: 877518

StevenLee[:slee]

Reporter

Updated

•

11 years ago

Depends on: 979716

Jason Smith [:jsmith]

Updated

•

11 years ago

Depends on: 979726

Sylvestre Ledru [:Sylvestre]

Comment 58

•

6 years ago

Mass closing as we are no longer working on b2g/firefox os.

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → WONTFIX

Sylvestre Ledru [:Sylvestre]

Comment 59

•

6 years ago

Mass closing as we are no longer working on b2g/firefox os.
