Closed
Bug 861050
Opened 12 years ago
Closed 6 years ago
[meta] WebRTC performance issue on B2G
Categories
(Core :: WebRTC, defect)
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: slee, Unassigned)
References
(Depends on 1 open bug)
Details
(Keywords: meta, Whiteboard: [WebRTC][blocking-webrtc-][b2g-webrtc-])
Attachments
(10 files, 8 obsolete files)
With revision 127944:b0d842380959 of m-c, the CPU usage on unagi is about 70%. Here are the settings:
* fake video only
* encode one out of every 25 frames, so the effective frame rate is about 30/25 = 1.2 fps
* do not decode video frames from the other peer.
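For reference, the "one out of every 25 frames" setting can be sketched as a simple frame decimator. This is only an illustration of the arithmetic; `ShouldEncode` and `EffectiveFps` are hypothetical names and are not taken from the test patches.

```cpp
#include <cstdint>

// Hypothetical decimator: returns true for one frame out of every `interval`
// captured frames (frames 0, interval, 2*interval, ...).
bool ShouldEncode(uint64_t frame_index, uint32_t interval) {
  return interval != 0 && frame_index % interval == 0;
}

// Effective encode rate for a given capture rate and decimation interval.
double EffectiveFps(double capture_fps, uint32_t interval) {
  return interval == 0 ? 0.0 : capture_fps / interval;
}
```

With a 30 fps source and an interval of 25, this gives 30/25 = 1.2 encoded frames per second.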
Reporter | ||
Comment 1•12 years ago
|
||
Here is the per-thread CPU usage of the browser app.
User 58%, System 14%, IOW 0%, IRQ 0%
User 123 + Nice 68 + Sys 47 + Idle 91 + IOW 0 + IRQ 0 + SIRQ 0 = 329
PID TID PR CPU% S VSS RSS PCY UID Thread Proc
1288 1317 0 25% S 147016K 73128K fg root Browser /system/b2g/plugin-container
1288 1319 0 13% S 147016K 73128K fg root ProcessThread /system/b2g/plugin-container
1288 1325 0 12% S 147016K 73128K fg root ViECaptureThrea /system/b2g/plugin-container
1288 1324 0 6% S 147016K 73128K fg root ProcessThread /system/b2g/plugin-container
1288 1311 0 4% S 147016K 73128K fg root MediaManager /system/b2g/plugin-container
1359 1359 0 3% R 1120K 472K fg root top top
1288 1288 0 2% S 147016K 73128K fg root Browser /system/b2g/plugin-container
1288 1318 0 1% S 147016K 73128K fg root Trace /system/b2g/plugin-container
1288 1296 0 1% S 147016K 73128K fg root Socket Thread /system/b2g/plugin-container
Thread "Browser" is the encoding thread.
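The summary lines of the `top` output above can be cross-checked with a little arithmetic. The sketch below assumes, judging only from the numbers shown, that the "User" percentage counts User + Nice ticks and that percentages are truncated; the struct and function names are made up for illustration.

```cpp
#include <cstdint>

// Tick counts as printed by `top` for one sampling interval.
struct CpuTicks {
  uint32_t user, nice, sys, idle, iow, irq, sirq;
};

// Sum of all tick categories in the interval.
uint32_t TotalTicks(const CpuTicks& t) {
  return t.user + t.nice + t.sys + t.idle + t.iow + t.irq + t.sirq;
}

// Integer percentage, truncated toward zero.
uint32_t Percent(uint32_t part, uint32_t total) {
  return total == 0 ? 0 : (part * 100) / total;
}
```

For the sample above (User 123, Nice 68, Sys 47, Idle 91), the total is 329 ticks, (123 + 68) / 329 truncates to 58% and 47 / 329 truncates to 14%, matching the "User 58%, System 14%" line. The non-idle share is (329 - 91) / 329, about 72%, consistent with the roughly 70% reported in the description.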
Comment 2•12 years ago
|
||
This is quite weird to me... ProcessThread in WebRTC is supposed to handle Module events, and is called at some time interval.
I think it should be quite lightweight, unless those callbacks are doing something heavy.
We can find out which modules those heavy threads are handling...
Updated•12 years ago
|
Whiteboard: [WebRTC] → [WebRTC][blocking-webrtc-]
Comment 3•12 years ago
|
||
Not knowing b2g, I assume you can get function-level profiles of all threads. Could you upload those here, or email if too large?
If you need help getting a working profiler, BenWa and I can probably help: he handles the profiler that is built in by default, and I maintain (occasionally, when needed) jprof (--enable-jprof in builds; see the documentation in tools/jprof/README.html). jprof can profile all threads in the process and produce per-thread profiles. Note: it needs to write data to a file while running, which is then post-processed with the 'jprof' executable; getting this to work in a cross-compiled setup like B2G or Android may take some work, but fundamentally it just needs a way to translate addresses into symbols (it dumps stack traces to a file to analyze later).
Hopefully you already have working tools that do this or more.
Reporter | ||
Comment 4•12 years ago
|
||
Hi jesup,
Thanks for your information.
Bug 831611 ports perf to b2g. I used it to profile the browser on b2g. I will update the data when I find something interesting.:)
Comment 5•12 years ago
|
||
Feel free to put some raw data up here; I'm very experienced at analyzing profiles and I know the main code paths in webrtc
Comment 6•12 years ago
|
||
This might be a dupe of bug 860441.
Reporter | ||
Comment 7•12 years ago
|
||
(In reply to Benoit Girard (:BenWa) from comment #6)
> This might be a dupe of bug 860441.
Hi Benoit,
I think it's different from bug 860441. I don't use camera in the testing. I use fake video.
Reporter | ||
Comment 8•12 years ago
|
||
I found 2 places that make "ProcessThread" eat up a lot of CPU.
* [1] TimeUntilProcess is called more than 15000 times per second, and each call invokes clock_gettime. I don't think we need to update the local time so frequently.
* [2] The division is not optimized (on Android and B2G we use "-Os -mthumb"; would "-marm -O2" be faster?).
After applying the patch, the CPU usage of the 2 "ProcessThread" threads drops from 10~13% to 3%.
[1] http://mxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/video_coding/main/source/video_coding_impl.cc#233
[2] http://mxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/system_wrappers/interface/tick_util.h#245
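The first point can be illustrated with a small sketch of a polling loop that reads the clock once per pass and hands the timestamp to every module, instead of each TimeUntilProcess call reading the clock itself. This is not the attached patch and not the real WebRTC ProcessThread; `PeriodicModule`, `RunOnce` and `FakeClock` are hypothetical names used only to show the idea.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for a module polled by a process thread.
struct PeriodicModule {
  int64_t next_deadline_ms;
  int64_t period_ms;

  // Uses the caller's timestamp instead of reading the clock itself.
  int64_t TimeUntilProcess(int64_t now_ms) const {
    return std::max<int64_t>(0, next_deadline_ms - now_ms);
  }
  void Process(int64_t now_ms) { next_deadline_ms = now_ms + period_ms; }
};

// One pass of the loop: a single clock read, run every module that is due,
// and return how long the thread may sleep before the next deadline.
template <typename Clock>
int64_t RunOnce(std::vector<PeriodicModule>& modules, Clock& clock) {
  const int64_t now_ms = clock.NowMs();  // the only clock read in this pass
  int64_t min_wait = std::numeric_limits<int64_t>::max();
  for (auto& m : modules) {
    if (m.TimeUntilProcess(now_ms) == 0) m.Process(now_ms);
    min_wait = std::min(min_wait, m.TimeUntilProcess(now_ms));
  }
  return modules.empty() ? 0 : min_wait;
}

// Test clock that counts how many times it was read.
struct FakeClock {
  int64_t now_ms = 0;
  int reads = 0;
  int64_t NowMs() { ++reads; return now_ms; }
};
```

With this shape, the number of clock reads scales with the number of loop passes rather than with the number of modules or TimeUntilProcess calls.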
Comment 9•12 years ago
|
||
(In reply to StevenLee from comment #7)
> (In reply to Benoit Girard (:BenWa) from comment #6)
> > This might be a dupe of bug 860441.
> Hi Benoit,
>
> I think it's different from bug 860441. I don't use camera in the testing. I
> use fake video.
I don't know how webRTC is being drawn but if it uses YUV images then they share drawing paths.
Comment 10•12 years ago
|
||
WebRTC does use YUV images for everything. So Benoit may be right on that part.
Looking at TimeUntilProcess...
StevenLee: Was "the patch" changing from -Os -mthumb to -marm -O2?
Comment 11•12 years ago
|
||
I'm tentatively putting bug 860441 as a depends. I've asked bjacob to look into this when he lands his patch. In the mean time we could confirm by seeing if the regression window includes bug 852734.
Depends on: 860441
Comment 12•12 years ago
|
||
Turned off the division optimization: the system call is far more expensive, and that optimization was breaking on my Linux64 system (the concept is good, but I think some details are incorrect). Also fixed a significant bug that allowed bad values to be returned due to type problems. I don't see a need to update the millisecond time in between Process() calls; it's one timeout at one time. Also un-commented nack mode; if you want to disable it, do so in VideoConduit.cpp. It runs a 10ms periodic timer looking to see if it needs to send NACKs. I can talk to the upstream people about a better solution that isn't timer-based, or that only starts timers on packet loss visible in the jitter buffer.
Updated•12 years ago
|
Attachment #738361 -
Attachment is obsolete: true
Reporter | ||
Comment 13•12 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #10)
> WebRTC does use YUV images for everything. So Benoit may be right on that
> part.
>
> Looking at TimeUntilProcess...
>
> StevenLee: Was "the patch" changing from -Os -mthumb to -marm -O2?
No, it does not. What I mean is that the compiler should optimize the division. If we use "-O2 -marm", the division should be compiled into an efficient form.
Reporter | ||
Comment 14•12 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #12)
> Created attachment 738741 [details] [diff] [review]
> Reduce timer overhead in webrtc threading, fix upstream int-conversion bug
>
> Turned off the division-optimization - the system call is far more
> expensive, and that optimization was breaking on my Linux64 system (though
> the concept is good, but there are incorrect details I think). Also fixed a
> significant bug that allowed bad values to be returned due to type problems.
This basically replaces division with multiplication: x/1000000 == x * 0.000001. But this version has an overflow problem.
> I don't see a need at any given update to update the millisecond time
> inbetween Process() calls - it's one timeout at one time. Also
> un-commented nack mode; if you want to disable it, do so in VideoConduit.cpp
> - it runs a 10ms periodic timer looking to see if it needs to send NACKs, I
> can talk to the upstream people about a better solution that isn't
> timer-based, or only starts timers on packet loss visible in the jitter
> buffer
Sorry, I didn't mean to comment out the nack mode. It was only for a test.
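To make the multiplication-for-division idea and its overflow hazard concrete, here is a minimal fixed-point sketch. It is not the code from the attachment: the constant 0x431BDE83 is ceil(2^50 / 1000000), and the function names are hypothetical. For 32-bit inputs the 64-bit product cannot overflow, but applying the same multiply to a 64-bit tick count wraps around once the input is large enough.

```cpp
#include <cstdint>
#include <limits>

constexpr uint64_t kMagic = 0x431BDE83;  // ceil(2^50 / 1000000)
constexpr unsigned kShift = 50;

// x / 1000000 for a 32-bit x: the 64-bit product stays below 2^63,
// so the multiply cannot overflow.
inline uint32_t DivMillionU32(uint32_t x) {
  return static_cast<uint32_t>((static_cast<uint64_t>(x) * kMagic) >> kShift);
}

// Same trick applied naively to a 64-bit value: the product wraps modulo
// 2^64 for large x, which silently produces a wrong quotient.
inline uint64_t DivMillionU64Naive(uint64_t x) {
  return (x * kMagic) >> kShift;
}

// True when x * kMagic still fits in 64 bits.
inline bool ProductFits(uint64_t x) {
  return x <= std::numeric_limits<uint64_t>::max() / kMagic;
}
```

A guard such as `ProductFits` (or simply falling back to a real division for large values) is one way to keep the fast path from returning garbage on large tick counts.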
Comment 15•12 years ago
|
||
Attached is a gzipped profile from jprof of a desktop self-to-self video call (fake audio). NOTE: it's not a wonderful profile as it's a DEBUG build, but it does give a first-order approximation (and all the bits are likely to have similar relative profile numbers, if not exactly the same, especially in the codec bits).
The test is at http://mozilla.github.io/webrtc-landing/pc_test.html
I let it use fake video on one side. This is VGA@30fps, and we're encoding and decoding two sets of streams (two encodes, and two decodes).
Of the time it used (this wasn't a realtime prof, it was a process profile, so these are percentages of the CPU time used):
25% VP8 encode
7.5% VP8 decode + surrounding stuff
8% audio encode/process - 3% of that was send-side AEC, 4.4% Opus encode
2% receive audio
10% nsViewManager (outside of WebRTC) \ these two do 13% in Paint()
6% nsWindow::OnExposeEvent (outside of WebRTC) /
~2.2% of Paint time is YUV->RGB conversion
~6% for memory allocation (mostly for paint) - may be high due to scribbling memory in Debug
3.5% everything in SocketTransportService (networking)
1.2% SRTP (encryption) 1/2 AEC, 1/2 HMAC
0.1% TimeUntilNextProcess (with my patch from this bug)
0.3% webrtc::ExtractBuffer() (copies incoming frames to pass to DeliverFrame)
<0.5% audio resampling
That accounts for around 70% of the CPU used; the rest is scattered in dribs and drabs (GC, JS stuff, etc)
Comment 16•12 years ago
|
||
Please retry now that bug 860441 has landed.
Comment 17•12 years ago
|
||
On same linux box, opt build: (again, not a real-time jprof)
18% VP8 encode
6.9% VP8 decode
8.5% ViewManager
4.5% audio encode/process - 0.9% send-side AEC, 3% Opus
2.7% networking
0.5% TimeUntilProcess (without the patch to reduce the number of calls to it)
More details later...
Comment 18•12 years ago
|
||
Reporter | ||
Comment 19•12 years ago
|
||
The attached file is the perf data.
As you can see, the top entry is __udivsi3, which is division. I think the compiler is not optimizing it well.
Here are the settings of both peers.
* FPS: encode once in every 20 frames
* use fake video, no audio
* CPU usage by threads
PID TID PR CPU% S VSS RSS PCY UID Thread Proc
1698 1743 0 20% S 141688K 48660K fg root ProcessThread /system/b2g/plugin-container
1698 1749 0 14% S 141688K 48660K fg root ProcessThread /system/b2g/plugin-container
1698 1741 0 7% S 141688K 48660K fg root Browser /system/b2g/plugin-container
1698 1750 0 4% S 141688K 48660K fg root ViECaptureThrea /system/b2g/plugin-container
1698 1698 0 3% S 141688K 48660K fg root Browser /system/b2g/plugin-container
1698 1742 0 2% S 141688K 48660K fg root Trace /system/b2g/plugin-container
508 508 0 2% R 1120K 372K fg root top top
1698 1705 0 2% S 141688K 48660K fg root Socket Thread /system/b2g/plugin-container
1563 1563 0 1% S 172212K 59096K fg root b2g /system/b2g/b2g
511 511 0 1% S 1060K 312K fg root top top
PID 1698 is the browser app, and I profiled thread 1743.
Comment 20•12 years ago
|
||
Again, are you using code pre or post-landing of bug 860441?
Comment 21•12 years ago
|
||
(In reply to StevenLee from comment #7)
> (In reply to Benoit Girard (:BenWa) from comment #6)
> > This might be a dupe of bug 860441.
> Hi Benoit,
>
> I think it's different from bug 860441. I don't use camera in the testing. I
> use fake video.
The fix in bug 860441 was not specific to the camera. It fixes a general regression whereby we had lost the ability to directly use a YUV texture in the compositor and were falling back to a slow read-back-and-convert-in-software path.
Reporter | ||
Comment 22•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #21)
> The fix in bug 860441 was not specific to the camera. It fixes a general
> regression whereby we had lost the ability to directly use a YUV texture in
> the compositor and were falling back to a slow
> read-back-and-convert-in-software path.
Hi Benoit,
I haven't updated to that revision yet. I will try it later. But according to the profiling data, display is not in the top 5, perhaps because the frame rate is only about 1.x fps.
Comment 23•12 years ago
|
||
I think this is independent of bug 860441, too, since WebRTC currently creates a PlanarYCbCrImage rather than a GonkIOSurfaceImage. See:
http://mxr.mozilla.org/mozilla-central/source/media/webrtc/signaling/src/mediapipeline/MediaPipeline.cpp#1038
And I think the patch in bug 860441 has nothing to do with PlanarYCbCrImage, does it?
Reporter | ||
Comment 24•12 years ago
|
||
Here are the patches that I used to test the performance
Comment 25•12 years ago
|
||
(In reply to Chiajung Hung [:chiajung] from comment #23)
> And I think the patch is 860441 has nothing to do with PlanarYCbCrImage,
> isn't it?
That's correct. Ping me if there is a display-related performance issue, I haven't looked into the compositing of PlanarYCbCrImage.
Reporter | ||
Updated•12 years ago
|
Blocks: b2g-webrtc
Comment 26•12 years ago
|
||
So, there actually is a graphics performance bug on mozilla-central now that could still possibly explain this. In gfx/layers/client/ImageClient.cpp, in ImageClientSingle::UpdateImage, there is only a path from PLANAR_YCBCR and no path for GRALLOC_PLANAR_YCBCR. But on B2G, we have a GRALLOC_PLANAR_YCBCR. So we don't take any fast path there and fall back to the slow path of calling GetAsSurface and doing things in software.
Comment 27•12 years ago
|
||
This is broken down per-thread (it's a normal usertime profile; I can't get JP_REALTIME working with threads other than mainthread...); each thread will have times that add up to 100% (of that thread's CPU use). You can use the hit-counts (left column) to figure out relative CPU use between threads.
Only the first 6 threads have enough hits to care about.
Comment 28•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #26)
> So, there actually is a graphics performance bug on mozilla-central now that
> could still possibly explain this. In gfx/layers/client/ImageClient.cpp, in
> ImageClientSingle::UpdateImage, there is only a path from PLANAR_YCBCR and
> no path for GRALLOC_PLANAR_YCBCR. But on B2G, we have a
> GRALLOC_PLANAR_YCBCR. So we don't take any fast path there and fall back to
> the slow path of calling GetAsSurface and doing things in software.
Have we filed a bug for this in bugzilla? If not, can you file one and link it to this bug? Thanks for the info on this!
Flags: needinfo?(bjacob)
Updated•12 years ago
|
Keywords: meta
Whiteboard: [WebRTC][blocking-webrtc-] → [WebRTC][blocking-webrtc-][b2g-webrtc-]
Reporter | ||
Comment 30•12 years ago
|
||
The CPU usage of the browser is about 65~68%. Please note that on B2G the composition is done in the parent process (the b2g process). Here are the settings of both peers.
* FPS: 15
* video: 176x144
* audio: none
Attachment #739362 -
Attachment is obsolete: true
Reporter | ||
Comment 31•12 years ago
|
||
The CPU usage is down to 59~61%. The top one is NV12 to I420.
Reporter | ||
Comment 32•12 years ago
|
||
The CPU usage now is 53~54%. The top one is I420Copy.
Comment 33•12 years ago
|
||
(In reply to StevenLee from comment #31)
> The CPU usage is down to 59~61%. The top one is NV12 to I420.
It looks like it's using the C fallback (SplitUV_C). /proc/cpuinfo says this thing has NEON, though. So either we're not building libyuv correctly, the CPU detection is not working, or the alignment is wrong.
Reporter | ||
Comment 34•12 years ago
|
||
(In reply to Timothy B. Terriberry (:derf) from comment #33)
> It looks like its using the C fallback (SplitUV_C). /proc/cpuinfo says this
> thing had NEON, though. So either we're not building libyuv correctly, the
> CPU detection is not working, or the alignment is wrong.
That's because the video width is 176, so the width of the U and V planes is 176/2 = 88. libyuv detects the CPU correctly, but it takes the NEON path only if the width is a multiple of 16. Otherwise it uses the C version to do the colour space conversion. 88 is not a multiple of 16, so in this case it uses the C version.
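The width restriction, and the usual way around it, can be sketched as follows. This is not libyuv's implementation: the "block" kernel below is a scalar stand-in for a SIMD routine that needs a multiple of 16 pixels, and all names are hypothetical. The "any width" wrapper runs the block kernel on the largest multiple of 16 and handles the leftover pixels with the scalar loop.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar reference: de-interleave UVUV... into separate U and V rows.
void SplitUVRowScalar(const uint8_t* src_uv, uint8_t* dst_u, uint8_t* dst_v,
                      size_t width) {
  for (size_t x = 0; x < width; ++x) {
    dst_u[x] = src_uv[2 * x];
    dst_v[x] = src_uv[2 * x + 1];
  }
}

// Stand-in for a SIMD kernel. Precondition: width % 16 == 0.
void SplitUVRowBlock16(const uint8_t* src_uv, uint8_t* dst_u, uint8_t* dst_v,
                       size_t width) {
  for (size_t x = 0; x < width; x += 16) {
    SplitUVRowScalar(src_uv + 2 * x, dst_u + x, dst_v + x, 16);
  }
}

struct WidthSplit {
  size_t body;  // largest multiple of 16 not exceeding the width
  size_t tail;  // remaining pixels, 0..15
};

WidthSplit SplitWidth(size_t width) {
  const size_t body = width & ~static_cast<size_t>(15);
  return {body, width - body};
}

// Works for any width: fast kernel on the body, scalar loop on the tail.
void SplitUVRowAny(const uint8_t* src_uv, uint8_t* dst_u, uint8_t* dst_v,
                   size_t width) {
  const WidthSplit s = SplitWidth(width);
  if (s.body > 0) SplitUVRowBlock16(src_uv, dst_u, dst_v, s.body);
  if (s.tail > 0) {
    SplitUVRowScalar(src_uv + 2 * s.body, dst_u + s.body, dst_v + s.body,
                     s.tail);
  }
}
```

For the 88-pixel chroma rows in this test, that split is 80 pixels on the fast path plus 8 pixels on the scalar tail, rather than all 88 on the C path.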
Comment 35•12 years ago
|
||
(In reply to StevenLee from comment #34)
> libyuv, it detects the CPU correctly. It goes the neon path only if the
> width is multiple of 16. If not, it uses c version to do colour space
> convert. In this case, it uses the c version.
Yup. Upstream already has a SplitUVRow_Any_NEON that appears to work for any width.
Comment 36•12 years ago
|
||
Thanks for the perf data!
On libyuv: I'm going to look at what our options are for updating webrtc.org for FF23 for this and other reasons anyways (we discussed this in our meeting today).
On a separate note, I may look at moving libyuv out of media/webrtc, as it's generally useful.
Reporter | ||
Comment 37•12 years ago
|
||
cpu: 72~74
* top by process
User 75%, System 23%, IOW 0%, IRQ 0%
User 196 + Nice 51 + Sys 78 + Idle 0 + IOW 0 + IRQ 0 + SIRQ 2 = 327
PID PR CPU% S #THR VSS RSS PCY UID Name
3657 0 73% S 37 146396K 48716K fg root /system/b2g/plugin-container
3504 0 14% R 33 176240K 56676K fg root /system/b2g/b2g
3210 0 3% S 20 36844K 2756K fg media /system/bin/mediaserver
2628 0 3% S 1 1124K 384K fg root top
2521 0 3% S 1 0K 0K fg root ksdioirqd/mmc2
* top by thread
User 78%, System 20%, IOW 0%, IRQ 0%
User 221 + Nice 47 + Sys 71 + Idle 0 + IOW 0 + IRQ 0 + SIRQ 1 = 340
PID TID PR CPU% S VSS RSS PCY UID Thread Proc
3657 3719 0 41% R 146396K 48992K fg root Browser /system/b2g/plugin-container
3657 3730 0 8% S 146396K 48992K fg root DecodingThread /system/b2g/plugin-container
3657 3664 0 7% S 146396K 48992K fg root Socket Thread /system/b2g/plugin-container
3657 3734 0 4% S 146396K 48992K fg root ViECaptureThrea /system/b2g/plugin-container
3657 3699 0 3% S 146396K 48992K fg root webrtc_gonk_aud /system/b2g/plugin-container
3657 3726 0 3% S 146396K 48992K fg root ProcessThread /system/b2g/plugin-container
3657 3733 0 2% S 146396K 48992K fg root ProcessThread /system/b2g/plugin-container
2628 2628 0 2% R 1124K 384K fg root top top
2521 2521 0 2% S 0K 0K fg root ksdioirqd/mmc2
3504 3526 0 2% S 176176K 57144K fg root Compositor /system/b2g/b2g
Reporter | ||
Comment 38•12 years ago
|
||
Hi derf,
I found that in attachment 742214 [details] audio uses a lot of CPU. I am not familiar with this code. Would you please take a look?
Thanks.
Comment 39•12 years ago
|
||
(In reply to StevenLee from comment #38)
> I found that in attachment 742214 [details] audio uses much cpu. I am not
> familiar with this. Would you please take a look?
If I'm reading that right, we're spending 2/3 of our audio processing time doing AEC and 1/3 encoding.
It looks like we're using the desktop AEC module by default. We probably want to use the "mobile" aecm module instead (media/webrtc/trunk/webrtc/modules/audio_processing/aecm), which uses integer math and has NEON optimizations. It looks like the NEON is currently hard-coded to only be used if 'OS=="android"' in audio_processing.gypi. GIPS only uses the module by default for Android and iOS: this is controlled from media/webrtc/trunk/webrtc/voice_engine/voe_audio_processing_impl.cc (look for kDefaultEcMode).
You should be able to override the default by setting the media.peerconnection.aec pref to 4 (see the EcModes enum in media/webrtc/trunk/webrtc/common_types.h for the full list). This pref is passed in from the audio conduit: See WebrtcAudioConduit::ConfigureSendMediaCodec() in media/webrtc/signaling/src/media-conduit/AudioConduit.cpp.
Reporter | ||
Comment 40•12 years ago
|
||
The current version will capture and send audio to other peer but it has serious delay. I will figure it out.
Attachment #739521 -
Attachment is obsolete: true
Reporter | ||
Comment 41•12 years ago
|
||
(In reply to StevenLee from comment #40)
> Created attachment 742279 [details]
> needed patches for testing with audio turns on
>
> The current version will capture and send audio to other peer but it has
> serious delay. I will figure it out.
The series of patches is
AudioDevice.patch
Camera_part1.patch
Camera_part2.patch
Camera_part3.patch
SkipPermissionCheck.patch
ModifyVideoSettings.patch
mediaEngine.patch
nsComponent.patch
reduceTimerThread.patch
forceI420.patch
Reporter | ||
Comment 42•12 years ago
|
||
(In reply to Timothy B. Terriberry (:derf) from comment #39)
> (In reply to StevenLee from comment #38)
> If I'm reading that right, we're spending 2/3 of our audio processing time
> doing AEC and 1/3 encoding.
Thanks, I will try it. :)
Reporter | ||
Comment 43•12 years ago
|
||
Rebase the camera and mic patches since bug 825110 and bug 825112 have updated.
Here is the sequence of applying these patches
825110_v11_part_webrtc.patch
825110_v5_part_camera.patch
825110_v6_part_webrtc_media.patch
825112_audio_device_impl_for_gonk.patch
825112_duplicate_audio_device_impl.patch
825112_v4_audio_device.patch
SkipPermissionCheck.patch
ModifyVideoSettings.patch
mediaEngine.patch
nsComponent.patch
reduceTimerThread.patch
forceI420.patch
FixAudioIntial.patch
Attachment #742279 -
Attachment is obsolete: true
Reporter | ||
Comment 44•12 years ago
|
||
(In reply to Timothy B. Terriberry (:derf) from comment #39)
> (In reply to StevenLee from comment #38)
> You should be able to override the default by setting the
> media.peerconnection.aec pref to 4 (see the EcModes enum in
> media/webrtc/trunk/webrtc/common_types.h for the full list). This pref is
> passed in from the audio conduit: See
> WebrtcAudioConduit::ConfigureSendMediaCodec() in
> media/webrtc/signaling/src/media-conduit/AudioConduit.cpp.
Hi derf,
It seems to work. The CPU usage goes down by about 5% when I set the media.peerconnection.aec pref to 4. But the total CPU usage went up after I updated to the latest m-c. I will find out which parts cause the problem.
Thanks for your suggestion.
Reporter | ||
Comment 45•12 years ago
|
||
Fixed the problem of gUM and the patching series is the same as comment 43.
Attachment #743526 -
Attachment is obsolete: true
Reporter | ||
Comment 46•12 years ago
|
||
My bad: I did not remove the testing code for the timestamp. It caused no video to be displayed once the connection was established.
Attachment #744409 -
Attachment is obsolete: true
Reporter | ||
Comment 47•12 years ago
|
||
Update patch_series.txt
Attachment #748747 -
Attachment is obsolete: true
Reporter | ||
Comment 48•12 years ago
|
||
I found 2 places that may make the performance worse.
* NotifyPull and NotifyQueuedTrackChanges are called too often, about once every 10 ms. Audio may need to be encoded every 10 ms, but video does not. As a result, the camera may capture 30 frames per second while more than 30 frames are encoded per second.
* This may happen only on FFOS: audio encoding uses a lot of CPU, about 80%. derf, should we set any flags when compiling the Opus encoder on ARM?
One question: could we make the interval of NotifyPull and NotifyQueuedTrackChanges longer, e.g. 30 ms?
Comment 49•12 years ago
|
||
Steven,
For the performance issue in the video encoding path, we can discuss it in bug 872887, which is more specific.
I'm changing this bug into a meta bug that lists the perf bottlenecks we find across the whole pipeline.
Comment 50•12 years ago
|
||
NotifyPull is called to see if there's a new video or audio frame. Eventually we want to switch to a push-notification, but the way MediaStreams are currently defined makes this hard.
Note that when called, it will return the same image as the previous call unless the video source has changed the image. Returning the same image ptr in gUM should not cause a new frame to be encoded.... However, it appears this is not the case (though likely the GIPS encode code is throwing away many of the dups).
Putting in a dup detector shows we're getting around 90-95 dups per 30 real frames per second on desktop.
I'll file a bug on this issue. Thanks!
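A duplicate detector of the kind mentioned above could look roughly like this. It is a sketch only, not the code that produced the 90-95 figure; `DupDetector` is a hypothetical name, and it assumes that an unchanged frame is returned as the same image pointer on consecutive pulls.

```cpp
#include <cstdint>

// Hypothetical detector: compares each pulled image pointer with the
// previous one and counts repeats.
class DupDetector {
 public:
  // Returns true when the image is new and should be passed on to the
  // encoder, false when it is the same pointer as on the previous pull.
  // A null image before any real frame arrives is counted as a duplicate.
  bool OnPull(const void* image) {
    if (image == last_) {
      ++dups_;
      return false;
    }
    last_ = image;
    ++fresh_;
    return true;
  }

  uint32_t dups() const { return dups_; }
  uint32_t fresh() const { return fresh_; }

 private:
  const void* last_ = nullptr;
  uint32_t dups_ = 0;
  uint32_t fresh_ = 0;
};
```

Dropping the pulls for which `OnPull` returns false before they reach the encoder would keep the encode rate tied to the capture rate rather than to the pull interval.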
Comment 51•12 years ago
|
||
(In reply to StevenLee from comment #48)
> * This may happen only on FFOS. The audio encoding uses much CPU, about 80%.
> derf, should we set any flags when compiling opus encoder on ARM?
You can pass a setting to reduce the complexity of Opus encoding (with a drop in quality), but do you have updated profile data around this? The data in comment 44 had us spending only 37% on Opus encoding.
Reporter | ||
Comment 52•12 years ago
|
||
Hi derf,
Here is the perf data when using "audio only". When turning on aecm, the CPU usage goes from 80% to 70%. The perf data in comment 44 is different from this one. I think that's because the test in comment 44 had both audio and video. If I run video only, the CPU usage is about 60%. Currently, the CPU cannot afford both audio and video unless some performance issues are fixed. Can you suggest settings for the Opus encoder? Thanks.
Attachment #742214 -
Attachment is obsolete: true
Comment 53•12 years ago
|
||
(In reply to StevenLee from comment #52)
> Created attachment 750904 [details]
> perf data of browser app with audio only
>
> Hi derf,
>
> Here is the perf data when using "audio only". When turing on aecm, the cpu
> usage is from 80% to 70%. The perf data in comment 44 is different from this
> one. I think it's because the test in comment 44 has audio and video. If I
> run video only the cpu usage is about 60%. Currently, the cpu may not
> afford both audio and video without fixing some performance issues. Can you
> suggest a settings for opus encoder? Thanks.
Steven -- Can you run the same tests with G.711 in place of Opus? Thanks.
Comment 54•12 years ago
|
||
(In reply to StevenLee from comment #52)
> Here is the perf data when using "audio only". When turing on aecm, the cpu
> usage is from 80% to 70%. The perf data in comment 44 is different from this
> one. I think it's because the test in comment 44 has audio and video. If I
> run video only the cpu usage is about 60%. Currently, the cpu may not
> afford both audio and video without fixing some performance issues. Can you
> suggest a settings for opus encoder? Thanks.
So, from that profile it appears we're running in Hybrid mode, which uses both the CELT and SILK encoders (and thus the maximum CPU). I'm not actually sure why, because AFAIK the default bitrate (64 kbps) should be high enough to switch to CELT mode by itself, and I don't think we're overriding that default anywhere. Generally the CELT encoder is much faster than the SILK one at higher complexities. I'll investigate.
You can use opus_encoder_ctl(inst->encoder, OPUS_SET_COMPLEXITY(4)) in media/webrtc/trunk/webrtc/modules/audio_coding/codecs/opus/opus_interface.c to lower the complexity. The default is 10 (the maximum).
In other news, this weekend I committed some changes upstream which should reduce complexity on ARM in the neighborhood of 20%: <https://git.xiph.org/?p=opus.git;a=commit;h=972a34ec>.
Comment 55•12 years ago
|
||
We noticed that a lot of time is being spent inserting the frame, and this
may be delaying the MSG. This patch moves the insert to a different thread.
I tested this once locally, but it shouldn't be considered done.
Reporter | ||
Comment 56•11 years ago
|
||
Hi all,
Sorry for the late reply. I tried G.711 and the total CPU usage is down to about 50%. It's quite helpful. But the audio still has a latency problem. I found a duplicate audio encoding problem on both desktop and B2G (bug 877518).
I am trying to find the cause of the audio latency problem on B2G. I think it should be fixed first. Then I will try the Opus encoder with lower complexity and the new version of the Opus encoder.
Reporter | ||
Comment 57•11 years ago
|
||
I tested Opus with complexity 4 and 1. In both cases the CPU usage is about 60% with audio only. The audio also has a jitter problem.
Comment 58•6 years ago
|
||
Mass closing as we are no longer working on b2g/firefox os.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Comment 59•6 years ago
|
||
Mass closing as we are no longer working on b2g/firefox os.