Closed Bug 601542 Opened 14 years ago Closed 8 years ago

High CPU use playing video with sound on some platforms (old Ubuntu specific?)

Categories

(Core :: Audio/Video: Playback, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED INCOMPLETE
mozilla2.0

People

(Reporter: alice0775, Unassigned)

References

()

Details

(Keywords: regression)

Attachments

(1 file, 5 obsolete files)

Build Identifier: Mozilla/5.0 (X11; Linux i686; rv:2.0b7pre) Gecko/20101002 Firefox/4.0b7pre ID:20101002031237 After landing Bug 531340, High CPU usage on OGG Video Playback in Lunux. Reproducible: Always Steps to Reproduce: 1. Start Minefield with new profile 2. Open URL ( http://www.dailymotion.com/html5 ) 3. Watch System Monitor Actual Results: High CPU usage than before landing the Bug 531340. CPU usage is about 2 or 3 times. Regression range: CPU less than 20%: http://hg.mozilla.org/mozilla-central/rev/6a398023b956 Mozilla/5.0 (X11; Linux i686; rv:2.0b7pre) Gecko/20101002 Firefox/3.7a4pre ID:20100401034504 CPU more then 70%: http://hg.mozilla.org/mozilla-central/rev/c8f0660250e0 Mozilla/5.0 (X11; Linux i686; rv:2.0b7pre) Gecko/20101002 Firefox/3.7a4pre ID:20100402030334 And I revert to cgangeset 4613e0369937 in local build then CPU usage less than 20%. So, The following changeset causes the hight CPU usage in Linux build. 1d00691be5f2 Chris Pearce — Bug 531340 - New Ogg video decoder. r=doublec sr=roc
Can you please include your computer's specs in the bug? Particularly, CPU type and speed, video card (and driver, if known) and what Linux distro you're running.
Also, can you please test against http://media.tinyvid.tv/3d198wqepg78m.ogg or some other single-video simplified case and report if you still see the problem? The DailyMotion link is slightly complicated because it plays a random video, and it also will use canvas to paint the video if you click any of the effects buttons. It's possible this problem is related to DailyMotion specifically, so testing the link above will help narrow this down.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Looks like I can reproduce this... On a 1.7GHz Core i7 with CPU scaling disabled, testing with that video (and without the controls showing), I see around 11% CPU in Firefox and 7% in Xorg with 3.6.9 beta. With a 4.0 nightly, I see around 20% CPU in Firefox and 7% in Xorg.
(In reply to comment #1) > Can you please include your computer's specs in the bug? Particularly, CPU > type and speed, video card (and driver, if known) and what Linux distro you're > running. Core2Quad Q8300 @2.5GHz, ATI HD4350, Ubuntu 8.04 LTS japanese. However I use Ubuntu on VMware Player, and use 2 of 4CPU for the virtual machine and GPU Accelerated Windows 0/1. (In reply to comment #2) > Also, can you please test against http://media.tinyvid.tv/3d198wqepg78m.ogg or > some other single-video simplified case and report if you still see the > problem? Using http://media.tinyvid.tv/3d198wqepg78m.ogg, Recent build:CPU around 55-66% http://hg.mozilla.org/mozilla-central/rev/e3fa26aab8a6 Mozilla/5.0 (X11; Linux i686; rv:2.0b7pre) Gecko/20101003 Firefox/4.0b7pre ID:20101003192227 Before landing the Bug 531340 :CPU around 16-22% http://hg.mozilla.org/mozilla-central/rev/6a398023b956 Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.3a4pre) Gecko/20100401 Minefield/3.7a4pre ID:20100401034504
blocking2.0: --- → ?
Attached file Profile & callgraph (deleted) —
I profiled using oprofile the revisions which Alice noted the performance decrease. x86 Linux. This machine has no sound hardware.
Using kinda primitive tools here, but /usr/bin/time (passing video to Firefox binary so it autoplays, then closing the browser once the video fades to black) reports: old, sound: 21.38user 5.00system 3:36.90elapsed 12%CPU (0avgtext+0avgdata 303504maxresident)k 16inputs+824outputs (1major+405814minor)pagefaults 0swaps new, sound: 29.67user 6.00system 3:34.40elapsed 16%CPU (0avgtext+0avgdata 296608maxresident)k 0inputs+14040outputs (0major+32544minor)pagefaults 0swaps User CPU increased 28%. old, no sound: 24.12user 6.27system 3:34.00elapsed 14%CPU (0avgtext+0avgdata 325472maxresident)k 32inputs+16496outputs (0major+47087minor)pagefaults 0swaps new, no sound: 25.40user 3.92system 3:33.23elapsed 13%CPU (0avgtext+0avgdata 268064maxresident)k 360inputs+4664outputs (0major+19369minor)pagefaults 0swaps User CPU increased 5%. Alice0775, can you please test against http://flim.org/~kinetik/3d198wqepg78m_nosound.ogg and report whether you see the high CPU use with the new decoder implementation with this file?
Typo, I meant to say User CPU increased 38% in the first case.
(In reply to comment #6) > Alice0775, can you please test against > http://flim.org/~kinetik/3d198wqepg78m_nosound.ogg and report whether you see > the high CPU use with the new decoder implementation with this file? The CPU use is low(almost same as before landing of Bug 531340).
Thanks for confirming! I wonder if this is mostly caused by the behavioural change mentioned in bug 531340 comment 91, then. Given you're testing in a VM and with a distro using PulseAudio (I think), it's possible the overhead is magnified by the cross-process calls to PA and further magnified by the VM making contexting switching more expensive than usual. I'll come up with a patch tomorrow and post it for testing.
The CPU numbers I posted earlier tested 4613e0369937 (revision before bug 531340 landed) and 1d00691be5f2 (bug 531340 landed). I've run the same measurement with a recent trunk build using the same compile flags: sound: 35.71user 5.47system 3:34.09elapsed 19%CPU (0avgtext+0avgdata 320256maxresident)k 0inputs+408outputs (0major+22370minor)pagefaults 0swaps no sound: 31.25user 3.63system 3:34.15elapsed 16%CPU (0avgtext+0avgdata 344432maxresident)k 96inputs+10088outputs (0major+29127minor)pagefaults 0swaps So it looks like there are further regressions that need investigation.
Should we block on a regression here?
Yeah, I'm part way through a patch that will help with this. I don't think it'll fix the entire perf regression, but it'll be in the same ballpark at least.
Assignee: nobody → kinetik
Status: NEW → ASSIGNED
Recording some more numbers here for my own convenience: old: 17.39user 3.30system 8:05.41elapsed 4%CPU (0avgtext+0avgdata 253120maxresident)k 0inputs+4248outputs (0major+21012minor)pagefaults 0swaps new: 21.82user 7.86system 8:16.39elapsed 5%CPU (0avgtext+0avgdata 255872maxresident)k 0inputs+4128outputs (0major+21119minor)pagefaults 0swaps 25% trunk: 32.13user 8.94system 9:15.38elapsed 7%CPU (0avgtext+0avgdata 308048maxresident)k 0inputs+8336outputs (0major+21410minor)pagefaults 0swaps trunk #2: 31.77user 9.69system 8:02.48elapsed 8%CPU (0avgtext+0avgdata 347728maxresident)k 8inputs+5816outputs (0major+24837minor)pagefaults 0swaps 45% trunk #3 (recent alsa-plugins): 29.96user 7.43system 7:47.09elapsed 8%CPU (0avgtext+0avgdata 316448maxresident)k 3608inputs+12632outputs (0major+22987minor)pagefaults 0swaps patched trunk: 28.24user 4.71system 8:10.55elapsed 6%CPU (0avgtext+0avgdata 284192maxresident)k 120inputs+6376outputs (0major+22055minor)pagefaults 0swaps -11% patched trunk #2: 28.27user 4.59system 7:47.16elapsed 7%CPU (0avgtext+0avgdata 282416maxresident)k 0inputs+4040outputs (0major+19666minor)pagefaults 0swaps no sound device trunk: 25.73user 4.75system 7:48.33elapsed 6%CPU (0avgtext+0avgdata 281968maxresident)k 6928inputs+536outputs (0major+19098minor)pagefaults 0swaps patched trunk: 26.57user 5.67system 11:21.78elapsed 4%CPU (0avgtext+0avgdata 327776maxresident)k 80inputs+1480outputs (0major+23188minor)pagefaults 0swaps ----------------------------------------------------------- trunk, no controls, IK+: 9.72user 5.68system 8:25.51elapsed 3%CPU (0avgtext+0avgdata 293040maxresident)k 64inputs+6608outputs (1major+19689minor)pagefaults 0swaps patched trunk, no controls, IK+: 7.73user 2.58system 8:07.01elapsed 2%CPU (0avgtext+0avgdata 353056maxresident)k 7712inputs+32920outputs (0major+23460minor)pagefaults 0swaps with fixed 32kB writeBuffer: 6.88user 2.12system 7:58.31elapsed 1%CPU (0avgtext+0avgdata 282960maxresident)k 0inputs+728outputs (0major+18947minor)pagefaults 0swaps and floorf removed: 5.91user 2.13system 7:50.50elapsed 1%CPU (0avgtext+0avgdata 282800maxresident)k 136inputs+888outputs (0major+18486minor)pagefaults 0swaps
Attached patch prototype patch (obsolete) (deleted) — Splinter Review
Short term fix until we rework this with the new audio library (sydneyaudio replacement). Coalesce audio writes in the audio loop. Reduces the number of writes to sydneyaudio (and, thus, on PulseAudio systems, interprocess communication to the PA server) significantly. Needs testing on other platforms.
Summary: High CPU usage on OGG Video Playback → Many small audio writes causes high CPU during media playback
Attached patch patch v1 (obsolete) (deleted) — Splinter Review
Fixed test failures caused by bugs in the patch. Should be good-to-go now, but I'll test on other platforms before requesting review.
Attachment #483392 - Attachment is obsolete: true
Oh, also: this patch contains a fix for bug 598829.
Tests pass on Linux and Win32 for me, pushed to tryserver for better coverage. Test builds should turn up in a few hours: http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/mgregan@mozilla.com-e852bf2718bd/
Builds are available. Alice0775, if you could test these and let me know whether it helps, that'd be much appreciated. Linux tests were all green. Win32 haven't come in yet. OSX got this: http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1287391620.1287393618.6758.gz s: talos-r3-snow-042 55345 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_a4_tone.html | The audioAvailable event's time attribute was invalid. 55346 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_a4_tone.html | Check amount of signal data processed - got 103424, expected 133120 55347 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_a4_tone.html | Check signal present - got false, expected true 55348 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_a4_tone.html | Check mute fragments present - got false, expected true
(In reply to comment #19) > Builds are available. Alice0775, if you could test these and let me know > whether it helps, that'd be much appreciated. Tested on VMware as same as comment #4 Laatest Nightly: http://hg.mozilla.org/mozilla-central/rev/397c458b40a4 Mozilla/5.0 (X11; Linux i686; rv:2.0b8pre) Gecko/20101016 Firefox/4.0b8pre ID:20101016025826 TryServer in comment #18: http://hg.mozilla.org/try/rev/e852bf2718bd Mozilla/5.0 (X11; Linux i686; rv:2.0b8pre) Gecko/20101018 Firefox/4.0b8pre ID:20101018003128 Video: http://www.dailymotion.com/html5 Latest Nightly: CPU around around 55-77% TrySercer: CPU around around 66-72% Video : http://media.tinyvid.tv/3d198wqepg78m.ogg Latest Nightly: CPU around 55-66% TrySercer: CPU around 55-66%
Thanks for that, I'm a little disappointed that the patch hasn't helped at all. I'm setting up a more similarly configured system to what you're testing against to see if I can work out what's going on. The OS X failures above are caused by the new code queuing a large enough amount of data that the event queue size limit in nsAudioAvailableEventManager is exceeded and it starts dropping events. This is due to Available() returning 882000 on OS X. The event queue size limit is 100 "frame buffer" (typically 1024 byte) sized events. It actually makes more sense to try to always queue a fixed amount (e.g. 32kB) rather than using Available()--changing the patch to do this, so long as it doesn't exceed the limits in nsAudioAvailableEventManager, fixes the test failures on OS X.
Attached patch patch v1.1 (obsolete) (deleted) — Splinter Review
Attachment #483910 - Attachment is obsolete: true
Attached patch patch v1.2 (obsolete) (deleted) — Splinter Review
Rebased after Tremor landing. Use a 4k buffer rather than 32k.
Attachment #484460 - Attachment is obsolete: true
Looking at this problem on the N900, where it seems to be rather extreme. Using this testcase: http://flim.org/~kinetik/out.html, which is a mono 48kHz Vorbis stream. N900, audio disabled in nsAudioStream (mInError=true before returning OK from Init()): 8% CPU N900, audio enabled: 100% (75 user, 25 system) CPU, would need even more to play back smoothly No audio, system profile: samples % image name app name symbol name 278301 85.0309 no-vmlinux no-vmlinux /no-vmlinux 5484 1.6756 oprofiled oprofiled /usr/bin/oprofiled 4838 1.4782 no-vmlinux plugin-container /no-vmlinux 3636 1.1109 libxul.so plugin-container mdct_backward 2030 0.6202 no-vmlinux pulseaudio /no-vmlinux No audio, plugin-container profile: samples % image name symbol name 4838 19.8124 no-vmlinux /no-vmlinux 3636 14.8900 libxul.so mdct_backward 1428 5.8479 libxul.so nsAudioAvailableEventManager::QueueWrittenAudioData(short*, unsigned int, unsigned long long) 1165 4.7709 libxul.so _vorbis_apply_window 1012 4.1443 libxul.so decode_packed_entry_number Audio, system profile: samples % image name app name symbol name 72364 19.0834 no-vmlinux no-vmlinux /no-vmlinux 55717 14.6933 libasound.so.2.0.0 plugin-container /usr/lib/libasound.so.2.0.0 44412 11.7121 no-vmlinux plugin-container /no-vmlinux 31489 8.3041 libasound_module_pcm_pulse.so plugin-container /usr/lib/alsa-lib/libasound_module_pcm_pulse.so 19831 5.2297 libpulsecommon-0.9.15.so plugin-container /usr/lib/libpulsecommon-0.9.15.so 19332 5.0981 libpulse.so.0.8.0 plugin-container /usr/lib/libpulse.so.0.8.0 15096 3.9810 no-vmlinux pulseaudio /no-vmlinux 13820 3.6445 libpthread-2.5.so plugin-container pthread_mutex_lock 10356 2.7310 module-nokia-voice.so pulseaudio /usr/lib/pulse-0.9.15/modules/module-nokia-voice.so 9405 2.4802 oprofiled oprofiled /usr/bin/oprofiled 8656 2.2827 libpthread-2.5.so plugin-container __pthread_mutex_unlock_usercnt 7408 1.9536 libc-2.5.so plugin-container poll 4872 1.2848 libpulsecore-0.9.15.so pulseaudio /usr/lib/libpulsecore-0.9.15.so 4310 1.1366 libpthread-2.5.so plugin-container pthread_mutex_unlock 4174 1.1007 libpulsecommon-0.9.15.so pulseaudio /usr/lib/libpulsecommon-0.9.15.so 3110 0.8201 libxul.so plugin-container mdct_backward Audio, plugin-container profile: samples % image name symbol name 55717 23.5732 libasound.so.2.0.0 /usr/lib/libasound.so.2.0.0 44412 18.7902 no-vmlinux /no-vmlinux 31489 13.3226 libasound_module_pcm_pulse.so /usr/lib/alsa-lib/libasound_module_pcm_pulse.so 19831 8.3903 libpulsecommon-0.9.15.so /usr/lib/libpulsecommon-0.9.15.so 19332 8.1792 libpulse.so.0.8.0 /usr/lib/libpulse.so.0.8.0 13820 5.8471 libpthread-2.5.so pthread_mutex_lock 8656 3.6623 libpthread-2.5.so __pthread_mutex_unlock_usercnt 7408 3.1342 libc-2.5.so poll 4310 1.8235 libpthread-2.5.so pthread_mutex_unlock 3110 1.3158 libxul.so mdct_backward 2209 0.9346 libc-2.5.so __libc_enable_asynccancel
The problem on the N900 may be different after all. I'm seeing a pattern like this when writing audio, for almost every write: ms 337: write 8192 337: writei 8192 342: writei 2432 602 writei recover: -32 611: writei 2432 write is sa_pcm_write, writei is calling snd_pcm_writei. We try to write 8k, which succeeds but reports that only part of the buffer was accepted (5760 bytes). We immediately try to write the remaining 2432 bytes, which blocks for 260ms before returning EPIPE (indicating an underrun occurred), so we call snd_pcm_recover and reattempt the write, which succeeds. snd_pcm_recover is *very* expensive, and we wouldn't expect to call it frequently in normal use. We initialize the PCM device with 250ms latency, so I think we're ending up in a state where the buffer sizing is pathological for this configuration. Maybe we end up with a single 250ms buffer, and we end up blocked in writei waiting for more space and underrun while we wait for it to empty. As a quick hack, changing: 143 if (snd_pcm_set_params(s->output_unit, 144 #ifdef SA_LITTLE_ENDIAN 145 SND_PCM_FORMAT_S16_LE, 146 #else 147 SND_PCM_FORMAT_S16_BE, 148 #endif 149 SND_PCM_ACCESS_RW_INTERLEAVED, 150 s->n_channels, 151 s->rate, 152 1, 153 250000) < 0) { 250000 at line 253 to something larger (I used 1000000, which is 1s) makes this problem disappear completely and we're able to play the testcase while using around 8-10% CPU in plugin-container (and similar in PulseAudio).
Note that the value used to be 500ms, but we dropped it to 250ms in bug 526411 to work around playback problems with the old oggplay decoder implementation. We can probably revert to 500ms safely on trunk since we're using our own decoder implementation now. snd_pcm_dump output for 250ms configuration: buffer_size : 12000 period_size : 3000 period_time : 62500 tstamp_mode : NONE period_step : 1 avail_min : 3000 period_event : 0 start_threshold : 12000 stop_threshold : 12000 silence_threshold: 0 silence_size : 0 boundary : 1572864000 Logging: write 1: 89785: writei 4672 write 2: 89799: writei 4096 write 3: 89800: writei 4096 -> total writes would be 12864 (buffer_size is 12000) 89807: writei 864 -> need to rewrite 864 bytes -> blocks in snd_pcm_writei for 250ms 90057: writei recover: -32 Note that 12000 / 48000 is 250ms. If I understand how ALSA works correctly, there should be 4 3000 frame buffers (for 12000 frames total), so I'd only expect the write to block until one of those freed up (~63ms), but it seems to block until the entire buffer runs dry.
Attached patch n900 patch v0 (obsolete) (deleted) — Splinter Review
Spun the N900 stuff off into bug 607200 as it's a separate issue, though the patch in this bug may also help reduce CPU during playback.
Attachment #485228 - Attachment is obsolete: true
Alice, are you able to reproduce this regression outside of a VM (i.e. on a native Linux install)?
Tested on Native installed Ubuntu 10.10 (on the same hardware of the above mentioned VM) Graphic GPU Accelerated Windows : 0/1 http://hg.mozilla.org/mozilla-central/rev/f7016571b472 Mozilla/5.0 (X11; Linux i686; rv:2.0b8pre) Gecko/20101104 Firefox/4.0b8pre ID:20101104025921 Video: http://www.dailymotion.com/html5 CPU around 11-22% Video : http://media.tinyvid.tv/3d198wqepg78m.ogg CPU around 5-16% It is a quite low CPU utilization!!
Attachment #484538 - Attachment is obsolete: true
I tested this on a 2.4GHz T7700 Core 2 Duo running VMWare Fusion 3.x with a Fedora 12 VM and couldn't reproduce the problem (playing an audio-only file with controls disabled used ~3% CPU). I then tried my Ubuntu 9.10 VM with the same testcase and saw the CPU use at around 40-80% during playback. That eliminates VMs as the common factor, so perhaps this bug is specific to older Ubuntu versions.
blocking2.0: final+ → ---
Summary: Many small audio writes causes high CPU during media playback → High CPU use playing video with sound on some platforms (Ubuntu 8.04 specific?)
Summary: High CPU use playing video with sound on some platforms (Ubuntu 8.04 specific?) → High CPU use playing video with sound on some platforms (old Ubuntu specific?)
Yeh,10.10 is better than 8.04 on VM. On VMware Player3.1.2 build-301548 (host Windows7x64) Video: http://www.dailymotion.com/html5 ubuntu8.04 : CPU around around 55-77% ubuntu10.10: CPU around around 11-33% Video : http://media.tinyvid.tv/3d198wqepg78m.ogg ubuntu8.04: CPU around 55-66% ubuntu10.10: CPU around 5-27%
Oprofile seems to suggest that we're spending most of our time below sa_stream_write and sa_stream_get_position, but applying my patch from this bug makes very little difference despite reducing the number of writes by at least 8x.
Assignee: kinetik → nobody
Component: Audio/Video → Audio/Video: Playback
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: