Open Bug 1616184 Opened 5 years ago Updated 2 years ago

soundtouch::SoundTouch in xul.dll - high CPU usage

Categories

(Core :: Audio/Video: Playback, defect, P3)

73 Branch
Desktop
Windows 10
defect

Tracking

()

Tracking Status
firefox73 --- affected

People

(Reporter: Ooquouquei864, Unassigned)

References

Details

Attachments

(11 files)

Attached image soundTouch.png (deleted) —

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0

Steps to reproduce:

I have not managed to find a consistent way to reproduce the issue. SoundTouch tends to hog the CPU even when no audio is playing (Using Firefox on Windows 10, with current Realtek audio drivers)

the issue is intermittent. sometimes, opening about:addons resolves the issue (no idea why, just noticed this randomly).

It's been happening for a while, the recent update to SoundTouch 2.1.2 has not helped.

Actual results:

Stack call saturating one CPU core:

VCRUNTIME140.dll!memcpy+0x21f
xul.dll!soundtouch::SoundTouch::operator=+0x2244942

Stack call with next highest usage (~2%):

xul.dll!soundtouch::SoundTouch::numChannels+0x260faa
xul.dll!XRE_GetBootstrap+0x1384e5
xul.dll!soundtouch::SoundTouch::numChannels+0x263c28
xul.dll!soundtouch::SoundTouch::numChannels+0x49cd7e
xul.dll!DumpCompleteHeap+0x30cfed
xul.dll!workerlz4_compress+0x26b76e
xul.dll!XRE_GetBootstrap+0x1d16b
xul.dll!XRE_GetBootstrap+0xfe9bd
xul.dll!XRE_GetBootstrap+0x1cdaf
xul.dll!XRE_GetBootstrap+0x1cd28
xul.dll!XRE_GetBootstrap+0x37621
xul.dll!XRE_GetBootstrap+0x1c5d4
xul.dll!soundtouch::SoundTouch::numChannels+0x23416a
KERNEL32.DLL!BaseThreadInitThunk+0x14
mozglue.dll!DllBlocklist_Initialize+0x122d
ntdll.dll!RtlUserThreadStart+0x21

Expected results:

no crazy high CPU usage for no apparent reason

I am also affected by this problem in 73.0.1 (64-bit). Firefox is constantly maxing out two CPU threads, even when there is no user activity or media playing. Firefox also occasionally and briefly feels unresponsive (lag switching tabs, lag handling website events, lag typing into forms).

I can see a very similar stack trace as reporter's 2nd one via Process Hacker. It looks like two threads fighting for locks.

Opening about:addons doesn't seem to help in my case.

My about:support is here: https://pastebin.com/9nkjjpr3

Attached image 20200222-084942.png (deleted) —
Attached image 20200222-085021.png (deleted) —

I should add, the behaviour starts a few seconds to a few minutes after Firefox is started and then does not end until Firefox is restarted (but will start again rather sooner than later).
I only noticed this problem after the update to version 73. I am not sure if it happened in 72, but it probably didn't.

This issue seems to be a lot worse than it previously seemed.

Ever since Firefox started this behaviour my PC experiences hard crashes every couple days (I leave my PC in standby overnight). Audio loops for half a second then everything goes quiet and dark. Only the hard-reset button works.
I can't be 100% sure this is related to this Firefox problem, but these crashes have not previously happened and I have not noticed any other misbehaving software. The Windows error log shows no errors. A blue screen dump was not created. A malware scan came back clean.

Until this is fixed I'm downgrading to Firefox 72.

The problem persists in Firefox 72.0.2 but Firefox 71.0 seems unaffected (for now).

Is there a SoundTouch web app that you use in Firefox and reproduce high CPU usage? please provide us with a link to it.
I need to be able to reproduce the issue myself in order to confirm it.

Furthermore, I don't know in which direction I should investigate to find the cause of this issue, so I have set the component to (Core) XUL in hope that the triage owner will pick it up and help in moving this forward. If the chosen component is incorrect, please set a more appropriate one.

Thank you for your contribution!

Component: Untriaged → XUL
Flags: needinfo?(Ooquouquei864)
OS: Unspecified → Windows 10
Product: Firefox → Core
Hardware: Unspecified → Desktop

for those reading this thread - just so we're on the same page:
SoundTouch is not a webapp (from the wording in your post, it sounds like it is), but a C++ library that is part of Firefox.

https://gitlab.com/soundtouch/soundtouch


Unfortunately I haven't been able to isolate the issue down to a single webpage that would trigger the error. I'm not sure what the cause is at all.

Flags: needinfo?(Ooquouquei864)

Updating my Realtek sound drivers (from version 6.0.1.8184/2017-07-20 to 6.0.1.8273/2017-12-26, the latest one available from my mainboard vendor) seems to have fixed the high CPU usage problem in Firefox 73 and 72 for me.

The priority flag is not set for this bug.
:bgrins, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(bgrinstead)
Component: XUL → Audio/Video
Flags: needinfo?(bgrinstead)
Component: Audio/Video → Audio/Video: Playback

The libsoundtouch is updated in Firefox 72 in bug 1588233. If the problem starts from Firefox 72, it probably is bug in soundtouch. It's better to file the bug there.

From comment 9, it seems like this issue is resolved?

Flags: needinfo?(cobrafast)

not resolved, it keeps happening, although I still have no idea what triggers it

the issue did not start with Firefox 72 for me, it used to happen even the older SoundTouch lib and FF versions

installing different Realtek driver versions does not seem to help on my end

I reported the bug to the SoundTouch maintainer a month ago, he insists the lib itself is fine but it is used incorrectly in Firefox:
https://gitlab.com/soundtouch/soundtouch/-/issues/32

(In reply to Ooquouquei864 from comment #12)

I reported the bug to the SoundTouch maintainer a month ago, he insists the lib itself is fine but it is used incorrectly in Firefox:
https://gitlab.com/soundtouch/soundtouch/-/issues/32

Thanks for the update. I'll put the link to the tracking list.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P3

If you want to reproduce this then go to:

https://www.pcgamer.com/news/

This started in a more recent firefox release, and it seems like thread contention.

It's been going on for months and I guess I'm tired of seeing it which is why I got over being lazy and registered to post this.

My stack traces are the same as the others, but it's not my job to debug your code.

I'm running win10 x64 with 77.0.1 (64-bit) firefox. I've also run in safemode and same issue.

Now fire up the debugger.

BTW you have to have javascript enabled, if that helps you at all. If javascript is disabled then the CPU spike isn't there. It might be a third party lib, but you know, you gotta prove that, because people will blame you anyways :)

Question for everybody experiencing seemingly unpredictable and random CPU hogs, where Soundtouch always appears in the thread stack:
I also get these symptoms, but these CPU hogs always stop of their own accord after about 1 or 2 minutes, and everything goes back to normal.
Do your CPU hogs also stop spontaneously, and if so, after how long?

The problem went away for me in version 80.0 that I updated today. The website in question no longer takes up CPU usage.

Try updating if you haven't yet. My issue was the CPU would take up roughly 20% every time I was on pcgamer.

(In reply to ThisisMyCupCake from comment #18)

The problem went away for me in version 80.0 that I updated today. The website in question no longer takes up CPU usage.

Try updating if you haven't yet. My issue was the CPU would take up roughly 20% every time I was on pcgamer.

I should say it was a constant elevated CPU to answer your question further. If this not a firefox bug and something on PCgamer has been changed then I would suggest the developer contact them and see what changed.

In contrast to the easily reproducible CPU hogs formerly triggered by https://www.pcgamer.com/news, the random, unpredictable and
unreproducible CPU hogs originally described and discussed in this bug still DO occur in Firefox 80.0.1.

(In reply to Bob Hill from comment #20)

In contrast to the easily reproducible CPU hogs formerly triggered by https://www.pcgamer.com/news, the random, unpredictable and
unreproducible CPU hogs originally described and discussed in this bug still DO occur in Firefox 80.0.1.

I've been looking around and: https://www.foxnews.com/ now displays the same behavior as pcgamer did.

So whatever is going on it is still there in the current version.

Can a developer respond to this please. You've been given two sites that you can reproduce this on (one of which the problem seems to have gone away). It has been months. Not looking good.

Here is some of the debug output from windbg when breaking into the thread that is causing the cpu spike:

Somebody is being naughty here:

[JavaScript Error: "uncaught exception: undefined"]

Opened log file 'f:\tmp\windbg.log'
0:057> g
[JavaScript Warning: "Autoplay is only allowed when approved by the user, the site is activated by the user, or media is muted." {file: "https://foxnewsplayer-a.akamaihd.net/player/9.0.12/akamai/amp/core/amp.js?v=20201005160202" line: 6606}]
[JavaScript Warning: "Autoplay is only allowed when approved by the user, the site is activated by the user, or media is muted." {file: "https://foxnewsplayer-a.akamaihd.net/player/9.0.12/akamai/amp/core/amp.js?v=20201005160202" line: 6606}]
[JavaScript Warning: "Autoplay is only allowed when approved by the user, the site is activated by the user, or media is muted." {file: "https://foxnewsplayer-a.akamaihd.net/player/9.0.12/akamai/amp/core/amp.js?v=20201005160202" line: 6606}]
[JavaScript Warning: "Autoplay is only allowed when approved by the user, the site is activated by the user, or media is muted." {file: "https://foxnewsplayer-a.akamaihd.net/player/9.0.12/akamai/amp/core/amp.js?v=20201005160202" line: 3053}]
[JavaScript Warning: "Autoplay is only allowed when approved by the user, the site is activated by the user, or media is muted." {file: "https://foxnewsplayer-a.akamaihd.net/player/9.0.12/akamai/amp/core/amp.js?v=20201005160202" line: 3053}]
[JavaScript Warning: "Autoplay is only allowed when approved by the user, the site is activated by the user, or media is muted." {file: "https://foxnewsplayer-a.akamaihd.net/player/9.0.12/akamai/amp/core/amp.js?v=20201005160202" line: 3053}]
[JavaScript Warning: "Autoplay is only allowed when approved by the user, the site is activated by the user, or media is muted." {file: "https://foxnewsplayer-a.akamaihd.net/player/9.0.12/akamai/amp/core/amp.js?v=20201005160202" line: 3053}]
[JavaScript Warning: "Autoplay is only allowed when approved by the user, the site is activated by the user, or media is muted." {file: "https://foxnewsplayer-a.akamaihd.net/player/9.0.12/akamai/amp/core/amp.js?v=20201005160202" line: 3053}]
[JavaScript Warning: "Autoplay is only allowed when approved by the user, the site is activated by the user, or media is muted." {file: "https://foxnewsplayer-a.akamaihd.net/player/9.0.12/akamai/amp/core/amp.js?v=20201005160202" line: 3060}]
[JavaScript Warning: "Autoplay is only allowed when approved by the user, the site is activated by the user, or media is muted." {file: "https://foxnewsplayer-a.akamaihd.net/player/9.0.12/akamai/amp/core/amp.js?v=20201005160202" line: 3060}]
[JavaScript Warning: "Autoplay is only allowed when approved by the user, the site is activated by the user, or media is muted." {file: "https://foxnewsplayer-a.akamaihd.net/player/9.0.12/akamai/amp/core/amp.js?v=20201005160202" line: 3060}]
[JavaScript Warning: "Autoplay is only allowed when approved by the user, the site is activated by the user, or media is muted." {file: "https://foxnewsplayer-a.akamaihd.net/player/9.0.12/akamai/amp/core/amp.js?v=20201005160202" line: 3060}]
[JavaScript Warning: "Autoplay is only allowed when approved by the user, the site is activated by the user, or media is muted." {file: "https://foxnewsplayer-a.akamaihd.net/player/9.0.12/akamai/amp/core/amp.js?v=20201005160202" line: 3060}]
[JavaScript Warning: "Autoplay is only allowed when approved by the user, the site is activated by the user, or media is muted." {file: "https://foxnewsplayer-a.akamaihd.net/player/9.0.12/akamai/amp/core/amp.js?v=20201005160202" line: 3060}]
[JavaScript Error: "uncaught exception: undefined"]
[JavaScript Error: "uncaught exception: undefined"]
[JavaScript Error: "uncaught exception: undefined"]
[JavaScript Error: "uncaught exception: undefined"]
[JavaScript Error: "uncaught exception: undefined"]
[JavaScript Warning: "Autoplay is only allowed when approved by the user, the site is activated by the user, or media is muted." {file: "https://foxnewsplayer-a.akamaihd.net/player/9.0.12/akamai/amp/core/amp.js?v=20201005160202" line: 3075}]

Might this Bug be related to Bug 1668433?

Sorry I'm just finding this bug. Can someone follow https://blog.paul.cx/post/profiling-firefox-media-workloads/#the-media-preset (see the video or the written instructions) from a firefox build where the problem can be reproduced and post the profile either here or privately via email at padenot@mozilla.com ?

I can't seem to reproduce here, but I'll try again tomorrow on more configurations.

The fact that this has SoundTouch all over it might be because of identical code folding and/or an artifact of symbol resolution, but a profile with the Firefox Profiler knows how to symbolicate all this a lot better.

Flags: needinfo?(mouser_musing)
Flags: needinfo?(hill-robert)
Flags: needinfo?(Ooquouquei864)

Paul Adenot, Thank you for your post. In my case, this problem is entirely unpredictable, and I have no way to reproduce it at will. However, the next time it does happen (maybe in a few days?) I will endeavor to record a profile and let you know.

Flags: needinfo?(hill-robert)

Paul Adenot, the problem finally recurred, and the profile is here: https://share.firefox.dev/3sHIKBY

  1. FWIW, the hardware is single-core/single-thread (Pentium M).

  2. Firefox has been my primary browser for nearly 13 years, but this is the first time I have ever recorded and uploaded a profile. Please do not hesitate to let me know if I have done anything wrong, and what I should do to record a better profile next time.

  3. According to Windows Sysinternals Process Explorer, the CPU hog was thread 2512 in process 3740.

  4. Here is a snapshot of the stack from thread 2512. The stack did however change every time I pressed the refresh button in the Process Explorer window "Stack for thread 2512", but soundtouch was always present somewhere in the stack:

ntkrnlpa.exe!KeSetEvent+0x2c1
ntkrnlpa.exe!RtlInitWeakEnumerationHashTable+0xea
nss3.dll!PR_UnblockClockInterrupts+0x23293
nss3.dll!PR_UnblockClockInterrupts+0x184f5
nss3.dll!sqlite3_step+0x129
xul.dll!v8::internal::RegExpParser::ParseClassEscape+0x22676
xul.dll!DumpCompleteHeap+0x1ecd6
xul.dll!DumpCompleteHeap+0x1fb28
xul.dll!XRE_GetBootstrap+0x322c6
xul.dll!XRE_GetBootstrap+0x31341
xul.dll!soundtouch::SoundTouch::numChannels+0x12d68f
xul.dll!XRE_GetBootstrap+0x3122f
xul.dll!v8::internal::RegExpBytecodeGenerator::length+0x5b67c
nss3.dll!PRP_TryLock+0xa0f
nss3.dll!PR_MD_UNLOCK+0x1086
ucrtbase.DLL!o____lc_collate_cp_func+0x4f
kernel32.dll!BaseThreadInitThunk+0x12
ntdll.dll!RtlInitializeExceptionChain+0xef
ntdll.dll!RtlInitializeExceptionChain+0xc2

Ok so this stack (and all stacks in this bug) are bogus, in the sense that soundtouch::SoundTouch::numChannels doesn't call anything, for a start, and DumpCompleteHeap isn't called by anything in Firefox (it's a utility function to call with a debugger attached, for diagnosing). I've tried to use the Process Explorer utility in the same way you do, and it's unable to resolve symbols properly unfortunately, I also see soundtouch references everywhere, and I'm 100% sure I'm not doing anything with SoundTouch while doing this. It also doesn't show thread names (that's always been pretty weird on Windows), so it's a bit unclear how to proceed.

Do you know how to attach the Visual Studio debugger to the process that is behaving strangely ? This could get us the thread name and then via the Mozilla Symbol Server, good stacks, and we could diagnose this in no time with this information.

Attaching Visual Studio to Firefox and following this: https://developer.mozilla.org/en-US/docs/Mozilla/Using_the_Mozilla_symbol_server, and pausing in the debugger when you reproduce the issue and then looking in the Threads window, finding the stacks for the thread id that is pegging the CPU would be immensely useful.

I'm sorry for the complex instructions, I'm trying to find another (simpler) way to diagnose this. I've tried the URLs in this bug, and I can't seem to reproduce at all.

Flags: needinfo?(hill-robert)

Paul Adenot, thank you for your post, and for your willingness to investigate this problem further.

Unfortunately I am not willing to install Visual Studio (and its requisite .NET) for the sake of this Bug. Firstly, installing VS and .NET and their updates would take a while, and secondly I keep my Windows 7 extremely lean - Control Panel "Programs and Features" contains Drivers, AntiVirus, Firefox and SumatraPDF - that's it. I don't like to install (and then maybe de-install) anything I don't absolutely need.

FWIW, the CPU hog only ever lasts for a few minutes, and then clears up by itself. On my single-core/single-thread hardware the CPU hog is very noticeable, but on multi-core/multi-thread hardware a CPU hog for a few minutes might not even be noticed, which might explain why not more users are following this Bug.

One crazy thought that has gone through my mind is that the problem might maybe even be caused by single-core/single-thread hardware. What if thread 1 goes into a deliberate very short CPU loop waiting for thread 2 to do something which never happens because thread 2 never gets control???

I apologize for not being more helpful.

Flags: needinfo?(hill-robert)

On dual core this also happen, I also can't install VS Code as my laptop has small SSD space, only 60gb partition for windows with 32gb used. I think could it be done on Procexp with dumping the process dll?

It's more or less the same issue, procexp.exe can't resolve the symbols correctly, and so whatever info it would give us would be unreliable to the point of being useless.

I've thought about this a bit more, there might be a way to diagnose this in a way that doesn't require installing external software, using the built-in profiler with a special mode. Here is a video that shows how to do it, but it's essentially the same: the only different is ticking the box labeled: "Bypass selections above and record all registered threads".

https://paul.cx/public/all-threads.webm

Un-ticking the box about screenshots will help keeping the size of the profile down. If it's smaller than 50MB, uploading it will work. Otherwise saving it locally is an option, and we can find a way to transfer it to me so I can investigate.

Paul Adenot, Thank you for your perseverance. I have adjusted my Profiler Settings as per your instructions. Since then, the problem has recurred, but Firefox crashed when I stopped the Profiler recording, about five seconds after starting it. FWIW, the Crash UUID is 51ee66c3-27ec-482f-ac84-8a4da0210511, which seems like Out-Of-Memory (2GB Win7 laptop), so next time (maybe in a few days?) I will stop the Profiler recording about one second (instead of five seconds) after starting it, in the hope of not running out of memory. Best, Bob.

Bob, in the video you see there is a "buffer size" slider and an "interval" slider. You can probably reduce the memory slider (maybe to 500MB?) and the interval slider (maybe 5ms?), this should help with machines with less memory.

Paul, Many thanks for your info - my profiler settings are now set to 5ms sampling interval and 512MB buffer size - hopefully the problem will recur in the next few days. Bob.

(In reply to Paul Adenot (:padenot) from comment #27)

Do you know how to attach the Visual Studio debugger to the process that is behaving strangely ? This could get us the thread name and then via the Mozilla Symbol Server, good stacks, and we could diagnose this in no time with this information.

Attaching Visual Studio to Firefox and following this: https://developer.mozilla.org/en-US/docs/Mozilla/Using_the_Mozilla_symbol_server, and pausing in the debugger when you reproduce the issue and then looking in the Threads window, finding the stacks for the thread id that is pegging the CPU would be immensely useful.

I installed Visual Studio and used it to collect some performance profiles. This seems to be happening in both the main process, and several tab content processes at the same time, and I collected profiles for both.

Though I'm not sure if I got the right thing, since I'm still pretty unfamiliar with this.

Flags: needinfo?(padenot)
Attached file VS profile - firefox.diagsession (deleted) —
Attached image Process Hacker - Firefox threads.png (deleted) —

You mentioned finding the stacks for the threads. I didn't find thread info, so not sure if that profile helps, but I can do more debugging and collect other info if I got the wrong thing. Just let me know what you need.

Also this issue seems to be getting worse recently, happening multiple times a day, but restarting Firefox tends to fix it for a while. I know Process Hacker is unreliable, but just wanted to mention that besides the "soundtouch" thread, I'm also seeing firefox.exe!TargetNtUnmapViewOfSection+0x9560 as high CPU. Version is FF 88.0.1 release if that matters.

Attached file Bob Hill's Profiler Settings (deleted) —

Paul, The problem finally recurred, and I tried to record a Profile with a duration of about two seconds, but Firefox crashed (like before, in Comment 31) with an OOM condition (Crash UUID a1d4ad7f-16fc-4dc1-b4b5-6dc9b0210522).

Please find attached a complete listing of my Profiler Settings in "bob-hill-about-profiling.pdf". Is there anything I could change in all these settings which would produce a Profile which might still be useful for you in diagnosing this bug, without OOM-crashing?

If not, it would seem that my single-core single-thread 1.85GHz Pentium M with 2GB RAM is insufficient for producing a useful Profile for you. That would be a shame. Bob.

Paul, P.S. When the problem recurred, I was not actually doing anything with Firefox. Firefox had only one window, with only one tab, which displayed a local static HTML file (5187KB, a Firefox "Export Bookmarks to HTML", which I use as my Home Page) on my hard drive. Firefox had however been running for several days (including numerous Win7 suspends/resumes), during which time I had visited and left hundreds of web sites. I only became aware of the recurrence because the fan sped up due to the high CPU usage. The only other active Win7 application was a Windows Command Prompt (doing nothing), so there seems to be no way more RAM could have been made available for the Profiler.

FWIW, this problem often (but not always) occurs when I am not actually doing anything with Firefox. So I wonder if some kind of automatic background Firefox activity (like maybe Global Garbage Collection, or Cycle Collection, or whatever) might be going berserk and hogging a CPU core (always only for a few minutes, never "never-ending"). Bob.

Bob, you should be able to go lower and lower in terms of sampling period / buffer size: your problem is big/long enough that sampling not as frequently is not really problematic: my expectation is a multi-second/minute pause will be caught by the profiler in any case.

I've relayed the OOM crash to various folks internally, we'll see what we can do, this is useful info to improve the profiler in any case.

Flags: needinfo?(padenot) → needinfo?(hill-robert)

mesvam, I had a look, and this is related to heavy IO performed by a website using a legacy API availsble on the web. I'm attaching a screenshot of your profile, symbolicated.

Attached image storage-spinning.png (deleted) —

(In reply to Paul Adenot (:padenot) from comment #41)

mesvam, I had a look, and this is related to heavy IO performed by a website using a legacy API availsble on the web. I'm attaching a screenshot of your profile, symbolicated.

I'm told by our storage folks that going to about:config, searching dom.storage.next_gen, and changing it to true (and restarting) might help, if this is a website behaving badly. Otherwise, maybe something is corrupted somehow, and going to about:support, and clicking the "Refresh" button on the right carefully reading what it preserves and doesn't preserve will help.

Flags: needinfo?(mesvam)

Paul, Thank you for your feedback. Please give me concrete recommendations for Profiler Sampling Interval (currently 5ms) and Buffer Size (currently 512 MiB) for my next attempt - you are surely a far better judge of these things than me! FWIW, in my normal usage, Windows Task Manager usually shows around 50% to 70% of my 2GB Physical Memory being in use. Bob.

Let's try 256MB, but we can lower to 128MB if we find that it still crashes.

2GB total with 50 to 70% use leaves 1GB to ~512MB left. With a bit of room for some of the profiler itself to function, 256 seem like a good compromise.

Paul, Thanks again. Buffer now set to 256MB; Interval unchanged at 5ms. If that OOMs, will set Buffer to 128MB, and still leave Interval at 5ms. Bob.

I am also affected by this

latest firefox release on windows 10

Paul, CPU hog has recurred twice, with the following results attempting to capture a Profile:
Sampling Interval 5ms and Buffer Size 256MB: OOM Crash UUID 53cb4ba9-c3cd-4c14-b96f-f339a0210526;
Sampling Interval 5ms and Buffer Size 128MB: OOM Crash UUID 169fc4e8-3015-4a7b-a409-8f52b0210528.
For the next attempt, I have increased Sampling Interval to 10ms and left Buffer Size unchanged at 128MB, unless you have any other concrete recommendations. Bob.

Paul, no further CPU hog recurrence yet, but another crazy idea: I had some bookmark problems today, and resorted to using "Verify Integrity" under "Places Database" in "about:support". "Verify Integrity" ran (ultimately successfully) for several minutes at 100% CPU (I have nearly 1200 bookmarks), which is outwardly the same symptom as my CPU hogs (which also only last a few minutes) in this bug.

Then I found this statement under "Places Database" in https://www.mozillafirefoxhelp.com/blog/ :
"Firefox periodically performs maintenance tasks on your bookmarks and history database (also known as Places database). Use the Verify Integrity button to perform those tasks on-demand."

Is it possible that my CPU hogs are actually Firefox doing periodic bookmark/history maintenance on my single-core/thread CPU? Bob.

Hi Everybody, my several-minute CPU hog recurred, but suspecting it might be a periodic Places Database Maintenance (see comment 51), I did not run the Profiler this time, but did confirm my suspicions by observing that:

  1. during the CPU hog, Windows Resource Monitor showed I/O activity in the Places Database (sqlite.places and sqlite.favicons in my Firefox user-profile), and:

  2. at the end of the CPU hog, about:config preference "places.database.lastMaintenance" had been updated to the current date and time.

So now, inspired by the circumvention described in 10-year-old bug 696560 comment 0, I have reduced my maximum history records as follows:

old: 124,633 (set automatically by Firefox in about:config preference "places.history.expiration.transient_current_max_pages");

new: 10,000 (set by me in newly-created about:config preference "places.history.expiration.max_pages", type "Number", value "10000").

Having kept all my bookmarks (currently 1,175), the total items in my Places Database has now been reduced from 125,808 to 11,175.

Hopefully my periodic Places Database Maintenance (and hence my CPU hogs) will now take seconds rather than minutes, at the cost of retaining only days or weeks of history (instead of months or years).

Flags: needinfo?(hill-robert)

Hm that's an interesting discovery, hopefully this is the root of the issue. I assume, considering the kind of machine this is, that your disk is a real hard disk (not SSD/NVMe) ?

Paul, your assumption is correct: Dell Latitude D610, 1.85 GHz single-core Pentium M, 2GB RAM, 93GB hard disk - Parallel ATA! Bob.

(In reply to Paul Adenot (:padenot) from comment #43)

I'm told by our storage folks that going to about:config, searching dom.storage.next_gen, and changing it to true (and restarting) might help, if this is a website behaving badly. Otherwise, maybe something is corrupted somehow, and going to about:support, and clicking the "Refresh" button on the right carefully reading what it preserves and doesn't preserve will help.

Thanks Paul for the insight. After trying this option for a while, dom.storage.next_gen = true seems to have fixed my issue.

However, I'm not clear, is this something that should be addressed by Firefox? or the website? Is dom.storage.next_gen meant to be the solution to this, or are websites hitting the storage improperly? I'm happy to consider my issue resolved, but if there's something in Firefox that still needs to be fixed, I can try to provide more info or find something reproducible for you guys.

Also, there seems to be a variety of unrelated issues with this same symptom, all erroneously showing soundTouch. I can split off into a different bug if that would help.

Flags: needinfo?(mesvam)

(In reply to mesvam from comment #55)

However, I'm not clear, is this something that should be addressed by Firefox? or the website? Is dom.storage.next_gen meant to be the solution to this, or are websites hitting the storage improperly? I'm happy to consider my issue resolved, but if there's something in Firefox that still needs to be fixed, I can try to provide more info or find something reproducible for you guys.

This is the new, improved, version of an API that is already available, that is being rolled out gradually, you're just trying it a bit earlier, but it would have been enabled for you at some point. Glad it fixes the issue though, that's very much one of the goal, to fix performance issues.

Also, there seems to be a variety of unrelated issues with this same symptom, all erroneously showing soundTouch. I can split off into a different bug if that would help.

It's always a good idea to file bugs with performance profiles, thanks if you do that. Generally, the Firefox Platform setting is a good bet, Graphics or Media can be better options if that's clearly a graphics or media issue.

(In reply to Bob Hill from comment #51)

Then I found this statement under "Places Database" in https://www.mozillafirefoxhelp.com/blog/ :
"Firefox periodically performs maintenance tasks on your bookmarks and history database (also known as Places database). Use the Verify Integrity button to perform those tasks on-demand."

Places may run weekly maintenance (once a week usually) when the user is idle, that in general should just last a few seconds (about 2.5s according to telemetry mean value), but it's possible this specific database had a data coherence problem causing some kind of query recursion.

Verify Integrity from about:Support is a much larger and complex process that may be a lot more expensive than normal maintenance, and that's acceptable because it's a once in a time task started by the user only when there's detected problems already. Verify integrity may indeed try to copy all the data to a new database if it detects a serious file corruption. That's not an indication of a performance problem, it's an explicit user started operation meant for emergencies.

Because some Places queries are still on the main-thread (UI ones mostly, since nobody rewrote the ui views yet), in certain cases they may block the main-thread until the async maintenance is complete.
So far we didn't get reports where it was necessary to reduce the amount of history artificially from the already existing limits. My suspect is that, if you had a perf problem with the defaults, may have been caused by a database corruption.
But I would be interested into getting reports for a db that already went through integrity check but it is still slow at the default settings.

Marco, Thank you for your very interesting feedback about db maintenance, db Verify Integrity, and possible db corruption.
As per comment 52, the Places db on my main PC is now small and fast, but I do have a motley collection of other Places db's on other computers, and also on old full-volume backups. Are you primarily interested in a "slow" db which successfully goes through Verify Integrity without any error messages (might be hard for me to find), or would a "slow" db which only goes through Verify Integrity with error messages (easier for me to find) also be of interest to you? Bob.

Marco, For example, one of my other PC's has a "slow" Places db, where Verify Integrity hogs a 1.8GHz CPU core for eight minutes, only does enough I/O to make the HDD-LED flicker occasionally, but does not (as far as I can tell) "copy all the data to a new database if it detects a serious file corruption". Verify Integrity on that PC produces these messages, all at once at the end of the eight minutes:

> Task: checkIntegrity
+ The places.sqlite database is sane
+ The favicons.sqlite database is sane
> Task: invalidateCaches
+ The caches have been invalidated
> Task: checkCoherence
+ The database is coherent
> Task: expire
+ Database cleaned up
> Task: originFrecencyStats
+ Recalculated origin frecency stats
> Task: vacuum
- Unable to vacuum database
> Task: stats
- The task queue was cleared by an error in another task.
> Task: _refreshUI
- The task queue was cleared by an error in another task.

I see exactly these messages quite often (but not always) doing Verify Integrity on my "slow" Places db's. More rarely, even a "slow" Verify Integrity produces a lot more messages, but where they all look "clean" (no hyphens in column 1).
Anyway, this is what I meant by "error messages" in my question at the end of comment 58. Bob.

Paul and Marco, I have observed several-minute CPU hogs in both "Weekly Places DB Maintenance" and "about:support Verify Integrity", which (as you say) are similar, but not the same (I found the details in PlacesDBUtils.jsm).

Nonetheless, I venture to suggest that if the CPU hog in Verify Integrity (which is far easier to reproduce) could be tracked down (and fixed?), then there is a reasonable chance that the CPU hog in Weekly Maint (which hurts users trying to use Firefox after leaving it idle just long enough for Weekly Maint to start) might also be addressed.

I now have a PC where the CPU hog can reliably be reproduced in Verify Integrity, and have even managed to capture a Profile of the CPU hog: https://share.firefox.dev/3widJWW. In Parent Process PID 4004 there is a thread "mozStorage #1", which seems to be hogging a CPU core (this PC is dual-core). Bob.

Hi Bob! While this CPU hog is happening, does the rest of your machine remain responsive?

Bas, on the two-core machine where I am now reproducing this issue, yes, the rest of the machine remains responsive, because Firefox only hogs one core. (Earlier I had this issue on a one-core machine, and then the rest of the machine was very sluggish, but not entirely locked out.) Bob.

Paul, Marco, I have found out that my CPU hogs in Verify-Integrity are mostly in the task "expire".

Modifying PlacesDBUtils.jsm to show a few timestamps resulted in this output from Verify-Integrity:

> Task: checkIntegrity
+ 13:20:59 First task started
+ 13:21:22 The places.sqlite database is sane
+ 13:21:23 The favicons.sqlite database is sane
> Task: invalidateCaches
+ 13:21:23 The caches have been invalidated
> Task: checkCoherence
+ 13:21:45 The database is coherent
> Task: expire
+ 13:28:45 Database cleaned up
> Task: originFrecencyStats
+ 13:28:45 Recalculated origin frecency stats
> Task: vacuum
- 13:28:45 Unable to vacuum database
> Task: stats
- 13:28:45 The task queue was cleared by an error in another task.
> Task: _refreshUI
- 13:28:45 The task queue was cleared by an error in another task.

According to the timestamps in this output:
7'46" = duration of the entire Verify-Integrity;
7'00" = duration of task "expire", during which one CPU core ran 100%, with very little I/O (HDD-LED hardly flickered).

https://share.firefox.dev/3gyglL1 is a new Profile, captured at 13:25, about halfway through task "expire" shown above.
(I captured the Profile at 13:25 according to my watch. The Profiler's timestamp 12:50 is incorrect - see Bug 1717004.)
In Parent Process PID 2416, thread "mozStorage #1" seems to be hogging a CPU core (Intel T7100 dual-core 1.80GHz). Bob.

Severity: normal → S3

Clear needinfos that are pending on inactive users.

Inactive users most likely will not respond; if the missing information is essential and cannot be collected another way, the bug maybe should be closed as INCOMPLETE.

For more information, please visit auto_nag documentation.

Flags: needinfo?(mouser_musing)
Flags: needinfo?(cobrafast)
Flags: needinfo?(Ooquouquei864)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: