Closed Bug 371879 Opened 18 years ago Closed 17 years ago

100% CPU usage after resume from hibernate or suspend --- now with clue

Categories

(Firefox :: General, defect)

x86
Windows XP
defect
Not set
major

Tracking

()

RESOLVED DUPLICATE of bug 376643

People

(Reporter: whorfin, Unassigned)

Details

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2 There are many, many posts and forum notices of firefox chewing CPU mysteriously. I've not, however, seen exactly this issue reported, nor anything as specific as the debug trail i'm reporting. Admins should of course feel free to reclassify this report should it turn out i missed something. Upon resume from suspend on a laptop, or resume from hibernate on a a laptop or desktop, i often see Firefox chew 100% CPU [literally; it's using one CPU on multiprocs]. After some amount of time, sometimes ten minutes or more, sometimes only a few minutes, eventually it simmers down and returns to normal, ie minimal CPU. This seems to be more likely if many firefox windows and tabs were active, and again more likely when a virtual desktop software is in usage [i favor Virtual Dimension]. It has happened on all laptops i have tried [three], and three desktop machines. This happens quite regularly, and even without touching firefox. in other words, the web not being actively browsed, firefox quiescent, suspend. upon resume, CPU is pegged for many minutes, but ultimately does resolve itself if one waits long enough. If one kills firefox and then resumes exactly as before [using session recovery], then CPU usage will be normal. this indicates that it it not malformed site content, ie javascript, as the CPU load is not evidenced on first time load, only on system hibernate/suspend/resume, and further ultimately settles down. This has been going on since i first started using firefox, certainly since 1.5 days. While i don't know exactly what's happening, recent frustration got me to delve into a debugger [on the no-symbols release version], so i have some information to contribute. Here's a snippet of the code i saw stuck in infinite loop in the debugger. 0055C24A . B8 395E9600 MOV EAX,firefox.00965E39 0055C24F . E8 AC6F4000 CALL firefox.00963200 0055C254 . 83EC 30 SUB ESP,30 0055C257 . 53 PUSH EBX 0055C258 . 8B5D 08 MOV EBX,DWORD PTR SS:[EBP+8] 0055C25B . 56 PUSH ESI 0055C25C . 57 PUSH EDI 0055C25D . 8B03 MOV EAX,DWORD PTR DS:[EBX] 0055C25F . 53 PUSH EBX 0055C260 . FF50 04 CALL NEAR DWORD PTR DS:[EAX+4] 0055C263 . 33FF XOR EDI,EDI 0055C265 . 8D45 E8 LEA EAX,DWORD PTR SS:[EBP-18] 0055C268 . 47 INC EDI 0055C269 . 33F6 XOR ESI,ESI 0055C26B . 897D EC MOV DWORD PTR SS:[EBP-14],EDI 0055C26E . C745 E0 1483A>MOV DWORD PTR SS:[EBP-20],firefox.00AF83>; ASCII "@mozilla.org/timer/manager;1" 0055C275 . 8945 E4 MOV DWORD PTR SS:[EBP-1C],EAX 0055C278 . 8975 08 MOV DWORD PTR SS:[EBP+8],ESI 0055C27B . 8D45 E0 LEA EAX,DWORD PTR SS:[EBP-20] 0055C27E . 68 D8D9A300 PUSH firefox.00A3D9D8 0055C283 . 50 PUSH EAX 0055C284 . 8D4D 08 LEA ECX,DWORD PTR SS:[EBP+8] 0055C287 . 8975 FC MOV DWORD PTR SS:[EBP-4],ESI 0055C28A . FF15 BC919600 CALL NEAR DWORD PTR DS:[<&xpcom_core.?as>; xpcom_co.?assign_from_gs_contractid_with_error@nsCOMPtr_base@@QAEXABVnsGetServiceByContractIDWithError@@ABUnsID@@@Z 0055C290 . F645 EB 80 TEST BYTE PTR SS:[EBP-15],80 0055C294 . 897D FC MOV DWORD PTR SS:[EBP-4],EDI 0055C297 . 74 08 JE SHORT firefox.0055C2A1 0055C299 . 8B75 E8 MOV ESI,DWORD PTR SS:[EBP-18] 0055C29C . E9 D3000000 JMP firefox.0055C374 0055C2A1 > 8B45 08 MOV EAX,DWORD PTR SS:[EBP+8] 0055C2A4 . 57 PUSH EDI 0055C2A5 . 50 PUSH EAX 0055C2A6 . 8B08 MOV ECX,DWORD PTR DS:[EAX] 0055C2A8 . FF51 10 CALL NEAR DWORD PTR DS:[ECX+10] 0055C2AB . 893D 1083AF00 MOV DWORD PTR DS:[AF8310],EDI 0055C2B1 > 8D45 C4 LEA EAX,DWORD PTR SS:[EBP-3C] ; top of spin loop 0055C2B4 . 56 PUSH ESI 0055C2B5 . 50 PUSH EAX 0055C2B6 . E8 22FFFFFF CALL firefox.0055C1DD 0055C2BB . 59 POP ECX 0055C2BC . 85C0 TEST EAX,EAX 0055C2BE . 59 POP ECX 0055C2BF . 75 7D JNZ SHORT firefox.0055C33E ; not taken 0055C2C1 . 57 PUSH EDI ; /RemoveMsg 0055C2C2 . 68 0A020000 PUSH 20A ; |MsgFilterMax = WM_MOUSEWHEEL 0055C2C7 . 68 00020000 PUSH 200 ; |MsgFilterMin = WM_MOUSEMOVE 0055C2CC . 8D45 C4 LEA EAX,DWORD PTR SS:[EBP-3C] ; | 0055C2CF . 56 PUSH ESI ; |hWnd 0055C2D0 . 50 PUSH EAX ; |pMsg 0055C2D1 . FF15 80FFB300 CALL NEAR DWORD PTR DS:[B3FF80] ; \PeekMessageW 0055C2D7 . 85C0 TEST EAX,EAX 0055C2D9 . 75 63 JNZ SHORT firefox.0055C33E ; not taken 0055C2DB . 57 PUSH EDI ; /RemoveMsg 0055C2DC . 56 PUSH ESI ; |MsgFilterMax 0055C2DD . 56 PUSH ESI ; |MsgFilterMin 0055C2DE . 8D45 C4 LEA EAX,DWORD PTR SS:[EBP-3C] ; | 0055C2E1 . 56 PUSH ESI ; |hWnd 0055C2E2 . 50 PUSH EAX ; |pMsg 0055C2E3 . FF15 80FFB300 CALL NEAR DWORD PTR DS:[B3FF80] ; \PeekMessageW 0055C2E9 . 85C0 TEST EAX,EAX 0055C2EB . 75 51 JNZ SHORT firefox.0055C33E ; taken 0055C2ED . 8B45 08 MOV EAX,DWORD PTR SS:[EBP+8] 0055C2F0 . 8D55 F0 LEA EDX,DWORD PTR SS:[EBP-10] 0055C2F3 . 52 PUSH EDX 0055C2F4 . 50 PUSH EAX 0055C2F5 . 8B08 MOV ECX,DWORD PTR DS:[EAX] 0055C2F7 . FF51 14 CALL NEAR DWORD PTR DS:[ECX+14] ; an xpcom_core.dll call 0055C2FA . 3975 F0 CMP DWORD PTR SS:[EBP-10],ESI 0055C2FD . 74 2F JE SHORT firefox.0055C32E 0055C2FF > 8B45 08 MOV EAX,DWORD PTR SS:[EBP+8] 0055C302 . 50 PUSH EAX 0055C303 . 8B08 MOV ECX,DWORD PTR DS:[EAX] 0055C305 . FF51 18 CALL NEAR DWORD PTR DS:[ECX+18] 0055C308 . 8B45 08 MOV EAX,DWORD PTR SS:[EBP+8] 0055C30B . 8D55 F0 LEA EDX,DWORD PTR SS:[EBP-10] 0055C30E . 52 PUSH EDX 0055C30F . 50 PUSH EAX 0055C310 . 8B08 MOV ECX,DWORD PTR DS:[EAX] 0055C312 . FF51 14 CALL NEAR DWORD PTR DS:[ECX+14] 0055C315 . 3975 F0 CMP DWORD PTR SS:[EBP-10],ESI 0055C318 . 74 48 JE SHORT firefox.0055C362 0055C31A . 56 PUSH ESI ; /RemoveMsg 0055C31B . 56 PUSH ESI ; |MsgFilterMax 0055C31C . 56 PUSH ESI ; |MsgFilterMin 0055C31D . 8D45 C4 LEA EAX,DWORD PTR SS:[EBP-3C] ; | 0055C320 . 56 PUSH ESI ; |hWnd 0055C321 . 50 PUSH EAX ; |pMsg 0055C322 . FF15 80FFB300 CALL NEAR DWORD PTR DS:[B3FF80] ; \PeekMessageW 0055C328 . 85C0 TEST EAX,EAX 0055C32A .^ 74 D3 JE SHORT firefox.0055C2FF 0055C32C . EB 34 JMP SHORT firefox.0055C362 0055C32E > 3935 1083AF00 CMP DWORD PTR DS:[AF8310],ESI 0055C334 . 74 35 JE SHORT firefox.0055C36B 0055C336 . FF15 6C779600 CALL NEAR DWORD PTR DS:[<&USER32.WaitMes>; [WaitMessage 0055C33C . EB 24 JMP SHORT firefox.0055C362 0055C33E > 33C0 XOR EAX,EAX ; notice how we just skipped the Wait; CPU is consumed unless we do that wait! 0055C340 . 837D C8 12 CMP DWORD PTR SS:[EBP-38],12 0055C344 . 0F95C0 SETNE AL 0055C347 . 3BC6 CMP EAX,ESI 0055C349 . 8945 EC MOV DWORD PTR SS:[EBP-14],EAX 0055C34C . 74 1D JE SHORT firefox.0055C36B ; not taken 0055C34E . 8D45 C4 LEA EAX,DWORD PTR SS:[EBP-3C] 0055C351 . 50 PUSH EAX ; /pMsg 0055C352 . FF15 CC769600 CALL NEAR DWORD PTR DS:[<&USER32.Transla>; \TranslateMessage 0055C358 . 8D45 C4 LEA EAX,DWORD PTR SS:[EBP-3C] 0055C35B . 50 PUSH EAX ; /pMsg 0055C35C . FF15 7CFFB300 CALL NEAR DWORD PTR DS:[B3FF7C] ; \DispatchMessageW 0055C362 > 3975 EC CMP DWORD PTR SS:[EBP-14],ESI 0055C365 .^ 0F85 46FFFFFF JNZ firefox.0055C2B1 ; to top of repeating spin loop 0055C36B > 8B03 MOV EAX,DWORD PTR DS:[EBX] 0055C36D . 53 PUSH EBX 0055C36E . FF50 08 CALL NEAR DWORD PTR DS:[EAX+8] 0055C371 . 8B75 CC MOV ESI,DWORD PTR SS:[EBP-34] 0055C374 > 834D FC FF OR DWORD PTR SS:[EBP-4],FFFFFFFF 0055C378 . 8D4D 08 LEA ECX,DWORD PTR SS:[EBP+8] 0055C37B . FF15 90929600 CALL NEAR DWORD PTR DS:[<&xpcom_core.??1>; xpcom_co.??1nsCOMPtr_base@@QAE@XZ 0055C381 . 8B4D F4 MOV ECX,DWORD PTR SS:[EBP-C] 0055C384 . 8BC6 MOV EAX,ESI 0055C386 . 5F POP EDI 0055C387 . 5E POP ESI 0055C388 . 5B POP EBX 0055C389 . 64:890D 00000>MOV DWORD PTR FS:[0],ECX 0055C390 . C9 LEAVE 0055C391 . C2 0400 RETN 4 What i observed when single-stepping was that this loop runs typically as commented [with taken/not taken for the jumps] when the CPU is pegged in the above-discussed symptoms. When things settle down, WaitMessage gets called, and CPU usage drops to normal. I _think_ [but do not know] that this code is from the main message handler loop of nsAppShell::Run from nsAppShell.cpp. similar code is in nsWinMain.cpp in webshell/tests/viewer It would appear that something must be going haywire with the idle timers. Is it possible something goes nuts when a timer firing or message delivery happens "way too late" because of the suspend/resume? Reproducible: Sometimes Steps to Reproduce: 1. Have firefox running with many open windows and tabs 2. Hibernate or suspend the machine 3. Wait an hour or more 4. Resume the machine Actual Results: Note that firefox is now pegging a CPU at 100% Eventually, firefox simmers back down to normal Expected Results: Machine should restore to state it was in before --- ie firefox quietly idle, not using 100% of a CPU! I classify this as Major since it renders a single-proc machine almost unusable. Because it's a tight spin-loop, performance is severely degraded in firefox itself even on multi-core machines.
Do you have the latest flash version installed?
(In reply to comment #1) > Do you have the latest flash version installed? Yes. the About Flash page reports: You have version 9,0,28,0 installed I had this problem with the previous version of flash, and flash v8 before that.
I've noticed this on SeaMonkey. I think the problem is that the page is using a setInterval and when you resume it runs every interval between when you suspended and the current time.
Right now, firefox is eating 100% of one of my cpus [ie it's doing this bug right now]. Disabling javascript from the options menu does not change this; javascript is disabled, but CPU usage is still pegged. For what it's worth...
I'd have to say i'm fairly convinced this isn't due to setInterval or javascript. I quit firefox [in a 100% cpu state] Restart, and resume session. After much loading, normal cpu idle. Disable javascript [and java] Suspend Resume 4.5 hours later CPU pegs, javascript [and java] still disabled. Ergo, can't possibly be javascript or setInterval related, right?
...perhaps i spoke too soon. approximately five minutes later, cpu settled down to normal. So, that's bad [there should be no cpu peg due to firefox just because i resumed], but that's better than the normal scenario, where it takes tens of minutes or hours to quiet down.
Do you maybe have an extension loaded that uses setInterval? I can reproduce this using a page with setInterval. And I don't see it otherwise. Btw, I'm using Linux. If you're sure this isn't the problem you're seeing, then I'll probably open a new bug for setInterval. :-)
Good point, a number of extensions i use have setInterval calls in them. AdBlockPlus, in particular, seems to reload prefs every 2 seconds via setInterval. This could explain the behavior i'm seeing even with javascript disabled. The current summary is --- with javascript active, firefox chews 100% cpu upon suspend/overnight/resume for from hours to days. With javascript disabled, firefox chews 100% cpu upon suspend/overnight/resume for between five and twenty minutes. The amount of time spent in this churn state does seem directly related [though perhaps not linearly] to exactly how long the machine was in suspended state. Maybe the thing to do is to open a bug for setInterval as you describe, and make my bug a dependent. When setInterval is fixed, I can test, and we'll see either that "it's fixed" or it isn't, and if it isn't, how the path through the code referenced above has changed [if at all].
Could you disable the extensions or try it with a new profile and see if it still happens?
I will try, it takes some time to go through suspend/resume cycles etc. I just disabled all extensions and shut down firefox, causing a crash: TB30867552Z Interesting - firefox was in its usual 100% CPU, and the symptoms were all windows had closed but CPU usage was still 100%. Eventually this crash... At present javascript and all extensions are disabled. I can't work like this but i'll leave the machine suspended and see what happens tomorrow.
OK. With all extensions AND javascript disabled, this bug does not show itself. With a single setInterval-using extension enabled [AdBlockPlus for instance, which sets a 2-second interval check], firefox exhibits the problem reported. So, I think Samuel Sieb has identified the problem --- "when you resume it runs every interval between when you suspended and the current time." This sure seems consistent with what I observe. Moreover, disabling javascript doesn't immediately turn-off this bogus servicing of zillions of pending intervals, if javascript had been active upon suspend/resume. The extent of how horrible this problem is depends on how many extensions one has which use setInterval, what those setIntervals are, how many currently active web pages and their setInterval usage [assuming javascript enabled] and how long a machine has been suspended before resume. As those factors can lead to absurdly long numbers of intervals to service, firefox becomes unusable.
Depends on: 376643
Logged Bug 376643, making this bug dependent
BTW, to test this you don't have to go through a full Hibernate/Resume cycle; For Windows you can use SysInternals PsSuspend: http://www.microsoft.com/technet/sysinternals/utilities/pssuspend.mspx ...and for Unix you can send STOP and later CONT signals to the process. Hope this helps simplify testing
Ah sorry, I now checked the other bug and see you're already aware.
Bug 376643 was just fixed.
Status: UNCONFIRMED → RESOLVED
Closed: 17 years ago
No longer depends on: 376643
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.