Closed Bug 1295217 Opened 8 years ago Closed 7 years ago

frequent jank/hang/pause in Firefox UI during nsCocoaWindow::setTitle/SetSizeConstraints

Categories

(Core :: Widget: Cocoa, defect, P2)

51 Branch
x86
macOS
defect

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox50 --- unaffected
firefox51 --- wontfix
firefox52 --- wontfix
firefox53 --- fix-optional

People

(Reporter: myk, Unassigned)

References

Details

(Keywords: regression, Whiteboard: tpi:+, gfx-noted)

I've been seeing frequent jank/hang/pause in the Firefox UI on Nightly builds for the last few weeks. This morning I installed the Gecko profiler and generated some profiles.

https://cleopatra.io/#report=bec536839e9660d1122da6feeadf3ea42e7d2d72&selection=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54

https://cleopatra.io/#report=dca66657ec5b6cfaa15b2a79629162750f5f2134&selection=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,20,21,22,23,24,25,26,128,129,130,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,19

https://cleopatra.io/#report=1a95da3c49ce24688e44920effa4a95bed729d63&selection=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50

I'm unfamiliar with the Gecko profiler, so I may be misreading the profiles, but all of them report that most of the time is being spent in mach_msg_trap, and in all of the profiles, either most or half of the time the mach_msg_trap caller is CGSSetWindowCornerMask.

Two of the three profiles get there from nsCocoaWindow::setTitle, which calls [NSWindow _doSetTitle:andDefeatWrap:] in AppKit, which then proceeds through a series of calls to CGSSetWindowCornerMask.

One of the three profiles gets there from nsCocoaWindow::SetSizeConstraints, which calls [NSWindow _commonMinMaxSizeChanged], which then proceeds through a series of calls to CGSSetWindowCornerMask.

Since both of those call paths go through nsCocoaWindow, I'm optimistically filing this in the Widget: Cocoa component. But please feel free to redirect it to a more appropriate component!

More info about my environment, in case it matters: I have two browser windows, both fullscreen, each one on its own Mac OS X space.

The first window has six pinned tabs (keep.google.com, calendar.google.com, web.telegram.org, web.whatsapp.org, www.messenger.com, hangouts.google.com). The second window has five pinned tabs (keep.google.com, calendar.google.com, www.irccloud.com, and two slack.com subdomains). Both of them also occasionally host other tabs, but the jank doesn't seem to depend on them. It occurs even when only the pinned tabs are open.
I'm also seeing this happen on Win10. Youtube in particular seems to trigger it.
(In reply to timbugzilla from comment #1)
> I'm also seeing this happen on Win10. Youtube in particular seems to trigger
> it.

Hmm, I suspect that's a different issue, since this bug appears to be specific to Mac (based on the profiles I ran). There's a discussion in the support forum about a hang with YouTube in fullscreen, and you might be able to resolve your issue via the suggestions in that discussion: https://support.mozilla.org/en-US/questions/979557
spohl, any ideas here? marking P2 for now since we only have one user reporting the issue. a regression range would be helpful.
Flags: needinfo?(spohl.mozilla.bugs)
Priority: -- → P2
Whiteboard: tpi:+
(In reply to Jim Mathies [:jimm] from comment #3)
> marking P2 for now since we only have one user reporting the issue. a
> regression range would be helpful.

I've bisected it down to https://hg.mozilla.org/mozilla-central/rev/bfc47d8a87ef from bug 1230641.
Blocks: 1230641
Keywords: regression
This bug frightens me. Myk, what is your macOS version?
(In reply to Markus Stange [:mstange] from comment #5)
> This bug frightens me. Myk, what is your macOS version?

I'm running OS X El Capitan version 10.11.6 (15G31) on a MacBook Pro (Retina, 15-inch, Mid 2015).
Matt, seems like fallout from bug 1230641?
Flags: needinfo?(matt.woodrow)
Whiteboard: tpi:+ → tpi:+, gfx-noted
Clearing n-i based on comment 4 and comment 7.
Flags: needinfo?(spohl.mozilla.bugs)
Version: unspecified → 51 Branch
Myk, just to make sure, is it reproducible on Firefox 51 beta/52 aurora too ?
Flags: needinfo?(myk)
(In reply to Astley Chen [:astley] UTC+8 from comment #9)
> Myk, just to make sure, is it reproducible on Firefox 51 beta/52 aurora too ?

Unfortunately, I'm having trouble reproducing the bug on those versions, as Beta's chrome process hangs indefinitely shortly after startup (the OS reports that the application has "stopped responding"), while Aurora's content process hangs (each tab's browser pane displays a throbber that throbs indefinitely).

I've tried to reproduce in a new profile by loading IRCCloud in a pinned tab, as I suspect that the behavior is related to pinned tabs that frequently update their titles. But so far I haven't succeeded. I'll try recreating the session more extensively next.

(Leaving the needinfo request to remind me to do this.)
After reproducing my tabset this morning in a separate profile, I reproduced the bug this afternoon. Or rather, I might have reproduced it. I'm experiencing identical symptoms, and my profiles all end up blocking in mach_msg_trap like before; but their stacks look different. Here's an example: https://clptr.io/2hZ1goO
Flags: needinfo?(myk)
Does setting media.video-queue.hw-accel-size to 3 make any difference?
Flags: needinfo?(myk)
(In reply to Myk Melez [:myk] [@mykmelez] from comment #11)
> After reproducing my tabset this morning in a separate profile, I reproduced
> the bug this afternoon.

Erm, I meant to say: I reproduced the bug this afternoon *on Aurora* (hence partly answering the question about whether this is reproducible on Beta/Aurora).

> Or rather, I might have reproduced it. I'm
> experiencing identical symptoms, and my profiles all end up blocking in
> mach_msg_trap like before; but their stacks look different.

My stacks now look different on Nightly as well, f.e. this Nightly stack looks like the one I saw on Aurora last week: https://clptr.io/2i9QPu0
(In reply to Milan Sreckovic [:milan] from comment #12)
> Does setting media.video-queue.hw-accel-size to 3 make any difference?

No such luck, I'm afraid. The stack I just referenced in comment 13 is from a Nightly build with that preference set to 3 (after which I restarted Nightly).
Flags: needinfo?(myk)
If the regression is from bug 1230641, then it's likely the changes to widget/cocoa/nsChildView.mm that cause these symptoms. That code will execute regardless of the video playback preferences. Interestingly, that code was added to fix excessive time in mach_msg_trap in bug 1230641 comment 48 for the video playback case.

We may be too aggressive when resetting the opacity value:

  bool isFullscreen = (styleMask & NSFullScreenWindowMask) || !(styleMask & NSTitledWindowMask);

Matt: why is the code after the || also setting fullscreen and flipping opacity? Can we get away without it?
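The condition quoted above can be evaluated in isolation to see which windows it treats as fullscreen. This is only a minimal sketch, not the nsChildView.mm code: `kTitledWindowMask` and `kFullScreenWindowMask` are stand-in constants that assume AppKit's bit values for NSTitledWindowMask (1 << 0) and NSFullScreenWindowMask (1 << 14), and `TreatAsFullscreen` is a hypothetical name.

```cpp
#include <cstdint>

// Stand-in constants (assumption: same bit positions as AppKit's
// NSTitledWindowMask and NSFullScreenWindowMask).
constexpr std::uint32_t kTitledWindowMask = 1u << 0;
constexpr std::uint32_t kFullScreenWindowMask = 1u << 14;

// Hypothetical helper mirroring the quoted expression: a window counts as
// "fullscreen" if the fullscreen bit is set OR the titled bit is missing.
constexpr bool TreatAsFullscreen(std::uint32_t styleMask) {
  return (styleMask & kFullScreenWindowMask) || !(styleMask & kTitledWindowMask);
}

// An ordinary titled window is not treated as fullscreen.
static_assert(!TreatAsFullscreen(kTitledWindowMask), "titled window");
// A titled window that entered fullscreen is.
static_assert(TreatAsFullscreen(kTitledWindowMask | kFullScreenWindowMask),
              "fullscreen window");
// A window with no title bar also takes the fullscreen path; this is the
// part of the condition after the || that the comment asks about.
static_assert(TreatAsFullscreen(0), "untitled window");
```

Under these assumptions, the second half of the condition only changes the result for windows without a title bar, so it should not affect a regular titled window entering or leaving fullscreen.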
Looks like Xidorn added that check in bug 1105939, maybe he remembers more.

I don't think it matters though, the idea is that when we're fullscreen (or just not drawing a titlebar) then we stop masking out the rounded corners and we mark our GL context as being opaque (since we no longer need opacity).

It sounds like Myk's windows are fullscreen, so it would appear that making the GL context opaque causes CGSSetWindowCornerMask to hang/pause.

It's not obvious why that would happen, or even why cocoa is calling that function at all (it comes from a call to _NSSpaceIsVisible? weird side effect).

Does anyone know enough about Cocoa to know what this call is doing and how we might avoid it?
Flags: needinfo?(matt.woodrow) → needinfo?(xidorn+moz)
(In reply to Matt Woodrow (:mattwoodrow) from comment #16)
> It sounds like Myk's windows are fullscreen, so it would appear that making
> the GL context opaque causes CGSSetWindowCornerMask to hang/pause.

Yes, this only happens when my two windows are fullscreen.
(In reply to Matt Woodrow (:mattwoodrow) from comment #16)
> Looks like Xidorn added that check in bug 1105939, maybe he remembers more.
>
> I don't think it matters though, the idea is that when we're fullscreen (or
> just not drawing a titlebar) then we stop masking out the rounded corners
> and we mark our GL context as being opaque (since we no longer need opacity).
>
> It sounds like Myk's windows are fullscreen, so it would appear that making
> the GL context opaque causes CGSSetWindowCornerMask to hang/pause.
>
> It's not obvious why that would happen, or even why cocoa is calling that
> function at all (it comes from a call to _NSSpaceIsVisible? weird side
> effect).
>
> Does anyone know enough about Cocoa to know what this call is doing and how
> we might avoid it?

Maybe Markus?
Flags: needinfo?(mstange)
(In reply to Matt Woodrow (:mattwoodrow) from comment #16)
> Looks like Xidorn added that check in bug 1105939, maybe he remembers more.
>
> I don't think it matters though, the idea is that when we're fullscreen (or
> just not drawing a titlebar) then we stop masking out the rounded corners
> and we mark our GL context as being opaque (since we no longer need opacity).

That's right. This shouldn't matter, unless the state can switch back and forth automatically without user action, which shouldn't happen.
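One way to check whether the opacity state really does switch back and forth without user action would be to route every request through a guard that counts real state changes and ignores redundant ones. The following is a rough sketch under that assumption only; `OpacityGuard` and its methods are hypothetical and are not part of Gecko or of the patch from bug 1230641.

```cpp
#include <cassert>

// Hypothetical helper, not Gecko code: remembers the last opacity value that
// was applied and reports whether a new request is a real state change.
class OpacityGuard {
 public:
  // Returns true when the caller should perform the (potentially expensive)
  // update: on the first call, or when the value differs from the last one.
  bool ShouldApply(bool aOpaque) {
    if (mHasValue && mLastOpaque == aOpaque) {
      return false;  // redundant request, nothing to do
    }
    mHasValue = true;
    mLastOpaque = aOpaque;
    ++mApplyCount;
    return true;
  }

  // Number of real state changes seen so far. A count that keeps growing
  // without user action would point at the flip-flopping described above.
  int ApplyCount() const { return mApplyCount; }

 private:
  bool mHasValue = false;
  bool mLastOpaque = false;
  int mApplyCount = 0;
};

// Small usage demonstration of the guard.
inline void DemoOpacityGuard() {
  OpacityGuard guard;
  assert(guard.ShouldApply(true));   // first request is always applied
  assert(!guard.ShouldApply(true));  // same value again is skipped
  assert(guard.ShouldApply(false));  // a real change is applied
  assert(guard.ApplyCount() == 2);
}
```

Logging `ApplyCount()` while the windows sit idle in fullscreen would show whether the state is stable, as expected, or toggling.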
Flags: needinfo?(xidorn+moz)
(In reply to Andrew Overholt [:overholt] from comment #18)
> > Does anyone know enough about Cocoa to know what this call is doing and how
> > we might avoid it?
>
> Maybe Markus?

Unfortunately not, no :(

It would be nice to know if updating to 10.12 fixes this, but if Myk upgrades and that fixes it, then we've lost the only machine that reproduces this bug (that we know of).

Myk, can you profile the WindowServer process with Instruments when this happens and attach the profile?
Flags: needinfo?(mstange) → needinfo?(myk)
(In reply to Markus Stange [:mstange] from comment #20)
> Myk, can you profile the WindowServer process with Instruments when this
> happens and attach the profile?

Instruments doesn't list WindowServer in its lists of Applications and Running Processes, but I can sample "all processes," which includes WindowServer. Here's a profile that used the Time Profiler template and contains several short runs during which I experienced pauses.

The profile is too large to attach to this bug, even after compression, so I've uploaded it to people-mozilla.org: https://people-mozilla.org/~myk/Instruments3.trace.tbz2
Flags: needinfo?(myk)
Flags: needinfo?(mstange)
Since the scenario occurred with two fullscreen windows and it's too late for 51, I would say it's not a blocking issue. Marking 51 wontfix.
Thanks Myk. Unfortunately I wasn't able to find any useful information in the profile. Can you try to get another profile with "Record Waiting Threads" and "Callstacks: User & Kernel" checked?
Flags: needinfo?(mstange) → needinfo?(myk)
(In reply to Markus Stange [:mstange] from comment #23)
> Thanks Myk. Unfortunately I wasn't able to find any useful information in
> the profile. Can you try to get another profile with "Record Waiting
> Threads" and "Callstacks: User & Kernel" checked?

Yes, here's such a profile: https://people-mozilla.org/~myk/Instruments4.trace.tbz2
Flags: needinfo?(myk)
Thanks! I was able to find the hang in there. Firefox is waiting for the WindowServer, and the WindowServer is blocked in IOAccelFlushSurfaceOnFramebuffers. We're not completely sure what that means, but it's probably waiting for the GPU hardware. E.g. it could be waiting for a GPU switch. Or your GPU has gone bad somehow - it could be a hardware problem. It's very hard to say why our change to use an opaque GLContext triggered this.
Is this still reproducing with current Nightlies? We switched to the 10.11 SDK, which may have had an effect on this.
Flags: needinfo?(myk)
Hmm, I can't reproduce it with the latest Nightly. Note, however, that I replaced my Mac last month, upgrading to macOS 10.12 in the process. And I no longer have the old Mac, so I can't test there.
Flags: needinfo?(myk)
Thanks. Let's close for now until we have a way to reproduce this again.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE