Closed Bug 550392 Opened 15 years ago Closed 14 years ago

"Assertion failure in -[NSNextStepFrame lockFocus]" (NSInternalInconsistencyException)

Categories

(Core :: Widget: Cocoa, defect)

x86
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- -
status1.9.2 --- .17-fixed
status1.9.1 --- unaffected

People

(Reporter: jruderman, Assigned: smichaud)

References

Details

(Keywords: assertion, crash, testcase, Whiteboard: [sg:dos])

Attachments

(3 files, 4 obsolete files)

2010-03-04 19:51:24.453 firefox-bin[35063:613] *** Assertion failure in -[NSNextStepFrame lockFocus], /SourceCache/AppKit/AppKit-1038.25/AppKit.subproj/NSView.m:5237 2010-03-04 19:51:24.456 firefox-bin[35063:613] Mozilla has caught an Obj-C exception [NSInternalInconsistencyException: -[NSNextStepFrame(0x1ec5d9b0) lockFocus] failed with window=0x1ec5d6c0, windowNumber=1220, [self isHiddenOrHasHiddenAncestor]=0] 2010-03-04 19:51:24.457 firefox-bin[35063:613] Generating stack trace for Obj-C exception... 2010-03-04 19:51:27.970 firefox-bin[35063:613] Stack trace: Looking up symbols in process 35063 named: firefox-bin __raiseError (in CoreFoundation) + 381 objc_exception_throw (in libobjc.A.dylib) + 56 +[NSException raise:format:arguments:] (in CoreFoundation) + 136 -[NSAssertionHandler handleFailureInMethod:object:file:lineNumber:description:] (in Foundation) + 116 -[NSView lockFocus] (in AppKit) + 280 -[NSView _displayRectIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView:] (in AppKit) + 2917 -[NSView displayIfNeeded] (in AppKit) + 818 -[NSNextStepFrame displayIfNeeded] (in AppKit) + 98 -[NSWindow displayIfNeeded] (in AppKit) + 204 _handleWindowNeedsDisplay (in AppKit) + 696 __CFRunLoopDoObservers (in CoreFoundation) + 1186 __CFRunLoopRun (in CoreFoundation) + 557 CFRunLoopRunSpecific (in CoreFoundation) + 452 CFRunLoopRunInMode (in CoreFoundation) + 97 RunCurrentEventLoopInMode (in HIToolbox) + 392 ReceiveNextEventCommon (in HIToolbox) + 158 BlockUntilNextEventMatchingListInMode (in HIToolbox) + 81 _DPSNextEvent (in AppKit) + 847 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] (in AppKit) + 156 -[NSApplication run] (in AppKit) + 821 nsAppShell::Run() (in XUL) (nsAppShell.mm:863) nsAppStartup::Run() (in XUL) (nsAppStartup.cpp:183) XRE_main (in XUL) (nsAppRunner.cpp:3494) main (in firefox-bin) (nsBrowserApp.cpp:158) start (in firefox-bin) + 54
In case it's helpful, I have a site that crashes Firefox with this same error with 100% reliability. My config: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 Site: http://www.geniustools.net/Products/search.aspx If you just click on the pulldown menu under the text "WRENCHES & TORQUE WRENCHES" (about 2/3 down in left column), Firefox will crash immediately and ignominiously.
blocking2.0: --- → ?
Keywords: crash
(In reply to comment #1) > In case it's helpful, I have a site that crashes Firefox with this same error > with 100% reliability. Somewhat, but not entirely, unsurprisingly, it doesn't crash or hit an exception in (1.9.2-based) Camino, presumably since we use our own <select> implementation. However, that URL does also crash Safari 4.0.5 on 10.5.8, but at least Safari is able to generate a crash report! (no exceptions with Safari, though).
This could be an Apple bug. Sounds like the problem started with Leopard. http://www.cocoabuilder.com/archive/cocoa/206587-assertion-failure-in-nsnextstepframe-lockfocus.html http://lists.apple.com/archives/Cocoa-dev/2007/Oct/msg01672.html Whatever it turns out to be, though, I'll probably need to figure out a way to work around it.
Blocking 1.9.3+ for now.
Assignee: nobody → smichaud
blocking2.0: ? → final+
http://tinderbox.mozilla.org/showlog.cgi?log=Electrolysis/1282056626.1282057018.14741.gz Rev3 MacOSX Leopard 10.5.8 electrolysis opt test mochitests-5/5 on 2010/08/17 07:50:26 s: talos-r3-leopard-006 10164 INFO TEST-START | /tests/toolkit/content/tests/widgets/test_popupincontent.xul 2010-08-17 07:56:34.462 firefox-bin[343:10b] *** Assertion failure in -[NSNextStepFrame lockFocus], /SourceCache/AppKit/AppKit-949.54/AppKit.subproj/NSView.m:4755 2010-08-17 07:56:34.462 firefox-bin[343:10b] Mozilla has caught an Obj-C exception [NSInternalInconsistencyException: -[NSNextStepFrame(0x24ec6c10) lockFocus] failed with window=0x243028c0, windowNumber=793, [self isHiddenOrHasHiddenAncestor]=0] INFO | automation.py | Application ran for: 0:03:11.787227
Blocks: 438871
Whiteboard: [orange]
Blocks: 599143
Orange is bug 599143. I'm not sure it's the same bug.
No longer blocks: 438871
Whiteboard: [orange]
(In reply to comment #1) > Site: http://www.geniustools.net/Products/search.aspx > > If you just click on the pulldown menu under the text "WRENCHES & TORQUE > WRENCHES" (about 2/3 down in left column), Firefox will crash immediately and > ignominiously. I can't reproduce this.
Until we have a better idea of what is going on here and some more evidence that this crash strikes frequently enough to be a top crasher I don't think we should block.
blocking2.0: final+ → -
Bug 601897 (a dup of this bug) has much better STR.
The STR from bug 601897 make me crash at nsCocoaWindow::Show(): bp-41ee32a4-39d4-4137-a0c0-6073f2101006 But there are only 30 of these crashes in the last week -- so it's nowhere near a topcrasher.
All of the (79) crashes over the last 3 weeks are on either OS X 10.5.X or 10.6.X.
This is definitely a top crasher. This only started since that latest update of Mozilla. I will have to make a work around to avoid the browser crash. To replicate, go to http://sbgi.tapdev.co.uk/events/content/179/utility_street_works and select a drop down menu.
(In reply to comment #14) > To replicate, go to > http://sbgi.tapdev.co.uk/events/content/179/utility_street_works and select a > drop down menu. Awesome, that reproduced on the first try for me. Thanks!
I am also experiencing this precise crash on a page I developed. One thing it has in common with the test case above is the use of jQuery 1.4.2. The original report I received from a user was for FF 3.6.10 under Windows 7. There, the select list would simply fail to drop down. I am able to reproduce the actual crash under Mac OS X 10.5.8; sometimes it crashes, sometimes the select list drops down briefly and then disappears. At this point, I'd really like some suggestions about what I might be able to do as a workaround. Turning off jQuery does solve it, but that's not an option that would fly with the bosses :-).
For what it's worth, I was able to avoid the crash by changing the CSS of one of the containing DIVs to "display: inline-block", where it had been "inline". This doesn't seem to apply to the SBGI site. I tried changing a few things, and couldn't find anything analogous to prevent the crash. So the fact that I was able to prevent it this way in my site may be a coincidence.
Any progress on this?
Bug 625843 has close to 100% effective STR for this "crash".
Many (most?) of this bug's "crashes" don't bring up Breakpad, and so aren't included in the statistics at http://crash-stats.mozilla.com. So what I said in comment #30 about this not being a topcrasher is misleading -- we actually have no way of knowing whether or not its a topcrasher. So I'm going to try to find time next week for this bug, and see whether or not it has an easy fix/workaround. If it does, this bug should probably block the FF4 final release.
> So what I said in comment #30 about this not being a topcrasher is > misleading -- we actually have no way of knowing whether or not its a > topcrasher. It was comment #12.
This bug is our fault (not Apple's): On the trunk it's a regression caused by the combination of the patch for bug 517804 (http://hg.mozilla.org/mozilla-central/rev/3cdaf0a84414) and part of the patch for bug 498340 (http://hg.mozilla.org/mozilla-central/rev/ea520d552f8c). The problem is that the NS_WILL_PAINT event sent from [ChildView viewWillDraw] can cause our ChildView object's NSWindow to be closed: [NSView viewWillDraw] is called on an NSView object that the OS has set up for drawing (including already having successfully called [NSView canDraw] on it). So no matter what we do in [ChildView viewWillDraw], the OS *will* try to draw it (or at least its NSWindow) after this method has returned. But if we close our NSWindow in [ChildView viewWillDraw], every NSView in it will become undrawable. Then [NSView lockFocus] will throw an exception when (subsequently) it calls [NSView lockFocusIfCanDraw] on one of our NSWindow's NSViews and this call fails (returns FALSE). There are two possible solutions to this problem: 1) Stop cross-platform code from deleting our widget (our ChildView object's mGeckoChild) while handling the NS_WILL_PAINT event, which in turn stops our ChildView's NSWindow from being closed. 2) Hook an undocumented NSView method (_displayRectIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView:) to stop the OS from trying to draw an NSView that (since the last, successful, call to canDraw) no longer has an NSWindow, or whose NSWindow has since been closed. Clearly #1 is preferable. But if (for some reason) #1 is difficult (or difficult to do quickly), we can safely fall back to option #2. I'll post patches for both these options in later comments. The regression range for this bug on the trunk is: firefox-2009-11-15-03-mozilla-central firefox-2009-11-16-04-mozilla-central The patch for bug 517804 landed (on the trunk) on 2009-11-14 at 01:41:34 PST. The relevant part of the patch for bug 498340 landed at 2009-11-15 at 14:59:38 PST. I used 'hg bisect' to confirm that this part of the patch for bug 498340 is what triggers this bug. The regression range for this bug on the 1.9.2 branch is: firefox-2009-11-18-03-mozilla-1.9.2 firefox-2009-11-19-03-mozilla-1.9.2 The patch for bug 517804 landed on the 1.9.2 branch on 2009-11-18 at 05:17:19 PST. This bug doesn't happen on the earlier branches (see bug 625843 comment #6).
Blocks: 517804, 498340
Here's a stack trace of our ChildView's NSWindow being closed while sending an NS_WILL_DRAW event to Gecko from [ChildView viewWillDraw]. I made it using my debug patch, which I can post if people would like to see it. I admit that this stack trace puzzles me -- it seems to be missing a few "levels". But it does clearly show the ChildView's PopupWindow being closed (and its widget and parent widget being deleted).
Attached patch First stab at a Gecko patch (obsolete) (deleted) — Splinter Review
This patch sort-of works. It gets rid of this bug's crashes, but doesn't fully resolve the underlying problem -- the ChildView's NSWindow still gets closed (and its widget deleted) in [ChildView viewWillDraw] whenever a browser window is closed (though no crash or OS assertion results). I also don't really understand how/why it works (to the extent that it does work). Roc, can you help me find a better solution?
Attached patch Cocoa patch (obsolete) (deleted) — Splinter Review
How about in viewWillDraw, if the refcount of the window is 1 before we release the reference held by the nsPaintEvent, add a reference and dispatch an XPCOM event to release the last reference?
Attached patch Cocoa patch rev1 (obsolete) (deleted) — Splinter Review
> How about in viewWillDraw, if the refcount of the window is 1 before > we release the reference held by the nsPaintEvent, add a reference > and dispatch an XPCOM event to release the last reference? I don't fully understand this -- retaining an NSWindow won't prevent it being closed. But I get your drift: You don't think it's worthwhile trying to prevent Gecko from deleting the current widget (or its parents) while processing an NS_WILL_DRAW event in the general case (in cross-platform code). So here's another Cocoa-specific patch/workaround, which is much simpler than my previous Cocoa patch (attachment 506017 [details] [diff] [review]). I can't use the traditional "kung-fu death grip" to retain mGeckoChild or its parent(s) -- they need to be retained until after [ChildView viewWillDraw] returns. And it turns out I don't need to retain mGeckoChild -- only its parent(s). But Cocoa provides a very straightforward way to postpone releasing the parent widget(s) until the next time through the main thread's run loop, which is all we need. This patch works fine in all my tests. I don't know why I didn't think of it last week.
Attachment #506015 - Attachment is obsolete: true
Attachment #506017 - Attachment is obsolete: true
Attachment #506560 - Flags: review?(roc)
Attachment #506560 - Flags: review?(joshmoz)
This patch is OK but I think it would be simpler if you just used a heap-allocated nsTArray<nsRefPtr<nsIWidget> >. Probably less leaky too --- what deletes aWidgetArray? I guess you'd need to use an XPCOM event instead of performWithSelector in that case.
In order to use performSelector:withObject:afterDelay:0 I need to use an NSArray (a Cocoa object that can be retained and released using [NSObject retain] and [NSObject release]). This NSArray object gets retained by the call to performSelector:withObject:afterDelay:0 and then released after the delayed call to releaseWidgets: has returned. This is very simple and straightforward (simpler than using an XPCOM event), and won't result in any leaks. Furthermore, if I did use an XPCOM event it might get run "too early" (before the next time through the main thread's run loop), thanks to nesting between the "native" and "Gecko" event loops.
Comment on attachment 506560 [details] [diff] [review] Cocoa patch rev1 Oops, this patch might lead to releaseWidgets: messages being sent to a deleted ChildView object. I'll change it to also retain and release mGeckoChild, which will fix the problem.
Attachment #506560 - Flags: review?(joshmoz)
Comment on attachment 506560 [details] [diff] [review] Cocoa patch rev1 + if (parent) + widgetArray = [NSMutableArray arrayWithCapacity:2]; I prefer brackets around all "if" blocks.
> I prefer brackets around all "if" blocks. Also fixed in this patch.
Attachment #506560 - Attachment is obsolete: true
Attachment #506746 - Flags: review?(joshmoz)
Comment on attachment 506746 [details] [diff] [review] Cocoa patch rev2 (also retain and release mGeckoChild) Are you sure you don't want to addref "mGeckoChild" unless it has a parent?
> Are you sure you don't want to addref "mGeckoChild" unless it has a > parent? I'm not sure I understand. Right now I'm only addrefing mGeckoChild if I send a (delayed) releaseWidgets: message, and I only send a message if mGeckoChild has a parent. As best I can tell, we don't have trouble if mGeckoChild is deleted during viewWillDraw and it *doesn't* have a parent. I didn't see any problems in my tests. And I think _displayRectIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView: is only ever called on top-level views (like NSGrayFrame or NSNextStepFrame). So when one of the [NSView _recursiveDisplay...] methods is called, is only recurses through NSViews that are still subviews of the top-level view.
(In reply to comment #35) > Right now I'm only addrefing mGeckoChild if I send a (delayed) > releaseWidgets: message, and I only send a message if mGeckoChild has > a parent. Right. I'm just making sure that was intentional. You say in the next bit of your reply why, that's all I was looking for. > As best I can tell, we don't have trouble if mGeckoChild is deleted > during viewWillDraw and it *doesn't* have a parent.
Comment on attachment 506746 [details] [diff] [review] Cocoa patch rev2 (also retain and release mGeckoChild) >+ if (parent) { >+ widgetArray = [NSMutableArray arrayWithCapacity:3]; >+ } >+ while (parent) { >+ NS_ADDREF(parent); >+ [widgetArray addObject:[NSNumber numberWithUnsignedInteger:(NSUInteger)parent]]; >+ parent = parent->GetParent(); >+ } You could put the "while" loop inside the "if (parent)" block. That would be a little cleaner logic-wise and save a check when !parent. >+ if (widgetArray) { >+ NS_ADDREF(mGeckoChild); >+ [widgetArray addObject:[NSNumber numberWithUnsignedInteger:(NSUInteger)mGeckoChild]]; >+ [self performSelector:@selector(releaseWidgets:) >+ withObject:widgetArray >+ afterDelay:0]; >+ } You could put this stuff inside the "if (parent)" block as well. That makes it more clear that none of this work happen unless there is a parent.
Attachment #506746 - Flags: review?(joshmoz) → review+
I assume this is OK, Josh.
Attachment #506746 - Attachment is obsolete: true
Attachment #506868 - Flags: review?(joshmoz)
Attachment #506868 - Flags: approval2.0?
Attachment #506868 - Flags: review?(joshmoz)
Attachment #506868 - Flags: review+
Attachment #506868 - Flags: approval2.0?
Attachment #506868 - Flags: approval2.0+
Comment on attachment 506868 [details] [diff] [review] Cocoa patch rev3 (follow Josh's suggestions) Landed on trunk: http://hg.mozilla.org/mozilla-central/rev/5daf5e03b8e1
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment on attachment 506868 [details] [diff] [review] Cocoa patch rev3 (follow Josh's suggestions) This bug also happens on the 1.9.2 branch. So we should presumably also fix it there, after letting this patch bake on the trunk for a while.
Attachment #506868 - Flags: approval1.9.2.15?
(In reply to comment #40) > This bug also happens on the 1.9.2 branch. So we should presumably also > fix it there, after letting this patch bake on the trunk for a while. Why? We fix lots of bugs that also happen on old branches. What bad things happen to whom and how often with this bug vs. saving the regression risk and QA time if we take the patch?
There's no good way to find out how many people experience this bug, either on the trunk or the 1.9.2 branch (see comment #21). But this bug is arguably worse on the 1.9.2 branch -- the "crash" (actually an OS-level process abort, following the assertion) only happens in 32-bit mode, and 64-bit mode isn't available on the branches. (And of course it's the "crash" that we should be most concerned about.) Also, the fix is very simple and straightforward. But, like I said, it makes sense to let it bake on the trunk for a while. If it makes you feel better, maybe we should wait until after the FF4 release (which will put the trunk patch into the hands of a much larger number of users/testers).
Whiteboard: [sg:dos]
Unfortunately this appears to have missed 4.0b10 so we won't get serious feedback for a few weeks until b11 goes out
Comment on attachment 506868 [details] [diff] [review] Cocoa patch rev3 (follow Josh's suggestions) Approved for 1.9.2.15, a=dveditz for release-drivers
Attachment #506868 - Flags: approval1.9.2.15? → approval1.9.2.15+
When will this be applied to FF 3.6?
A better question is is there a chance this getting applied to FF 3.6 because this happens for me in 3.6 and not 4? See https://bugzilla.mozilla.org/show_bug.cgi?id=625843 which was marked as a duplicate of this issue.
Comment on attachment 506868 [details] [diff] [review] Cocoa patch rev3 (follow Josh's suggestions) Landed on the 1.9.2 branch: http://hg.mozilla.org/releases/mozilla-1.9.2/rev/16c83c0d5e7e
> When will this be applied to FF 3.6? It's now been landed on the 1.9.2 branch, which means it will appear in FF 3.6.15. It will also appear in the FF 3.6.15pre nightlies, starting with tomorrow's.
Thank you very much and will help verify this tomorrow.
I tried the FF 3.6.15pre that I found in the below url and the fix worked! Good job guys, thanks! http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-firefox-3.6.x/
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: