Closed Bug 402505 Opened 17 years ago Closed 17 years ago

Crash [@ libobjc.A.dylib@objc_msgSend AppKit@[NSTSMInputContext interpretKeyEvents:] AppKit@[NSView interpretKeyEvents:] [ChildView keyDown:]]

Categories

(Core :: Widget: Cocoa, defect, P1)

x86
macOS
defect

Tracking

()

VERIFIED FIXED
mozilla1.9beta3

People

(Reporter: djst, Assigned: MatsPalmgren_bugz)

References

Details

(Keywords: crash, testcase, Whiteboard: [sg:critical] using freed memory)

Crash Data

Attachments

(10 files, 7 obsolete files)

(deleted), text/html
Details
(deleted), text/plain
Details
(deleted), text/html
Details
(deleted), text/html
Details
(deleted), text/plain
Details
(deleted), patch
Details | Diff | Splinter Review
(deleted), text/plain
Details
(deleted), text/html
Details
(deleted), text/html
Details
(deleted), patch
jaas
: review+
Details | Diff | Splinter Review
Steps to reproduce 1. Go to http://fsb.se/ 2. Click on "Logga in" at the top right corner 3. Click on "Fortsätt" (Continue) 4. In the text box, enter 000000000000 5. Hit Enter ONCE 6. Observe the JavaScript error dialog that pops up, and wait for the next page to load 7. Hit Enter again Crashes here every time.
Summary: Crash when logging in to fsb.se with keyboard → Crash [@ ChildView keyDown AppKit@0xf516f AppKit@0xf5dc8] when logging in to fsb.se with keyboard
Summary: Crash [@ ChildView keyDown AppKit@0xf516f AppKit@0xf5dc8] when logging in to fsb.se with keyboard → Crash [@ libobjc.A.dylib@0x24d8 AppKit@0xf516f AppKit@0xf5dc8 ChildView keyDown] when logging in to fsb.se with keyboard
Attached file debug crash log (deleted) —
We get more symbols here (as an aside, is there a bug about mozilla's socorro instance not having OS symbols?). We're in some TSM stuff; I wonder if this might be related to the other enter/return bugs that have popped up recently?
Summary: Crash [@ libobjc.A.dylib@0x24d8 AppKit@0xf516f AppKit@0xf5dc8 ChildView keyDown] when logging in to fsb.se with keyboard → Crash [@ libobjc.A.dylib@objc_msgSend AppKit@[NSTSMInputContext interpretKeyEvents:] AppKit@[NSView interpretKeyEvents:] [ChildView keyDown:]] when logging in to fsb.se with keyboard
I'm not sure the cause. The crashed line is not mine, I have only added the if statement before the line.
I know what the problem is... more in a bit...
Assignee: joshmoz → mats.palmgren
Flags: blocking1.9? → blocking1.9+
Status: NEW → ASSIGNED
Whiteboard: [sg:critical] using freed memory
Target Milestone: --- → mozilla1.9 M10
Attached file nsChildView::Destroy stack (deleted) —
nsChildView::Destroy is called while handling a keyDown event for it. The problem is that nsChildView::DispatchEvent() can lead to really destructive things, like deleting frames,views,widgets,shells etc, and the caller needs to be prepared for that.
Attached file [ChildView dealloc] stack (deleted) —
[ChildView dealloc] while handling an event.
Attached file Dangerous call paths... (obsolete) (deleted) —
Call paths to nsChildView::DispatchEvent() and Rollup() in nsChildView.mm (Rollup() is also potentially destructive)
Attached patch Patch rev. 1 (obsolete) (deleted) — Splinter Review
Fixes this crash and other potential crashes. Note that holding a strong ref on the nsIWidget does NOT protect its |mView| from being destroyed, we need to hold a separate strong ref on that where needed (this is what the nsAutoRetainView helper class does).
Attachment #287626 - Flags: review?(joshmoz)
Priority: -- → P2
Mats - can you post a patch that does the minimum required to fix this bug? I want to understand what you're doing better, that would help.
Unfortunately the site has changed so I can't reproduce it anymore - they seem to have removed the dialog window in step 6. I tried to make a testcase but failed, so if someone could help with that it would be appreciated.
Flags: in-testsuite?
Keywords: testcase
Whiteboard: [sg:critical] using freed memory → [sg:critical] using freed memory [needs-testcase]
(In reply to comment #9) Looking at the "nsChildView::Destroy stack" I would guess that the minimal patch to fix this particular crash would be to retain 'self' in the -[ChildView keyDown:] method. The rest of the patch adds protection to other methods that makes direct or indirect synchronous calls into Gecko (through nsChildView::DispatchEvent() or Rollup()).
Comment on attachment 287626 [details] [diff] [review] Patch rev. 1 The approach here doesn't make much sense to me. From comment #8: "Note that holding a strong ref on the nsIWidget does NOT protect its |mView| from being destroyed, we need to hold a separate strong ref on that where needed (this is what the nsAutoRetainView helper class does)." I don't see how mView could be getting destroyed from under an nsIWidget because nsIWidget retains the view. It may be that there is an incorrect double release happening somehow, but in general we should not have to protect ourselves in this way. Also, we shouldn't need all of those kungFuDeathGrips because callers to those methods (like "::Resize") should be retaining. Also, this patch doesn't apply any more, it'll need to be regenerated.
Attachment #287626 - Flags: review?(joshmoz) → review-
Priority: P2 → P4
Severity: critical → major
Summary: Crash [@ libobjc.A.dylib@objc_msgSend AppKit@[NSTSMInputContext interpretKeyEvents:] AppKit@[NSView interpretKeyEvents:] [ChildView keyDown:]] when logging in to fsb.se with keyboard → Crash [@ libobjc.A.dylib@objc_msgSend AppKit@[NSTSMInputContext interpretKeyEvents:] AppKit@[NSView interpretKeyEvents:] [ChildView keyDown:]]
Severity: major → critical
Attached patch fix v1.0 (obsolete) (deleted) — Splinter Review
From looking at the information posted as attachments on this bug, I suspect this'll do it.
Assignee: mats.palmgren → joshmoz
Attachment #287626 - Attachment is obsolete: true
Attachment #291581 - Flags: review?(smichaud)
Attachment #291581 - Attachment is obsolete: true
Attachment #291581 - Flags: review?(smichaud)
Attached patch fix v1.1 (obsolete) (deleted) — Splinter Review
After some further thinking and discussion with Steven, this is better.
Attachment #291588 - Flags: review?(smichaud)
Everything's fine with this patch but one thing -- I think you should restore conversationIdentifier to the way it was before (returning a long). Apple's new docs on the NSTextInput protocol have conversationIdentifier returning an NSInteger (which is an int on 32-bit systems and a long on 64-bit systems). But keeping its return value a long works the same way -- a long is 32 bits on a 32-bit system and (I think) 64 bits on a 64-bit system (an Apple one, at least). (Older Apple docs have conversationIdentifier returning a long.)
Comment on attachment 291588 [details] [diff] [review] fix v1.1 Josh says he'll restore conversationIdentifier to the way it was (returning a long) on checkin.
Attachment #291588 - Flags: review?(smichaud) → review+
Attached patch fix v1.2 (obsolete) (deleted) — Splinter Review
Steven talked me out of the long->int change.
Attachment #291588 - Attachment is obsolete: true
Attachment #291590 - Flags: review?(smichaud)
Attachment #291590 - Flags: review?(smichaud) → superreview?(roc)
(In reply to comment #12) > I don't see how mView could be getting destroyed from under an nsIWidget > because nsIWidget retains the view. Please look at attachment 287622 [details] and note that the nsChildView marked red is Destroy'ed while events on it are handled. nsChildView::Destroy() releases the native view which is now dangling: http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/widget/src/cocoa/nsChildView.mm&root=/cvsroot&mark=499,552#473 and through the magic of nested event loops (nsXULWindow::ShowModal) there is a real risk that Cocoa will deallocate it - and this is indeed what caused the crash in this bug. See attachment 287623 [details] One could argue that fault really lays with nsView::~nsView that calls nsChildView::Destroy() prematurely. I think Olli have fixed that problem in bug 373344 now though, at least for the view the event was originally dispatched for. If the nsWeakView in the pres shell has the effect of protecting the event view from being deleted then it should also protect against nsChildView::Destroy() being called. If so, that would fix most cases, leaving perhaps the code that calls Rollup() to need to be reviewed again under this new assumption.
Attachment #291590 - Flags: superreview?(roc)
Comment on attachment 291590 [details] [diff] [review] fix v1.2 That explanation makes sense to me. I moved my here patch out to bug 406909.
Attachment #291590 - Attachment is obsolete: true
Mats, as best I can tell, what happens in your attachment 287623 [details] is impossible with proper checking (in ChildView methods) for mGeckoChild == NULL. nsChildView::Destroyed() calls [mView widgetDestroyed], which sets mView's mGeckoChild variable to NULL.
The problem is the active call frames for the native view that is being released -- when unwinding we need 'self' to be alive so we can safely check its 'mGeckoChild' field... this is why it needs to be retained during event handling. If you look at attachment 287624 [details], every nsChildView/ChildView (including 'this'/'self') that is used after every call up to DispatchEvent needs to have a strong ref to make sure the null-checks are on objects that are still alive. Hmm, now that I look more closely at Olli's fix for bug 373344 again, I don't think it really helps this bug. AFAICT it doesn't delay view destruction. It adds a safe way to check if the view is alive and then passes a null view pointer to PostHandleEvent() in that case. (sorry, I should have understood that by the nsWeakView name of course) Which means we're back to square one here. I'll try to do a new call analysis of the current code and post an updated patch tomorrow.
Mats, I think I now see how a ChildView object could get dealloced while calls are still being made on it (from the OS). But it's not the way you describe it in your first paragraph in comment #18. The comment above this line shows why it can't be the cause: http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/widget/src/cocoa/nsChildView.mm&root=/cvsroot&mark=499#473 But, through various twists and turns, it's possible that this line might be: http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/widget/src/cocoa/nsChildView.mm&root=/cvsroot&mark=502#473 The call to delayedTearDown (where mView is released and dealloced) is postponed until the next time through the main thread's run loop. So the call to delayedTearDown can never take place in the same stack as the call to TearDownView(). But if the OS keeps sending events to the ChildView object's "event handlers" (e.g. keyDown: or mouseDown:) on subsequent invocations of the main thread's event loop, they might (eventually) hit a ChildView object that's already been dealloced (which would make tests of mGeckoChild == NULL (probably) return the wrong result, even if they didn't cause crashes themselves). So the best fix would (as you said in comment #18) be to stop nsChildView::Destroy() from being called "prematurely". One way to do this is your kungFuDeathGrip patch.
Here's something that might make it easier to test for prematurely dealloced ChildView objects: OS X supports "scribbling" an arbitrary value into every dealloced object (so that any attempt to dereference pointers in it will cause a crash) -- http://developer.apple.com/technotes/tn2004/tn2124.html (which also has good info on many other things). At a Terminal prompt: $ export MallocScribble=1 $ [...]/Minefield.app/Contents/MacOS/firefox-bin
(In reply to comment #22) > The call to delayedTearDown (where mView is released and dealloced) is > postponed until the next time through the main thread's run loop. So > the call to delayedTearDown can never take place in the same stack as > the call to TearDownView(). No, I believe you're mistaken. While a modal dialog is posted we call NS_ProcessNextEvent() which causes native events to be processed: http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/xpfe/appshell/src/nsXULWindow.cpp&rev=1.177&root=/cvsroot&mark=397-400#379 Compare the top and bottom of the stack in attachment 287623 [details], there's native event handling in both places. Yes, 'delayedTearDown' will in most cases be sufficient, but if you call into Gecko event handling there's a risk that it will force native events to be processed (including calling 'delayedTearDown'). IMHO, we should remove 'delayedTearDown' since it wallpapers over real bugs making them occur less often, but it does not fix them completely. FWIW, I tested removing it and that seems to work fine, neither of the testcases in bug 373122 or bug 374260 crashes, and all Mochitests runs without crashing too (with my patch for this bug). It's a little late to do that change now though. > So the best fix would (as you said in comment #18) be to stop > nsChildView::Destroy() from being called "prematurely". One way to do > this is your kungFuDeathGrip patch. Well, I never claimed that my patch would stop nsChildView::Destroy() from being called prematurely, and it doesn't. What it does is simply to retain the objects we depend on, so we can safely null-check mView or mGeckoChild for example.
Attached file Dangerous call paths, rev. 2 (deleted) —
Mostly the same as before...
Attachment #287624 - Attachment is obsolete: true
Attached patch Patch rev. 2 (obsolete) (deleted) — Splinter Review
Assignee: joshmoz → mats.palmgren
Attachment #291859 - Flags: review?(joshmoz)
(In reply to comment #24) > No, I believe you're mistaken. No, I'm not, and your attachment 287623 [details] shows it. But you misunderstood what I said, and perhaps I wasn't clear enough. Yes, native events can be processed at just about any point in the browser -- that's by design (to make sure native events are processed smoothly). But no, you'll never see calls to TearDownView and delayedTearDown on the same object in the same stack. > Well, I never claimed that my patch would stop > nsChildView::Destroy() from being called prematurely, and it > doesn't. What it does is simply to retain the objects we depend on, > so we can safely null-check mView or mGeckoChild for example. Yes, you're right. Sorry for the confusion.
(In reply to comment #24) > IMHO, we should remove 'delayedTearDown' since it wallpapers over > real bugs making them occur less often, but it does not fix them > completely. Once again I disagree. As the comments above delayedTearDown and the call to delayedTearDown say, it's needed to prevent changes to the NSView hierarchy during calls to [ChildView drawRect:]. The problems that it fixes don't happen very often, so I'm not surprised you didn't see any after removing this code. But see bug 373122, particularly bug 373122 comment #9.
(Following up comment #28) And delayedTearDown would (theoretically) still be needed even if you used your kungFuDeathGrip in [ChildView drawRect:] -- only delayedTearDown guarantees that the ChildView object won't be dealloced until the next time through the main thread's event loop. By the way, what guarantees this behavior is calling [NSObject performSelectorOnMainThread:withObject:waitUntilDone] with waitUntilDone set to FALSE.
Attached patch logging.diff (deleted) — Splinter Review
Attached file GDB commands (deleted) —
Comment on attachment 292521 [details] [diff] [review] logging.diff BTW, this does not change the code in any significant way, it just adds a 'printf' on entry/exit of the event handling methods and tracks if we're in an event handler for each view.
Attached file Testcase #1 (deleted) —
STEPS TO REPRODUCE 1. click on the link 2. in the alert that pops up: click ok
Steven, allow me demonstrate what is really happening. If you load the testcase (from local disk) with the attached logging patch applied and run it in GDB with the attached GDB commands this is the result. I have highlighted the interesting stuff and added comments describing what's going on. Can you reproduce the results?
Mats, you still haven't shown a case of nsChildView::TearDownView() and [ChildView delayedTearDown] on the same object in the same stack. And I still think delayedTearDown is needed for [ChildView drawRect:] (I'll go into my reasons below). But you have shown a case of a single method ([ChildView mouseDown:]) remaining on the stack while the main thread run loop is spun multiple times. So you do have a point, which seems to be this: If you call code that spins the current thread's run loop, this will cause the run loop to be re-entered. This in turn will cause code to run that the OS has been told to postpone until the "next time" through the current thread's run loop. Above I said that native events can be processed at just about any point in the browser. This was an oversimplification. The OS (of course) spins its own run loop(s). But the browser also spins the main thread event loop as it processes Gecko events (the relevant code is in nsBaseAppShell::OnProcessNextEvent() (in widget/src/xpwidgets/nsBaseAppShell.cpp) and nsAppShell::ProcessNextNativeEvent() (in widget/src/cocoa/nsAppShell.mm)). So if your code calls code that spins the Gecko event loop (which in turn spins the main thread's run loop), the "wait until next time" rule is likely to be broken, and it's unwise to depend on it. We still need to stop the view hierarchy being changed during calls to [ChildView drawRect:], and I still think delayedTearDown is the best way (perhaps the only way) to accomplish this. As best I can tell, the Gecko event loop doesn't get spun multiple times as a result of the call (in [ChildView drawRect:]) to DispatchWindowEvent(). If this is true, delayedTearDown should continue to work properly there. But I grant that it's probably best not to use delayedTearDown in other cases -- as you said, it just muddies the waters. And I saw some wierdness while running my own tests that I need to look into (it looked like delayedTearDown was being called multiple times on the same object). So at some point I need to revisit the whole delayedTearDown business ... though I don't think this is urgent. By the way, in my own tests I just ran a recent trunk build (with debugging symbols) in gdb, and didn't use your patch. Like you, I did see the same call to [ChildView mouseDown:] on the stack through multiple calls to nsChildView::TearDownView() and [ChildView delayedTearDown]. But I didn't see any errors when I set NSZombieEnabled or MallocScribble.
I just wanted to mentioned that I crashed in this stack twice testing the Beta 2 candidate during my smoketest run using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9b2) Gecko/2007121014 Firefox/3.0b2.
Priority: P4 → P2
Target Milestone: mozilla1.9 M10 → mozilla1.9 M11
Attached patch Patch rev. 3 (obsolete) (deleted) — Splinter Review
Updated to tip; no significant changes from last version. Please review.
Attachment #291859 - Attachment is obsolete: true
Attachment #294600 - Flags: review?(joshmoz)
Attachment #291859 - Flags: review?(joshmoz)
Priority: P2 → P1
Whiteboard: [sg:critical] using freed memory [needs-testcase] → [sg:critical][need review] using freed memory [needs-testcase]
Comment on attachment 294600 [details] [diff] [review] Patch rev. 3 I'd like Steven to review this patch so you can both agree before I look at it. Please request review from me when he has given r+. Sorry this is taking so long but we're both in the middle of other bugs and this requires a lot of time to review.
Attachment #294600 - Flags: review?(joshmoz) → review?(smichaud)
Comment on attachment 294600 [details] [diff] [review] Patch rev. 3 I'm going to be doing this review now, sorry for the switches. So far my impression is that this is not an acceptable solution. We need a more systematic approach to protecting ourselves, I think that is possible. More on that soon.
Attachment #294600 - Flags: review?(smichaud) → review?(joshmoz)
Attached patch Patch rev. 4 (deleted) — Splinter Review
I spent most of today going over this and I understand much more about what is going on now. I think Mats has the right idea, nice work Mats! As I was reviewing this I ended up moving Mats' patch piece by piece into my tree, and because I was doing that anyway I just produced a patch with my comments addressed. I did not change very much. Mats - if you review my edits to your patch then we can consider this r+ from me. Then we can get sr from roc and get this in. List of changes in my patch: - modified some comments - fixed compiler warning unrelated to this bug - moved location of nsAutoRetainView class definition - in nsChildView::Resize and nsChildView::Scroll, check mOnDestroyCalled to see if widget was destroyed instead of mView since the former is more logically direct
Attachment #294600 - Attachment is obsolete: true
Attachment #296081 - Flags: review?(mats.palmgren)
Attachment #294600 - Flags: review?(joshmoz)
"more logically direct" -> "logically more direct" You probably knew what I meant but my typo was killing the part of me that majored in English. I had to say something.
Attachment #296081 - Flags: review?(mats.palmgren) → superreview?(roc)
Comment on attachment 296081 [details] [diff] [review] Patch rev. 4 Your edits looks fine. Please put an r+ on the patch so it looks formally reviewed. Thanks.
Attachment #296081 - Flags: review?(joshmoz)
Attachment #296081 - Flags: review?(joshmoz) → review+
Whiteboard: [sg:critical][need review] using freed memory [needs-testcase] → [sg:critical] using freed memory [needs-testcase]
Attachment #296081 - Flags: superreview?(roc) → superreview+
Landed on trunk.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
verified fixed using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9pre) Gecko/2008050621 Firefox/3.0pre and the steps to reproduce from tenser and the testcase from mats ...no crash -> Verified fixed
Status: RESOLVED → VERIFIED
Keywords: testcase
Whiteboard: [sg:critical] using freed memory [needs-testcase] → [sg:critical] using freed memory
Crash Signature: [@ libobjc.A.dylib@objc_msgSend AppKit@[NSTSMInputContext interpretKeyEvents:] AppKit@[NSView interpretKeyEvents:] [ChildView keyDown:]]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: