Closed Bug 678607 Opened 13 years ago Closed 13 years ago

[10.7] crash with two finger swipe

Categories

(Core :: Widget: Cocoa, defect)

x86_64
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla9

People

(Reporter: asa, Assigned: smichaud)

References

(Blocks 1 open bug)

Details

Crash Data

Attachments

(7 files, 1 obsolete file)

With the new two-finger swipe (implemented in Bug 668953 bug 668953) for back and forward navigation on Mac OS X Lion, I can, with some reproducibility, cause Firefox to crash. Steps to reproduce: 1. visit five or six addresses in the same tab. 2. swipe swipe swipe to the end of your history in one direction 3. swipe swipe swipe to the end of your history in the other direction 4. repeat until you crash. Results: Crash in [@ CoreFoundation@0x884e ] or [@ libobjc.A.dylib@0xb390 ] Tested on Mac OS X 10.7 (Lion) using a 3.06 GHz Core i3 iMac with a Magic Trackpad
The addresses Asa gave me (and which which it's easy for to me to reproduce this bug) are: www.techcrunch.com www.engadget.com slashdot.org techmeme.com/river news.ycombinator.com
As Asa pointed out, you don't crash if you use the Back and Forward buttons to switch between these sites.
Blocks: 668953
I'm going to see what I can do about this. But it'll take at least several days, and there may not be anything I can do. I suspect this is an Apple bug.
Here are a couple of Asa's crash stacks, with their symbols translated: https://crash-stats.mozilla.com/report/index/bp-7c0602f1-5fec-4e49-bbf9-e946d2110812 0 CoreFoundation CoreFoundation@0x884e 1 CoreFoundation CoreFoundation@0x30d1f 2 libobjc.A.dylib libobjc.A.dylib@0xd2c5 3 libobjc.A.dylib libobjc.A.dylib@0xd4f8 4 libobjc.A.dylib libobjc.A.dylib@0xf03b 5 AppKit AppKit@0x6ceef 6 AppKit AppKit@0x6d938 7 AppKit AppKit@0x3f2eba 8 libobjc.A.dylib libobjc.A.dylib@0xd2c5 9 libobjc.A.dylib libobjc.A.dylib@0xd254 10 CoreFoundation CoreFoundation@0x30738 11 AppKit AppKit@0x6d55b 12 AppKit AppKit@0x8e04 0 CFRelease (in CoreFoundation) + 254 1 -[__NSArrayM dealloc] (in CoreFoundation) + 303 2 objc::DenseMap<objc_object*, unsigned long, true, objc::DenseMapInfo<objc_object*>, objc::DenseMapInfo<unsigned long> >::FindAndConstruct(objc_object* const&) (in libobjc.A.dylib) + 75 3 _objc_rootReleaseWasZero (in libobjc.A.dylib) + 188 4 (anonymous namespace)::AutoreleasePoolPage::pop(void*) (in libobjc.A.dylib) + 433 5 _NSPopoverPreProcessLocalEvent (in AppKit) + 46 6 -[NSApplication sendEvent:] (in AppKit) + 64 7 -[NSEvent momentumPhase] (in AppKit) + 32 8 objc::DenseMap<objc_object*, unsigned long, true, objc::DenseMapInfo<objc_object*>, objc::DenseMapInfo<unsigned long> >::FindAndConstruct(objc_object* const&) (in libobjc.A.dylib) + 75 9 _objc_rootRetain (in libobjc.A.dylib) + 107 10 -[NSObject retain] (in CoreFoundation) + 40 11 -[NSApplication _setCurrentEvent:] (in AppKit) + 107 12 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] (in AppKit) + 1014 https://crash-stats.mozilla.com/report/index/bp-a14e1854-5fde-4834-88f3-0fe552110812 0 libobjc.A.dylib libobjc.A.dylib@0xb390 1 CoreFoundation CoreFoundation@0x87ff 2 CoreFoundation CoreFoundation@0x30d1f 3 libobjc.A.dylib libobjc.A.dylib@0xd2c5 4 libobjc.A.dylib libobjc.A.dylib@0xd4f8 5 libobjc.A.dylib libobjc.A.dylib@0xf03b 6 AppKit AppKit@0x8c485d 7 AppKit AppKit@0x6ceef 8 AppKit AppKit@0x3f474e 9 AppKit AppKit@0x6d938 10 AppKit AppKit@0x3f5077 11 AppKit AppKit@0x3f49d4 12 AppKit AppKit@0x3f474e 13 libobjc.A.dylib libobjc.A.dylib@0xd2c5 14 libobjc.A.dylib libobjc.A.dylib@0xd254 15 CoreFoundation CoreFoundation@0x30738 16 AppKit AppKit@0x6d55b 17 AppKit AppKit@0x8e04 0 objc_msgSend_vtable14 (in libobjc.A.dylib) + 16 1 CFRelease (in CoreFoundation) + 175 2 -[__NSArrayM dealloc] (in CoreFoundation) + 303 3 objc::DenseMap<objc_object*, unsigned long, true, objc::DenseMapInfo<objc_object*>, objc::DenseMapInfo<unsigned long> >::FindAndConstruct(objc_object* const&) (in libobjc.A.dylib) + 75 4 _objc_rootReleaseWasZero (in libobjc.A.dylib) + 188 5 (anonymous namespace)::AutoreleasePoolPage::pop(void*) (in libobjc.A.dylib) + 433 6 0x008c485d (in AppKit) 7 _NSPopoverPreProcessLocalEvent (in AppKit) + 46 8 __destroy_helper_block_5 (in AppKit) + 18 9 -[NSApplication sendEvent:] (in AppKit) + 64 10 __-[NSEvent _initTouches]_block_invoke_1 (in AppKit) + 75 11 __destroy_helper_block_19 (in AppKit) + 18 12 __destroy_helper_block_5 (in AppKit) + 18 13 objc::DenseMap<objc_object*, unsigned long, true, objc::DenseMapInfo<objc_object*>, objc::DenseMapInfo<unsigned long> >::FindAndConstruct(objc_object* const&) (in libobjc.A.dylib) + 75 14 _objc_rootRetain (in libobjc.A.dylib) + 107 15 -[NSObject retain] (in CoreFoundation) + 40 16 -[NSApplication _setCurrentEvent:] (in AppKit) + 107 17 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] (in AppKit) + 1014
I used atos -arch x86_64 [binary] [0xaddress] to obtain these translations.
> atos -arch x86_64 [binary] [0xaddress] atos -arch x86_64 -o [path-to-binary] [0xaddress]
I tested this using Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0a2) Gecko/20110812 Firefox/7.0a2 with similar sites, but haven't yet been able to reproduce the crash. Here are my specifics: *Using Magic Mouse - not trackpad *OS X 10.7.2 Build 11C37 *2.8 GHz Intel Core 2 Duo with 8 GB RAM
> *OS X 10.7.2 Build 11C37 What's this?
>> *OS X 10.7.2 Build 11C37 > > What's this? And where can you get it? :-)
Steven: I found it in the iCloud area of the Mac Developer website. You can download it as a .dmg and install. It was just released today.
Got the following trace in GDB: 2011-08-12 21:19:02.769 firefox[10028:1207] -_scrollPhase is deprecated for NSScrollWheel. Please use -momentumPhase. 2011-08-12 21:19:27.343 firefox[10028:1207] -_continuousScroll is deprecated for NSScrollWheel. Please use -hasPreciseScrollingDeltas. 2011-08-12 21:19:27.344 firefox[10028:1207] -deviceDeltaY is deprecated for NSScrollWheel. Please use -scrollingDeltaY. 2011-08-12 21:19:27.345 firefox[10028:1207] -deviceDeltaX is deprecated for NSScrollWheel. Please use -scrollingDeltaX. Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: 13 at address: 0x0000000000000000 0x00007fff8cec345a in ___trackSwipeWithScrollEvent_block_invoke_1 () (gdb) where #0 0x00007fff8cec345a in ___trackSwipeWithScrollEvent_block_invoke_1 () #1 0x00007fff8cb3d6c0 in _NSSendEventToObservers () #2 0x00007fff8cb3b939 in -[NSApplication sendEvent:] () #3 0x0000000101cd370b in -[GeckoNSApplication sendEvent:] (self=0x10420ab20, _cmd=<value temporarily unavailable, due to optimizations>, anEvent=0x13c553e80) at /Users/bgirard/mozilla/mozilla-central/tree/widget/src/cocoa/nsAppShell.mm:192
Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: 13 at address: 0x0000000000000000 0x00007fff82ad7390 in objc_msgSend_vtable14 () (gdb) where #0 0x00007fff82ad7390 in objc_msgSend_vtable14 () #1 0x00007fff8b417800 in CFRelease () #2 0x00007fff8b43fd20 in -[__NSArrayM dealloc] () #3 0x00007fff82adb03c in (anonymous namespace)::AutoreleasePoolPage::pop () #4 0x00007fff8b4406a5 in _CFAutoreleasePoolPop () #5 0x00007fff8adf60d7 in -[NSAutoreleasePool drain] () #6 0x00007fff8cad347a in -[NSApplication run] ()
Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: 13 at address: 0x0000000000000000 0x00007fff8cec3187 in ___trackSwipeWithScrollEvent_block_invoke_1 () (gdb) where #0 0x00007fff8cec3187 in ___trackSwipeWithScrollEvent_block_invoke_1 () #1 0x00007fff8cb3d6c0 in _NSSendEventToObservers () #2 0x00007fff8cb3b939 in -[NSApplication sendEvent:] () #3 0x0000000101cd370b in -[GeckoNSApplication sendEvent:] (self=0x104522a70, _cmd=<value temporarily unavailable, due to optimizations>, anEvent=0x13d013db0) at /Users/bgirard/mozilla/mozilla-central/tree/widget/src/cocoa/nsAppShell.mm:192
After fixing the block copy I crash on the following more descriptive trace: firefox(95686,0x7fff72bdb960) malloc: *** error for object 0x14798ada0: pointer being freed was not allocated *** set a breakpoint in malloc_error_break to debug Breakpoint 1, 0x00007fff8aaed6c0 in malloc_error_break () (gdb) where #0 0x00007fff8aaed6c0 in malloc_error_break () #1 0x00007fff8aaed805 in free () #2 0x00007fff82adad53 in object_dispose () #3 0x00007fff8b43fe16 in -[NSObject dealloc] () #4 0x00007fff8ceb9f14 in -[_NSEventObserver dealloc] () #5 0x00007fff8b417800 in CFRelease () #6 0x00007fff8b43fd20 in -[__NSArrayM dealloc] () #7 0x00007fff82adb03c in (anonymous namespace)::AutoreleasePoolPage::pop () #8 0x00007fff8b4406a5 in _CFAutoreleasePoolPop () #9 0x00007fff8adf60d7 in -[NSAutoreleasePool drain] () #10 0x00007fff8cad347a in -[NSApplication run] ()
I have no idea whether that might help any, but have you tried adding *mSwipeAnimationCancelled = YES; mSwipeAnimationCancelled = nil; to -[ChildView dealloc]?
Actually, scratch that, that ChildView probably won't be released before the window closes.
(In reply to Benoit Girard (:BenWa) from comment #14) > After fixing the block copy How?
From what I gather this change is required but is no sufficient to fix the crash (don't have a proper patch handy sorry): in nsChildView.mm - usingHandler:^(CGFloat gestureAmount, NSEventPhase phase, BOOL isComplete, BOOL *stop) { + usingHandler:[^(CGFloat gestureAmount, NSEventPhase phase, BOOL isComplete, BOOL *stop) { *stop = YES; return; } if (isComplete) { if (gestureAmount) { nsSimpleGestureEvent geckoEventCopy(geckoEvent); if (gestureAmount > 0) { geckoEventCopy.direction |= nsIDOMSimpleGestureEvent::DIRECTION_LEFT; } else { geckoEventCopy.direction |= nsIDOMSimpleGestureEvent::DIRECTION_RIGHT; } mGeckoChild->DispatchWindowEvent(geckoEventCopy); } mSwipeAnimationCancelled = nil; } - }]; + } copy]]; Otherwise the block will get release when maybeTrackScrollEventAsSwipe returns. So by copying the blocks we ensure that it will survive when the method return and will be owned by trackSwipeEventWithOptions.
Interesting. I really need to read up on how blocks work. Another idea (which is probably completely irrelevant again, but might be worth trying): Instead of copying geckoEvent from the scope, how about retaining the NSEvent and creating a fresh geckoEvent from the NSEvent inside the block? All quick block overviews I've read say something like "blocks handle memory management of non-__block scope variables, don't worry about it, it's magic". Since geckoEvent is a C++ object, maybe something goes wrong there... even though the stack trace doesn't really point to it.
Attached patch retain anEvent (deleted) — Splinter Review
(In reply to Markus Stange from comment #19) > Interesting. I really need to read up on how blocks work. > > Another idea (which is probably completely irrelevant again, but might be > worth trying): Instead of copying geckoEvent from the scope, how about > retaining the NSEvent and creating a fresh geckoEvent from the NSEvent > inside the block? > > All quick block overviews I've read say something like "blocks handle memory > management of non-__block scope variables, don't worry about it, it's > magic". Since geckoEvent is a C++ object, maybe something goes wrong > there... even though the stack trace doesn't really point to it. At this point I don't believe this is such a far fetch idea :). After seeing the compile segfault by adding a __block keyword I am worried that we are running into a compiler bug. Have you tried your patch without/with my block copy changes?
In my tests I find that Benoit's patch (sending 'copy' to the trackingHandler) doesn't help -- I crash just as often. I also suspect Markus' patch won't help, either (though I'll also try it out). I have a hunch that I'll be playing out over the course of the day: OS code is 'autorelease'ing some object when it's unsafe to release it. This doesn't cause trouble in Safari, or when geckoEventCopy isn't dispatched from the trackingHandler. But when geckoEventCopy *is* dispatched, there are sometimes nested calls to process native events, some of which can cause the autorelease pool to be drained. This can cause the autoreleased object to be released "early". Right now I suspect (from the evidence of comment #14) that the autoreleased object is an _NSEventObserver object (or possibly another object that holds a reference to an _NSEventObserver object -- which makes sense because an _NSEventObserver object holds a reference to a block (which is the actual observer).
I *don't* think the autoreleased object is the trackingHandler block itself -- otherwise Benoit's copy patch would have worked.
(In reply to Benoit Girard (:BenWa) from comment #21) > Have you tried your patch without/with my block copy changes? No, as I said, I'll only get back to my 10.7 machine next week, so I can't test anything at the moment. I'm on my Macbook at the moment which runs 10.6 (and downloading Lion will take days over here...). (In reply to Steven Michaud from comment #22) As convoluted as it is, that hypothesis sounds pretty plausible :)
Another thing I've noticed: Moving the geckoEventCopy constructor to the top of trackingHandler makes it substantially harder to crash. I assume this is because OS code makes an immutable copy of geckoEvent (to pass to the geckoEventCopy constructor) only on first access, and putting it at the top of trackingHandler ensures that this always happens before maybeTrackSwipeEvent has returned. Doing this doesn't eliminate *all* crashes, though.
Have you tried running with Zombies turned on and the Allocations instrument running? That should reveal more about what exactly is being over released, and should even give you a history of all the stacks that retained/released it.
I've tried running with libgmalloc, but not yet with NSZombies. libgmalloc didn't tell me anything new. I've made some progress -- I've found that blocking -[_NSLocalEventObserver dealloc] stops the crashes from happening. But I haven't yet found exactly why dealloc is sometimes called on an _NSLocalEventObserver object that's already been deleted. (To stop all the crashes, you also need to do as I suggest in comment #25.)
> But I haven't yet found exactly why dealloc is sometimes called on > an _NSLocalEventObserver object that's already been deleted. Now I've found out more: +[NSEvent removeMonitor:] can be called twice on the same _NSLocalEventObserver object -- which (when it happens) causes a crash. I should be able to work around this by hooking calls to +[NSEvent addLocalMonitorForEventsMatchingMask:handler:] and +[NSEvent removeMonitor:] and maintaining a local copy of the sEventObservers hashtable (containing just those objects of class _NSLocalEventObserver). Of course it'd be nice to know the reason for the second call to +[NSEvent removeMonitor:] -- and I'm going to try to find out. But that's not absolutely necessary.
Attached patch Fix (obsolete) (deleted) — Splinter Review
Here's a patch that fixes this bug! It does what I described in comment #25 and comment #29. A tryserver build should be available in a few hours. I tested my patch by swiping rapidly between the pages listed in comment #1. Here they are again: www.techcrunch.com www.engadget.com slashdot.org techmeme.com/river news.ycombinator.com I also sometimes swiped back and forth while going through the list. I did this for three minutes straight without seeing any crashes.
Attachment #553794 - Flags: review?(bgirard)
Attachment #553794 - Flags: review?(bgirard) → review+
(In reply to Colin Barrett [:cbarrett] from comment #27) > Have you tried running with Zombies turned on and the Allocations instrument > running? That should reveal more about what exactly is being over released, > and should even give you a history of all the stacks that retained/released > it. I had tried it but didn't get any useful information and it crashes quickly. Instruments has been very useful but it hasn't been stable for me with the latest xcode updates. (In reply to Steven Michaud from comment #30) > Created attachment 553794 [details] [diff] [review] I spent 20 mins trying out this patch with a magic mouse and have seen no crash report.
It looks to me were are still missing something. I tried compiling a reduced example of what we have in that file: struct foo { foo(); foo(const foo&); void bar() const; }; typedef void (^bar)(); void g(bar x); void f() { foo x; g(^{x.bar();}); } I will attach the IL produce by clang, some points that are interesting: * The copy constructor to foo is called, so "x" in the block is a local copy, no need to do that manually. * G is passed a stack allocated struct for the block I am currently building firefox. I will try to reproduce the bug with parts of this patch backed out and see what we get.
Comment on attachment 553794 [details] [diff] [review] Fix (In reply to Steven Michaud from comment #30) > Created attachment 553794 [details] [diff] [review] > Fix > > Here's a patch that fixes this bug! Great! >+ // Make a dummy call to trackingHandler to ensure that it's called at least >+ // once before maybeTrackScrollEventAsSwipe:scrollOverflow: returns, so that >+ // geckoEvent is still in scope when geckoEventCopy is constructed from it. I still think holding on to the NSEvent would be a better solution than copying the geckoEvent. (Another thing I've thought of: The retains and releases that I added in attachment 553481 [details] [diff] [review] maybe aren't even necessary because the block should already do that on its own.)
> I still think holding on to the NSEvent would be a better solution > than copying the geckoEvent. Why? Holding on to the NSEvent could considerably extend its lifetime (by as much as a second or two), and some of the fields in NSEvent are pointers, which might in that time have become stale.
Part of my patch makes the following assumption (quoting from the patch): On first access to a local, non-__block variable in a block's enclosing scope, an immutable copy is made of it for use within the block. Rafael tells me that the IL he posted in comment #23 shows this is wrong. I'm in the process of figuring out how to read the IL (http://llvm.org/docs/LangRef.html), and what it means.
(In reply to Steven Michaud from comment #35) > > I still think holding on to the NSEvent would be a better solution > > than copying the geckoEvent. > > Why? Holding on to the NSEvent could considerably extend its lifetime > (by as much as a second or two), and some of the fields in NSEvent are > pointers, which might in that time have become stale. I don't think that can happen; at least the window pointer is strong. The context pointer is probably strong, too, but I couldn't verify that because it was always nil in my tests. We have other long-lived NSEvents in widget land and haven't had problems with them, for example gLastDragMouseDownEvent, and ChildViewMouseTracker:: sLastMouseMoveEvent which I've just added in bug 678481.
(In reply to comment #37) Yes, we may be able to get away with extending the life of the NSEvent. But I can't understand why you think doing this is "better". I think it's as ugly as sin :-)
And it's true that we have several other long-lived NSEvents ... but we actually *need* those :-)
I think it's better because copying Objective-C objects into blocks is well-tread ground and copying C++ objects is not. We also wouldn't need the "call the block while gecko event is still in scope" workaround that your current patch uses (or would we? I don't know).
That's what copy constructors are for ... and there is one. But the whole thing may be moot -- I suspect that, once I understand the IL from comment #33 properly, this part of my patch will change quite a lot.
Attached file another small testcase (deleted) —
(In reply to Steven Michaud from comment #41) > That's what copy constructors are for ... and there is one. Sure. I'm slowly coming to agree that copying geckoEvent should work just as well. Here's a small test program that I've used to play around with blocks, and it demonstrates that nothing we're doing here should be problematic. Output is at the end. The only notable thing I've found out is that [theBlock copy], which moves the block to the heap, also causes a copy of the CPPStruct object to live on the heap. But in hindsight that's not surprising at all.
I found http://www.friday.com/bbum/2009/08/29/blocks-tips-tricks/ very interesting because it explains why the [block copy] is necessary. It also says "The system provided APIs that take Blocks as arguments will copy said Blocks when warranted. For example, dispatch_async() will copy the passed in Blocks." and I'd find it slightly surprising if trackSwipeEvent... wouldn't make such a copy.
Attached file another small testcase (deleted) —
I've added some more output and found something which worries me: // CPPStruct(0x7fff5fbfe5b0) copy construction from 0x10040f9b8 // CPPStruct(0x7fff5fbfe5b0) destruction // block called with x = 0, &e: 0x7fff5fbfe5b0, e.mX: 5, [d retainCount]: 1 // CPPStruct(0x7fff5fbfe5b0) copy construction from 0x10040f9b8 // CPPStruct(0x7fff5fbfe5b0) destruction // block called with x = 1, &e: 0x7fff5fbfe5b0, e.mX: 5, [d retainCount]: 1 It looks like the block is called with a destroyed object. Transferring this to our specific case, the object that geckoEvent points to inside the block might already have been destroyed.
(In reply to Benoit Girard (:BenWa) from comment #31) > (In reply to Colin Barrett [:cbarrett] from comment #27) > > Have you tried running with Zombies turned on and the Allocations instrument > > running? That should reveal more about what exactly is being over released, > > and should even give you a history of all the stacks that retained/released > > it. > > I had tried it but didn't get any useful information and it crashes quickly. > Instruments has been very useful but it hasn't been stable for me with the > latest xcode updates. Assuming you're on Lion, are you using 4.1 or the 4.2 betas? The 4.2 betas are quite bad. Anyway, zombies at least should work just by setting the NSZombieEnabled environment variable to true. You won't get retain count stack traces though.
Attached patch Fix rev1 (remove unneeded code) (deleted) — Splinter Review
Just now I found out that only part of my patch is needed to fix these crashes. Because my changes to maybeTrackScrollEventAsSwipe:scrollOverflow: made it so much harder for me to reproduce these crashes, I assumed they were still required. But this isn't true -- the "other part" of my patch fixes these crashes all by itself. I tested my patch by rapidly doing two-finger swipes for five minutes straight -- I didn't crash at all. The wierdness that Markus found in how OS block-support code copies C++ structures still needs to be addressed. But it turns out to be irrelevant to this bug's crashes. I'll have more to say in a later comment.
Attachment #553794 - Attachment is obsolete: true
Attachment #554185 - Flags: review?(bgirard)
Attachment #554185 - Flags: review?(bgirard) → review+
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
My patch for this bug is now in mozilla-central nightlies (starting with today's): ftp://ftp.mozilla.org/pub/firefox/nightly/2011-08-19-03-07-49-mozilla-central/firefox-9.0a1.en-US.mac-shark.dmg Please try it out, and let us know if you still see any crashes while swiping.
Apple just released OS X 10.7.1. But it doesn't fix this bug (testing with yesterday's mozilla-central nightly, which doesn't have my workaround).
(In reply to Markus Stange from comment #44) > Created attachment 553900 [details] > another small testcase > > I've added some more output and found something which worries me: > > // CPPStruct(0x7fff5fbfe5b0) copy construction from 0x10040f9b8 > // CPPStruct(0x7fff5fbfe5b0) destruction > // block called with x = 0, &e: 0x7fff5fbfe5b0, e.mX: 5, [d retainCount]: 1 > // CPPStruct(0x7fff5fbfe5b0) copy construction from 0x10040f9b8 > // CPPStruct(0x7fff5fbfe5b0) destruction > // block called with x = 1, &e: 0x7fff5fbfe5b0, e.mX: 5, [d retainCount]: 1 > > It looks like the block is called with a destroyed object. Transferring this > to our specific case, the object that geckoEvent points to inside the block > might already have been destroyed. I think you found a bug in gcc-4.2. With clang I get 2011-08-19 16:24:41.391 t[57666:707] RetainCountChangePrinter init 2011-08-19 16:24:41.392 t[57666:707] CPPStruct(0x7fff61ab8538) regular construction 2011-08-19 16:24:41.392 t[57666:707] CPPStruct(0x7fff61ab8530) copy construction from 0x7fff61ab8538 2011-08-19 16:24:41.393 t[57666:707] storing block and queuing for 2 calls 2011-08-19 16:24:41.394 t[57666:707] CPPStruct(0x7f9852912838) copy construction from 0x7fff61ab8530 2011-08-19 16:24:41.394 t[57666:707] RetainCountChangePrinter retain 2011-08-19 16:24:41.395 t[57666:707] RetainCountChangePrinter release 2011-08-19 16:24:41.395 t[57666:707] CPPStruct(0x7fff61ab8530) destruction 2011-08-19 16:24:41.396 t[57666:707] CPPStruct(0x7fff61ab8538) destruction 2011-08-19 16:24:41.895 t[57666:707] block called with x = 0, &e: 0x7f9852912838, e.mX: 5, [d retainCount]: 1 2011-08-19 16:24:42.398 t[57666:707] block called with x = 1, &e: 0x7f9852912838, e.mX: 5, [d retainCount]: 1 2011-08-19 16:24:42.399 t[57666:707] releasing block, retain count before release: 1 2011-08-19 16:24:42.400 t[57666:707] RetainCountChangePrinter release 2011-08-19 16:24:42.400 t[57666:707] RetainCountChangePrinter dealloc 2011-08-19 16:24:42.401 t[57666:707] CPPStruct(0x7f9852912838) destruction
> I think you found a bug in gcc-4.2. With clang I get I'm not yet convinced of that. It seems like a copy of CPPStruct is made and destroyed for every call to NSLog that refers to it (or to its address). It's definitely destroyed before NSLog "prints" its output, but possibly not before the content of that output is generated. I've been trying to find a way to pin this down.
Also, I though that, with XCode 4.1 (on OS X 10.7.X), gcc is a wrapper for clang.
> Also, I though that, with XCode 4.1 (on OS X 10.7.X), gcc is a wrapper for > clang. But Markus has been working on 10.6, so this (even if true) may not be relevant.
> Also, I though that, with XCode 4.1 (on OS X 10.7.X), gcc is a > wrapper for clang. However this may turn out, on OS X 10.7.1 I get the same results as Rafael when compiling Markus' 2nd testcase (from comment #44) when compiling with "clang test.mm -lstdc++ -framework Cocoa -o test". (And the same results as Markus when compiling with "g++ test.mm -framework Cocoa -o test".) So even if (in XCode 4.1) g++ is on some level a wrapper for clang, "g++" and "clang" seem to work at least slightly differently in XCode 4.1.
>> I think you found a bug in gcc-4.2. > > I'm not yet convinced of that. *Now* I'm convinced :-(
The puzzle, though, is why this doesn't seem to cause any crashes. Maybe the block-support code, though it definitely calls CPPStruct's destructor prematurely, somehow holds on to the memory occupied by CPPStruct.
(In reply to Steven Michaud from comment #53) > Also, I though that, with XCode 4.1 (on OS X 10.7.X), gcc is a wrapper for > clang. No, in xcode 4.1 "gcc" is llvm-gcc and "gcc-4.2" is the old Apple gcc.
(In reply to Steven Michaud from comment #57) > The puzzle, though, is why this doesn't seem to cause any crashes. > > Maybe the block-support code, though it definitely calls CPPStruct's > destructor prematurely, somehow holds on to the memory occupied by CPPStruct. It might just be luck that the memory still points to valid contents. I will run you test on valgrind.
valgrind is not working on 10.7 :-(
(In reply to comment # 58) Thanks for the info. I've also found the following from bug 574346 comment #0: > Note that LLVM-GCC is different to Clang: > GCC-4.2 -> Parser (GCC) / Code Generator (GCC) > LLVM-GCC -> Parser (GCC) / Code Generator (LLVM) > Clang -> Parser (Clang-LLVM) / Code Generator (LLVM) > valgrind is not working on 10.7 :-( You might want to bug Julian Seward :-) By the way, I've been trying to get Apple's compilers (gcc, llvm-gcc or clang) to dump C/C++ code showing how blocks are implemented, but I haven't been able to figure it out. Do you know a way to do this? I can get gcc to dump assembler and "RTL" ... but both are more difficult to read :-(
> You might want to bug Julian Seward :-) I did :-) > By the way, I've been trying to get Apple's compilers (gcc, llvm-gcc or > clang) to dump C/C++ code showing how blocks are implemented, but I haven't > been able to figure it out. > > Do you know a way to do this? > > I can get gcc to dump assembler and "RTL" ... but both are more difficult to > read :-( The example I posted is llvm IL, so you can use llvm-gcc or clang. The clang output should be easier to read in general and in the case of blocks (or other new features) more likely to be correct. What you want is clang++ -S -emit-llvm -o test.ll test.cpp or g++ -S -emit-llvm -o test.ll test.cpp I normally also add -Os to get a "cleaned up" IL. At -O0 the IL is quiet verbose.
I should have made my question more explicit: I'm trying to find out where the bug is that causes the CPPStruct destructor to be called prematurely using g++ on 10.6 and 10.7. It might be in the C/C++ code that Apple's compilers (presumably) generate to support blocks, or it might be in the compilers themselves. That's why I want to see this C/C++ code, if possible. Or maybe Apple's block support is generated at a lower level (in RTL using "traditional" gcc/g++ or IL using clang/clang++). I'll keep digging.
I see. Since both llvm-gcc (the one in xcode 4.1) and Apple's gcc 4.2 get it wrong, it is very likely that it is the frontend that is producing invalid gimple. I tried a simple testcase: ----------------- struct foo { foo(); ~foo(); foo(const foo&); void bar() const; }; typedef void (^bar)(); void g(bar x); void f() { foo x; g(^{x.bar();}); } --------------- With the llvm-gcc include in xcode 4.1 (a.k.a. gcc), I get the following: ... call void @_ZN3fooD1Ev(%struct.foo* %x) nounwind call void @_ZNK3foo3barEv(%struct.foo* %x) nounwind .... note that foo::bar is being called with a destructed object. No such sequence appears with clang.
In the full file, the IL shows a call to the destructor: call void @_ZN20nsSimpleGestureEventD1Ev(%struct.nsSimpleGestureEvent* %geckoEvent) ssp Followed by a use (the construction of the copy): call void @_ZN20nsSimpleGestureEventC1ERKS_(%struct.nsSimpleGestureEvent* %geckoEventCopy, %struct.nsSimpleGestureEvent* %geckoEvent) ssp What the constructor does in the end is call _ZN13nsCOMPtr_baseD2Ev on the various members. This destructor is defined in nsCOMPtr.o. That destructor checks it the raw pointer is null and call a virtual method (Release) in it. In summary, while the base memory is not being freed (the storage is part of the block context), all the destructors are and they include virtual calls, so this looks really scary.
all the destructors are *called*... I can try building the current llvm-gcc trunk to check if it is fixed, but I am not sure it is worth it. gcc is gone from xcode 4.2 (I will double check if llvm-gcc is too).
The "llvm-gcc/gcc can call the destructor too early" bug has been report to Apple. The rdar number is 10007196.
The gcc crashing bug is now rdar 10010257. The reduced testcase is: struct nsSimpleGestureEvent { nsSimpleGestureEvent(int); }; @implementation ChildView - (void)maybeTrackScrollEventAsSwipe:(void *)anEvent { __attribute__((__blocks__(byref))) nsSimpleGestureEvent geckoEvent(1); } @end
(In reply to comment #65) > _ZN20nsSimpleGestureEventD1Ev (aka > nsSimpleGestureEvent::~nsSimpleGestureEvent()) > ... > What the constructor does in the end is call _ZN13nsCOMPtr_baseD2Ev > on the various members. This destructor is defined in nsCOMPtr.o. > > That destructor checks it the raw pointer is null and call a virtual > method (Release) in it. This doesn't make much sense. If you change "what the constructor ..." to "what the destructor ..." it's still not very clear, but it does start to make sense. The problem is that an nsSimpleGestureEvent object, while not itself an XPCOM object, has a number of member variables that *are* XPCOM objects: nsEvent.userType nsEvent.target nsEvent.currentTarget nsEvent.originalTarget nsGUIEvent.widget nsMouseEvent_base.relatedTarget Each of these objects gets destroyed (its destructor is called) when its parent destructor is called. But the entire object is stored inside the parent (an nsCOMPtr<> is not a pointer). And while Release() is called on nsCOMPtr_base.mRawPtr (which *is* a pointer) if it's non-null, this doesn't necessarily mean that the object pointed to by mRawPtr is also released (if other objects also hold references to it). The trackingHandler block's prologue uses nsSimpleGestureEvent's copy constructor to make a copy (of another good copy) of geckoEvent before any of "our own" block code runs: First it uses "alloca" to allocate sizeof(nsSimpleGestureEvent) bytes on the stack (which won't go out of scope until the end of the trackingHandler block). Then it calls the nsSimpleGestureEvent copy constructor to initialize the memory it just allocated. To make the following discussion easier to understand, I'll call this local copy prologueGeckoEvent. But (and here's Apple's bug), immediately afterwards the prologue calls nsSimpleGestureEvent's destructor (and therefore also all of its superclasses' destructors) on prologueGeckoEvent. This doesn't free the space occupied by prologueGeckoEvent, or by any of its nsCOMPtr member variables. But each nsCOMPtr's destructor does get called, and Release() does get called on each non-null nsCOMPtr_base.mRawPtr (which stays non-null). But even this doesn't, by itself, cause any trouble. It doesn't cause any of the XPCOM objects pointed to by prologueGeckoEvent's nsCOMPtrs to be released prematurely -- since the good copy from which prologueGeckoEvent was created still exists, and stays "good" until after the block is called for the last time. And so each of these nsCOMPtrs' mRawPtrs remains valid -- whether or not it's null.
Comment #70 shows (I think) why Apple's "premature destructor" bug doesn't cause current code (specifically my patch for bug 668953) to crash, and why it's safe to continue using it. But we definitely don't want to start using "blocks" freely in our OS X code until we move to always and only using clang (instead of gcc/g++ or llvm-gcc/llvm-g++). I breathe a sigh of relief that the destructors for nsEvent and its subclasses mostly don't do anything, and that nsCOMPtr objects behave the way they do.
> This doesn't make much sense. > > If you change "what the constructor ..." to "what the destructor ..." > it's still not very clear, but it does start to make sense. Yes, sorry. I was typing while reading the IL. The summary is that the only interesting thing the destructor does is call the virtual Release in the member variable that are smart pointers. > The trackingHandler block's prologue uses nsSimpleGestureEvent's copy > constructor to make a copy (of another good copy) of geckoEvent before > any of "our own" block code runs: First it uses "alloca" to allocate > sizeof(nsSimpleGestureEvent) bytes on the stack (which won't go out of > scope until the end of the trackingHandler block). Then it calls the > nsSimpleGestureEvent copy constructor to initialize the memory it just > allocated. To make the following discussion easier to understand, > I'll call this local copy prologueGeckoEvent. correct. > But (and here's Apple's bug), immediately afterwards the prologue > calls nsSimpleGestureEvent's destructor (and therefore also all of its > superclasses' destructors) on prologueGeckoEvent. This doesn't free > the space occupied by prologueGeckoEvent, or by any of its nsCOMPtr > member variables. But each nsCOMPtr's destructor does get called, and > Release() does get called on each non-null nsCOMPtr_base.mRawPtr > (which stays non-null). Check. > But even this doesn't, by itself, cause any trouble. It doesn't cause > any of the XPCOM objects pointed to by prologueGeckoEvent's nsCOMPtrs > to be released prematurely -- since the good copy from which > prologueGeckoEvent was created still exists, and stays "good" until > after the block is called for the last time. And so each of these > nsCOMPtrs' mRawPtrs remains valid -- whether or not it's null. I haven't investigated the code past the calls to Release. Happy to know it is safe in the current state. I am still a bit nervous about it, since it is an unstable situation. A change to the destructor of the class, of a base class or of a member can make this unsafe :-(
> A change to the destructor of the class, of a base class or of a > member can make this unsafe :-( Yes :-( The only reason I want to use blocks now is to fix bug 668953, for which there's very high demand. Apple's new [NSEvent trackSwipeEventWithOptions:...] uses blocks, and it seems to be the best way to provide support for two-finger swipes on OS X Lion. The nsCOMPtr code seems quite stable, and nsEvent and its subclasses mostly don't have explicit destructors. So I think there's a good chance we can get away with continuing to use [NSEvent trackSwipeEventWithOptions:...] for the next year or so, despite Apple's bug. However, since the premature destructor bug doesn't exist in clang, it seems unlikely that Apple's going to fix it in gcc/g++ or llvm-gcc/llvm-g++. So if we want to keep using [NSEvent trackSwipeEventWithOptions:...] (or blocks in general), we'll almost certainly need to switch to using clang on OS X. As far as I know, the biggest obstacle to switching to clang is that we need to continue supporting OS X Leopard (10.5.X) for a while. But Apple will soon have released its last security update for Leopard (if it hasn't done so already), and we'll probably want to drop our own support (on the trunk) within the next 12 months or so. So maybe we should plan to switch to clang at the same time we drop support for Leopard.
(In reply to Steven Michaud from comment #73) > > A change to the destructor of the class, of a base class or of a > > member can make this unsafe :-( > > Yes :-( > > The only reason I want to use blocks now is to fix bug 668953, for > which there's very high demand. Apple's new [NSEvent > trackSwipeEventWithOptions:...] uses blocks, and it seems to be the > best way to provide support for two-finger swipes on OS X Lion. > > The nsCOMPtr code seems quite stable, and nsEvent and its subclasses > mostly don't have explicit destructors. So I think there's a good > chance we can get away with continuing to use [NSEvent > trackSwipeEventWithOptions:...] for the next year or so, despite > Apple's bug. Agreed, hopefully we will be able to switch to clang before this breaks. > However, since the premature destructor bug doesn't exist in clang, it > seems unlikely that Apple's going to fix it in gcc/g++ or > llvm-gcc/llvm-g++. So if we want to keep using [NSEvent > trackSwipeEventWithOptions:...] (or blocks in general), we'll almost > certainly need to switch to using clang on OS X. Yes, once we move out of vc 2005 gcc 4.2 will also be our oldest compiler. > As far as I know, the biggest obstacle to switching to clang is that > we need to continue supporting OS X Leopard (10.5.X) for a while. But > Apple will soon have released its last security update for Leopard (if > it hasn't done so already), and we'll probably want to drop our own > support (on the trunk) within the next 12 months or so. Clang works just fine targeting 10.5. The obstacles I know for a full switch are * Clang -O0 produces programs that use too much stack, which causes oranges in the bots. We want to switch the bots to use optimizations anyway, so this should be fixed soon. (bug 669953) * Infrastructure. We need to set up bots running clang. We already have a clang package installed, so we need to figure out with releng how hard it is (bug 629459). > So maybe we should plan to switch to clang at the same time we drop > support for Leopard. I agree we need both, but they can be independent :-)
> Clang works just fine targeting 10.5. Can you do clang builds while using the 10.5 SDK? Work's in progress at bug 674655 to switch to using the 10.6 SDK even for builds that are supposed to be runnable on 10.5. And so far no problems have (apparently) been found. But, like Josh, I'm skeptical -- Apple has a history of neglecting backwards compatibility, and sometimes there's no substitute for linking to the actual target libraries (in this case the 10.5 libraries). We can't be sure this will work until it's undergone a *lot* of testing, including testing by users.
I have done builds some time ago using the 10.5 SDK. In fact, I did try build on the bots, so they use our current 10.6/10.5 sdk split for 64 and 32 bits. I will try sending a new try request.
I've opened bug 681694 on the Apple premature destructor bug.
The crash signature of bug 673456 is #2 top browser crasher in 7.0b1 and #1 on Mac OS X only.
Crash Signature: [@ libobjc.A.dylib@0xb390 ] [@ CoreFoundation@0x884e ] → [@ libobjc.A.dylib@0xb390 ] [@ CoreFoundation@0x884e ] [@ objc_msgSend_vtable14 | CFRelease | -[__NSArrayM dealloc] ]
Depends on: 682445
Steven: [@ ___trackSwipeWithScrollEvent_block_invoke_1 ] is another new signature showing up on the 7.0 radar - I assume it may be related to this bug?
> Steven: [@ ___trackSwipeWithScrollEvent_block_invoke_1 ] is another new signature > showing up on the 7.0 radar - I assume it may be related to this bug? Yes, I'm sure it is.
Crash Signature: [@ libobjc.A.dylib@0xb390 ] [@ CoreFoundation@0x884e ] [@ objc_msgSend_vtable14 | CFRelease | -[__NSArrayM dealloc] ] → [@ libobjc.A.dylib@0xb390 ] [@ CoreFoundation@0x884e ] [@ objc_msgSend_vtable14 | CFRelease | -[__NSArrayM dealloc] ] [@ ___trackSwipeWithScrollEvent_block_invoke_1 ]
Did we back this out for Aurora and Beta? The status is still "affected" for those trains.
> Did we back this out for Aurora and Beta? Not this patch. We backed out the patch for bug 668953, which triggered this bug's crashes (because it introduced Apple's blocks to the tree). > The status is still "affected" for those trains. It shouldn't be.
(In reply to Steven Michaud from comment #84) > > The status is still "affected" for those trains. > It shouldn't be. It's me that marked this bug as affected for 7.0 and 8.0, but after looking at crash reports, I see no occurences of these crash signatures in 7.0b2 or 8.0a2: https://crash-stats.mozilla.com/report/list?signature=libobjc.A.dylib@0xb390 https://crash-stats.mozilla.com/report/list?signature=CoreFoundation@0x884e https://crash-stats.mozilla.com/report/list?signature=objc_msgSend_vtable14%20|%20CFRelease%20|%20-[__NSArrayM%20dealloc] https://crash-stats.mozilla.com/report/list?signature=___trackSwipeWithScrollEvent_block_invoke_1
Target Milestone: --- → mozilla9
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: