Closed
Bug 678607
Opened 13 years ago
Closed 13 years ago
[10.7] crash with two finger swipe
Categories
(Core :: Widget: Cocoa, defect)
Tracking
()
RESOLVED
FIXED
mozilla9
People
(Reporter: asa, Assigned: smichaud)
References
(Blocks 1 open bug)
Details
Crash Data
Attachments
(7 files, 1 obsolete file)
With the new two-finger swipe (implemented in Bug 668953 bug 668953) for back and forward navigation on Mac OS X Lion, I can, with some reproducibility, cause Firefox to crash.
Steps to reproduce:
1. visit five or six addresses in the same tab.
2. swipe swipe swipe to the end of your history in one direction
3. swipe swipe swipe to the end of your history in the other direction
4. repeat until you crash.
Results:
Crash in [@ CoreFoundation@0x884e ] or [@ libobjc.A.dylib@0xb390 ]
Tested on Mac OS X 10.7 (Lion) using a 3.06 GHz Core i3 iMac with a Magic Trackpad
Assignee | ||
Comment 1•13 years ago
|
||
The addresses Asa gave me (and which which it's easy for to me to reproduce this bug) are:
www.techcrunch.com
www.engadget.com
slashdot.org
techmeme.com/river
news.ycombinator.com
Assignee | ||
Comment 2•13 years ago
|
||
As Asa pointed out, you don't crash if you use the Back and Forward buttons to switch between these sites.
Assignee | ||
Comment 3•13 years ago
|
||
I'm going to see what I can do about this.
But it'll take at least several days, and there may not be anything I can do. I suspect this is an Apple bug.
Assignee | ||
Comment 4•13 years ago
|
||
Here are a couple of Asa's crash stacks, with their symbols translated:
https://crash-stats.mozilla.com/report/index/bp-7c0602f1-5fec-4e49-bbf9-e946d2110812
0 CoreFoundation CoreFoundation@0x884e
1 CoreFoundation CoreFoundation@0x30d1f
2 libobjc.A.dylib libobjc.A.dylib@0xd2c5
3 libobjc.A.dylib libobjc.A.dylib@0xd4f8
4 libobjc.A.dylib libobjc.A.dylib@0xf03b
5 AppKit AppKit@0x6ceef
6 AppKit AppKit@0x6d938
7 AppKit AppKit@0x3f2eba
8 libobjc.A.dylib libobjc.A.dylib@0xd2c5
9 libobjc.A.dylib libobjc.A.dylib@0xd254
10 CoreFoundation CoreFoundation@0x30738
11 AppKit AppKit@0x6d55b
12 AppKit AppKit@0x8e04
0 CFRelease (in CoreFoundation) + 254
1 -[__NSArrayM dealloc] (in CoreFoundation) + 303
2 objc::DenseMap<objc_object*, unsigned long, true, objc::DenseMapInfo<objc_object*>, objc::DenseMapInfo<unsigned long> >::FindAndConstruct(objc_object* const&) (in libobjc.A.dylib) + 75
3 _objc_rootReleaseWasZero (in libobjc.A.dylib) + 188
4 (anonymous namespace)::AutoreleasePoolPage::pop(void*) (in libobjc.A.dylib) + 433
5 _NSPopoverPreProcessLocalEvent (in AppKit) + 46
6 -[NSApplication sendEvent:] (in AppKit) + 64
7 -[NSEvent momentumPhase] (in AppKit) + 32
8 objc::DenseMap<objc_object*, unsigned long, true, objc::DenseMapInfo<objc_object*>, objc::DenseMapInfo<unsigned long> >::FindAndConstruct(objc_object* const&) (in libobjc.A.dylib) + 75
9 _objc_rootRetain (in libobjc.A.dylib) + 107
10 -[NSObject retain] (in CoreFoundation) + 40
11 -[NSApplication _setCurrentEvent:] (in AppKit) + 107
12 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] (in AppKit) + 1014
https://crash-stats.mozilla.com/report/index/bp-a14e1854-5fde-4834-88f3-0fe552110812
0 libobjc.A.dylib libobjc.A.dylib@0xb390
1 CoreFoundation CoreFoundation@0x87ff
2 CoreFoundation CoreFoundation@0x30d1f
3 libobjc.A.dylib libobjc.A.dylib@0xd2c5
4 libobjc.A.dylib libobjc.A.dylib@0xd4f8
5 libobjc.A.dylib libobjc.A.dylib@0xf03b
6 AppKit AppKit@0x8c485d
7 AppKit AppKit@0x6ceef
8 AppKit AppKit@0x3f474e
9 AppKit AppKit@0x6d938
10 AppKit AppKit@0x3f5077
11 AppKit AppKit@0x3f49d4
12 AppKit AppKit@0x3f474e
13 libobjc.A.dylib libobjc.A.dylib@0xd2c5
14 libobjc.A.dylib libobjc.A.dylib@0xd254
15 CoreFoundation CoreFoundation@0x30738
16 AppKit AppKit@0x6d55b
17 AppKit AppKit@0x8e04
0 objc_msgSend_vtable14 (in libobjc.A.dylib) + 16
1 CFRelease (in CoreFoundation) + 175
2 -[__NSArrayM dealloc] (in CoreFoundation) + 303
3 objc::DenseMap<objc_object*, unsigned long, true, objc::DenseMapInfo<objc_object*>, objc::DenseMapInfo<unsigned long> >::FindAndConstruct(objc_object* const&) (in libobjc.A.dylib) + 75
4 _objc_rootReleaseWasZero (in libobjc.A.dylib) + 188
5 (anonymous namespace)::AutoreleasePoolPage::pop(void*) (in libobjc.A.dylib) + 433
6 0x008c485d (in AppKit)
7 _NSPopoverPreProcessLocalEvent (in AppKit) + 46
8 __destroy_helper_block_5 (in AppKit) + 18
9 -[NSApplication sendEvent:] (in AppKit) + 64
10 __-[NSEvent _initTouches]_block_invoke_1 (in AppKit) + 75
11 __destroy_helper_block_19 (in AppKit) + 18
12 __destroy_helper_block_5 (in AppKit) + 18
13 objc::DenseMap<objc_object*, unsigned long, true, objc::DenseMapInfo<objc_object*>, objc::DenseMapInfo<unsigned long> >::FindAndConstruct(objc_object* const&) (in libobjc.A.dylib) + 75
14 _objc_rootRetain (in libobjc.A.dylib) + 107
15 -[NSObject retain] (in CoreFoundation) + 40
16 -[NSApplication _setCurrentEvent:] (in AppKit) + 107
17 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] (in AppKit) + 1014
Assignee | ||
Comment 5•13 years ago
|
||
I used atos -arch x86_64 [binary] [0xaddress] to obtain these translations.
Assignee | ||
Comment 6•13 years ago
|
||
> atos -arch x86_64 [binary] [0xaddress]
atos -arch x86_64 -o [path-to-binary] [0xaddress]
Comment 7•13 years ago
|
||
I tested this using Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0a2) Gecko/20110812 Firefox/7.0a2 with similar sites, but haven't yet been able to reproduce the crash. Here are my specifics:
*Using Magic Mouse - not trackpad
*OS X 10.7.2 Build 11C37
*2.8 GHz Intel Core 2 Duo with 8 GB RAM
Assignee | ||
Comment 8•13 years ago
|
||
> *OS X 10.7.2 Build 11C37
What's this?
Assignee | ||
Comment 9•13 years ago
|
||
>> *OS X 10.7.2 Build 11C37
>
> What's this?
And where can you get it? :-)
Comment 10•13 years ago
|
||
Steven: I found it in the iCloud area of the Mac Developer website. You can download it as a .dmg and install. It was just released today.
Comment 11•13 years ago
|
||
Got the following trace in GDB:
2011-08-12 21:19:02.769 firefox[10028:1207] -_scrollPhase is deprecated for NSScrollWheel. Please use -momentumPhase.
2011-08-12 21:19:27.343 firefox[10028:1207] -_continuousScroll is deprecated for NSScrollWheel. Please use -hasPreciseScrollingDeltas.
2011-08-12 21:19:27.344 firefox[10028:1207] -deviceDeltaY is deprecated for NSScrollWheel. Please use -scrollingDeltaY.
2011-08-12 21:19:27.345 firefox[10028:1207] -deviceDeltaX is deprecated for NSScrollWheel. Please use -scrollingDeltaX.
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
0x00007fff8cec345a in ___trackSwipeWithScrollEvent_block_invoke_1 ()
(gdb) where
#0 0x00007fff8cec345a in ___trackSwipeWithScrollEvent_block_invoke_1 ()
#1 0x00007fff8cb3d6c0 in _NSSendEventToObservers ()
#2 0x00007fff8cb3b939 in -[NSApplication sendEvent:] ()
#3 0x0000000101cd370b in -[GeckoNSApplication sendEvent:] (self=0x10420ab20, _cmd=<value temporarily unavailable, due to optimizations>, anEvent=0x13c553e80) at /Users/bgirard/mozilla/mozilla-central/tree/widget/src/cocoa/nsAppShell.mm:192
Comment 12•13 years ago
|
||
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
0x00007fff82ad7390 in objc_msgSend_vtable14 ()
(gdb) where
#0 0x00007fff82ad7390 in objc_msgSend_vtable14 ()
#1 0x00007fff8b417800 in CFRelease ()
#2 0x00007fff8b43fd20 in -[__NSArrayM dealloc] ()
#3 0x00007fff82adb03c in (anonymous namespace)::AutoreleasePoolPage::pop ()
#4 0x00007fff8b4406a5 in _CFAutoreleasePoolPop ()
#5 0x00007fff8adf60d7 in -[NSAutoreleasePool drain] ()
#6 0x00007fff8cad347a in -[NSApplication run] ()
Comment 13•13 years ago
|
||
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
0x00007fff8cec3187 in ___trackSwipeWithScrollEvent_block_invoke_1 ()
(gdb) where
#0 0x00007fff8cec3187 in ___trackSwipeWithScrollEvent_block_invoke_1 ()
#1 0x00007fff8cb3d6c0 in _NSSendEventToObservers ()
#2 0x00007fff8cb3b939 in -[NSApplication sendEvent:] ()
#3 0x0000000101cd370b in -[GeckoNSApplication sendEvent:] (self=0x104522a70, _cmd=<value temporarily unavailable, due to optimizations>, anEvent=0x13d013db0) at /Users/bgirard/mozilla/mozilla-central/tree/widget/src/cocoa/nsAppShell.mm:192
Comment 14•13 years ago
|
||
After fixing the block copy I crash on the following more descriptive trace:
firefox(95686,0x7fff72bdb960) malloc: *** error for object 0x14798ada0: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Breakpoint 1, 0x00007fff8aaed6c0 in malloc_error_break ()
(gdb) where
#0 0x00007fff8aaed6c0 in malloc_error_break ()
#1 0x00007fff8aaed805 in free ()
#2 0x00007fff82adad53 in object_dispose ()
#3 0x00007fff8b43fe16 in -[NSObject dealloc] ()
#4 0x00007fff8ceb9f14 in -[_NSEventObserver dealloc] ()
#5 0x00007fff8b417800 in CFRelease ()
#6 0x00007fff8b43fd20 in -[__NSArrayM dealloc] ()
#7 0x00007fff82adb03c in (anonymous namespace)::AutoreleasePoolPage::pop ()
#8 0x00007fff8b4406a5 in _CFAutoreleasePoolPop ()
#9 0x00007fff8adf60d7 in -[NSAutoreleasePool drain] ()
#10 0x00007fff8cad347a in -[NSApplication run] ()
Comment 15•13 years ago
|
||
I have no idea whether that might help any, but have you tried adding *mSwipeAnimationCancelled = YES; mSwipeAnimationCancelled = nil;
to -[ChildView dealloc]?
Comment 16•13 years ago
|
||
Actually, scratch that, that ChildView probably won't be released before the window closes.
Comment 17•13 years ago
|
||
(In reply to Benoit Girard (:BenWa) from comment #14)
> After fixing the block copy
How?
Comment 18•13 years ago
|
||
From what I gather this change is required but is no sufficient to fix the crash (don't have a proper patch handy sorry):
in nsChildView.mm
- usingHandler:^(CGFloat gestureAmount, NSEventPhase phase, BOOL isComplete, BOOL *stop) {
+ usingHandler:[^(CGFloat gestureAmount, NSEventPhase phase, BOOL isComplete, BOOL *stop) {
*stop = YES;
return;
}
if (isComplete) {
if (gestureAmount) {
nsSimpleGestureEvent geckoEventCopy(geckoEvent);
if (gestureAmount > 0) {
geckoEventCopy.direction |= nsIDOMSimpleGestureEvent::DIRECTION_LEFT;
} else {
geckoEventCopy.direction |= nsIDOMSimpleGestureEvent::DIRECTION_RIGHT;
}
mGeckoChild->DispatchWindowEvent(geckoEventCopy);
}
mSwipeAnimationCancelled = nil;
}
- }];
+ } copy]];
Otherwise the block will get release when maybeTrackScrollEventAsSwipe returns. So by copying the blocks we ensure that it will survive when the method return and will be owned by trackSwipeEventWithOptions.
Comment 19•13 years ago
|
||
Interesting. I really need to read up on how blocks work.
Another idea (which is probably completely irrelevant again, but might be worth trying): Instead of copying geckoEvent from the scope, how about retaining the NSEvent and creating a fresh geckoEvent from the NSEvent inside the block?
All quick block overviews I've read say something like "blocks handle memory management of non-__block scope variables, don't worry about it, it's magic". Since geckoEvent is a C++ object, maybe something goes wrong there... even though the stack trace doesn't really point to it.
Comment 20•13 years ago
|
||
Comment 21•13 years ago
|
||
(In reply to Markus Stange from comment #19)
> Interesting. I really need to read up on how blocks work.
>
> Another idea (which is probably completely irrelevant again, but might be
> worth trying): Instead of copying geckoEvent from the scope, how about
> retaining the NSEvent and creating a fresh geckoEvent from the NSEvent
> inside the block?
>
> All quick block overviews I've read say something like "blocks handle memory
> management of non-__block scope variables, don't worry about it, it's
> magic". Since geckoEvent is a C++ object, maybe something goes wrong
> there... even though the stack trace doesn't really point to it.
At this point I don't believe this is such a far fetch idea :). After seeing the compile segfault by adding a __block keyword I am worried that we are running into a compiler bug.
Have you tried your patch without/with my block copy changes?
Assignee | ||
Comment 22•13 years ago
|
||
In my tests I find that Benoit's patch (sending 'copy' to the
trackingHandler) doesn't help -- I crash just as often.
I also suspect Markus' patch won't help, either (though I'll also try
it out).
I have a hunch that I'll be playing out over the course of the day:
OS code is 'autorelease'ing some object when it's unsafe to release
it. This doesn't cause trouble in Safari, or when geckoEventCopy
isn't dispatched from the trackingHandler. But when geckoEventCopy
*is* dispatched, there are sometimes nested calls to process native
events, some of which can cause the autorelease pool to be drained.
This can cause the autoreleased object to be released "early".
Right now I suspect (from the evidence of comment #14) that the
autoreleased object is an _NSEventObserver object (or possibly another
object that holds a reference to an _NSEventObserver object -- which
makes sense because an _NSEventObserver object holds a reference to a
block (which is the actual observer).
Assignee | ||
Comment 23•13 years ago
|
||
I *don't* think the autoreleased object is the trackingHandler block itself -- otherwise Benoit's copy patch would have worked.
Comment 24•13 years ago
|
||
(In reply to Benoit Girard (:BenWa) from comment #21)
> Have you tried your patch without/with my block copy changes?
No, as I said, I'll only get back to my 10.7 machine next week, so I can't test anything at the moment. I'm on my Macbook at the moment which runs 10.6 (and downloading Lion will take days over here...).
(In reply to Steven Michaud from comment #22)
As convoluted as it is, that hypothesis sounds pretty plausible :)
Assignee | ||
Comment 25•13 years ago
|
||
Another thing I've noticed:
Moving the geckoEventCopy constructor to the top of trackingHandler makes it substantially harder to crash. I assume this is because OS code makes an immutable copy of geckoEvent (to pass to the geckoEventCopy constructor) only on first access, and putting it at the top of trackingHandler ensures that this always happens before maybeTrackSwipeEvent has returned.
Doing this doesn't eliminate *all* crashes, though.
Comment 27•13 years ago
|
||
Have you tried running with Zombies turned on and the Allocations instrument running? That should reveal more about what exactly is being over released, and should even give you a history of all the stacks that retained/released it.
Assignee | ||
Comment 28•13 years ago
|
||
I've tried running with libgmalloc, but not yet with NSZombies.
libgmalloc didn't tell me anything new.
I've made some progress -- I've found that blocking -[_NSLocalEventObserver dealloc] stops the crashes from happening. But I haven't yet found exactly why dealloc is sometimes called on an _NSLocalEventObserver object that's already been deleted.
(To stop all the crashes, you also need to do as I suggest in comment #25.)
Assignee | ||
Comment 29•13 years ago
|
||
> But I haven't yet found exactly why dealloc is sometimes called on
> an _NSLocalEventObserver object that's already been deleted.
Now I've found out more:
+[NSEvent removeMonitor:] can be called twice on the same
_NSLocalEventObserver object -- which (when it happens) causes a
crash. I should be able to work around this by hooking calls to
+[NSEvent addLocalMonitorForEventsMatchingMask:handler:] and +[NSEvent
removeMonitor:] and maintaining a local copy of the sEventObservers
hashtable (containing just those objects of class
_NSLocalEventObserver).
Of course it'd be nice to know the reason for the second call to
+[NSEvent removeMonitor:] -- and I'm going to try to find out. But
that's not absolutely necessary.
Assignee | ||
Comment 30•13 years ago
|
||
Here's a patch that fixes this bug!
It does what I described in comment #25 and comment #29.
A tryserver build should be available in a few hours.
I tested my patch by swiping rapidly between the pages listed in
comment #1. Here they are again:
www.techcrunch.com
www.engadget.com
slashdot.org
techmeme.com/river
news.ycombinator.com
I also sometimes swiped back and forth while going through the list.
I did this for three minutes straight without seeing any crashes.
Attachment #553794 -
Flags: review?(bgirard)
Updated•13 years ago
|
Attachment #553794 -
Flags: review?(bgirard) → review+
Comment 31•13 years ago
|
||
(In reply to Colin Barrett [:cbarrett] from comment #27)
> Have you tried running with Zombies turned on and the Allocations instrument
> running? That should reveal more about what exactly is being over released,
> and should even give you a history of all the stacks that retained/released
> it.
I had tried it but didn't get any useful information and it crashes quickly. Instruments has been very useful but it hasn't been stable for me with the latest xcode updates.
(In reply to Steven Michaud from comment #30)
> Created attachment 553794 [details] [diff] [review]
I spent 20 mins trying out this patch with a magic mouse and have seen no crash report.
It looks to me were are still missing something. I tried compiling a reduced example of what we have in that file:
struct foo {
foo();
foo(const foo&);
void bar() const;
};
typedef void (^bar)();
void g(bar x);
void f() {
foo x;
g(^{x.bar();});
}
I will attach the IL produce by clang, some points that are interesting:
* The copy constructor to foo is called, so "x" in the block is a local copy, no need to do that manually.
* G is passed a stack allocated struct for the block
I am currently building firefox. I will try to reproduce the bug with parts of this patch backed out and see what we get.
Comment 34•13 years ago
|
||
Comment on attachment 553794 [details] [diff] [review]
Fix
(In reply to Steven Michaud from comment #30)
> Created attachment 553794 [details] [diff] [review]
> Fix
>
> Here's a patch that fixes this bug!
Great!
>+ // Make a dummy call to trackingHandler to ensure that it's called at least
>+ // once before maybeTrackScrollEventAsSwipe:scrollOverflow: returns, so that
>+ // geckoEvent is still in scope when geckoEventCopy is constructed from it.
I still think holding on to the NSEvent would be a better solution than copying the geckoEvent.
(Another thing I've thought of: The retains and releases that I added in attachment 553481 [details] [diff] [review] maybe aren't even necessary because the block should already do that on its own.)
Assignee | ||
Comment 35•13 years ago
|
||
> I still think holding on to the NSEvent would be a better solution
> than copying the geckoEvent.
Why? Holding on to the NSEvent could considerably extend its lifetime
(by as much as a second or two), and some of the fields in NSEvent are
pointers, which might in that time have become stale.
Assignee | ||
Comment 36•13 years ago
|
||
Part of my patch makes the following assumption (quoting from the
patch):
On first access to a local, non-__block variable in a block's
enclosing scope, an immutable copy is made of it for use within the
block.
Rafael tells me that the IL he posted in comment #23 shows this is
wrong. I'm in the process of figuring out how to read the IL
(http://llvm.org/docs/LangRef.html), and what it means.
Comment 37•13 years ago
|
||
(In reply to Steven Michaud from comment #35)
> > I still think holding on to the NSEvent would be a better solution
> > than copying the geckoEvent.
>
> Why? Holding on to the NSEvent could considerably extend its lifetime
> (by as much as a second or two), and some of the fields in NSEvent are
> pointers, which might in that time have become stale.
I don't think that can happen; at least the window pointer is strong. The context pointer is probably strong, too, but I couldn't verify that because it was always nil in my tests.
We have other long-lived NSEvents in widget land and haven't had problems with them, for example gLastDragMouseDownEvent, and ChildViewMouseTracker:: sLastMouseMoveEvent which I've just added in bug 678481.
Assignee | ||
Comment 38•13 years ago
|
||
(In reply to comment #37)
Yes, we may be able to get away with extending the life of the NSEvent. But I can't understand why you think doing this is "better". I think it's as ugly as sin :-)
Assignee | ||
Comment 39•13 years ago
|
||
And it's true that we have several other long-lived NSEvents ... but we actually *need* those :-)
Comment 40•13 years ago
|
||
I think it's better because copying Objective-C objects into blocks is well-tread ground and copying C++ objects is not. We also wouldn't need the "call the block while gecko event is still in scope" workaround that your current patch uses (or would we? I don't know).
Assignee | ||
Comment 41•13 years ago
|
||
That's what copy constructors are for ... and there is one.
But the whole thing may be moot -- I suspect that, once I understand the IL from comment #33 properly, this part of my patch will change quite a lot.
Comment 42•13 years ago
|
||
(In reply to Steven Michaud from comment #41)
> That's what copy constructors are for ... and there is one.
Sure. I'm slowly coming to agree that copying geckoEvent should work just as well.
Here's a small test program that I've used to play around with blocks, and it demonstrates that nothing we're doing here should be problematic.
Output is at the end.
The only notable thing I've found out is that [theBlock copy], which moves the block to the heap, also causes a copy of the CPPStruct object to live on the heap. But in hindsight that's not surprising at all.
Comment 43•13 years ago
|
||
I found http://www.friday.com/bbum/2009/08/29/blocks-tips-tricks/ very interesting because it explains why the [block copy] is necessary. It also says "The system provided APIs that take Blocks as arguments will copy said Blocks when warranted. For example, dispatch_async() will copy the passed in Blocks." and I'd find it slightly surprising if trackSwipeEvent... wouldn't make such a copy.
Comment 44•13 years ago
|
||
I've added some more output and found something which worries me:
// CPPStruct(0x7fff5fbfe5b0) copy construction from 0x10040f9b8
// CPPStruct(0x7fff5fbfe5b0) destruction
// block called with x = 0, &e: 0x7fff5fbfe5b0, e.mX: 5, [d retainCount]: 1
// CPPStruct(0x7fff5fbfe5b0) copy construction from 0x10040f9b8
// CPPStruct(0x7fff5fbfe5b0) destruction
// block called with x = 1, &e: 0x7fff5fbfe5b0, e.mX: 5, [d retainCount]: 1
It looks like the block is called with a destroyed object. Transferring this to our specific case, the object that geckoEvent points to inside the block might already have been destroyed.
Comment 45•13 years ago
|
||
(In reply to Benoit Girard (:BenWa) from comment #31)
> (In reply to Colin Barrett [:cbarrett] from comment #27)
> > Have you tried running with Zombies turned on and the Allocations instrument
> > running? That should reveal more about what exactly is being over released,
> > and should even give you a history of all the stacks that retained/released
> > it.
>
> I had tried it but didn't get any useful information and it crashes quickly.
> Instruments has been very useful but it hasn't been stable for me with the
> latest xcode updates.
Assuming you're on Lion, are you using 4.1 or the 4.2 betas? The 4.2 betas are quite bad.
Anyway, zombies at least should work just by setting the NSZombieEnabled environment variable to true. You won't get retain count stack traces though.
Assignee | ||
Comment 46•13 years ago
|
||
Just now I found out that only part of my patch is needed to fix these
crashes.
Because my changes to maybeTrackScrollEventAsSwipe:scrollOverflow:
made it so much harder for me to reproduce these crashes, I assumed
they were still required. But this isn't true -- the "other part" of
my patch fixes these crashes all by itself.
I tested my patch by rapidly doing two-finger swipes for five minutes
straight -- I didn't crash at all.
The wierdness that Markus found in how OS block-support code copies
C++ structures still needs to be addressed. But it turns out to be
irrelevant to this bug's crashes. I'll have more to say in a later
comment.
Attachment #553794 -
Attachment is obsolete: true
Attachment #554185 -
Flags: review?(bgirard)
Updated•13 years ago
|
Attachment #554185 -
Flags: review?(bgirard) → review+
Assignee | ||
Comment 47•13 years ago
|
||
Comment on attachment 554185 [details] [diff] [review]
Fix rev1 (remove unneeded code)
Landed on trunk:
http://hg.mozilla.org/mozilla-central/rev/2578bdcf32ee
Assignee | ||
Updated•13 years ago
|
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 48•13 years ago
|
||
My patch for this bug is now in mozilla-central nightlies (starting with today's):
ftp://ftp.mozilla.org/pub/firefox/nightly/2011-08-19-03-07-49-mozilla-central/firefox-9.0a1.en-US.mac-shark.dmg
Please try it out, and let us know if you still see any crashes while swiping.
Assignee | ||
Comment 49•13 years ago
|
||
Apple just released OS X 10.7.1. But it doesn't fix this bug (testing with yesterday's mozilla-central nightly, which doesn't have my workaround).
Assignee | ||
Comment 50•13 years ago
|
||
Comment #48 gave the wrong link. Here's the correct one:
ftp://ftp.mozilla.org/pub/firefox/nightly/2011-08-19-03-07-49-mozilla-central/firefox-9.0a1.en-US.mac.dmg
(In reply to Markus Stange from comment #44)
> Created attachment 553900 [details]
> another small testcase
>
> I've added some more output and found something which worries me:
>
> // CPPStruct(0x7fff5fbfe5b0) copy construction from 0x10040f9b8
> // CPPStruct(0x7fff5fbfe5b0) destruction
> // block called with x = 0, &e: 0x7fff5fbfe5b0, e.mX: 5, [d retainCount]: 1
> // CPPStruct(0x7fff5fbfe5b0) copy construction from 0x10040f9b8
> // CPPStruct(0x7fff5fbfe5b0) destruction
> // block called with x = 1, &e: 0x7fff5fbfe5b0, e.mX: 5, [d retainCount]: 1
>
> It looks like the block is called with a destroyed object. Transferring this
> to our specific case, the object that geckoEvent points to inside the block
> might already have been destroyed.
I think you found a bug in gcc-4.2. With clang I get
2011-08-19 16:24:41.391 t[57666:707] RetainCountChangePrinter init
2011-08-19 16:24:41.392 t[57666:707] CPPStruct(0x7fff61ab8538) regular construction
2011-08-19 16:24:41.392 t[57666:707] CPPStruct(0x7fff61ab8530) copy construction from 0x7fff61ab8538
2011-08-19 16:24:41.393 t[57666:707] storing block and queuing for 2 calls
2011-08-19 16:24:41.394 t[57666:707] CPPStruct(0x7f9852912838) copy construction from 0x7fff61ab8530
2011-08-19 16:24:41.394 t[57666:707] RetainCountChangePrinter retain
2011-08-19 16:24:41.395 t[57666:707] RetainCountChangePrinter release
2011-08-19 16:24:41.395 t[57666:707] CPPStruct(0x7fff61ab8530) destruction
2011-08-19 16:24:41.396 t[57666:707] CPPStruct(0x7fff61ab8538) destruction
2011-08-19 16:24:41.895 t[57666:707] block called with x = 0, &e: 0x7f9852912838, e.mX: 5, [d retainCount]: 1
2011-08-19 16:24:42.398 t[57666:707] block called with x = 1, &e: 0x7f9852912838, e.mX: 5, [d retainCount]: 1
2011-08-19 16:24:42.399 t[57666:707] releasing block, retain count before release: 1
2011-08-19 16:24:42.400 t[57666:707] RetainCountChangePrinter release
2011-08-19 16:24:42.400 t[57666:707] RetainCountChangePrinter dealloc
2011-08-19 16:24:42.401 t[57666:707] CPPStruct(0x7f9852912838) destruction
Assignee | ||
Comment 52•13 years ago
|
||
> I think you found a bug in gcc-4.2. With clang I get
I'm not yet convinced of that. It seems like a copy of CPPStruct is made and destroyed for every call to NSLog that refers to it (or to its address). It's definitely destroyed before NSLog "prints" its output, but possibly not before the content of that output is generated.
I've been trying to find a way to pin this down.
Assignee | ||
Comment 53•13 years ago
|
||
Also, I though that, with XCode 4.1 (on OS X 10.7.X), gcc is a wrapper for clang.
Assignee | ||
Comment 54•13 years ago
|
||
> Also, I though that, with XCode 4.1 (on OS X 10.7.X), gcc is a wrapper for
> clang.
But Markus has been working on 10.6, so this (even if true) may not be relevant.
Assignee | ||
Comment 55•13 years ago
|
||
> Also, I though that, with XCode 4.1 (on OS X 10.7.X), gcc is a
> wrapper for clang.
However this may turn out, on OS X 10.7.1 I get the same results as
Rafael when compiling Markus' 2nd testcase (from comment #44) when
compiling with "clang test.mm -lstdc++ -framework Cocoa -o test".
(And the same results as Markus when compiling with "g++ test.mm
-framework Cocoa -o test".)
So even if (in XCode 4.1) g++ is on some level a wrapper for clang,
"g++" and "clang" seem to work at least slightly differently in XCode
4.1.
Assignee | ||
Comment 56•13 years ago
|
||
>> I think you found a bug in gcc-4.2.
>
> I'm not yet convinced of that.
*Now* I'm convinced :-(
Assignee | ||
Comment 57•13 years ago
|
||
The puzzle, though, is why this doesn't seem to cause any crashes.
Maybe the block-support code, though it definitely calls CPPStruct's destructor prematurely, somehow holds on to the memory occupied by CPPStruct.
(In reply to Steven Michaud from comment #53)
> Also, I though that, with XCode 4.1 (on OS X 10.7.X), gcc is a wrapper for
> clang.
No, in xcode 4.1 "gcc" is llvm-gcc and "gcc-4.2" is the old Apple gcc.
(In reply to Steven Michaud from comment #57)
> The puzzle, though, is why this doesn't seem to cause any crashes.
>
> Maybe the block-support code, though it definitely calls CPPStruct's
> destructor prematurely, somehow holds on to the memory occupied by CPPStruct.
It might just be luck that the memory still points to valid contents. I will run you test on valgrind.
valgrind is not working on 10.7 :-(
Assignee | ||
Comment 61•13 years ago
|
||
(In reply to comment # 58)
Thanks for the info.
I've also found the following from bug 574346 comment #0:
> Note that LLVM-GCC is different to Clang:
> GCC-4.2 -> Parser (GCC) / Code Generator (GCC)
> LLVM-GCC -> Parser (GCC) / Code Generator (LLVM)
> Clang -> Parser (Clang-LLVM) / Code Generator (LLVM)
> valgrind is not working on 10.7 :-(
You might want to bug Julian Seward :-)
By the way, I've been trying to get Apple's compilers (gcc, llvm-gcc or clang) to dump C/C++ code showing how blocks are implemented, but I haven't been able to figure it out.
Do you know a way to do this?
I can get gcc to dump assembler and "RTL" ... but both are more difficult to read :-(
> You might want to bug Julian Seward :-)
I did :-)
> By the way, I've been trying to get Apple's compilers (gcc, llvm-gcc or
> clang) to dump C/C++ code showing how blocks are implemented, but I haven't
> been able to figure it out.
>
> Do you know a way to do this?
>
> I can get gcc to dump assembler and "RTL" ... but both are more difficult to
> read :-(
The example I posted is llvm IL, so you can use llvm-gcc or clang. The clang output should be easier to read in general and in the case of blocks (or other new features) more likely to be correct. What you want is
clang++ -S -emit-llvm -o test.ll test.cpp
or
g++ -S -emit-llvm -o test.ll test.cpp
I normally also add -Os to get a "cleaned up" IL. At -O0 the IL is quiet verbose.
Assignee | ||
Comment 63•13 years ago
|
||
I should have made my question more explicit:
I'm trying to find out where the bug is that causes the CPPStruct destructor to be called prematurely using g++ on 10.6 and 10.7. It might be in the C/C++ code that Apple's compilers (presumably) generate to support blocks, or it might be in the compilers themselves. That's why I want to see this C/C++ code, if possible.
Or maybe Apple's block support is generated at a lower level (in RTL using "traditional" gcc/g++ or IL using clang/clang++).
I'll keep digging.
I see.
Since both llvm-gcc (the one in xcode 4.1) and Apple's gcc 4.2 get it wrong, it is very likely that it is the frontend that is producing invalid gimple.
I tried a simple testcase:
-----------------
struct foo {
foo();
~foo();
foo(const foo&);
void bar() const;
};
typedef void (^bar)();
void g(bar x);
void f() {
foo x;
g(^{x.bar();});
}
---------------
With the llvm-gcc include in xcode 4.1 (a.k.a. gcc), I get the following:
...
call void @_ZN3fooD1Ev(%struct.foo* %x) nounwind
call void @_ZNK3foo3barEv(%struct.foo* %x) nounwind
....
note that foo::bar is being called with a destructed object. No such sequence appears with clang.
In the full file, the IL shows a call to the destructor:
call void @_ZN20nsSimpleGestureEventD1Ev(%struct.nsSimpleGestureEvent* %geckoEvent) ssp
Followed by a use (the construction of the copy):
call void @_ZN20nsSimpleGestureEventC1ERKS_(%struct.nsSimpleGestureEvent* %geckoEventCopy, %struct.nsSimpleGestureEvent* %geckoEvent) ssp
What the constructor does in the end is call _ZN13nsCOMPtr_baseD2Ev on the various members. This destructor is defined in nsCOMPtr.o.
That destructor checks it the raw pointer is null and call a virtual method (Release) in it.
In summary, while the base memory is not being freed (the storage is part of the block context), all the destructors are and they include virtual calls, so this looks really scary.
all the destructors are *called*...
I can try building the current llvm-gcc trunk to check if it is fixed, but I am not sure it is worth it.
gcc is gone from xcode 4.2 (I will double check if llvm-gcc is too).
Updated•13 years ago
|
Blocks: lion-compatibility
Updated•13 years ago
|
status-firefox6:
--- → affected
status-firefox7:
--- → affected
status-firefox8:
--- → affected
status-firefox9:
--- → fixed
The "llvm-gcc/gcc can call the destructor too early" bug has been report to Apple. The rdar number is 10007196.
The gcc crashing bug is now rdar 10010257. The reduced testcase is:
struct nsSimpleGestureEvent {
nsSimpleGestureEvent(int);
};
@implementation ChildView - (void)maybeTrackScrollEventAsSwipe:(void *)anEvent {
__attribute__((__blocks__(byref))) nsSimpleGestureEvent geckoEvent(1);
}
@end
Assignee | ||
Comment 70•13 years ago
|
||
(In reply to comment #65)
> _ZN20nsSimpleGestureEventD1Ev (aka
> nsSimpleGestureEvent::~nsSimpleGestureEvent())
> ...
> What the constructor does in the end is call _ZN13nsCOMPtr_baseD2Ev
> on the various members. This destructor is defined in nsCOMPtr.o.
>
> That destructor checks it the raw pointer is null and call a virtual
> method (Release) in it.
This doesn't make much sense.
If you change "what the constructor ..." to "what the destructor ..."
it's still not very clear, but it does start to make sense.
The problem is that an nsSimpleGestureEvent object, while not itself
an XPCOM object, has a number of member variables that *are* XPCOM
objects:
nsEvent.userType
nsEvent.target
nsEvent.currentTarget
nsEvent.originalTarget
nsGUIEvent.widget
nsMouseEvent_base.relatedTarget
Each of these objects gets destroyed (its destructor is called) when
its parent destructor is called. But the entire object is stored
inside the parent (an nsCOMPtr<> is not a pointer). And while
Release() is called on nsCOMPtr_base.mRawPtr (which *is* a pointer) if
it's non-null, this doesn't necessarily mean that the object pointed
to by mRawPtr is also released (if other objects also hold references
to it).
The trackingHandler block's prologue uses nsSimpleGestureEvent's copy
constructor to make a copy (of another good copy) of geckoEvent before
any of "our own" block code runs: First it uses "alloca" to allocate
sizeof(nsSimpleGestureEvent) bytes on the stack (which won't go out of
scope until the end of the trackingHandler block). Then it calls the
nsSimpleGestureEvent copy constructor to initialize the memory it just
allocated. To make the following discussion easier to understand,
I'll call this local copy prologueGeckoEvent.
But (and here's Apple's bug), immediately afterwards the prologue
calls nsSimpleGestureEvent's destructor (and therefore also all of its
superclasses' destructors) on prologueGeckoEvent. This doesn't free
the space occupied by prologueGeckoEvent, or by any of its nsCOMPtr
member variables. But each nsCOMPtr's destructor does get called, and
Release() does get called on each non-null nsCOMPtr_base.mRawPtr
(which stays non-null).
But even this doesn't, by itself, cause any trouble. It doesn't cause
any of the XPCOM objects pointed to by prologueGeckoEvent's nsCOMPtrs
to be released prematurely -- since the good copy from which
prologueGeckoEvent was created still exists, and stays "good" until
after the block is called for the last time. And so each of these
nsCOMPtrs' mRawPtrs remains valid -- whether or not it's null.
Assignee | ||
Comment 71•13 years ago
|
||
Comment #70 shows (I think) why Apple's "premature destructor" bug
doesn't cause current code (specifically my patch for bug 668953) to
crash, and why it's safe to continue using it.
But we definitely don't want to start using "blocks" freely in our OS
X code until we move to always and only using clang (instead of
gcc/g++ or llvm-gcc/llvm-g++).
I breathe a sigh of relief that the destructors for nsEvent and its
subclasses mostly don't do anything, and that nsCOMPtr objects behave
the way they do.
> This doesn't make much sense.
>
> If you change "what the constructor ..." to "what the destructor ..."
> it's still not very clear, but it does start to make sense.
Yes, sorry. I was typing while reading the IL. The summary is that the only interesting thing the destructor does is call the virtual Release in the member variable that are smart pointers.
> The trackingHandler block's prologue uses nsSimpleGestureEvent's copy
> constructor to make a copy (of another good copy) of geckoEvent before
> any of "our own" block code runs: First it uses "alloca" to allocate
> sizeof(nsSimpleGestureEvent) bytes on the stack (which won't go out of
> scope until the end of the trackingHandler block). Then it calls the
> nsSimpleGestureEvent copy constructor to initialize the memory it just
> allocated. To make the following discussion easier to understand,
> I'll call this local copy prologueGeckoEvent.
correct.
> But (and here's Apple's bug), immediately afterwards the prologue
> calls nsSimpleGestureEvent's destructor (and therefore also all of its
> superclasses' destructors) on prologueGeckoEvent. This doesn't free
> the space occupied by prologueGeckoEvent, or by any of its nsCOMPtr
> member variables. But each nsCOMPtr's destructor does get called, and
> Release() does get called on each non-null nsCOMPtr_base.mRawPtr
> (which stays non-null).
Check.
> But even this doesn't, by itself, cause any trouble. It doesn't cause
> any of the XPCOM objects pointed to by prologueGeckoEvent's nsCOMPtrs
> to be released prematurely -- since the good copy from which
> prologueGeckoEvent was created still exists, and stays "good" until
> after the block is called for the last time. And so each of these
> nsCOMPtrs' mRawPtrs remains valid -- whether or not it's null.
I haven't investigated the code past the calls to Release. Happy to know it is safe in the current state. I am still a bit nervous about it, since it is an unstable situation. A change to the destructor of the class, of a base class or of a member can make this unsafe :-(
Assignee | ||
Comment 73•13 years ago
|
||
> A change to the destructor of the class, of a base class or of a
> member can make this unsafe :-(
Yes :-(
The only reason I want to use blocks now is to fix bug 668953, for
which there's very high demand. Apple's new [NSEvent
trackSwipeEventWithOptions:...] uses blocks, and it seems to be the
best way to provide support for two-finger swipes on OS X Lion.
The nsCOMPtr code seems quite stable, and nsEvent and its subclasses
mostly don't have explicit destructors. So I think there's a good
chance we can get away with continuing to use [NSEvent
trackSwipeEventWithOptions:...] for the next year or so, despite
Apple's bug.
However, since the premature destructor bug doesn't exist in clang, it
seems unlikely that Apple's going to fix it in gcc/g++ or
llvm-gcc/llvm-g++. So if we want to keep using [NSEvent
trackSwipeEventWithOptions:...] (or blocks in general), we'll almost
certainly need to switch to using clang on OS X.
As far as I know, the biggest obstacle to switching to clang is that
we need to continue supporting OS X Leopard (10.5.X) for a while. But
Apple will soon have released its last security update for Leopard (if
it hasn't done so already), and we'll probably want to drop our own
support (on the trunk) within the next 12 months or so.
So maybe we should plan to switch to clang at the same time we drop
support for Leopard.
(In reply to Steven Michaud from comment #73)
> > A change to the destructor of the class, of a base class or of a
> > member can make this unsafe :-(
>
> Yes :-(
>
> The only reason I want to use blocks now is to fix bug 668953, for
> which there's very high demand. Apple's new [NSEvent
> trackSwipeEventWithOptions:...] uses blocks, and it seems to be the
> best way to provide support for two-finger swipes on OS X Lion.
>
> The nsCOMPtr code seems quite stable, and nsEvent and its subclasses
> mostly don't have explicit destructors. So I think there's a good
> chance we can get away with continuing to use [NSEvent
> trackSwipeEventWithOptions:...] for the next year or so, despite
> Apple's bug.
Agreed, hopefully we will be able to switch to clang before this breaks.
> However, since the premature destructor bug doesn't exist in clang, it
> seems unlikely that Apple's going to fix it in gcc/g++ or
> llvm-gcc/llvm-g++. So if we want to keep using [NSEvent
> trackSwipeEventWithOptions:...] (or blocks in general), we'll almost
> certainly need to switch to using clang on OS X.
Yes, once we move out of vc 2005 gcc 4.2 will also be our oldest compiler.
> As far as I know, the biggest obstacle to switching to clang is that
> we need to continue supporting OS X Leopard (10.5.X) for a while. But
> Apple will soon have released its last security update for Leopard (if
> it hasn't done so already), and we'll probably want to drop our own
> support (on the trunk) within the next 12 months or so.
Clang works just fine targeting 10.5. The obstacles I know for a full switch are
* Clang -O0 produces programs that use too much stack, which causes oranges in the bots. We want to switch the bots to use optimizations anyway, so this should be fixed soon. (bug 669953)
* Infrastructure. We need to set up bots running clang. We already have a clang package installed, so we need to figure out with releng how hard it is (bug 629459).
> So maybe we should plan to switch to clang at the same time we drop
> support for Leopard.
I agree we need both, but they can be independent :-)
Assignee | ||
Comment 75•13 years ago
|
||
> Clang works just fine targeting 10.5.
Can you do clang builds while using the 10.5 SDK?
Work's in progress at bug 674655 to switch to using the 10.6 SDK even for builds that are supposed to be runnable on 10.5. And so far no problems have (apparently) been found. But, like Josh, I'm skeptical -- Apple has a history of neglecting backwards compatibility, and sometimes there's no substitute for linking to the actual target libraries (in this case the 10.5 libraries). We can't be sure this will work until it's undergone a *lot* of testing, including testing by users.
I have done builds some time ago using the 10.5 SDK. In fact, I did try build on the bots, so they use our current 10.6/10.5 sdk split for 64 and 32 bits.
I will try sending a new try request.
Assignee | ||
Comment 77•13 years ago
|
||
I've opened bug 681694 on the Apple premature destructor bug.
Comment 79•13 years ago
|
||
The crash signature of bug 673456 is #2 top browser crasher in 7.0b1 and #1 on Mac OS X only.
tracking-firefox7:
--- → ?
tracking-firefox8:
--- → ?
Updated•13 years ago
|
Crash Signature: [@ libobjc.A.dylib@0xb390 ]
[@ CoreFoundation@0x884e ] → [@ libobjc.A.dylib@0xb390 ]
[@ CoreFoundation@0x884e ]
[@ objc_msgSend_vtable14 | CFRelease | -[__NSArrayM dealloc] ]
Comment 81•13 years ago
|
||
Steven: [@ ___trackSwipeWithScrollEvent_block_invoke_1 ] is another new signature showing up on the 7.0 radar - I assume it may be related to this bug?
Assignee | ||
Comment 82•13 years ago
|
||
> Steven: [@ ___trackSwipeWithScrollEvent_block_invoke_1 ] is another new signature
> showing up on the 7.0 radar - I assume it may be related to this bug?
Yes, I'm sure it is.
Crash Signature: [@ libobjc.A.dylib@0xb390 ]
[@ CoreFoundation@0x884e ]
[@ objc_msgSend_vtable14 | CFRelease | -[__NSArrayM dealloc] ] → [@ libobjc.A.dylib@0xb390 ]
[@ CoreFoundation@0x884e ]
[@ objc_msgSend_vtable14 | CFRelease | -[__NSArrayM dealloc] ]
[@ ___trackSwipeWithScrollEvent_block_invoke_1 ]
Reporter | ||
Comment 83•13 years ago
|
||
Did we back this out for Aurora and Beta? The status is still "affected" for those trains.
Assignee | ||
Comment 84•13 years ago
|
||
> Did we back this out for Aurora and Beta?
Not this patch. We backed out the patch for bug 668953, which triggered this bug's crashes (because it introduced Apple's blocks to the tree).
> The status is still "affected" for those trains.
It shouldn't be.
Comment 85•13 years ago
|
||
(In reply to Steven Michaud from comment #84)
> > The status is still "affected" for those trains.
> It shouldn't be.
It's me that marked this bug as affected for 7.0 and 8.0, but after looking at crash reports, I see no occurences of these crash signatures in 7.0b2 or 8.0a2:
https://crash-stats.mozilla.com/report/list?signature=libobjc.A.dylib@0xb390
https://crash-stats.mozilla.com/report/list?signature=CoreFoundation@0x884e
https://crash-stats.mozilla.com/report/list?signature=objc_msgSend_vtable14%20|%20CFRelease%20|%20-[__NSArrayM%20dealloc]
https://crash-stats.mozilla.com/report/list?signature=___trackSwipeWithScrollEvent_block_invoke_1
status-firefox6:
affected → ---
status-firefox7:
affected → ---
status-firefox8:
affected → ---
status-firefox9:
fixed → ---
tracking-firefox7:
? → ---
tracking-firefox8:
? → ---
Target Milestone: --- → mozilla9
You need to log in
before you can comment on or make changes to this bug.
Description
•