Closed
Bug 479749
Opened 16 years ago
Closed 16 years ago
Crash with MozMill test when entering and leaving Private Browsing mode immediately [@ libobjc.A.dylib@0x15688][@ FrameIsInActiveWindow][@ nsNativeThemeCocoa::GetScrollbarDrawInfo]
Categories
(Core :: Widget: Cocoa, defect, P2)
Tracking
()
VERIFIED
FIXED
mozilla1.9.2a1
People
(Reporter: whimboo, Assigned: smichaud)
References
Details
(Keywords: crash, testcase, verified1.9.1, Whiteboard: [sg:critical?][mozmill])
Crash Data
Attachments
(5 files, 2 obsolete files)
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
jaas
:
review+
roc
:
superreview+
|
Details | Diff | Splinter Review |
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090222 Minefield/3.2a1pre
With my upcoming MozMill test for entering/leaving the Private Browsing mode (see bug 479314) I'm able to crash the browser reproducible. Following three lines seem to be the cause:
controller.click(new elementslib.Elem(controller.menus['tools-menu'].privateBrowsingItem));
controller.sleep(1000);
controller.click(new elementslib.Elem(controller.menus['tools-menu'].privateBrowsingItem));
As you can see I enter the PB mode, wait 1000ms, and leave the PB mode again.
Crash report: bp-0206bd86-6c9d-4091-9e6c-d97872090223
Here the first 10 frames:
0 libobjc.A.dylib libobjc.A.dylib@0x15688
1 XUL nsNativeThemeCocoa::GetScrollbarDrawInfo widget/src/cocoa/nsNativeThemeCocoa.mm:1212
2 XUL nsNativeThemeCocoa::GetMinimumWidgetSize widget/src/cocoa/nsNativeThemeCocoa.mm:2182
3 XUL nsIFrame::AddCSSMinSize layout/xul/base/src/nsBox.cpp:735
4 XUL nsBoxFrame::GetMinSize layout/xul/base/src/nsBoxFrame.cpp:873
5 XUL nsBoxFrame::GetPrefSize layout/xul/base/src/nsBoxFrame.cpp:824
6 XUL nsSprocketLayout::GetPrefSize layout/xul/base/src/nsSprocketLayout.cpp:1337
7 XUL nsBoxFrame::GetPrefSize layout/xul/base/src/nsBoxFrame.cpp:818
8 XUL nsSliderFrame::GetPrefSize layout/xul/base/src/nsSliderFrame.cpp:1107
9 XUL nsSprocketLayout::GetPrefSize layout/xul/base/src/nsSprocketLayout.cpp:1337
10 XUL nsBoxFrame::GetPrefSize layout/xul/base/src/nsBoxFrame.cpp:818
Comment 1•16 years ago
|
||
Henrik, does it happen only with the private browsing menu item, or is it reproducible with all other items?
Comment 2•16 years ago
|
||
Also, FWIW you can switch to using the shortcut keys for now if you want to proceed with the Mozmill test until this gets fixed.
Reporter | ||
Comment 3•16 years ago
|
||
Reporter | ||
Comment 4•16 years ago
|
||
This could be security related. Continuing twice after the assertion results in a bad access:
(gdb) c
Continuing.
###!!! ASSERTION: XPConnect is being called on a scope without a 'Components' property!: 'Error', file /data/mozilla/build/mozilla-central/mozilla/js/src/xpconnect/src/xpcwrappednativescope.cpp, line 764
Program received signal SIGTRAP, Trace/breakpoint trap.
0x947b5e42 in __kill ()
(gdb) c
Continuing.
Reading symbols for shared libraries . done
++WEBSHELL 0x1e63b240 == 15
++DOMWINDOW == 32 (0x1e63bbf0) [serial = 36] [outer = 0x0]
++DOMWINDOW == 33 (0x1e63f300) [serial = 37] [outer = 0x1e63bbc0]
WARNING: Unable to test style tree integrity -- no content node: file /data/mozilla/build/mozilla-central/mozilla/layout/base/nsCSSFrameConstructor.cpp, line 9043
WARNING: GetGeckoKeyCodeFromChar has failed.: file /data/mozilla/build/mozilla-central/mozilla/widget/src/cocoa/nsChildView.mm, line 4350
WARNING: Unable to test style tree integrity -- no content node: file /data/mozilla/build/mozilla-central/mozilla/layout/base/nsCSSFrameConstructor.cpp, line 9043
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000020
0x94586688 in objc_msgSend ()
Group: core-security
Reporter | ||
Comment 5•16 years ago
|
||
I'll try to get a minimized testcase for tomorrow. Right now I haven't it tested with other menu items but I think it's PB mode related.
Assignee | ||
Comment 6•16 years ago
|
||
I guess the crash in bp-0206bd86-6c9d-4091-9e6c-d97872090223 is
actually taking place in FrameIsInActiveWindow(), and is caused by
calling objc_msgSend on a deleted 'win' object.
But we'd be a lot better off if we had a reproducible testcase.
Reporter | ||
Updated•16 years ago
|
Whiteboard: [sg:investigate]
Reporter | ||
Comment 8•16 years ago
|
||
With this test I'm able to crash Firefox each time we leave the PB mode. The following steps you have to do:
1. Create a fresh profile
2. Install MozMill 1.0.2 (https://addons.mozilla.org/de/firefox/addon/9018)
3. Open MozMill IDE and load test
4. Run test
If Firefox doesn't crash please close all windows and restart Firefox. Do the steps 3 and 4 again. No idea but probably it happens because of the "Know your rights" notification bar.
Assignee | ||
Comment 9•16 years ago
|
||
Here's a gdb stack trace that shows my guess from comment #6 is
correct.
I can easily reproduce Henrik's STR from comment #8, though it always
takes me two tries. (I do the STR once (on a "new" build) and get no
crash. Then I close all windows, restart Firefox, and go through the
STR again (with the same build) -- this time I always crash.)
I quickly found a simple fix. But I became dissatisfied with it and
have been trying to dig deeper. Now I think I have a better fix, and
I'll post a patch tomorrow.
As you'll see, this is primarily a Cocoa widgets bug. But it seems to
be triggered by the contents of a window being destroyed in an unusual
order -- the crashes happen when an nsCocoaWindow object is destroyed
before the nsChildView objects it contains.
Updated•16 years ago
|
Whiteboard: [sg:investigate] → [sg:critical?]
Assignee | ||
Comment 10•16 years ago
|
||
Here's my patch -- the one that fixes the crash Henrik reported and
for which I posted a gdb stack trace in comment #9.
A tryserver build will follow in a while.
But we're not out of the woods yet. Now that the first crash is
cleared up, I'm getting a second, unrelated crash (still using
Henrik's STR from comment #8). I'll post a gdb trace of the second
crash in my next comment.
Sounds like MozMill will keep us busy for quite a while :-)
Assignee | ||
Comment 11•16 years ago
|
||
Here's a gdb stack trace (with debug symbols) of this bug's second
crash. As you can see, it's also from accessing a deleted object (in
this case an NSView object -- the mWidget object associated with an
nsThebesDeviceContext object).
Turns out this crash is a little harder to reproduce than the first
one (though only a little): Before you use MozMill to run the
test_Crash.js test, you need to load some bug page at
bugzilla.mozilla.org -- this bug's page works just fine.
Assignee | ||
Comment 12•16 years ago
|
||
OK, this patch seems to do the trick. I've tested it with a window
containing several very busy pages loaded into different tabs, and I
didn't see any problems.
I haven't (yet) tested it in a debug build. And I suspect it won't
fix the assertion mentioned in comment #4 ... but that's probably not
a Cocoa widgets problem.
(I also suspect that other widgets implementations may have trouble
with this bug's STR.)
A tryserver build will follow in a while (probably quite a long
while).
Henrik, please test this patch as hard as you can, and let us know if
there's anything I missed.
Attachment #363901 -
Attachment is obsolete: true
Attachment #363923 -
Flags: review?(joshmoz)
Reporter | ||
Comment 13•16 years ago
|
||
(In reply to comment #10)
> Sounds like MozMill will keep us busy for quite a while :-)
That's great. I'll provide more of those tests. Believe me.
(In reply to comment #12)
> Henrik, please test this patch as hard as you can, and let us know if
> there's anything I missed.
Will do and even check it on Windows and Linux. I should better file new bugs if it can be seen on other platforms too, right?
Assignee: joshmoz → smichaud
Status: NEW → ASSIGNED
Reporter | ||
Updated•16 years ago
|
Flags: blocking1.9.1?
Assignee | ||
Comment 14•16 years ago
|
||
Comment on attachment 363923 [details] [diff] [review]
Fix for Cocoa-widgets-specific problems
Comment on my own patch:
This:
mContext->Init(nsnull);
should really be this:
if (mContext)
mContext->Init(nsnull);
I'll fix it in what I land (if I haven't done so in an earlier
revision).
Assignee | ||
Comment 15•16 years ago
|
||
Side note (about nsCocoaWindow/nsChildView parents and children):
I saw in my tests for this bug that (as best I can tell) an
nsCocoaWindow object can never have nsChildView children
(nsCocoaWindow::GetFirstChild() and nsCocoaWindow::GetLastChild() will
never return an nsChildView object). And an nsChildView object can
never have an nsCocoaWindow parent (nsChildView::GetParent() will
never return an nsCocoaWindow object).
If true, this is puzzling. And I don't think it's always been true.
Josh and/or Roc, do you have any idea what this is all about?
The following code exists at the start of the nsCocoaWindow
destructor:
// notify the children that we're gone
for (nsIWidget* kid = mFirstChild; kid; kid = kid->GetNextSibling()) {
nsCocoaWindow* childWindow = static_cast<nsCocoaWindow*>(kid);
childWindow->mParent = nsnull;
}
Notice that it assumes any child of an nsCocoaWindow object will also
be an nsCocoaWindow object. When I saw this (on my way through
tracking down the first crash), I figured this code must be wrong, and
that I could fix the crash by adding code that made 'kid' an
nsChildView object if its nsWindowType was 'eWindowType_child'.
But it didn't fix the crash, and in fact was never exercised.
Then I noticed that, in nsNativeThemeCocoa.mm's
NativeWindowForFrame(), topLevelWidget is always an nsChildView
object. Which means that it isn't really a "top level widget", as far
as I can tell.
Assignee | ||
Comment 16•16 years ago
|
||
(In reply to comment #13)
> I should better file new bugs if it can be seen on other platforms
> too, right?
Yes, I think that would be best.
Assignee | ||
Comment 17•16 years ago
|
||
I've now done some tests with debug builds -- an ordinary one (without
my patch), and a debug build containing my patch.
I don't see the assertion from comment #4 in either of them. So I
think it may be an unrelated problem.
Comment 18•16 years ago
|
||
(In reply to comment #7)
> Blake: Is the assertion in comment 4 dangerous?
No. It happens when you use an XPConnect property or object from a window that has been closed. We won't crash, although most properties will be missing, so things might not work as expected.
Assignee | ||
Comment 19•16 years ago
|
||
Here's a tryserver build made with my patch from comment #12:
https://build.mozilla.org/tryserver-builds/2009-02-24_11:30-smichaud@pobox.com-bugzilla479749-Cocoa/smichaud@pobox.com-bugzilla479749-Cocoa-firefox-try-mac.dmg
Reporter | ||
Comment 20•16 years ago
|
||
This tryserver build works fine. Running the crash test for several times doesn't crash the browser anymore. Builds without this patch on Windows and Linux aren't affected. So it qualifies for an OS X only bug. Thanks Steven!
Reporter | ||
Updated•16 years ago
|
Summary: Crash with MozMill test when entering and leaving Private Browsing mode immediately [@ libobjc.A.dylib@0x15688][@ nsNativeThemeCocoa::GetScrollbarDrawInfo] → Crash with MozMill test when entering and leaving Private Browsing mode immediately [@ libobjc.A.dylib@0x15688][@ FrameIsInActiveWindow][@ nsNativeThemeCocoa::GetScrollbarDrawInfo]
Reporter | ||
Updated•16 years ago
|
Whiteboard: [sg:critical?] → [sg:critical?][has patch][needs review josh]
Comment 21•16 years ago
|
||
Steven, your proposed patch looks alright if we do need to protect against such a destruction order, but I'm concerned that this might just be covering up another bug. It seems likely that destroying a widget before its children is the real bug and if widget implementations shouldn't have to protect against such a destruction order I'd rather not. It adds complexity and overhead and will cover up destruction order bugs in the future.
roc - can you weigh in on this?
Assignee | ||
Comment 22•16 years ago
|
||
Josh and Roc, please also look at my question in comment #15.
I would also have expected the children of nsCocoaWindow to be nsChildViews. Maybe no children are actually hooked up? This was added by Josh in bug 362952.
I'd like to know what situations cause the parent widget to be destroyed before its children, but it looks like we've been doing it for a long time --- isn't that what bug 362952 was about? --- so I'm inclined to take Steven's patch as-is.
The long-term solution is of course to get rid of child widgets so these issues will not arise.
Updated•16 years ago
|
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P2
Comment 24•16 years ago
|
||
Comment on attachment 363923 [details] [diff] [review]
Fix for Cocoa-widgets-specific problems
+ mContext->Init(nsnull);
We should probably do that in ::Destroy for the base widget since it is the base widget that contains mContext. In any case, the location of this null init strikes me as out-of-place, the product of a deeper design flaw. Either the base widget should be doing a better job cleaning up after itself or we need to do better null checking for the widget's validity in other places.
In "ensureWindowData" you're making sure mWindow is valid, then after a couple calls that wouldn't change validity there is another mWindow null check. You can remove the latter pre-existing null check.
Attachment #363923 -
Flags: review?(joshmoz)
Assignee | ||
Comment 25•16 years ago
|
||
>+ mContext->Init(nsnull);
>
> We should probably do that in ::Destroy for the base widget since it
> is the base widget that contains mContext.
Agreed. My new patch does this.
> In any case, the location of this null init strikes me as
> out-of-place, the product of a deeper design flaw. Either the base
> widget should be doing a better job cleaning up after itself or we
> need to do better null checking for the widget's validity in other
> places.
I'm not sure I understand. nsIWidget::Destroy() is exactly the place
for this kind of thing -- making sure resources we've passed to
external modules (in this case the (weak) pointer to ourselves used to
initialize an nsIDeviceContext) are "invalidated" when we're
destroyed.
> In "ensureWindowData" you're making sure mWindow is valid, then
> after a couple calls that wouldn't change validity there is another
> mWindow null check. You can remove the latter pre-existing null
> check.
Fixed.
Side note: When I moved mContext->Init(nsnull) to
nsBaseWidget::Destroy(), at first it seemed not to work (it no longer
fixed the second crash). Then I discovered that
nsChildView::Destroy() calls nsBaseWidget::OnDestroy() before
nsBaseWidget::Destroy() (and nsBaseWidget::OnDestroy nulls out
mContext).
This is wrong, and isn't how any of the other widgets implementations
work. All of them (except Cocoa widgets) use Destroy() to tear down
the widget, and then use OnDestroy() to signal that the job is done
(so that the base widget can release its own resources). In fact the
OS2 nsWindow destructor has a nice little comment explaining this: "A
call of Destroy() destroys the PM window. This triggers an
OnDestroy(), which frees resources."
I've changed nsChildView::Destroy() and nsCocoaWindow::Destroy()
accordingly.
A tryserver build will follow in a few hours.
Attachment #363923 -
Attachment is obsolete: true
Attachment #366029 -
Flags: review?(joshmoz)
Assignee | ||
Comment 26•16 years ago
|
||
Here's a tryserver build of my patch from comment #25:
https://build.mozilla.org/tryserver-builds/2009-03-06_15:10-smichaud@pobox.com-bugzilla479749-rev1/smichaud@pobox.com-bugzilla479749-rev1-firefox-try-mac.dmg
Comment 27•16 years ago
|
||
"I'm not sure I understand. nsIWidget::Destroy() is exactly the place for this kind of thing"
I was only objecting to the actual class it was in (nsChildView) - nsBaseWidget::Destroy might very well be fine.
Attachment #366029 -
Flags: review?(joshmoz) → review+
Whiteboard: [sg:critical?][has patch][needs review josh] → [sg:critical?][has patch]
Assignee | ||
Updated•16 years ago
|
Attachment #366029 -
Flags: superreview?(roc)
Attachment #366029 -
Flags: superreview?(roc) → superreview+
Assignee | ||
Comment 28•16 years ago
|
||
Landed on trunk:
http://hg.mozilla.org/mozilla-central/rev/253c445cc416
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Whiteboard: [sg:critical?][has patch] → [sg:critical?]
Reporter | ||
Comment 29•16 years ago
|
||
Verified fixed with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090310 Minefield/3.2a1pre. Thanks Steven!
How long should it bake on trunk before we wanna get this on 1.9.1?
Status: RESOLVED → VERIFIED
Target Milestone: --- → mozilla1.9.2a1
Assignee | ||
Comment 30•16 years ago
|
||
> How long should it bake on trunk before we wanna get this on 1.9.1?
A week?
> Thanks Steven!
You're most welcome.
Reporter | ||
Updated•16 years ago
|
Whiteboard: [sg:critical?] → [sg:critical?][needs baking on trunk]
Assignee | ||
Comment 31•16 years ago
|
||
Comment on attachment 366029 [details] [diff] [review]
Fix rev1 (follow Josh's suggestions)
This has now been baking on the trunk for a week, with no reported
problems.
Attachment #366029 -
Flags: approval1.9.1?
Assignee | ||
Comment 32•16 years ago
|
||
(Following up comment #31)
This patch is basically trivial -- the fix for the second crash
(comment #11) really is trivial, and the fix for the first crash
(comment #9) is little more than the extension of a previous fix (that
for bug 362952). So it's low risk.
MozMill tests cannot (yet) be automated.
Comment 33•16 years ago
|
||
The last part of this patch:
+ // Clear the device context's mWidget field -- otherwise it may get accessed
+ // after it's been deleted. See bug 479749.
+ if (mContext)
+ mContext->Init(nsnull);
caused bug 482928, which causes popups to appear on the wrong monitor after a tab is closed. In this case, the device context being cleared is still being used, and popups no longer have a widget from which to calculate the screen from.
Assignee | ||
Updated•16 years ago
|
Attachment #366029 -
Flags: approval1.9.1?
Assignee | ||
Comment 34•16 years ago
|
||
(In reply to comment #33)
So I suppose that part of the patch should be backed out.
This will partly reopen this bug -- the "second crash" from comment
#11 will start happening again. I don't know how often this crash
happens (apparently it's no longer possible to search in the top ten
crash stack entries at crash-stats.mozilla.com). But I suspect it's a
less serious problem the one this part of my patch has triggered.
At least the new problem is more visible.
I don't have any way to test with multiple monitors, so I'm not the
best person to try to come up with an alternative.
Assignee | ||
Comment 35•16 years ago
|
||
(Following up comment #34)
But it's odd that the code which decides which monitor a popup should
appear on relies on a widget that's already been Destroy()ed.
So I think that part of my patch just uncovered one or more other
bugs, and didn't itself cause this problem.
Comment 36•16 years ago
|
||
My understanding is that device contexts are shared between widgets. At least the view code suggests that widgets are initialized with contexts that come from the view manager.
Comment 37•16 years ago
|
||
And so it's odd that a device context that refers to a Destory()ed widget is still around.
Assignee | ||
Comment 38•16 years ago
|
||
The "second crash" happens because the device context's mWidget
field contains an NSView object that's been deleted.
Assignee | ||
Comment 39•16 years ago
|
||
> an NSView object
A ChildView object (which is a subclass of NSView).
Reporter | ||
Comment 40•16 years ago
|
||
So what actions are necessary now? Backing-out a part of the patch or coming with a follow-up?
Assignee | ||
Comment 41•16 years ago
|
||
Before anything else, we need to resolve the ambiguities WRT bug
482928.
Does bug 482928 happen on both the trunk and the 1.9.1 branch?
Henrik, are you once again able to reproduce bug 482928 anywhere (on
the trunk and/or the 1.9.1 branch)?
I can't help with this, since I can't test with multiple monitors.
Reporter | ||
Comment 42•16 years ago
|
||
Bug 482928 only happens on trunk and yes, I'm able to reproduce it constantly. There are already a lot of bug reports filed in the last days. But this should be handled in bug 482928. I just wondered about your statement if this should be backed-out. I'll watch bug 482928...
Assignee | ||
Comment 43•16 years ago
|
||
Henrik, what you're saying is still partly ambiguous:
Are you currently able to reproduce bug 482928 on the 1.9.1 branch?
That's directly relevant to this bug -- since if bug 482928 happens on
the 1.9.1 branch, bug 482928 wasn't triggered by my patch for this
bug.
As for partially backing out my patch for this bug (if it *did*
trigger bug 482928):
I made this suggestion (in comment #34) before I'd fully thought
through the problem. Shortly afterwards (in comment #35) I realized
that my patch, though it may have triggered bug 482928, almost
certainly didn't *cause* it -- instead, it seems to have uncovered one
or more other bugs that are the true cause of bug 482828.
As I said above, I don't have multiple monitors to test with. And I'm
currently taking time off from Mozilla Corp to get the Java Embedding
Plugin working on SnowLeopard.
But in a couple of weeks I should have more time to work on this bug
(and bug 482928). This may be needed ... if nobody else is
able/willing to try to track down the true cause(s) of bug 482928.
Reporter | ||
Comment 44•16 years ago
|
||
Steven, as I have said in comment 43, it only happens on trunk. So no, I'm not able to reproduce it on 1.9.1. Take your time...
Assignee | ||
Comment 45•16 years ago
|
||
> Steven, as I have said in comment 43, it only happens on trunk. So
> no, I'm not able to reproduce it on 1.9.1.
Sorry. What you said in comment #42 *wasn't* ambiguous. I somehow
missed the "only" :-(
> Take your time...
I will :-)
Comment 46•16 years ago
|
||
Let's get this landed and watch out for bug 482928 there as soon as it does.
Assignee | ||
Comment 47•16 years ago
|
||
Landed on the 1.9.1 branch:
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/41b607a57022
There's now a working patch for bug 482928. With luck it'll get landed on the 1.9.1 branch before the beta4 code freeze.
Assignee | ||
Updated•16 years ago
|
Keywords: fixed1.9.1
Whiteboard: [sg:critical?][needs baking on trunk] → [sg:critical?]
Reporter | ||
Comment 48•16 years ago
|
||
Verified fixed with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b4pre) Gecko/20090410 Shiretoko/3.5b4pre ID:20090410035055. Thanks Steven!
Keywords: fixed1.9.1 → verified1.9.1
Reporter | ||
Updated•15 years ago
|
Whiteboard: [sg:critical?] → [sg:critical?][mozmill]
Updated•13 years ago
|
Crash Signature: [@ libobjc.A.dylib@0x15688]
[@ FrameIsInActiveWindow]
[@ nsNativeThemeCocoa::GetScrollbarDrawInfo]
Updated•11 years ago
|
Group: core-security
Crash Signature: [@ libobjc.A.dylib@0x15688]
[@ FrameIsInActiveWindow]
[@ nsNativeThemeCocoa::GetScrollbarDrawInfo] → [@ libobjc.A.dylib@0x15688]
[@ FrameIsInActiveWindow]
[@ nsNativeThemeCocoa::GetScrollbarDrawInfo]
You need to log in
before you can comment on or make changes to this bug.
Description
•