Closed
Bug 470487
Opened 16 years ago
Closed 15 years ago
Firefox Crash [@ nsWindow::GetParentWindow(int)] & [@ nsBaseWidget::Destroy()]?
Categories
(Core :: Widget: Win32, defect, P2)
Tracking
()
VERIFIED
FIXED
mozilla1.9.2a1
Tracking | Status | |
---|---|---|
status1.9.2 | --- | beta1-fixed |
blocking1.9.1 | --- | .6+ |
status1.9.1 | --- | .6-fixed |
People
(Reporter: chofmann, Assigned: jimm)
References
Details
(4 keywords, Whiteboard: [sg:critical?][crashkill][crashkill-fix])
Crash Data
Attachments
(5 files, 5 obsolete files)
(deleted),
patch
|
vlad
:
review+
beltzner
:
approval1.9.1+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
neil
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
samuel.sidler+old
:
approval1.9.1.6+
|
Details | Diff | Splinter Review |
(deleted),
text/plain
|
Details |
Currently the #4 top crash in Firefox 3.1b2
0 xul.dll nsWindow::GetParentWindow widget/src/windows/nsWindow.cpp:1544
1 xul.dll nsWindow::GetParent widget/src/windows/nsWindow.cpp:1518
2 xul.dll nsBaseWidget::Destroy widget/src/xpwidgets/nsBaseWidget.cpp:260
3 xul.dll nsWindow::Destroy widget/src/windows/nsWindow.cpp:1416
4 xul.dll nsPluginInstanceOwner::Destroy layout/generic/nsObjectFrame.cpp:4030
5 xul.dll DoStopPlugin layout/generic/nsObjectFrame.cpp:1987
6 xul.dll nsStopPluginRunnable::Run layout/generic/nsObjectFrame.cpp:2024
7 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:510
8 xul.dll nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:170
9 nspr4.dll PR_GetEnv
10 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:87
11 firefox.exe firefox.exe@0x2197
12 kernel32.dll kernel32.dll@0x17066
not many comments to go on.
*replying to message in yahoo mail at the time it crashed. Had just hit reply
Updated•16 years ago
|
Component: XUL Widgets → Widget: Win32
Product: Toolkit → Core
QA Contact: xul.widgets → win32
Reporter | ||
Updated•16 years ago
|
Flags: blocking1.9.1?
Reporter | ||
Comment 1•16 years ago
|
||
recent changes around that code...
http://hg.mozilla.org/mozilla-central/log/7cd58efd035a/widget/src/windows/nsWindow.cpp
Updated•16 years ago
|
Severity: normal → critical
OS: Mac OS X → Windows XP
Summary: Firefox 3.1b2 Crash Report [@ nsWindow::GetParentWindow() → Firefox 3.1b2 Crash Report [@ nsWindow::GetParentWindow() ]
Version: unspecified → 1.9.1 Branch
Reporter | ||
Comment 2•16 years ago
|
||
there is also another similar stack that goes though
nsWindow::Destroy widget/src/windows/nsWindow.cpp:1416
I wonder if this is a related crash, or another bug that needs to be spun off?
This stack signature "nsBaseWidget::Destroy()" is currently ranked #7
0 xul.dll nsBaseWidget::Destroy widget/src/xpwidgets/nsBaseWidget.cpp:262
1 xul.dll nsWindow::Destroy widget/src/windows/nsWindow.cpp:1416
2 xul.dll nsPluginInstanceOwner::Destroy layout/generic/nsObjectFrame.cpp:4030
3 xul.dll DoStopPlugin layout/generic/nsObjectFrame.cpp:1987
4 xul.dll nsStopPluginRunnable::Run layout/generic/nsObjectFrame.cpp:2024
5 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:510
6 xul.dll nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:170
7 nspr4.dll PR_GetEnv
8 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:87
9 firefox.exe firefox.exe@0x2197
10 kernel32.dll kernel32.dll@0x17066
Summary: Firefox 3.1b2 Crash Report [@ nsWindow::GetParentWindow() ] → Firefox 3.1b2 Crash Report [@ nsWindow::GetParentWindow() ] & [@ nsBaseWidget::Destroy()?
Comment 3•16 years ago
|
||
I didn't change the actual logic of nsWindow::GetParentWindow which its old name is nsWindow::GetParent.
The crash timing is:
if (widget->mIsDestroying) {
this line.
So, looks like the parent widget is already destroyed.
Comment 4•16 years ago
|
||
The pointer to nsWindow is released from HWND by SubclassWindow(FALSE). It is called only in nsWindow::OnDestroy. nsWindow::Destroy and getting WM_CLOSE.
I guess that the parent window didn't get WM_CLOSE message but the instance was already released. Then, SubclassWindow(FALSE) is called via nsWindow::Destroy. But then, mWnd was already set to null, so, we fail to release the pointer to nsWindow from HWND.
Comment 5•16 years ago
|
||
(In reply to comment #4)
> The pointer to nsWindow is released from HWND by SubclassWindow(FALSE). It is
> called only in nsWindow::OnDestroy. nsWindow::Destroy and getting WM_CLOSE.
nsWindow::Destroy and getting WM_CLOSE are callers of nsWindow::OnDestroy.
Comment 6•16 years ago
|
||
It's strange... ::DestroyWindow automatically destroys its children. So, why can ::GetParent in nsWindow::GetParentWindow get the destroyed parent window after the parent nsWindow destroyed??
Reporter | ||
Comment 7•16 years ago
|
||
Masayuki, thanks for helping to investigate. I've cc'ed some others that have some worked in the windows widget code in the past to see if they have ideas.
Reporter | ||
Comment 8•16 years ago
|
||
any more ideas on this one?
This is actually more likely to be plugin related -- I know there were some bugs & changes about plugin destruction. Cc'ing jst to see if this looks similar..
Can we figure out what plugins are loaded for people who are crashing? Is it always yahoo-related? We've had problems with the yahoo state plugin in the past.
Comment 11•16 years ago
|
||
Given that this looks windows specific it's even more odd as on windows we reparent the plugins widget to the root widget (the desktop) when we do plugin destruction off of an event, which seems to be the case here.
I looked through a dozen or so crash reports, and only one of them had the yahoo state plugin dll loaded (npYState.dll), so it's likely not specific to yahoo.
It's also interesting to note that out of the 6886 crashes found by my query, 6675 happened on December 1st of 2008. So how much of a top-crash this is atm is not clear to me.
Reporter | ||
Comment 12•16 years ago
|
||
This query shows 851 crashes in the last week for users on beta2
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.1b2&query_search=signature&query_type=contains&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=nsWindow%3A%3AGetParentWindow%28%29
That ranks it as #7 top crash.
Glancing though a few of the module lists I've seen NPAskSBr.dll (the ask tool bar) show up in a couple of reports.
I think this should block.
Comment 14•16 years ago
|
||
Not blocking for now since it looks like we might have just seen a spike caused by other software/sites? If it persists in B3 as a topcrash please renom.
Flags: blocking1.9.1? → blocking1.9.1-
Comment 15•16 years ago
|
||
This ranks as the number 7 for Firefox 3.0.6. My guess is that it'll be highly present in b3 just as it was for b2.
Flags: wanted1.9.0.x+
Comment 16•16 years ago
|
||
This is currently #6 in the list of top crashes for Firefxo 3.1 beta 3. Renoming per comment 14.
Flags: blocking1.9.1- → blocking1.9.1?
Updated•16 years ago
|
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P2
Keywords: crash
Summary: Firefox 3.1b2 Crash Report [@ nsWindow::GetParentWindow() ] & [@ nsBaseWidget::Destroy()? → Firefox 3.1b2 Crash Report [@ nsWindow::GetParentWindow() ] & [@ nsBaseWidget::Destroy()]?
I've stared at this code and I can't see anything obviously wrong; we need to find steps to reproduce.
Keywords: qawanted
(In reply to comment #15)
> This ranks as the number 7 for Firefox 3.0.6. My guess is that it'll be highly
> present in b3 just as it was for b2.
Did this start in 3.0.6, or earlier? A regression range would be quite handy.
Comment 19•16 years ago
|
||
I've spent half a day trying different scenarios concerning Yahoo Mail and the Yahoo Mail extension and haven't been able to reproduce it whatsoever. The comments on the crash reports aren't helpful for this at all either. Has anyone been able to see this once or twice on their profile? I'd like to get some deatils as to what extensions, sites and operations/actions were used.
Reporter | ||
Comment 20•16 years ago
|
||
maybe between 3.0.4 and 3.0.5? searching the topcrasher lists for recent
3.0.x releases I see these in nsWindow code.
release rank stack
3.0.4 43 nsWindow::GetParent()
3.0.5 28 nsWindow::GetParentWindow()
3.0.4 17 nsWindow::GetParentWindow()
Reporter | ||
Comment 21•16 years ago
|
||
er, that should have been
maybe between 3.0.3 and 3.0.4? searching the topcrasher lists for recent
3.0.x releases I see these in nsWindow code.
release rank stack
3.0.3 43 nsWindow::GetParent()
3.0.4 17 nsWindow::GetParentWindow()
3.0.5 28 nsWindow::GetParentWindow()
3.0.6 20 nsWindow::GetParentWindow()
3.0.7 19 nsWindow::GetParentWindow()
Reporter | ||
Comment 22•16 years ago
|
||
and for the 1.9.1 related releases
release rank stack
3.1a2 nothing
3.1b1 7 nsWindow::GetParentWindow()
3.1b2 15 nsWindow::GetParentWindow()
3.1b3 9 nsWindow::GetParentWindow()
3.5b4pre 18 nsWindow::GetParentWindow()
Reporter | ||
Comment 23•16 years ago
|
||
so maybe some time last sept. looks like when it started showing up.
v3.0.3, released September 26, 2008 -- v3.0.4, released November 12, 2008
3.1 alpha2 Sept 5, 2008 --- v3.1 Beta 1, released October 14, 2008
Reporter | ||
Comment 24•16 years ago
|
||
at first glance the only thing I see landing around then that might be related is https://bugzilla.mozilla.org/show_bug.cgi?id=452385
any possibility of some bad interaction between the patches there and plugins?
Comment 25•16 years ago
|
||
Bug 452385 renamed nsWindow::GetParent() to nsWindow::GetParentWindow().
So this bug had occured in 3.0.3.
Reporter | ||
Comment 26•16 years ago
|
||
> re: comment 25
ok, that's helping to make a bit more sense of the pattern seen in the releases. I looked at the stack traces in 3.0.3 and they look basically the same as comment 0.
one thing looking at this bug shows is an increase in the ranking after 3.0.4, and from no visability to being ranked in 3.1 beta1. the increase in visability in 3.1 beta1 may just be related to more users ramping up on 3.1 when it hit the first beta. It does appear that that it takes a large user pool to hit this problem.
Is it possible the fixes that went in for bug 452385 reduced the chances of bug being surfaced in the bookmark UI case, but increased the chances in plugin window popups or some other windowing cases, or that some hangs for remaining edge cases in that bug have been turned into crashes?
Comment 27•16 years ago
|
||
(In reply to comment #26)
> one thing looking at this bug shows is an increase in the ranking after 3.0.4,
> and from no visability to being ranked in 3.1 beta1. the increase in
> visability in 3.1 beta1 may just be related to more users ramping up on 3.1
> when it hit the first beta. It does appear that that it takes a large user
> pool to hit this problem.
This is likely related to the fact that crash-stats top crash reports don't "freeze", so they're still based on current data (last 2 weeks or whatever). There's likely less users on 3.0.3 than on 3.1b1 or something weird like that...
Comment 28•16 years ago
|
||
A random thought... Can someone who can reproduce this bug (can anyone?) please try removing the mIsDestroying and mOnDestroyCalled variables from windows/nsWindow.h (they already exist in nsBaseWidget). Duplicate variables are never a good idea. They've been there for a long time, but maybe something is now tickling the classes into using different versions?
Yeah, I don't think we have a way to reproduce it.. the fix you suggest is probably a good one in any case, though.
This is still sitting on qawanted -- given that noone can reproduce this, it would be helpful for QA to:
- try to find a way to reproduce, by looking at crash info, perhaps by contacting users or some other type of outreach (blog post saying "hey, if you crashed, and you have an about:crashes log that looks like this, get a hold of us");
- get this in the debugger for a developer to look at;
- try to find a regression range.
Cc'ing Jim as well -- Jim, could you take a look at jeremy's suggestion on comment #28 and fix up those variables?
Comment 31•16 years ago
|
||
http://crashreports.songbirdnest.com/report/list?range_unit=months&query_search=signature&query_type=contains&signature=nsWindow%3A%3AGetParentWindow%28%29&query=nsWindow%3A%3AGetParentWindow%28%29&range_value=1
Same signature. There are a couple of reports with comments.
Comment 32•16 years ago
|
||
Juan, you probably added the wrong url? This are crashes from Songbird and only two with comments.
Our ones aren't helpful either:
http://crash-stats.mozilla.com/report/list?product=Firefox&branch=1.9.1&version=Firefox%3A3.1b3&query_search=signature&query_type=exact&query=nsWindow%3A%3AGetParentWindow%28%29&date=&range_value=1&range_unit=weeks&do_query=1&signature=nsWindow%3A%3AGetParentWindow%28%29
We could file an IT ticket and ask for a couple of URLs where the crash happens.
Assignee | ||
Updated•16 years ago
|
Assignee: nobody → jmathies
Comment 33•16 years ago
|
||
Henrik, I added the intended URL, because it had a couple of comments, as opposed to our stats site which, when I tried to access, would time out.
Assignee | ||
Comment 34•16 years ago
|
||
(In reply to comment #32)
> Juan, you probably added the wrong url? This are crashes from Songbird and only
> two with comments.
>
> Our ones aren't helpful either:
> http://crash-stats.mozilla.com/report/list?product=Firefox&branch=1.9.1&version=Firefox%3A3.1b3&query_search=signature&query_type=exact&query=nsWindow%3A%3AGetParentWindow%28%29&date=&range_value=1&range_unit=weeks&do_query=1&signature=nsWindow%3A%3AGetParentWindow%28%29
>
> We could file an IT ticket and ask for a couple of URLs where the crash
> happens.
Clicking through a number of these, all seem to have some form of plugin loaded, but there doesn't appear to be a particular plugin that's causing it, so I'm guessing it's more generic.
Assignee | ||
Updated•16 years ago
|
Summary: Firefox 3.1b2 Crash Report [@ nsWindow::GetParentWindow() ] & [@ nsBaseWidget::Destroy()]? → Firefox Crash [@ nsWindow::GetParentWindow() ] & [@ nsBaseWidget::Destroy()]?
Comment 35•16 years ago
|
||
OK, here's my debugging assessment. I don't have time to work up a patch and test it now. But basically, widget destruction on windows is completely broken - it's a wonder it doesn't crash more... This is besides the duplicate member variable problem.
The sequence of preferred actions appears to be nsWindow::Destroy() which calls nsBaseWidget::Destroy(), nsWindow::OnDestroy() which calls nsBaseWidget::OnDestroy() and finally nsWindow::~nsWindow(). Lets look at these in order:
1381 NS_METHOD nsWindow::Destroy()
1382 {
1383 // Switch to the "main gui thread" if necessary... This method must
1384 // be executed on the "gui thread"...
1385 nsToolkit* toolkit = (nsToolkit *)mToolkit;
1386 if (toolkit != nsnull && !toolkit->IsGuiThread()) {
1387 MethodInfo info(this, nsWindow::DESTROY);
1388 toolkit->CallMethod(&info);
1389 return NS_ERROR_FAILURE;
1390 }
This is fine (the bogus failure code is always ignored).
1392 // disconnect from the parent
1393 if (!mIsDestroying) {
1394 nsBaseWidget::Destroy();
1395 }
This is wrong. This should test be:
if (mIsDestroying) // Should be an atomic test and set...
return NS_OK;
nsCOMPtr<nsIWidget> kungFuDeathGrip(this);
mIsDestroying = PR_TRUE;
nsBaseWidget::Destroy();
1397 // just to be safe. If we're going away and for some reason we're still
1398 // the rollup widget, rollup and turn off capture.
1399 if ( this == gRollupWidget ) {
1400 if ( gRollupListener )
1401 gRollupListener->Rollup(nsnull);
1402 CaptureRollupEvents(nsnull, PR_FALSE, PR_TRUE);
1403 }
This is OK.
1405 EnableDragDrop(PR_FALSE);
1406
1407 // destroy the HWND
1408 if (mWnd) {
1409 // prevent the widget from causing additional events
1410 mEventCallback = nsnull;
1411
1412 // if IME is disabled, restore it.
1413 if (mOldIMC) {
1414 mOldIMC = ::ImmAssociateContext(mWnd, mOldIMC);
1415 NS_ASSERTION(!mOldIMC, "Another IMC was associated");
1416 }
1417
1418 HICON icon;
1419 icon = (HICON) ::SendMessageW(mWnd, WM_SETICON, (WPARAM)ICON_BIG, (LPARAM) 0);
1420 if (icon)
1421 ::DestroyIcon(icon);
1422
1423 icon = (HICON) ::SendMessageW(mWnd, WM_SETICON, (WPARAM)ICON_SMALL, (LPARAM) 0);
1424 if (icon)
1425 ::DestroyIcon(icon);
1426
1427 #ifdef MOZ_XUL
1428 if (eTransparencyTransparent == mTransparencyMode)
1429 {
1430 SetupTranslucentWindowMemoryBitmap(eTransparencyOpaque);
1431
1432 }
1433 #endif
This is OK, until here. We should add:
SubclassWindow(FALSE);
1434
1435 VERIFY(::DestroyWindow(mWnd));
1436
1437 mWnd = NULL;
1438 //our windows can be subclassed by
1439 //others and these nameless, faceless others
1440 //may not let us know about WM_DESTROY. so,
1441 //if OnDestroy() didn't get called, just call
1442 //it now. MMP
1443 if (PR_FALSE == mOnDestroyCalled)
1444 OnDestroy();
1445 }
1446
1447 return NS_OK;
1448 }
This should be:
VERIFY(::DestroyWindow(mWnd));
mWnd = NULL;
}
//our windows can be subclassed by
//others and these nameless, faceless others
//may not let us know about WM_DESTROY. so,
//if OnDestroy() didn't get called, just call
//it now. MMP
if (PR_FALSE == mOnDestroyCalled)
OnDestroy();
return NS_OK;
}
Looking at OnDestroy:
5759 void nsWindow::OnDestroy()
5760 {
5761 mOnDestroyCalled = PR_TRUE;
This should test and exit.
5763 SubclassWindow(FALSE);
5764
5765 // We have to destroy the native drag target before we null out our
5766 // window pointer
5767 EnableDragDrop(PR_FALSE);
5768
5769 mWnd = NULL;
This should test mIsDestroying and call Destroy() and exit. This should be done before the test for mOnDestroyCalled.
5770
5771 // free GDI objects
5772 if (mBrush) {
5773 VERIFY(::DeleteObject(mBrush));
5774 mBrush = NULL;
5775 }
5776
5777 #if 0
5778 if (mPalette) {
5779 VERIFY(::DeleteObject(mPalette));
5780 mPalette = NULL;
5781 }
5782 #endif
5783
5784 // if we were in the middle of deferred window positioning then
5785 // free the memory for the multiple-window position structure
5786 if (mDeferredPositioner) {
5787 VERIFY(::EndDeferWindowPos(mDeferredPositioner));
5788 mDeferredPositioner = NULL;
5789 }
5790
5791 // release references to children, device context, toolkit, and app shell
5792 nsBaseWidget::OnDestroy();
nsBaseWidget::OnDestroy() does not destroy children (and there is not more app shell). GTK widgets explicitly destroy their children in Destroy()... should windows be doing that too? It seems cleaner... Otherwise OK.
5793
5794 // dispatch the event
5795 if (!mIsDestroying) {
5796 // dispatching of the event may cause the reference count to drop to 0
5797 // and result in this object being destroyed. To avoid that, add a reference
5798 // and then release it after dispatching the event
5799 AddRef();
5800 DispatchStandardEvent(NS_DESTROY);
5801 Release();
5802 }
5803 }
This is not needed. Or otherwise should the ordering be OnDestroy(), Destroy()? If it is needed, should this have a kungfuDeathGrip?
697 nsWindow::~nsWindow()
698 {
699 mIsDestroying = PR_TRUE;
This should test and call Destroy(). Or just call destroy unconditionally...
700 if (gCurrentWindow == this) {
701 gCurrentWindow = nsnull;
702 }
703
704 MouseTrailer* mtrailer = nsToolkit::gMouseTrailer;
705 if (mtrailer) {
706 if (mtrailer->GetMouseTrailerWindow() == mWnd)
707 mtrailer->DestroyTimer();
708
709 if (mtrailer->GetCaptureWindow() == mWnd)
710 mtrailer->SetCaptureWindow(nsnull);
711 }
These need mWnd, so should be moved to Destroy().
713 // If the widget was released without calling Destroy() then the native
714 // window still exists, and we need to destroy it
715 if (NULL != mWnd) {
716 Destroy();
717 }
Called above.
719 if (mCursor == -1) {
720 // A successfull SetCursor call will destroy the custom cursor, if it's ours
721 SetCursor(eCursor_standard);
722 }
Move to destroy?
724 sInstanceCount--;
725
726 #ifdef NS_ENABLE_TSF
727 if (!sInstanceCount)
728 nsTextStore::Terminate();
729 #endif //NS_ENABLE_TSF
730
731 #ifndef WINCE
732 //
733 // delete any of the IME structures that we allocated
734 //
735 if (sInstanceCount == 0) {
736 if (sIMECompUnicode)
737 delete sIMECompUnicode;
738 if (sIMEAttributeArray)
739 delete [] sIMEAttributeArray;
740 if (sIMECompClauseArray)
741 delete [] sIMECompClauseArray;
742
743 NS_IF_RELEASE(gCursorImgContainer);
744
745 if (sIsOleInitialized) {
746 ::OleFlushClipboard();
747 ::OleUninitialize();
748 sIsOleInitialized = FALSE;
749 }
750 }
751
752 NS_IF_RELEASE(mNativeDragTarget);
753 #endif
754
755 }
These are OK.
Otherwise, once the order of Destroy() and OnDestroy() is fixed, all of the other tests for mIsDestroying and mOnDetroyCalled should be changed to whichever is first.
Finally, and what I think is actually the bug here. SubClassWindow should addref and release the this pointer.
Hope this helps.
Comment 36•16 years ago
|
||
One more thing... The WindowProc does and conditional kungfuDeathGrip on the target, which assumes nsIsDestroying is only set in the destructor. GTK has a nsIsDestroyed flag which is set in the destructor. Maybe windows should have one of these too?
Assignee | ||
Comment 37•16 years ago
|
||
Adding a bit of data - I was looking at the normal shutdown steps a windowed plugin takes when it's destroyed. Similar to Jeremy's break down:
Starting a shutdown for a Flash nsObjectFrame:
nsObjectFrame::StopPluginInternal
nsPluginInstanceOwner::PrepareToStop -> mWidget->SetParent(nsnull)
nsObjectFrame::StopPluginInternal done.
(time passes for a delayed plugin teardown which gets fired off as a runnable event)
nsObjectFrame's DoStopPlugin
nsPluginInstanceOwner::Destroy()
nsWindow::Destroy()
nsBaseWidget::Destroy()
nsWindow::GetParentWindow()
mWnd = 0x260afe | parent hwnd = 0x10010 | parent widget = 0x0
This is where this crash would occur.
A couple notes - we set the native parent window of the plugin to null way up in nsPluginInstanceOwner's PrepareToStop. This happens way before we actually destroy the window. This is the same parent hwnd we retrieve in GetParentWindow later on when we are destroying. During destruction, GetParentWindow's ::GetParent(mWnd) call actually returns the handle to the desktop instead, since GetParent returns the parent _or_ the owner, which in this case is the desktop. We then query that for a mozilla property through GetProp to get the widget, which fails and everything moves on smoothly...
nsWindow::GetParentWindow() done.
nsBaseWidget::Destroy() done.
nsWindow::OnDestroy()
nsWindow::SubclassWindow(0) 0x260afe
nsBaseWidget::OnDestroy()
nsBaseWidget::OnDestroy() done.
nsWindow::OnDestroy() done.
nsWindow::Destroy() done.
~nsWindow()
~nsWindow() done.
So that's a normal shutdown. This bug is rather odd in that in the crash case GetParentWindow retrieves a valid parent hwnd, retrieves a valid mozilla property for the widget pointer, and then tries to access mIsDestroying for the widget. But if shutdown is working correctly, parent should have been set to null and the desktop hwnd would have been returned.
I still don't have a specific answer as to how this could happen, but I think it's something more than a slight timing issue in how we destroy. I'm going to look a bit more at the plugin code. I'm wondering if there are cases where we do not delay the unload of plugins like flash.
(Also we should probably be calling GetAncestor instead of GetParent in GetParentWindow so we can avoid checking for properties on the desktop window during shutdown.)
Assignee | ||
Comment 38•16 years ago
|
||
Actually, looking at the stacks, all of these crashes occur on threads that are coming through nsStopPluginRunnable::Run, so the delayed vs. immediate destruction isn't the issue. These crashes are all on delayed tear down events.
Assignee | ||
Comment 39•16 years ago
|
||
I'm going to look at Jeremy's other cleanup suggestions but I'd like to start with something simple - cleanup the variable madness and make sure get parent is always working with a parent window handle.
Attachment #373186 -
Flags: review?(vladimir)
Comment on attachment 373186 [details] [diff] [review]
getparent / destroy variable cleanup v.1
ugh, field shadowing.. can you get rid of the other shadowed fields as well? mIs{Shift,Control,Alt}Down are the ones I see at a quick glance
Attachment #373186 -
Flags: review?(vladimir) → review+
Assignee | ||
Comment 41•16 years ago
|
||
Looks like it was this little block - beos and os2 also use most of these.
PRPackedBool mIsShiftDown;
PRPackedBool mIsControlDown;
PRPackedBool mIsAltDown;
PRPackedBool mIsDestroying;
PRPackedBool mOnDestroyCalled;
new patch coming up...
Assignee | ||
Comment 42•16 years ago
|
||
Threw this at try. I'll land it on m-c once that clears.
Attachment #373186 -
Attachment is obsolete: true
Assignee | ||
Updated•16 years ago
|
Attachment #373201 -
Attachment is patch: true
Comment 43•16 years ago
|
||
> - HWND parent = ::GetParent(mWnd);
> + HWND parent = ::GetAncestor(mWnd, GA_PARENT);
Actually we also need the owner here when GetTopLevelWindow() is called with aStopOnDialogOrPopup=PR_FALSE.
Assignee | ||
Comment 44•16 years ago
|
||
(In reply to comment #43)
> > - HWND parent = ::GetParent(mWnd);
> > + HWND parent = ::GetAncestor(mWnd, GA_PARENT);
> Actually we also need the owner here when GetTopLevelWindow() is called with
> aStopOnDialogOrPopup=PR_FALSE.
Hmm, I'm not seeing why this is. GetTopLevelWindow would return the current window if it's top and I don't see how that would mess up caret position data in calls to ResolveIMECaretPos. I'll do some debugging though and see if I can spot what the potential problem might be. If we do need owner for the ime code, I think it might be best to differentiate between the two calls so that other calls to getparentwindow return the true parent.
Comment 45•16 years ago
|
||
It will affect the popup window (e.g. "Organize Bookmarks" dialog). We want a main window pointer from children of the popup.
> I think it might be best to differentiate between the two calls so that other
> calls to getparentwindow return the true parent.
I agree.
Assignee | ||
Comment 46•16 years ago
|
||
(In reply to comment #45)
> It will affect the popup window (e.g. "Organize Bookmarks" dialog). We want a
> main window pointer from children of the popup.
> > I think it might be best to differentiate between the two calls so that other
> > calls to getparentwindow return the true parent.
> I agree.
How will it effect windows like organize bookmarks? Is this an IME issue or something more general? I've been using a build of this for a day now and haven't seen any strange behavior during normal use.
Assignee | ||
Comment 47•16 years ago
|
||
Here's an update that chooses between the two so that calls to GetTopLevelWindow continue to return owner.
Attachment #373201 -
Attachment is obsolete: true
Assignee | ||
Updated•16 years ago
|
Attachment #373343 -
Attachment is patch: true
Assignee | ||
Comment 48•16 years ago
|
||
Independent of these changes, this crash still seems pretty prevenlent in pre beta 4. Three crash signatures all seems to tie in with this -
From the last week:
nsWindow::GetParentWindow() (rank 14) -
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5b4pre&platform=windows&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=nsWindow%3A%3AGetParentWindow()
0 xul.dll nsWindow::GetParentWindow widget/src/windows/nsWindow.cpp:1651
1 xul.dll nsWindow::GetParent widget/src/windows/nsWindow.cpp:1625
2 xul.dll nsBaseWidget::Destroy widget/src/xpwidgets/nsBaseWidget.cpp:259
3 xul.dll nsWindow::Destroy widget/src/windows/nsWindow.cpp:1523
nsBaseWidget::Destroy() (rank 39) -
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5b4pre&platform=windows&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=nsBaseWidget%3A%3ADestroy()
0 xul.dll nsBaseWidget::Destroy widget/src/xpwidgets/nsBaseWidget.cpp:261
1 xul.dll nsWindow::Destroy widget/src/windows/nsWindow.cpp:1523
2 xul.dll nsPluginInstanceOwner::Destroy layout/generic/nsObjectFrame.cpp:4069
3 xul.dll DoStopPlugin layout/generic/nsObjectFrame.cpp:1986
254 NS_METHOD nsBaseWidget::Destroy()
255 {
256 // Just in case our parent is the only ref to us
257 nsCOMPtr<nsIWidget> kungFuDeathGrip(this);
258 // disconnect from the parent
259 nsIWidget *parent = GetParent();
260 if (parent) {
261 parent->RemoveChild(this);
nsWindow::Destroy() (rank 78) -
(Last one of these was the 12th)
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5b4pre&platform=windows&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=nsWindow%3A%3ADestroy()
0 @0x740072
1 xul.dll nsWindow::Destroy widget/src/windows/nsWindow.cpp:1489
2 xul.dll nsPluginInstanceOwner::Destroy layout/generic/nsObjectFrame.cpp:4069
3 xul.dll DoStopPlugin layout/generic/nsObjectFrame.cpp:1986
4 xul.dll nsStopPluginRunnable::Run layout/generic/nsObjectFrame.cpp:2023
5 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:510
I was hoping to reproduce this from some extended use of a test build but still haven't seen it.
Comment 49•16 years ago
|
||
(In reply to comment #46)
> How will it effect windows like organize bookmarks?
Sorry, I have confused a dialog name. It should read "Edit This Bookmark" dialog (i.e. a dialog which will appear when tha star icon is clicked).
> Is this an IME issue or
> something more general? I've been using a build of this for a day now and
> haven't seen any strange behavior during normal use.
IME issue. With cleanup v.2 patch and ChangJie, composition window was displayed at wrong position on "Edit Bookmarks" dialog.
Even worse, With cleanup v.3 patch, Firefox hanged when I tried to input on "Edit Bookmarks" dialog using ChangJie. It seems to regress bug 452385.
Without neither patches, there was no problem.
Assignee | ||
Comment 50•16 years ago
|
||
(In reply to comment #49)
> (In reply to comment #46)
> > How will it effect windows like organize bookmarks?
> Sorry, I have confused a dialog name. It should read "Edit This Bookmark"
> dialog (i.e. a dialog which will appear when tha star icon is clicked).
> > Is this an IME issue or
> > something more general? I've been using a build of this for a day now and
> > haven't seen any strange behavior during normal use.
> IME issue. With cleanup v.2 patch and ChangJie, composition window was
> displayed at wrong position on "Edit Bookmarks" dialog.
> Even worse, With cleanup v.3 patch, Firefox hanged when I tried to input on
> "Edit Bookmarks" dialog using ChangJie. It seems to regress bug 452385.
> Without neither patches, there was no problem.
Thanks for testing. I guess I'll go back to using ::GetParent on the IME calls.
Assignee | ||
Comment 51•16 years ago
|
||
Attachment #373343 -
Attachment is obsolete: true
Comment 52•16 years ago
|
||
The new patch looks good.
I haven't find any apparent problems during my short, non-scientific testing.
Assignee | ||
Comment 53•16 years ago
|
||
Comment on attachment 373373 [details] [diff] [review]
[m-c checkin: a7b2a881595a] getparent / destroy variable cleanup v.3
r? -> vlad since there have been some additional changes. This might fix the crash problem so we should consider getting it into 1.9.1 after it's sat on trunk for a few days.
Attachment #373373 -
Flags: review?(vladimir)
Attachment #373373 -
Flags: review?(vladimir) → review+
Assignee | ||
Updated•16 years ago
|
Attachment #373373 -
Attachment description: getparent / destroy variable cleanup v.3 → [m-c checkin: a7b2a881595a] getparent / destroy variable cleanup v.3
Comment 54•16 years ago
|
||
And... ?
Assignee | ||
Updated•16 years ago
|
Attachment #373373 -
Flags: approval1.9.1?
Assignee | ||
Comment 55•16 years ago
|
||
So we generally know which thing is NULL; is there any data at that point that we could use to figure out what happened? I'm thinking we should maybe add some logging; unfortunately breakpad doesn't support sending arbitrary logged data, but we could leave a file in some dir and ask people to mail it in if they see it. (Should always append to it, etc.)
Looks like this can actually be done: http://mxr.mozilla.org/mozilla-central/source/xpcom/system/nsICrashReporter.idl#101 (appendAppNotesToCrashReport) and it'll show up in breakpad.
Comment 58•16 years ago
|
||
I just pushed this patch to the try server... It's a big hammer approach to this, which makes sure Destroy() is called correctly. To do that, SetParent() also needs to manage the parent's child list, which most platforms where not doing. My brief testing showed that the new asserts were not being called, and that reftests and mochitests looked good, except for a few, which appears to caused by a focus problem (the main window isn't refocused after a popup). I'll see what the tryserver says.
I think this patch is to intrusive for 3.5. But some things I did learn... It appears that nsWindow::~nsWindow could be called multiple times... That's not good. I think what might be happening is that the code is actually stepping through deleted memory which gets overwritten at some point.
Comment 59•16 years ago
|
||
Jeremy, out of interest, do you have a test case where SetParent is called from nsViewManager::ReparentChildWidgets?
Comment 60•16 years ago
|
||
Karl... No. I've just been reading the code. bug 129034 is the original implementation. The test case there now asserts on something else (bug 485893 - it doesn't get to SetParent()). It seems to have regressed from 3.0.10, but I haven't looked at it carefully. If SetParent(nonnull) is not needed then these could all be changed to ResetParent(), like what Cocoa did recently, and the code greatly simplified. The assert in nsViewManager::ReparentChildWidgets has never had a bug filed on it. But this is getting off topic.
Assignee | ||
Comment 61•16 years ago
|
||
A possible fix for 1.9.1 might be to move (or add) a call to SetParent(nsnull) just before we destroy the widget in nsPluginInstanceOwner's Destroy. We clear it in PrepareToStop (just below Destroy in the code) a little bit before we actually tear down.
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsObjectFrame.cpp#4217
Assignee | ||
Comment 62•16 years ago
|
||
(In reply to comment #61)
> A possible fix for 1.9.1 might be to move (or add) a call to SetParent(nsnull)
> just before we destroy the widget in nsPluginInstanceOwner's Destroy. We clear
> it in PrepareToStop (just below Destroy in the code) a little bit before we
> actually tear down.
>
> http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsObjectFrame.cpp#4217
Hmm, we call GetParent in SetParent, so that's not going to work without changes to the window code.
Assignee | ||
Comment 63•16 years ago
|
||
Summarizing what's known at this point -
- The crash is sporadic, and hard to reproduce
- The crash occurs in delayed tear down of a plugin window
- We know that SetParent is called in PreparToStop before the tear down event is sent, and it seems that it must succeed in setting the parent widget pointer in the native window's property list to null.
- When the event comes back, we go into what is a rather haphazzard process of unhooking and destroying the widget and window. During this process we retrieve the parent widget pointer via the property list.
- The parent widget pointer returned on a valid native window handle is not null and does not point to a valid parent widget which results in the crash.
I've tried to reproduce this, and looked at ways to do a one off for 1.9.1 but really am not coming up with much of anything except hacks. Vlad suggested logging but I didn't see anything there that would help. What has been suggested is a rework of our tear down methodology with a heavy handed patch in the hopes it will fix the problem. But nobody feels confident about applying that to 1.9.1 this late in the ball game.
I'm not entirely sure what direction to take, so I'd welcome any help / suggestions.
Comment 64•16 years ago
|
||
(In reply to comment #63)
> What has been suggested is a rework of our tear down methodology with a heavy
> handed patch in the hopes it will fix the problem. But nobody feels confident
> about applying that to 1.9.1 this late in the ball game.
Can we start working on this now, land it for 1.9.2 as soon as it's ready and see if it fixes the problem there? If so, and if it bakes for a while, we might be able to take it in 1.9.1.x, despite its size/scope.
Assignee | ||
Comment 65•16 years ago
|
||
(In reply to comment #58)
> Created an attachment (id=376333) [details]
> Properly call Destroy()
>
> I just pushed this patch to the try server... It's a big hammer approach to
> this, which makes sure Destroy() is called correctly. To do that, SetParent()
> also needs to manage the parent's child list, which most platforms where not
> doing. My brief testing showed that the new asserts were not being called, and
> that reftests and mochitests looked good, except for a few, which appears to
> caused by a focus problem (the main window isn't refocused after a popup).
> I'll see what the tryserver says.
>
> I think this patch is to intrusive for 3.5. But some things I did learn... It
> appears that nsWindow::~nsWindow could be called multiple times... That's not
> good. I think what might be happening is that the code is actually stepping
> through deleted memory which gets overwritten at some point.
Jeremy, how are you feeling about your destroy rework patch? Want to kick off some reviews?
Assignee | ||
Comment 66•16 years ago
|
||
Spoke to Jeremy through email, his patch had some issues and didn't make it through try, so I'll start in on something myself.
Comment 67•16 years ago
|
||
I was going to attach an updated patch, but the more I little bugs I've fixed, the more creep in...
What I have become convinced of is that the windows cleverness of using the windows ::GetParent() call is the main problem, because it's ownership model and semantics are just not compatible with what nsIWidget wants. I think that an alternative approach here is to add a nsRefPtr<nsWindow> mParent to nsWindow, and manage it in Create() [=baseParent], SetParent() and Destroy(). Then we can remove GetParentWindow(), since mParent would now be correct. Also, many of the uses of ::GetParent() in nsWindow.cpp are bogus - they should already be using GetParentWindow().
Also, I think that Destroy() should be called on all widgets. I think that the only needed fixes are in nsXULWindow (the last hunk of my patch), and then a loop over the children in Destroy(), like GTK already does. I would also make the loop over the children use a nsCOMPtr so that they are only Released() and destructed when Destroy has returned, just to make life more predictable.
I'll merge most of the rest of my changes into the cleanup I'm doing in bug 431634.
Comment 68•16 years ago
|
||
This is marked topcrash - do we know how much of a crasher this is under b4?
Comment 69•16 years ago
|
||
According to a discussion w/ jimm in IRC, we saw 4991 in a week on 3.0.0.11.
In any case, after talking this over with Joe Drew and jimm in IRC, since we can't reproduce this crash, it's in 3.0, we don't have a regression range, *and* we're really down to the last 1-3 bugs that are going to actually hold the release, I think we're at the point where we should minus this. Marking as such. If people disagree and think we should hold the release for this bug, please re-nom and comment.
Flags: blocking1.9.1+ → blocking1.9.1-
Assignee | ||
Comment 70•16 years ago
|
||
(In reply to comment #68)
> This is marked topcrash - do we know how much of a crasher this is under b4?
http://crash-stats.mozilla.com/query/query?do_query=1&product=Firefox&version=Firefox%3A3.5b4&platform=windows&date=&range_value=1&range_unit=weeks&query_search=signature&query_type=contains&query=GetParentWindow
1597 for one week.
If anyone has any ideas on how to pull up a regression range pre the 3.0 release I'd welcome the input. This was present in the first 3.0 release.
Comment 71•16 years ago
|
||
(In reply to comment #69)
> If people disagree and think we should hold the release for this bug,
> please re-nom and comment.
I don't disagree, but I think we should block on this for 1.9.2 and consider backporting it to 1.9.1.x if a fix sticks and is relatively safe.
Flags: wanted1.9.1.x?
Flags: blocking1.9.2?
Comment 72•16 years ago
|
||
Jim, it's hard without any way of reproducing this crash. All the crash stats have no comments.
Chris, could you check if we have a domain or url where this mostly happens? If you can't we should file an it bug to get the results. We still have the email addresses per crash report if users accept their transmission, right? Could we find some users for those the crash happens regularly?
Keywords: regression,
regressionwindow-wanted
Reporter | ||
Comment 73•16 years ago
|
||
looking at the url data it doesn't look to helpful in this case. the distribution of sites and urls to these crash signatures is very wide.
that matches up with the ideas in comment 67 suggesting these crashes are a result of bad interaction within the chrome and general window management; not with anything content related.
As for examples of this it looks like 760 different sites for just the 1500 some odd nsWindow::GetParentWindow() crashes I'm looking at in a weeks worth of data. The list of top sites looks like:
#crash - domain
78 www.youtube.com
23 mail.google.com
22 www.google.com
21 www.facebook.com
9 www.google.com.vn
8 www.google.fr
8 apps.facebook.com
7 www.orkut.com.br
7 win.mail.ru
7 us.mg2.mail.yahoo.com
6 www.orkut.co.in
6 in.mg50.mail.yahoo.com
5 us.mg1.mail.yahoo.com
5 depositfiles.com
5 addons.mozilla.org
4 www.ynet.co.il
4 www.mininova.org
4 www.microsoft.com
4 www.justin.tv
4 www.google.ro
4 www.google.it
4 www.google.co.id
4 www.gametrailers.com
one user might have even hit the crash on about:config , and others hit it on searches, startpage and just a variety of frequent and common browsing activities.
to drill down on the top of the youtube list it looks like:
Firefox 3.5b4 2 http://www.youtube.com/watch?v=daTTOyu-E1w&hd=1&feature=hd nsWindow::GetParentWindow()
Firefox 3.5b4 2 http://www.youtube.com/watch?v=L5brwhalCG0&feature=PlayList&p=BBCB18EBEEF42EC2&index=0&playnext=1 nsWindow::GetParentWindow()
Firefox 3.5b4 2 http://www.youtube.com/watch?v=BsAtRKrbB8w nsWindow::GetParentWindow()
Firefox 3.5b4 2 http://www.youtube.com/watch?v=wXUMZb7gqdg nsWindow::GetParentWindow()
Firefox 3.5b4 1 http://www.youtube.com/watch?v=1DODjKLUMVU nsWindow::GetParentWindow()
Firefox 3.5b4 1 http://www.youtube.com/watch?v=UYpxIaAI51M nsWindow::GetParentWindow()
Firefox 3.5b4 1 http://www.youtube.com/watch?v=SJvtkCFaKaE nsWindow::GetParentWindow()
Firefox 3.5b4 1 http://www.youtube.com/watch?v=PGuhX5AmjuA nsWindow::GetParentWindow()
Firefox 3.5b4 1 http://www.youtube.com/watch?v=UuBm0dvIzcc&feature=fvst nsWindow::GetParentWindow()
Firefox 3.5b4 1 http://www.youtube.com/ nsWindow::GetParentWindow()
Firefox 3.5b4 1 http://www.youtube.com/watch?v=MYNEYp06EkI nsWindow::GetParentWindow()
Firefox 3.5b4 1 http://www.youtube.com/watch?v=kMEFCRbbi0g nsWindow::GetParentWindow()
Firefox 3.5b4 1 http://www.youtube.com/activesharing_history nsWindow::GetParentWindow()
Firefox 3.5b4 1 http://www.youtube.com/watch?v=I8kcnYCHxTk nsWindow::GetParentWindow()
Firefox 3.5b4 1 http://www.youtube.com/watch?v=eUm-zvGsgiE nsWindow::GetParentWindow()
for nsBaseWidget::Destroy stack signature its basically the same thing. Another 625 crash reports in a weeks worth of 3.5b4 data and 457 different domains listed in the urls
40 www.youtube.com
29 mail.google.com
17 www.facebook.com
11 www.google.com
11 apps.facebook.com
5 www.orkut.co.in
5 www.google.com.vn
5 hi5.com
4 www.google.co.in
Reporter | ||
Comment 74•16 years ago
|
||
I do see a few interesting things that appear to be messing with/reading chrome.
I guess one check would be to look at a bunch of reports to try and figure out if there is a common or set of common addons installed like adblockplus, ietab, or speeddial. There really are not very many of these kinds of reports, but the general question of "does chrome get messed up by an addon or cause timing problems not seen otherwise?" might be an interesting one to try and figure out.
release # crashes url at time of crash -- stack signature
Firefox 3.5b4 1 chrome://adblockplus/content/tip_subscriptions.xul nsWindow::GetParentWindow()
Firefox 3.5b4 1 chrome://ietab/content/reloaded.html?url=http://dl.xunlei.com/index.htm?tag=1 nsWindow::GetParentWindow()
Firefox 3.5b4 1 chrome://speeddial/content/speeddial.xul nsWindow::GetParentWindow()
Firefox 3.5b4 1 chrome://wot/locale/loading.html#aHR0cDovL3d3dy50dnZhdWdoYW4uY29tLw%3D%3D nsWindow::GetParentWindow()
Firefox 3.5b4 1 about:sessionrestore nsBaseWidget::Destroy()
Firefox 3.5b4 1 chrome://ietab/content/reloaded.html?url=http://update.microsoft.com/*de-ch nsBaseWidget::Destroy()
Firefox 3.5b4 1 chrome://speeddial/content/speeddial.xul nsBaseWidget::Destroy()
Because it generally seems to be related to plugin destruction, the actual URL may not even be relevant -- for example, it could be caused by destroying a plugin that was on the page you were visiting previously. That would be my guess even, given the high number of google.com and speeddial crashes... then again, there's an about:sessionrestore crash, and that's kinda weird :)
Assignee | ||
Comment 76•16 years ago
|
||
(In reply to comment #75)
> Because it generally seems to be related to plugin destruction, the actual URL
> may not even be relevant -- for example, it could be caused by destroying a
> plugin that was on the page you were visiting previously. That would be my
> guess even, given the high number of google.com and speeddial crashes... then
> again, there's an about:sessionrestore crash, and that's kinda weird :)
Yeah the current url is not assured to be accurate. The dealyed tear down can take some time.
There is an interesting little bit of code in nsStopPluginRunnable's Run that delays tear down due to nesting level of events -
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsObjectFrame.cpp#1965
I've been looking at this recently because in my testing this code never gets used. A possible suspect maybe.
Reporter | ||
Comment 77•16 years ago
|
||
> there's an about:sessionrestore crash, and that's kinda weird :)
yeah, I've been looking at a bunch of cases where this helps as a semi-reliable way to find reproducable crashers. e.g.
1) user crashes on a page
2) sesssion restore kicks in and user selects to load the pages from previous crashes
3) page(s) load and the crash happens again
this case pokes holes in that theory as being fully reliable. I'm guessing the session restore isn't initiating any plugin page teardowns.
I'm getting time of crash, time since last crash, and uptime (time between startup and crash) and that will help to understand and connect these about:sessionrestore crashes to other events on a given users PC a bit better.
Assignee | ||
Comment 78•16 years ago
|
||
I am getting some really crazy stuff going on, including what looked like this crash, by triggering the code in bug 389931. I'm also seeing a lot of slow script warnings, assertions in /mozilla/docshell/shistory/src/nsshentry.cpp's nsSHEntry::AddChild, assertions in the plugin code, hung instances, and plugins that hang around in memory and keep playing.
Steps to reproduce:
1) open two windows
2) in the second window, browse to a page and hit ctrl-s
3) leave the modal file chooser open and switch back to the first window
4) browse around to sites with plugins like youtube & yahoo, use multiple tabs, close tabs, etc.
5) try switching back and forth between the two windows occasionally.
All kinds of interesting things happen.
Script: chrome://global/content/bindings/general.xml:0
Script: chrome://browser/content/browser.js:3552
Script: chrome://global/content/bindings/general.xml:0
&&
WARNING: NS_ENSURE_TRUE(doc) failed: file f:/Mozilla/firefox/470487/mozilla/modules/plugin/base/src/nsNPAPIPlugin.cpp, line 1179
WARNING: NS_ENSURE_TRUE(cx) failed: file f:/Mozilla/firefox/470487/mozilla/modules/plugin/base/src/nsNPAPIPlugin.cpp, line 1192
WARNING: NS_ENSURE_TRUE(doc) failed: file f:/Mozilla/firefox/470487/mozilla/modules/plugin/base/src/nsNPAPIPlugin.cpp, line 1499
are common.
I can't say for sure that this is the cause of this crash. I did see a couple crashes in nswindow's destroy code, but it wasn't in the same place. It was however at points where nswindow was accessing member variables in the base widget.
I can address most of this (except the slow script warnings) by simply commenting out the timer code block in nsObjectFrame's nsStopPluginRunnable::Run(), so that plugins destroy when they are initially asked to. Maybe QA can lend a hand in getting some reproducible cases, so much is going on here I'm not sure where all of it is coming from.
This is all on a trunk build.
Assignee | ||
Comment 79•16 years ago
|
||
The patch in bug 389931 looks to have landed in Fx 3.0, beta 3, so that would put it in the right time frame for when this crash started showing up.
Assignee | ||
Comment 80•16 years ago
|
||
I've successfully reproduced a crash in beta 4 (no stack dump though) following the above steps. The browser re-opened both windows and reloaded the same content, which might explain some of the crash reports we've seen.
Assignee | ||
Updated•16 years ago
|
Flags: blocking1.9.1- → blocking1.9.1?
Assignee | ||
Comment 81•16 years ago
|
||
Bug 420886 is where the delayed tear down code for plugins landed. Apparently it addressed a crash bug in realplayer post the landing of bug 389931.
Assignee | ||
Comment 82•16 years ago
|
||
Bug 422926 may also be related.
Given Jim's data, I'd like to see this go back on the 1.9.1 blocking list -- I think we've got a potential can of worms here.
Assignee | ||
Comment 84•16 years ago
|
||
Filed bug 493578 on the slow script warnings, they are unrelated to this.
Assignee | ||
Comment 85•16 years ago
|
||
(In reply to comment #81)
> Bug 420886 is where the delayed tear down code for plugins landed. Apparently
> it addressed a crash bug in realplayer post the landing of bug 389931.
A general observation - the code in bug 420886 seems to be compensating for a bug in real player. In nsPluginInstanceOwner's PrepareToStop, we hide the window, disable it, and then during destruction the realplayer window receives wm_destroy, at which point it should destroy any modal windows. Maybe the right thing to do here is to pull the patch in 420886, and get with real to work on a fix. We seem to be jumping through a lot of hoops and destablizing our own code in other ways to compensate for a an issue in realplayer.
Comment 87•16 years ago
|
||
Comment on attachment 373373 [details] [diff] [review]
[m-c checkin: a7b2a881595a] getparent / destroy variable cleanup v.3
a191=beltzner
Attachment #373373 -
Flags: approval1.9.1? → approval1.9.1+
Reporter | ||
Comment 88•16 years ago
|
||
re: comment 85
kev / jst/ josh, do we have any recent contacts at real that we could open up that discussion.
Comment 89•16 years ago
|
||
I'll ping a couple people I know there and see if we can get the RP team engaged.
Assignee | ||
Comment 90•16 years ago
|
||
Comment 91•16 years ago
|
||
Keywords: fixed1.9.1
Comment 92•15 years ago
|
||
http://tinyurl.com/nds6x8 shows the current crashes for the Firefox 3.5 release. With 3058 crashes it is currently the #4 topcrash, so it doesn't appear the patch fixed the issue for 1.9.1.
Updated•15 years ago
|
Summary: Firefox Crash [@ nsWindow::GetParentWindow() ] & [@ nsBaseWidget::Destroy()]? → Firefox Crash [@ nsWindow::GetParentWindow(int)] & [@ nsBaseWidget::Destroy()]?
Assignee | ||
Comment 93•15 years ago
|
||
Might be interesting to look at mozilla-central in a few days with bug 500925 landing today.
Assignee | ||
Comment 94•15 years ago
|
||
(In reply to comment #93)
> Might be interesting to look at mozilla-central in a few days with bug 500925
> landing today.
My guess is that won't help much as this is a widget tear down issue. In general I guess this is still a top issue and needs to be addressed sooner rather than later. Now that the nswindow refactoring is complete, I'll look at this again.
Comment 95•15 years ago
|
||
Jim: Is it possible this is fixed in bug 493601? If so, it's likely that this will be fixed in 1.9.1 later.
Assignee | ||
Comment 96•15 years ago
|
||
(In reply to comment #95)
> Jim: Is it possible this is fixed in bug 493601? If so, it's likely that this
> will be fixed in 1.9.1 later.
Doesn't look like it as it's still showing up on 3.6a1pre -
http://crash-stats.mozilla.com/query/query?do_query=1&product=Firefox&version=Firefox%3A3.6a1pre&platform=windows&date=&range_value=1&range_unit=weeks&query_search=signature&query_type=contains&query=GetParentWindow
Assignee | ||
Comment 97•15 years ago
|
||
Seems like we have more signatures now -
xul.dll nsWindow::GetParentWindow widget/src/windows/nsWindow.cpp:1668
xul.dll nsWindow::GetParent widget/src/windows/nsWindow.cpp:1634
xul.dll nsBaseWidget::SetZIndex widget/src/xpwidgets/nsBaseWidget.cpp:385
xul.dll xul.dll@0x2a44f9
xul.dll nsIView::CreateWidget view/src/nsView.cpp:699
xul.dll nsMenuPopupFrame::CreateWidgetForView layout/xul/base/src/nsMenuPopupFrame.cpp:280
xul.dll xul.dll@0x320c70
--
xul.dll nsWindow::GetParentWindow widget/src/windows/nsWindow.cpp:1668
xul.dll nsWindow::GetParent widget/src/windows/nsWindow.cpp:1634
xul.dll nsBaseWidget::SetZIndex widget/src/xpwidgets/nsBaseWidget.cpp:385
xul.dll xul.dll@0x2a44f9
xul.dll nsIView::CreateWidget view/src/nsView.cpp:699
xul.dll nsMenuPopupFrame::CreateWidgetForView layout/xul/base/src/nsMenuPopupFrame.cpp:280
xul.dll nsMenuPopupFrame::EnsureWidget layout/xul/base/src/nsMenuPopupFrame.cpp:230
xul.dll nsMenuPopupFrame::InitializePopup layout/xul/base/src/nsMenuPopupFrame.cpp:497
xul.dll nsXULPopupManager::ShowPopup layout/xul/base/src/nsXULPopupManager.cpp:454
The DoStopPlugin one though is still the most common. We might have two bugs here, one on Destroy and another on CreateWidget.
Assignee | ||
Comment 98•15 years ago
|
||
The CreateWidget crash is also on trunk -
http://crash-stats.mozilla.com/report/index/d4431975-aee4-44e4-bcac-f1f3c2090629
Assignee | ||
Comment 99•15 years ago
|
||
This is a refactor of the tear down code in nsWindow. I’d welcome comments. I’ve paired Destroy() down to simply calling DestroyWindow, and rely on the WM_DESTROY event handler to do the actual tear down. Within OnDestroy, I’ve reorganized the order things into what I would consider a more proper tear down order.
This may not fix the crash problem, but it should give us a better feel for exactly what’s going on and when. If the crash still occurs in the call to GetParent in nsBaseWidget’s Destroy(), my guess would be that we could eliminate nsWindow code as being the problem and instead look at the view/plug-in code. The reason I say this is that the plugin code calls SetParent(nsnull) on the widget in nsPluginInstanceOwner’s PrepareToStop, which sets the native parent to null and also removes the window from the parent widget's child list. This (supposedly) happens way before Destroy is called on the window. If a call to GetParent on the final Destroy() call returns a valid Mozilla window handle with a bad |this| pointer, nsWindow’s tear down wouldn’t be the problem AFAICT.
Attachment #376333 -
Attachment is obsolete: true
Assignee | ||
Updated•15 years ago
|
Attachment #387214 -
Flags: review?(neil)
Assignee | ||
Comment 100•15 years ago
|
||
Comment on attachment 387214 [details] [diff] [review]
[m-c checkin: da42cf530fa7] teardown refactor v.1
This passed try fine. I've also been running the try build now for a couple days without problem.
Neil, would you be a good reviewer for something like this?
Updated•15 years ago
|
Attachment #387214 -
Flags: review?(neil) → review+
Assignee | ||
Updated•15 years ago
|
Attachment #387214 -
Attachment description: teardown refactor v.1 → [m-c checkin: da42cf530fa7] teardown refactor v.1
Comment 101•15 years ago
|
||
So, what's the status on this crash?
Currently it's the top crasher for 3.5.1 on Windows
http://crash-stats.mozilla.com/query/query?do_query=1&product=Firefox&version=Firefox%3A3.5.1&platform=windows&date=&range_value=1&range_unit=weeks&query_search=signature&query_type=exact&query=
Comment 102•15 years ago
|
||
(In reply to comment #101)
> So, what's the status on this crash?
>
> Currently it's the top crasher for 3.5.1 on Windows
>
> http://crash-stats.mozilla.com/query/query?do_query=1&product=Firefox&version=Firefox%3A3.5.1&platform=windows&date=&range_value=1&range_unit=weeks&query_search=signature&query_type=exact&query=
http://crash-stats.mozilla.com/query/query?do_query=1&product=Firefox&version=Firefox%3A3.5.1
Assignee | ||
Comment 103•15 years ago
|
||
Too early to tell, it has to bake on trunk for a bit.
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.6a1pre&platform=windows&query_search=signature&query_type=contains&query=GetParentWindow&date=&range_value=1&range_unit=weeks&do_query=1&signature=nsWindow%3A%3AGetParentWindow%28int%29
Updated•15 years ago
|
Keywords: testcase-wanted
Assignee | ||
Comment 105•15 years ago
|
||
(In reply to comment #104)
> did this land on the branch?
We could do that if it looks like the patch actually addressed the issue. It would require a little back porting since we refactored nswindow on trunk.
I'd like to let this sit on trunk for a while (maybe 2 or 3 weeks) to test to see if it actually fixed the problem. So far we're about 5 days out from the landing and there hasn't been an occurrence in nightlies - not enough time to confirm IMHO.
Assignee | ||
Comment 106•15 years ago
|
||
For reference, the patch landed on July 13th -
http://hg.mozilla.org/mozilla-central/rev/da42cf530fa7
Crash stats -
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.6a1pre&platform=windows&query_search=signature&query_type=contains&query=GetParentWindow&date=&range_value=2&range_unit=weeks&do_query=1&signature=nsWindow%3A%3AGetParentWindow%28int%29
The last DoStopPlugin crash was in a build from the 12th. We had a crash in GetParentWindow from a build on the 13th, but that was in CreateWidget, which might be something else. Hard to say at this point if the problem is fixed.
Comment 107•15 years ago
|
||
I see. you have two bugs in one here. that's why beltzner marked it fixed1.9.1
Updated•15 years ago
|
status1.9.1:
--- → wanted
Updated•15 years ago
|
Flags: blocking1.9.2? → blocking1.9.2+
Assignee | ||
Comment 108•15 years ago
|
||
on trunk!
I'll get a 1.9.1 patch put together.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 109•15 years ago
|
||
Assignee | ||
Comment 110•15 years ago
|
||
This is definitely fixed on trunk.
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.6a1pre&platform=windows&query_search=signature&query_type=contains&query=GetParentWindow&date=&range_value=2&range_unit=weeks&do_query=1&signature=nsWindow%3A%3AGetParentWindow%28int%29
Posted a 1.9.1 patch to try, if all goes well after a little testing, we should be good for a 1.9.1.3 release. Bug 506108 is filed on the CreateWidget crash.
We could also back port this pretty easily to 3.0 if somebody thinks it's needed.
Reporter | ||
Comment 111•15 years ago
|
||
looking at crash reports from yesterday I still see this signature on 3.5.2
494 total crashes for nsWindow::GetParentWindow(int) on 20090817
distribution of versions where the crash was found on 20090817-crashdata.csv
430 Firefox 3.5.2
44 Firefox 3.5
20 Firefox 3.5.1
but nothing for 3.0.x
does that match up with the places the patch has landed and/or needs to land?
Comment 112•15 years ago
|
||
chofmann: This was initially filed with the signature of nsWindow::GetParentWindow() and changed to be nsWindow::GetParentWindow(int) with an additional nsBaseWidget::Destroy() signature. While the "int" version of the first signature does not show on 1.9.0, the non-int version is still present, ranked #31 with the nsBaseWidget signature ranked at #73.
The above is consistent with this patch having not landed on 1.9.1 or 1.9.0.
Flags: wanted1.9.1.x?
Comment 113•15 years ago
|
||
Verified fixed on trunk based on crash stats.
Assignee | ||
Comment 114•15 years ago
|
||
(In reply to comment #111)
> looking at crash reports from yesterday I still see this signature on 3.5.2
>
> 494 total crashes for nsWindow::GetParentWindow(int) on 20090817
>
> distribution of versions where the crash was found on 20090817-crashdata.csv
> 430 Firefox 3.5.2
> 44 Firefox 3.5
> 20 Firefox 3.5.1
>
> but nothing for 3.0.x
>
> does that match up with the places the patch has landed and/or needs to land?
That's correct. 3.6 is what you want to look at to see if it's fixed. A 3.5.x patch hasn't landed because the original 3.6 patch back ported to 3.5 caused leaks in component manager that I was never able to track down.
What do you mean "never able to track down"? Are you no longer working on this? It's one of our major crashes, so if you need help with it please do ask for it.
Assignee | ||
Comment 116•15 years ago
|
||
(In reply to comment #115)
> What do you mean "never able to track down"? Are you no longer working on
> this? It's one of our major crashes, so if you need help with it please do ask
> for it.
These are the leaks that I can reliably reproduce on the unit test box:
TEST-UNEXPECTED-FAIL | runtests-leaks | leaked 484 bytes during test execution
TEST-UNEXPECTED-FAIL | runtests-leaks | leaked 1 instance of nsComponentManagerImpl with size 276 bytes
TEST-UNEXPECTED-FAIL | runtests-leaks | leaked 2 instances of nsLocalFile with size 88 bytes each (176 bytes total)
TEST-UNEXPECTED-FAIL | runtests-leaks | leaked 3 instances of nsStringBuffer with size 8 bytes each (24 bytes total)
TEST-UNEXPECTED-FAIL | runtests-leaks | leaked 2 instances of nsTArray_base with size 4 bytes each (8 bytes total)
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1250700153.1250706746.27537.gz&fulltext=1#err7
Why a slight re-org of the window destruction code causes this is a mystery. I've poked around in component manager looking for an answer but haven't turned anything up. If anyone has any suggestions on how to better track this down I'd welcome the input. I'll post the current rev of the patch here in a sec.
Assignee | ||
Comment 117•15 years ago
|
||
Attachment #390333 -
Attachment is obsolete: true
Comment 118•15 years ago
|
||
Mass change: adding fixed1.9.2 keyword
(This bug was identified as a mozilla1.9.2 blocker which was fixed before the mozilla-1.9.2 repository was branched (August 13th, 2009) as per this query: http://is.gd/2ydcb - if this bug is not actually fixed on mozilla1.9.2, please remove the keyword. Apologies for the bugspam)
Keywords: fixed1.9.2
Updated•15 years ago
|
Keywords: fixed1.9.2 → verified1.9.2
Updated•15 years ago
|
status1.9.2:
--- → beta1-fixed
Updated•15 years ago
|
Assignee: jmathies → jst
blocking1.9.1: --- → ?
Comment 119•15 years ago
|
||
Are the leaks that bad? If this fixes the topcrash it might be worth it.
blocking1.9.1: ? → needed
Assignee | ||
Comment 120•15 years ago
|
||
(In reply to comment #119)
> Are the leaks that bad? If this fixes the topcrash it might be worth it.
They caused leak check test failures on try.
FWIW, I can revisit this next week after the freeze.
Comment 121•15 years ago
|
||
So that leak output is basically leaking the component manager; everything else that's leaked is members of the component manager: mRegistryFile and mComponentsDir for the nsIFiles, mLoaderData and mPendingServices for the nsTArray_bases; I'd guess the strings are from mLoaderData (though not sure why it only has 3 entries).
Is it possible that this patch effectively changes the timing of things somehow so that we end up executing code after the component manager has been shut down (via nsComponentManagerImpl::Shutdown) and somehow reinstantiates the component manager? Or something like that....
Assignee | ||
Comment 122•15 years ago
|
||
If this is what people were looking at, it's a different bug:
http://crash-stats.mozilla.com/query/query?product=Firefox&version=Firefox%3A3.7a1pre&platform=windows&date=&range_value=3&range_unit=weeks&query_search=signature&query_type=contains&query=GetParentWindow&do_query=1#
0 xul.dll nsWindow::GetParentWindow widget/src/windows/nsWindow.cpp:1062
1 xul.dll nsWindow::GetParent widget/src/windows/nsWindow.cpp:1024
2 xul.dll GetWidgetOffset layout/base/nsLayoutUtils.cpp:833
3 xul.dll nsLayoutUtils::TranslateWidgetToView layout/base/nsLayoutUtils.cpp:854
4 xul.dll nsLayoutUtils::GetEventCoordinatesRelativeTo layout/base/nsLayoutUtils.cpp:674
5 xul.dll PresShell::HandleEvent layout/base/nsPresShell.cpp:6188
6 xul.dll nsViewManager::HandleEvent view/src/nsViewManager.cpp:1202
7 xul.dll nsViewManager::DispatchEvent view/src/nsViewManager.cpp:1181
8 xul.dll HandleEvent view/src/nsView.cpp:167
9 xul.dll nsWindow::DispatchEvent widget/src/windows/nsWindow.cpp:2763
10 xul.dll nsWindow::DispatchWindowEvent widget/src/windows/nsWindow.cpp:2786
11 xul.dll nsWindow::DispatchMouseEvent widget/src/windows/nsWindow.cpp:3161
12 xul.dll ChildWindow::DispatchMouseEvent
Comment 123•15 years ago
|
||
As long as we don't have a testcase or steps how to reproduce this crash there will be no way to run a regression test. Clearing flag for now.
Keywords: regressionwindow-wanted
Comment 124•15 years ago
|
||
We're going mark this as blocking 1.9.1.5. We'd like to get it in sooner rather than later, however, I'd like a second review on this. Technically security bugs need a superreview, but I'm explicitly asking for a second review (superreview or not) to get a better sanity check on the code since we haven't yet shipped this code in a release.
Jim or Vlad: Is there a Windows person that would be good for a second review here?
blocking1.9.1: needed → .5+
Assignee | ||
Comment 125•15 years ago
|
||
(In reply to comment #124)
> We're going mark this as blocking 1.9.1.5. We'd like to get it in sooner rather
> than later, however, I'd like a second review on this. Technically security
> bugs need a superreview, but I'm explicitly asking for a second review
> (superreview or not) to get a better sanity check on the code since we haven't
> yet shipped this code in a release.
>
> Jim or Vlad: Is there a Windows person that would be good for a second review
> here?
rob arnold, neil rashbrook, and doug turner come to mind. we still have the leaks to deal with though, so this isn't ready to be reviewed/land on 1.9.1.
Comment 126•15 years ago
|
||
Jim, can you fix those leaks? Per our discussion earlier today, crashes are our top priority right now.
Assignee | ||
Comment 127•15 years ago
|
||
(In reply to comment #126)
> Jim, can you fix those leaks? Per our discussion earlier today, crashes are
> our top priority right now.
I can certainly try. Will look into what bz thought might be the cause (comment 121).
Assignee | ||
Comment 128•15 years ago
|
||
(In reply to comment #127)
> (In reply to comment #126)
> > Jim, can you fix those leaks? Per our discussion earlier today, crashes are
> > our top priority right now.
>
> I can certainly try. Will look into what bz thought might be the cause (comment
> 121).
Things haven't changed much since the last time I worked on this. These leaks only show up on try server - reproducing them on an XP vmware image with both debug and release builds doesn't work.
I guess I'll try playing around with a winnt 5.2 unit test image next. If I can reproduce this on something other than try server where I have more control, I might be able to find something.
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 484 bytes during test execution
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of nsComponentManagerImpl with size 276 bytes
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 2 instances of nsLocalFile with size 88 bytes each (176 bytes total)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 3 instances of nsStringBuffer with size 8 bytes each (24 bytes total)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 2 instances of nsTArray_base with size 4 bytes each (8 bytes total)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 484 bytes during test execution
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of nsComponentManagerImpl with size 276 bytes
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 2 instances of nsLocalFile with size 88 bytes each (176 bytes total)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 3 instances of nsStringBuffer with size 8 bytes each (24 bytes total)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 2 instances of nsTArray_base with size 4 bytes each (8 bytes total)
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1255636233.1255645227.4871.gz&fulltext=1#err12
Comment 129•15 years ago
|
||
Reassigning to Jim since he's investigating here...
Assignee: jst → jmathies
Updated•15 years ago
|
Whiteboard: [crashkill]
Assignee | ||
Comment 130•15 years ago
|
||
After getting access to a try server slave, I found the crashtest leaks corresponded to two tests involving applets -
layout/base/crashtests/265986-1.html
parser/htmlparser/tests/crashtests/328751-1.html
Disabling those two tests addresses the crashtest leak problem.
Will try to track down the mochitest-plain leaks, they probably related to applets as well.
Another interesting note - disabling one test and leaving the other results in the same number of leaked bytes.
Assignee | ||
Comment 131•15 years ago
|
||
The other tests for mochitest-plain all tie to some html validation tests that make use of a java applet:
http://mxr.mozilla.org/mozilla-central/source/dom/tests/mochitest/dom-level2-html/
http://mxr.mozilla.org/mozilla-central/source/dom/tests/mochitest/dom-level2-html/files/
test_HTMLBodyElement07.html \
test_HTMLBodyElement08.html \
test_HTMLBodyElement09.html \
test_HTMLBodyElement10.html \
test_HTMLBodyElement11.html \
test_HTMLBodyElement12.html \
test_HTMLDocument01.html \
test_HTMLDocument02.html \
test_HTMLDocument03.html \
test_HTMLDocument04.html \
test_HTMLDocument05.html \
test_HTMLDocument07.html \
test_HTMLDocument08.html \
test_HTMLDocument09.html \
test_HTMLDocument10.html \
test_HTMLDocument11.html \
test_HTMLDocument13.html \
test_HTMLDocument14.html \
test_HTMLDocument15.html \
test_HTMLDocument16.html \
test_HTMLDocument17.html \
test_HTMLDocument18.html \
test_HTMLDocument19.html \
test_HTMLDocument20.html \
test_HTMLDocument21.html \
test_HTMLDocument22.html \
test_HTMLDocument23.html \
test_HTMLDocument24.html \
test_HTMLDocument25.html \
test_HTMLDocument26.html \
test_HTMLDocument27.html \
I'm guessing we probably don't want to disable these on 1.9.1.
Comment 132•15 years ago
|
||
What's next here? Let's get this done.
Comment 133•15 years ago
|
||
Ah, nevermind. I see it now.
Updated•15 years ago
|
Whiteboard: [crashkill] → [crashkill][crashkill-fix]
Assignee | ||
Comment 134•15 years ago
|
||
(In reply to comment #132)
> What's next here? Let's get this done.
Next step would be to make a decision on how to handle these leaks. We can disable the tests (which allows this to pass try cleanly) or we can spend some additional time looking for a fix to the applet problem.
Comment 135•15 years ago
|
||
I don't think we should disable the tests, but I'm willing to live with the leaks. Can we change our expected leak value to accomodate these new leaks? We'd really like to get this in 1.9.1.6 which code freezes on November 10 at 11:59pm.
Assignee | ||
Comment 136•15 years ago
|
||
(In reply to comment #135)
> I don't think we should disable the tests, but I'm willing to live with the
> leaks. Can we change our expected leak value to accomodate these new leaks?
> We'd really like to get this in 1.9.1.6 which code freezes on November 10 at
> 11:59pm.
Who would make that call? Any pointers on how we would change tolerated leak values? I can work up a patch and request approval on it.
Comment 137•15 years ago
|
||
I just made that call. :)
I don't know how to change the tolerate leak values, unfortunately. Hopefully someone else can help with that.
Comment 138•15 years ago
|
||
Just fix the leaks.
I also don't understand why we're so stumped by the leaks here -- what's the interaction that's changed? What have we looked at or tried? A lot of different tests leak, so it seems like we should be able to track the lifecycle of the relevant objects without too much hassle. It's not like we allocate the component manager very often in the normal course of things!
Assignee | ||
Comment 140•15 years ago
|
||
Has anyone else tried reproducing the crashtest leaks? I still can't, and I've even recently upgraded my system. If not I guess I can allocate a try slave again and see if I can get a debug build going on it.
Assignee | ||
Comment 141•15 years ago
|
||
(In reply to comment #140)
> Has anyone else tried reproducing the crashtest leaks? I still can't, and I've
> even recently upgraded my system. If not I guess I can allocate a try slave
> again and see if I can get a debug build going on it.
Never mind! After installing the latest java run time, I'm able to reproduce them!
Assignee | ||
Comment 142•15 years ago
|
||
This appears to be the root of the problem. I can't find corresponding releases for two AddRef's from the java plugin, and on shutdown the component manager asserts with a left over refcount of 2.
Assignee | ||
Comment 143•15 years ago
|
||
FWIW, I can reproduce this on 1.9.1, w/Java(TM) Platform SE 6 U17, without the tear down patch applied. Trunk builds don't exhibit the problem.
Comment 144•15 years ago
|
||
So... code freeze for 1.9.1.6 is tonight (6.5 hours). While I'm all for getting leaks fixed, this is a leak on shutdown when using Java. It's not a huge issue and it only happens on 1.9.1. The trade off of stability vs leaking on shutdown is pretty clear to me (stability wins!).
I don't think we should push off landing this patch longer than 1.9.1.6. The chances are 1.9.1.7 won't be until early February.
Jim, other than the leaks, is attachment 395363 [details] [diff] [review] ready for the 1.9.1 branch?
Comment 145•15 years ago
|
||
I fully agree with Sam here, let's take this, the leaks mean nothing compared to the stability gains we're hoping to get out of this fix!
Assignee | ||
Comment 146•15 years ago
|
||
(In reply to comment #144)
> So... code freeze for 1.9.1.6 is tonight (6.5 hours). While I'm all for getting
> leaks fixed, this is a leak on shutdown when using Java. It's not a huge issue
> and it only happens on 1.9.1. The trade off of stability vs leaking on shutdown
> is pretty clear to me (stability wins!).
>
> I don't think we should push off landing this patch longer than 1.9.1.6. The
> chances are 1.9.1.7 won't be until early February.
>
> Jim, other than the leaks, is attachment 395363 [details] [diff] [review] ready for the 1.9.1 branch?
I can now reproduce this sans my patch, so the java issue is independent, although my patch on 1.9.1 triggers it with try server slave config.
Let me see if the patch if fresh, will post back here in a sec.
Assignee | ||
Comment 147•15 years ago
|
||
This still applies cleanly.
Assignee | ||
Updated•15 years ago
|
Attachment #395363 -
Flags: approval1.9.1.6?
Comment 148•15 years ago
|
||
Comment on attachment 395363 [details] [diff] [review]
teardown refactor 1.9.1 v.3
Approved for 1.9.1.6. a=ss
Attachment #395363 -
Flags: approval1.9.1.6? → approval1.9.1.6+
Assignee | ||
Comment 149•15 years ago
|
||
This landed tonight:
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/3d0b0cfd9644
The java leaks were apparently known. We have leak threshold values related to java set on 1.9.1 that aren't currently set on try, which is why this problem manifested on test runs. (hat tip to nthomas for helping track this down.)
Related bugs: bug 471647, and bug 495533.
Assignee | ||
Updated•15 years ago
|
Updated•15 years ago
|
Whiteboard: [crashkill][crashkill-fix] → [sg:critical?][crashkill][crashkill-fix]
Comment 150•15 years ago
|
||
Henrik, you marked this verified fixed for 1.9.2. How did you verify this? I need to know since I need to verify this for 1.9.1 and I don't see a way to do so based on my reading of the comments.
Comment 151•15 years ago
|
||
I did that regarding our crash stats due to a missing testcase.
Jim, it looks like that a similar crash has been returned for 1.9.2. The first frames are identical. Shall we open a new bug on that? See bp-673ba0da-133f-488d-91ff-2d63b2091120
0 xul.dll nsWindow::GetParentWindow widget/src/windows/nsWindow.cpp:1063
1 xul.dll nsWindow::GetParent widget/src/windows/nsWindow.cpp:1025
2 xul.dll GetWidgetOffset layout/base/nsLayoutUtils.cpp:839
3 xul.dll nsLayoutUtils::TranslateWidgetToView layout/base/nsLayoutUtils.cpp:860
4 xul.dll nsLayoutUtils::GetEventCoordinatesRelativeTo layout/base/nsLayoutUtils.cpp:680
Assignee | ||
Comment 152•15 years ago
|
||
(In reply to comment #151)
> I did that regarding our crash stats due to a missing testcase.
>
> Jim, it looks like that a similar crash has been returned for 1.9.2. The first
> frames are identical. Shall we open a new bug on that? See
> bp-673ba0da-133f-488d-91ff-2d63b2091120
>
> 0 xul.dll nsWindow::GetParentWindow
> widget/src/windows/nsWindow.cpp:1063
> 1 xul.dll nsWindow::GetParent widget/src/windows/nsWindow.cpp:1025
> 2 xul.dll GetWidgetOffset layout/base/nsLayoutUtils.cpp:839
> 3 xul.dll nsLayoutUtils::TranslateWidgetToView
> layout/base/nsLayoutUtils.cpp:860
> 4 xul.dll nsLayoutUtils::GetEventCoordinatesRelativeTo
> layout/base/nsLayoutUtils.cpp:680
Definitely, that's a completely different stack. Looks like it has something to do with scroll.
Assignee | ||
Comment 153•15 years ago
|
||
There are a few common crash signatures here mixed in together. One is CreateWidget -
http://crash-stats.mozilla.com/report/index/bcd92777-df9d-4f36-a7f5-8f0e02091120
http://crash-stats.mozilla.com/report/index/ebe43c86-d12d-44dd-affc-bf5ec2091120
That's likely bug 506108.
Another has to do with dispatching events -
http://crash-stats.mozilla.com/report/index/a85d727a-fcc7-46df-aab2-b5b652091120
We should split out a separate bug for the events crash.
Assignee | ||
Comment 154•15 years ago
|
||
Filed bug 530070.
How do we get bug numbers added to crash stats for specific stacks?
Comment 155•15 years ago
|
||
I think the only way we could do that would be to add enough of the top frames to the skiplist in the crash processor so that we can tell the two appart from just the signature of the crash report. In these two cases it seems we would need to add at least nsWindow::GetParentWindow and nsWindow::GetParent, so that the signatures for the two individual crashes would become "nsWindow::GetParentWindow | nsWindow::GetParent | nsBaseWidget::SetZIndex" vs "nsWindow::GetParentWindow | nsWindow::GetParent | GetWidgetOffset". I don't know whether that's a good idea in general, or if it's worth it just for this specific case.
Comment 156•15 years ago
|
||
This appears to still happen on the 1.9.0 branch (where it's topcrash #32 or so). Would the patch here work for that branch? Do we need a new one? Can't really unhide the bug while it's still happening on that branch.
Does not appear to happen on the 1.8 branch, or at least doesn't appear anywhere in the top-500 crash reports. Tried searching to see if it happened at all but can't get a response from the talkback server.
Flags: blocking1.9.0.19?
Flags: blocking1.9.0.18?
Comment 157•15 years ago
|
||
Would love to get this into 3.0.18 but blocking may not be realistic. If the patch applies please request approval ASAP, otherwise we'll block for the next 3.0.x release.
Flags: blocking1.9.0.19?
Flags: blocking1.9.0.19-
Flags: blocking1.9.0.18?
Updated•14 years ago
|
Group: core-security
Updated•13 years ago
|
Crash Signature: [@ nsWindow::GetParentWindow(int)]
[@ nsBaseWidget::Destroy()]
Updated•9 years ago
|
Keywords: testcase-wanted
You need to log in
before you can comment on or make changes to this bug.
Description
•