Closed Bug 68365 Opened 24 years ago Closed 24 years ago

Crash, when trying to delete lots of messages

Categories

(SeaMonkey :: MailNews: Message Display, defect)

x86
Windows 98
defect
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: sheelar, Assigned: hyatt)

References

Details

(Keywords: crash)

Attachments

(3 files)

buildid:2001020808 os:win98 I was deleting about 3300 messages from my bugzilla folder. Selected all the messages in the thread pane. click on the tool bar button to delete Results: Messages were all selected and the progress bar showed the activity for about 10 seconds. The message count did not change in the trash folder or the folder from where I was deleting messages from because I had a lot of them which were unread. Tried to stop since it looked like it got stuck. Could not stop. I clicked on the trash folder and saw the thread pane was just scrolling and looked like it was getting all the messages Finally moved all the 3300 messages to the trash and crashed I have the talk back stack trace which I will attach to this bug.
change qa contact to myself, cc esther waiting for the stack trace should be available soon.
QA Contact: esther → sheelar
Here is the stack trace: nsScrollSmoother::Notify[d:\builds\seamonkey\mozilla\layout\xul\base\src\nsXULTreeOuterGroupFrame.cpp, line 208] nsTimer::Fire [d:\builds\seamonkey\mozilla\widget\timer\src\windows\nsTimer.cpp, line 200] nsTimerManager::FireNextReadyTimer [d:\builds\seamonkey\mozilla\widget\timer\src\windows\nsTimerManager.cpp, line 117] nsAppShell::Run [d:\builds\seamonkey\mozilla\widget\src\windows\nsAppShell.cpp, line 118] nsAppShellService::Run [d:\builds\seamonkey\mozilla\xpfe\appshell\src\nsAppShellService.cpp, line 408] NETSCP6.EXE + 0x1670 (0x00401670) NETSCP6.EXE + 0x11b8 (0x004011b8) NETSCP6.EXE + 0x2ae2 (0x00402ae2) KERNEL32.DLL + 0x1b560 (0xbff8b560) KERNEL32.DLL + 0x1b412 (0xbff8b412) KERNEL32.DLL + 0x19dd5 (0xbff89dd5)
I am investigating this, and I might know the problem. Please don't fix this, this is a good exercise for me! Thanks. :)
sheela, are you able to reproduce this problem? mOuter is probably null in nsScrollSmoother::Notify() we could add a null check and assertion to prevent the crash.
[sigh] Yes it is null, that's what I had figured out. Seth, can I make the patch for this bug?
As I read through that file, I see that there are quite a number of places where we could nullcheck to prevent crashes. Some of that error checking is poorly written. Seth, do you think I should make a patch to errorcheck on more places, or could that somehow slow down the code?
Håkan: null ptr checks aren't always the way to fix null ptr crashers. If the pointer shouldn't be null at that point, the real cause (for why it's null) should be found and fixed.
A nullcheck can find a null pointer and display a dialog stating the problem instead of silently crashing.
blake is right, adding null checks every where is not the solution. (and brendan will go crazy on you.) what you should do, is provide a null check & assertion patch for this crash, get it checked in (crashing is bad). Then start a new bug on the assertion to figure out why we have null in this case. At least while you (or the author of the code) figure it out, the user doesn't crash.
amen.
Keywords: crash
This patch shouldn't probably get checked in. But it points out where in the source I think this is happening, and does some assertions. 1. This code can be the crash, because if 'this' is nil, and we first try to create a new nsScrollSmoother and then do NS_ADDRED() on that null object we crash. if (!mScrollSmoother) { mScrollSmoother = new nsScrollSmoother(this); NS_ADDREF(mScrollSmoother); 2. The constructor, nsScrollSmoother::nsScrollSmoother(nsXULTreeOuterGroupFrame* aOuter) can also be the problem. I don't know how many times this class is created, but it might be an out of memory here which makes something go null, because the reporter tries to move >3000 msgs to the trash. 3. And finally the mOuter *is* null in nsScrollSmoother::Notify, I am quite sure of this, because those lines appear in the stack trace before we call the timer stuff. But #1 and #2 is called before this, so that's one of them are the originator here. Thanks, and please comment on the patch so I can improve it. Cc'ing evaughan who checked in this code (according to cvsblame).
bug 52270 may be related, if you look at what code they both are talking about.
here comes the patch, based on hwaara's work. In the patch, I assert (so a coder can work on this if they reproduce it) and return if mOuter is null. It is also possible that mOuter is corrupt, in which case this fix will not work. after we check in, we'll have to see if sheela can still reproduce the problem. wild, uneducated guess: the timer fires after mOuter is gone. perhaps the code that removes mOuter should remove the timer, too. here comes the patch, for the crasher. the next step would be to reproduce it in a debug build and figure out what is going on. I don't have the cycles to do that yet. (we'll might need some help from the owner of the code, too.)
But Seth, what about #1 and #2 in my previous posting?
Assignee: sspitzer → hyatt
#1 makes sense, we are not checking if the allocation fails. attached is a revised patch. we will still crash (if the caller of GetSmoother() doesn't null check the result), but at least we'll assert. I'm not convinced mOuter was null, you could get this crash if it was garbage. you need to take this up with hyatt, evaughan and pinkerton. doesn't look like a mailnews problem.
+ if (!mOuter) return; How about returning something here? We should try to notify the following code that we return because mOuter is either null or garbage.
you can't return anything. the return type is void. NS_IMETHODIMP_(void) nsScrollSmoother::Notify(nsITimer *timer)
Oh yes, I forgot that. r=hwaara on the latest patch.
BTW, is it possible to make that function non-void to return error codes?
"If anything is possible, then nothing is interesting." H.G. Wells just thought I'd make everyone's bugzilla more enjoyable. no, Notify is part of the nsITimerCallback interface. Unless you want to change that, which you don't.
About reproducing this bug, I don't have that many messages yet. But as soon as I accumulate that many messages in the bugzilla folder I will try this again and see if I crash without the fix.
a trick to building up a large folder: do a multiple copy from a newsgroup, like news://news.mozilla.org/netscape.test or news://news/mcom.test
seth, thanks for the tip. I did delete about 2700 messages from a local folder as before. Using today's build, I did not crash. I saw the same behavior again but it did not crash. The only difference is that I have today's build(2001021406)instead of 2001020808 build which crashed. Should I try this on the same build again and see if I crash?
this bug appears to be timer related, so it is going to be hard to reproduce. to be honest, if you have other bugs to work on, I'd suggest working on them first. this will be tricky to reproduce, and probably not worth it right now. after the fix lands, we'll probably have to wait and see if people still see this crasher. talkback would be good for that.
So was this checked in?
Ok, the latest patch already has got my r= on it...Seth, please get an sr= this bug and check it in!
hwaara, I'm waiting for hyatt to review / approve the patch.
sr=hyatt
*** Bug 69781 has been marked as a duplicate of this bug. ***
fix checked in.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Sheela wrote: Seth, I remember trying to recreate this crash and was not able to. This is the crash bug when trying to delete lots of messages. Can you tell me how I can verify this fix? You can reply to this email or add as comments in bugzilla. Reply from Seth, That is going to be a tough bug to properly verify, since it is timer related. All I can suggest is looking for crashes in talkback that look like that bug. I'd mark the bug verified, and then reopen if you see any crashes in talkback (with the exact same stack trace) from trunk builds after the date I checked in the fix. -Seth From the above comments from Seth marking this bug as verified. Thanks again seth
Status: RESOLVED → VERIFIED
Sheela (or any other QA), out of curiousity, have you seen any talkbacks crashing when deleting messages lately?
Product: Browser → Seamonkey
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: