Closed Bug 68365 Opened 24 years ago Closed 24 years ago

Crash, when trying to delete lots of messages

Categories

(SeaMonkey :: MailNews: Message Display, defect)

Product:

Component:

Platform:

x86

Windows 98

Type:

defect

Priority:

Not set

Severity:

critical

Tracking

(Not tracked)

Status:

VERIFIED FIXED

People

(Reporter: sheelar, Assigned: hyatt)

References

Details

(Keywords: crash)

Attachments

(3 files)

[Patch] points out where in the code this might be happening 24 years ago Håkan Waara (deleted), patch		Details \| Diff \| Splinter Review
patch, please review 24 years ago (not reading, please use seth@sspitzer.org instead) (deleted), patch		Details \| Diff \| Splinter Review
new patch, following hwaara's suggestions 24 years ago (not reading, please use seth@sspitzer.org instead) (deleted), patch		Details \| Diff \| Splinter Review

Sheela Ravindran

Reporter

Description

•

24 years ago

buildid:2001020808 os:win98 I was deleting about 3300 messages from my bugzilla folder. Selected all the messages in the thread pane. click on the tool bar button to delete Results: Messages were all selected and the progress bar showed the activity for about 10 seconds. The message count did not change in the trash folder or the folder from where I was deleting messages from because I had a lot of them which were unread. Tried to stop since it looked like it got stuck. Could not stop. I clicked on the trash folder and saw the thread pane was just scrolling and looked like it was getting all the messages Finally moved all the 3300 messages to the trash and crashed I have the talk back stack trace which I will attach to this bug.

Sheela Ravindran

Reporter

Comment 1

•

24 years ago

change qa contact to myself, cc esther waiting for the stack trace should be available soon.

QA Contact: esther → sheelar

Sheela Ravindran

Reporter

Comment 2

•

24 years ago

Here is the stack trace: nsScrollSmoother::Notify[d:\builds\seamonkey\mozilla\layout\xul\base\src\nsXULTreeOuterGroupFrame.cpp, line 208] nsTimer::Fire [d:\builds\seamonkey\mozilla\widget\timer\src\windows\nsTimer.cpp, line 200] nsTimerManager::FireNextReadyTimer [d:\builds\seamonkey\mozilla\widget\timer\src\windows\nsTimerManager.cpp, line 117] nsAppShell::Run [d:\builds\seamonkey\mozilla\widget\src\windows\nsAppShell.cpp, line 118] nsAppShellService::Run [d:\builds\seamonkey\mozilla\xpfe\appshell\src\nsAppShellService.cpp, line 408] NETSCP6.EXE + 0x1670 (0x00401670) NETSCP6.EXE + 0x11b8 (0x004011b8) NETSCP6.EXE + 0x2ae2 (0x00402ae2) KERNEL32.DLL + 0x1b560 (0xbff8b560) KERNEL32.DLL + 0x1b412 (0xbff8b412) KERNEL32.DLL + 0x19dd5 (0xbff89dd5)

Comment 3

•

24 years ago

I am investigating this, and I might know the problem. Please don't fix this, this is a good exercise for me! Thanks. :)

(not reading, please use seth@sspitzer.org instead)

Comment 4

•

24 years ago

sheela, are you able to reproduce this problem? mOuter is probably null in nsScrollSmoother::Notify() we could add a null check and assertion to prevent the crash.

Comment 5

•

24 years ago

[sigh] Yes it is null, that's what I had figured out. Seth, can I make the patch for this bug?

Comment 6

•

24 years ago

As I read through that file, I see that there are quite a number of places where we could nullcheck to prevent crashes. Some of that error checking is poorly written. Seth, do you think I should make a patch to errorcheck on more places, or could that somehow slow down the code?

Comment 7

•

24 years ago

Håkan: null ptr checks aren't always the way to fix null ptr crashers. If the pointer shouldn't be null at that point, the real cause (for why it's null) should be found and fixed.

Comment 8

•

24 years ago

A nullcheck can find a null pointer and display a dialog stating the problem instead of silently crashing.

(not reading, please use seth@sspitzer.org instead)

Comment 9

•

24 years ago

blake is right, adding null checks every where is not the solution. (and brendan will go crazy on you.) what you should do, is provide a null check & assertion patch for this crash, get it checked in (crashing is bad). Then start a new bug on the assertion to figure out why we have null in this case. At least while you (or the author of the code) figure it out, the user doesn't crash.

Comment 10

•

24 years ago

amen.

Updated

•

24 years ago

Keywords: crash

Comment 11

•

24 years ago

Attached patch [Patch] points out where in the code this might be happening (deleted) — Details — Splinter Review

Comment 12

•

24 years ago

This patch shouldn't probably get checked in. But it points out where in the source I think this is happening, and does some assertions. 1. This code can be the crash, because if 'this' is nil, and we first try to create a new nsScrollSmoother and then do NS_ADDRED() on that null object we crash. if (!mScrollSmoother) { mScrollSmoother = new nsScrollSmoother(this); NS_ADDREF(mScrollSmoother); 2. The constructor, nsScrollSmoother::nsScrollSmoother(nsXULTreeOuterGroupFrame* aOuter) can also be the problem. I don't know how many times this class is created, but it might be an out of memory here which makes something go null, because the reporter tries to move >3000 msgs to the trash. 3. And finally the mOuter *is* null in nsScrollSmoother::Notify, I am quite sure of this, because those lines appear in the stack trace before we call the timer stuff. But #1 and #2 is called before this, so that's one of them are the originator here. Thanks, and please comment on the patch so I can improve it. Cc'ing evaughan who checked in this code (according to cvsblame).

Comment 13

•

24 years ago

bug 52270 may be related, if you look at what code they both are talking about.

(not reading, please use seth@sspitzer.org instead)

Comment 14

•

24 years ago

here comes the patch, based on hwaara's work. In the patch, I assert (so a coder can work on this if they reproduce it) and return if mOuter is null. It is also possible that mOuter is corrupt, in which case this fix will not work. after we check in, we'll have to see if sheela can still reproduce the problem. wild, uneducated guess: the timer fires after mOuter is gone. perhaps the code that removes mOuter should remove the timer, too. here comes the patch, for the crasher. the next step would be to reproduce it in a debug build and figure out what is going on. I don't have the cycles to do that yet. (we'll might need some help from the owner of the code, too.)

(not reading, please use seth@sspitzer.org instead)

Comment 15

•

24 years ago

Attached patch patch, please review (deleted) — Details — Splinter Review

Comment 16

•

24 years ago

But Seth, what about #1 and #2 in my previous posting?

(not reading, please use seth@sspitzer.org instead)

Updated

•

24 years ago

Assignee: sspitzer → hyatt

(not reading, please use seth@sspitzer.org instead)

Comment 17

•

24 years ago

#1 makes sense, we are not checking if the allocation fails. attached is a revised patch. we will still crash (if the caller of GetSmoother() doesn't null check the result), but at least we'll assert. I'm not convinced mOuter was null, you could get this crash if it was garbage. you need to take this up with hyatt, evaughan and pinkerton. doesn't look like a mailnews problem.

(not reading, please use seth@sspitzer.org instead)

Comment 18

•

24 years ago

Attached patch new patch, following hwaara's suggestions (deleted) — Details — Splinter Review

Comment 19

•

24 years ago

+ if (!mOuter) return; How about returning something here? We should try to notify the following code that we return because mOuter is either null or garbage.

(not reading, please use seth@sspitzer.org instead)

Comment 20

•

24 years ago

you can't return anything. the return type is void. NS_IMETHODIMP_(void) nsScrollSmoother::Notify(nsITimer *timer)

Comment 21

•

24 years ago

Oh yes, I forgot that. r=hwaara on the latest patch.

Comment 22

•

24 years ago

BTW, is it possible to make that function non-void to return error codes?

(not reading, please use seth@sspitzer.org instead)

Comment 23

•

24 years ago

"If anything is possible, then nothing is interesting." H.G. Wells just thought I'd make everyone's bugzilla more enjoyable. no, Notify is part of the nsITimerCallback interface. Unless you want to change that, which you don't.

Sheela Ravindran

Reporter

Comment 24

•

24 years ago

About reproducing this bug, I don't have that many messages yet. But as soon as I accumulate that many messages in the bugzilla folder I will try this again and see if I crash without the fix.

(not reading, please use seth@sspitzer.org instead)

Comment 25

•

24 years ago

a trick to building up a large folder: do a multiple copy from a newsgroup, like news://news.mozilla.org/netscape.test or news://news/mcom.test

Sheela Ravindran

Reporter

Comment 26

•

24 years ago

seth, thanks for the tip. I did delete about 2700 messages from a local folder as before. Using today's build, I did not crash. I saw the same behavior again but it did not crash. The only difference is that I have today's build(2001021406)instead of 2001020808 build which crashed. Should I try this on the same build again and see if I crash?

(not reading, please use seth@sspitzer.org instead)

Comment 27

•

24 years ago

this bug appears to be timer related, so it is going to be hard to reproduce. to be honest, if you have other bugs to work on, I'd suggest working on them first. this will be tricky to reproduce, and probably not worth it right now. after the fix lands, we'll probably have to wait and see if people still see this crasher. talkback would be good for that.

Comment 28

•

24 years ago

So was this checked in?

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 29

•

24 years ago

According to LXR, no it wasn't : http://bonsai.mozilla.org/cvslog.cgi?file=mozilla/layout/xul/base/src/nsXULTreeOuterGroupFrame.cpp

Comment 30

•

24 years ago

Ok, the latest patch already has got my r= on it...Seth, please get an sr= this bug and check it in!

(not reading, please use seth@sspitzer.org instead)

Comment 31

•

24 years ago

hwaara, I'm waiting for hyatt to review / approve the patch.

Assignee

Comment 32

•

24 years ago

sr=hyatt

Comment 33

•

24 years ago

*** Bug 69781 has been marked as a duplicate of this bug. ***

(not reading, please use seth@sspitzer.org instead)

Comment 34

•

24 years ago

fix checked in.

Status: NEW → RESOLVED

Closed: 24 years ago

Resolution: --- → FIXED

Sheela Ravindran

Reporter

Comment 35

•

24 years ago

Sheela wrote: Seth, I remember trying to recreate this crash and was not able to. This is the crash bug when trying to delete lots of messages. Can you tell me how I can verify this fix? You can reply to this email or add as comments in bugzilla. Reply from Seth, That is going to be a tough bug to properly verify, since it is timer related. All I can suggest is looking for crashes in talkback that look like that bug. I'd mark the bug verified, and then reopen if you see any crashes in talkback (with the exact same stack trace) from trunk builds after the date I checked in the fix. -Seth From the above comments from Seth marking this bug as verified. Thanks again seth

Status: RESOLVED → VERIFIED

Comment 36

•

24 years ago

Sheela (or any other QA), out of curiousity, have you seen any talkbacks crashing when deleting messages lately?

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: Browser → Seamonkey

You need to log in before you can comment on or make changes to this bug.