Closed
Bug 18005
Opened 25 years ago
Closed 25 years ago
[DOGFOOD] Leave mail window for a long time, GetMsg, crash
Categories
(MailNews Core :: Networking, defect, P3)
Tracking
(Not tracked)
VERIFIED
FIXED
M12
People
(Reporter: trudelle, Assigned: dougt)
(Whiteboard: [PDT+] Verified for all the platforms)
Attachments
(1 file)
(deleted),
text/plain
Today's opt build (yesterday too)
Launch Apprunner
Task>Mail
Open IMAP server
select inbox
read a message (possibly extraneous step)
Leave mail window sitting there for a while without using it.
Click GetMsg
Crash, log available.
Saw several times on Mac, once on Linux, will try on Win98
Updated•25 years ago
|
Assignee: phil → bienvenu
Summary: Crash on GetMsg → Leave mail window for a long time, GetMsg, crash
Comment 1•25 years ago
|
||
Here's the stack trace. Peter, is this really a Seamonkey stack trace? It has
all sorts of names which look like 4.x.
Calling chain using A6/R1 links
Back chain ISA Caller
00000000 PPC 16C7DF28
068428C0 PPC 16C7E07C
06842870 PPC 174BDE6C LApplication::Run()+000B8
06842800 PPC 17099704 XP_GetNonGridContext+285EC
068427A0 PPC 17447B64 LPeriodical::DevoteTimeToRepeaters(const EventRecord&)+0004C
06842740 PPC 16D2F6D8 CFrontApp::GetApplication()+010D4
068426F0 PPC 16D2FE3C CFrontApp::GetApplication()+01838
06842660 PPC 16D31948 SSL_DataPending+01390
06842610 PPC 16E05C94 CACHE_FindURLInCache+0317C
068425C0 PPC 16EB2568 NET_CacheConverter+01560
06842560 PPC 16E92420 NET_DeregisterContentTypeConverter+087C0
06842510 PPC 17057630 FE_DefaultDocCharSetID+3AAD8
068424A0 PPC 171AD594 XP_Confirm+178D0
06842450 PPC 17071BC4 XP_GetNonGridContext+00AAC
068423B0 PPC 1707292C XP_GetNonGridContext+01814
06842360 PPC 16DCEA64 XP_TempDirName+0C0B4
068422F0 PPC 16DCF5A8 XP_TempDirName+0CBF8
068422B0 PPC 16DD0344 XP_TempDirName+0D994
06842270 PPC 16DD0344 XP_TempDirName+0D994
06842230 PPC 16DD0344 XP_TempDirName+0D994
068421F0 PPC 16DD0358 XP_TempDirName+0D9A8
068421B0 PPC 16D96644 SOB_get_error+019E4
06842170 PPC 174F3EDC Flush_Free+0000C
Return addresses on the stack
Stack Addr Frame Addr ISA Caller
068424D8 PPC 16D18A18 XP_PlatformFileToURL+0A1D4
068424CC 68K 0636DE42
068424C8 PPC 16CF8804 INTL_DefaultWinCharSetID+004F0
068424B8 68K 17445482 LBroadcaster::BroadcastMessage(long, void*)+0008A
068424A8 PPC 17057630 FE_DefaultDocCharSetID+3AAD8
0684248C 68K 065AA29E
06842458 06842450 PPC 171AD594 XP_Confirm+178D0
06842408 06842400 PPC 174F3E88 Flush_Allocate+0001C
068423B8 068423B0 PPC 17071BC4 XP_GetNonGridContext+00AAC
06842388 PPC 17133678 UGraphicGizmos::BevelRect(const Rect&, short, short , short)+05EE4
06842368 06842360 PPC 1707292C XP_GetNonGridContext+01814
0684235C 06842358 68K 063E7ACA
06842308 06842300 PPC 16F04378 XP_ProgressText+20950
068422F8 068422F0 PPC 16DCEA64 XP_TempDirName+0C0B4
068422EC 68K 063E7ACA
068422DE 68K 0003FFFE
068422D8 068422D0 PPC 1732C8B8 PR_ExitMonitor+00098
068422CC 68K 0635866A
068422B8 068422B0 PPC 16DCF5A8 XP_TempDirName+0CBF8
06842298 68K 063E7ACA
06842288 68K 063DA9CE
06842278 06842270 PPC 16DD0344 XP_TempDirName+0D994
06842258 06842250 PPC 16D7B890 ET_moz_CallFunction+003C0
06842238 06842230 PPC 16DD0344 XP_TempDirName+0D994
06842218 06842210 PPC 16D7BB04 ET_moz_CallFunction+00634
068421F8 068421F0 PPC 16DD0344 XP_TempDirName+0D994
068421D8 068421D0 PPC 16D7C044 ET_moz_CallFunction+00B74
068421CC 68K 0635866A
068421C8 068421C0 PPC 16C7F3D4
068421B8 068421B0 PPC 16DD0358 XP_TempDirName+0D9A8
068421A8 068421A0 PPC 17043C44 FE_DefaultDocCharSetID+270EC
06842194 68K 063DA9CE
06842188 06842180 PPC 174F3EDC Flush_Free+0000C
06842178 06842170 PPC 16D96644 SOB_get_error+019E4
0684215C 68K 0635866A
06842158 06842150 PPC 16DCF51C XP_TempDirName+0CB6C
06842148 68K 063E7ACA
06842138 06842130 PPC 174F3EDC Flush_Free+0000C
06842118 06842110 PPC 174F3EDC Flush_Free+0000C
06842108 06842100 PPC 16DCF020 XP_TempDirName+0C670
068420F8 068420F0 PPC 174F3EDC Flush_Free+0000C
068420F4 068420F0 68K 063E7ACA
Severity: normal → critical
QA Contact: lchiang → esther
Summary: Leave mail window for a long time, GetMsg, crash → [DOGFOOD] Leave mail window for a long time, GetMsg, crash
Comment 2•25 years ago
|
||
is this a seamonkey crash, or a 4.5 crash? was 4.5 running at the time?
Reporter | ||
Comment 3•25 years ago
|
||
Did I send the wrong file? Sorry, I'll try it again.
Reporter | ||
Comment 4•25 years ago
|
||
Reporter | ||
Comment 5•25 years ago
|
||
Looks like there were two logs in the file I sent, and only the first (a 4.7
crash) got pasted. I deleted that log from the file and attached the apprunner
log only.
Comment 6•25 years ago
|
||
OK, here's the stack trace from the attachment. Looks like a problem shutting
down the thread, especially with the proxy event code. I'm assuming biff is not
turned on, or we wouldn't have timed out.
04F29908 04F29900 PPC 1791E650 PR_CSetOnMonitorRecycle+00050
04F298C8 04F298C0 PPC 16BBB294 nsThread::Exit(void*)+0001C
04F29888 04F29880 PPC 16BBB438 nsThread::Release()+00040
04F29848 68K 16BBB19E nsThread::~nsThread()+00036
04F29808 04F29800 PPC 16B85690 nsCOMPtr_base::~nsCOMPtr_base()+00030
04F297C8 04F297C0 PPC 163289F4 nsImapProtocol::Release()+289F4
04F297A8 04F297A0 PPC 1791E474 PR_CExitMonitor+00074
04F29788 04F29780 PPC 16329C14 nsImapProtocol::~nsImapProtocol()+29C14
04F29768 04F29760 PPC 16C86950 operator delete(void*)+00014
04F29758 04F29750 PPC 17922580 PR_ExitMonitor+00054
04F29748 04F29740 PPC 17922408 PR_DestroyMonitor+0001C
04F29730 68K 05BA264E
04F29728 04F29720 PPC 16C877F8 free+00030
04F29708 04F29700 PPC 1792405C PR_DestroyLock+00018
04F296E8 04F296E0 PPC 16BC83FC nsProxyEventObject::~nsProxyEventObject()+000F0
04F296D8 04F296D0 PPC 16C8956C nsLargeHeapAllocator::AllocatorFreeBlock(void*)+00020
04F296C8 04F296C0 PPC 1791DE94 PR_Free+00014
04F296B8 04F296B0 PPC 16B886AC nsAllocator::Free(void*)+00054
04F296A8 04F296A0 PPC 16BC8480 nsProxyEventObject::RootRemoval()+00034
04F29688 04F29680 PPC 16C86950 operator delete(void*)+00014
Comment 7•25 years ago
|
||
I tried this on windows. It seemed fine. I'll try linux next.
Reporter | ||
Comment 8•25 years ago
|
||
Right, no biff.
Reporter | ||
Comment 9•25 years ago
|
||
I can't reproduce this on Win98, but I just reproduced it on Linux again.
Comment 10•25 years ago
|
||
Do we have a dangling connection to a timed-out thread?
Comment 11•25 years ago
|
||
I reproduced the crash on linux. We get the following stack trace. This is
probably some symptom of our screwed-up event handling. Perhaps DougT's proxy
event changes will help, though I doubt it.
#0 0x40368888 in main_arena ()
#1 0x68403688 in ?? ()
#2 0x408e37ea in nsStreamListenerEvent::HandlePLEvent (aEvent=0x83fec48) at nsAsyncStreamListener.cpp:169
#3 0x4019a36b in PL_HandleEvent (self=0x83fec48) at plevent.c:537
#4 0x4019a27c in PL_ProcessPendingEvents (self=0x8736020) at plevent.c:498
#5 0x401599e9 in nsEventQueueImpl::ProcessPendingEvents (this=0x8735ff8) at nsEventQueue.cpp:190
#6 0x405181ec in event_processor_callback (data=0x8735ff8, source=21, condition=GDK_INPUT_READ) at nsAppShell.cpp:228
#7 0x40517aff in our_gdk_io_invoke (source=0x8736080, condition=G_IO_IN, data=0x8722e98) at nsAppShell.cpp:49
#8 0x406b23ca in g_io_unix_dispatch ()
#9 0x406b3a86 in g_main_dispatch ()
#10 0x406b4041 in g_main_iterate ()
#11 0x406b41e1 in g_main_run ()
#12 0x405dd7a9 in gtk_main ()
#13 0x405186ff in nsAppShell::Run (this=0x80a2ce8) at nsAppShell.cpp:395
#14 0x4039d351 in nsAppShellService::Run (this=0x80a1f60) at nsAppShellService.cpp:480
Comment 12•25 years ago
|
||
More likely we have a proxy event in the event queue, and it refers to a deleted
object, like the protocol, or thread. Since linux event handling seems fairly
messed up, at least as far as IMAP is concerned, this doesn't surprise me too
much.
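One common way to guard against a queued event outliving its target is to have the event hold only a weak reference and check it at dispatch time. The sketch below is an illustration in standard C++, not the Mozilla proxy event code: `Protocol`, `EventQueue`, and `PostDataEvent` are hypothetical names, and the fix that eventually landed for this bug may well have taken a different approach.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for a protocol object; not the real nsImapProtocol.
struct Protocol {
    std::string lastData;
    void OnData(const std::string& d) { lastData = d; }
};

// A minimal single-threaded event queue of callables.
class EventQueue {
public:
    void Post(std::function<void()> ev) {
        assert(ev != nullptr);  // an empty callable would throw when run
        events_.push(std::move(ev));
    }
    // Runs every pending event and returns how many were run.
    int ProcessPending() {
        int n = 0;
        while (!events_.empty()) {
            std::function<void()> ev = std::move(events_.front());
            events_.pop();
            ev();
            ++n;
        }
        return n;
    }
private:
    std::queue<std::function<void()>> events_;
};

// Posts an event that holds only a weak reference to its target. If the
// target is gone by the time the event runs, the event does nothing instead
// of touching freed memory. *delivered records whether the handler ran.
void PostDataEvent(EventQueue& q, const std::shared_ptr<Protocol>& target,
                   const std::string& data, bool* delivered) {
    std::weak_ptr<Protocol> weak = target;
    q.Post([weak, data, delivered] {
        if (std::shared_ptr<Protocol> strong = weak.lock()) {
            strong->OnData(data);
            *delivered = true;
        } else {
            *delivered = false;
        }
    });
}
```

If the protocol object is released between `PostDataEvent` and `ProcessPending`, the event still runs but skips the call into the dead object. Holding a strong reference in the event instead would keep the target alive until the event has been handled, which is the other usual choice.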
Comment 13•25 years ago
|
||
Putting on PDT+ radar.
Comment 14•25 years ago
|
||
If you turn on biff at an interval less than 29 minutes, you won't have this
problem.
Comment 15•25 years ago
|
||
What's happening, I bet, is that we're removing the timed-out connection,
attempting to logout, and releasing the imap protocol instance. This eventually
causes the imap thread to be destroyed. On windows, this happens later on the
thread in question, but it looks like on the mac, it happens immediately on the
ui thread. On linux, it looks like the proxy event stuff isn't noticing that the
event queue has gone away.
Reporter | ||
Comment 16•25 years ago
|
||
Thanks David, I thought that (30 min. connection drop) might be the case, and
the workaround is good enough for dogfood.
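The workaround relies on periodic traffic keeping the connection inside the server's idle window. A tiny sketch of that arithmetic, assuming a 30-minute server timeout (the figure mentioned in this bug; real servers vary) and using hypothetical helper names:

```cpp
#include <cassert>
#include <chrono>

using Minutes = std::chrono::minutes;

// Assumption: the bug only says "30 min. connection drop"; servers differ.
const Minutes kAssumedServerIdleTimeout(30);

// True if checking for mail every `pollInterval` should keep the connection
// from ever sitting idle long enough to be dropped.
inline bool PollingKeepsConnectionAlive(
        Minutes pollInterval, Minutes timeout = kAssumedServerIdleTimeout) {
    assert(timeout.count() > 0);
    return pollInterval.count() > 0 && pollInterval < timeout;
}

// True if a connection that has been idle for `idleFor` should be treated
// as already dropped by the server.
inline bool ShouldTreatAsTimedOut(
        Minutes idleFor, Minutes timeout = kAssumedServerIdleTimeout) {
    assert(timeout.count() > 0);
    return idleFor >= timeout;
}
```

With these assumptions, a biff interval of 29 minutes keeps the connection alive, while an interval of 30 minutes or no biff at all does not.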
Comment 17•25 years ago
|
||
It turns out that if we really did drop the connection, everything would be
fine. Unfortunately, we try to gracefully close and logout. If I comment out
those calls, we don't crash. My gdb/linux skills are pretty marginal - all I
can guess is that the vtbl for the StreamListenerEvent is horked, but the object
doesn't look deleted. I'll keep poking around but I suspect this will take a few
days.
Comment 18•25 years ago
|
||
Oy, gevalt. The nsImapProtocol object is definitely getting destroyed before the
event queue is finished, which is not good. But what's worse is that I put in a
call to StopAcceptingEvents after our thread has stopped running to see if that
helps. It didn't help, but it allowed me to discover that on linux (but not
windows), our imap event queue is somehow marked the "elder" event queue. (I
suspect this should be "eldest"). This seems wrong.
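The ordering problem described here (object destroyed before its event queue is finished) can be sketched as follows. This is a simplified illustration, not the real event queue: `DrainableQueue`, its `StopAccepting()` method, `Connection`, and `ShutDown` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical queue that can be closed to new events and then drained.
class DrainableQueue {
public:
    // Returns false (and drops the event) once the queue has been closed.
    bool Post(std::function<void()> ev) {
        assert(ev != nullptr);
        if (!accepting_) return false;
        events_.push(std::move(ev));
        return true;
    }
    void StopAccepting() { accepting_ = false; }
    // Runs all pending events; returns the number run.
    int Drain() {
        int n = 0;
        while (!events_.empty()) {
            std::function<void()> ev = std::move(events_.front());
            events_.pop();
            ev();
            ++n;
        }
        return n;
    }
private:
    bool accepting_ = true;
    std::queue<std::function<void()>> events_;
};

// Hypothetical connection; queued events capture a raw pointer to it.
struct Connection {
    int handled = 0;
};

// Tears down in a safe order: close the queue, run what is already queued
// while the connection is still alive, and only then destroy it.
// Returns how many events the connection handled before destruction.
int ShutDown(DrainableQueue& q, std::unique_ptr<Connection> conn) {
    q.StopAccepting();
    q.Drain();
    int handled = conn->handled;
    conn.reset();
    return handled;
}
```

Closing the queue alone is not enough, which matches the observation above: events that were posted before the close still have to be run or discarded before the object they point at goes away.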
Comment 19•25 years ago
|
||
The above is partly wrong - the elder assert happens on windows as well, so
perhaps it's not the problem. But, we are executing the
onDataAvailableEvent::HandleEvent on the wrong thread on Linux (i.e., the main
thread), just like 17065 - my gut tells me this is the root of our problem.
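To make "events handled on the wrong thread" concrete, here is a minimal sketch of a queue that remembers its owning thread and refuses to dispatch from any other. It uses only the C++ standard library; `ThreadBoundQueue` is a hypothetical name, and this is not how the Mozilla event queue is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical event queue bound to one owning thread.
class ThreadBoundQueue {
public:
    explicit ThreadBoundQueue(std::thread::id owner) : owner_(owner) {}

    // May be called from any thread.
    void Post(std::function<void()> ev) {
        assert(ev != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        events_.push(std::move(ev));
    }

    // Runs pending events only when called on the owning thread.
    // Returns the number of events run, or -1 if called from another thread,
    // in which case the events stay queued.
    int ProcessPending() {
        if (std::this_thread::get_id() != owner_) return -1;
        int n = 0;
        for (;;) {
            std::function<void()> ev;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (events_.empty()) break;
                ev = std::move(events_.front());
                events_.pop();
            }
            ev();  // run outside the lock so handlers may call Post()
            ++n;
        }
        return n;
    }

    std::size_t Pending() {
        std::lock_guard<std::mutex> lock(mutex_);
        return events_.size();
    }

private:
    const std::thread::id owner_;
    std::mutex mutex_;
    std::queue<std::function<void()>> events_;
};
```

A check like this does not fix the underlying dispatch problem, but it turns a silent cross-thread call into something that can be detected and asserted on.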
Comment 20•25 years ago
|
||
I've verified that if I stop gtk from calling into imap code from the UI thread,
this crash doesn't happen. I did this by disabling the
nsAppShell::ListenToEventQueue call, which prevents us from getting called from
the ui thread. Unfortunately, it also breaks the password prompt, presumably
because that's why this event queue listener hack is there in the first place.
I believe this is an xpapps problem, so I'm reassigning it back to you, Peter. I
truly believe that we should be called from the correct thread.
Assignee: bienvenu → trudelle
Comment 21•25 years ago
|
||
David, I think we're all agreed that the source of this problem is the same
problem for 17065. Brendan is going to help me find someone to help us figure
out what's going on with event processing on linux. I'm hesitant to mark this a
dup but the problem is probably the same even though the symptoms are different.
Reporter | ||
Updated•25 years ago
|
Assignee: trudelle → brendan
Reporter | ||
Comment 22•25 years ago
|
||
Reassigning to Brendan for triage.
Let's not forget, this also happened on Mac, as did 17065.
Comment 23•25 years ago
|
||
Yep, and I believe they both do hacky things with event dispatching to get modal
dialogs to work. I believe these two bugs have the same cause, and Scott and I
spent a lot of time discovering that in both cases, our events are getting
processed by the wrong thread.
Updated•25 years ago
|
Status: NEW → ASSIGNED
Comment 24•25 years ago
|
||
Dan is gonna help me fix this on all platforms, yes he is.
/be
Updated•25 years ago
|
Target Milestone: M12
Comment 25•25 years ago
|
||
17065 is M12, so should this one be.
/be
Comment 26•25 years ago
|
||
*** Bug 20247 has been marked as a duplicate of this bug. ***
Comment 27•25 years ago
|
||
Can also be seen when using "an imap server that only allows a single connection
to a folder, and kills previous connections (like the UW server)" as bienvenu
mentions in the duplicate bug.
Updated•25 years ago
|
QA Contact: esther → huang
Comment 28•25 years ago
|
||
Changing QA Contact to me since this is an IMAP bug. Cc: Esther.
Comment 29•25 years ago
|
||
The same happens for me: I'm using UW-IMAP, with a non-Mozilla biff checking the INBOX
every 30 seconds and "check mail every 1 min." set in Mozilla.
While having a normal subfolder (not INBOX and not under INBOX) open, I get
"Document: Done (0.21 secs) In OnFolderLoader" every minute or so. Mozilla (debug
build) crashed after 20 min. without any notice. HTH.
Comment 30•25 years ago
|
||
Brendan, what's the projected fix date for this bug?
Comment 31•25 years ago
|
||
the better question to get started is who is going to tackle this hairy problem?
did we find a porkjockey owner?
Updated•25 years ago
|
Assignee: brendan → dougt
Status: ASSIGNED → NEW
Comment 32•25 years ago
|
||
dougt has been fixing bugs in event-loop land and kindly offers to take this
one. he's gonna dig into this tomorrow.
/be
Assignee | ||
Updated•25 years ago
|
Status: NEW → ASSIGNED
Whiteboard: [PDT+] → [PDT+] 12/9
Assignee | ||
Comment 33•25 years ago
|
||
Sent workaround to mscott to verify. Still tracking down real problem.
Assignee | ||
Updated•25 years ago
|
Whiteboard: [PDT+] 12/9 → [PDT+] Fix ready, patch sent for review.
Assignee | ||
Updated•25 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 34•25 years ago
|
||
fix checked in.
Comment 35•25 years ago
|
||
I have not been able to reproduce on linux 6.0, NT 4.0 or Mac OS 8.5.1 using
12-16-12m12 commercial build. I was indeed seeing it often on my mac and linux
machines prior to this week's builds (fixed this week).
I will let huang or someone else who'd seen this double-check before marking it
verified.
Comment 36•25 years ago
|
||
Reproducing this bug requires leaving the PC idle for a while. I will test it later, since I need
to continue running the Basic Functionality Test for M12.
Updated•25 years ago
|
Status: RESOLVED → VERIFIED
Whiteboard: [PDT+] Fix ready, patch sent for review. → [PDT+] Verified for all the platforms
Comment 37•25 years ago
|
||
Verified on the Linux 12-20-23-M12 final commercial build
Verified on the Mac 12-21-11-M12 final commercial build
Verified on the Linux 12-21-00-M12 final commercial build
I have idled for more than 30 minutes without a crash on all the platforms!!
Marking as Verified.
Updated•20 years ago
|
Product: MailNews → Core
Updated•16 years ago
|
Product: Core → MailNews Core