Closed Bug 150716 Opened 22 years ago Closed 22 years ago

Compact Folders does not work [trunk only]

Categories

(MailNews Core :: Backend, defect, P1)

x86
Windows 2000

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.0.1

People

(Reporter: stevenlyons, Assigned: dougt)

References

Details

(Keywords: regression)

Attachments

(3 files)

In MailNews, clicking on Compact Folders does not work. 

Expected Behavior: 
Clicking on Compact Folders will compact the folders in the current account.

Actual Behavior: 
The folder compact starts and then seems to hang. It doesn't hang the browser
but compaction will never finish and seems to be stalled.

Observations:
There is no exciting combination of activites making this not work. It used to
work great for me but hasn't for a while. I'm not sure exactly when it stopped
working. I am unable to compact from the menu choice nor from when I am prompted
to compact on startup.

The compaction creates the nstmp and nstmp.msf files. Usually only one message
makes it into the nstmp file but sometimes more before it stalls. This happens
in all my accounts.
correcting component, -> Navin. Reporter, what build are you using?
Assignee: bienvenu → naving
Component: Mail Database → Mail Back End
Sorry about that:

Build 2002061008, Windows XP. POP mail account.

Just found bug #133996 that this is probably a dup of.
QA Contact: gayatri → sheelar
looks like something has caused this regression. I'm also seeing it. 
Severity: normal → major
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Keywords: nsbeta1, regression
Priority: -- → P1
Target Milestone: --- → mozilla1.0.1
I am hitting this assertion in nsFileTransport::AsyncRead line 519 

NS_ASSERTION(mContext == nsnull, "context not released");

Further mListener gets nulled out after we hit this assertion so we are never
completing reading from the mailbox. I don't know how or why. 

Something changed recently in nsFileTransport, cc darin and dougt if they can help
us out. 
*** Bug 133996 has been marked as a duplicate of this bug. ***
One good thing is that this is happening on trunk only. I think fix for bug
139556 may have caused this. 
Depends on: 139556
Summary: Compact Folders does not work. → Compact Folders does not work [trunk only]
If that's the case, then bug 133996 can't be a dup of this one, since it was
filed before the fix for bug 139556 was checked in. Also, bug 133996 would be on
the trunk as well.
*** Bug 150275 has been marked as a duplicate of this bug. ***
bug 133996 was filed before but there were some comments listing very recent
build-id, so I got confused. Anyway I have reopened that bug 133996.
FOLLOWING ORIGINAL FILED AS Comment#2 on Bug#133996
Because of the ages of the two bugs, and the close similarity of descriptions
I'm copying it here:

Win2000pro+Mozilla Build#2002060704 POP Mail

For best results set your prefs to check every 10 minutes and automatically
download.

1) Empty trash
2) Menu: File->Compact_Folders

OBSERVE: Status bar "Compacting Folder Inbox ", Growler growls.

Go away, make yourself a cup of coffee, come back 10 minutes later.

OBSERVE: Still the same, no actual progress on an operation that formerly took
under 1 minute.  If your download schedule was triggered, you also have a
warning "This folder is busy performing an other operation . . . ."

To clear the busy condition, one must shut down Mozilla completely and restart.
If you need another confirmation for this bug, I'm in (latest buid 
2002061108) ;-) It has some drawbacks : some folders become "unavalable" 
(cannot move / delete).
I discussed this with dougt and the problem is because we are calling AsyncRead
in OnStopRequest which is not supported (after fix for bug 139556). Reassigning
to dougt. If we cannot fix this in necko code we will have to either revert back
to running multiple urls(one url per msg) or run the same url with updated
msgkey and msglength (each on a different transport). 

Assignee: naving → dougt
Status: ASSIGNED → NEW
does this mean copying multiple local messages is also broken? The other
possible fix, besides the ones Navin suggested, is to just set a 0 interval
timer and do the next async read after the timer fires.
I can confirm the condition in Comment#13:

1) In 3pane, type a phrase into the Select input so that some number of Inbox
messages are selected; hilight some or all.
2) Attempt to move them;  I used the keyboard [Alt+M, M, ....].

OBSERVE: Growler growls.  Nothing much else happens.
Recovery requires to shut down.

Attempting to restart from QuickStarter generates crash, see TB7264324E.
Upon starting from scratch, Browser shows mozilla.org/start/, rather than my
home page.
Attached patch patch using timer (deleted) — — Splinter Review
Increasing the delay doesn't seem to work. In fact we see failure very early on
increasing delay.
*** Bug 151334 has been marked as a duplicate of this bug. ***
actually, increasing the timer does work - I changed it from 10 to 100 and it
worked fine. I know there's some slack in the timers for reasons I can't
remember, so setting the timer to 10 might not guarantee that the timer doesn't
fire immediately. However, setting the timer to 100 is going to slow this down
unacceptably. So, I would suggest fixing this in the nsFileTransport, something
like this:

        // if we have a context, we have to ensure that it is released on the 
        // proper thread.
        if (mContext) {
            if (mEventQ) {
                // see http://bugzilla.mozilla.org/show_bug.cgi?id=139556#c64
                // for the reason behind this evil reference counting.
                nsISupports* doomed = mContext.get();
                NS_ADDREF(doomed);
                mContext = 0;
                if (mListener) {
                  mListener->OnStopRequest(this, doomed, mStatus);
                  mListener = 0;
                }

                NS_ProxyRelease(mEventQ, doomed);
            }
            else {
                if (mListener) {
                  mListener->OnStopRequest(this, mContext, mStatus);
                  mListener = 0;
                mContext = nsnull;
            }
        }
 
this just guarantees that mContext is cleared as it was before. The above code
could probably be cleaned up to avoid the duplication...
I have a folder which I can give it to you. This folder doesn't work for 100 for
me. Anyway, I think creating new transport for each msg is the way to go. 
No, that's definitely not the way to go - that will make us reopen the file for
every message in the folder, which will slow down compaction dramatically.
bienvenu: i agree with you... that looks like the right solution.
We have m_multipleMoveCopyMsg stream that we are going to use to create each
transport, and we don't close the stream when we are done using the transport.
Your suggested fix to fileTransport doesn't fix the problem for me. Here is
what I changed

Index: nsFileTransport.cpp
===================================================================
RCS file: /cvsroot/mozilla/netwerk/base/src/nsFileTransport.cpp,v
retrieving revision 1.129
diff -u -w -r1.129 nsFileTransport.cpp
--- nsFileTransport.cpp	7 Jun 2002 05:07:51 -0000	1.129
+++ nsFileTransport.cpp	12 Jun 2002 23:26:19 -0000
@@ -852,11 +852,6 @@
                                    NS_ConvertASCIItoUCS2(mStreamName).get());
         }
 
-        if (mListener) {
-            mListener->OnStopRequest(this, mContext, mStatus);
-            mListener = 0;
-        }
-
         // if we have a context, we have to ensure that it is released on the 
         // proper thread.
         if (mContext) {
@@ -866,11 +861,26 @@
                 nsISupports* doomed = mContext.get();
                 NS_ADDREF(doomed);
                 mContext = 0;
+                if (mListener) {
+                  mListener->OnStopRequest(this, doomed, mStatus);
+                  mListener = 0;
+                }
                 NS_ProxyRelease(mEventQ, doomed);
             }
             else {
+                   if (mListener) {
+                     mListener->OnStopRequest(this, mContext, mStatus);
+                     mListener = 0;
+                   }
                 mContext = nsnull;
             }
+        }
+        else
+        {
+                   if (mListener) {
+                     mListener->OnStopRequest(this, mContext, mStatus);
+                     mListener = 0;
+                   }
         }
         break;
       }
the correct fix is to fix up the file transport code to support multiple
requests.  I don't have time to fix his now, but I would be happy to show
someone what to do.

david, does this really hurt performance; do you have numbers?
OK, Navin, I'll have a crack at it myself and get back to you. Navin is right
(probably :-) ) that we won't close the stream or underlying file - I forgot
that I added the reuseable stream stuff. Adding that gave around a 10x
performance increase compacting large folders on Windows - I can only imagine
what it was for Mac :-) 
I see one more problem for sharing stream between transports. If the file 
transport is destroyed it closes the stream in the destructor. 
Attached patch proposed fix (deleted) — — Splinter Review
this worked for me - I suspect this race condition has been around for a while,
but the new proxying of the release made it much more likely to get hit. Navin,
could you try this patch? Note that I had to fix both places OnStopRequest was
called.
Comment on attachment 87474 [details] [diff] [review]
proposed fix

if that works for you, great.  fix up the comment:

		 // see http://bugzilla.mozilla.org/show_bug.cgi?id=139556#c64
		 // for the reason behind this evil reference counting.

and r=dougt
Attachment #87474 - Flags: needs-work+
Blocks: 151531
Doug, thx, I'll move the comments if it turns out that this works for Navin.
*** Bug 151467 has been marked as a duplicate of this bug. ***
Sorry, it still doesn't work for me. your fix is same as comment #22. I'm 
looking into why it is failing with your fix.
yeah, I was surprised that that worked.  if you want to fix this correctly, look
at how the socket transport creates a internal request object.  We need to do
something similar for the file transport.  If you can't figure it out, let me
know and I can do the heavy lifting.
Navin, please send me your mailbox that doesn't work for you. The fixes are not
the same,however - the one you put in your comment only takes care of one of the
calls to OnStopRequest. As I said before, there are two calls that you have to
take care of. Doug, this patch basically reverts the code to the way it was
before the proxied release was added, at least in terms of whether we call Async
Read from OnStopRequest before mContext is cleared. If it worked before that
change, it should work after, as near as I can tell.
Comment on attachment 87474 [details] [diff] [review]
proposed fix

>Index: nsFileTransport.cpp

>+        nsISupports* doomed = mContext.get();
>+        NS_IF_ADDREF(doomed);
>+        mContext = 0;
>         if (mListener) {
>-            mListener->OnStopRequest(this, mContext, mStatus);
>+            mListener->OnStopRequest(this, doomed, mStatus);
>             mListener = 0;
>         }

one problem:  you need to clear all member variables pertaining to this request
before calling OnStopRequest.  that is the only way to solve the race
condition.

that means that you have to clear mListener before calling OnStopRequest. 
pushing
a copy of it onto the stack would suffice.

you'll of course need to do the same thing with mProvider.
ok, thx, Darin. Navin, can you try that, in both places that OnStopRequest is
called?
Attached patch david's proposed fix w/ darin's suggestion (deleted) — — Splinter Review
This fix worksforme.
Attachment #87557 - Flags: superreview+
Comment on attachment 87557 [details] [diff] [review]
david's proposed fix w/ darin's suggestion

>Index: nsFileTransport.cpp

>             else {
>-                mContext = nsnull;
>+                NS_IF_RELEASE(doomed);
>             }

minor nit: do away with the { }'s here to be consistent with the coding style
of this file.

>+            else
>+              NS_RELEASE(doomed);

minor nit: indentation is 4 spaces; please be consistent.

with these minor fixes sr=darin
I think dougt wanted the comment about the ref-counting and bugzilla reference
to get moved to where the addref is done...
dougt, can we check this fix in ?
Comment on attachment 87557 [details] [diff] [review]
david's proposed fix w/ darin's suggestion

looks good
Attachment #87557 - Flags: review+
fix checked on trunk.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
This fix appears to have also resolved 
the bugs listed in bug 151531. 
*** Bug 151738 has been marked as a duplicate of this bug. ***
*** Bug 151531 has been marked as a duplicate of this bug. ***
*** Bug 152268 has been marked as a duplicate of this bug. ***
*** Bug 151335 has been marked as a duplicate of this bug. ***
*** Bug 151589 has been marked as a duplicate of this bug. ***
*** Bug 152562 has been marked as a duplicate of this bug. ***
*** Bug 152666 has been marked as a duplicate of this bug. ***
*** Bug 153437 has been marked as a duplicate of this bug. ***
*** Bug 155477 has been marked as a duplicate of this bug. ***
*** Bug 152266 has been marked as a duplicate of this bug. ***
*** Bug 155479 has been marked as a duplicate of this bug. ***
*** Bug 155961 has been marked as a duplicate of this bug. ***
*** Bug 156298 has been marked as a duplicate of this bug. ***
*** Bug 155010 has been marked as a duplicate of this bug. ***
*** Bug 156502 has been marked as a duplicate of this bug. ***
*** Bug 157539 has been marked as a duplicate of this bug. ***
*** Bug 157495 has been marked as a duplicate of this bug. ***
*** Bug 157222 has been marked as a duplicate of this bug. ***
*** Bug 157856 has been marked as a duplicate of this bug. ***
*** Bug 158652 has been marked as a duplicate of this bug. ***
*** Bug 155950 has been marked as a duplicate of this bug. ***
*** Bug 156171 has been marked as a duplicate of this bug. ***
*** Bug 156309 has been marked as a duplicate of this bug. ***
*** Bug 157533 has been marked as a duplicate of this bug. ***
QA Contact: sheelar → stephend
QA Contact: stephend → esther
I did not see this specific problem, however I did test Compacting folders with
the latest Compact folder fix in bug 150716.  Compact folder worked for me with
trunk builds 10-11 and branch builds 10-17.  Verified.
Status: RESOLVED → VERIFIED
*** Bug 128568 has been marked as a duplicate of this bug. ***
Product: MailNews → Core
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: