Closed
Bug 1299920
Opened 8 years ago
Closed 8 years ago
release weak references in the IMAP protocol on the main thread
Categories
(MailNews Core :: Networking: IMAP, defect)
MailNews Core
Networking: IMAP
Tracking
(thunderbird51 unaffected, thunderbird52+ fixed, thunderbird53 fixed, seamonkey2.49esr fixed, seamonkey2.50 fixed)
RESOLVED
FIXED
Thunderbird 53.0
People
(Reporter: jorgk-bmo, Assigned: rkent)
References
Details
Attachments
(1 file)
(deleted),
patch
|
jorgk-bmo
:
review+
mayhemer
:
feedback+
jorgk-bmo
:
approval-comm-aurora+
|
Details | Diff | Splinter Review |
+++ This bug was initially created as a clone of Bug #1299116 +++
See bug 1299116 comment #32.
Assignee | ||
Comment 1•8 years ago
|
||
I'm not sure I am interpreting the stack correctly from https://bugzilla.mozilla.org/attachment.cgi?id=8787354
Can anyone tell if this is complaining about the release of nsMsgProtocol itself, or one of the member variables (like m_server)?
Honza Bambas (:mayhemer) can you help this poor old hack interpret this?
Flags: needinfo?(honzab.moz)
Reporter | ||
Comment 2•8 years ago
|
||
My 0,5 cents worth:
Looking at the macro expansion in DXR of NS_IMPL_RELEASE_INHERITED(nsImapProtocol, nsMsgProtocol ), this is about the release of nsMsgProtocol. Equally NS_IMPL_ISUPPORTS(nsMsgProtocol, nsIChannel, nsIStreamListener, ... is about the release of nsMsgProtocol. Where do you see m_server?
Assignee | ||
Comment 3•8 years ago
|
||
m_server is a member variable of the protocol, so it also needs to be released when the protocol is released.
Comment 4•8 years ago
|
||
Ok, I can now confirm that Seamonkey crashes every time if I disconnect the network and open Mail & newsgroups with an imap account in the profile.
Should i do a debug build later and see where it crashes in imap?
Reporter | ||
Comment 5•8 years ago
|
||
(In reply to Frank-Rainer Grahl from comment #4)
> Ok, I can now confirm that Seamonkey crashes every time if I disconnect the
> network and open Mail & newsgroups with an imap account in the profile.
So you're saying it crashes in a non-debug build? That's not caused by a MOZ_ASSERT then. I will try it later, right now I can't disconnect my network.
Can you write down the exact STR. Do you need to enter a password?
> Should i do a debug build later and see where it crashes in imap?
Well, if it's not too much hassle, it would be good to get the crash confirmed. I though you had already done your debug build.
Comment 6•8 years ago
|
||
(In reply to Jorg K (GMT+2, PTO during summer) from comment #5)
> (In reply to Frank-Rainer Grahl from comment #4)
> > Ok, I can now confirm that Seamonkey crashes every time if I disconnect the
> > network and open Mail & newsgroups with an imap account in the profile.
> So you're saying it crashes in a non-debug build? That's not caused by a
> MOZ_ASSERT then. I will try it later, right now I can't disconnect my
> network.
> Can you write down the exact STR. Do you need to enter a password?
Its the assert because its also triggered by Nightly.
> > Should i do a debug build later and see where it crashes in imap?
> Well, if it's not too much hassle, it would be good to get the crash
> confirmed. I though you had already done your debug build.
This was the clobber build. Old lappi just needs 5-6h to do a full compile and I didn't let it run overnight :)
A message box pops up says it can't connect to the server and then it asserts with the weak reference message visible in the crash output.
Reporter | ||
Comment 7•8 years ago
|
||
(In reply to Kent James (:rkent) from comment #3)
> m_server is a member variable of the protocol, so it also needs to be
> released when the protocol is released.
Sorry, I see it now, m_server in nsImapProtocol, subclass of nsMsgProtocol.
Reporter | ||
Comment 8•8 years ago
|
||
(In reply to Frank-Rainer Grahl from comment #6)
> A message box pops up says it can't connect to the server and then it
> asserts with the weak reference message visible in the crash output.
I'll try it later when I can disconnect my network. Maybe this belongs to the other bug though.
Reporter | ||
Comment 9•8 years ago
|
||
I did the following test:
Disconnected my network. Started TB on a profile with IMAP configured. Accessed an IMAP folder with no local storage (I have that for my cache tests), so IMAP needs to go fetching messages from the server, which doesn't work due to no network connection. Result is the very same crash described in attachment 8787354 [details], which is subject of this bug.
At least, we have an easy to reproduce test case now, since sending 30+ messages and hoping for the crash to happen wasn't at all fun (see bug 1299116 comment #30).
(In reply to Frank-Rainer Grahl from comment #6)
> (In reply to Jorg K (GMT+2, PTO during summer) from comment #5)
> > So you're saying it crashes in a non-debug build? That's not caused by a
> > MOZ_ASSERT then.
> Its the assert because its also triggered by Nightly.
Frank-Rainer, that makes no sense. A Nightly, at least for Thunderbird, is a non-debug build, so MOZ_ASSERT does *NOT* trigger there. If you have a crash in a non-debug build, you are seeing a different issue. Again, I'm happy to try stuff, but I need exact STR. With your description (IMAP + no network) I managed to reproduce the crash we're discussing in this bug here.
Comment 10•8 years ago
|
||
Jorg see Honzas comment here:
https://bugzilla.mozilla.org/show_bug.cgi?id=1299116#c9
Its triggered in Nightly and debug builds. So all you need is 'NIGHTLY_BUILD': '1' in your config.status which is always there in c-c and it will assert unless you also have enabled profiling:
> #if defined(DEBUG) || (defined(NIGHTLY_BUILD) && !defined(MOZ_PROFILING))
Back at the desktop and happily compiling all trees now. Just holler if a debug build is still needed. Looks like you reproduced the crash.
Reporter | ||
Comment 11•8 years ago
|
||
(In reply to Frank-Rainer Grahl from comment #10)
Sorry, I'm completely confused.
> Jorg see Honzas comment here:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1299116#c9
I've read bug 1299116 comment #9 and I can also see that we're not talking about a simple MOZ_ASSERT, but something more sophisticated:
https://dxr.mozilla.org/comm-central/rev/b7f7ae14590aced450bb0b0469dfb38edd2c0ace/mozilla/xpcom/glue/nsWeakReference.cpp#15
https://dxr.mozilla.org/comm-central/rev/b7f7ae14590aced450bb0b0469dfb38edd2c0ace/mozilla/xpcom/glue/nsDebug.h#345
So debug *and* nightly builds should crash here:
https://dxr.mozilla.org/comm-central/rev/b7f7ae14590aced450bb0b0469dfb38edd2c0ace/mozilla/xpcom/glue/nsDebug.h#354
However, observed bahaviour is this:
With the steps described in comment #9, the first click on a message in an IMAP folder without local storages crashes immediately when using my own local debug build.
Doing the very same with a Daily/Nightly build does *NOT* crash. That's why I mistakenly assumed that we were talking about a simple MOZ_ASSERT here, which we are not.
So please explain why the local debug build crashes and the downloaded Daily/Nightly does not crash.
P.S.: config.status is not a file I know or manipulate since it only appears in the object directory.
Comment 12•8 years ago
|
||
>> So please explain why the local debug build crashes and the downloaded Daily/Nightly does not crash.
IanN pointed it out to me. My local build has debug and profiling turned off. The Seamonkey and TB nightlies enable profiling so no crash with them. The assert does not happen when profiling is turned on in the Nightly and debug is off.
Comment 13•8 years ago
|
||
(In reply to Kent James (:rkent) from comment #1)
> I'm not sure I am interpreting the stack correctly from
> https://bugzilla.mozilla.org/attachment.cgi?id=8787354
>
> Can anyone tell if this is complaining about the release of nsMsgProtocol
> itself, or one of the member variables (like m_server)?
>
> Honza Bambas (:mayhemer) can you help this poor old hack interpret this?
According that nsMsgProtocol doesn't impl nsSupportsWeakReference, it's release of last reference to something nsMsgProtocol keeps a strong reference to.
can also be decomposed as:
nsMsgProtocol::Release() -> ~nsMsgProtocol -> member's ~nsCOMPtr() -> referred object's Release() /* apparently the last one */ -> nsSupportsWeakReference::~nsSupportsWeakReference() -> ClearWeakReferences() -> mProxy->NoticeReferentDestruction()
all just optimized by the compiler to be seen only as:
nsMsgProtocol::Release() -> mProxy->NoticeReferentDestruction()
It's unfortunate you can't see the instance object nor type of the comptr being destroyed. However, apparently this shows the problem at its whole glance - an object that has been taken a weak reference - that weak ref still exists (mProxy would be null otherwise), is released on a different thread than the weak ref has been worked with on.
Flags: needinfo?(honzab.moz)
Assignee | ||
Comment 14•8 years ago
|
||
I got a crash while testing bug 1297368, with this stack:
xul.dll!nsWeakReference::NoticeReferentDestruction() Line 65 C++
xul.dll!nsSupportsWeakReference::ClearWeakReferences() Line 161 C++
xul.dll!nsSupportsWeakReference::~nsSupportsWeakReference() Line 47 C++
> xul.dll!nsImapProtocol::~nsImapProtocol() Line 587 C++
[External Code]
xul.dll!nsMsgProtocol::Release() Line 48 C++
xul.dll!nsImapProtocol::Release() Line 297 C++
xul.dll!nsCOMPtr<nsIRunnable>::~nsCOMPtr<nsIRunnable>() Line 406 C++
xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 1090 C++
Concerning "According that nsMsgProtocol doesn't impl nsSupportsWeakReference, it's release of last reference to something nsMsgProtocol keeps a strong reference to." but note that nsIMapProtocol DOES support weak reference, and that is what crashed here.
Clearly we need to ensure that the imap protocol is only released on the main thread.
Comment 15•8 years ago
|
||
(In reply to Kent James (:rkent) from comment #14)
> Concerning "According that nsMsgProtocol doesn't impl
> nsSupportsWeakReference, it's release of last reference to something
> nsMsgProtocol keeps a strong reference to." but note that nsIMapProtocol
> DOES support weak reference, and that is what crashed here.
Ups! I had to miss that. If that's so, then it's a bit easier to find a fix.
Assignee | ||
Comment 16•8 years ago
|
||
"it's a bit easier to find a fix"
Honza, I'd be happy to do the coding if you could point me in the correct direction of existing Mozilla code that ensures an object is released on the main thread. I could probably figure it out on my own, but if the direction is obvious to you and you could show me, it would make my life easier.
tracking-thunderbird52:
--- → +
Reporter | ||
Comment 17•8 years ago
|
||
I coded something like this once:
nsCOMPtr<nsIThread> mainThread = do_GetMainThread();
if (mainThread) {
NS_ProxyRelease(mainThread, static_cast<mozIPersonalDictionary *>(dict));
Maybe completely unrelated. Anyway, Kent's question is in comment #16.
Flags: needinfo?(honzab.moz)
Assignee | ||
Comment 18•8 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #17)
> I coded something like this once:
> nsCOMPtr<nsIThread> mainThread = do_GetMainThread();
> if (mainThread) {
> NS_ProxyRelease(mainThread, static_cast<mozIPersonalDictionary
> *>(dict));
>
> Maybe completely unrelated. Anyway, Kent's question is in comment #16.
Yes that is the approach. I spent a couple of hours looking at this last week, what I am struggling with is figuring out where the reference is to the protocol that is getting released on the wrong thread. Once I find that, them the proxy release should work.
Comment 19•8 years ago
|
||
(In reply to Kent James (:rkent) from comment #16)
> "it's a bit easier to find a fix"
>
> Honza, I'd be happy to do the coding if you could point me in the correct
> direction of existing Mozilla code that ensures an object is released on the
> main thread. I could probably figure it out on my own, but if the direction
> is obvious to you and you could show me, it would make my life easier.
Oh, I didn't want to say anything like that I would know what exactly to do or point you to anything concrete.
We also have nsMainThreadPtrHolder et al thing. It might help to keep the code more clean.
Other option might be to always proxy Release() to the main thread. I did something like this for localStorage at https://dxr.mozilla.org/mozilla-central/rev/ea104eeb14cc54da9a06c3766da63f73117723a0/dom/storage/DOMStorageCache.cpp#102
Flags: needinfo?(honzab.moz)
Reporter | ||
Updated•8 years ago
|
Assignee: nobody → rkent
Assignee | ||
Comment 21•8 years ago
|
||
One of the things holding me up looking at this is that I have no way to recreate the crash reliably for testing. Does anyone know how to make this crash happen on demand?
Comment 22•8 years ago
|
||
If you have an IMAP account it should be pretty easy. I was able to crash it always in SeaMonkey.
1.) compile Nightly with profiling turned off or just debug on. This satisfies:
> #if defined(DEBUG) || (defined(NIGHTLY_BUILD) && !defined(MOZ_PROFILING))
In SeaMonkey if an IMAP account is configured all I needed to do was open the Mail & Newsgroups window -> Bang. I TB you likely need to only start it. See comment 4 from me and comment 9 from Jorg.
> Ok, I can now confirm that Seamonkey crashes every time if I disconnect the network and open
> Mail & newsgroups with an imap account in the profile.
Reporter | ||
Comment 23•8 years ago
|
||
(In reply to Kent James (:rkent) from comment #21)
> One of the things holding me up looking at this is that I have no way to
> recreate the crash reliably for testing. Does anyone know how to make this
> crash happen on demand?
Just read comment #9, first paragraph. I'll copy it here for you ;-)
===
Disconnected my network. Started TB on a profile with IMAP configured. Accessed an IMAP folder with no local storage (I have that for my cache tests), so IMAP needs to go fetching messages from the server, which doesn't work due to no network connection. Result is the very same crash described in attachment 8787354 [details], which is subject of this bug.
===
Happy crashing! Make Thunderbird crash again!
Comment 24•8 years ago
|
||
I couldn't make it crash with 10-26 nightly build on Windows. Do I read correctly that a debug build is not required?
Reporter | ||
Comment 25•8 years ago
|
||
No, you don't. A debug build *IS* required to see the crash. FRG does something special to even crash in non-debug builds, see comment #10 and comment #12.
Repeating: This only crashes in TB debug builds (which is very annoying for developers) and shows that something is wrong in our logic. If it crashed in normal builds, this would be a TOP TOP TOP TOP CRASH.
Assignee | ||
Comment 26•8 years ago
|
||
Contrary to rumor, I really have no idea what I am doing in this thread management stuff. But I can run a debugger, and see that the crash is occurring after the protocol runnable completes. Here I have cargo-culted a release to main thread. Hopefully someone with clue will throw up their hands in disgust, and propose the correct fix here.
But this patch has the merit that it stops the crash, at least for the reproducible case.
Attachment #8817302 -
Flags: review?(jorgk)
Reporter | ||
Comment 27•8 years ago
|
||
(In reply to Kent James (:rkent) from comment #26)
> Contrary to rumor, I really have no idea what I am doing in this thread
> management stuff.
Well, I think you're the resident IMAP expert and the person who has the best idea about this stuff. "The best idea" doesn't imply that it's a good idea, just the best of all the people available.
> Here I have cargo-culted a release to main thread.
Can someone please explain the term "cargo-cult". I've looked it up but didn't find any good explanation.
> Hopefully someone with clue will throw up their
> hands in disgust, and propose the correct fix here.
If we had such a someone, we would have given them the bug.
> But this patch has the merit that it stops the crash, at least for the
> reproducible case.
Nice.
I was going to complain about the comment (who is "I", why is kludge lower case the second 'this' refers to an object, right?) ...
+ // For reasons that I do not understand, this runnable is resulting in the
+ // destruction of this on the imap thread, which causes grief for weak references.
+ // kludge to allow NS_ReleaseOnMainThread
but let me ask a question first.
'this' is the protocol object which you now release:
+ nsCOMPtr<nsIImapProtocol> releaseOnMain(this);
+ NS_ReleaseOnMainThread(releaseOnMain.forget());
Where was is released before? Or did it just casually die in some garbage collection?
Let me ask Honza for feedback, OK?
Reporter | ||
Comment 28•8 years ago
|
||
Comment on attachment 8817302 [details] [diff] [review]
kludge using NS_ReleaseOnMainThread
Hi Honza, if you're not in Hawaii, can you give us your opinion here.
Thanking you in advance.
Attachment #8817302 -
Flags: feedback?(honzab.moz)
Reporter | ||
Comment 29•8 years ago
|
||
I shouldn't work at 2 AM:
Where was *it* released before? Or did it just casually die in some garbage collection?
Assignee | ||
Comment 30•8 years ago
|
||
Where was it released? The imap thread runnable was passed to nsThread.cpp when the protocol started, and at the point it exited all references had gone away, so it was released in Thread.cpp when the runnable exited.
BTW I am travelling for 5 days, I'll have my travel computer but no development environment.
"Cargo cult" means that I copied other code that I did not fully understand. All of the references that I could find to NS_ReleaseOnMainThread were used on .forget() from an nsCOMPtr, so that is what I did.
Reporter | ||
Comment 31•8 years ago
|
||
Kent, thank you so much for your answers, I know, you're a busy man ;-)
Especially ...
===
The imap thread runnable was passed to nsThread.cpp when the protocol started, and at the point it exited all references had gone away, so it was released in Thread.cpp when the runnable exited.
===
... will give Honza a better (not to say "good") understanding.
If I read it correctly, we now pro-actively release the object on the right thread instead of having it being released by something that does sound like some sort of "garbage collection" on the wrong thread. Sounds good to me. Also, there are plenty of calls to |NS_ReleaseOnMainThread()| in that file, so it appears that this one simply was missed(??).
Honza, can I please have your thoughts.
Comment 32•8 years ago
|
||
Comment on attachment 8817302 [details] [diff] [review]
kludge using NS_ReleaseOnMainThread
Review of attachment 8817302 [details] [diff] [review]:
-----------------------------------------------------------------
Nice work to locate this! Looks good.
Attachment #8817302 -
Flags: feedback?(honzab.moz) → feedback+
Reporter | ||
Comment 33•8 years ago
|
||
Comment on attachment 8817302 [details] [diff] [review]
kludge using NS_ReleaseOnMainThread
This looks reasonable, fixes the problem and has expert feedback+
I'll land this with a slightly modified comment.
Thanks, Kent!
Attachment #8817302 -
Flags: review?(jorgk) → review+
Reporter | ||
Comment 34•8 years ago
|
||
Status: NEW → RESOLVED
Closed: 8 years ago
status-thunderbird51:
--- → unaffected
status-thunderbird52:
--- → affected
status-thunderbird53:
--- → fixed
Resolution: --- → FIXED
Reporter | ||
Updated•8 years ago
|
Attachment #8817302 -
Flags: approval-comm-aurora+
Reporter | ||
Comment 35•8 years ago
|
||
While this patch fixes the reproducible case described in comment #9, it still doesn't eliminate all gremlins from the box. Last night I was running a debug build and it crashed with this patch applied. Sadly I missed to grab the stack.
FRG, can you please try with this (landed) patch and see whether you can still get it to crash, preferably in a reproducible fashion.
Flags: needinfo?(frgrahl)
Comment 36•8 years ago
|
||
Looks good for SeaMonkey 2.50a1. I just reverted my backout of the original patch in m-c and with the patches for imap now in the tree I am unable to crash SeaMonkey using my non debug non-profiling mozconfig. When I encounter a new problem I will see that I can reproduce it.
Thanks
status-seamonkey2.50:
--- → fixed
Flags: needinfo?(frgrahl)
Comment 37•8 years ago
|
||
So far I was unable to crash 2.50 with the patch installed. Could it be taken to TB 52? SeaMonkey may do a 2.49 ESR cycle too and this might add to the overall stability of the mailnews component.
Reporter | ||
Comment 38•8 years ago
|
||
Damn, I meant to uplift this, but my query missed it since there is no Target Milestone set here, grrrrrr.
Here's the landing with DONTBUILD since I've just done uplifts:
Aurora (TB 52):
https://hg.mozilla.org/releases/comm-aurora/rev/11ad0c351b82bf77ac789d966bdd6c0bf46a01fb
Target Milestone: --- → Thunderbird 53.0
Comment 39•8 years ago
|
||
Thanks. Just also saw that it already had c-a approval.
status-seamonkey2.49esr:
--- → fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•