Crash in [@ NS_CycleCollectorSuspect3] from nsImapMockChannel.
Categories
(MailNews Core :: Networking: IMAP, defect, P1)
Tracking
(thunderbird_esr78+ fixed, thunderbird84 fixed)
People
(Reporter: emilio, Assigned: benc)
References
(Depends on 1 open bug)
Details
(Keywords: crash, topcrash-thunderbird)
Crash Data
Attachments
(7 files, 1 obsolete file)
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
wsmwk
:
approval-comm-esr68+
|
Details |
(deleted),
patch
|
mkmelin
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
benc
:
review+
wsmwk
:
approval-comm-beta+
wsmwk
:
approval-comm-esr78+
|
Details | Diff | Splinter Review |
(deleted),
image/png
|
Details | |
(deleted),
image/png
|
Details | |
(deleted),
image/png
|
Details |
This bug is for crash report bp-6e4e9c87-b863-4b7d-812e-82d560190830.
Top 10 frames of crashing thread:
0 libxul.so NS_CycleCollectorSuspect3 xpcom/base/nsCycleCollector.cpp:3763
1 libxul.so <name omitted> uriloader/base/nsDocLoader.cpp:171
2 libxul.so nsImapMockChannel::~nsImapMockChannel comm/mailnews/imap/src/nsImapProtocol.cpp:8480
3 libxul.so nsImapMockChannel::~nsImapMockChannel comm/mailnews/imap/src/nsImapProtocol.cpp:8473
4 libxul.so <name omitted> comm/mailnews/imap/src/nsImapProtocol.cpp:8460
5 libxul.so nsImapProtocol::RetryUrl comm/mailnews/imap/src/nsImapProtocol.cpp:1896
6 libxul.so nsImapProtocol::ProcessCurrentURL comm/mailnews/imap/src/nsImapProtocol.cpp
7 libxul.so nsImapProtocol::ImapThreadMainLoop comm/mailnews/imap/src/nsImapProtocol.cpp:1420
8 libxul.so nsImapProtocol::Run comm/mailnews/imap/src/nsImapProtocol.cpp:1103
9 libxul.so non-virtual thunk to nsImapProtocol::Run comm/mailnews/imap/src/nsImapProtocol.cpp
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Comment 1•5 years ago
|
||
I've been crashing on imap code two or three times. It wouldn't be fun if these
assertions were the culprit, but an NS_WARNING_ASSERTION is not going to find
out.
So make these MOZ_DIAGNOSTIC_ASSERT.
Updated•5 years ago
|
Comment 2•5 years ago
|
||
Hmm, looking at the crash data, there are mostly crashes in TB 68.1. here:
0 xul.dll NS_CycleCollectorSuspect3 xpcom/base/nsCycleCollector.cpp:3761 context
1 xul.dll CompositeDataSourceImpl::Release() comm/rdf/base/nsCompositeDataSource.cpp:515 cfi
2 xul.dll nsImapMailCopyState::~nsImapMailCopyState() comm/mailnews/imap/src/nsImapMailFolder.cpp:7507 cfi
We've finally killed RDF, so that code path no longer exists. Let's see how that affects the crash situation.
Updated•5 years ago
|
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/cda2408c98e8
Strengthen threading assertions in IMAP code. r=jorgk
Comment 4•5 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #2)
Hmm, looking at the crash data, there are mostly crashes in TB 68.1. here:
0 xul.dll NS_CycleCollectorSuspect3 xpcom/base/nsCycleCollector.cpp:3761 context
1 xul.dll CompositeDataSourceImpl::Release() comm/rdf/base/nsCompositeDataSource.cpp:515 cfi
2 xul.dll nsImapMailCopyState::~nsImapMailCopyState() comm/mailnews/imap/src/nsImapMailFolder.cpp:7507 cfi
I don't want to discourage in any way the investigation but I want to bring to your attention:
- virtually all the stacks I find (except the crash ID cited here and bp-e88dce32-7385-4ac8-af89-c45120190911 which is 60.2.1, quite an old version) match to bug 1517464 which is certainly on the increase in version 68.* - https://crash-stats.mozilla.org/signature/?release_channel=release&product=Thunderbird&signature=NS_CycleCollectorSuspect3&date=%3E%3D2019-04-08T03%3A09%3A00.000Z&date=%3C2019-10-08T03%3A09%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1#graphs
- Emilo's crash IDs are quite old, and I find only 3 nightly linux crashes in the past couple months (also old) https://crash-stats.mozilla.org/signature/?release_channel=nightly&product=Thunderbird&platform=Linux&signature=NS_CycleCollectorSuspect3&date=%3E%3D2019-07-08T03%3A37%3A00.000Z&date=%3C2019-10-08T03%3A37%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1#reports
So what is interesting about nsImapMockChannel, are there 71.0a1 crashes, and do you actually find an increase correlating to nsImapMockChannel?
Reporter | ||
Comment 6•5 years ago
|
||
Here's a crash id from a minute ago, crashing on this assertion: bp-345ec887-e60b-4e6f-ba56-0485b0191009
Reporter | ||
Comment 7•5 years ago
|
||
Comment 8•5 years ago
|
||
I wonder why this matters in the destructor. But we could proxy to the main thread. Maybe we should file a new bug for this crash since bug is about another crash, unless I misunderstand something.
Emilio, any STR that provoke this crash?
Reporter | ||
Comment 9•5 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #8)
I wonder why this matters in the destructor. But we could proxy to the main thread. Maybe we should file a new bug for this crash since bug is about another crash, unless I misunderstand something.
I don't think it's just a matter of proxying to the main thread, necessarily... There's some code around that doesn't make much sense to me. I'm not familiar with it of course but:
https://searchfox.org/comm-central/rev/becdd368fb58a470c60d247e235ca07da787f694/mailnews/imap/src/nsImapProtocol.cpp#1298 is in a function which asserts that it is called off the main thread. But it calls a main-thread-only function, unless I'm holding it wrong:
That code looks somewhat broken.
Emilio, any STR that provoke this crash?
Nothing reliable I fear :(
Assignee | ||
Comment 10•5 years ago
|
||
Yep, as Emilio says, the code looks very confused as to what it expects to be running on :-(
There is a comment in the imap thread cleanup:
// Release protocol object on the main thread to avoid destruction of 'this'
// on the IMAP thread, which causes grief for weak references.
Maybe a similar motivation for the mockchannel destruction?
The stack frame for the latest crash capture bp-0bd5b09f-3485-4643-a6ef-2f0ac0191010:
nsImapMockChannel::~nsImapMockChannel() comm/mailnews/imap/src/nsImapProtocol.cpp:8467 (ASSERT)
nsImapMockChannel::~nsImapMockChannel() comm/mailnews/imap/src/nsImapProtocol.cpp:8463
<name omitted> comm/mailnews/imap/src/nsImapProtocol.cpp:8450
nsImapProtocol::RetryUrl() comm/mailnews/imap/src/nsImapProtocol.cpp:1894
nsImapProtocol::ProcessCurrentURL() comm/mailnews/imap/src/nsImapProtocol.cpp:0
nsImapProtocol::ImapThreadMainLoop() comm/mailnews/imap/src/nsImapProtocol.cpp:1418
...
It appears to me that the dtor is being called when RetryUrl()
exits, and it's local saveMockChannel
goes out of scope.
(which, indeed, happens in the imap thread, not the main thread).
If the saveMockChannel
release inside RetryUrl()
can be deferred to the main thread, maybe that'd help? But it seems a bit of a nasty band-aid fix without really understanding what's going on...
(Either way, I think the proper approach - for this and other IMAP bugs - is to sit down and pick through, mapping out all the IMAP code and figure out just what's happening (and on which threads), and if it's sane. Which is on my TODO list, but it's not a small job :-(
Comment 11•5 years ago
|
||
Uff, that call to main tread nsImapProtocol::CloseStreams() from a non-main thread function nsImapProtocol::TellThreadToDie() is really confused. The former only MOZ_ASSERTS(), but I've never seen it crash there in a debug build, so I guess that code protected by if (m_imapProtocolSink)
never runs.
As for saveMockChannel going out of scope in RetryUrl() on the IMAP thread (which I haven't) checked, how about adding
NS_ReleaseOnMainThreadSystemGroup("nsImapProtocol::RetryUrl", saveMockChannel .forget());
since this is how we destroy it in other places like:
https://searchfox.org/comm-central/rev/29706036071c4629c2b44512d1acecb64008cc46/mailnews/imap/src/nsImapProtocol.cpp#1011
At least that will fix Emilio's crash. Ben, can you do a patch with that. I don't want to rip off your research and it's just a two-liner.
Assignee | ||
Comment 12•5 years ago
|
||
Assignee | ||
Comment 13•5 years ago
|
||
Hmm. There's a patch which might address that specific crash...
But I'm really unhappy with it. There are lots of other places in RetryUrl()
which should also cause an assert
eg:
if (m_imapServerSink)
(void)m_imapServerSink->PrepareToRetryUrl(kungFuGripImapUrl,
getter_AddRefs(saveMockChannel));
PrepareToRetryUrl()
calls nsImapUrl::GetMockChannel()
which has a similar thread assert in it.
It's not called if m_imapServerSink
is null... but then saveMockChannel
isn't set either. It's a rabbithole.
I'm going to leave the NI on it, and next week I'll try and sit down and map out all the IMAP code and get some kind of understanding of it. Without that I'm just looking at an elephant through a microscope :-)
Comment 14•5 years ago
|
||
IOW: For saveMockChannel to be set PrepareToRetryUrl()
https://searchfox.org/comm-central/rev/29706036071c4629c2b44512d1acecb64008cc46/mailnews/imap/src/nsImapIncomingServer.cpp#448
and GetMockChannel() would need to run and that would crash before the channel is even set if run on the IMAP thread:
https://searchfox.org/comm-central/rev/29706036071c4629c2b44512d1acecb64008cc46/mailnews/imap/src/nsImapUrl.cpp#984
So in conclusion: The patch can't really help. It's crazy since the crash is at the end of RetryUrl() where a mock channel is being destroyed that can't even exist.
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Comment 16•5 years ago
|
||
(bringing assignment from bug 1586494. note also bug 1175168 comment 7 )
Updated•5 years ago
|
Updated•5 years ago
|
Comment 17•5 years ago
|
||
Magnus, as soon as we can manage it, I'd suggest we have Ben spend significant additional time to slay this, because we are approaching 1% of users affected - based on stats of 77k - 92k users affected in the past month [1] and a baseline of 10M users measured [2]. And this is just one crash signature - there are surely other signatures.
[1] https://crash-stats.mozilla.org/signature/?signature=NS_CycleCollectorSuspect3&date=%3E%3D2020-02-10T11%3A14%3A00.000Z&date=%3C2020-03-10T11%3A14%3A00.000Z
[2] https://stats.thunderbird.net/#default
Comment 18•5 years ago
|
||
Yeah we should see if we can track this down. Ben is going to spend some time on imap threading (bug 1618934). If we're lucky that could fix things. Likely not, but should give a better understanding and hopefully enough clues to figure this out.
Comment 19•5 years ago
|
||
FYI, got a crash again in 74.0b2 (32-bit)...
bp-0dc515a3-75af-4a2e-bce3-6e4a90200312
I was not even using the app when it happened... just crashed by itself in the background...
Assignee | ||
Comment 20•5 years ago
|
||
OK, so the patch I proposed back in comment #12 looked plausible, except for the fact that Jorg and I both thought it was completely bonkers and could never possibly work.
The Short Version:
I've changed my mind and think it could help. I've got a try run going here:
If no problems, we just apply the existing patch as is.
The Long Version:
(mainly just to leave myself a trail of breadcrumbs!)
This is the code in nsImapProtocol::RetryUrl()
that had Jorg (comment #14) and I both scratching our heads:
(In reply to Ben Campbell from comment #13)
if (m_imapServerSink) (void)m_imapServerSink->PrepareToRetryUrl(kungFuGripImapUrl, getter_AddRefs(saveMockChannel));
PrepareToRetryUrl()
will assert if called on any thread other than the main thread. And the whole problem here is that we're asserting because saveMockChannel is being deleted on the IMAP thread. So the patch I posted was entirely academic - the stack trace must have been wrong, because PrepareToRetryUrl()
wasn't asserting, so we must be on the main thread. The patch would never work.
Except.
After poring through the IMAP code then coming back here and taking a fresh look, I spot that m_imapServerSink
isn't actually an nsIMapServerSink
. It is, in fact, a ImapServerSinkProxy
. This uses the magic of macros to synchronously dispatch nsIMapServerSink
calls to the main thread (see https://searchfox.org/comm-central/source/mailnews/imap/src/nsSyncRunnableHelpers.cpp#419 ).
So it looks like it's being called on the IMAP thread, but it is actually doing the call on the main thread, blocking the IMAP thread until it returns.
Phew.
Assignee | ||
Comment 21•5 years ago
|
||
Doesn't appear to have broken anything (try build), so let's check it in.
Comment 22•5 years ago
|
||
Pushed by thunderbird@calypsoblue.org:
https://hg.mozilla.org/comm-central/rev/30c25adcb3d8
Workaround for mainthread assertion in nsImapMockChannel dtor. r=jorgk
Updated•5 years ago
|
Comment 23•5 years ago
|
||
Hmm, I can't remember reviewing this. Anyway, there's a grammar error here:
- // (this is a workaround to try and prevent a specific crash, and
- // does nothing clarify the threading mess!)
There's a "to" missing, right?
Comment 24•5 years ago
|
||
May be this crash can be forced by saving a draft into a nearly full (or overquota) IMAP folder.
Small mails were okay, but a mail with "corona homeschooling homework" jpg attachements crashed.
(Yes, I know SMTP is not a file transfer protocol.)
FYI on Windows 10 (64 bit) home:
bp-34432a1c-ead1-4ca1-8b8b-3d83f0200508
TB 68.8.0 (32-Bit)
Thunderbird 68.8.0 Crash Report [@ NS_CycleCollectorSuspect3 ]
Crashing Thread (53)
Frame Module Signature Source Trust
0 xul.dll NS_CycleCollectorSuspect3 xpcom/base/nsCycleCollector.cpp:3761 context
1 xul.dll CompositeDataSourceImpl::Release() comm/rdf/base/nsCompositeDataSource.cpp:515 cfi
2 xul.dll nsImapMailCopyState::~nsImapMailCopyState() comm/mailnews/imap/src/nsImapMailFolder.cpp:7506 cfi
3 xul.dll void nsImapMailCopyState::~nsImapMailCopyState() comm/mailnews/imap/src/nsImapMailFolder.cpp:7496 cfi
4 xul.dll mozilla::TransceiverImpl::Release() comm/mailnews/imap/src/nsImapMailFolder.cpp:7508 cfi
5 xul.dll nsImapUrl::~nsImapUrl() comm/mailnews/imap/src/nsImapUrl.cpp:76 cfi
6 xul.dll [thunk]:nsImapUrl::vector deleting destructor'
adjustor{8}' (unsigned int) cfi
7 xul.dll nsMsgMailNewsUrl::Release() comm/mailnews/base/util/nsMsgMailNewsUrl.cpp:81 frame_pointer
8 xul.dll nsImapUrl::Release() comm/mailnews/compose/src/nsSmtpUrl.cpp:583 cfi
9 xul.dll nsImapProtocol::~nsImapProtocol() comm/mailnews/imap/src/nsImapProtocol.cpp:662 cfi
10 xul.dll [thunk]:nsImapProtocol::vector deleting destructor'
adjustor{24}' (unsigned int) cfi
11 xul.dll nsDSURIContentListener::Release() comm/mailnews/base/util/nsMsgProtocol.cpp:50 frame_pointer
12 xul.dll [thunk]:nsImapProtocol::Release`adjustor{4}' () cfi
13 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1222 frame_pointer
14 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:486 cfi
15 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:303 cfi
16 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:308 cfi
17 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:290 cfi
18 xul.dll nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:454 cfi
19 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397 cfi
20 nss3.dll static unsigned int pr_root(void*) nsprpub/pr/src/md/windows/w95thred.c:137 cfi
21 ucrtbase.dll thread_start<unsigned int (__stdcall*)(void*), 1> cfi
22 kernel32.dll BaseThreadInitThunk cfi
23 mozglue.dll static void patched_BaseThreadInitThunk(int, void*, void*) mozglue/build/WindowsDllBlocklist.cpp:625 cfi
24 ntdll.dll _RtlUserThreadStart cfi
25 ntdll.dll _RtlUserThreadStart cfi
Comment 25•5 years ago
|
||
(In reply to Ben Campbell from comment #20)
...
So it looks like it's being called on the IMAP thread, but it is actually doing the call on the main thread, blocking the IMAP thread until it returns.Phew.
Holy cow!
Ben, these graphs for beta for signatures of bug 1517464 look promising :
- https://crash-stats.mozilla.org/signature/?release_channel=beta&product=Thunderbird&signature=nsXPCWrappedJS%3A%3ARelease&date=%3E%3D2020-02-16T15%3A47%3A00.000Z&date=%3C2020-05-16T15%3A47%3A00.000Z#graphs
- https://crash-stats.mozilla.org/signature/?release_channel=beta&product=Thunderbird&signature=NS_CycleCollectorSuspect3&date=%3E%3D2020-02-16T15%3A48%3A00.000Z&date=%3C2020-05-16T15%3A48%3A00.000Z#graphs
- and from https://support.mozilla.org/en-US/questions/1286951 we have potentially related crash sig xul.dll | _PR_NativeRunThread | pr_root which, at least so far, has no significant change on beta channel [1]. But who knows what it might look like for release channel.
Even though the crash is not totally gone from betas
- bp-b21bd9fb-517d-4e4a-9e97-8512f0200516 nsXPCWrappedJS::Release 77 beta
- 642842a1-925d-482a-b623-bb2050200516 NS_CycleCollectorSuspect3 77 beta
it has been on the beta channel since 76 without apparent ill effects, so I'd like to suggest we take the current patch on a point release for ESR to help us understand the effect on that channel - and we can be cautious about rolling it out. Unless you see a downside? What do you think?
Comment 26•5 years ago
|
||
Comment on attachment 9100325 [details]
Bug 1586494 - Workaround for mainthread assertion in nsImapMockChannel dtor. r=jorgk
it has been on the beta channel since 76 without apparent ill effects, so I'd like to suggest we take the current patch on a point release for ESR to help us understand the effect on that channel - and we can be cautious about rolling it out. Unless you see a downside? What do you think?
[Approval Request Comment]
Regression caused by (bug #): na
User impact if declined: many crashes
Testing completed (on c-c, etc.): c-b
Risk to taking this patch (and alternatives if risky): Magnus deems this to be low risk and ready to go
[Triage Comment]
Approved for ESR
Comment 27•5 years ago
|
||
bugherder uplift |
Thunderbird 68.8.1:
https://hg.mozilla.org/releases/comm-esr68/rev/d4acd5d53484
Comment 28•5 years ago
|
||
Should know in a couple more days the effect of this patch. So far, it is unclear because of the weekend and yesterday's Monday US holiday https://crash-stats.mozilla.org/signature/?release_channel=release&signature=NS_CycleCollectorSuspect3&date=%3E%3D2020-05-12T00%3A00%3A00.000Z&date=%3C2020-05-26T23%3A59%3A00.000Z#graphs
Assignee | ||
Comment 29•5 years ago
|
||
Just to make sure I'm reading it right (updated chart):
https://crash-stats.mozilla.org/signature/?release_channel=release&signature=NS_CycleCollectorSuspect3&date=%3E%3D2020-05-12T00%3A00%3A00.000Z&date=%3C2020-06-07T23%3A59%3A00.000Z#graphs
We haven't seen any reduction in crashes due to the patch, right?
Comment 30•5 years ago
|
||
I hate to say it, but unlike beta it doesn't look like release has substantially changed. At best maybe 5-10%.
Unless for release, other crash signatures have been reduced.
Reporter | ||
Comment 31•5 years ago
|
||
I mean, that's probably explained by most of the crashes being a diagnostic assert isn't it? That isn't compiled in release mode.
Assignee | ||
Comment 32•4 years ago
|
||
I've been thrashing my way clumsily through the crash-stats, and it looks to me that the vast bulk of the crashes are in the 68.x versions.
(go onto the "Graphs" tab, click on "Crashes per day.." and add "version" to get the stats broken down by version).
78.x does show up, but in much smaller numbers.
Wayne: Is this just a reflection of vastly more people still running the 68.x series, or an actual improvement?
(I'm kind of guessing the former, but we can hope ;-)
A lot of crashes in earlier 68.x versions seemed to ultimately occur in the RDF code, which (I think) was removed at the point of transition to 78.x. So it'd be nice and neat if that was the culprit.
Except that it seems unlikely as a lot of later 68.x version signatures don't seem to have crashed in RDF code. sigh.
Comment 33•4 years ago
|
||
(In reply to Ben Campbell from comment #32)
I've been thrashing my way clumsily through the crash-stats, and it looks to me that the vast bulk of the crashes are in the 68.x versions.
(go onto the "Graphs" tab, click on "Crashes per day.." and add "version" to get the stats broken down by version).
78.x does show up, but in much smaller numbers.
Wayne: Is this just a reflection of vastly more people still running the 68.x series, or an actual improvement?
(I'm kind of guessing the former, but we can hope ;-)
Hard to say this early in the 78 cycle exactly where this signature is headed. We will know better when we hit 78.2 and we turn on updates. But yes, the crash numbers for 78 are affected by the relatively small number of users on that version. Approximately 6% on 78 vs 85% on 68 according to the top graph of https://stats.thunderbird.net/#version
To break down within the signature I prefer to look at the summary, order by version and then math - roughly 4% on 78.x vs 93% 68.x (and the rest other products)
Based on these two sets of numbers I'd say there is no significant change in the crash rate of 78.x.
Comment 34•4 years ago
|
||
Not sure if is affects the crash though it's on the crash stack, but what is this code supposed to do?
https://searchfox.org/comm-central/rev/f8f4a035ed094b3f70d054b1e2fa90bc9e79408b/mailnews/imap/src/nsImapMailFolder.cpp#7599-7603
It doesn't look like it's doing anything useful. And even less useful since it's in the DTOR.
Assignee | ||
Comment 35•4 years ago
|
||
(In reply to Magnus Melin [:mkmelin] from comment #34)
It doesn't look like it's doing anything useful. And even less useful since it's in the DTOR.
Yep - seems pretty redundant to me! (It does call nsIMsgFolder.GetUriForMsg()
, but there's only one implementation and it doesn't have any side-effects).
I don't think it'll have any bearing on the crash - the redundant code doesn't create any extra references held beyond the dtor scope.
Patch attached, try run here:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=4e25a85d6b23818364430948aa55ec8d5fc327a2
More generally,
There are a whole bunch of different classes which show up in the various crash logs. Without a replicable case (where the diagnostics could identify the cycle) I think it's just a matter of continuing to plod through all the member variables of the involved classes and trying to work out their lifetimes...
(It's about this point where someone suggests rewriting everything in rust ;- )
Updated•4 years ago
|
Updated•4 years ago
|
Comment 36•4 years ago
|
||
Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/e56a57fb95a6
Remove redundant code in nsImapMailCopyState dtor. r=mkmelin
Comment 37•4 years ago
|
||
To break down within the signature I prefer to look at the summary, order by version and then math - roughly 4% on 78.x vs 93% 68.x (and the rest other products)
Based on these two sets of numbers I'd say there is no significant change in the crash rate of 78.x.
Still the case.
Comment 38•4 years ago
|
||
(In reply to Robert Hartmann from comment #24)
May be this crash can be forced by saving a draft into a nearly full (or overquota) IMAP folder.
Small mails were okay, but a mail with "corona homeschooling homework" jpg attachements crashed.
Robert, are you saying you have a reproducible scenario?
Comment 39•4 years ago
|
||
str |
(In reply to Wayne Mery (:wsmwk) from comment #38)
(In reply to Robert Hartmann from comment #24)
May be this crash can be forced by saving a draft into a nearly full (or overquota) IMAP folder.
Small mails were okay, but a mail with "corona homeschooling homework" jpg attachements crashed.Robert, are you saying you have a reproducible scenario?
Yes the scenario reproducible created crashes for me until I cleaned up may GMX-IMAP-Space... so from practical point of view I cannnot reproduce it ...
But let assume we can reproduce it without having to waste GB space on some mail servers...
We need a IMAP-Server which does not tell Thunderbird any quota information (like GMX-Server doesn't do that).
We need a test account on that server with a minimum of IMAP user space.
We should write a mail with attachment and save to draft or let copying the sent mail to sent-folder. That should be tested in two cases
a) using encryption while sending or saving the draft
b) using no encryption
I think than either the stack trace of this bugreport or the stack trace of the stack trace of other bug 1517464 comment 36 ( https://bugzilla.mozilla.org/show_bug.cgi?id=1517464#c36 ) will be reproducible with described secenario .
Comment hidden (obsolete) |
Assignee | ||
Comment 42•4 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #41)
Ben are you able to utilize Robert's comment 39?
I haven't yet, but I plan to - it looks like an excellent lead...
Assignee | ||
Comment 43•4 years ago
|
||
This bug seems to have shifted a bit - I don't see any nsImapMockChannel involvement any more in the stack traces.
I haven't been able to replicate the exact crash yet, but my current working theory is that it happens when a shutdown occurs while an nsImapMailCopyState object is still outstanding - either because the app was shutdown mid-copy (so larger messages would be more prone to this), or because the copy is failing and the nsImapMailCopyState isn't being cleaned up (an IMAP server rejecting the copy for going over-quota, say...).
nsImapMailCopyState is used only for some kinds of copy - copying an email from a local folder into an IMAP folder, for instance.
Assignee | ||
Comment 44•4 years ago
|
||
There's a degree of speculation involved in this patch :-)
From the crash logs, it looks like it all falls apart because of an nsImapUrl
which is holding onto a pointer to an nsImapMailCopyState
object and being released the IMAP thread. (the nsImapUrl
in turn is being held by the nsImapProtocol
object which inhabits the IMAP thread).
I think it happens because some references (possibly cyclic) are being held on past the end of the copy operation. And this patch helps by disconnecting the nsImapMailCopyState from the nsImapUrl after a copy.
It's all a bit twisty - the copy-related code is fantastically convoluted. This patch it clears out the copyState attribute on the nsImapUrl, which is sometimes an nsImapMailCopyState, but sometimes not. The fix in a slightly ugly place, but it was the only place I was confident of being inside an operation where copyState is an nsImapMailCopyState and not something else!
try run here:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=1f8cfbfb63c3ceafd8d75b17e3c2a13f8aff5603
Comment 45•4 years ago
|
||
Assignee | ||
Comment 46•4 years ago
|
||
(In reply to Magnus Melin [:mkmelin] from comment #45)
Looks good. I think we should add the explanation you wrote into the patch
description so it's always more easily accessible.
Like this you mean?
Comment 47•4 years ago
|
||
Yes. Once this lands lets close it for uplifting purposed in particular. We can clone it to another bug if the patch didn't fix it.
Comment 48•4 years ago
|
||
Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/b4c0fee502df
Release nsImapMailCopyState directly after IMAP copy to fix shutdown crash. r=mkmelin
Updated•4 years ago
|
Assignee | ||
Comment 49•4 years ago
|
||
Comment on attachment 9189081 [details] [diff] [review]
1586494-release-nsImapMailCopyState-after-copy-2.patch
[Approval Request Comment]
Regression caused by (bug #):
unsure
User impact if declined:
We need more crash logs to gauge the improvement (if any) provided by this fix.
Testing completed (on c-c, etc.):
No in-depth testing - we don't have a concrete set of steps to reproduce.
Risk to taking this patch (and alternatives if risky):
I think this is a low risk (it's a one-liner, clearing a pointer which is no longer required).
Comment 50•4 years ago
|
||
Comment on attachment 9189081 [details] [diff] [review]
1586494-release-nsImapMailCopyState-after-copy-2.patch
[Triage Comment]
Approved for beta
Comment 51•4 years ago
|
||
Crash rate today is about 20% lower than 6 months ago https://crash-stats.mozilla.org/signature/?product=Thunderbird&signature=NS_CycleCollectorSuspect3&date=%3E%3D2020-05-23T11%3A30%3A00.000Z&date=%3C2020-11-23T11%3A30%3A00.000Z#graphs
One might conclude the crash rate for 78.x must be lower than 68.x, but it's hard to construct a set of crash queries to tease that out because the time period covers summer and the erratic transition from 68 to 78.
Comment 52•4 years ago
|
||
bugherder uplift |
Thunderbird 84.0b2
https://hg.mozilla.org/releases/comm-beta/rev/ca6b28642c6f
Comment 53•4 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #51)
Created attachment 9189473 [details]
Screen Shot 2020-11-23 at 6.35.42 AM.pngCrash rate today is about 20% lower than 6 months ago https://crash-stats.mozilla.org/signature/?product=Thunderbird&signature=NS_CycleCollectorSuspect3&date=%3E%3D2020-05-23T11%3A30%3A00.000Z&date=%3C2020-11-23T11%3A30%3A00.000Z#graphs
One might conclude the crash rate for 78.x must be lower than 68.x, but it's hard to construct a set of crash queries to tease that out because the time period covers summer and the erratic transition from 68 to 78.
Well, I didn't realize this bug produces a few thousands crashes a day(!). This is way too high, isn't it?
(Actually, evern since I noticed TB tended to crash when I shut it down, I have been using TB without shutting down all the time.)
Maybe after December 11, I may be able to test overflowing Imap server or something.
It seems to me that the failure to check for low-level I/O errors may be also to blame (or at least complicate the analysis here).
Comment 54•4 years ago
|
||
Comment on attachment 9189081 [details] [diff] [review]
1586494-release-nsImapMailCopyState-after-copy-2.patch
[Approval Request Comment]
Crash fix.
Comment 55•4 years ago
|
||
To set expectations, beta https://crash-stats.mozilla.org/signature/?release_channel=beta&product=Thunderbird&signature=NS_CycleCollectorSuspect3&date=%3E%3D2020-09-10T15%3A59%3A00.000Z&date=%3C2020-12-10T15%3A59%3A00.000Z#graphs does not show a significant decrease in crash rate, if any as a result of this being on 84.0b2 buildid 20201123211438. So it will be interesting to see if uplifting has a different impact on release channel
Comment 56•4 years ago
|
||
Comment on attachment 9189081 [details] [diff] [review]
1586494-release-nsImapMailCopyState-after-copy-2.patch
[Triage Comment]
Approved for esr78
Comment 57•4 years ago
|
||
Uplifting note: The change that landed in comment 36 (Aug 2020) is not present on c-esr78 so it is going along with the approved patch in comment 56.
Comment 58•4 years ago
|
||
bugherder uplift |
Comment 59•4 years ago
|
||
Good news....
- crash reports down ~60% for NS_CycleCollectorSuspect3 (this bug) - but still not gone
- crash reports down ~40% for nsXPCWrappedJS::Release (bug 1517464) - but still not gone
That's with 22% drop in ADI compared to two weeks ago, and full uptake of 78.6.0 (85% of users) per https://stats.thunderbird.net/
I get roughly the same reduction if I limit stats to only 78.5.0, 78.5.1, 78.6.0.
Two examples from 78.6.0 bp-3fdb7008-cc7f-4bc9-9b3f-ca1f50210101 bp-5520680c-c1e4-4e53-86e1-ee74d0210101
Comment 60•4 years ago
|
||
Comment 61•4 years ago
|
||
Updated•4 years ago
|
Comment 62•4 years ago
|
||
Thunderbird 78.9.0 Crash Report [@ NS_CycleCollectorSuspect3 ] / 32bit TB on 64bit Win 10. / master password active / crash while auto saving to drafts. I thing tha draft wanted to become stored crypted via TB internal PGP - imho that failed, because attachement 25 MB or more were to big,,,
Crashing Thread (72), Name: IMAP
bp-09bed46d-a84e-4e60-b072-3a51b0210406
Frame Module Signature Source Trust
0 xul.dll NS_CycleCollectorSuspect3(void*, nsCycleCollectionParticipant*, nsCycleCollectingAutoRefCnt*, bool*) xpcom/base/nsCycleCollector.cpp:3759 context
1 xul.dll XULContentSinkImpl::Release() dom/xul/nsXULContentSink.cpp:165 cfi
2 xul.dll nsMsgComposeAndSend::~nsMsgComposeAndSend() comm/mailnews/compose/src/nsMsgSend.cpp:0 cfi
3 xul.dll nsMsgComposeAndSend::~nsMsgComposeAndSend() comm/mailnews/compose/src/nsMsgSend.cpp:283 cfi
4 xul.dll nsMsgComposeAndSend::Release() comm/mailnews/compose/src/nsMsgSend.cpp:244 cfi
5 xul.dll CopyListener::~CopyListener() comm/mailnews/compose/src/nsMsgCopy.cpp:41 cfi
6 xul.dll ArrayBufferInputStream::Release() netwerk/base/BackgroundFileSaver.cpp:1077 cfi
7 xul.dll nsImapMailCopyState::~nsImapMailCopyState() comm/mailnews/imap/src/nsImapMailFolder.cpp:7611 cfi
8 xul.dll nsImapMailCopyState::~nsImapMailCopyState() comm/mailnews/imap/src/nsImapMailFolder.cpp:7608 cfi
9 xul.dll mozilla::dom::`anonymous namespace'::FileCreationHandler::Release() dom/base/BodyConsumer.cpp:436 cfi
10 xul.dll nsImapUrl::~nsImapUrl() comm/mailnews/imap/src/nsImapUrl.cpp:76 cfi
11 xul.dll [thunk]: virtual void* __thiscall nsImapUrl::`vector deleting destructor'(unsigned int) cfi
12 xul.dll nsMsgMailNewsUrl::Release() comm/mailnews/base/src/nsMsgMailNewsUrl.cpp:80 frame_pointer
Description
•