Closed
Bug 537017
Opened 15 years ago
Closed 6 years ago
crash [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)]
Categories
(MailNews Core :: Filters, defect)
Tracking
(thunderbird_esr6063+ fixed, thunderbird63 fixed, thunderbird64 fixed)
RESOLVED
FIXED
Thunderbird 64.0
People
(Reporter: wsmwk, Assigned: mkmelin)
References
Details
(Keywords: crash, Whiteboard: [rare])
Crash Data
Attachments
(1 file, 1 obsolete file)
(deleted),
patch
|
mkmelin
:
review+
jorgk-bmo
:
approval-comm-beta+
jorgk-bmo
:
approval-comm-esr60+
|
Details | Diff | Splinter Review |
crash [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)]
very low volume crash, like 2/week
bp-b8d38796-b190-4c00-9fec-03c482091224
0 thunderbird.exe nsMsgFilterAfterTheFact::ApplyFilter mailnews/base/search/src/nsMsgFilterService.cpp:576
1 thunderbird.exe nsMsgApplyFiltersToMessages::RunNextFilter mailnews/base/search/src/nsMsgFilterService.cpp:979
2 thunderbird.exe nsMsgFilterAfterTheFact::AdvanceToNextFolder
3 thunderbird.exe nsMsgFilterService::ApplyFilters
4 xpcom_core.dll NS_InvokeByIndex_P xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:101
also ~8/week [@ nsCOMPtr_base::~nsCOMPtr_base() | nsMsgFilterAfterTheFact::~nsMsgFilterAfterTheFact()] .. bogus top of stack?
bp-0d89f9d4-25f7-4bcf-9892-ece0f2091228
0 xpcom_core.dll nsCOMPtr_base::~nsCOMPtr_base objdir-tb/mozilla/xpcom/build/nsCOMPtr.cpp:81
1 thunderbird.exe nsMsgFilterAfterTheFact::~nsMsgFilterAfterTheFact mailnews/base/search/src/nsMsgFilterService.cpp:332
2 thunderbird.exe nsMsgFilterAfterTheFact::`scalar deleting destructor'
3 thunderbird.exe nsReferencedElement::DocumentLoadNotification::Release mailnews/imap/src/nsImapOfflineSync.cpp:61
4 xpcom_core.dll nsCOMPtr_base::~nsCOMPtr_base objdir-tb/mozilla/xpcom/build/nsCOMPtr.cpp:81
5 thunderbird.exe nsTArray<nsCOMPtr<nsIMsgSendLaterListener> >::DestructRange objdir-tb/mozilla/dist/include/xpcom/nsTArray.h:848
6 thunderbird.exe nsTArray<nsCOMPtr<nsIMsgSendListener> >::RemoveElementsAt objdir-tb/mozilla/dist/include/xpcom/nsTArray.h:663
7 thunderbird.exe nsAutoTObserverArray<nsCOMPtr<nsIDBChangeListener>,0>::Clear objdir-tb/mozilla/dist/include/xpcom/nsTObserverArray.h:242
8 thunderbird.exe nsMsgMailNewsUrl::SetUrlState mailnews/base/util/nsMsgMailNewsUrl.cpp:136
9 thunderbird.exe nsImapMailFolder::SetUrlState mailnews/imap/src/nsImapMailFolder.cpp:6596
Reporter | ||
Comment 1•15 years ago
|
||
crash rate for v3 is now double compared to comment 0, about 4/week, but still tiny. bp-686d0ada-2b73-42c9-bb7f-e31442100322 has an email address which I have pinged.
filed bug 555539 for nsCOMPtr_base::~nsCOMPtr_base() |
nsMsgFilterAfterTheFact::~nsMsgFilterAfterTheFact()
OS: Windows XP → All
Updated•13 years ago
|
Crash Signature: [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)]
Reporter | ||
Comment 2•13 years ago
|
||
rkent, is bp-e9bf7e3b-f5c4-46e5-95bb-d01722111207 happening during shutdown?
(never got a response from bp-686d0ada-2b73-42c9-bb7f-e31442100322)
Crash Signature: [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)] → [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)]
[@ nsMsgFilterAfterTheFact::ApplyFilter ]
Whiteboard: [rare]
Comment 3•13 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #2)
> rkent, is bp-e9bf7e3b-f5c4-46e5-95bb-d01722111207 happening during shutdown?
>
> (never got a response from bp-686d0ada-2b73-42c9-bb7f-e31442100322)
With nsXREDirProvider::DoShutdown in the stack, my guess is yes. It is also happening with a post-bayes filter, which is quite rare (the user must select this manually during filter definition).
Reporter | ||
Comment 4•12 years ago
|
||
In version 12, I think, signature morphed to nsMsgFilterAfterTheFact::ApplyFilter(bool*)
(In reply to Kent James (:rkent) from comment #3)
> (In reply to Wayne Mery (:wsmwk) from comment #2)
> > rkent, is bp-e9bf7e3b-f5c4-46e5-95bb-d01722111207 happening during shutdown?
> >
> > (never got a response from bp-686d0ada-2b73-42c9-bb7f-e31442100322)
>
> With nsXREDirProvider::DoShutdown in the stack, my guess is yes. It is also
> happening with a post-bayes filter, which is quite rare (the user must
> select this manually during filter definition).
None of the current stacks I viewed have shutdown, so that must have been a fluke.
Do these newer stacks reveal anything more useful?
bp-7aa07743-cac8-4603-9d3a-3bc672121203
0 xul.dll nsMsgFilterAfterTheFact::ApplyFilter mailnews/base/search/src/nsMsgFilterService.cpp:538
1 xul.dll nsMsgFilterAfterTheFact::OnSearchDone mailnews/base/search/src/nsMsgFilterService.cpp:400
2 xul.dll nsMsgSearchSession::NotifyListenersDone mailnews/base/search/src/nsMsgSearchSession.cpp:568
3 xul.dll nsMsgSearchSession::TimerCallback mailnews/base/search/src/nsMsgSearchSession.cpp:504
4 xul.dll nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:473
5 xul.dll nsTimerEvent::Run xpcom/threads/nsTimerImpl.cpp:556
kent@3238 530 // We'll continue after a copy, but not after a move
kent@3238 531 if (NS_SUCCEEDED(rv) && actionType == nsMsgFilterAction::CopyToFolder
kent@3238 532 && actionIndex < numActions - 1)
kent@3238 533 m_nextAction = actionIndex + 1;
kent@3238 534 else
kent@3238 535 m_nextAction = 0; // OnStopCopy tests this to move to next filter
kent@3330 536 // Tell postplugin filters if we are moving the message.
kent@3330 537 if (actionType == nsMsgFilterAction::MoveToFolder)
FWIW, 3/4 of the random stacks I examined incllude exquilla.
Crash Signature: [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)]
[@ nsMsgFilterAfterTheFact::ApplyFilter ] → [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)]
[@ nsMsgFilterAfterTheFact::ApplyFilter ]
[@ nsMsgFilterAfterTheFact::ApplyFilter(bool*) ]
Flags: needinfo?
Reporter | ||
Comment 5•10 years ago
|
||
still low volume
hmm, the failing line didn't paste in the prior comment
bp-9e1d337f-073a-4008-980b-effe32140710
hg@0 536 if (copyService)
kent@3238 537 {
kent@3238 538 rv = copyService->CopyMessages(m_curFolder, m_searchHitHdrs,
kent@3238 539 destIFolder, actionType == nsMsgFilterAction::MoveToFolder,
bugzilla@9082 540 this, m_msgWindow, false);
kent@3238 541 // We'll continue after a copy, but not after a move
kent@3238 542 if (NS_SUCCEEDED(rv) && actionType == nsMsgFilterAction::CopyToFolder
kent@3238 543 && actionIndex < numActions - 1)
kent@3238 544 m_nextAction = actionIndex + 1;
kent@3238 545 else
kent@3238 546 m_nextAction = 0; // OnStopCopy tests this to move to next filter
kent@3330 547 // Tell postplugin filters if we are moving the message.
kent@3330 548 if (actionType == nsMsgFilterAction::MoveToFolder)
ehsan@10936 549 c(uint32_t i = 0; i < m_searchHits.Length(); i++)
Flags: needinfo?
Reporter | ||
Comment 6•9 years ago
|
||
* nsMsgFilterAfterTheFact::ApplyFilter(bool*) signature ends after TB33.0b1
* nsMsgFilterAfterTheFact::ApplyFilter linux signature ends after TB31.8.0
That leaves nsMsgFilterAfterTheFact::ApplyFilter() which is a mess of different stacks, but adding it to the signature list
Crash Signature: [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)]
[@ nsMsgFilterAfterTheFact::ApplyFilter ]
[@ nsMsgFilterAfterTheFact::ApplyFilter(bool*) ] → [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)]
[@ nsMsgFilterAfterTheFact::ApplyFilter ]
[@ nsMsgFilterAfterTheFact::ApplyFilter(bool*) ]
[@ nsMsgFilterAfterTheFact::ApplyFilter()]
Depends on: 797710
Whiteboard: [rare] → [rare][check TB38.3.0 crash-stats]
Reporter | ||
Comment 7•9 years ago
|
||
still happens in 38.4.0 and betas.
bp-ac52d722-2d21-4d74-ab45-b24e72151206
bp-1761da40-1ca0-4f7e-9c40-f142c2151104
Blocks: 915107
Whiteboard: [rare][check TB38.3.0 crash-stats] → [rare]
Reporter | ||
Updated•6 years ago
|
Reporter | ||
Comment 8•6 years ago
|
||
After Feb 1 the crash rate dropped about 30%. All or most of the reduction appeared to be from versions prior to version 52 - so perhaps some where in the version 52 cycle we landed something that helped
Assignee | ||
Comment 9•6 years ago
|
||
Looks like m_curFolder was null. So keep a local ref?
Assignee: nobody → mkmelin+mozilla
Status: NEW → ASSIGNED
Attachment #9012851 -
Flags: review?(jorgk)
Comment 10•6 years ago
|
||
Comment on attachment 9012851 [details] [diff] [review]
bug537017_apply_filter_crash.patch
You think this is going to help? So basically you're assuming 'm_curFolder' is going away somewhere during the processing? How would that be caused? Wouldn't it be useful to check 'm_curFolder' at the top of the function?
Assignee | ||
Comment 11•6 years ago
|
||
I'm not sure how, it is null for some reason though.
We already check m_curFolder on top.
Comment 12•6 years ago
|
||
Comment on attachment 9012851 [details] [diff] [review]
bug537017_apply_filter_crash.patch
I don't like this at all. Basically you're adding another reference to something that's going missing during the big processing loop for some unknown reason. In doing so, you could cause unpredictable side-effects, especially accessing via that reference. I don't know how that member gets nulled, maybe someone is deleting a folder while the filter is busily processing it in the background. It's a low crash rate, so it must be some unlikely case.
IMHO, you have two options: Either find out what is really happening or apply a different sort of band-aid that just avoids the crash:
NS_ENSURE_TRUE(m_curFolder, NS_ERROR_UNEXPECTED)
at all call sites using 'm_curFolder'. Do we know which one is crashing?
Attachment #9012851 -
Flags: review?(jorgk)
Assignee | ||
Comment 13•6 years ago
|
||
The one I tracked down was https://dxr.mozilla.org/comm-esr60/source/mailnews/base/search/src/nsMsgFilterService.cpp#608
I think we do use the pattern I suggested elsewhere, that of keeping a local ref. Can't find and example now though.
As for the options, I don't think we're likely to find out, and NS_ENSURE_TRUE(m_curFolder, NS_ERROR_UNEXPECTED) before all call sites is not any more likely to help. I assume threading is somehow involved here, or else the crash wouldn't be possible. So the folder could still go away in between such added check and the next line. We're already checking m_curFolder at the top of the loop, so adding any further checks is just a game of timing, and you will still sometimes lose.
Comment 14•6 years ago
|
||
(In reply to Magnus Melin [:mkmelin] from comment #13)
> I think we do use the pattern I suggested elsewhere, that of keeping a local
> ref. Can't find and example now though.
I think you're referring to the kungFuDeathGrip. That's a different story:
https://dxr.mozilla.org/comm-central/search?q=kungFuDeathGrip+path%3Acomm&redirect=false
> So the folder could still go away in between such added check and the next line.
Not likely. 'm_curFolder' is a member of the object and no one else will clear that out unless the object itself subjects itself to some async processing. It would be good to know where which dereferencing actually fails. Can we have a recent crash rather than the ones from 2015 :-(
If some strange thing happens, how do you know your copied folder handle still points to a valid folder? And adding another reference to a thing that's been killed elsewhere is asking for trouble since you might block the destruction of the folder.
Maybe Neil wants to comment, he's been active lately.
Flags: needinfo?(vseerror)
Assignee | ||
Comment 15•6 years ago
|
||
Flags: needinfo?(vseerror)
Reporter | ||
Comment 16•6 years ago
|
||
> Can we have a recent crash rather than the ones from 2015 :-(
One can just click the crash sig link at the top of the bug :)
Crash Signature: [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)]
[@ nsMsgFilterAfterTheFact::ApplyFilter ]
[@ nsMsgFilterAfterTheFact::ApplyFilter(bool*) ]
[@ nsMsgFilterAfterTheFact::ApplyFilter()] → [@ nsMsgFilterAfterTheFact::ApplyFilter ]
Comment 17•6 years ago
|
||
Comment on attachment 9012851 [details] [diff] [review]
bug537017_apply_filter_crash.patch
Review of attachment 9012851 [details] [diff] [review]:
-----------------------------------------------------------------
I've looked at all 32 reports:
https://crash-stats.mozilla.com/signature/?product=Thunderbird&version=60.0&signature=nsMsgFilterAfterTheFact%3A%3AApplyFilter&date=%3E%3D2018-06-28T22%3A59%3A47.000Z&date=%3C2018-09-28T22%3A59%3A47.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports
Most crash on comm/mailnews/base/search/src/nsMsgFilterService.cpp:608 (CopyToFolder) with very few failing on lines 589 (Delelete), 670 (MarkUnread) and 863 (Custom), all places where 'm_curFolder' is dereferenced.
So let's focus on "copy" for a moment. That calls copyService->CopyMessages(..., this, ...), where 'this' is the nsMsgFilterAfterTheFact object which is also a listener. It gets called on OnStartCopy() (no-op) and OnStopCopy(). That one recursively calls ApplyFilter() or may call RunNextFilter(). That one calls eventually calls AdvanceToNextFolder() which resets 'm_curFolder' :-(
So what I imagine happens is that the filtering is happily running and at some stage a failed copy comes back and smashes 'm_curFolder'.
Looks like Magnus solution with a variation is the way to go after all :-) - I'm more convinced with my analysis now.
::: mailnews/base/search/src/nsMsgFilterService.cpp
@@ +525,4 @@
> nsresult nsMsgFilterAfterTheFact::ApplyFilter()
> {
> nsresult rv;
> do { // error management block, break if unable to continue with filter.
Add this:
// 'm_curFolder' can be reset asynchronously by the copy service calling OnStopCopy(). So take a local copy here and use it throughout the function.
nsCOMPtr<nsIMsgFolder> curFolder = m_curFolder;
@@ +527,5 @@
> nsresult rv;
> do { // error management block, break if unable to continue with filter.
> if (!m_curFilter)
> break; // Maybe not an error, we just need to call RunNextFilter();
> if (!m_curFolder)
Change this to curFolder.
@@ +530,5 @@
> break; // Maybe not an error, we just need to call RunNextFilter();
> if (!m_curFolder)
> break; // Maybe not an error, we just need to call AdvanceToNextFolder();
> +
> + nsCOMPtr<nsIMsgFolder> curFolder = m_curFolder;
Please move this up until after the 'do' since m_curFolder may already be gone here although it tested non-null before.
Attachment #9012851 -
Flags: review+
Assignee | ||
Comment 18•6 years ago
|
||
Attachment #9012851 -
Attachment is obsolete: true
Attachment #9012977 -
Flags: review+
Comment 20•6 years ago
|
||
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/4568dcf1dd57
fix crash in nsMsgFilterAfterTheFact::ApplyFilter() caused by async reset of 'm_curFolder'. r=jorgk
Updated•6 years ago
|
Target Milestone: --- → Thunderbird 64.0
Comment 21•6 years ago
|
||
Comment on attachment 9012977 [details] [diff] [review]
bug537017_apply_filter_crash.patch
Maybe run this through a beta cycle since it's only 98% straight forward.
Attachment #9012977 -
Flags: approval-comm-esr60?
Attachment #9012977 -
Flags: approval-comm-beta+
Comment 22•6 years ago
|
||
Beta (TB 63):
https://hg.mozilla.org/releases/comm-beta/rev/26cd66633351f8b32164bd346f72e9351138c20c
status-thunderbird63:
--- → fixed
status-thunderbird64:
--- → fixed
Updated•6 years ago
|
Attachment #9012977 -
Flags: approval-comm-esr60? → approval-comm-esr60+
Comment 23•6 years ago
|
||
status-thunderbird_esr60:
--- → fixed
tracking-thunderbird_esr60:
--- → 63+
You need to log in
before you can comment on or make changes to this bug.
Description
•