Closed Bug 537017 Opened 15 years ago Closed 6 years ago

crash [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)]

Categories

(MailNews Core :: Filters, defect)

1.9.1 Branch
x86
All
defect
Not set
critical

Tracking

(thunderbird_esr6063+ fixed, thunderbird63 fixed, thunderbird64 fixed)

RESOLVED FIXED
Thunderbird 64.0
Tracking Status
thunderbird_esr60 63+ fixed
thunderbird63 --- fixed
thunderbird64 --- fixed

People

(Reporter: wsmwk, Assigned: mkmelin)

References

Details

(Keywords: crash, Whiteboard: [rare])

Crash Data

Attachments

(1 file, 1 obsolete file)

crash [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)] very low volume crash, like 2/week bp-b8d38796-b190-4c00-9fec-03c482091224 0 thunderbird.exe nsMsgFilterAfterTheFact::ApplyFilter mailnews/base/search/src/nsMsgFilterService.cpp:576 1 thunderbird.exe nsMsgApplyFiltersToMessages::RunNextFilter mailnews/base/search/src/nsMsgFilterService.cpp:979 2 thunderbird.exe nsMsgFilterAfterTheFact::AdvanceToNextFolder 3 thunderbird.exe nsMsgFilterService::ApplyFilters 4 xpcom_core.dll NS_InvokeByIndex_P xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:101 also ~8/week [@ nsCOMPtr_base::~nsCOMPtr_base() | nsMsgFilterAfterTheFact::~nsMsgFilterAfterTheFact()] .. bogus top of stack? bp-0d89f9d4-25f7-4bcf-9892-ece0f2091228 0 xpcom_core.dll nsCOMPtr_base::~nsCOMPtr_base objdir-tb/mozilla/xpcom/build/nsCOMPtr.cpp:81 1 thunderbird.exe nsMsgFilterAfterTheFact::~nsMsgFilterAfterTheFact mailnews/base/search/src/nsMsgFilterService.cpp:332 2 thunderbird.exe nsMsgFilterAfterTheFact::`scalar deleting destructor' 3 thunderbird.exe nsReferencedElement::DocumentLoadNotification::Release mailnews/imap/src/nsImapOfflineSync.cpp:61 4 xpcom_core.dll nsCOMPtr_base::~nsCOMPtr_base objdir-tb/mozilla/xpcom/build/nsCOMPtr.cpp:81 5 thunderbird.exe nsTArray<nsCOMPtr<nsIMsgSendLaterListener> >::DestructRange objdir-tb/mozilla/dist/include/xpcom/nsTArray.h:848 6 thunderbird.exe nsTArray<nsCOMPtr<nsIMsgSendListener> >::RemoveElementsAt objdir-tb/mozilla/dist/include/xpcom/nsTArray.h:663 7 thunderbird.exe nsAutoTObserverArray<nsCOMPtr<nsIDBChangeListener>,0>::Clear objdir-tb/mozilla/dist/include/xpcom/nsTObserverArray.h:242 8 thunderbird.exe nsMsgMailNewsUrl::SetUrlState mailnews/base/util/nsMsgMailNewsUrl.cpp:136 9 thunderbird.exe nsImapMailFolder::SetUrlState mailnews/imap/src/nsImapMailFolder.cpp:6596
crash rate for v3 is now double compared to comment 0, about 4/week, but still tiny. bp-686d0ada-2b73-42c9-bb7f-e31442100322 has an email address which I have pinged. filed bug 555539 for nsCOMPtr_base::~nsCOMPtr_base() | nsMsgFilterAfterTheFact::~nsMsgFilterAfterTheFact()
OS: Windows XP → All
Crash Signature: [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)]
rkent, is bp-e9bf7e3b-f5c4-46e5-95bb-d01722111207 happening during shutdown? (never got a response from bp-686d0ada-2b73-42c9-bb7f-e31442100322)
Crash Signature: [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)] → [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)] [@ nsMsgFilterAfterTheFact::ApplyFilter ]
Whiteboard: [rare]
(In reply to Wayne Mery (:wsmwk) from comment #2) > rkent, is bp-e9bf7e3b-f5c4-46e5-95bb-d01722111207 happening during shutdown? > > (never got a response from bp-686d0ada-2b73-42c9-bb7f-e31442100322) With nsXREDirProvider::DoShutdown in the stack, my guess is yes. It is also happening with a post-bayes filter, which is quite rare (the user must select this manually during filter definition).
In version 12, I think, signature morphed to nsMsgFilterAfterTheFact::ApplyFilter(bool*) (In reply to Kent James (:rkent) from comment #3) > (In reply to Wayne Mery (:wsmwk) from comment #2) > > rkent, is bp-e9bf7e3b-f5c4-46e5-95bb-d01722111207 happening during shutdown? > > > > (never got a response from bp-686d0ada-2b73-42c9-bb7f-e31442100322) > > With nsXREDirProvider::DoShutdown in the stack, my guess is yes. It is also > happening with a post-bayes filter, which is quite rare (the user must > select this manually during filter definition). None of the current stacks I viewed have shutdown, so that must have been a fluke. Do these newer stacks reveal anything more useful? bp-7aa07743-cac8-4603-9d3a-3bc672121203 0 xul.dll nsMsgFilterAfterTheFact::ApplyFilter mailnews/base/search/src/nsMsgFilterService.cpp:538 1 xul.dll nsMsgFilterAfterTheFact::OnSearchDone mailnews/base/search/src/nsMsgFilterService.cpp:400 2 xul.dll nsMsgSearchSession::NotifyListenersDone mailnews/base/search/src/nsMsgSearchSession.cpp:568 3 xul.dll nsMsgSearchSession::TimerCallback mailnews/base/search/src/nsMsgSearchSession.cpp:504 4 xul.dll nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:473 5 xul.dll nsTimerEvent::Run xpcom/threads/nsTimerImpl.cpp:556 kent@3238 530 // We'll continue after a copy, but not after a move kent@3238 531 if (NS_SUCCEEDED(rv) && actionType == nsMsgFilterAction::CopyToFolder kent@3238 532 && actionIndex < numActions - 1) kent@3238 533 m_nextAction = actionIndex + 1; kent@3238 534 else kent@3238 535 m_nextAction = 0; // OnStopCopy tests this to move to next filter kent@3330 536 // Tell postplugin filters if we are moving the message. kent@3330 537 if (actionType == nsMsgFilterAction::MoveToFolder) FWIW, 3/4 of the random stacks I examined incllude exquilla.
Crash Signature: [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)] [@ nsMsgFilterAfterTheFact::ApplyFilter ] → [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)] [@ nsMsgFilterAfterTheFact::ApplyFilter ] [@ nsMsgFilterAfterTheFact::ApplyFilter(bool*) ]
Flags: needinfo?
still low volume hmm, the failing line didn't paste in the prior comment bp-9e1d337f-073a-4008-980b-effe32140710 hg@0 536 if (copyService) kent@3238 537 { kent@3238 538 rv = copyService->CopyMessages(m_curFolder, m_searchHitHdrs, kent@3238 539 destIFolder, actionType == nsMsgFilterAction::MoveToFolder, bugzilla@9082 540 this, m_msgWindow, false); kent@3238 541 // We'll continue after a copy, but not after a move kent@3238 542 if (NS_SUCCEEDED(rv) && actionType == nsMsgFilterAction::CopyToFolder kent@3238 543 && actionIndex < numActions - 1) kent@3238 544 m_nextAction = actionIndex + 1; kent@3238 545 else kent@3238 546 m_nextAction = 0; // OnStopCopy tests this to move to next filter kent@3330 547 // Tell postplugin filters if we are moving the message. kent@3330 548 if (actionType == nsMsgFilterAction::MoveToFolder) ehsan@10936 549 c(uint32_t i = 0; i < m_searchHits.Length(); i++)
Flags: needinfo?
* nsMsgFilterAfterTheFact::ApplyFilter(bool*) signature ends after TB33.0b1 * nsMsgFilterAfterTheFact::ApplyFilter linux signature ends after TB31.8.0 That leaves nsMsgFilterAfterTheFact::ApplyFilter() which is a mess of different stacks, but adding it to the signature list
Crash Signature: [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)] [@ nsMsgFilterAfterTheFact::ApplyFilter ] [@ nsMsgFilterAfterTheFact::ApplyFilter(bool*) ] → [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)] [@ nsMsgFilterAfterTheFact::ApplyFilter ] [@ nsMsgFilterAfterTheFact::ApplyFilter(bool*) ] [@ nsMsgFilterAfterTheFact::ApplyFilter()]
Depends on: 797710
Whiteboard: [rare] → [rare][check TB38.3.0 crash-stats]
No longer depends on: 797710
Blocks: 915107
Whiteboard: [rare][check TB38.3.0 crash-stats] → [rare]
After Feb 1 the crash rate dropped about 30%. All or most of the reduction appeared to be from versions prior to version 52 - so perhaps some where in the version 52 cycle we landed something that helped
Attached patch bug537017_apply_filter_crash.patch (obsolete) (deleted) — Splinter Review
Looks like m_curFolder was null. So keep a local ref?
Assignee: nobody → mkmelin+mozilla
Status: NEW → ASSIGNED
Attachment #9012851 - Flags: review?(jorgk)
Comment on attachment 9012851 [details] [diff] [review] bug537017_apply_filter_crash.patch You think this is going to help? So basically you're assuming 'm_curFolder' is going away somewhere during the processing? How would that be caused? Wouldn't it be useful to check 'm_curFolder' at the top of the function?
I'm not sure how, it is null for some reason though. We already check m_curFolder on top.
Comment on attachment 9012851 [details] [diff] [review] bug537017_apply_filter_crash.patch I don't like this at all. Basically you're adding another reference to something that's going missing during the big processing loop for some unknown reason. In doing so, you could cause unpredictable side-effects, especially accessing via that reference. I don't know how that member gets nulled, maybe someone is deleting a folder while the filter is busily processing it in the background. It's a low crash rate, so it must be some unlikely case. IMHO, you have two options: Either find out what is really happening or apply a different sort of band-aid that just avoids the crash: NS_ENSURE_TRUE(m_curFolder, NS_ERROR_UNEXPECTED) at all call sites using 'm_curFolder'. Do we know which one is crashing?
Attachment #9012851 - Flags: review?(jorgk)
The one I tracked down was https://dxr.mozilla.org/comm-esr60/source/mailnews/base/search/src/nsMsgFilterService.cpp#608 I think we do use the pattern I suggested elsewhere, that of keeping a local ref. Can't find and example now though. As for the options, I don't think we're likely to find out, and NS_ENSURE_TRUE(m_curFolder, NS_ERROR_UNEXPECTED) before all call sites is not any more likely to help. I assume threading is somehow involved here, or else the crash wouldn't be possible. So the folder could still go away in between such added check and the next line. We're already checking m_curFolder at the top of the loop, so adding any further checks is just a game of timing, and you will still sometimes lose.
(In reply to Magnus Melin [:mkmelin] from comment #13) > I think we do use the pattern I suggested elsewhere, that of keeping a local > ref. Can't find and example now though. I think you're referring to the kungFuDeathGrip. That's a different story: https://dxr.mozilla.org/comm-central/search?q=kungFuDeathGrip+path%3Acomm&redirect=false > So the folder could still go away in between such added check and the next line. Not likely. 'm_curFolder' is a member of the object and no one else will clear that out unless the object itself subjects itself to some async processing. It would be good to know where which dereferencing actually fails. Can we have a recent crash rather than the ones from 2015 :-( If some strange thing happens, how do you know your copied folder handle still points to a valid folder? And adding another reference to a thing that's been killed elsewhere is asking for trouble since you might block the destruction of the folder. Maybe Neil wants to comment, he's been active lately.
Flags: needinfo?(vseerror)
Flags: needinfo?(vseerror)
> Can we have a recent crash rather than the ones from 2015 :-( One can just click the crash sig link at the top of the bug :)
Crash Signature: [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)] [@ nsMsgFilterAfterTheFact::ApplyFilter ] [@ nsMsgFilterAfterTheFact::ApplyFilter(bool*) ] [@ nsMsgFilterAfterTheFact::ApplyFilter()] → [@ nsMsgFilterAfterTheFact::ApplyFilter ]
Comment on attachment 9012851 [details] [diff] [review] bug537017_apply_filter_crash.patch Review of attachment 9012851 [details] [diff] [review]: ----------------------------------------------------------------- I've looked at all 32 reports: https://crash-stats.mozilla.com/signature/?product=Thunderbird&version=60.0&signature=nsMsgFilterAfterTheFact%3A%3AApplyFilter&date=%3E%3D2018-06-28T22%3A59%3A47.000Z&date=%3C2018-09-28T22%3A59%3A47.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports Most crash on comm/mailnews/base/search/src/nsMsgFilterService.cpp:608 (CopyToFolder) with very few failing on lines 589 (Delelete), 670 (MarkUnread) and 863 (Custom), all places where 'm_curFolder' is dereferenced. So let's focus on "copy" for a moment. That calls copyService->CopyMessages(..., this, ...), where 'this' is the nsMsgFilterAfterTheFact object which is also a listener. It gets called on OnStartCopy() (no-op) and OnStopCopy(). That one recursively calls ApplyFilter() or may call RunNextFilter(). That one calls eventually calls AdvanceToNextFolder() which resets 'm_curFolder' :-( So what I imagine happens is that the filtering is happily running and at some stage a failed copy comes back and smashes 'm_curFolder'. Looks like Magnus solution with a variation is the way to go after all :-) - I'm more convinced with my analysis now. ::: mailnews/base/search/src/nsMsgFilterService.cpp @@ +525,4 @@ > nsresult nsMsgFilterAfterTheFact::ApplyFilter() > { > nsresult rv; > do { // error management block, break if unable to continue with filter. Add this: // 'm_curFolder' can be reset asynchronously by the copy service calling OnStopCopy(). So take a local copy here and use it throughout the function. nsCOMPtr<nsIMsgFolder> curFolder = m_curFolder; @@ +527,5 @@ > nsresult rv; > do { // error management block, break if unable to continue with filter. > if (!m_curFilter) > break; // Maybe not an error, we just need to call RunNextFilter(); > if (!m_curFolder) Change this to curFolder. @@ +530,5 @@ > break; // Maybe not an error, we just need to call RunNextFilter(); > if (!m_curFolder) > break; // Maybe not an error, we just need to call AdvanceToNextFolder(); > + > + nsCOMPtr<nsIMsgFolder> curFolder = m_curFolder; Please move this up until after the 'do' since m_curFolder may already be gone here although it tested non-null before.
Attachment #9012851 - Flags: review+
Attachment #9012851 - Attachment is obsolete: true
Attachment #9012977 - Flags: review+
Quite the detective work!
Keywords: checkin-needed
Pushed by mozilla@jorgk.com: https://hg.mozilla.org/comm-central/rev/4568dcf1dd57 fix crash in nsMsgFilterAfterTheFact::ApplyFilter() caused by async reset of 'm_curFolder'. r=jorgk
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 64.0
Comment on attachment 9012977 [details] [diff] [review] bug537017_apply_filter_crash.patch Maybe run this through a beta cycle since it's only 98% straight forward.
Attachment #9012977 - Flags: approval-comm-esr60?
Attachment #9012977 - Flags: approval-comm-beta+
Attachment #9012977 - Flags: approval-comm-esr60? → approval-comm-esr60+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: