Closed Bug 1175168 Opened 9 years ago Closed 3 years ago

crash nsXPCWrappedJS::CallMethod - need to proxy URL release (JavaScript's XPCOM object) to the main thread in the IMAP protocol. MOZ_Crash reason nsXPCWrappedJS::CallMethod called off main thread (Not import related) - now bug 1707004?

Categories

(MailNews Core :: Networking: IMAP, defect)

x86
All
defect
Not set
critical

Tracking

(thunderbird_esr91+ fixed, thunderbird94+ fixed)

VERIFIED FIXED
95 Branch
Tracking Status
thunderbird_esr91 + fixed
thunderbird94 + fixed

People

(Reporter: wsmwk, Assigned: mkmelin)

References

(Blocks 4 open bugs, )

Details

(Keywords: crash, topcrash-thunderbird, Whiteboard: [non-import crashes][imap:key])

Crash Data

Attachments

(3 files)

+++ This bug was initially created as a clone of Bug #1175055 +++ +++ Bug #1175055 was initially created as a clone of Bug #917961 +++ I estimate about 5% of nsXPCWrappedJS::CallMethod(unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) crashes are not import related. Once we get Bug #1175055 and Bug #917961 sorted out then we should take a look at these. Examples (most of these are 31.x crashes) : bp-0b68f7af-5afc-4313-aeef-93a022150504 I cannot read the mails in my inbox bp-811d47d5-9ac9-463f-981b-fce3b2150206 Suudenly it got crashed bp-c5205ba2-95cc-49a5-aa36-f410e2150117 Moved tabs in Windows 7 and Thunderbird crashed after move bp-af4fc7f1-f431-442f-984d-dc0292150506 was trying to click on menu > add ons bp-38681d3c-803a-408a-8c54-669cd2150504 Thunderbird was idling. bp-3069c4f5-fef0-4559-b0d2-d39cb2150418 I was about to add a signature to my account.
If all import crashes are gone when bug 1175055 is fixed then bug 917961 can cover the remaining import crashes, and this bug cover the non-import crashes.
Summary: Importing from Eudora and Outlook busted in TB 38 and trunk → crash nsXPCWrappedJS::CallMethod(unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*)
Whiteboard: [non-import crashes]
Depends on: 1176748
Crash Signature: [@ nsXPCWrappedJS::CallMethod(unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*)] [@ nsXPCWrappedJS::AddRef()] [@ mozilla::dom::workers::GetCurrentThreadJSContext()] → [@ nsXPCWrappedJS::CallMethod(unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*)] [@ nsXPCWrappedJS::AddRef()] [@ mozilla::dom::workers::GetCurrentThreadJSContext()] [@ nsXPCWrappedJS::CallMethod] [@ nsXPCWrappedJS::AddRef] [@ mozilla::d…
Still topcrash. Still no rhyme or reason to these
Summary: crash nsXPCWrappedJS::CallMethod(unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) → crash nsXPCWrappedJS::CallMethod(unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) - Not import related
m_kato can you offer advice? les frequently has this crash signature bp-e0ea98b5-50b0-40ce-bd88-72e060180319 0 xul.dll nsXPCWrappedJS::AddRef() js/xpconnect/src/XPCWrappedJS.cpp:242 1 xul.dll AddRefThunk ipc/mscom/VTableBuilder.c:24 2 xul.dll nsMsgMailNewsUrl::~nsMsgMailNewsUrl() C:/builds/moz2_slave/tb-rel-c-beta-w32_bld-00000000/build/mailnews/base/util/nsMsgMailNewsUrl.cpp:69 3 xul.dll nsImapUrl::`scalar deleting destructor'(unsigned int) 4 xul.dll nsMsgMailNewsUrl::Release() C:/builds/moz2_slave/tb-rel-c-beta-w32_bld-00000000/build/mailnews/base/util/nsMsgMailNewsUrl.cpp:76 5 xul.dll nsPop3URL::Release() C:/builds/moz2_slave/tb-rel-c-beta-w32_bld-00000000/build/mailnews/local/src/nsPop3URL.cpp:24 6 xul.dll nsImapProtocol::ProcessCurrentURL() C:/builds/moz2_slave/tb-rel-c-beta-w32_bld-00000000/build/mailnews/imap/src/nsImapProtocol.cpp:1916 7 xul.dll nsImapProtocol::ImapThreadMainLoop() C:/builds/moz2_slave/tb-rel-c-beta-w32_bld-00000000/build/mailnews/imap/src/nsImapProtocol.cpp:1402 8 xul.dll nsImapProtocol::Run() C:/builds/moz2_slave/tb-rel-c-beta-w32_bld-00000000/build/mailnews/imap/src/nsImapProtocol.cpp:1061
Component: General → Networking: IMAP
Flags: needinfo?(m_kato)
Product: Thunderbird → MailNews Core
(In reply to Wayne Mery (:wsmwk) from comment #3) > m_kato can you offer advice? Same issue as bug 1133892. But that fix isn't enough. Don't addref JavaScript's XPCOM object on non-Gecko's main thread.
Flags: needinfo?(m_kato)
Thanks. In that bug Kent's comment was "needs to proxy URL releases to the main thread in the IMAP protocol." Kent, can you suggest a variation of your patch for this bug? The combination of the two signatures makes this a topcrash - #22 for 52.6.0, and #15 for 60.0b1 (although perhaps not all the crashes are related) (removing mozilla::dom::workers::GetCurrentThreadJSContext)
Crash Signature: [@ nsXPCWrappedJS::CallMethod(unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*)] [@ nsXPCWrappedJS::AddRef()] [@ mozilla::dom::workers::GetCurrentThreadJSContext()] [@ nsXPCWrappedJS::CallMethod] [@ nsXPCWrappedJS::AddRef] [@ mozilla::d… → [@ nsXPCWrappedJS::AddRef()] [@ nsXPCWrappedJS::CallMethod] [@ nsXPCWrappedJS::AddRef]
Flags: needinfo?(rkent)
Summary: crash nsXPCWrappedJS::CallMethod(unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) - Not import related → crash nsXPCWrappedJS::CallMethod - need to proxy URL release (JavaScript's XPCOM object) to the main thread in the IMAP protocol (Not import related)

Ben, can you derive something from the patch in bug 1133892?

Recent crash examples
nsXPCWrappedJS::CallMethod bp-a38ea4de-7950-477e-8abb-9e8c20190709
nsXPCWrappedJS::AddRef bp-1347677c-3bca-40aa-9922-979230190709

Crash rank ~#25 combining the two crash signatures

Crash Signature: [@ nsXPCWrappedJS::AddRef()] [@ nsXPCWrappedJS::CallMethod] [@ nsXPCWrappedJS::AddRef] → [@ nsXPCWrappedJS::CallMethod] [@ nsXPCWrappedJS::AddRef]
Flags: needinfo?(rkent) → needinfo?(benc)

These do all look related to me (ie xpcom calls that need to be done by the main thread are instead being invoked from some other thread - nsXPCWrappedJS stuff in particular, as the javascript has to be run on the main thread).

Just to make sure I'm understanding things right:

  • The fix is to set things up so that such actions are properly deferred to the main thread. For example, using nsIRunnable to package up the call for the main thread to run.
  • The patch in Bug 1133892 does for nsMsgMailNewsUrl, which holds some XPCOM pointers. Because releasing them might require executing some javascript (if the object is a nsXPCWrappedJS object), the release is now deferred to the main thread.

I've spotted a few other places where mainthread deferral might be required - nsMsgFolderNotificationService listeners, pop3Sink in nsPop3URL , I'm sure there are a bunch more...
But I'm not sure how widespread it is, or what the cleanest solution is.
While a lot of it can be fixed piecemeal (like in Bug 1133892), maybe a better approach would be to place the responsibility on to the worker threads - ie if they perform an operation that might require main thread deferral, they should handle deferral rather than relying on lower-level objects to be so thread-aware...
I suspect the piecemeal approach would be easier. Quicker wins, anyway. But maybe at the expense of adding complexity to low-level code.

Do we have replication steps for one of these crashes?
nsXPCWrappedJS::CallMethod bp-a38ea4de-7950-477e-8abb-9e8c20190709 looks promising - it's in the import thread, and I'd imagine it'd be pretty deterministic. Surely there's some easy way to reliably trigger it manually?

Flags: needinfo?(benc)

(In reply to Ben Campbell from comment #7)

...
Do we have replication steps for one of these crashes?
nsXPCWrappedJS::CallMethod bp-a38ea4de-7950-477e-8abb-9e8c20190709 looks promising - it's in the import thread, and I'd imagine it'd be pretty deterministic. Surely there's some easy way to reliably trigger it manually?

Perhaps. See bug 917961 comment 2 and bug 917961 comment 6. (bug 917961 already blocks this bug)

I am also forwarding you some emails. Some comments have said they did import all. But it's not clear that is a key factor.

Blocks: 1517464
Flags: needinfo?(benc)

Ben ...

(In reply to Ben Campbell from comment #7)

These do all look related to me (ie xpcom calls that need to be done by the main thread are instead being invoked from some other thread - nsXPCWrappedJS stuff in particular, as the javascript has to be run on the main thread).
....
While a lot of it can be fixed piecemeal (like in Bug 1133892), maybe a better approach would be to place the responsibility on to the worker threads - ie if they perform an operation that might require main thread deferral, they should handle deferral rather than relying on lower-level objects to be so thread-aware...
I suspect the piecemeal approach would be easier. Quicker wins, anyway. But maybe at the expense of adding complexity to low-level code.

If import is such a use case, could we explore this idea in that limited use case in bug 917961? Coincidentally...

Do we have replication steps for one of these crashes?
nsXPCWrappedJS::CallMethod bp-a38ea4de-7950-477e-8abb-9e8c20190709 looks promising - it's in the import thread, and I'd imagine it'd be pretty deterministic. Surely there's some easy way to reliably trigger it manually?

Were the examples I sent on 8/13 useful?

Blocks: 1284583
Blocks: 1508649

Starting July 15, crash rate of nsXPCWrappedJS::CallMethod appears to be up about 18% compared to the prior 4 months. Unclear why.

Thunderbird 78.0 1374 44.8% 744
Thunderbird 68.10.0 1233 40.2% 902
Thunderbird 78.0.1 328 10.7% 181
Thunderbird 68.9.0 24 0.8% 12
Thunderbird 79.0 19 0.6% 20
Thunderbird 68.6.0 15 0.5% 13

(In reply to Wayne Mery (:wsmwk) from comment #11)
...

Starting July 15, crash rate of nsXPCWrappedJS::CallMethod appears to be up about 18% compared to the prior 4 months. Unclear why.
...

new Java version?

(In reply to Worcester12345 from comment #12)

new Java version?

No, - if you are implying a java version change in the OS - java on the OS wouldn't impact Thunderbird nor Firefox. java is not javascript.
Also, we pushed no changes July 13-15.

There is an increase perhaps due to 78.0 on July 16, but that is one day earlier than as noted in my comment 11. https://crash-stats.mozilla.org/signature/?product=Thunderbird&version=78.0&signature=nsXPCWrappedJS%3A%3ACallMethod&date=%3E%3D2020-06-24T00%3A00%3A00.000Z&date=%3C2020-07-24T23%3A59%3A00.000Z#graphs

Sounds like that fixed (many of) the import cases yes.

(In reply to Magnus Melin [:mkmelin] from comment #15)

Sounds like that fixed (many of) the import cases yes.

updated bug 917961 for the import related issue.

Most remaining signatures of nsXPCWrappedJS::CallMethod do not have outlook on the stack, which is what this bug is about. Are any of these examples actionable?

bp-2bae85b8-bbb9-4d36-81e7-7c3c70201229 78.6.0
0 xul.dll nsXPCWrappedJS::CallMethod(unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*) js/xpconnect/src/XPCWrappedJSClass.cpp:753
1 xul.dll PrepareAndDispatch(nsXPTCStubBase*, unsigned int, unsigned int*, unsigned int*) xpcom/reflect/xptcall/md/win32/xptcstubs.cpp:88
2 xul.dll SharedStub() xpcom/reflect/xptcall/md/win32/xptcstubs.cpp:110
3 xul.dll nsAbOutlookDirectory::ExecuteQuery(_SRestriction&, nsIAbDirSearchListener*, int, int, int) comm/mailnews/addrbook/src/nsAbOutlookDirectory.cpp:941

Most current crashes are beta - so maybe there is a regression in beta? ... looks like there is an increase starting in September https://crash-stats.mozilla.org/signature/?product=Thunderbird&version=%2178.0.1&version=%2178.1.0&version=%2168.12.0&version=%2168.12.1&version=%2168.10.0&version=%2168.6.0&version=%2168.5.0&release_channel=beta&signature=nsXPCWrappedJS%3A%3ACallMethod&date=%3E%3D2020-06-30T19%3A30%3A00.000Z&date=%3C2020-12-31T19%3A30%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1#graphs

bp-8dde177a-30a8-4b42-8080-b14670201231 85 beta
0 xul.dll nsXPCWrappedJS::CallMethod(unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*) js/xpconnect/src/XPCWrappedJSClass.cpp:757
1 xul.dll PrepareAndDispatch(nsXPTCStubBase*, unsigned int, unsigned int*, unsigned int*) xpcom/reflect/xptcall/md/win32/xptcstubs.cpp:79
2 xul.dll SharedStub() xpcom/reflect/xptcall/md/win32/xptcstubs.cpp:98
3 xul.dll nsMsgMailSession::OnItemBoolPropertyChanged(nsIMsgFolder*, nsTSubstring<char> const&, bool, bool) comm/mailnews/base/src/nsMsgMailSession.cpp:118

Still no testcases - ref Ben's comment 7.

Flags: needinfo?(benc) → needinfo?(mkmelin+mozilla)

MOZ_CRASH Reason (Sanitized): MOZ_RELEASE_ASSERT(NS_IsMainThread()) (nsXPCWrappedJS::CallMethod called off main thread)

Flags: needinfo?(mkmelin+mozilla) → needinfo?(benc)
Summary: crash nsXPCWrappedJS::CallMethod - need to proxy URL release (JavaScript's XPCOM object) to the main thread in the IMAP protocol (Not import related) → crash nsXPCWrappedJS::CallMethod - need to proxy URL release (JavaScript's XPCOM object) to the main thread in the IMAP protocol. MOZ_Crash reason nsXPCWrappedJS::CallMethod called off main thread(Not import related)
Blocks: 1344036
Summary: crash nsXPCWrappedJS::CallMethod - need to proxy URL release (JavaScript's XPCOM object) to the main thread in the IMAP protocol. MOZ_Crash reason nsXPCWrappedJS::CallMethod called off main thread(Not import related) → crash nsXPCWrappedJS::CallMethod - need to proxy URL release (JavaScript's XPCOM object) to the main thread in the IMAP protocol. MOZ_Crash reason nsXPCWrappedJS::CallMethod called off main thread (Not import related)
Blocks: 558452
Whiteboard: [non-import crashes] → [non-import crashes][imap:key]

This has more analysis than bug 917961, so switching the dependency

Blocks: 1176748
No longer depends on: 1176748
Blocks: 917961
No longer blocks: 1176748
Blocks: 1685084
Blocks: 1684266
Blocks: 1592338
Blocks: 1534119
Blocks: 1649674
Blocks: 1323596
Blocks: 1581079
Summary: crash nsXPCWrappedJS::CallMethod - need to proxy URL release (JavaScript's XPCOM object) to the main thread in the IMAP protocol. MOZ_Crash reason nsXPCWrappedJS::CallMethod called off main thread (Not import related) → crash nsXPCWrappedJS::CallMethod - need to proxy URL release (JavaScript's XPCOM object) to the main thread in the IMAP protocol. MOZ_Crash reason nsXPCWrappedJS::CallMethod called off main thread (Not import related) - now bug 1707004?
Blocks: 1718260
Blocks: 1674132
Blocks: 1707006

91.0.1 ranks #1 for nsXPCWrappedJS::AddRef, a significant change [1] compared to version 78.13.0 which ranks #48. Can we determine why?

Some crash report examples:
bp-07e92124-3864-4df1-8956-206b80210519 84.0
bp-4bf19fa9-1056-4c58-8f17-8855c0210317 86.0
bp-38ac0885-044e-455b-9d8b-399930210315 87.0
bp-1fce9379-1355-48f8-a87e-7fdb00210506 89.0

Currently 45% of release crashes <1 minute, beta is 40%, but significantly higher than betas of a few months ago at 31% have uptime. bp-1f44681f-229e-454a-ac2c-0b7810210905 is a current beta example - it has no unusual addons.

[1] It's hard to say where the rate changed, and what patch caused the change, because crash rate is already very high for the at the start of the beta crash query data starting 3/5/2021. Another example where a year of crash data would be helpful.

version count
83.0 24
84.0 10
85.0 21
86.0 43
87.0 444 (merge happened on 2/22)
88.0 48 (merge happened on 3/22, don't know why the crash rate dips, FWIW smtp-js appears)
89.0 789 (merged happened on 4/19)
90.0 823
91.0 437
92.0 308

p.s. nsXPCWrappedJS::CallMethod is rare, probably has been for some time, except for a brush in pdf bug 1663859.

Flags: needinfo?(benc) → needinfo?(mkmelin+mozilla)

benc see comment 20

(In reply to Ben Campbell from comment #7)

These do all look related to me (ie xpcom calls that need to be done by the main thread are instead being invoked from some other thread - nsXPCWrappedJS stuff in particular, as the javascript has to be run on the main thread).

Just to make sure I'm understanding things right:

  • The fix is to set things up so that such actions are properly deferred to the main thread. For example, using nsIRunnable to package up the call for the main thread to run.
  • The patch in Bug 1133892 does for nsMsgMailNewsUrl, which holds some XPCOM pointers. Because releasing them might require executing some javascript (if the object is a nsXPCWrappedJS object), the release is now deferred to the main thread.

I've spotted a few other places where mainthread deferral might be required - nsMsgFolderNotificationService listeners, pop3Sink in nsPop3URL , I'm sure there are a bunch more...

Hopefully those will go away with pop3-js bug 1707548!

Flags: needinfo?(benc)

Just been peeking through the callstacks for the current reports. It looks to me like the issue is that there's an nsMailToUrl being constructed on a worker (DOMWorker) thread. This should be fine and normal... except that all our c-c URL classes are not just URLs. They all have state, and hold lots of nsCOMPtr<> members they really really shouldn't. Which means they're not very good at being used except on the main thread.

Bigger picture:
We should make all our URL classes into actual URLs that don't hold lots of state about the progress of the request they represent (major job, but necessary I think).

More narrowly:
This crash signature has skewed massively to nsMailToUrl urls. I don't know what's changed.
I'd be tempted to think that the new javascript SMTP might improve things... but I think that's false hope. If anything is still creating nsMailToUrl objects off the main thread, we're still going to have problems :-(

Flags: needinfo?(benc)

Looking at a few of those crashes, I think that stacks means that like nsMailtoUrl::ParseMailtoUrl() is calling into an XPCOM service implementing nsIMimeConverter, on a DOM worker thread. It looks like the service is implemented in JS, which means it is trying to go through XPConnect, which can't be used off of the main thread. That said, I can't see any JS implementations of nsIMimeService in the comm-central tree, so maybe it is an addon doing this? I do see some comments saying "nsIMIMEService is not thread-safe", so maybe something could go wrong even when using the C++ implementation of this interface. But presumably any issues would show up differently.

Hmm, I wonder if these all users have set the pref to use the c++ smtp....
That would seems unlikely. Perhaps some case where the right module isn't (yet) loaded?
From above the call comes through nsSmtpService.cpp which would imply so (and that's completely removed on trunk already).
It looks like nsISmtpUrl could be removed altogether, but there's still some usage of it. Bug 1724177 would remove almost all of it I think.

Flags: needinfo?(mkmelin+mozilla)

bp-1da9fd49-cafb-42e7-97b6-9b36e0210928 reported in https://support.mozilla.org/en-US/questions/1352146 happens when opening a PDF - so nothing to do with SMTP

Flags: needinfo?(mkmelin+mozilla)

It seems these are for mailto: urls. I'll give them the same treatment as imap. Maybe somehow related to them getting triggered by an external call?

Assignee: nobody → mkmelin+mozilla
Status: NEW → ASSIGNED
Flags: needinfo?(mkmelin+mozilla)
Target Milestone: --- → 94 Branch
Blocks: 1731174

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/0ac5c49c584c
make sure to create mailto urls only on the main thread (non-main thread will crash). r=benc

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Pushed by mkmelin@iki.fi: https://hg.mozilla.org/comm-central/rev/e022bd521a23 follow-up - fix warning about unused param which breaks builds on CI. rs=bustage-fix DONTBUILD

I guess I should add ac_add_options --enable-warnings-as-errors to .mozconfig locally.

Comment on attachment 9243897 [details]
Bug 1175168 - make sure to create mailto urls only on the main thread (non-main thread will crash). r=benc

[Approval Request Comment]
Regression caused by (bug #): not directly a regression
User impact if declined: may crash
Testing completed (on c-c, etc.): daily
Risk to taking this patch (and alternatives if risky): very safe

Attachment #9243897 - Flags: approval-comm-esr91?
Attachment #9243897 - Flags: approval-comm-beta?

Comment on attachment 9243897 [details]
Bug 1175168 - make sure to create mailto urls only on the main thread (non-main thread will crash). r=benc

[Triage Comment]
Approved for beta

Attachment #9243897 - Flags: approval-comm-beta? → approval-comm-beta+
Target Milestone: 94 Branch → 95 Branch
Blocks: 1736192

Comment on attachment 9243897 [details]
Bug 1175168 - make sure to create mailto urls only on the main thread (non-main thread will crash). r=benc

[Triage Comment]
Approved for esr91 (91.2.1) - it's been on beta for a week, so fingers crossed we get no regressions shipping in 91.

Note

  • This has potential to fix Bug 1736192 - Thunderbird crashes @ nsXPCWrappedJS::AddRef when opening some pdf files in the viewer
  • nsXPCWrappedJS::CallMethod crashes stopped after 91.0 buildid 20210713202926, which would be Bug 1663859
Attachment #9243897 - Flags: approval-comm-esr91? → approval-comm-esr91+

Need an esr91 version of this if it's to be uplifted.

Flags: needinfo?(mkmelin+mozilla)

Try looks good

But I don't know why tests didn't run. (Not that this would have been tested anyway.)

This is a BIG step forward, so thanks for the patch.

Per crash-stats I can confirm nsXPCWrappedJS::AddRef is substantially reduced. In version 91.2.0 it was #1 crash, in 91.2.1 it drops to #39

Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: