crash nsXPCWrappedJS::CallMethod - need to proxy URL release (JavaScript's XPCOM object) to the main thread in the IMAP protocol. MOZ_Crash reason nsXPCWrappedJS::CallMethod called off main thread (Not import related) - now bug 1707004?
Categories
(MailNews Core :: Networking: IMAP, defect)
Tracking
(thunderbird_esr91+ fixed, thunderbird94+ fixed)
People
(Reporter: wsmwk, Assigned: mkmelin)
References
(Blocks 4 open bugs, )
Details
(Keywords: crash, topcrash-thunderbird, Whiteboard: [non-import crashes][imap:key])
Crash Data
Attachments
(3 files)
(deleted),
image/png
|
Details | |
(deleted),
text/x-phabricator-request
|
wsmwk
:
approval-comm-beta+
wsmwk
:
approval-comm-esr91+
|
Details |
(deleted),
image/png
|
Details |
Reporter | ||
Comment 1•9 years ago
|
||
Updated•9 years ago
|
Reporter | ||
Comment 2•7 years ago
|
||
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Comment 3•7 years ago
|
||
Comment 4•7 years ago
|
||
Reporter | ||
Comment 5•7 years ago
|
||
Reporter | ||
Comment 6•5 years ago
|
||
Ben, can you derive something from the patch in bug 1133892?
Recent crash examples
nsXPCWrappedJS::CallMethod bp-a38ea4de-7950-477e-8abb-9e8c20190709
nsXPCWrappedJS::AddRef bp-1347677c-3bca-40aa-9922-979230190709
Crash rank ~#25 combining the two crash signatures
Comment 7•5 years ago
|
||
important |
These do all look related to me (ie xpcom calls that need to be done by the main thread are instead being invoked from some other thread - nsXPCWrappedJS
stuff in particular, as the javascript has to be run on the main thread).
Just to make sure I'm understanding things right:
- The fix is to set things up so that such actions are properly deferred to the main thread. For example, using
nsIRunnable
to package up the call for the main thread to run. - The patch in Bug 1133892 does for
nsMsgMailNewsUrl
, which holds some XPCOM pointers. Because releasing them might require executing some javascript (if the object is ansXPCWrappedJS
object), the release is now deferred to the main thread.
I've spotted a few other places where mainthread deferral might be required - nsMsgFolderNotificationService
listeners, pop3Sink in nsPop3URL
, I'm sure there are a bunch more...
But I'm not sure how widespread it is, or what the cleanest solution is.
While a lot of it can be fixed piecemeal (like in Bug 1133892), maybe a better approach would be to place the responsibility on to the worker threads - ie if they perform an operation that might require main thread deferral, they should handle deferral rather than relying on lower-level objects to be so thread-aware...
I suspect the piecemeal approach would be easier. Quicker wins, anyway. But maybe at the expense of adding complexity to low-level code.
Do we have replication steps for one of these crashes?
nsXPCWrappedJS::CallMethod bp-a38ea4de-7950-477e-8abb-9e8c20190709 looks promising - it's in the import thread, and I'd imagine it'd be pretty deterministic. Surely there's some easy way to reliably trigger it manually?
Reporter | ||
Comment 8•5 years ago
|
||
(In reply to Ben Campbell from comment #7)
...
Do we have replication steps for one of these crashes?
nsXPCWrappedJS::CallMethod bp-a38ea4de-7950-477e-8abb-9e8c20190709 looks promising - it's in the import thread, and I'd imagine it'd be pretty deterministic. Surely there's some easy way to reliably trigger it manually?
Perhaps. See bug 917961 comment 2 and bug 917961 comment 6. (bug 917961 already blocks this bug)
I am also forwarding you some emails. Some comments have said they did import all. But it's not clear that is a key factor.
Reporter | ||
Comment 9•5 years ago
|
||
In version 68., the crash rate for nsXPCWrappedJS::CallMethod has increased 100x (it was also a steady crash for 60.*) - graph: https://crash-stats.mozilla.org/signature/?signature=nsXPCWrappedJS%3A%3ACallMethod&date=%3E%3D2019-06-15T15%3A02%3A00.000Z&date=%3C2019-09-15T15%3A02%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=3#graphs
25% of the crashes sampled do NOT have import on the stack. example bp-36662401-b8e4-4ee9-aea3-7892c0190915
100x is an exceptionally high increase. I did not assess whether non-import crashes increared more than import crashes (bug 917961)
On the other hand, nsXPCWrappedJS::AddRef crash rate is rock steady so far going from 60.* to 68.* - graph: https://crash-stats.mozilla.org/signature/?signature=nsXPCWrappedJS%3A%3AAddRef&date=%3E%3D2019-06-15T15%3A02%3A00.000Z&date=%3C2019-09-15T15%3A02%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1#graphs
Reporter | ||
Comment 10•5 years ago
|
||
Ben ...
(In reply to Ben Campbell from comment #7)
These do all look related to me (ie xpcom calls that need to be done by the main thread are instead being invoked from some other thread -
nsXPCWrappedJS
stuff in particular, as the javascript has to be run on the main thread).
....
While a lot of it can be fixed piecemeal (like in Bug 1133892), maybe a better approach would be to place the responsibility on to the worker threads - ie if they perform an operation that might require main thread deferral, they should handle deferral rather than relying on lower-level objects to be so thread-aware...
I suspect the piecemeal approach would be easier. Quicker wins, anyway. But maybe at the expense of adding complexity to low-level code.
If import is such a use case, could we explore this idea in that limited use case in bug 917961? Coincidentally...
Do we have replication steps for one of these crashes?
nsXPCWrappedJS::CallMethod bp-a38ea4de-7950-477e-8abb-9e8c20190709 looks promising - it's in the import thread, and I'd imagine it'd be pretty deterministic. Surely there's some easy way to reliably trigger it manually?
Were the examples I sent on 8/13 useful?
Reporter | ||
Comment 11•4 years ago
|
||
Starting July 15, crash rate of nsXPCWrappedJS::CallMethod appears to be up about 18% compared to the prior 4 months. Unclear why.
Thunderbird 78.0 1374 44.8% 744
Thunderbird 68.10.0 1233 40.2% 902
Thunderbird 78.0.1 328 10.7% 181
Thunderbird 68.9.0 24 0.8% 12
Thunderbird 79.0 19 0.6% 20
Thunderbird 68.6.0 15 0.5% 13
Comment 12•4 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #11)
...
Starting July 15, crash rate of nsXPCWrappedJS::CallMethod appears to be up about 18% compared to the prior 4 months. Unclear why.
...
new Java version?
Reporter | ||
Comment 13•4 years ago
|
||
(In reply to Worcester12345 from comment #12)
new Java version?
No, - if you are implying a java version change in the OS - java on the OS wouldn't impact Thunderbird nor Firefox. java is not javascript.
Also, we pushed no changes July 13-15.
There is an increase perhaps due to 78.0 on July 16, but that is one day earlier than as noted in my comment 11. https://crash-stats.mozilla.org/signature/?product=Thunderbird&version=78.0&signature=nsXPCWrappedJS%3A%3ACallMethod&date=%3E%3D2020-06-24T00%3A00%3A00.000Z&date=%3C2020-07-24T23%3A59%3A00.000Z#graphs
Reporter | ||
Comment 14•4 years ago
|
||
Looking at the graph of XPCWrapped:CallMethod https://crash-stats.mozilla.org/signature/?signature=nsXPCWrappedJS%3A%3ACallMethod&date=%3E%3D2020-05-22T03%3A03%3A00.000Z&date=%3C2020-11-22T03%3A03%3A00.000Z#graphs there is a massive drop in crashes between 78.1.0 and 78.1.1.
Which according to https://hg.mozilla.org/releases/comm-esr78/pushloghtml?fromchange=THUNDERBIRD_78_1_0_RELEASE&tochange=THUNDERBIRD_78_1_1_RELEASE&full=1 may be the result of Bug 272292 - unable to import Seamonkey data into Thunderbird ?
Assignee | ||
Comment 15•4 years ago
|
||
Sounds like that fixed (many of) the import cases yes.
Reporter | ||
Comment 16•4 years ago
|
||
(In reply to Magnus Melin [:mkmelin] from comment #15)
Sounds like that fixed (many of) the import cases yes.
updated bug 917961 for the import related issue.
Most remaining signatures of nsXPCWrappedJS::CallMethod do not have outlook on the stack, which is what this bug is about. Are any of these examples actionable?
bp-2bae85b8-bbb9-4d36-81e7-7c3c70201229 78.6.0
0 xul.dll nsXPCWrappedJS::CallMethod(unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*) js/xpconnect/src/XPCWrappedJSClass.cpp:753
1 xul.dll PrepareAndDispatch(nsXPTCStubBase*, unsigned int, unsigned int*, unsigned int*) xpcom/reflect/xptcall/md/win32/xptcstubs.cpp:88
2 xul.dll SharedStub() xpcom/reflect/xptcall/md/win32/xptcstubs.cpp:110
3 xul.dll nsAbOutlookDirectory::ExecuteQuery(_SRestriction&, nsIAbDirSearchListener*, int, int, int) comm/mailnews/addrbook/src/nsAbOutlookDirectory.cpp:941
Most current crashes are beta - so maybe there is a regression in beta? ... looks like there is an increase starting in September https://crash-stats.mozilla.org/signature/?product=Thunderbird&version=%2178.0.1&version=%2178.1.0&version=%2168.12.0&version=%2168.12.1&version=%2168.10.0&version=%2168.6.0&version=%2168.5.0&release_channel=beta&signature=nsXPCWrappedJS%3A%3ACallMethod&date=%3E%3D2020-06-30T19%3A30%3A00.000Z&date=%3C2020-12-31T19%3A30%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1#graphs
bp-8dde177a-30a8-4b42-8080-b14670201231 85 beta
0 xul.dll nsXPCWrappedJS::CallMethod(unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*) js/xpconnect/src/XPCWrappedJSClass.cpp:757
1 xul.dll PrepareAndDispatch(nsXPTCStubBase*, unsigned int, unsigned int*, unsigned int*) xpcom/reflect/xptcall/md/win32/xptcstubs.cpp:79
2 xul.dll SharedStub() xpcom/reflect/xptcall/md/win32/xptcstubs.cpp:98
3 xul.dll nsMsgMailSession::OnItemBoolPropertyChanged(nsIMsgFolder*, nsTSubstring<char> const&, bool, bool) comm/mailnews/base/src/nsMsgMailSession.cpp:118
Still no testcases - ref Ben's comment 7.
Assignee | ||
Comment 17•4 years ago
|
||
MOZ_CRASH Reason (Sanitized): MOZ_RELEASE_ASSERT(NS_IsMainThread()) (nsXPCWrappedJS::CallMethod called off main thread)
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Comment 18•3 years ago
|
||
This has more analysis than bug 917961, so switching the dependency
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Comment 20•3 years ago
|
||
91.0.1 ranks #1 for nsXPCWrappedJS::AddRef, a significant change [1] compared to version 78.13.0 which ranks #48. Can we determine why?
Some crash report examples:
bp-07e92124-3864-4df1-8956-206b80210519 84.0
bp-4bf19fa9-1056-4c58-8f17-8855c0210317 86.0
bp-38ac0885-044e-455b-9d8b-399930210315 87.0
bp-1fce9379-1355-48f8-a87e-7fdb00210506 89.0
Currently 45% of release crashes <1 minute, beta is 40%, but significantly higher than betas of a few months ago at 31% have uptime. bp-1f44681f-229e-454a-ac2c-0b7810210905 is a current beta example - it has no unusual addons.
[1] It's hard to say where the rate changed, and what patch caused the change, because crash rate is already very high for the at the start of the beta crash query data starting 3/5/2021. Another example where a year of crash data would be helpful.
version | count |
---|---|
83.0 | 24 |
84.0 | 10 |
85.0 | 21 |
86.0 | 43 |
87.0 | 444 (merge happened on 2/22) |
88.0 | 48 (merge happened on 3/22, don't know why the crash rate dips, FWIW smtp-js appears) |
89.0 | 789 (merged happened on 4/19) |
90.0 | 823 |
91.0 | 437 |
92.0 | 308 |
p.s. nsXPCWrappedJS::CallMethod is rare, probably has been for some time, except for a brush in pdf bug 1663859.
Reporter | ||
Comment 21•3 years ago
|
||
benc see comment 20
(In reply to Ben Campbell from comment #7)
These do all look related to me (ie xpcom calls that need to be done by the main thread are instead being invoked from some other thread -
nsXPCWrappedJS
stuff in particular, as the javascript has to be run on the main thread).Just to make sure I'm understanding things right:
- The fix is to set things up so that such actions are properly deferred to the main thread. For example, using
nsIRunnable
to package up the call for the main thread to run.- The patch in Bug 1133892 does for
nsMsgMailNewsUrl
, which holds some XPCOM pointers. Because releasing them might require executing some javascript (if the object is ansXPCWrappedJS
object), the release is now deferred to the main thread.I've spotted a few other places where mainthread deferral might be required -
nsMsgFolderNotificationService
listeners, pop3Sink innsPop3URL
, I'm sure there are a bunch more...
Hopefully those will go away with pop3-js bug 1707548!
Comment 22•3 years ago
|
||
Just been peeking through the callstacks for the current reports. It looks to me like the issue is that there's an nsMailToUrl
being constructed on a worker (DOMWorker) thread. This should be fine and normal... except that all our c-c URL classes are not just URLs. They all have state, and hold lots of nsCOMPtr<>
members they really really shouldn't. Which means they're not very good at being used except on the main thread.
Bigger picture:
We should make all our URL classes into actual URLs that don't hold lots of state about the progress of the request they represent (major job, but necessary I think).
More narrowly:
This crash signature has skewed massively to nsMailToUrl
urls. I don't know what's changed.
I'd be tempted to think that the new javascript SMTP might improve things... but I think that's false hope. If anything is still creating nsMailToUrl objects off the main thread, we're still going to have problems :-(
Comment 23•3 years ago
|
||
Looking at a few of those crashes, I think that stacks means that like nsMailtoUrl::ParseMailtoUrl() is calling into an XPCOM service implementing nsIMimeConverter, on a DOM worker thread. It looks like the service is implemented in JS, which means it is trying to go through XPConnect, which can't be used off of the main thread. That said, I can't see any JS implementations of nsIMimeService in the comm-central tree, so maybe it is an addon doing this? I do see some comments saying "nsIMIMEService is not thread-safe", so maybe something could go wrong even when using the C++ implementation of this interface. But presumably any issues would show up differently.
Assignee | ||
Comment 24•3 years ago
|
||
Hmm, I wonder if these all users have set the pref to use the c++ smtp....
That would seems unlikely. Perhaps some case where the right module isn't (yet) loaded?
From above the call comes through nsSmtpService.cpp which would imply so (and that's completely removed on trunk already).
It looks like nsISmtpUrl could be removed altogether, but there's still some usage of it. Bug 1724177 would remove almost all of it I think.
Reporter | ||
Comment 25•3 years ago
|
||
bp-1da9fd49-cafb-42e7-97b6-9b36e0210928 reported in https://support.mozilla.org/en-US/questions/1352146 happens when opening a PDF - so nothing to do with SMTP
Reporter | ||
Comment 26•3 years ago
|
||
That said, the great increase in crashes since beta 88/89 has a strong correlation to smtp.
We're being killed with crashes as can be seen in the chart https://crash-stats.mozilla.org/signature/?signature=nsXPCWrappedJS%3A%3AAddRef&date=%3E%3D2021-03-30T13%3A32%3A00.000Z&date=%3C2021-09-30T13%3A32%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1#graphs
Assignee | ||
Comment 27•3 years ago
|
||
It seems these are for mailto: urls. I'll give them the same treatment as imap. Maybe somehow related to them getting triggered by an external call?
Assignee | ||
Comment 28•3 years ago
|
||
Comment 30•3 years ago
|
||
Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/0ac5c49c584c
make sure to create mailto urls only on the main thread (non-main thread will crash). r=benc
Comment 31•3 years ago
|
||
Assignee | ||
Comment 32•3 years ago
|
||
I guess I should add ac_add_options --enable-warnings-as-errors
to .mozconfig locally.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 33•3 years ago
|
||
Comment on attachment 9243897 [details]
Bug 1175168 - make sure to create mailto urls only on the main thread (non-main thread will crash). r=benc
[Approval Request Comment]
Regression caused by (bug #): not directly a regression
User impact if declined: may crash
Testing completed (on c-c, etc.): daily
Risk to taking this patch (and alternatives if risky): very safe
Reporter | ||
Comment 34•3 years ago
|
||
Comment on attachment 9243897 [details]
Bug 1175168 - make sure to create mailto urls only on the main thread (non-main thread will crash). r=benc
[Triage Comment]
Approved for beta
Updated•3 years ago
|
Comment 35•3 years ago
|
||
bugherder uplift |
Reporter | ||
Comment 37•3 years ago
|
||
Comment on attachment 9243897 [details]
Bug 1175168 - make sure to create mailto urls only on the main thread (non-main thread will crash). r=benc
[Triage Comment]
Approved for esr91 (91.2.1) - it's been on beta for a week, so fingers crossed we get no regressions shipping in 91.
Note
- This has potential to fix Bug 1736192 - Thunderbird crashes @ nsXPCWrappedJS::AddRef when opening some pdf files in the viewer
- nsXPCWrappedJS::CallMethod crashes stopped after 91.0 buildid 20210713202926, which would be Bug 1663859
Comment 38•3 years ago
|
||
Need an esr91 version of this if it's to be uplifted.
Assignee | ||
Comment 40•3 years ago
|
||
ESR91 version: https://hg.mozilla.org/try-comm-central/rev/3a81512d74af2f1147d203cb6b2931d6ae582310
Try is running at https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=3efc1b1506975a38efdf21250e5b013ec02555f1
Reporter | ||
Comment 41•3 years ago
|
||
Try looks good
Assignee | ||
Comment 42•3 years ago
|
||
But I don't know why tests didn't run. (Not that this would have been tested anyway.)
Comment 43•3 years ago
|
||
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Comment 44•3 years ago
|
||
This is a BIG step forward, so thanks for the patch.
Per crash-stats I can confirm nsXPCWrappedJS::AddRef is substantially reduced. In version 91.2.0 it was #1 crash, in 91.2.1 it drops to #39
Reporter | ||
Updated•3 years ago
|
Description
•