Crash in [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | mozilla::ipc::IPDLParamTraits<T>::Write]
Categories
(Core :: DOM: Security, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox67 | --- | unaffected |
firefox67.0.1 | --- | unaffected |
firefox68 | --- | unaffected |
firefox69 | + | verified |
People
(Reporter: calixte, Assigned: ckerschb)
References
(Blocks 2 open bugs, Regression)
Details
(Keywords: crash, regression, topcrash, Whiteboard: [domsecurity-active])
Crash Data
Attachments
(2 files)
This bug is for crash report bp-8b9c1734-8389-4fa0-b212-4c2df0190522.
Top 10 frames of crashing thread:
0 xul.dll mozilla::ipc::FatalError ipc/glue/ProtocolUtils.cpp:262
1 xul.dll mozilla::ipc::IProtocol::HandleFatalError ipc/glue/ProtocolUtils.cpp:473
2 xul.dll mozilla::ipc::IPDLParamTraits<mozilla::ipc::PrincipalInfo>::Write ipc/ipdl/PBackgroundSharedTypes.cpp:860
3 xul.dll mozilla::ipc::IPDLParamTraits<mozilla::ipc::CSPInfo>::Write ipc/ipdl/PBackgroundSharedTypes.cpp:1078
4 xul.dll mozilla::ipc::IPDLParamTraits<mozilla::Maybe<mozilla::ipc::CSPInfo> >::Write ipc/glue/IPDLParamTraits.h:231
5 xul.dll mozilla::ipc::IPDLParamTraits<mozilla::dom::IPCClientInfo>::Write ipc/ipdl/ClientIPCTypes.cpp:178
6 xul.dll mozilla::ipc::IPDLParamTraits<mozilla::Maybe<mozilla::dom::IPCClientInfo> >::Write ipc/glue/IPDLParamTraits.h:231
7 xul.dll mozilla::ipc::IPDLParamTraits<mozilla::net::LoadInfoArgs>::Write ipc/ipdl/NeckoChannelParams.cpp:558
8 xul.dll mozilla::ipc::IPDLParamTraits<mozilla::Maybe<mozilla::net::LoadInfoArgs> >::Write ipc/glue/IPDLParamTraits.h:231
9 xul.dll static void mozilla::ipc::IPDLParamTraits<mozilla::net::HttpChannelOpenArgs>::Write ipc/ipdl/NeckoChannelParams.cpp:1307
There are 65 crashes (from 15 installations) in nightly 69 starting with buildid 20190522093426. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1547218.
The MOZ_CRASH_REASON is always: IPDL error: "unknown union type". abort()ing as a result.
[1] https://hg.mozilla.org/mozilla-central/rev?node=15b4b7f9d97a
Comment 1•6 years ago
|
||
This is the set of commits added in Nightly 20190522093426: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=257f2c96cef502a1d674df56c8e39d76d8ed4d89&tochange=5f95b3f2ea44723ba6a8c41a4b27c88032df709f
Bug 965637, a huge change to how we deal with CSP, including serialization and deserialization, seems like a much more likely regressor.
Comment 2•6 years ago
|
||
This is the #1 non-shutdown hang crash on OSX for the May 22 Nightlies, with 14 crashes from 3 installations.
Assignee | ||
Comment 3•6 years ago
|
||
:jkt, can you please take a look while I am out. As soon as I am back I'll take over of course. Please also hold back with landing you Principal serialization patches till we have sorted this problem out. Thanks!
Comment 4•6 years ago
|
||
Hmm, not sure how my changes would be impacting this, but here's an explanation as to what's going on:
So unfortunately, trying to send a T__None
(which is the default) union type over IPC causes this crash. This is caused whenever an IPDL union has been overridden. It looks like the cause here is that the PrincipalInfo
object was default-initialized when creating the CSPInfo
, and not overridden with a different value.
From some poking around, I found at least one place where this can happen. In the method CSPToCSPInfo
if aCSP
is nullptr
, the CSPInfo
will not be explicitly initialized, and instead will be left in its default-initialized state. This leaves the PrincipalInfo
member in an invalid state, and can trigger this crash. https://searchfox.org/mozilla-central/rev/662de518b1686c4769320d6b8825ce4864c4eda0/ipc/glue/BackgroundUtils.cpp#213-215
I'm not entirely sure how we would've gotten in that particular situation, but it seems like https://searchfox.org/mozilla-central/rev/662de518b1686c4769320d6b8825ce4864c4eda0/dom/workers/WorkerLoadInfo.cpp#117 is the main call site where we don't null-check the nsIContentSecurityPolicy
argument.
Sending a ni? to :ckerschb because I think he's touched this code before.
--
Just a little note here to explain why this weird T__None
situation exists - basically in order for ParamTraits
to work properly, we need to have a default constructor so that the target outparameter can be initialized & passed in to ParamTraits::Read. The crash when sending that default state over IPC was made because you're theoretically not supposed to do this.
It may be possible/reasonable to allow that value to be serialized, but I have no strong opinions there.
Comment 5•6 years ago
|
||
It looks like this is actually the top crash across all platforms.
[Tracking Requested - why for this release]: top crash
I tried loading a few URLs from the crash reports and then reloading the browser, but that didn't trigger the crash.
Comment 6•6 years ago
|
||
It looks to me like the code should have been returning a failure, checking with try if it causes no problems: https://treeherder.mozilla.org/#/jobs?repo=try&revision=48fbf9680cb0352e7a09b2ba170204f8e0fb45fb
I haven't been able to replicate the crash but based on Comment 4 this looks like the issue and if it doens't cause any other issues with tests I think it's worth trying to land to see if it resolves the issue.
Comment 7•6 years ago
|
||
Comment 8•5 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #5)
I tried loading a few URLs from the crash reports and then reloading the browser, but that didn't trigger the crash.
This is a really annoying crash. For a bit more context, this consistently crashes on my Nightly on latest Arch Linux x86-64 (Gnome with Wayland), on some pages. Most prominent are web.telegram.org and Google Spreadsheets. FWIW both sites load halfway through and crash just before becoming interactive (there is a flash of page content for half a second before the crash hits).
My Nightly is my day-to-day browser, but is fairly light on addons. I do run Webrender as an opt-in. I'm happy to email someone more info or test builds to help get this fixed, feel free to email me at flaki@moco or ping me on Slack.
Comment 9•5 years ago
|
||
I have a patch that I just sent out for testing, if you could try the build on there that likely would be helpful: https://treeherder.mozilla.org/#/jobs?repo=try&revision=65363b4d97e6235f020e200777bc3a4706d85781 (target file) I have no idea how broken this will be for our test suite yet.
I no longer instantiate the CSPInfo anymore on the callsites and instead use Maybe<CSPInfo> to pass an owned return. I also noticed WorkerPrivate::StoreCSPOnClient()
wasn't checking if the CSPInfo existed, this might have been the cause of the other perma fail regressions of the bug 965637 that caused this.
Due to the crash handle being so generic the stats aren't reliable for this bug. I have also struggled to replicate the crash, however I have been working on the assumption that Nika posted in Comment 4 as there aren't other call sites that create CSPInfo.
Comment 10•5 years ago
|
||
Ignore that :flaki this is still crashing, however I think I have narrowed down the error to one place. Your examples make sense as they likely use workers.
Comment 11•5 years ago
|
||
Please note that I can always replicate this crash on https://www.getdigital.de/Matias-Tastatur-Tactile-Pro-FK302-DE.html. The tab's content crashes as soon as the site has finished loading. Maybe it will be helpful for verification.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 12•5 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #11)
Please note that I can always replicate this crash on https://www.getdigital.de/Matias-Tastatur-Tactile-Pro-FK302-DE.html. The tab's content crashes as soon as the site has finished loading. Maybe it will be helpful for verification.
:whimboo, I wasn't able to reproduce the crash using https://www.getdigital.de/Matias-Tastatur-Tactile-Pro-FK302-DE.html. Obviously a testcase would be super helpful to get this problem sorted is. Question is, can you reproduce using a fresh profile as well? Anything else you can think of so I can reproduce the problem? Thanks for your help!
Comment 13•5 years ago
|
||
The preference necessary to trigger the crash for me is security.csp.enable
. You will have to set it to false
.
Assignee | ||
Comment 14•5 years ago
|
||
Finally I got a chance to take a look at what's going on here. As :nika already indicated within comment 4 we are creating a CSPInfo which then does not get initialized correctly because aCSP is null here [1]. This problem occurs only for worker contexts because we do the correct null checks on the callsites for all other contexts.
I was able to reproduce the problem by
- visiting https://www.getdigital.de/Matias-Tastatur-Tactile-Pro-FK302-DE.html
- after setting security.csp.enable to false.
The problem is indeed that the CSP was null here [1] which caused the CSPInfo to end up in a half initialized state.
Here is what I suggest:
a) We do the proper null checks in worker contexts before creating a CSPInfo within this bug, and
b) We refactor CSPToCSPInfo to not use an outgoing argument but to return a CSPInfo, because currently we might still end up with a half way initialized cspInfo in case something is bogus with the principal [1].
However, I assume that (a) already fixes the vast majority of problems.
[1] https://hg.mozilla.org/mozilla-central/rev/ddf4012a7652#l60.103
Assignee | ||
Comment 15•5 years ago
|
||
Comment 16•5 years ago
|
||
Comment 17•5 years ago
|
||
bugherder |
Assignee | ||
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Comment 18•5 years ago
|
||
Confirmed issue with 69.0a1 (2019-05-23) on Windows 10.
Fix verified with 69.0b11 on Windows 10, macOS 10.13, Ubutnu 16.04.
A brief navigation on the mentioned website revealed no other issues/crashes as well.
Updated•3 years ago
|
Description
•