Crash in [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | mozilla::ipc::ReadIPDLParam<T>]
Categories
(Core :: Graphics, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox-esr68 | --- | unaffected |
firefox67 | --- | unaffected |
firefox68 | --- | unaffected |
firefox69 | blocking | verified |
firefox70 | + | verified |
People
(Reporter: marcia, Assigned: mattwoodrow)
References
Details
(Keywords: crash, regression, topcrash, Whiteboard: [rca - Coding Error])
Crash Data
Attachments
(1 file, 1 obsolete file)
(deleted),
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
|
Details |
This bug is for crash report bp-8688e7fb-c841-4264-99c8-ea8e60190629.
Seen while reviewing nightly crashes. These crashes started in 20190628161248: https://bit.ly/302JP8m. Many of the URLS are youtube.com, but there are some others.
Possible regression range based on build id: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=9ee669c657c79b26a6ec0df246cc5bfe2b1d9a9a&tochange=8537d24d8aa8c0c2189c308d5835def2feec313c
Bug 1547218 touched some code in this area when it landed, ni on nika
Top 10 frames of crashing thread:
0 XUL mozilla::ipc::FatalError ipc/glue/ProtocolUtils.cpp:259
1 XUL mozilla::ipc::IProtocol::HandleFatalError const ipc/glue/ProtocolUtils.cpp:495
2 XUL bool mozilla::ipc::ReadIPDLParam<mozilla::layers::TimedTexture> ipc/glue/IPDLParamTraits.h:61
3 XUL bool mozilla::ipc::ReadIPDLParam<nsTArray<mozilla::layers::TimedTexture> > ipc/glue/IPDLParamTraits.h:61
4 XUL bool mozilla::ipc::ReadIPDLParam<mozilla::layers::OpUseTexture> ipc/glue/IPDLParamTraits.h:61
5 XUL bool mozilla::ipc::ReadIPDLParam<mozilla::layers::CompositableOperationDetail> ipc/glue/IPDLParamTraits.h:61
6 XUL mozilla::ipc::IPDLParamTraits<mozilla::layers::CompositableOperation>::Read ipc/ipdl/LayersMessages.cpp:10713
7 XUL mozilla::ipc::IPDLParamTraits<nsTArray<mozilla::layers::CompositableOperation> >::Read ipc/glue/IPDLParamTraits.h:182
8 XUL mozilla::layers::PImageBridgeParent::OnMessageReceived ipc/ipdl/PImageBridgeParent.cpp:349
9 XUL mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:2158
Reporter | ||
Updated•5 years ago
|
Updated•5 years ago
|
Comment 1•5 years ago
|
||
The error message (IPCFatalErrorMsg
in the metadata) is Error deserializing 'textureParent' (PTexture) member of 'TimedTexture'
, so this is a problem with deserializing an actor reference.
Comment 2•5 years ago
|
||
I think this is probably showing up as being blamed by that code as I changed the names of some of the stack stack frames used there. I think this is more likely to be caused by some graphics code failing to manage actor lifecycles. Perhaps in webrender (no clue tbh)
ni? :jrmuizel because they might know more?
Comment 3•5 years ago
|
||
Sotaro, do you have any guesses as to what this could be?
Comment 4•5 years ago
|
||
(In reply to :Nika Layzell (ni? for response) from comment #2)
Perhaps in webrender (no clue tbh)
All crash logs that I check was "WR-", then WR seems not related to the crash.
Comment 5•5 years ago
|
||
(In reply to :Nika Layzell (ni? for response) from comment #2)
I think this is probably showing up as being blamed by that code as I changed the names of some of the stack stack frames used there. I think this is more likely to be caused by some graphics code failing to manage actor lifecycles.
I checked the related graphics code. It seemed not to have a problem of how to send ipdl message.
When the error happened, crash report had the following error log.
- Error deserializing 'textureParent' (PTexture) member of 'TimedTexture'
'TimedTexture' is sent with ipdl only by OpUseTexture. On ImageBridgeChild, it is created only by ImageBridgeChild::UseTextures().
https://searchfox.org/mozilla-central/source/gfx/layers/ipc/LayersMessages.ipdlh#439
https://searchfox.org/mozilla-central/source/gfx/layers/ipc/ImageBridgeChild.cpp#127
ImageBridgeChild::UseTextures() is called under ImageBridgeChild::UpdateImageClient(). The UpdateImageClient() manages ipc transaction.
https://searchfox.org/mozilla-central/source/gfx/layers/ipc/ImageBridgeChild.cpp#304
Function call sequence is like the following.
ImageContainer::SetCurrentImages()
->// dispatch to ImageBridge thread.
->ImageBridgeChild::UpdateImageClient()
-->ImageBridgeChild::BeginTransaction()
-->ImageClientSingle::UpdateImage()
--->ImageBridgeChild::UseTextures()
---->// Add OpUseTexture message
-->ImageBridgeChild::EndTransaction()
--->PImageBridgeChild::SendUpdate()
During the transaction, It checks if PTextureClient is valid. Therefore PTexture should be valid when ipdl message was sent by the SendUpdate().
Comment 6•5 years ago
|
||
In crash reports on Windows, there are several types of IPC error messages like the following. All of them include array. From it, I wonder if array in IPDL message might be related to the problem.
- Error deserializing 'securityPolicies' (ContentSecurityPolicy[]) member of 'ContentPrincipalInfo'
- Error deserializing 'originNoSuffix' (nsCString) member of 'ContentPrincipalInfo'
- Error deserializing 'attrs' (OriginAttributes) member of 'ContentPrincipalInfo'
- Error deserializing 'textureParent' (PTexture) member of 'TimedTexture'
- Error deserializing variant TArrayOfTransformFunction of union Animatable
- Error deserializing 'displayListLog' (nsCString) member of 'CommonLayerAttributes'
- Error deserializing 'scrollMetadata' (ScrollMetadata[]) member of 'CommonLayerAttributes'
- Error deserializing 'ancestorMaskLayers' (LayerHandle[]) member of 'CommonLayerAttributes'
Comment 7•5 years ago
|
||
The “crash signature” here just means that something failed to be deserialized. It's not surprising that there are a number of different failures grouped together under it; that doesn't imply anything one way or the other about whether they're actually related.
I notice that the vast majority (93%) of reports over the past week are for Error deserializing 'textureParent' (PTexture) member of 'TimedTexture'
; the other error messages are much less common.
Reporter | ||
Updated•5 years ago
|
Assignee | ||
Comment 8•5 years ago
|
||
Jed, does an error deserializing an actor (PTexture) suggest that we've got a race condition where we're sending a message containing a reference to an actor that has already been deleted?
Bug 1561178 is in the regression range, which did touch some graphics IPC code, though it shouldn't affect PImageBridge.
We have ImageBridgeChild::DestroyInTransaction which should detect the case where we want to destroy a PTexture while in the process of accumulating a transaction, and puts the delete request into the transaction (and processes it on the parent side after the transaction).
It's possible that there's a case where we miss that, but I don't see it.
Comment 9•5 years ago
|
||
The error message seemed to come from IPDLParamTraits<mozilla::layers::OpUseTexture>::Read().
https://searchfox.org/mozilla-central/source/__GENERATED__/ipc/ipdl/LayersMessages.cpp#9740
https://searchfox.org/mozilla-central/source/__GENERATED__/ipc/ipdl/LayersMessages.cpp#9745
Comment 10•5 years ago
|
||
One possible place of failure is IPDLParamTraits<mozilla::layers::PTextureParent*>::Read().
https://searchfox.org/mozilla-central/source/__GENERATED__/ipc/ipdl/PTextureParent.cpp#201
Another possible place of failure is IPDLParamTraits<nsTArray<T>>::Read()
https://searchfox.org/mozilla-central/source/ipc/glue/IPDLParamTraits.h#153
Comment 11•5 years ago
|
||
When ImageBridgeParent::AllocPTextureParent() failed because of TextureHost::CreateIPDLActor() failure. I also saw the following error message.
IPDL protocol error: Error deserializing 'textureParent' (PTexture) member of 'TimedTexture'
https://searchfox.org/mozilla-central/source/gfx/layers/ipc/ImageBridgeParent.cpp#291
Comment 12•5 years ago
|
||
[Tracking Requested - why for this release]:
this is a topcrasher in early results from 69.0b3 - together with the signature from bug 1561976 it's accounting for over a third of all browser crashes on beta currently.
as mentioned at https://bugzilla.mozilla.org/show_bug.cgi?id=1561976#c2 too, timing-wise it seems possible that bug 1561178 has regressed this.
Updated•5 years ago
|
Updated•5 years ago
|
Comment 13•5 years ago
|
||
ni'ing Matt to see if there are any leads, as this is a beta topcrasher.
Updated•5 years ago
|
Comment 14•5 years ago
|
||
There is a patch here: 1564667 for more error logging so we can determine what is actually going on
Assignee | ||
Comment 15•5 years ago
|
||
If Sotaro's theory is right (and I suspect it is), this could be because GPUVideoTextureHost::CreateFromDescriptor is fallible.
We should probably unconditionally create the actor, and move the LookupTexture call into GPUVideoTextureHost::Lock.
Updated•5 years ago
|
Comment 16•5 years ago
|
||
the added error logging is commonly showing up as "TextureHost creation failure type=11" in recent nightly builds:
https://crash-stats.mozilla.com/search/?ipc_fatal_error_msg=%3DError%20deserializing%20%27textureParent%27%20%28PTexture%29%20member%20of%20%27TimedTexture%27&graphics_critical_error=~TextureHost%20creation%20failure%20type&_columns=signature&_columns=graphics_critical_error#crash-reports
Assignee | ||
Updated•5 years ago
|
Comment 17•5 years ago
|
||
Assuming this and bug 1561976 have the same root cause, this is blocking wider rollout of Fx69 to the Beta population.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 19•5 years ago
|
||
Assignee | ||
Comment 20•5 years ago
|
||
Depends on D37781
Assignee | ||
Comment 21•5 years ago
|
||
Assignee | ||
Comment 22•5 years ago
|
||
Comment on attachment 9077526 [details]
Bug 1562616 - Don't fail to create a GPUVideoTextureHost if the dependent texture isn't available, since that crashes the compositor. r?sotaro
Beta/Release Uplift Approval Request
- User impact if declined: Race condition that causes browser (parent process) crashes.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Should be very low risk since it changes the failing condition from causing an IPDL assert to instead just fail to render the racy video frame.
The other patch tries to fix the race condition, but we don't need to rush to uplift that.
- String changes made/needed:
Assignee | ||
Comment 23•5 years ago
|
||
Filed bug 1565405 to make this sort of issue easier to diagnose in the future.
Comment 24•5 years ago
|
||
Comment 25•5 years ago
|
||
Comment on attachment 9077526 [details]
Bug 1562616 - Don't fail to create a GPUVideoTextureHost if the dependent texture isn't available, since that crashes the compositor. r?sotaro
Approved for 69.0b4 to hopefully blunt this topcrash.
Comment 26•5 years ago
|
||
uplift |
Updated•5 years ago
|
Comment 27•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Comment 28•5 years ago
|
||
Matt, maybe it'd be better to move the other patch to a new bug at this point to make our lives easier for tracking?
Comment 29•5 years ago
|
||
Hi, I managed to reproduce this issue in Firefox Beta 69.0b3 on a Mac OSX 10.14.5 using multiple tabs with Office, Powerpoint and Excel files as well as a few Youtube videos and Netflix, after Refreshing some of those pages Firefox would crash with the same crash signature as this issue.
I was unable to reproduce this issue on Firefox Beta 69.0b4 using the same profile, a few more tabs open like Reddit and other common websites while the videos from Youtube and Netflix were playing in the background. I do believe this issue is Verified as Fixed but we will not update any flags until we are certain of the Fix. We will keep our eyes on the crash reports for the next week or so and make sure that this crash no longer occurs in newer versions of Firefox.
Updated•5 years ago
|
Comment 30•5 years ago
|
||
Comment on attachment 9077525 [details]
Bug 1562616 - Wait for a response from the VideoBridge to ensure textures are available before telling the content process about them. r?sotaro
Revision D37781 was moved to bug 1566956. Setting attachment 9077525 [details] to obsolete.
Assignee | ||
Updated•5 years ago
|
Updated•5 years ago
|
Comment 31•5 years ago
|
||
Based on Crash stats and previous comments, remove the Qe-verify+ flag.
Comment 32•5 years ago
|
||
This bug has been identified as part of a pilot on determining root causes of blocking and dot release drivers.
It needs a root-cause set for it. Please see the list at https://docs.google.com/document/d/1FFEGsmoU8T0N8R9kk-MXWptOPtXXXRRIe4vQo3_HgMw/.
Add the root cause as a whiteboard
tag in the form [rca - <cause> ]
and remove the rca-needed
keyword.
If you have questions, please contact :tmaity.
Assignee | ||
Comment 33•5 years ago
|
||
Propagating an error within an IPDL/IPC message handler causes a deliberate process crash, as that's the desired behaviour if we receive corrupt input.
The issue here is that a runtime error (not related to bad IPC input) was propagating, causing us to crash in situations that we'd prefer to handle silently.
Going with 'Coding Error' for now, though it could be argued to be an architecture issue that it's too easy to trigger the deliberate crash code.
Description
•