Investigate if CacheParent/CacheStreamControlParent may keep alive Manager during shutdown
Categories
(Core :: Storage: Cache API, task)
Tracking
()
Tracking | Status | |
---|---|---|
firefox99 | --- | affected |
People
(Reporter: jstutte, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: leave-open)
Attachments
(3 files)
There is a reference cycle between:
CacheStreamControlParent.mStreamList
(strong)
StreamList.mStreamControl
(weak)
that (depending on CacheStreamControlParent
's lifecycle) might keep alive the StreamList
instance which in turn keeps alive Manager
and Context
who keep alive sFactory
which would end up blocking the cache shutdown.
It is unclear if this can really happen, but it might be worth investigating.
Reporter | ||
Comment 1•3 years ago
|
||
There is also CacheParent
holding a reference to Manager
that is destroyed only on ActorDestroy
. Also this deserves some investigation if we can really rely on ActorDestroy
to be always executed in time. We might want to add some implementation to CacheQuotaClient::ForceKillActors
that ensures all relevant actors are correctly removed.
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Comment 2•3 years ago
|
||
So PCacheStreamControlParent::Send__delete__
bails out if we cannot send:
auto PCacheStreamControlParent::Send__delete__(PCacheStreamControlParent* actor) -> bool
{
if (!actor || !actor->CanSend()) {
NS_WARNING("Attempt to __delete__ missing or closed actor");
return false;
}
IPC::Message* msg__ = PCacheStreamControl::Msg___delete__((actor)->Id());
MOZ_RELEASE_ASSERT(actor, "NULL actor value passed to non-nullable param");
WriteIPDLParam(msg__, actor, actor);
// Sentinel = 'actor'
(msg__)->WriteSentinel(102892058);
if (mozilla::ipc::LoggingEnabledFor("PCacheStreamControlParent")) {
mozilla::ipc::LogMessageForProtocol(
"PCacheStreamControlParent",
actor->ToplevelProtocol()->OtherPidMaybeInvalid(),
"Sending ",
msg__->type(),
mozilla::ipc::MessageDirection::eSending);
}
AUTO_PROFILER_LABEL("PCacheStreamControl::Msg___delete__", OTHER);
bool sendok__ = (actor)->ChannelSend(msg__);
IProtocol* mgr = actor->Manager();
actor->DestroySubtree(Deletion);
actor->ClearSubtree();
mgr->RemoveManagee(PCacheStreamControlMsgStart, actor);
return sendok__;
}
This could lead to not execute the cleanup code at the end of the function? Is this intended if we cannot send? Note that I am not even sure if that cleanup code would kill our instance.
Given that we use raw CacheStreamControlParent* mStreamControl;
everywhere else, could this lead to a situation where we just never release our instance?
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Updated•3 years ago
|
Comment 3•3 years ago
|
||
(and I said on matrix:
I think https://searchfox.org/mozilla-central/source/__GENERATED__/__android-armv7__/ipc/ipdl/PContentChild.cpp#15876,15878-15879 does the right thing
as an example)
Reporter | ||
Comment 4•3 years ago
|
||
Hmm, that does not do mgr->RemoveManagee(PCacheStreamControlMsgStart, actor);
, FWIW.
And IIUC RemoveManagee
handles the release of the actor at least for PBackgroundParent
.
My doubt would just be that some release happens (significantly) later than we would it expect to happen if not triggered by Send__delete__
directly.
Reporter | ||
Comment 5•3 years ago
|
||
OK, I just found out that if we simply remove that || !actor->CanSend()
we will fail on WriteIPDLParam(msg__, actor, actor);
. So if ever this could become a potential problem, the fix would be less trivial.
Reporter | ||
Comment 6•3 years ago
|
||
I would probably have expected to see something like
auto PCacheStreamControlParent::Send__delete__(PCacheStreamControlParent* actor) -> bool
{
if (!actor || actor->Id() == kFreedActorId) {
NS_WARNING("Attempt to __delete__ missing or deleted actor.");
return false;
}
here. But I probably get this totally wrong and it is ok as is and there is something specifically bogus just with some of the involved cache actors.
Reporter | ||
Comment 7•3 years ago
|
||
Updated•3 years ago
|
Updated•3 years ago
|
Comment 9•3 years ago
|
||
bugherder |
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Comment 10•3 years ago
|
||
I assume this is just the beginning of the investigation and might well be a dead end.
Updated•3 years ago
|
Reporter | ||
Comment 11•3 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #10)
I assume this is just the beginning of the investigation and might well be a dead end.
At least the first two incoming crashes for 20220223093800 do not show any sign of the new annotation. Let's observe this for another while, though.
Reporter | ||
Comment 12•3 years ago
|
||
Updated•3 years ago
|
Comment 13•3 years ago
|
||
Comment 14•3 years ago
|
||
bugherder |
Reporter | ||
Comment 15•3 years ago
|
||
Following the pattern used in D139569, we want to confirm that all cache related parent side actors are
deleted by their Send__delete__ when expected. In order to hide some of the uglyness, we wrap
Send__delete__ into specific SendDelete functions whose only purpose is to check if Send__delete__ will
execute its cleanup code path and record a suspect if not.
If we get an ActorDestroy from any code path, we will remove the suspect candidate from our list.
The expected result would be to never see these annotations in QuotaManagerShutdownTimeout.
Reporter | ||
Comment 16•3 years ago
|
||
The expected result would be to never see these annotations in QuotaManagerShutdownTimeout.
In that case we will need a different investigation path.
Comment 17•2 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:jstutte, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.
Reporter | ||
Comment 18•2 years ago
|
||
I am not actively working on this but we cannot close this without a final investigation.
Description
•