Open Bug 1727526 Opened 3 years ago Updated 2 years ago

Investigate if CacheParent/CacheStreamControlParent may keep alive Manager during shutdown

Categories

(Core :: Storage: Cache API, task)

task

Tracking

()

REOPENED
99 Branch
Tracking Status
firefox99 --- affected

People

(Reporter: jstutte, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: leave-open)

Attachments

(3 files)

There is a reference cycle between:

CacheStreamControlParent.mStreamList (strong)
StreamList.mStreamControl (weak)

that (depending on CacheStreamControlParent's lifecycle) might keep alive the StreamList instance which in turn keeps alive Manager and Context who keep alive sFactory which would end up blocking the cache shutdown.

It is unclear if this can really happen, but it might be worth investigating.

There is also CacheParent holding a reference to Manager that is destroyed only on ActorDestroy. Also this deserves some investigation if we can really rely on ActorDestroyto be always executed in time. We might want to add some implementation to CacheQuotaClient::ForceKillActors that ensures all relevant actors are correctly removed.

Summary: Investigate if CacheStreamControlParent may keep alive StreamList during shutdown → Investigate if CacheParent/CacheStreamControlParent may keep alive StreamList during shutdown
Summary: Investigate if CacheParent/CacheStreamControlParent may keep alive StreamList during shutdown → Investigate if CacheParent/CacheStreamControlParent may keep alive Manager during shutdown

So PCacheStreamControlParent::Send__delete__ bails out if we cannot send:

auto PCacheStreamControlParent::Send__delete__(PCacheStreamControlParent* actor) -> bool
{
    if (!actor || !actor->CanSend()) {
        NS_WARNING("Attempt to __delete__ missing or closed actor");
        return false;
    }
    IPC::Message* msg__ = PCacheStreamControl::Msg___delete__((actor)->Id());

    MOZ_RELEASE_ASSERT(actor, "NULL actor value passed to non-nullable param");
    WriteIPDLParam(msg__, actor, actor);
    // Sentinel = 'actor'
    (msg__)->WriteSentinel(102892058);

    if (mozilla::ipc::LoggingEnabledFor("PCacheStreamControlParent")) {
        mozilla::ipc::LogMessageForProtocol(
            "PCacheStreamControlParent",
            actor->ToplevelProtocol()->OtherPidMaybeInvalid(),
            "Sending ",
            msg__->type(),
            mozilla::ipc::MessageDirection::eSending);
    }
    AUTO_PROFILER_LABEL("PCacheStreamControl::Msg___delete__", OTHER);

    bool sendok__ = (actor)->ChannelSend(msg__);

    IProtocol* mgr = actor->Manager();
    actor->DestroySubtree(Deletion);
    actor->ClearSubtree();
    mgr->RemoveManagee(PCacheStreamControlMsgStart, actor);
    return sendok__;
}

This could lead to not execute the cleanup code at the end of the function? Is this intended if we cannot send? Note that I am not even sure if that cleanup code would kill our instance.
Given that we use raw CacheStreamControlParent* mStreamControl; everywhere else, could this lead to a situation where we just never release our instance?

Flags: needinfo?(bugs)
Flags: needinfo?(bugs)

Hmm, that does not do mgr->RemoveManagee(PCacheStreamControlMsgStart, actor); , FWIW.

And IIUC RemoveManagee handles the release of the actor at least for PBackgroundParent.

My doubt would just be that some release happens (significantly) later than we would it expect to happen if not triggered by Send__delete__ directly.

OK, I just found out that if we simply remove that || !actor->CanSend() we will fail on WriteIPDLParam(msg__, actor, actor);. So if ever this could become a potential problem, the fix would be less trivial.

I would probably have expected to see something like

auto PCacheStreamControlParent::Send__delete__(PCacheStreamControlParent* actor) -> bool
{
    if (!actor || actor->Id() == kFreedActorId) {
        NS_WARNING("Attempt to __delete__ missing or deleted actor.");
        return false;
    }

here. But I probably get this totally wrong and it is ok as is and there is something specifically bogus just with some of the involved cache actors.

Assignee: nobody → jstutte
Status: NEW → ASSIGNED
Attachment #9264981 - Attachment description: Bug 1727526: Add AssertWillDelete before calling Send__delete__. r?#dom-storage-reviewers → Bug 1727526: Add AssertWillDelete before calling Send__delete__ on CacheStreamControlParent. r?#dom-storage-reviewers
Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5ea965265189 Add AssertWillDelete before calling Send__delete__ on CacheStreamControlParent. r=dom-storage-reviewers,janv
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 99 Branch
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---

I assume this is just the beginning of the investigation and might well be a dead end.

(In reply to Jens Stutte [:jstutte] from comment #10)

I assume this is just the beginning of the investigation and might well be a dead end.

At least the first two incoming crashes for 20220223093800 do not show any sign of the new annotation. Let's observe this for another while, though.

Attachment #9265303 - Attachment description: Bug 1727526: Have AssertWillDelete check the very same condition as Send__delete__. r?#dom-storage-reviewers → Bug 1727526: Make AssertWillDelete check mStreamList and remove suspect candidates on ActorDestroy. r?#dom-storage-reviewers
Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/061882e1a3cd Make AssertWillDelete check mStreamList and remove suspect candidates on ActorDestroy. r=dom-storage-reviewers,janv

Following the pattern used in D139569, we want to confirm that all cache related parent side actors are
deleted by their Send__delete__ when expected. In order to hide some of the uglyness, we wrap
Send__delete__ into specific SendDelete functions whose only purpose is to check if Send__delete__ will
execute its cleanup code path and record a suspect if not.
If we get an ActorDestroy from any code path, we will remove the suspect candidate from our list.
The expected result would be to never see these annotations in QuotaManagerShutdownTimeout.

The expected result would be to never see these annotations in QuotaManagerShutdownTimeout.

In that case we will need a different investigation path.

The leave-open keyword is there and there is no activity for 6 months.
:jstutte, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.

Flags: needinfo?(jstutte)

I am not actively working on this but we cannot close this without a final investigation.

Assignee: jstutte → nobody
Flags: needinfo?(jstutte)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: