Closed Bug 1761953 Opened 3 years ago Closed 3 years ago

Blocklist remote settings don't get imported after initial sync on Android

Categories

(Firefox :: Remote Settings Client, defect)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
102 Branch
Tracking Status
firefox-esr91 --- wontfix
firefox98 --- wontfix
firefox99 --- wontfix
firefox100 --- wontfix
firefox101 --- fixed
firefox102 --- fixed

People

(Reporter: jnicol, Assigned: leplatrem)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

Not sure whether the fault lies in the RemoteSettingsClient or the Blocklist, but I have noticed this potential bug when attempting to verify that the downloadable Gfx blocklist works on Android.

Essentially the flow is:

  1. Blocklist.loadBlocklistAsync is called from geckoview.js startup(). This calls GfxBlocklistRs.checkForEntries.
  2. This calls RemoteSettingsClient.get(). Because syncIfEmpty == true and hasLocalData == false, this creates _importingPromise here and then immediately awaits the promise immediately after.
  3. Meanwhile the promise asynchronous attempts to load the JSON dump, which fails on Android. It therefore calls sync(), which ends up in _importChanges() where it awaits emitting the "sync" signal. This is hooked up to GfxBlocklistRs.checkForEntries().
  4. GfxBlocklistRs.checkForEntries() calls RemoteSettingsClient.get(), and this time _importingPromise is non-null so we simply await it again. But this will never resolve because we are running from within the promise!!

If we close firefox and restart it, then the settings get imported correctly on the second run, because hasLocalData is true this time.

On desktop this doesn't occur because we bundle the JSON dump, so the _importingPromise function loads the dump rather than performing a sync. If I stop bundling the dump on desktop then we encounter this same problem. Presumably, bundling the dump on Android (eg bug 1722341) -- if it contained relevant entries -- would avoid the issue. But that seems more like a side effect than a solution.

I'm not very familiar with this code, so have no idea how best to fix this. Any ideas Rob?

Flags: needinfo?(rob)

This is a deadlock in RemoteSettings, when there is no dump.
The logic introduced by bug 1646315 exists to avoid concurrent import requests.
But without a dump, importing would fall back to syncing, which in turn is blocked on the completion of the import promise.

A way to resolve the deadlock is to set a flag before triggering sync, and clear that flag when the sync operation completes - all while being in the import promise (use try-finally to ensure that the flag is reset even if an error occurrs).
Then, if that flag is set, don't await the promise to avoid the deadlock. At the time that the deadlock is happening, the sync has already happened, so reading from the db should yield the retrieved values.

There may be another way to fix this, but in general you should look for a way to break the cyclic dependency to avoid the deadlock.

Flags: needinfo?(rob)
Keywords: regression
Regressed by: 1646315

Set release status flags based on info from the regressing bug 1646315

:leplatrem, since you are the author of the regressor, bug 1646315, could you take a look?
For more information, please visit auto_nag documentation.

Flags: needinfo?(mathieu)
Has Regression Range: --- → yes

Thanks for reporting this, and suggesting ideas!

In theory, calling .get() from a "sync" event callback should not be necessary, since the current records are passed with the event.

I had started to work on a related change a long time ago, but never landed it (https://phabricator.services.mozilla.com/D36913)

Among the possible ideas, we have:

  1. Add a flag to detect that a sync is going on
  2. Do not send "sync" events when synchronizing from .get()
  3. Remove call to .get() in "sync" callback, and use event payload

I will propose a patch

Assignee: nobody → mathieu
Status: NEW → ASSIGNED

Set release status flags based on info from the regressing bug 1646315

Did you have the opportunity to have a look at my patch and feedback?

Flags: needinfo?(mathieu)

The severity field is not set for this bug.
:leplatrem, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(mathieu)
Pushed by mleplatre@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/df5f0344c0e8 Prevent deadlock in sync callback when no dump r=robwu
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 102 Branch
Flags: needinfo?(mathieu)

The patch landed in nightly and beta is affected.
:leplatrem, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(mathieu)

Comment on attachment 9270248 [details]
Bug 1761953 - Prevent deadlock in sync callback when no dump r?robwu,jnicol

Beta/Release Uplift Approval Request

  • User impact if declined: Synchronization of remote settings can hang. Users can be stuck with outdated data.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patch prevents a deadlock to happen, but does not interfere with the synchronization process.
  • String changes made/needed:
  • Is Android affected?: Yes
Flags: needinfo?(mathieu)
Attachment #9270248 - Flags: approval-mozilla-beta?

Comment on attachment 9270248 [details]
Bug 1761953 - Prevent deadlock in sync callback when no dump r?robwu,jnicol

Approved for Desktop 101.0b5 & Fenix/Focus 101.0.0-beta.4.

Attachment #9270248 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: