Package relevant remote settings dumps for ETP URL classifier for GeckoView
Categories
(Firefox :: Remote Settings Client, task, P2)
Tracking
()
People
(Reporter: esawin, Unassigned)
References
Details
(Whiteboard: [geckoview:2022h2?])
Startup errors like main/url-classifier-skip-urls Local DB is empty, pull data from server
indicate that we don't package remote settings dumps when building GeckoView.
Looking into https://searchfox.org/mozilla-central/source/services/settings/dumps/main/moz.build it looks like there is a mix of Gecko and Firefox settings.
We should find the proper subset (probably just the privacy settings) of the dumps that are relevant to Gecko/GeckoView and enable packaging these for GeckoView builds.
Reporter | ||
Comment 1•5 years ago
|
||
Mathieu, can you help us find an owner for this?
Comment 2•5 years ago
|
||
Mathieu and i were chatting before this was opened. I see multiple requests for utl-classifier data (apparently in parallel). See log https://paste.mozilla.org/OjEbGN4U/raw which says it can't find the local DB
These happen every time I start the browser (and we see significant js running associated with requesting and processing them).
Profile: https://perfht.ml/3c1x1WV. Js for sync/etc: https://perfht.ml/390Lm3I and
https://perfht.ml/2Tdbceq
we don't see any of this in a fennec startup: https://perfht.ml/38Yej0u
Comment 3•5 years ago
|
||
The list of owners of each collection can be found here: https://docs.google.com/spreadsheets/d/1TogKspQnTPkYAKrZNZG3bf8MiETBm2lYHfl0wKd2BS4/edit#gid=0
The list of collections / features that are used on Geckoview is unclear to me. Same for the list of Remote Settings use-cases that cannot have empty data on startup.
For the url-classifier component, there must be a reason why we absolutely need some initial data right on startup. Shipping the dump would indeed be faster, and Johann may have insights about whether it's necessary or not to load the list synchronously during startup.
Comment 4•5 years ago
|
||
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #2)
Mathieu and i were chatting before this was opened. I see multiple requests for utl-classifier data (apparently in parallel). See log https://paste.mozilla.org/OjEbGN4U/raw which says it can't find the local DB
These happen every time I start the browser (and we see significant js running associated with requesting and processing them).
I haven't done a real test yet, by just checking the code, I guess the reason this happens is because:
- URL Classifier supports multiple features(tracking protection, fingerprinting protection, tracking annotation, etc.), each feature registers
an observer inUrlClassifierSkipListService.jsm
to listen to remote setting updates. [link] - For each registered featured,
UrlClassifierSkipListService
tries to get data from the remote setting DB. [link] RemoteSetting
doesn't have a local DB yet, eachRemoteSetting.get()
triggers a "pull data" sync request. [link]
I'll update UrlClassifierSkipListService to only call RemoteSetting.get()
once, this should be able to solve the multiple requests issue.
BTW,
Mathieu, do you think the current behavior of step3 works as expected, is it possible to only trigger one "sync" request even if we call the API for the same collection multiple times at startup?
Comment 5•5 years ago
|
||
Mathieu, do you think the current behavior of step3 works as expected, is it possible to only trigger one "sync" request even if we call the API for the same collection multiple times at startup?
Yes, it should be the case already. If you trigger several parallel syncs on the same client, only one should go through, the others will warn and exit
Comment 6•5 years ago
|
||
(In reply to Mathieu Leplatre [:leplatrem] from comment #5)
Mathieu, do you think the current behavior of step3 works as expected, is it possible to only trigger one "sync" request even if we call the API for the same collection multiple times at startup?
Yes, it should be the case already. If you trigger several parallel syncs on the same client, only one should go through, the others will warn and exit
OK, so I assume the current behavior of UrlClassifierSkipListService doesn't affect the performance a lot.
Jesup told me it shows "no local DB" at every startup on Fenix, do you know why? Is that related to how UrlClassifierSkipListService.jsm
uses RemoteSetting's API now?
If we don't know why this happens, can you suggest how we can debug this?
Updated•5 years ago
|
I looked at this some this morning and cannot reproduce the behavior the perf team is seeing with GVE. After enabling the settings service logging with a yaml config, I see the "local db is empty" messages on the first run:
02-27 09:43:17.501 29635 29650 I Gecko : console.debug: services.settings:
02-27 09:43:17.502 29635 29650 I Gecko : Instantiated new client main/url-classifier-skip-urls
02-27 09:43:17.523 29635 29650 I Gecko : console.debug: services.settings:
02-27 09:43:17.524 29635 29650 I Gecko : main/url-classifier-skip-urls Local DB is empty, pull data from server
02-27 09:43:17.525 29635 29650 I Gecko : console.debug: services.settings:
02-27 09:43:17.526 29635 29650 I Gecko : main/url-classifier-skip-urls Local DB is empty, pull data from server
02-27 09:43:17.528 29635 29650 I Gecko : console.debug: services.settings:
02-27 09:43:17.528 29635 29650 I Gecko : main/url-classifier-skip-urls Local DB is empty, pull data from server
02-27 09:43:17.820 29635 29650 I Gecko : console.debug: services.settings:
02-27 09:43:17.821 29635 29650 I Gecko : main/url-classifier-skip-urls Local DB is empty, pull data from server
02-27 09:43:17.823 29635 29650 I Gecko : console.debug: services.settings:
02-27 09:43:17.824 29635 29650 I Gecko : main/url-classifier-skip-urls Local DB is empty, pull data from server
02-27 09:43:17.828 29635 29650 I Gecko : console.debug: services.settings:
02-27 09:43:17.829 29635 29650 I Gecko : main/url-classifier-skip-urls Local DB is empty, pull data from server
02-27 09:43:17.832 29635 29650 I Gecko : console.debug: services.settings:
02-27 09:43:17.832 29635 29650 I Gecko : main/url-classifier-skip-urls Local DB is empty, pull data from server
02-27 09:43:18.123 29635 29650 I Gecko : console.warn: services.settings: main/url-classifier-skip-urls sync already running
02-27 09:43:18.125 29635 29650 I Gecko : console.warn: services.settings: main/url-classifier-skip-urls sync already running
02-27 09:43:18.132 29635 29650 I Gecko : console.warn: services.settings: main/url-classifier-skip-urls sync already running
02-27 09:43:18.135 29635 29650 I Gecko : console.warn: services.settings: main/url-classifier-skip-urls sync already running
02-27 09:43:18.136 29635 29650 I Gecko : console.warn: services.settings: main/url-classifier-skip-urls sync already running
02-27 09:43:18.137 29635 29650 I Gecko : console.warn: services.settings: main/url-classifier-skip-urls sync already running
02-27 09:43:21.225 29635 29650 I Gecko : console.debug: services.settings:
02-27 09:43:21.226 29635 29650 I Gecko : main/url-classifier-skip-urls sync status is success
However, subsequent runs and any page loads after the sync completes do not have these messages. I only see the following on the first page load after a subsequent startup:
02-27 09:58:18.444 30034 30049 I Gecko : console.debug: services.settings:
02-27 09:58:18.444 30034 30049 I Gecko : Instantiated new client main/url-classifier-skip-urls
I think this may be some kind of problem with the profile that browsertime is using.
Comment 8•5 years ago
|
||
However, subsequent runs and any page loads after the sync completes do not have these messages.
That's reassuring!
Is that related to how UrlClassifierSkipListService.jsm uses RemoteSetting's API now?
No, I don't think so. However, it's possible that UrlClassifier is one of the very few components that need data on startup. In Desktop, since we ship initial data, no network request is necessary.
Comment 9•5 years ago
|
||
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) (he/him) from comment #7)
I think this may be some kind of problem with the profile that browsertime is using.
This is generally a hard problem for automation: how do we produce profiles for use in the product that "look like" the fresh profiles produced by the product itself? I happened to be poking around these pre-populated dumps just yesterday and wondered how they were shipped in the product; answer is that they're in the omnijar.
Specifically for Raptor/browsertime, it might be worth finding a way for Gecko to populate a fresh profile and then immediately exit before downloading, e.g., SafeBrowsing DBs.
Comment 10•5 years ago
|
||
(In reply to Mathieu Leplatre [:leplatrem] from comment #8)
However, subsequent runs and any page loads after the sync completes do not have these messages.
That's reassuring!
Is that related to how UrlClassifierSkipListService.jsm uses RemoteSetting's API now?
No, I don't think so. However, it's possible that UrlClassifier is one of the very few components that need data on startup. In Desktop, since we ship initial data, no network request is necessary.
Yes, we do need initial data, IIRC. The essence is that these are temporary overrides to anti-tracking features that would break websites. So, some sites would be broken until the data is loaded.
Is there any particular reason why this isn't supported in Android? :)
Comment 11•5 years ago
|
||
Is there any particular reason why this isn't supported in Android? :)
A long time ago, downsizing the Android installer was a high priority. So we made the decision of not packaging Remote Settings initial data. Back then, no use-case was relying on Remote Settings data on startup, and there was no «sync if empty» behaviour, hence this seemed like a reasonable decision.
It's totally acceptable to revisit it 3 years later :)
Updated•5 years ago
|
Reporter | ||
Updated•5 years ago
|
I think we have evidence that we're only downloading these on the first run, which IMHO is probably fine. Randell are you ok with closing this?
Comment 13•5 years ago
|
||
I think we should still keep it open; the question is the size of the packaged data (specifically the skip-urls). If the size is "too large", perhaps we should close it then.
If there's a normal update, I presume the skip-url data is retained so we don't have this issue?
Comment 14•5 years ago
|
||
The size of the skip URLs list has always been very small (less than 10 entries) and it's not intended to grow large since it's sort of a last-resort mechanism for fixing anti-tracking breakage in the short term while we come up with a better long-term solution.
Comment 15•5 years ago
|
||
If there's a normal update, I presume the skip-url data is retained so we don't have this issue?
I would also presume that chrome storage is maintained on update.
Packaging the dump seems relevant here, at least to avoid the latency of downloading remote content...
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Comment 16•3 years ago
|
||
This all seems to be in the remote settings codebase not geckoview. Could someone please take a look?
Comment 17•3 years ago
|
||
Which collections do you want to package? url-classifier-skip-urls
?
Basically it consists in doing:
curl "https://firefox.settings.services.mozilla.com/v1/buckets/main/collections/url-classifier-skip-urls/records | jq > services/settings/dumps/main/url-classifier-skip-urls.json
hg add services/settings/dumps/main/url-classifier-skip-urls.json
And editing the moz.build
file as described here:
https://firefox-source-docs.mozilla.org/services/settings/index.html#initial-data
Comment 18•3 years ago
|
||
This bug hasn't received activity recently.
I don't think it's completely related to remote settings, since it's a matter of picking up resources to package in Geckoview. Shall we move it to GeckoView component, or close it if irrelevant?
Comment 19•3 years ago
|
||
Redirect a needinfo that is pending on an inactive user to the triage owner.
:leplatrem, since the bug has recent activity, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 20•3 years ago
|
||
Agi, Jonathan, could you look at this please?
In short, it's about packaging Remote Settings data for Geckoview. When not packaged, this data is pulled from the network, and thus not available right away on startup/new profiles. Which could be problematic for security stuff.
Updated•2 years ago
|
Comment 21•2 years ago
|
||
Chris, possibly tag this bug to add remote settings for 2022 h2, or close this if there is a duplicate
Comment 22•2 years ago
|
||
(In reply to calu from comment #21)
Chris, possibly tag this bug to add remote settings for 2022 h2, or close this if there is a duplicate
Thanks for catching this old bug. I'll make this bug block the Android remote settings bug 1772232 and tag it so it's on the GeckoView team's radar.
Comment 23•2 years ago
|
||
I'm looking into remote settings on GeckoView/Fenix. It looks like we are packaging some remote settings but Gecko is not able to read them, I'm looking into fixing this.
Updated•2 years ago
|
Description
•