Deployment of CRLite Production Environment
Categories
(Cloud Services :: Operations: CRLite, task)
Tracking
(Not tracked)
People
(Reporter: jcj, Assigned: sven)
References
(Blocks 1 open bug)
Details
(Whiteboard: [Target: Q1 2020])
CRLite is a WebPKI-wide certificate revocation system, to be distributed via Remote Settings for all Firefox users, replacing OCSP. We're experimenting with it now using a pre-production CRLite instance and manual inspection and submission of CRLite filter files to Remote Settings.
This bug is to formally deploy CRLite and hand over control of the production instance to CloudOps.
As of this writing, CRLite consists of several components, taken from https://github.com/mozilla/crlite/wiki/Overview:
Google Firestore
Bulk storage of all unexpired certificates in the Web PKI, as well as CT log metadata. The data is organized in a hierarchy:
logs
  /<url>
ct
  /<expiration date string>
    /issuer
      /<issuer SPKI string>
        /certs
          /<certificate SPKI string>
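As an illustration of the hierarchy above, the following minimal sketch builds and parses document paths as plain strings. The function names (`log_path`, `cert_path`, `split_cert_path`) are hypothetical and not taken from the CRLite code. The sketch also assumes the identifiers are opaque strings that contain no `/`; how CRLite actually encodes URLs, dates, and SPKI strings is not specified here.

```python
# Hypothetical helpers; names and identifier encoding are assumptions,
# not part of the CRLite or ct-mapreduce code.

def log_path(log_url: str) -> str:
    """Path to the metadata document for one CT log: logs/<url>."""
    return "/".join(["logs", log_url])


def cert_path(expiration: str, issuer_spki: str, cert_spki: str) -> str:
    """Path to one certificate document, following the hierarchy above."""
    return "/".join(
        ["ct", expiration, "issuer", issuer_spki, "certs", cert_spki]
    )


def split_cert_path(path: str) -> dict:
    """Recover the three variable components from a certificate path."""
    parts = path.split("/")
    well_formed = (
        len(parts) == 6
        and parts[0] == "ct"
        and parts[2] == "issuer"
        and parts[4] == "certs"
    )
    if not well_formed:
        raise ValueError(f"not a certificate path: {path!r}")
    return {
        "expiration": parts[1],
        "issuer_spki": parts[3],
        "cert_spki": parts[5],
    }
```

Grouping by expiration date first means that a whole subtree can be treated as expired at once, which fits a store that only holds unexpired certificates.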
Google Memorystore (Redis)
Fast lists of all unexpired certificate serial numbers, their issuers, and metadata (such as CRL distribution URLs).
A container, crlite-fetch
https://github.com/mozilla/crlite/tree/master/containers/crlite-fetch
This uses the ct-fetch tool from ct-mapreduce to download from all CT logs, placing the certificates into Firestore and the Memorystore/Redis cache. This container runs as an always-on Kubernetes deployment.
A container, crlite-generate
https://github.com/mozilla/crlite/tree/master/containers/crlite-generate
This run-to-completion Kubernetes cronjob uses several tools to construct a CRLite filter and publish it, ultimately to Remote Settings.
A container, crlite-rebuild
https://github.com/mozilla/crlite/tree/master/containers/crlite-rebuild
This run-to-completion Kubernetes job is used when the Memorystore/Redis cache is invalid in some way. It reads all unexpired entries from the Google Firestore and rebuilds the Memorystore data.
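The rebuild step described above can be sketched roughly as follows. This is not the crlite-rebuild implementation: `CertEntry`, `rebuild_cache`, the ISO-date format of `expiration`, and the `(expiration, issuer)` key layout are all assumptions made for illustration, and a plain dictionary stands in for the Memorystore/Redis cache.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class CertEntry(NamedTuple):
    """Hypothetical record for one certificate read from bulk storage."""
    expiration: str  # assumed ISO date, e.g. "2020-06-01"
    issuer: str      # issuer identifier
    serial: str      # certificate serial number


def rebuild_cache(
    entries: Iterable[CertEntry], today: str
) -> Dict[Tuple[str, str], Set[str]]:
    """Rebuild a serial-number cache from scratch.

    Entries that expired before `today` are skipped; the rest are grouped
    by (expiration, issuer). ISO date strings compare chronologically,
    so a plain string comparison is enough here.
    """
    cache: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for entry in entries:
        if entry.expiration < today:
            continue  # expired certificates are not cached
        cache[(entry.expiration, entry.issuer)].add(entry.serial)
    return dict(cache)
```

For example, calling `rebuild_cache(entries, today="2020-03-01")` on a list that mixes expired and unexpired entries returns only the unexpired serials, grouped per issuer and expiration date. Because the cache is derived entirely from the bulk store, it can be discarded and regenerated whenever it is found to be invalid.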
Google Stackdriver
Metrics are published to Stackdriver for overall system health, as are logs. Errors and warnings are generally of two categories:
- Problems with infrastructure performance, which are still being addressed via adjustments to how operations are performed
- Problems with the WebPKI, which might well be used by the Mozilla CA Root Program for enforcement
Environments
As of this writing, jcj is still actively developing CRLite and needs the full dataset for development. So if a magic wand were used today to make the deployments, I would need a stage environment to work in as a sandbox, which could certainly be the existing environment I am using.
NOTE: The prod environment would probably want to start from a clone of my current environment's Firestore data, as it takes multiple calendar-months to synchronize CT data from the original sources.
(Supersedes bug 1429802)
Comment 2 • 5 years ago
Is https://github.com/mozilla/crlite a private repo on purpose? (I get a 404)
As CRLite is a very important service, I think the code should be open-source and released before a production environment is set up.
Comment 3 (Reporter) • 5 years ago
It is. We have to do a scrub and potentially a reinitialization. That said, the vast majority of the code is actually in https://github.com/jcjones/ct-mapreduce; what's in crlite is the Kubernetes mechanisms and the filter generator that matches https://pypi.org/project/filtercascade/ and https://github.com/mozilla/rust-cascade.
I will definitely get it released. I completely agree with opening it up; I just have to ensure it's clean and then get a review pass.