1007844 - Implement per-host telemetry for operational mozilla properties (aus4, amo)

[:mmc] Monica Chew (no longer reading bugmail)

Assignee

Description

•

11 years ago

In this mode, pinsets will be evaluated but pinning violations will only be reported to telemetry, not acted on. The current pinset structure contains: - hostname - boolean include_subdomains (for whether to match *.addons.mozilla.org instead of just addons.mozilla.org) - list of cert fingerprints Then the pinning service uses these static pinsets to compute whether or not the certificate chain for a given host matches one of the allowed fingerprints, and blocks the connection if no matches are found. If we add another boolean, test_mode, that is used for telemetry logging only, then it becomes a lot safer to activate a given pinset. We can watch the telemetry for the pinset in test mode, and if violations are zero (or near zero), then promote the pinset to production mode. This lets us do the following rollout plan: - Convert all pinsets to test mode, including ones imported from Google - Flip the default pinning enforcement level to on - Watch the test mode telemetry buckets - For all the buckets that have zero errors, promote the corresponding pinset to non-test mode.

[:mmc] Monica Chew (no longer reading bugmail)

Assignee

Updated

•

11 years ago

Blocks: 1004350

[:mmc] Monica Chew (no longer reading bugmail)

Assignee

Comment 1

•

11 years ago

bug 772756 required test mode already, so stealing this bug for telemetry improvements.

Summary: implement test_mode for certificate pinning → Implement telemetry for all pinsets

[:mmc] Monica Chew (no longer reading bugmail)

Assignee

Comment 2

•

11 years ago

Attached patch Implement telemetry for all the buckets, including test mode ( (obsolete) (deleted) — Details — Splinter Review

[:mmc] Monica Chew (no longer reading bugmail)

Assignee

Updated

•

11 years ago

Assignee: nobody → mmc

Status: NEW → ASSIGNED

[:mmc] Monica Chew (no longer reading bugmail)

Assignee

Comment 3

•

11 years ago

Comment on attachment 8420473 [details] [diff] [review] Implement telemetry for all the buckets, including test mode ( Review of attachment 8420473 [details] [diff] [review]: ----------------------------------------------------------------- Unfortunately this change adds a huge list of histogram ids to the preload json. The reason is because we are importing pinsets from jsons and then sorting them by hostname, so just generating histogram ids won't be stable. This change also adds an "imported" field to the pinsets, which we technically don't need but is good as a sanity check when inspecting the generated file. As far as the number of buckets go, the largest existing telemetry histogram is 1024, so I didn't want to push my luck. There's also this cool thing http://telemetry.mozilla.org/docs.html#Telemetry that we can use for massaging the data later.

Attachment #8420473 - Flags: review?(dkeeler)

Attachment #8420473 - Flags: review?(cviecco)

Dana Keeler (she/her) (use needinfo) (:keeler for reviews)

Comment 4

•

11 years ago

Comment on attachment 8420473 [details] [diff] [review] Implement telemetry for all the buckets, including test mode ( Review of attachment 8420473 [details] [diff] [review]: ----------------------------------------------------------------- My main concerns are: 1. Updating the ids of new pinning entries. How do we plan to make that scale? Do we at least have a bug on automating it? 2. It seems like the CERT_PINNING_(TEST_)RESULTS telemetry is redundant. r- for now. ::: security/manager/boot/src/PublicKeyPinningService.cpp @@ +194,5 @@ > } > + mozilla::Telemetry::ID histogram = > + mozilla::Telemetry::CERT_PINNING_RESULTS_BY_HOST; > + if (foundEntry->mTestMode) { > + histogram = mozilla::Telemetry::CERT_PINNING_TEST_RESULTS_BY_HOST; Would it make more sense to put this in the other "if (foundEntry->mTestMode)" block? @@ +196,5 @@ > + mozilla::Telemetry::CERT_PINNING_RESULTS_BY_HOST; > + if (foundEntry->mTestMode) { > + histogram = mozilla::Telemetry::CERT_PINNING_TEST_RESULTS_BY_HOST; > + } > + uint32_t bucket = foundEntry->mId * 2 + result; I would be more comfortable with " + (result ? 1 : 0);" ::: security/manager/ssl/tests/unit/test_pinning.js @@ +99,5 @@ > + do_check_eq(results.counts[0], 1); // Failure count > + do_check_eq(results.counts[1], 3); // Success count > + > + let results = service.getHistogramById("CERT_PINNING_TEST_RESULTS") > + .snapshot(); nit: indent two @@ +115,5 @@ > + const TEST_MODE_PASSED = 292 * 2 + 1; > + > + // Everything is in production mode except test-mode.pinning.example.com > + results = service.getHistogramById("CERT_PINNING_RESULTS_BY_HOST") > + .snapshot(); nit: indent two @@ +116,5 @@ > + > + // Everything is in production mode except test-mode.pinning.example.com > + results = service.getHistogramById("CERT_PINNING_RESULTS_BY_HOST") > + .snapshot(); > + do_check_eq(1, results.counts[INCLUDE_SUBDOMAINS_FAILED]); nit: switching the order of (actual, expected) might lead to confusion later. It's fine to do it this way, but let's be consistent in this function. @@ +124,5 @@ > + do_check_eq(0, results.counts[TEST_MODE_FAILED]); > + do_check_eq(0, results.counts[TEST_MODE_PASSED]); > + > + results = service.getHistogramById("CERT_PINNING_TEST_RESULTS_BY_HOST") > + .snapshot(); nit: indent two ::: security/manager/tools/PreloadedHPKPins.json @@ +87,5 @@ > "pins": "mozilla_test", "test_mode": true } > + ], > + // Associate telemetry ids with all of the pins we want to monitor. > + "nextUnusedId": 307, > + "ids": { Is this generated by hand? How does it get updated when new hosts are added? ::: security/manager/tools/genHPKPStaticPins.js @@ +54,5 @@ > " const char* mHost;\n" + > " const bool mIncludeSubdomains;\n" + > " const bool mTestMode;\n" + > + " const bool mImported;\n" + > + " const int mId;\n" + This eventually gets used in a calculation that gets stored in a uint32_t. Why not just start it that way? @@ +450,5 @@ > + const UNKNOWN_ID = 0; > + if (entry.name in gStaticPins.ids) { > + printVal += gStaticPins.ids[entry.name] + ", "; > + } else { > + printVal += UNKNOWN_ID + ", "; Shouldn't we warn here? Or automatically update the list of pin ids? ::: toolkit/components/telemetry/Histograms.json @@ +5916,5 @@ > "high": "5000", > "n_buckets": 10, > "extended_statistics_ok": true > }, > + "CERT_PINNING_RESULTS": { It seems like this can be calculated based on CERT_PINNING_RESULTS_BY_HOST - is it necessary to have right now? (I imagine we'll want it when we do HPKP (i.e. non-preloaded pinning).) @@ +5921,5 @@ > + "expires_in_version": "never", > + "kind": "boolean", > + "description": "Certificate pinning results (0 = failure, 1 = success)" > + }, > + "CERT_PINNING_TEST_RESULTS": { Same here.

Attachment #8420473 - Flags: review?(dkeeler) → review-

[:mmc] Monica Chew (no longer reading bugmail)

Assignee

Comment 5

•

11 years ago

(In reply to David Keeler (:keeler) from comment #4) > Comment on attachment 8420473 [details] [diff] [review] > Implement telemetry for all the buckets, including test mode ( > > Review of attachment 8420473 [details] [diff] [review]: > ----------------------------------------------------------------- > > My main concerns are: > 1. Updating the ids of new pinning entries. How do we plan to make that > scale? Do we at least have a bug on automating it? Yeah, we should have talked about this in the meeting, sorry I had to run out. About new pinning entries, here's a list of things I think we should automatically import from google: 1) Whenever certs get added or removed to a pinset (done) 2) Whenever a new host gets pinned, we should import but not enable in production mode (done). We can detect 2) when the UNKNOWN_ID bucket goes up. I'd love for us to get monitoring automatically when new hosts get added, but I don't see how without having access to strings. > 2. It seems like the CERT_PINNING_(TEST_)RESULTS telemetry is redundant. > r- for now. It is redundant if everything works correctly :) We could leave it in for human-readable sanity checks, or take it out. > ::: security/manager/tools/PreloadedHPKPins.json > @@ +87,5 @@ > > "pins": "mozilla_test", "test_mode": true } > > + ], > > + // Associate telemetry ids with all of the pins we want to monitor. > > + "nextUnusedId": 307, > > + "ids": { > > Is this generated by hand? How does it get updated when new hosts are added? For updating this list: 1) Notice that the UNKNOWN_ID (0) bucket count is increasing. 2) Go and inspect changes to transport_security_static_certs.json. 3) Find the new hosts and add them to the list, starting with nextUnusedId. 4) Update nextUnusedId. It sounds like I should try to automate this. I'll work on it.

Camilo Viecco (:cviecco)

Comment 6

•

11 years ago

Comment on attachment 8420473 [details] [diff] [review] Implement telemetry for all the buckets, including test mode ( Review of attachment 8420473 [details] [diff] [review]: ----------------------------------------------------------------- ::: security/manager/tools/PreloadedHPKPins.json @@ +87,5 @@ > "pins": "mozilla_test", "test_mode": true } > + ], > + // Associate telemetry ids with all of the pins we want to monitor. > + "nextUnusedId": 307, > + "ids": { I am also worried about this. This file is supposed to be maintained by humans not machines. If we want something like this list it should live similar to the rejects for HSTS. Another option would be not to use a list like this but just hash the domain in the entry and use that as the index. There will be collisions, but as long as we are able to have our test set reasonably small this will not happen often. ::: security/manager/tools/genHPKPStaticPins.js @@ +53,5 @@ > "struct TransportSecurityPreload {\n" + > " const char* mHost;\n" + > " const bool mIncludeSubdomains;\n" + > " const bool mTestMode;\n" + > + " const bool mImported;\n" + you dont use this value, please remove

Attachment #8420473 - Flags: review?(cviecco) → review-

[:mmc] Monica Chew (no longer reading bugmail)

Assignee

Comment 7

•

11 years ago

Here are the collisions using the last 12 bits of sha256 (note: we only have 10 bits as proposed, so the problem is worse in practice). The cdn.mozilla.org/google collision is especially bad. Camilo and David, how about it we assign new buckets as we see new domains and keep it in a separate JSON? This lets us keep the preloads json file clean, and have another json that's both input and output to help keep track of buckets. 052 -, cdn.mozilla.org 052 -, google.com.bd 1c1 -, google.com.np 1c1 -, google.tk 32c -, google.gy 32c -, google.vg 51e -, google.gr 51e -, google.rw 61f -, google.co.ke 61f -, translate.googleapis.com 655 -, google.cc 655 -, google.com.cu

Dana Keeler (she/her) (use needinfo) (:keeler for reviews)

Comment 8

•

11 years ago

(In reply to [:mmc] Monica Chew (please use needinfo) from comment #7) > Camilo and David, how about it we assign new buckets as we see new domains > and keep it in a separate JSON? > > This lets us keep the preloads json file clean, and have another json that's > both input and output to help keep track of buckets. That's more like what I was thinking.

Camilo Viecco (:cviecco)

Comment 9

•

11 years ago

what is the plan once we have more than 1000 domains. cycle the numbers?

[:mmc] Monica Chew (no longer reading bugmail)

Assignee

Comment 10

•

11 years ago

Adding bsmedberg at his request. He expressed concern about counting pinning violations when I asked about the maximum number of buckets that could exist in a histogram. bsmedberg, what will it take to move forward on this? Thanks, Monica

[:mmc] Monica Chew (no longer reading bugmail)

Assignee

Comment 11

•

11 years ago

(In reply to Camilo Viecco (:cviecco) from comment #9) > what is the plan once we have more than 1000 domains. cycle the numbers? Switch to a new histogram.

[:mmc] Monica Chew (no longer reading bugmail)

Assignee

Updated

•

11 years ago

Summary: Implement telemetry for all pinsets → Implement per-host telemetry for operational mozilla properties (aus4, amo)

[:mmc] Monica Chew (no longer reading bugmail)

Assignee

Comment 12

•

11 years ago

Attached patch bucket-amo (deleted) — Details — Splinter Review

Also adds aus4 in test mode.

Attachment #8420473 - Attachment is obsolete: true

Attachment #8423470 - Flags: review?(dkeeler)

Dana Keeler (she/her) (use needinfo) (:keeler for reviews)

Comment 13

•

11 years ago

Comment on attachment 8423470 [details] [diff] [review] bucket-amo Review of attachment 8423470 [details] [diff] [review]: ----------------------------------------------------------------- Great! ::: security/manager/boot/src/PublicKeyPinningService.cpp @@ +200,5 @@ > ? Telemetry::CERT_PINNING_MOZ_TEST_RESULTS > : Telemetry::CERT_PINNING_TEST_RESULTS; > retval = true; > } > + // We can collect per-host pinning violations for this entry. I would add: "because it is a Mozilla host necessary for the operation of the browser" or something ::: security/manager/tools/genHPKPStaticPins.js @@ +444,5 @@ > printVal += "true, "; > } else { > printVal += "false, "; > } > + if (entry.id >= 512) { How about "if ("id" in entry && entry.id >= 0)"? (id won't be a defined property on all of the entries, right?) (Or I guess entry.hasOwnProperty("id") would work?) ::: toolkit/components/telemetry/Histograms.json @@ +5950,5 @@ > }, > "CERT_PINNING_MOZ_RESULTS": { > "expires_in_version": "never", > "kind": "boolean", > + "description": "Certificate pinning evaluation results (0 = failure, 1 = success)" Should we be a bit more descriptive in each of these descriptions? (e.g. test mode vs. production mode, non-operational mozilla properties vs. non-mozilla properties) @@ +5960,5 @@ > + }, > + "CERT_PINNING_MOZ_RESULTS_BY_HOST": { > + "expires_in_version": "never", > + "kind": "enumerated", > + "n_values": 1024, Seems like we're not going to need anywhere near 1024 buckets this time around. @@ +5967,5 @@ > + "CERT_PINNING_MOZ_TEST_RESULTS_BY_HOST": { > + "expires_in_version": "never", > + "kind": "enumerated", > + "n_values": 1024, > + "description": "Certificate pinning evaluation results by host" Seems like we should mention these are for mozilla operational hosts. Also, it would be good to differentiate test vs. production in the description.

Attachment #8423470 - Flags: review?(dkeeler) → review+

[:mmc] Monica Chew (no longer reading bugmail)

Assignee

Comment 14

•

11 years ago

Fixed comments and descriptions and cut buckets down to 512 (I haven't audited the moz properties enough to know how many "operational" ones there are), since it breaks histograms to increase n_values, better safe than sorry. I think we should write a custom dashboard anyway, so hopefully ugly empty buckets won't be an issue. > How about "if ("id" in entry && entry.id >= 0)"? (id won't be a defined > property on all of the entries, right?) (Or I guess > entry.hasOwnProperty("id") would work?) For this one, I left it -- if id is not in entry, it will be undefined and default to -1 as intended. test_mode required more care since we wanted it to default to true if not present.

[:mmc] Monica Chew (no longer reading bugmail)

Assignee

Comment 15

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/7cf39ac6f41e

Carsten Book [:Tomcat]

Comment 16

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/7cf39ac6f41e

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla32

Implement telemetry for all the buckets, including test mode ( 11 years ago [:mmc] Monica Chew (no longer reading bugmail) (deleted), patch	cviecco : review- keeler : review-	Details \| Diff \| Splinter Review
bucket-amo 11 years ago [:mmc] Monica Chew (no longer reading bugmail) (deleted), patch	keeler : review+	Details \| Diff \| Splinter Review