Open
Bug 1355903
Opened 8 years ago
Updated 2 years ago
Re-enable Certificate Transparency telemetry collection
Categories
(Core :: Security: PSM, defect, P3)
Core
Security: PSM
Tracking
()
NEW
People
(Reporter: jcj, Unassigned)
References
(Depends on 2 open bugs, Blocks 1 open bug)
Details
(Whiteboard: [psm-blocked])
Bug 1353216 shut off certificate transparency telemetry collection due to a performance regression. This bug should turn it back on after performance is improved.
Comment 1•8 years ago
|
||
sorry for the drive-by partially informed comment - but given that this is just for reporting is this something that can be done asynchronously queued and processed at 'idle' time? (its been a while, but iirc firefox has an event for that) so the math doesn't impact responsiveness? or is there too much state tied up in the connection to practically copy/addref for queueing?
Reporter | ||
Comment 2•8 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #1)
> sorry for the drive-by partially informed comment - but given that this is
> just for reporting is this something that can be done asynchronously queued
> and processed at 'idle' time? (its been a while, but iirc firefox has an
> event for that) so the math doesn't impact responsiveness? or is there too
> much state tied up in the connection to practically copy/addref for queueing?
It's a good idea, but an eventual goal is to have CT enforcement, too -- which would need to be synchronous to be effective.
Reporter | ||
Comment 3•8 years ago
|
||
Keeler and I discussed more what is needed to re-enable our CT telemetry collection:
== Better Tests ==
Before we get to performance testing, I think we need more significant tests in-tree, as specified in bug 1361177.
== Speedups ==
It seems that most of the performance regression was due to unnecessary cert chain validations, fixed in Bug 1356611. The unnecessary validations were made much worse by the slower-than-desired verification algorithm; Keeler filed Bug 1355993 to address speeding that up. It should be possible to get some performance increase, but given the magnitude of the fix from Bug 1356611, we _might not need_ the algorithm speedup to proceed.
We'd optimally like to see a Talos test for HTTPS that could show a regression during CT (Bug 1356073), but perhaps we can proceed with a plan that checks telemetry instead:
== Speedup Test Plan ==
When we disabled CT, the SSL_SUCCESFUL_CERT_VALIDATION_TIME_MOZILLAPKIX telemetry metric dropped precipitously [1] (the drop corresponds to the disable date) [2]. In-code, this metric is firmly tied to the CT performance.
I think it's OK to move on without the Talos test and verify whether our performance is OK using Nightly telemetry. If slows down more than a little, we'd want to wait on Bug 1355993 to proceed.
== Recommendation Summary ==
1. We need the tests in bug 1361177.
2. Then we can re-enable using this bug and monitor SSL_SUCCESFUL_CERT_VALIDATION_TIME_MOZILLAPKIX.
2.a. If performance slows down by an unacceptable amount, we back-out the enable commit.
2.b. If we back-out, then we block on Bug 1355993 to re-try.
[1] https://i.have.insufficient.coffee/ct-validation-time-fix.png
[2] https://mzl.la/2qKALHN
Depends on: 1361177
Reporter | ||
Comment 4•8 years ago
|
||
After further discussion, I am amending my recommendation that we only re-enable after we also have Talos tests, and marking this bug as being blocked on Bug 1356073:
== Recommendation Summary (Revised) ==
1. We need the tests in bug 1361177.
2. We also need the Talos test in bug 1356073 to monitor for HTTPS performance regressions.
3. Then we can re-enable using this bug and monitor SSL_SUCCESFUL_CERT_VALIDATION_TIME_MOZILLAPKIX in Nightly.
4.a. If performance slows down by an unacceptable amount, we back-out the enable commit.
4.b. If we back-out, then we block on Bug 1355993 before re-trying.
Depends on: 1356073
Priority: -- → P3
Whiteboard: [psm-blocked]
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•