Add mar download telemetry to evaluate impact of downloading mars over https
Categories
(Toolkit :: Application Update, task, P2)
Tracking
()
Install Update Workflow | Prioritized |
People
(Reporter: jvehent, Assigned: agashlin)
References
(Blocks 1 open bug)
Details
Reporter
Comment 1•7 years ago
Comment 2•7 years ago
Reporter
Comment 3•7 years ago
Comment 4•7 years ago
Reporter
Updated•7 years ago
Comment 6•7 years ago
Reporter
Comment 8•6 years ago
Updated•6 years ago
Comment 9•6 years ago
Comment 10•5 years ago
(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #5)
<snip>
In terms of pin expiring, we basically set the expiration date manually for
each release to be around the time we release version n + 2 (see e.g. bug
1427957). The ESR and Beta expiration dates should be about 14 weeks from
when each point version/build is released.
Hi Dana,
What does "set the expiration date manually for each release to be around the time we release version n + 2" mean in practice? Will a client older than that not be able to update? A long time ago I was given a failure rate due to pinning (iirc around 0.5%). What is the current failure rate?
For example, 67 released recently with the pinning expiration time set to Aug 15 2019. Since we're planning on releasing 68 on/around Jul 9 2019 (according to https://wiki.mozilla.org/Release_Management/Calendar anyway), those pins will still be active for about a month in 67 at that point. (So it looks like n+2 isn't really true any longer...)
In any case, when those pins expire, they won't be enforced, so expiry would make it easier to update rather than harder.
Some quick back-of-the-envelope calculations using https://mzl.la/2KqkNvl and buckets 32/33 (failures/successes for *.cdn.mozilla.net) indicate 160k failures vs 707m successes, or a failure rate of 0.02%. (I think I can run the numbers on release using atmo if necessary, but it might take a little while.)
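The arithmetic behind that estimate can be reproduced with a short sketch. The function name `failure_rate` is hypothetical, and the inputs are the rounded counts quoted above rather than exact telemetry values. It divides failures by all attempts (failures plus successes); dividing by successes alone gives practically the same figure at this scale.

```shell
# Rounded counts quoted in the comment above (not exact telemetry values).
failures=160000
successes=707000000

# Hypothetical helper: failure rate as a percentage of all attempts.
failure_rate() {
  awk -v f="$1" -v s="$2" 'BEGIN { printf "%.4f\n", 100 * f / (f + s) }'
}

failure_rate "$failures" "$successes"
```

The result rounds to the 0.02% quoted above.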
Reporter
Comment 12•5 years ago
You may find this little bash snippet helpful:
mozpins () {
  echo "Preloaded Public Key Pins expiration dates"
  for release in mozilla-beta mozilla-release mozilla-esr52 mozilla-esr60
  do
    loc="https://hg.mozilla.org/releases/${release}/raw-file/tip/security/manager/ssl/StaticHPKPins.h"
    echo -n "$release: "
    # The value in the header is divided by 10**6 to get seconds since the epoch.
    # date --date="@<seconds>" requires GNU date.
    TZ='UTC' date --date="@$((
      $( \
        curl -s "$loc" | grep kPreloadPKPinsExpirationTime | awk -F'[()]' '{print $2}' \
      ) / 10**6 ))"
  done
}
$ mozpins
Preloaded Public Key Pins expiration dates
mozilla-beta: Mon Sep 9 12:36:53 UTC 2019
mozilla-release: Thu Aug 15 12:31:43 UTC 2019
mozilla-esr52: Thu Dec 13 06:21:31 UTC 2018
mozilla-esr60: Mon Sep 9 12:38:39 UTC 2019
Reporter
Updated•5 years ago
Updated•5 years ago
Comment 13•5 years ago
Julien, since this was changed to a client component, what needs to be changed on the client to fix this bug?
Reporter
Comment 14•5 years ago
It is blocked on implementing your recommendation from comment 6:
Before this change, telemetry should be added specifically targeting it so we know as closely as possible what its impact is, if any. We are trying to get to below 2% of clients on versions older than 3 versions prior to the latest version, and even an extremely small impact for all clients could easily be a major impact to this goal. Since this is defense in depth (mar signing already prevents an invalid mar file from being applied), if there is an impact then the value vs. cost would need to be discussed, and the tradeoff would likely need to be agreed to by directors or equivalent.
Updated•5 years ago
Comment 15•5 years ago
Ahhh... so this bug is being morphed. Changing summary to reflect what this bug is about now.
Julien, can you provide details as to what should be checked and reported to telemetry?
Reporter
Comment 16•5 years ago
The question we want to answer is "how many clients fail to download updates over https but succeed over http?".
If the updater can handle it, perhaps the easiest way to obtain that data would be to upgrade http download links to https, and fall back to http if that fails.
Ideally, the telemetry ping indicates what situation the updater ran into:
- download link was https and succeeded
- download link was https and failed
- download link was http, upgraded to https and succeeded
- download link was http, upgraded to https and failed, downgraded to http and succeeded
- download link was http, upgraded to https and failed, downgraded to http and failed
- download link was http and succeeded (no upgrade attempt)
- download link was http and failed (no upgrade attempt)
How does that sound?
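The seven situations listed above could be modelled as a small classifier. This is only a sketch: the function name `classify_download`, its arguments, and the outcome labels are all hypothetical and are not taken from the updater or from any existing telemetry probe.

```shell
# Hypothetical helper: map one update download attempt to an outcome label.
# Arguments (all assumptions made for illustration):
#   $1  scheme of the original download link: http | https
#   $2  whether an upgrade to https was attempted: yes | no (ignored for https links)
#   $3  result of the first attempt: ok | fail
#   $4  result of the http fallback, if one was made: ok | fail (optional)
classify_download() {
  case "$1:$2:$3:${4:-}" in
    https:*:ok:*)        echo "https_succeeded" ;;
    https:*:fail:*)      echo "https_failed" ;;
    http:yes:ok:*)       echo "http_upgraded_succeeded" ;;
    http:yes:fail:ok)    echo "http_upgraded_failed_downgrade_succeeded" ;;
    http:yes:fail:fail)  echo "http_upgraded_failed_downgrade_failed" ;;
    http:no:ok:*)        echo "http_succeeded_no_upgrade" ;;
    http:no:fail:*)      echo "http_failed_no_upgrade" ;;
    *)                   echo "unknown"; return 1 ;;
  esac
}

# Example: an http link that was upgraded, failed over https, then succeeded over http.
classify_download http yes fail ok
```

The case that answers the question in this comment is `http_upgraded_failed_downgrade_succeeded`, since it counts clients that can only download over http.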
Comment 17•5 years ago
That might be ok, but it is work in addition to the telemetry reporting, and A/B testing might be better than using a fallback. I was asking for the networking error codes that would indicate the failure case, if they are known.
Reporter
Comment 18•5 years ago
I'm not sure what those error codes would be... perhaps Dana knows?
I suppose you could use nsINSSErrorsService.getErrorClass [0]. Given an nsresult, it will tell you if the error is (roughly) categorized as due to a bad certificate or due to a problem with TLS (and if the error wasn't either of those, the implementation will throw an error).
Reporter
Comment 20•5 years ago
Robert, does comment 19 above answer your question?
Comment 21•5 years ago
Yes but I haven't been allocated time to work on this.
Updated•5 years ago
Updated•5 years ago
Updated•5 years ago
Comment 22•5 years ago
Closing as won't fix for now; we're going to use the existing update ping to monitor for issues. See bug 1629033 for details.