[DoH] In strict fallback mode, retry DoH lookups upon network failures
Categories
(Core :: Networking: DNS, enhancement, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox96 | --- | fixed |
People
(Reporter: nhnt11, Assigned: nhnt11)
References
(Blocks 1 open bug)
Details
(Whiteboard: [necko-triaged])
Attachments
(9 files, 1 obsolete file)
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/plain
|
chutten
:
data-review+
|
Details |
(deleted),
text/x-phabricator-request
|
Details |
In mode 2.5, when a DoH lookup fails to resolve an address due to any reason other than:
- NXDOMAIN returned from the DoH provider (TRRSkippedReason::TRR_NXDOMAIN)
- We skipped TRR because previously resolved addresses could not be connected to (TRRSkippedReason::TRR_DISABLED_FLAG)
- Confirmation state is CONFIRM_FAILED or CONFIRM_TRYING_FAILED (TRRSkippedReason::TRR_NOT_CONFIRMED)
we should retry the DoH lookup and make sure we use a fresh connection to do so.
Assignee | ||
Comment 1•3 years ago
|
||
This patch exposes a flag that tells a TRR to set the LOAD_FRESH_CONNECTION
load flag when creating the channel for the TRR lookup. This when seen by
TRRServiceChannel will cause us to close any existing DoH connections before
connecting. After a failed TRR lookup, nsHostResolver will retry with this
flag set, if we are in strict fallback mode and the skip reason was not one
of NXDOMAIN, DISABLED_FLAG, and NOT_CONFIRMED.
TODO:
- Haven't really looked at ODoH at all yet
- How should confirmation change now that we're retrying?
- If we kill the DoH connection, other ongoing lookups will be killed too.
Need to confirm that this will at least eventually lead to a retry, and
that this is appropriate (or what we should do instead if not) and document
how the skip reason telemetry will look when this happens. - Tests
- ???
Updated•3 years ago
|
Assignee | ||
Comment 2•3 years ago
|
||
This adds a feature to moz-http2's doh path that helps test connection
cycling. We log the remote ports of DoH requests and expose an API to
fetch the log. A specific name will trigger us to send a delayed response
to help simulate network volatility. The log is then checked for correctness
in the test script.
TODO: add assertions for the log. this is easy, just waiting to figure out
one thing: at the moment when we cycle the connection for TRR, the retry
lookup queries as well as the confirmation query all use different connections.
I think this is because at the time of their dispatch there's no cached
connection so we end up using new connections for them... for subsequent
lookups we re-use the connection that was created for Confirmation.
This is evident by looking at the log of ports seen by the server when
we dump it at the end.
Depends on D129227
Updated•3 years ago
|
Assignee | ||
Comment 3•3 years ago
|
||
Reverts bug 1651672 / rev D86520. It's been a few releases,
we can remove this cleanup code.
Depends on D131467
Assignee | ||
Comment 4•3 years ago
|
||
Depends on D132107
Assignee | ||
Comment 5•3 years ago
|
||
Depends on D132109
Assignee | ||
Comment 6•3 years ago
|
||
Depends on D132109
Updated•3 years ago
|
Assignee | ||
Comment 7•3 years ago
|
||
Depends on D132112
Assignee | ||
Comment 8•3 years ago
|
||
Data review request for telemetry probes added in Part 4 (D132109)
Assignee | ||
Comment 9•3 years ago
|
||
Comment 10•3 years ago
|
||
Comment on attachment 9252437 [details]
data-review-Bug-1737198.md
DATA COLLECTION REVIEW RESPONSE:
Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes.
Is there a control mechanism that allows the user to turn the data collection on and off?
Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.
If the request is for permanent data collection, is there someone who will monitor the data over time?
Yes, Nihanth Subramanya is responsible.
Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
Category 1, Technical.
Is the data collection request for default-on or default-off?
Default on for all channels.
Does the instrumentation include the addition of any new identifiers?
No.
Is the data collection covered by the existing Firefox privacy notice?
Yes.
Does the data collection use a third-party collection tool?
No.
Result: datareview+
Assignee | ||
Comment 11•3 years ago
|
||
Hey chutten, the previous telemetry impl was piggy backing some data onto an existing histogram. Now I've added a new histogram for it - same as the old skip reason histogram but will exist to isolate data that was collected when this strict mode feature is on.
The questions we're answering, and everything else, remain the same.
Comment 12•3 years ago
|
||
Comment on attachment 9253065 [details]
Data review request v2
DATA COLLECTION REVIEW RESPONSE:
Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes.
Is there a control mechanism that allows the user to turn the data collection on and off?
Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.
If the request is for permanent data collection, is there someone who will monitor the data over time?
Yes, Nihanth Subramanya is responsible.
Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
Category 1, Technical.
Is the data collection request for default-on or default-off?
Default on for all channels.
Does the instrumentation include the addition of any new identifiers?
No.
Is the data collection covered by the existing Firefox privacy notice?
Yes.
Does the data collection use a third-party collection tool?
No.
Result: datareview+
Comment 13•3 years ago
|
||
Comment 14•3 years ago
|
||
Backed out 8 changesets (Bug 1743122, Bug 1737198) for causing build bustages on nsHostRecord.cpp.
Backout link
Push with failures
Failure Log
Also xpcshell - X1 Failure Log
Comment 15•3 years ago
|
||
Comment 16•3 years ago
|
||
Backed out 8 changesets (Bug 1737198, Bug 1743122) for causing xpcshell failures in unit/test_trr.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/dae06634f2bfbfd75cc653817122978c85c84319
Push with failures, failure log.
(Update): Also caused bustages in nsHostResolver.
Assignee | ||
Comment 17•3 years ago
|
||
Depends on D132115
Comment 18•3 years ago
|
||
Comment 19•3 years ago
|
||
Backed our as per dev request. CLOSED TREE
Backout link : https://hg.mozilla.org/integration/autoland/rev/a19bf299a00192a5c3e7e89e7b69ee3d91ab4def
Comment 20•3 years ago
|
||
Comment 21•3 years ago
|
||
Backed out 9 changesets (Bug 1743122, Bug 1737198) for causing assertion failure in TRRServiceChild.cpp CLOSED TREE
Log: https://treeherder.mozilla.org/logviewer?job_id=359958659&repo=autoland&lineNumber=2544
TV failures: https://treeherder.mozilla.org/logviewer?job_id=359958535&repo=autoland&lineNumber=1756
https://treeherder.mozilla.org/logviewer?job_id=359953062&repo=autoland&lineNumber=1843
Crash: https://treeherder.mozilla.org/logviewer?job_id=359958254&repo=autoland&lineNumber=4393
Backout: https://hg.mozilla.org/integration/autoland/rev/9fe35e8ae5e46fe5d57c68b180d34c0055dec818
Assignee | ||
Comment 22•3 years ago
|
||
This failed to land again, this time because I forgot to check that we're actually in the socket process before trying to mirror the confirmation state. We agreed to back out. In the meantime, here is a try push with a fix, let's make sure this actually passes everything before attempting to land again...
https://treeherder.mozilla.org/jobs?repo=try&revision=dfac5e97256df9ac57079950f4559f7c9ca68b2e
Assignee | ||
Comment 23•3 years ago
|
||
Try's green, landing this again. 🤞
Comment 24•3 years ago
|
||
Comment 25•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/bc5a65d8d866
https://hg.mozilla.org/mozilla-central/rev/c016a5438637
https://hg.mozilla.org/mozilla-central/rev/24db7713515a
https://hg.mozilla.org/mozilla-central/rev/49b9db2831ac
https://hg.mozilla.org/mozilla-central/rev/4123e58dbff2
https://hg.mozilla.org/mozilla-central/rev/6d1884416d1d
https://hg.mozilla.org/mozilla-central/rev/fdad127a1d21
https://hg.mozilla.org/mozilla-central/rev/b2126f5ed075
Assignee | ||
Updated•3 years ago
|
Description
•