Recently, with DoH enabled (using NextDNS), Phabricator pages take a long time to open on Nightly.
Categories: Core :: Networking: DNS, defect, P2
People: Reporter: mayankleoboy1, Unassigned
References: Blocks 1 open bug
Details: Whiteboard: [necko-triaged][necko-priority-review]
Attachments: 3 files
Example: Go to https://phabricator.services.mozilla.com/D165214
I am attaching the network log.
This is not a recent regression, as a build from Dec-22 also reproduces it. It might be that my network provider is blocking NextDNS or something.
Comment 1 • 2 years ago (Reporter)
Comment 2 • 2 years ago (Reporter)
OK, this got better when I disabled DoH (NextDNS).
If I use Cloudflare with DoH, things are better than with NextDNS.
Updated • 2 years ago (Reporter)
Comment 3 • 2 years ago
Thanks for the log.
It shows that the first connection attempt to phabricator.services.mozilla.com uses address 54.148.248.183.
2023-01-13 13:58:51.234000 UTC - [Parent 34900: Socket Thread]: E/nsSocketTransport nsSocketTransport::SendStatus [this=18ac7691400 status=804b0007]
804b0007 = STATUS_CONNECTING_TO
2023-01-13 13:58:51.234000 UTC - [Parent 34900: Socket Thread]: D/nsSocketTransport trying address: 54.148.248.183
It fails with NS_ERROR_NET_TIMEOUT (cond=804b000e below) after about 21 seconds, so we try to connect again with the next address, 35.167.158.137.
2023-01-13 13:59:12.270000 UTC - [Parent 34900: Socket Thread]: D/nsSocketTransport nsSocketTransport::RecoverFromError [this=18ac7691400 state=3 cond=804b000e]
2023-01-13 13:59:12.270000 UTC - [Parent 34900: Socket Thread]: D/nsSocketTransport trying again with next ip address
2023-01-13 13:59:12.270000 UTC - [Parent 34900: Socket Thread]: D/nsSocketTransport nsSocketTransport::PostEvent [this=18ac7691400 type=1 status=0 param=0]
2023-01-13 13:59:12.270000 UTC - [Parent 34900: Socket Thread]: D/nsSocketTransport trying address: 35.167.158.137
2023-01-13 13:59:12.270000 UTC - [Parent 34900: Socket Thread]: D/nsSocketTransport idle [0] { handler=18ac7691400 condition=0 pollflags=6 }
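The hex values in these log lines (status=804b0007, cond=804b000e) are XPCOM nsresult codes. As a minimal sketch, assuming the usual nsresult layout (failure flag in bit 31, module number plus an offset of 0x45 in the next bits, the code in the low 16 bits, and 6 as the network module), they can be decoded like this. The lookup table only covers the two codes seen in this log and is not exhaustive.

```python
# Minimal nsresult decoder for the two codes seen in this log.
# Assumed layout: bit 31 = failure, bits 16..30 = module + 0x45, bits 0..15 = code.
MODULE_BASE_OFFSET = 0x45
MODULE_NETWORK = 6

# Only the network codes that appear in this log; not a complete table.
NETWORK_CODES = {
    7: "STATUS_CONNECTING_TO",
    14: "NS_ERROR_NET_TIMEOUT",
}


def decode_nsresult(value: int) -> tuple[bool, int, int]:
    """Split an nsresult into (is_failure, module, code)."""
    is_failure = bool(value >> 31)
    module = ((value >> 16) & 0x7FFF) - MODULE_BASE_OFFSET
    code = value & 0xFFFF
    return is_failure, module, code


def describe(value: int) -> str:
    """Return a symbolic name when known, else the raw fields."""
    failure, module, code = decode_nsresult(value)
    if module == MODULE_NETWORK and code in NETWORK_CODES:
        return NETWORK_CODES[code]
    return f"module={module} code={code} failure={failure}"


if __name__ == "__main__":
    for raw in ("804b0007", "804b000e"):
        print(raw, "->", describe(int(raw, 16)))
```

Both values decode to module 6 (network), with codes 7 and 14 respectively.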
In the end, the second attempt also times out (again after about 21 seconds), so we resolve phabricator.services.mozilla.com again with the native resolver.
2023-01-13 13:59:33.312000 UTC - [Parent 34900: Socket Thread]: V/nsHttp DnsAndConnectSocket::OnOutputStreamReady [this=18aae864480 ent=phabricator.services.mozilla.com primary]
2023-01-13 13:59:33.312000 UTC - [Parent 34900: Socket Thread]: V/nsHttp failed to connect with TRR enabled, try w/o
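The delay the reporter sees can be read off the timestamps above. This small sketch only does the arithmetic on the three log timestamps (the event labels are paraphrases, not log text). It shows each TRR address costing roughly 21 seconds, so about 42 seconds pass before the native-DNS fallback even starts.

```python
from datetime import datetime

# Timestamps copied from the log lines above (UTC); labels are descriptive only.
FMT = "%Y-%m-%d %H:%M:%S.%f"
EVENTS = [
    ("connect to 54.148.248.183", "2023-01-13 13:58:51.234000"),
    ("retry with 35.167.158.137", "2023-01-13 13:59:12.270000"),
    ("fall back to native DNS", "2023-01-13 13:59:33.312000"),
]


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


if __name__ == "__main__":
    # Time spent on each step before moving to the next one.
    for (label_a, t_a), (label_b, t_b) in zip(EVENTS, EVENTS[1:]):
        print(f"{label_a} -> {label_b}: {elapsed_seconds(t_a, t_b):.3f} s")
    total = elapsed_seconds(EVENTS[0][1], EVENTS[-1][1])
    print(f"total before fallback: {total:.3f} s")
```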
Interestingly, the native resolver returns different addresses than TRR:
2023-01-13 13:59:33.647000 UTC - [Parent 34900: DNS Resolver #1]: D/nsHostResolver Caching host [phabricator.services.mozilla.com] record for 60 seconds (grace 0).
2023-01-13 13:59:33.647000 UTC - [Parent 34900: DNS Resolver #1]: D/nsHostResolver CompleteLookup: phabricator.services.mozilla.com has 52.35.152.109
2023-01-13 13:59:33.647000 UTC - [Parent 34900: DNS Resolver #1]: D/nsHostResolver CompleteLookup: phabricator.services.mozilla.com has 54.202.98.228
2023-01-13 13:59:33.647000 UTC - [Parent 34900: DNS Resolver #1]: D/nsHostResolver CompleteLookup: phabricator.services.mozilla.com has 35.163.76.207
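The sequence in the log (try every TRR address in turn, then retry once with TRR bypassed) can be modelled roughly as follows. This is a simplified sketch, not necko's DnsAndConnectSocket code: resolve_trr, resolve_native and try_connect are hypothetical callbacks standing in for the real resolver and socket logic. The second element of the result says whether native DNS was needed, which is the signal a blocklist would key on.

```python
from typing import Callable, Optional, Sequence


def connect_with_trr_fallback(
    host: str,
    resolve_trr: Callable[[str], Sequence[str]],      # hypothetical TRR lookup
    resolve_native: Callable[[str], Sequence[str]],   # hypothetical native lookup
    try_connect: Callable[[str], bool],               # hypothetical connect attempt
) -> tuple[Optional[str], bool]:
    """Return (address we connected to or None, whether native DNS was used)."""
    # First pass: every address TRR returned, in order.
    for addr in resolve_trr(host):
        if try_connect(addr):
            return addr, False
    # All TRR addresses failed: retry once with TRR bypassed.
    for addr in resolve_native(host):
        if try_connect(addr):
            return addr, True
    return None, True
```

Using the addresses from this log, and assuming only the native ones are reachable:

```python
trr = lambda h: ["54.148.248.183", "35.167.158.137"]
native = lambda h: ["52.35.152.109", "54.202.98.228", "35.163.76.207"]
reachable = {"52.35.152.109", "54.202.98.228", "35.163.76.207"}
connect_with_trr_fallback("phabricator.services.mozilla.com",
                          trr, native, lambda a: a in reachable)
# ("52.35.152.109", True)
```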
It seems that NextDNS might be returning unreachable IP addresses for phabricator.services.mozilla.com.
I assume we don't have a good way to mitigate this case. Valentin, what do you think?
Comment 4 • 2 years ago
So, when connecting to all of the TRR IP addresses fails, we then retry with TRR bypassed.
If that succeeds, I think that's a good indication that we shouldn't use TRR for that domain name.
In OnSocketConnected, if we are falling back to native DNS, then we can add the domain to the temporary domain blocklist.
That way, the next time we try to establish a connection to this domain (if it's within 60 seconds), we won't use TRR, because it would fail anyway. I'm not sure how well this would work in practice, as presumably we shouldn't be connecting to the same host that often.
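The proposed temporary blocklist could look roughly like this. It is a hypothetical sketch: TemporaryTrrBlocklist, add and should_bypass_trr are illustrative names, not Firefox's actual TRR blocklist API. The 60-second expiry matches the window mentioned above, and the clock is injectable so the expiry can be exercised without sleeping.

```python
import time
from typing import Callable, Dict


class TemporaryTrrBlocklist:
    """Hypothetical per-domain TRR bypass list whose entries expire."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expiry: Dict[str, float] = {}

    def add(self, host: str) -> None:
        # Called when a connection only succeeded after falling back to native DNS.
        self._expiry[host] = self._clock() + self._ttl

    def should_bypass_trr(self, host: str) -> bool:
        deadline = self._expiry.get(host)
        if deadline is None:
            return False
        if self._clock() >= deadline:
            # Entry expired: forget it and go back to using TRR.
            del self._expiry[host]
            return False
        return True
```

A fake clock makes the expiry easy to check:

```python
now = [0.0]
blocklist = TemporaryTrrBlocklist(60.0, clock=lambda: now[0])
blocklist.add("phabricator.services.mozilla.com")
now[0] = 59.0   # still blocked: bypass TRR
now[0] = 60.0   # expired: use TRR again
```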
On my machine, NextDNS returns the same IPs as native DNS, but that may not be the case for everyone.
@Mayank, are you using the default NextDNS TRR (https://firefox.dns.nextdns.io/) or a custom one?
I'm wondering whether this is a problem with Phabricator/AWS not accepting connections that go to the wrong zone.
We should also add some telemetry to see how often this happens.
Comment 5 • 2 years ago (Reporter)
(In reply to Valentin Gosu [:valentin] (he/him) from comment #4)
> On my machine, nextDNS returns the same IPs as native DNS, but that may not be the case for everyone.
> @Mayank, are you using the default nextDNS TRR: https://firefox.dns.nextdns.io/ or a custom one?
In my about:config, network.trr.uri=https://firefox.dns.nextdns.io/
Also, I tried to repro just now, and the issue seems to be fixed.
Will attach a log.
Comment 6 • 2 years ago (Reporter)
Comment 7 • 2 years ago
Thanks Mayank. I'll close this as WORKSFORME.
Please reopen if you find it's happening again.