Open Bug 1738102 Opened 3 years ago Updated 3 years ago

Firefox does not send any acknowledgements in entire QUIC (HTTP/3) connections with early data enabled

Categories

(Core :: Networking, defect, P2)

Tracking

()

People

(Reporter: fayang, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged])

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36

Actual results:

We (the QUIC team at Google) have observed a large number of cases where QUIC connections from Firefox with early data enabled do not send any acknowledgements. When this happens, the server eventually runs out of congestion window (after sending handshake packets and the response to the early data), and the connection gets stuck in the handshake phase, with Firefox sending PINGs at encryption level INITIAL forever.

These are not lost acknowledgements, since the packet numbers are contiguous within the same packet number space. Although this could be because the client does not receive or process any packets from the server, infinite PINGs are not expected.
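The contiguity argument can be sketched as a small check over a server-side trace. This is only an illustration: the `ReceivedPacket` record and the frame-type strings are hypothetical names, not taken from any real trace format. It relies on packet numbers in each packet number space starting at 0, so a gap-free sequence means no client packet went missing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReceivedPacket:
    """One client packet seen by the server in a single packet number space."""
    packet_number: int
    frame_types: List[str]  # hypothetical labels, e.g. ["CRYPTO"], ["PING"], ["ACK"]


def missing_packet_numbers(packets: List[ReceivedPacket]) -> List[int]:
    """Return packet numbers that were skipped, counting from 0."""
    seen = {p.packet_number for p in packets}
    if not seen:
        return []
    return [pn for pn in range(0, max(seen) + 1) if pn not in seen]


def acks_were_never_sent(packets: List[ReceivedPacket]) -> bool:
    """True when no packet is missing and none of them carried an ACK frame.

    With no gaps, a lost packet cannot explain the absence of ACKs.
    """
    no_gaps = not missing_packet_numbers(packets)
    no_ack = all("ACK" not in p.frame_types for p in packets)
    return no_gaps and no_ack
```

If the trace had a gap, the missing packet might have carried the ACK, and the check would not be conclusive.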

Thank you very much!

The Bugbug bot thinks this bug should belong to the 'Core::Networking' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Networking
Product: Firefox → Core
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Unspecified → All
Hardware: Unspecified → All
Version: unspecified → Trunk

The behaviour described here is reminiscent of the client not receiving Initial packets from the server. However, we shouldn't be sending only PING, there should also be CRYPTO frames if the server hasn't acknowledged our CRYPTO frames. Of course, it is entirely possible that the server is sending only part of a TLS ServerHello, such that we are unable to generate handshake keys. In that case, we'll keep pinging (because the spec tells us to do that) until the server gets around to doing that.
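As a rough sketch of the probing behaviour described above (my reading of the probe timeout rules in RFC 9002, not Firefox's actual code): while the handshake is unconfirmed the client keeps probing, at the Handshake level if it has Handshake keys and at the Initial level otherwise, and it retransmits CRYPTO data when some is still unacknowledged, falling back to PING when there is none. The function name and the `unacked_crypto` map are hypothetical.

```python
from typing import Dict, Tuple


def choose_handshake_probe(
    has_handshake_keys: bool,
    unacked_crypto: Dict[str, int],
) -> Tuple[str, str]:
    """Pick the encryption level and frame type for a probe on PTO expiry.

    unacked_crypto maps an encryption level name to the number of CRYPTO
    bytes sent at that level that have not been acknowledged yet.
    """
    # Without Handshake keys the client can only probe at the Initial level.
    level = "Handshake" if has_handshake_keys else "Initial"
    # Prefer retransmitting outstanding CRYPTO data; PING is the fallback
    # when there is nothing left to retransmit at that level.
    frame = "CRYPTO" if unacked_crypto.get(level, 0) > 0 else "PING"
    return level, frame
```

Under this model, a client whose ClientHello has been acknowledged but which never obtains Handshake keys ends up with `("Initial", "PING")` on every timeout, which matches the pattern in the report.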

Are you able to share a trace or wireshark that shows this happening? That doesn't need to be shared publicly if you are concerned about how that data might be treated.

Flags: needinfo?(fayang)

Thanks, Martin!

In the observed traces, the server does indeed ACK the CHLO (depending on whether ticket decryption is sync or async, the ACK may or may not be in the same packet as the server INITIAL). The server INITIAL should be relatively small, and it gets coalesced with HANDSHAKE (no cert, given this is a resumption connection) and the FORWARD_SECURE ticket. As you said, the client likely receives only the server ACKs and is not able to derive HANDSHAKE keys.

After the client starts its infinite PINGs, the server keeps sending ACKs. Oddly, the server does not bundle the server INITIAL with those ACKs, although our implementation is supposed to do so: https://source.chromium.org/chromium/chromium/src/+/main:net/third_party/quiche/src/quic/core/quic_connection.cc;drc=6f580e900d2531ff57acd8f4bc214f7e54df5815;l=5862 (it acts as if the server INITIAL had been acknowledged).

I am happy to share the trace. What is the preferred way of sharing it? Thanks again!
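To illustrate why receiving only ACKs leaves the client stuck, here is a minimal sketch of the condition it is waiting on: the Initial-level CRYPTO stream must be contiguous from offset 0 through the end of the ServerHello before Handshake keys can be derived. The function names and the `server_hello_len` parameter are assumptions for illustration; a real stack learns the message length from the TLS handshake header.

```python
from typing import List, Tuple


def contiguous_crypto_bytes(frames: List[Tuple[int, int]]) -> int:
    """Return how many bytes are available contiguously from offset 0.

    frames is a list of (offset, length) pairs for received CRYPTO frames.
    """
    end = 0
    for offset, length in sorted(frames):
        if offset > end:
            break  # a hole in the stream; later data cannot be consumed yet
        end = max(end, offset + length)
    return end


def can_derive_handshake_keys(
    frames: List[Tuple[int, int]], server_hello_len: int
) -> bool:
    """True once the whole ServerHello has arrived in order."""
    return contiguous_crypto_bytes(frames) >= server_hello_len
```

With no CRYPTO frames at all (only ACKs), or with only a later part of the ServerHello, the check stays false and the coalesced HANDSHAKE packets cannot be decrypted.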

Flags: needinfo?(fayang)

That sounds like the behaviour we expect from your implementation (similar to TLS/TCP), so that shouldn't be a problem. If you are not retransmitting the CRYPTO frames from the server Initial, that might be a problem. The only reason you wouldn't need to do that is if you got an ACK, but then we should have everything we need to proceed.

I don't understand your code very well, but could it be that this is dropping out of the retransmission because you are also retransmitting at other encryption levels/packet number spaces? https://source.chromium.org/chromium/chromium/src/+/main:net/third_party/quiche/src/quic/core/quic_connection.cc;l=5882-5885;drc=6f580e900d2531ff57acd8f4bc214f7e54df5815 That doesn't seem to be a likely cause, but I know that this is a complicated piece of the stack.
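The rule being discussed can be sketched as follows. This is a hypothetical model, not the quiche implementation: the `CryptoChunk` record and the string frame labels are made up for illustration. The point is that Initial CRYPTO data is dropped from retransmission only once it has been acknowledged, regardless of what is being retransmitted in other packet number spaces.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CryptoChunk:
    """A piece of server Initial CRYPTO data that was sent earlier."""
    offset: int
    length: int
    acked: bool = False


def frames_for_initial_ack(
    largest_received: int, sent_chunks: List[CryptoChunk]
) -> List[str]:
    """Build the frame list for an Initial packet that acknowledges the client.

    Any CRYPTO data the client has not acknowledged is bundled with the ACK.
    """
    frames = [f"ACK(largest={largest_received})"]
    for chunk in sent_chunks:
        if not chunk.acked:
            frames.append(f"CRYPTO(offset={chunk.offset}, len={chunk.length})")
    return frames
```

In the reported traces the client sends no ACKs, so under this model every server ACK would still carry the ServerHello bytes.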

If you are happy sharing a trace publicly, Bugzilla attachments are ideal. Otherwise, you can just email them to me. I have the same email at mozilla.com as well. Thanks for helping work through this.

Severity: -- → S3
Priority: -- → P2
Whiteboard: [necko-triaged]

FYI, I am not sure whether this is related, but Bug 1735864 concerns a 59-second TLS setup delay (a Core > Security or Core > Networking issue), and it is suspected there that Thunderbird and Firefox recently changed the way they handle the TLS handshake. See Bug 1735864 Comment 64, which lists handshake-related patches recently pushed to Firefox > Core > Security. Those are the only clues so far, but they may help resolve this bug. Just thought I would let you know.
