Closed
Bug 1367861
Opened 7 years ago
Closed 7 years ago
Http2Stream doesn't correctly handle nsHttpTransaction::WriteSegments returning NS_BASE_STREAM_WOULD_BLOCK
Categories
(Core :: Networking: HTTP, enhancement, P1)
Core
Networking: HTTP
Tracking
()
RESOLVED
FIXED
mozilla59
Tracking | Status | |
---|---|---|
firefox59 | --- | fixed |
People
(Reporter: mayhemer, Assigned: u408661)
References
Details
(Whiteboard: [necko-triaged])
Attachments
(1 file)
The contract is that when NS_BASE_STREAM_WOULD_BLOCK is returned, the underlying data feeder has to wait until ResumeRecv() is called to restart reading (and feeding the transaction).
See [1] and [2].
This blocks implementation of the throttling algorithm (for h2, at least) and also blocks push back implementation (bug 1280629) for h2.
I am usually lost in how the httpconn-session-stream-trans interconnection works, so hard to say to say how wrong this is. If too complicated too fix, we may find a different way to implement the "stop read" thing for bug 1365307, separately for h1 and h2 :(((
[1] https://dxr.mozilla.org/mozilla-central/rev/f81bcc23d37d7bec48f08b19a9327e93c54d37b5/netwerk/protocol/http/nsHttpTransaction.cpp#822
[2] https://dxr.mozilla.org/mozilla-central/rev/f81bcc23d37d7bec48f08b19a9327e93c54d37b5/netwerk/protocol/http/nsHttpTransaction.cpp#2099
Updated•7 years ago
|
Assignee: nobody → hurley
Whiteboard: [necko-active]
Reporter | ||
Comment 1•7 years ago
|
||
Nick, if you won't be able to fix this in one or two days, please let me know, I have to enable the throttling only for h1 and leave h2 streams unthrottled.
P1 since this would be good to have on 55 for field testing.
Priority: -- → P1
Honza, could you be a bit clearer on what, exactly, you want to have happen here? Currently, when receiving WOULD_BLOCK, the stream will buffer any data it gets without updating the flow control window (so that stream will, as long as we are throttling it, eventually run out of window and not be allowed to send any data from the server). We can't totally stop reading on the connection, as that could potentially end up throttling streams we *don't* want to have throttled. Given that, I'm not seeing what's missing.
Flags: needinfo?(honzab.moz)
Reporter | ||
Comment 3•7 years ago
|
||
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #2)
> Honza, could you be a bit clearer on what, exactly, you want to have happen
> here? Currently, when receiving WOULD_BLOCK, the stream will buffer any data
> it gets without updating the flow control window (so that stream will, as
> long as we are throttling it, eventually run out of window and not be
> allowed to send any data from the server). We can't totally stop reading on
> the connection,
Yes, that's obvious.
> as that could potentially end up throttling streams we
> *don't* want to have throttled. Given that, I'm not seeing what's missing.
I have no idea how to resume reading correctly. For h1 the contract is:
- when nsHttpTransaction::WriteSegments() returns WOULD_BLOCK, stop reading from the socket
- wait for mConnection->ResumeRecv() call by the transaction when there is a space to receive the data again (OnOutputStreamReady from the pipe is called)
This nicely works when throttling is engaged (simulates the recv pipe being full).
In an h2 case it works as following:
- h2 stream still calls WriteSegments even though it returned WOULD_BLOCK and has not called ResumeRecv (that is not a catastrophe, no need to fix it immediately, but would be nice) ; this stops after few iterations, apparently the window for the stream is filled
- but, h2 transaction's mConnection (being Http2Session) is forwarding everything to the underlying nsHttpConnection, so that when I call ResumeRecv() on trans->mConnection, it steals the listener from the sessions and everything breaks
Hence, the thing is that I don't know how to resume reading from an h2 stream correctly.
Note that this WOULD_BLOCK/wait for ResumeRecv() is there regardless throttling! Probably a very unlikely problem with e10s now but definitely a big problem when we fix bug 1280629!
Hence, two motivations to do something about it.
If still not clear: nsHttpTransactions::mConnection->ResumeRecv() should do something else: resume the stream. This is a pay off of the bad design - why is there Http2Session and not Http2Stream as an abstract connection in the transaction?
Blocks: 1280629
Flags: needinfo?(honzab.moz) → needinfo?(hurley)
Reporter | ||
Comment 4•7 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #3)
> - but, h2 transaction's mConnection (being Http2Session) is forwarding
> everything to the underlying nsHttpConnection, so that when I call
> ResumeRecv() on trans->mConnection, it steals the listener from the sessions
> and everything breaks
Actually, I'm not sure what happens when this is done, but definitely the stream doesn't restart data delivery.
OK, so it sounds like the real problem is "how do I restart data delivery once a stream is blocked on flow control?" and the rest is implementation details (to an extent).
Flags: needinfo?(hurley) → needinfo?(honzab.moz)
Reporter | ||
Comment 6•7 years ago
|
||
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #5)
> OK, so it sounds like the real problem is "how do I restart data delivery
> once a stream is blocked on flow control?" and the rest is implementation
> details (to an extent).
If the flow control itself works (never checked) then it seems like to be the only problem, yes.
Flags: needinfo?(honzab.moz)
OK, cool. Do you have patches somewhere that trigger this/steps to repro? (I'm guessing you do, since you filed the bug.) Would be good to be able to see exactly what's going down and verify any fix I come up with.
Flags: needinfo?(honzab.moz)
(In reply to Honza Bambas (:mayhemer) from comment #8)
> See bug 1365307.
Ah, indeed. For some reason that section was collapsed in my view, so it didn't show for me (I was wondering why this didn't seem to block any bugs).
OK, so I think (though haven't yet verified) that you should be able, in the h2 case, to call mConnection->TransactionHasDataToRecv(trans) in addition to (specifically just before) ResumeRecv. That will get the stream with buffered data into a list of streams that have buffered data available, and set it updating the flow control window again, as well. Then the ResumeRecv will cause the appropriate read(s) out of the buffer and get data flowing again.
I'm more than happy to test this assumption. Do you have an easy set of steps to get something throttled that's easy to keep track of? Is it as simple as downloading something in the background while loading a page?
Flags: needinfo?(honzab.moz)
Reporter | ||
Comment 10•7 years ago
|
||
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #9)
> Is it as
> simple as downloading something in the background while loading a page?
Yes :)
Flags: needinfo?(honzab.moz)
Assignee | ||
Comment 11•7 years ago
|
||
So the good news is, TransactionHasDataToRecv will start data delivery again. The bad news (you knew this was coming) is that it's also called from OnOutputStreamReady... so data delivery gets started again no matter what. I'm *guessing* this has to do with the funky handling of WOULD_BLOCK, but I'm not 100% sure. (FTR, ResumeRecv gets called just after TransactionHasDataToRecv in this code path, so this is an issue whether or not TransactionHasDataToRecv is required.)
I'll keep digging at this, but this is looking like it's going to take me longer than your requested timeframe, Honza, just FYI.
Reporter | ||
Updated•7 years ago
|
Comment 12•7 years ago
|
||
This is an assigned P1 bug without activity in two weeks.
If you intend to continue working on this bug for the current release/iteration/sprint, remove the 'stale-bug' keyword.
Otherwise we'll reset the priority of the bug back to '--' on Monday, August 28th.
Keywords: stale-bug
Reporter | ||
Comment 13•7 years ago
|
||
I'll repriotize this now. Note this bug is not limited to throttling and h2 stream. We have no data on how often this is triggered in the wild, tho.
We have a code that disables (mitigates greatly) this issue in the tree when throttling is on, so there is no immediate need.
Priority: P1 → P3
Comment 14•7 years ago
|
||
Bulk priority update: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P3 → P1
Updated•7 years ago
|
Updated•7 years ago
|
Whiteboard: [necko-triaged]
Comment 15•7 years ago
|
||
This blocks bug 1280629, which we're planning to commit to doing in Q1 '18. So bumping this up in priority.
Needinfo just to make sure Nick knows this :)
Flags: needinfo?(hurley)
Priority: P3 → P1
Assignee | ||
Comment 16•7 years ago
|
||
I believe Honza said he would find someone else to work on this (and I would provide guidance on H2-specific bits). Sorry I never took myself off as assignee...
Assignee: hurley → nobody
Flags: needinfo?(hurley) → needinfo?(honzab.moz)
Comment 17•7 years ago
|
||
Get /SomeoneElse
HTTP/1.1 301 Moved Permanently
Location: /hurley
Assignee: nobody → hurley
Assignee | ||
Comment 18•7 years ago
|
||
So... I'm gonna question at this point why this blocks bug 1280629. This bug is not at all e10s-related, it can happen in e10s mode and in single-process mdoe. Do we have an issue with WOULD_BLOCK in h2? Yes. But this issue should be orthogonal to the fact that we don't appropriately do back-pressure on sockets (*any* sockets: h1, h2, ftp, ...) with e10s enabled.
Honza, can you clarify your thinking there? (You're the one that made this block the e10s back-pressure bug.)
I'll still work on this (since it's obvious I don't have a choice), but I want to make sure we all understand *why* this bug is prioritized the way it is.
Reporter | ||
Comment 19•7 years ago
|
||
backpressure will be implemented per-channel. the target thread queue on the child process is the place we pile up. hence, the pressure must be done via the channel IPCing to push on the transaction on the parent, not the socket. in h1 case, it will go down to the socket, yes, in h2, it will definitely not, it will end up on a stream (1-1 with a channel).
hence, obviously I presume the backpressure impl will happen the same way as throttling - we simply stop reading from the pipe above the transaction which will later lead to WriteSegments returning WOULD_BLOCK. when child signals it can read again, we call transaction->ResumeReading(), same mechanism as for throttling.
Flags: needinfo?(honzab.moz)
Reporter | ||
Comment 20•7 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #19)
> backpressure will be implemented per-channel. the target thread queue on
> the child process is the place we pile up. hence, the pressure must be done
> via the channel IPCing to push on the transaction on the parent, not the
> socket. in h1 case, it will go down to the socket, yes, in h2, it will
> definitely not, it will end up on a stream (1-1 with a channel).
>
> hence, obviously I presume the backpressure impl will happen the same way as
> throttling - we simply stop reading from the pipe above the transaction
> which will later lead to WriteSegments returning WOULD_BLOCK. when child
> signals it can read again, we call transaction->ResumeReading(), same
> mechanism as for throttling.
to be a but more precise, when we stop reading the pipe, and WriteSegments has no space to write, it switches to a waiting mode ON THE PIPE (via outputStream->asyncWait()) and we restart read from the h2 stream (or h1 connection) when there is a space in the pipe to write to (OnOutputStreamReady()).
this code precedes throttling by ten years, it has always been there, but introduction of h2, as it was implemented, broke that for h2.
Assignee | ||
Comment 21•7 years ago
|
||
So... I spent the better part of two weeks prior to the holidays going back and forth (and back and forth, and back and forth, and ...) in an rr trace of downloading a large (~2GB) resource over h2 while loading a page (that loads very slowly, to exaggerate effects of throttling). I saw absolutely zero indication that h2 somehow "broke" throttling. To be sure, I extended the pause time to 3s. (Along with making a few code changes to ensure that my h2 stream was eligible for throttling.)
During the initial run (with UI), I noticed regular 3s pauses in the download, so that was a good sign there.
Furthermore, during the throttled portion of my run in rr, I traced back from every h2 WriteSegments on the throttled stream/transaction, and every single one could be directly traced to the throttling code deciding it was time to allow more data through on the transaction. (In case you're wondering, yes, I went a bit cross-eyed staring at this stuff over and over again. The long break seems to have remedied that.)
As such, Honza, unless you can provide proof that I've somehow missed something (and throttling is, in fact, broken in h2), I think all we need to do here is remove the "activated as h2" check in the transaction, and let the throttling code do its thing.
Updated•7 years ago
|
Flags: needinfo?(honzab.moz)
Reporter | ||
Comment 22•7 years ago
|
||
Nick, how is then the stream read restarted when we only call ResumeRead on the raw connection?
Or is the result of your findings that we actually don't need to resume a throttled stream at all - that it just calls WriteSegments of the transaction on and on in a loop until it reads something? If so, how is that call triggered exactly?
I remember very clearly that an h2 session broke down (nothing going in or out) after we called ResumeRead on the underlying connection. That might have been an unrelated h2 bug that has been fixed in the meantime, tho!
Anyway, thanks for your effort Nick!
Flags: needinfo?(honzab.moz) → needinfo?(hurley)
Assignee | ||
Comment 23•7 years ago
|
||
Sorry, that was the one thing I forgot to mention - I also added a call to mConnection->TransactionHasDataToRecv in ResumeRead (the same spot where we call mConnection->ResumeRecv). That's what tells the h2 session to start delivering data to the transaction again (instead of buffering it).
Flags: needinfo?(hurley)
Reporter | ||
Comment 24•7 years ago
|
||
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #23)
> Sorry, that was the one thing I forgot to mention - I also added a call to
> mConnection->TransactionHasDataToRecv in ResumeRead (the same spot where we
> call mConnection->ResumeRecv). That's what tells the h2 session to start
> delivering data to the transaction again (instead of buffering it).
I see, so that is the patch we need to make it work then? Could you submit a patch so we can resolve/fix this bug with it?
Thanks.
Flags: needinfo?(hurley)
Assignee | ||
Comment 25•7 years ago
|
||
Sorry, saw this in email, then promptly forgot about it in favor of other things. I'll get the patch cleaned up and ready tomorrow.
Flags: needinfo?(hurley)
Comment hidden (mozreview-request) |
Reporter | ||
Comment 27•7 years ago
|
||
mozreview-review |
Comment on attachment 8941893 [details]
Bug 1367861 - enable throttling for http/2.
https://reviewboard.mozilla.org/r/212100/#review217852
thanks!
Attachment #8941893 -
Flags: review?(honzab.moz) → review+
Reporter | ||
Comment 28•7 years ago
|
||
please update the commit message that this is not just about enabling it, but also adding the missing TransactionHasDataToRead() line that cause all the pain. thanks.
Assignee | ||
Comment 29•7 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #28)
> please update the commit message that this is not just about enabling it,
> but also adding the missing TransactionHasDataToRead() line that cause all
> the pain. thanks.
will do!
Reporter | ||
Comment 30•7 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #28)
> adding the missing TransactionHasDataToRead() line that cause all
> the pain.
*caused*
Comment hidden (mozreview-request) |
Comment 32•7 years ago
|
||
Pushed by hurley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/06d10d09e6ee
enable throttling for http/2. r=mayhemer
Comment 33•7 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox59:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
You need to log in
before you can comment on or make changes to this bug.
Description
•