Closed Bug 932183 Opened 11 years ago Closed 11 years ago

mozTCPSocket flow control features unreliable in multi-process; bufferedAmount updates infrequently, drain event semantics not multi-process safe

Categories

(Core :: DOM: Core & HTML, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla28

People

(Reporter: asuth, Assigned: kk1fff)

References

Details

Attachments

(3 files, 7 obsolete files)

(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
Still not sure what component is actually best for this. Here's the key points: - bufferedAmount in the child exclusively updates as a result of 'open', 'drain', 'data', 'error', and 'close' events. This is because only updateReadyStateAndBuffered updates it and it is only triggered by by TCPSocketChild::RecvCallback. This means that in the case where TCPSocket is being used for bulk sends, the child may potentially never update bufferedAmount and the value may also latch at some high level and never decrease unless drain triggers or data is received. - drain-semantics computations are computed independently in the parent and the child. The child uses its rarely updated bufferedAmount for its decision as to whether to have send() return indicate that drain will fire. The parent uses the actual amount of data in the multiplex stream as the buffered amount. This is potentially dangerous because "drain" is only fired once sensitized in the parent and per the previous point, bufferedAmount in the child is not particularly reliable. - ondrain only ever fires when the send buffer is completely empty, meaning it can only ever cause bufferedAmount to go to 0. This isn't really a problem on its own, although using a water mark instead would be much better. So an example of an exciting failure condition is one where the child's bufferedAmount latches at something like 34k because it received some data when there was that much buffered data. The child then goes to send 32k, so send() claims that the caller should stop sending and a drain event will be emitted. But the parent knows the buffer is empty and so it decides that 32k data does not justify a drain event. So it sends the data, drain never fires, things deadlock. Another example of an exciting failure condition is that when bulk sending with no acks from the other side, if you only send amounts smaller than the buffer threshold of 64 KiB, then you will never be told by the child process that you should stop sending in your calls to send, and the sending code can't use bufferedAmount to check either, since it will also only be 0. The result is that the sending code will send as fast as it physically can, defeating the whole flow control thing. (The parent side will generate drain events in this case, but again, only once we hit zero bytes buffered, which means basically nothing.) This is notable because in bug 871897 we are trying to use mozTCPSocket for streaming and I noticed we were never waiting for a drain event. (I had somehow thought _bufferedAmount would be locally updated; don't know why.) I've changed our implementation to only assume drain events will be fired when we write a block >65536 bytes in size and having our chunks be larger than 64k. This sorta works; we eventually hang, but I'm seeing some other weird connection problems too, so it's possible something legitimately weird is happening elsewhere in the stack (not all the data was getting to the SMTP server even when we weren't waiting for drain events). I'll look into that more tomorrow. I've tentatively marked this as a blocker of bug 871897 for tracking purposes and because having our implementation depend on the secret BUFFER_SIZE constant in TCPSocket.js is not a good idea and could lead to breakage when this bug is legitimately fixed. That bug is currently koi+, but I think we're realistically going to need to move it to 1.3 given the other platform issues we've encountered and how late we are in the koi cycle.
Blocks: 871897
According to the spec, when send() returns false, caller should wait ondrain. I think we could let child control the ondrain event in IPC case: If caller sends data too fast and make parent reach the limit, parent sends a message to child and make child return false in next send(). When the child received the message that indicates parent's buffer is too large and it did return false in the next send() call before ondrain, it fires ondrain when it received an ondrain from parent. If the ondrain from parent is fired before next send() call, child can just eat the event, since the caller is not expecting an ondrain event.
Yeah, that sounds much better! It would be great if bufferedAmount could be a little more useful too. Although given the inherent asynchronous nature of updates to bufferedAmount, I think anyone using bufferedAmount would need to be assured that a 'drain' event will *always* be dispatched when the amount of data in the buffer goes to zero and/or falls below a threshold, not just when it has previously gone above an arbitrary and unknown threshold and then come back down below it. Alternatively, bufferedAmount needs to be documented as being an unreliable value that MUST NOT be used for decisions on when to send more data, instead only relying on the send() return value and drain events when so indicated by the send return value. (And so bufferedAmount should only be used for informational debug logging or something. But really it seems dangerous to have around without very explicit semantics and reliability guarantees.)
I am a little bit unclear about the meaning of 'buffer' in the spec. It looks once send() is called successfully, bufferedAmount increases. Does this mean 'drain' event should be dispatched every time when pending data are all sent no matter if pending data has ever been larger then the threshold (64k)?
Comment #1 sounds correct to me.
Saw I missed some IRC discussion. For the correctness of bufferedAmount in multi-process, I would suggest we either get rid of bufferedAmount entirely or do the following thing to try and maximize its accuracy without sending gratuitous messages to the child from the parent: - Maintain a "tracking" number on the child that we one-up whenever we do something where we want to make sure the parent has heard our most recent async requests when we process incoming messages. - In the child, increment bufferedAmount by the size of the buffer when send() is called. Increment the tracking number prior to calling send, ensuring the tracking number accompanies the send data. - Update bufferedAmount in the child when a message from the parent includes a new buffered amount size and the tracking number is the same as our current tracking number. Otherwise, ignore it. This could temporarily lead to bufferedAmount being higher than it should be, but avoids bufferedAmount ever being lower than it should be. - Have the parent send a message updating bufferedAmount (with the current tracking number) whenever a buffer 'completes' in the multiplex stream stream copier. In the case it was the last stream and ondrain is being issued, just the ondrain notification can be sent instead. Patrick, are you available to work on fixing the send()/ondrain semantics in multiprocess mode this week, or should I try and prepare a fix? I'm busy, but we really do want to land the streaming fix in e-mail with the right fix implemented. If I should pursue the fix, what type of unit tests would you want to see? Thanks.
Flags: needinfo?(pwang)
Andrew, either getting rid of bufferedAmount or your fix sound correct to me.
(In reply to Andrew Sutherland (:asuth) from comment #5) > Patrick, are you available to work on fixing the send()/ondrain semantics in multiprocess mode this week I can upload a patch for this bug. Actaully I have a wip, but I am trying to clearify what the correct behavior is.
Flags: needinfo?(pwang)
(In reply to Patrick Wang [:kk1fff] from comment #7) > (In reply to Andrew Sutherland (:asuth) from comment #5) > I can upload a patch for this bug. Actaully I have a wip, but I am trying to > clearify what the correct behavior is. If you have a wip that fixes the send() behaviour in the client, I'd appreciate it so I can move ahead with realistic on-device testing. Thanks!
Assignee: nobody → pwang
Status: NEW → ASSIGNED
Attached patch Proposed patch (obsolete) (deleted) — Splinter Review
Update TCPSocket child's bufferedAmount when parent sent a stream to make bufferedAmount more accurate, and let child control whether ondrain should be dispatched when bufferedAmount is updated to 0, to ensure ondrain is dispatched when send() return false and is not dispatched when send() haven't return false. Tracking number in comment 4 looks a good idea to solve the case that send() is called when parent is updating bufferedAmount. Thanks, Andrew! Test cases haven't been done yet. I think we would need to control the bufferedAmount of child to simulate the case that bufferedAmounts are inconsistency between in parent and in child. What do you think, Honza? Thanks!
Attachment #828036 - Flags: feedback?(honzab.moz)
Attached patch Test case (obsolete) (deleted) — Splinter Review
Attachment #828503 - Flags: review?(honzab.moz)
Attachment #828036 - Flags: feedback?(honzab.moz) → review?(honzab.moz)
Comment on attachment 828036 [details] [diff] [review] Proposed patch Review of attachment 828036 [details] [diff] [review]: ----------------------------------------------------------------- General objection: I'm reviewing changes in this code for at least 3rd or 4th time and as always I have to rediscover how this horrible IPC machinery works at all again. It's because the code generally lacks comments of what is what and what is called by what, interfaces, classes and method are very badly named. And I can see new methods added w/o a proper explanatory comments being added, again! It's very hard to review this code. Technically the code seems to be correct but I cannot say this review would be finished completely. It's so complicated to do reviews of changes to this code... I would really appreciate some overall names and comments improvement to make this code more transparent. If you can at least improve this patch, I can look at it again to get it to the tree, since it's kinda critical. ::: dom/network/interfaces/nsIDOMTCPSocket.idl @@ +232,5 @@ > + // Update the DOM object's readyState. > + void updateReadyState(in DOMString readyState); > + > + // Update the DOM object's bufferedAmount value with a tracking number to > + // ensure the update request is sent after child's send() invocation. What are the arguments? It needs to be documented in this comment. @@ +265,5 @@ > // Set App ID. > void setAppId(in unsigned long appId); > + > + // Set a callback that handles the request from TCP socket parent when socket > + // parent wants to notify that its bufferAmount is updated. from _a_ TCP socket parent when _that_ socket parent buffer_ed_Amount _has been_ updated (If I understand the comment right) ::: dom/network/interfaces/nsITCPSocketParent.idl @@ +9,5 @@ > interface nsITCPServerSocketParent; > interface nsITCPSocketIntermediary; > > // Interface required to allow the TCP socket object in the parent process > // to talk to the parent IPC actor Can you please add to this comment in what language and by what class name this interface is implemented? @@ +20,5 @@ > // argument of |data|, and update the child's readyState and bufferedAmount values > // with the given values. > [implicit_jscontext] void sendCallback(in DOMString type, > in jsval data, > + in DOMString readyState); This must be renamed to sendReadyState or updateReadtStateOnChild or so, to make it as much as possible clear what this method is doing. @@ +40,4 @@ > }; > > // Intermediate class to handle sending multiple possible data types > // and kicking off the chrome process socket object's connection. Can you please add to this comment in what language and by what class name this interface is implemented? Best would be to add this more detailed note to every interface around the TCP socket implementation... not just in this file... ::: dom/network/src/PTCPSocket.ipdl @@ +43,5 @@ > Close(); > > child: > + Callback(nsString type, CallbackData data, nsString readyState); > + UpdateBufferedAmount(uint32_t bufferedAmount, uint32_t trackingNumber); Comments, comments! Actually, as part of these changes, please add proper and good comments on all classes and their methods and members in this file. Thanks. ::: dom/network/src/TCPSocket.js @@ +456,5 @@ > + > + sendFromChild: function(data, byteOffset, byteLength, trackingNumber) { > + this._trackingNumber = trackingNumber; > + this.send(data, byteOffset, byteLength); > + }, comments! @@ +656,5 @@ > + if (!bufferNotFull) { > + // This make us return false, we should dispatch ondrain after buffered > + // data are all sent. > + this._waitingForDrain = true; > + } Why don't you move [1] above this child specific early return? You are duplicating a code this way. [1] http://mxr.mozilla.org/mozilla-central/source/dom/network/src/TCPSocket.js#636 ::: dom/network/src/TCPSocketParent.cpp @@ +34,5 @@ > { > mozilla::unused << > aActor->SendCallback(NS_LITERAL_STRING("onerror"), > TCPError(NS_LITERAL_STRING("InvalidStateError")), > + NS_LITERAL_STRING("connecting")); belongs this change to this bug? ::: dom/network/src/TCPSocketParentIntermediary.js @@ +88,5 @@ > }, > > + sendString: function(aData, aTrackingNumber) { > + let socketInternal = this._socket.QueryInterface(Ci.nsITCPSocketInternal); > + return socketInternal.sendFromChild(aData, null, null, 0, 0 instead null, null ? @@ +95,5 @@ > > + sendArrayBuffer: function(aData, aTrackingNumber) { > + let socketInternal = this._socket.QueryInterface(Ci.nsITCPSocketInternal); > + return socketInternal.sendFromChild(aData, 0, aData.byteLength, > + aTrackingNumber); indent @@ +100,1 @@ > }, For instance, here names of these methods are completely misleading. sendString should be renamed to recvSendString, sendArrayBuffer to recvSendArrayBuffer and sendFromChild to recvSend or so. Other possibility is to use 'on' prefix to make it clear that these are invoked remotely as a callback and that are not actually used to send out data (not directly called).
Attachment #828036 - Flags: review?(honzab.moz) → review-
Comment on attachment 828503 [details] [diff] [review] Test case Review of attachment 828503 [details] [diff] [review]: ----------------------------------------------------------------- r=honzab, try to address the comments. ::: dom/network/tests/unit/test_tcpsocket.js @@ +510,5 @@ > +// 1. set bufferedAmount of content socket to a value that will make next > +// send() call return false. > +// 2. send a small data to make send() return false, but it won't make > +// parent buffer. > +// 3. we should get a ondrain before success. an ondrain success of what? @@ +513,5 @@ > +// parent buffer. > +// 3. we should get a ondrain before success. > +function childbuffered() { > + let yays = makeJointSuccess(['ondrain', 'serverdata', > + 'clientclose', 'serverclose']); I never remember how this 'yay' stuff works... @@ +534,5 @@ > + server.onclose = yays.serverclose; > +} > + > +// Test child behavior when child doesn't think it's buffering when parent > +// buffered. The sentence doesn't make much sense to me. @@ +545,5 @@ > +function childnotbuffered() { > + let yays = makeJointSuccess(['serverdata', 'clientclose', 'serverclose']); > + server.ondata = makeExpectData('ondata', BIG_ARRAY, false, yays.serverdata); > + if (sock.send(BIG_ARRAY_BUFFER)) { > + throw new Error("sock.send(BIG_TYPED_ARRAY) did not return false to indicate buffering"); for curiosity: why not do_throw?
Attachment #828503 - Flags: review?(honzab.moz) → review+
Sorry for making this patch hard to read. I will improve the naming and the comments.
Attached patch Proposed patch v2 (obsolete) (deleted) — Splinter Review
Try to make function name and comment easier to read. * In nsITCPSocketChild, add 'send' as a prefix of open/send/resume/suspend/close/startTLS functions to indicate that the actual action is to send request to parent. * In nsITCPSocketParent, change name of sendCallback to sendEvent, as this function is to forward the event from parent. * In nsITCPSocketIntermediary, use recvSendString and recvSendArrayBuffer to replace sendString and sendArrayBuffer. * Add comments to IPDL.
Attachment #828036 - Attachment is obsolete: true
Attachment #830239 - Flags: review?(honzab.moz)
Separate previous patch into two parts. This patch is changing function name and adding comments.
Attachment #830239 - Attachment is obsolete: true
Attachment #830239 - Flags: review?(honzab.moz)
Attachment #830276 - Attachment is obsolete: true
Attachment #830293 - Flags: review?(honzab.moz)
Attachment #830294 - Flags: review?(honzab.moz)
(In reply to Honza Bambas (:mayhemer) from comment #12) > > + server.onclose = yays.serverclose; > > +} > > + > > +// Test child behavior when child doesn't think it's buffering when parent > > +// buffered. > > The sentence doesn't make much sense to me. How about: "Test child's behavior when send() of child return true but parent buffers data"? > > @@ +545,5 @@ > > +function childnotbuffered() { > > + let yays = makeJointSuccess(['serverdata', 'clientclose', 'serverclose']); > > + server.ondata = makeExpectData('ondata', BIG_ARRAY, false, yays.serverdata); > > + if (sock.send(BIG_ARRAY_BUFFER)) { > > + throw new Error("sock.send(BIG_TYPED_ARRAY) did not return false to indicate buffering"); > > for curiosity: why not do_throw? Well.. this is copied from other place in this file. do_throw looks more suitable here, I will change to do_throw.
Comment on attachment 830293 [details] [diff] [review] Part 1: rename function and add comment to make TCPSocket IPC code more readable Review of attachment 830293 [details] [diff] [review]: ----------------------------------------------------------------- Like this! Thanks for this patch. Makes things clearer. r=honzab ::: dom/network/interfaces/nsITCPSocketChild.idl @@ +6,5 @@ > > interface nsITCPSocketInternal; > interface nsIDOMWindow; > > // Interface to allow the content process socket to reach the IPC bridge. AFAIK, this is implemented in C++, right? Can we add a note here: // Implemented in C++ as TCPSocketChild, referenced as _socketBridge in TCPSocket.js (Please correct me if I'm wrong). ::: dom/network/interfaces/nsITCPSocketParent.idl @@ +47,5 @@ > > // Intermediate class to handle sending multiple possible data types > // and kicking off the chrome process socket object's connection. > +// This interface is the bridge of TCPSocketParent, which is written in C++, > +// and TCPSocket, which is written in Javascript. And can we add a note where is nsITCPSocketIntermediary (i.e. this interface) implemented? Language and file? @@ +61,5 @@ > nsIDOMTCPServerSocket listen(in nsITCPServerSocketParent parent, > in unsigned short port, in unsigned short backlog, > in DOMString binaryType); > > + // Called when received a request to send string from child. maybe: Called when received a child request to send a string. up to you.. @@ +62,5 @@ > in unsigned short port, in unsigned short backlog, > in DOMString binaryType); > > + // Called when received a request to send string from child. > + void recvSendString(in DOMString data); I'd little bit more tend to call this onRecvSend*. But that is entirely up to you, it might be just too much prefixing... ::: dom/network/src/PTCPSocket.ipdl @@ +55,4 @@ > Close(); > > child: > + // Forward events that is dispatched by parent. that ARE dispatched .. back to the child.
Attachment #830293 - Flags: review?(honzab.moz) → review+
Comment on attachment 830294 [details] [diff] [review] Part 2: update child's bufferedAmount when parent sent a stream. Review of attachment 830294 [details] [diff] [review]: ----------------------------------------------------------------- r=honzab with comments addressed This is much better, thanks. Please push to try, I leave reasonably throughout testing before landing up to you. ::: dom/network/interfaces/nsIDOMTCPSocket.idl @@ +212,5 @@ > > /* > * Internal interfaces for use in cross-process socket implementation. > * Needed to account for multiple possible types that can be provided to > * the socket callbacks as arguments. Here notes about where it's implemented and referenced would be good too. @@ +281,5 @@ > + // @param trackingNumber > + // To ensure the request to update bufferedAmount in child is after > + // lastest send() invocation from child. > + void sendFromChild(in jsval data, in unsigned long byteOffset, > + in unsigned long byteLength, in unsigned long trackingNumber); should be onRecvSendFromChild or so. ::: dom/network/interfaces/nsITCPSocketParent.idl @@ +50,5 @@ > + // The new value of bufferedAmount that is going to be set to child's > + // bufferedAmount. > + // @param trackingNumber > + // Parent's current tracking number. It's to ensure the bufferedAmount > + // is updated after lastest send() call. Parent's current tracking number, reflecting the number of calls to send() on the child process. This number is sent back to the child to make sure the bufferedAmount updated on the child will correspond to the latest call of send(). ::: dom/network/src/PTCPSocket.ipdl @@ +39,5 @@ > // is expanded to |useSSL| (from TCPOptions.useSecureTransport) and > // |binaryType| (from TCPOption.binaryType). > Open(nsString host, uint16_t port, bool useSSL, nsString binaryType); > > + // When child received data (send() is called), this message requrests When child received data (send() is called) .. sounds weird to me, should be When child's send() is called .. ? @@ +61,5 @@ > + Callback(nsString type, CallbackData data, nsString readyState); > + > + // Update child's bufferedAmount when parent's bufferedAmount is updated. > + // trackingNumber is also passed back to child to ensure the bufferedAmount > + // is updated after lastest send() invocation. is corresponding the last call to send(). ::: dom/network/src/TCPSocket.js @@ +290,5 @@ > return; > } > } > > + // If we have callback to update bufferedAmount, we let child to have a callback generally better phrased comment would be good. @@ +394,5 @@ > callListenerVoid: function ts_callListenerVoid(type) { > this.callListener(type); > }, > > + updateReadyState: function ts_updateReadyState(readyState) { 'Who calls this' note would be good Worth a check we are on child? @@ +408,5 @@ > + if (this._waitingForDrain) { > + this._waitingForDrain = false; > + this.callListener("drain"); > + } > + } If not a big problem, logs would be good (for the |trackingNumber != this._trackingNumber| case and for |bufferedAmount != 0| case.) @@ +450,5 @@ > + this._onUpdateBufferedAmount = aFunction; > + } else { > + throw new Error("only function can be passed to " + > + "setOnUpdateBufferedAmountHandler"); > + } One concern (no need to fix it in this bug): I'm a bit afraid some developers (in the content scope) could exploit this to track the bufferedAmount changes and wipe the _onUpdateBufferedAmount callback. Are we somehow protected against that? @@ +455,5 @@ > + }, > + > + /** > + * Handle the requst of sending data and update trackingNumber from > + * child. add who is calling this @@ +654,5 @@ > let length = this._binaryType === "arraybuffer" ? byteLength : data.length; > + let newBufferedAmount = this.bufferedAmount + length; > + let bufferNotFull = newBufferedAmount < BUFFER_SIZE; > + > + if (newBufferedAmount >= BUFFER_SIZE) { why not to use !bufferNotFull ? BTW, bufferFull (and semantic inversion) would be IMO better ;) Double negations are always bad. @@ +658,5 @@ > + if (newBufferedAmount >= BUFFER_SIZE) { > + // If we buffered more than some arbitrary amount of data, > + // (65535 right now) we should tell the caller so they can > + // wait until ondrain is called if they so desire. Once all the > + //buffered data has been written to the socket, ondrain is // buffered ::: dom/network/src/TCPSocketChild.cpp @@ +208,5 @@ > JSString* jsstr = aData.toString(); > nsDependentJSString str; > bool ok = str.init(aCx, jsstr); > NS_ENSURE_TRUE(ok, NS_ERROR_FAILURE); > + SendData(str, aTrackingNumber); Should the !aData.isString() case bellow be changed too..? ::: dom/network/src/TCPSocketParentIntermediary.js @@ +19,5 @@ > this._socket = socket; > > // Create handlers for every possible callback that attempt to trigger > // corresponding callbacks on the child object. > + ["open", "data", "error", "close"].forEach( maybe add a comment why drain isn't here. ::: dom/network/tests/unit/test_tcpsocket.js @@ +448,5 @@ > + function serverSideCallback() { > + yays.ondata(); > + ondataCalled = true; > + maybeSendNextData(); > + } Not sure... but won't this all code send the data 3 times to the server?
Attachment #830294 - Flags: review?(honzab.moz) → review+
(In reply to Honza Bambas (:mayhemer) from comment #21) > @@ +450,5 @@ > > + this._onUpdateBufferedAmount = aFunction; > > + } else { > > + throw new Error("only function can be passed to " + > > + "setOnUpdateBufferedAmountHandler"); > > + } > > One concern (no need to fix it in this bug): I'm a bit afraid some > developers (in the content scope) could exploit this to track the > bufferedAmount changes and wipe the _onUpdateBufferedAmount callback. Are > we somehow protected against that? _onUpdateBufferedAmount is set by setOnUpdateBufferedAmountHandler, which is a method of nsITCPSocketInternal and is not exposed to content scope. > > @@ +658,5 @@ > > + if (newBufferedAmount >= BUFFER_SIZE) { > > + // If we buffered more than some arbitrary amount of data, > > + // (65535 right now) we should tell the caller so they can > > + // wait until ondrain is called if they so desire. Once all the > > + //buffered data has been written to the socket, ondrain is > > // buffered > > ::: dom/network/src/TCPSocketChild.cpp > @@ +208,5 @@ > > JSString* jsstr = aData.toString(); > > nsDependentJSString str; > > bool ok = str.init(aCx, jsstr); > > NS_ENSURE_TRUE(ok, NS_ERROR_FAILURE); > > + SendData(str, aTrackingNumber); > > Should the !aData.isString() case bellow be changed too..? Ah. I mixed this change into previous part, so it is not appeared in this patch. I will update this to make the patch separeted clear. > ::: dom/network/tests/unit/test_tcpsocket.js > @@ +448,5 @@ > > + function serverSideCallback() { > > + yays.ondata(); > > + ondataCalled = true; > > + maybeSendNextData(); > > + } > > Not sure... but won't this all code send the data 3 times to the server? maybeSendNextData will check if both |clientOndrain| and |serverSideCallback| are called before send the second data buffer, so when the first time maybeSendNextData is called, it won't send the second buffer.
Also tested SSL and STARTTLS manually.
Rebase and address review comments.
Attachment #830293 - Attachment is obsolete: true
Attached patch Part 3: Test case. (deleted) — Splinter Review
Attachment #828503 - Attachment is obsolete: true
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: