Closed Bug 278144 Opened 20 years ago Closed 20 years ago

Support socket i/o timeouts

Categories

(Core :: Networking, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.8beta1

People

(Reporter: darin.moz, Assigned: darin.moz)

References

Details

Attachments

(3 files, 4 obsolete files)

Support socket i/o timeouts Mailnews protocols could make use of configurable timeout support for socket i/o. We'll want separate timeouts for connection establishment and read/write.
Severity: normal → enhancement
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.8beta
My plan is to use PR_Poll's timeout parameter to implement this feature. Each socket will have a current timeout associated with it, and we'll compute the minimum before calling PR_Poll. When PR_Poll returns we'll check for expired sockets, and trigger the associated socket handler. It might make sense to add a new method on the socket handler for timeout errors (instead of trying to reuse OnSocketReady).
Attached patch prototype patch (obsolete) (deleted) — Splinter Review
Here's a prototype patch. NOTE: I haven't verified that it actually works! ;-) I've only confirmed that the socket timeout doesn't fire needlessly when loading a few web pages. More testing is definitely in order. bienvenu: you offered to help out with that, right? ;-)
I'm trying this out on my laptop now. But I lied when I said I didn't want to be able to change the read timeout - one of things I'm trying to fix is the imap shutdown code - currently, we send the logout and don't wait for the server response before closing the socket. I'd like to send the logout and then do a blocking read for the response with a very short timeout (a few seconds) so that we don't block shutdown for too long, but do cleanup the connection correctly almost all the time. Looking at your code, it doesn't look like it would be too hard to extend it to allow the timeout to be changed and propagated...
yeah, it shouldn't be that hard to allow the change to be made dynamically.
I hit a PR_Assert in SocketConnectionContinue(), I think, line 339, PR_Assert(out_flags & PR_POLL_WRITE). I don't know if this is related, but I can't remember seeing this PR_Assert in a very long time. I wasn't testing the timeout, just running with the patch. I'll see if it happens again and see if I can figure out more what's going on. The out_flags were 8 when I hit that assertion.
I didn't hit my breakpoints on the code in your patch that gets called when we timeout (nsSocketTransport2:1433, in OnSocketReady) or nsSocketTransportService::572, where we call OnSocketReady with -1. It could be that since I did a complete repull and build, that some other changes went in. However, it does seem timeout related, in that I'm not doing anything when it hits that assertion, and it might be a server connection that has timed out...
I think I see the problem - mTimeouts[2], but TIMEOUT_CONNECT and TIMEOUT_READ_WRITE are 1 and 2, and are used as indexes into mTimeouts...in my tree, I'll try changing those to 0 and 1...
I haven't hit that PR_Assert anymore after change the values from 1 and 2 to 0 and 1. I tried a connection timeout by leaving the dial-up connection dialog up for more than 2 minutes, but it didn't trigger a timeout, I think because the socket transport thread is blocked in nsRasAutoDial (I think you've talked about doing that on a different thread - that could be really useful to us because I think we have issues with that code blocking after laptops come out of hibernation, for example)
Comment on attachment 171098 [details] [diff] [review] prototype patch just as a reminder, base/public/nsISocketTransport.idl needs a new iid ;)
Attached patch v1 patch (obsolete) (deleted) — Splinter Review
this one works. (looks like i left a XXX comment in the server socket code..)
Attachment #171098 - Attachment is obsolete: true
woo woo, this worked for imap, using netcat as a fake imap server
Great! There's a bit of misc cleanup I want to do before asking for reviews.
Attached patch v1.1 patch (deleted) — Splinter Review
Attachment #171207 - Attachment is obsolete: true
Attachment #171328 - Flags: superreview?(bzbarsky)
Attachment #171328 - Flags: review?(bienvenu)
It'll take me a few days to get to this....
Comment on attachment 171328 [details] [diff] [review] v1.1 patch thx a lot for doing this, Darin! Should I try running with this latest patch, or is it essentially the same as the last one?
Attachment #171328 - Flags: review?(bienvenu) → review+
> Should I try running with this latest patch, > or is it essentially the same as the last one? It is essentially the same thing, but if you have time to verify it that'd be great :)
I've been running with the patch and haven't seen any problems. I haven't forced any timeouts, however.
Comment on attachment 171328 [details] [diff] [review] v1.1 patch >Index: src/nsServerSocket.cpp > nsServerSocket::nsServerSocket() > : mLock(nsnull) > , mFD(nsnull) > , mAttached(PR_FALSE) > { > mCondition = NS_OK; > mPollFlags = 0; >+ mPollTimeout = PR_UINT16_MAX; So would it make sense to make the nsASocketHandler constructor require the condition, flags, and timeout? Seems to me like it would. Either way, though. >Index: src/nsSocketTransport2.cpp >+// Large default timeouts approx. behavior of no timeout. (It's better to let >+// the servers or host operating system time us out.) No need to abbreviate "approximately" >+#define DEFAULT_TIMEOUT_CONNECT (10 * 60) >+#define DEFAULT_TIMEOUT_READ_WRITE (10 * 60) Document that these are in seconds. >@@ -1084,20 +1095,21 @@ nsSocketTransport::InitiateSocket() >+ mPollFlags = mTimeouts[TIMEOUT_CONNECT]; You want to be setting mPollTimeout here. >+nsSocketTransport::SetTimeout(PRUint32 type, PRUint32 value) >+ mTimeouts[type] = >+ (PRUint16) (value > PR_UINT16_MAX ? PR_UINT16_MAX : value); Any reason not to use PR_MIN here? >Index: src/nsSocketTransportService2.h >+ PRInt32 Poll(PRUint32 *interval); // calls PR_Poll Document what "interval" is (and that it's an out param). sr=bzbarsky with those changes/nits.
Attachment #171328 - Flags: superreview?(bzbarsky) → superreview+
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Its look like this patch cause this popup http://lahls.de/temp/pic/moz-alert.png (4KB)in mailnews. This appears first with my Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b) Gecko/20050126 today. This popup comes up every two minutes, when I don't do anything in mailnews. When mailnews is open in the background of the browser, Mozilla switch back to mailnews, when the popup opens :-( I use a local newsserver(Hamster) and an other guy who runs a local leafnode and using Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b) Gecko/20050126 tells exact the same about his Mozilla. Maybe we only have to do something in about:config?
bienvenu: any thoughts? the mailnews code isn't using this API yet, right? perhaps i should push the default timeouts back to the maximum value.
(In reply to comment #20) > an other guy who runs a local leafnode and > using Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b) Gecko/20050126 tells > exact the same about his Mozilla. I am the "other guy". :-) But on my mozilla the alert box comes up only once after doing nothing about ca. 10 minutes (not every two minutes).
OK, 10 minutes is the new default timeout value, so that surely explains the problem. Perhaps we are continuously polling on some mailnews socket. That seems reasonable, so it would make sense to bump the default timeouts back up to unlimited. Patch coming up...
Attached patch followup patch (deleted) — Splinter Review
bz: this is a fairly straightforward patch. the one thing that is missing is documentation in nsISocketTransport.idl for the magic timeout value that means no timeout. i was about to document that, but then i realized that it is a bit awkward to say that PR_UINT16_MAX (or any value above it) is that magic value since the interface uses PRUint32 for the timeout value. the 16-bit variable size is a detail of the implementation. so, then i thought hmm... i wonder if it would be better for the socket transport code to deal with timeout values in milliseconds instead of seconds. i could imagine some fancy applications wanting sub-second timeouts. ok, but that's a much larger change, which belongs in another bug. for now, i just want to fix the mailnews regression :-)
Attachment #172505 - Flags: superreview?(bzbarsky)
Attachment #172505 - Flags: review?(bzbarsky)
I'm not sure I understand the purpose of the change to nsServerSocket::OnSocketReady... Won't we still get values of -1 for outFlags? And won't that rather confuse the following bit-test?
> Won't we still get values of -1 for outFlags? > And won't that rather confuse the following bit-test? no, because the changes to nsSocketTransportService make it so that OnSocketReady is not called when mPollTimeout is set to PR_UINT16_MAX. in other words, if you say that your socket should never timeout (which was our old default), then we should not bother calling OnSocketReady (as before). So, since the default constructor for nsASocketHandler assigns PR_UINT16_MAX to mPollTimeout, there is no reason for nsServerSocket to do any special timeout handling.
Comment on attachment 172505 [details] [diff] [review] followup patch Oh, so there's no way to change the timeout on an nsServerSocket? If so, please add an assert to that effect in OnSocketReady? That is, assert that aFlags is not -1? r+sr=bzbarsky with that. For documentation... we could simply document that a value of PR_UINT32_MAX will guarantee no timeout and say that certain implementations may treat some smaller values as also meaning no timeout... Or we could just make the api use PRUint16.
Attachment #172505 - Flags: superreview?(bzbarsky)
Attachment #172505 - Flags: superreview+
Attachment #172505 - Flags: review?(bzbarsky)
Attachment #172505 - Flags: review+
The mailnews code isn't deliberately polling on any sockets, afaik, but we do cache nntp and imap connections but when we're not using them, and I suppose necko would be polling on those sockets when the connections are cached, even though we're not running any urls...so I suppose we need to turn off the timeouts when we're caching a connection or ignore the timeout error...is it safe to simply ignore the timeout in that case? Is the socket still useable after the timeout?
BTW: Should this bug left as "RESOLVED FIXED" further on?
(In reply to comment #28) > ... is it safe to simply ignore the timeout in that case? Is the socket still > useable after the timeout? no, unfortunately the socket transport API doesn't give consumers a way to ignore timeout events. implementations of nsASocketHandler can choose to ignore timeouts, but that isn't exposed beyond the bowels of necko. i'm going to proceed with the latest patch just to get things working again. we can then play around with adjusting timeouts for the various consumers as appropriate.
'followup patch' has been checked in on the trunk. please check tomorrows builds to verify that the problem has been solved. thanks!
(In reply to comment #31) > 'followup patch' has been checked in on the trunk. please check tomorrows > builds to verify that the problem has been solved. It seems to work: I got no timeout since 1h. :-)
(In reply to comment #28) > The mailnews code isn't deliberately polling on any sockets hm... not even for IMAP with IDLE? (or maybe I misunderstood how IDLE works)
> It seems to work: I got no timeout since 1h. :-) great! i'm glad to hear that it worked.
Attached patch make setTimeout take effect immediately (obsolete) (deleted) — Splinter Review
When imap connections are cached after finishing a url, the imap code needs to turn off the timeout or else the cached connections will timeout - but it didn't quite work, so per Darin's advice, post an event to the background thread to set mPollTimeout appropriately.
Attachment #172775 - Flags: review?(darin)
Can't that trigger some of the asserts the other patch has in it? In particular: + PRUint32 r = s.mHandler->mPollTimeout - s.mElapsedTime; + NS_ASSERTION(r <= s.mHandler->mPollTimeout, "oops");
Boris, do you mean you might hit that assertion if you shrink the timeout to a value smaller than the time we've been idle? My test case wouldn't trigger that because I set the timeout to max_uint32 when idle, but I suppose it's theoretically possible for code to do that.
Yes, that's what I mean. I don't see anything preventing that, certainly. That would have the effect of "r" becoming very large, meaning that shortening the timeout on something that has been idle for longer than the new value would effectively block it for a long time, unless there is something else in the queue to trigget a smaller return from PollTimeout().
Attached patch patch addressing bzbarsky's comment (obsolete) (deleted) — Splinter Review
with this patch, in that situation, we'll set the timeout to the new, lower timeout, which seems as valid a value as any.
Attachment #172775 - Attachment is obsolete: true
Attachment #172969 - Flags: superreview?(bzbarsky)
Attachment #172969 - Flags: review?(darin)
Attachment #172775 - Flags: review?(darin)
I'd think that 0 would be a much more reasonable value, no? I just don't know whether a value of 0 works here.
I thought about 0...0 was definitely a possibility in the existing code, but I'd like to hear from Darin if it should be OK before using it. From the imap point of view, using mPollTimeout makes sense, because whenever I set the timeout, I expect that to be the timeout from now going forward. I won't ever be idle for five minutes and then set the timeout to 10 seconds, and expect it to fire immediately. If that happened, I'd need to figure some way around it because my idle connection could timeout before I could close it gracefully. I'd have to send the data to end the idle and then change the timeout. I guess I'll have to do that anyway because my socket could timeout if we service any other active socket between the time I change the timeout and the time I read or write data to the socket. But then I won't get the correct timeout on the write...I could make setTimeout clear mElapsedTime as well, which would solve my problem and then I wouldn't care if we used 0 or mPollTimeout.
hmm... david: but in that case, wouldn't you have written something to the server before you start waiting for a response? in that case mElapsedTime would have been reset to zero, so setting mPollTimeout to 10 sec should be fine. the idea is that mPollTimeout is always relative to mElapsedTime, which is a measure of how long the socket has been polling. so, it should be an edge case that mElapsedTime is ever greater than mPollTimeout. so, i think bz's suggestion of using 0 in that case is reasonable and perhaps the most consistent.
In the IMAP IDLE case, we send the IDLE command, and then we do an async wait for data. Assume no data comes back for 10 minutes, and then the user shuts down. At that point, I want to set the read+write timeout to a very short value (5 seconds or so) and then write DONE, read the response, write logout, read the response. I could write the DONE, and then change the timeout before reading the response, assuming that a WRITE timeout isn't that useful.
> I could write the DONE, and then change the timeout before reading the > response, assuming that a WRITE timeout isn't that useful. This seems like the right approach to me. The newly set timeout would apply to the WRITE operation because of the PostEvent call in SetTimeout, so I think it should be the right solution :)
Attached patch fix using 0 interval... (deleted) — Splinter Review
Attachment #172969 - Attachment is obsolete: true
Attachment #173043 - Flags: superreview?(darin)
Attachment #173043 - Flags: review?(bzbarsky)
Attachment #172969 - Flags: superreview?(bzbarsky)
Attachment #172969 - Flags: review?(darin)
Comment on attachment 173043 [details] [diff] [review] fix using 0 interval... r=bzbarsky
Attachment #173043 - Flags: review?(bzbarsky) → review+
Attachment #173043 - Flags: superreview?(darin) → superreview+
With a CVS build (checkout after comment #46) I get timeouts again...
Using Builds with checked in Patch, I have massive Time-Out-Popups with different news-Servers, first seen with Tinderbox/Creature Build 2005013118 and using now 2005013123. Direct Connection to News-Server (no Hamster etc.) using Online-Mode
Looks like fall-out from Bug 189363 (a patch was checked in there)?
fix for making setTimeout interval apply immediately checked in. I'll look into the news problem.
basically, news has to change the timeout before caching connections, so now that I've checked in the last patch, I can fix news.
JFYI: With the latest Build I got no timeouts in news any more.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: