Closed Bug 278144 Opened 20 years ago Closed 20 years ago

Support socket i/o timeouts

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla1.8beta1

People

(Reporter: darin.moz, Assigned: darin.moz)

References

Details

Attachments

(3 files, 4 obsolete files)

prototype patch 20 years ago Darin Fisher (deleted), patch		Details \| Diff \| Splinter Review
v1 patch 20 years ago Darin Fisher (deleted), patch		Details \| Diff \| Splinter Review
v1.1 patch 20 years ago Darin Fisher (deleted), patch	Bienvenu : review+ bzbarsky : superreview+	Details \| Diff \| Splinter Review
followup patch 20 years ago Darin Fisher (deleted), patch	bzbarsky : review+ bzbarsky : superreview+	Details \| Diff \| Splinter Review
make setTimeout take effect immediately 20 years ago David :Bienvenu (deleted), patch		Details \| Diff \| Splinter Review
patch addressing bzbarsky's comment 20 years ago David :Bienvenu (deleted), patch		Details \| Diff \| Splinter Review
fix using 0 interval... 20 years ago David :Bienvenu (deleted), patch	bzbarsky : review+ darin.moz : superreview+	Details \| Diff \| Splinter Review

Darin Fisher

Assignee

Description

•

20 years ago

Support socket i/o timeouts Mailnews protocols could make use of configurable timeout support for socket i/o. We'll want separate timeouts for connection establishment and read/write.

Darin Fisher

Assignee

Updated

•

20 years ago

Severity: normal → enhancement

Status: NEW → ASSIGNED

Target Milestone: --- → mozilla1.8beta

Darin Fisher

Assignee

Comment 1

•

20 years ago

My plan is to use PR_Poll's timeout parameter to implement this feature. Each socket will have a current timeout associated with it, and we'll compute the minimum before calling PR_Poll. When PR_Poll returns we'll check for expired sockets, and trigger the associated socket handler. It might make sense to add a new method on the socket handler for timeout errors (instead of trying to reuse OnSocketReady).

Darin Fisher

Assignee

Comment 2

•

20 years ago

Attached patch prototype patch (obsolete) (deleted) — Details — Splinter Review

Here's a prototype patch. NOTE: I haven't verified that it actually works! ;-) I've only confirmed that the socket timeout doesn't fire needlessly when loading a few web pages. More testing is definitely in order. bienvenu: you offered to help out with that, right? ;-)

David :Bienvenu

Comment 3

•

20 years ago

I'm trying this out on my laptop now. But I lied when I said I didn't want to be able to change the read timeout - one of things I'm trying to fix is the imap shutdown code - currently, we send the logout and don't wait for the server response before closing the socket. I'd like to send the logout and then do a blocking read for the response with a very short timeout (a few seconds) so that we don't block shutdown for too long, but do cleanup the connection correctly almost all the time. Looking at your code, it doesn't look like it would be too hard to extend it to allow the timeout to be changed and propagated...

Darin Fisher

Assignee

Comment 4

•

20 years ago

yeah, it shouldn't be that hard to allow the change to be made dynamically.

David :Bienvenu

Comment 5

•

20 years ago

I hit a PR_Assert in SocketConnectionContinue(), I think, line 339, PR_Assert(out_flags & PR_POLL_WRITE). I don't know if this is related, but I can't remember seeing this PR_Assert in a very long time. I wasn't testing the timeout, just running with the patch. I'll see if it happens again and see if I can figure out more what's going on. The out_flags were 8 when I hit that assertion.

David :Bienvenu

Comment 6

•

20 years ago

I didn't hit my breakpoints on the code in your patch that gets called when we timeout (nsSocketTransport2:1433, in OnSocketReady) or nsSocketTransportService::572, where we call OnSocketReady with -1. It could be that since I did a complete repull and build, that some other changes went in. However, it does seem timeout related, in that I'm not doing anything when it hits that assertion, and it might be a server connection that has timed out...

David :Bienvenu

Comment 7

•

20 years ago

I think I see the problem - mTimeouts[2], but TIMEOUT_CONNECT and TIMEOUT_READ_WRITE are 1 and 2, and are used as indexes into mTimeouts...in my tree, I'll try changing those to 0 and 1...

David :Bienvenu

Comment 8

•

20 years ago

I haven't hit that PR_Assert anymore after change the values from 1 and 2 to 0 and 1. I tried a connection timeout by leaving the dial-up connection dialog up for more than 2 minutes, but it didn't trigger a timeout, I think because the socket transport thread is blocked in nsRasAutoDial (I think you've talked about doing that on a different thread - that could be really useful to us because I think we have issues with that code blocking after laptops come out of hibernation, for example)

Christian :Biesinger (don't email me, ping me on IRC)

Comment 9

•

20 years ago

Comment on attachment 171098 [details] [diff] [review] prototype patch just as a reminder, base/public/nsISocketTransport.idl needs a new iid ;)

Darin Fisher

Assignee

Comment 10

•

20 years ago

Attached patch v1 patch (obsolete) (deleted) — Details — Splinter Review

this one works. (looks like i left a XXX comment in the server socket code..)

Attachment #171098 - Attachment is obsolete: true

David :Bienvenu

Comment 11

•

20 years ago

woo woo, this worked for imap, using netcat as a fake imap server

Darin Fisher

Assignee

Comment 12

•

20 years ago

Great! There's a bit of misc cleanup I want to do before asking for reviews.

Darin Fisher

Assignee

Comment 13

•

20 years ago

Attached patch v1.1 patch (deleted) — Details — Splinter Review

Attachment #171207 - Attachment is obsolete: true

Attachment #171328 - Flags: superreview?(bzbarsky)

Attachment #171328 - Flags: review?(bienvenu)

Boris Zbarsky [:bzbarsky]

Comment 14

•

20 years ago

It'll take me a few days to get to this....

David :Bienvenu

Comment 15

•

20 years ago

Comment on attachment 171328 [details] [diff] [review] v1.1 patch thx a lot for doing this, Darin! Should I try running with this latest patch, or is it essentially the same as the last one?

Attachment #171328 - Flags: review?(bienvenu) → review+

Darin Fisher

Assignee

Comment 16

•

20 years ago

> Should I try running with this latest patch, > or is it essentially the same as the last one? It is essentially the same thing, but if you have time to verify it that'd be great :)

David :Bienvenu

Comment 17

•

20 years ago

I've been running with the patch and haven't seen any problems. I haven't forced any timeouts, however.

Boris Zbarsky [:bzbarsky]

Comment 18

•

20 years ago

Comment on attachment 171328 [details] [diff] [review] v1.1 patch >Index: src/nsServerSocket.cpp > nsServerSocket::nsServerSocket() > : mLock(nsnull) > , mFD(nsnull) > , mAttached(PR_FALSE) > { > mCondition = NS_OK; > mPollFlags = 0; >+ mPollTimeout = PR_UINT16_MAX; So would it make sense to make the nsASocketHandler constructor require the condition, flags, and timeout? Seems to me like it would. Either way, though. >Index: src/nsSocketTransport2.cpp >+// Large default timeouts approx. behavior of no timeout. (It's better to let >+// the servers or host operating system time us out.) No need to abbreviate "approximately" >+#define DEFAULT_TIMEOUT_CONNECT (10 * 60) >+#define DEFAULT_TIMEOUT_READ_WRITE (10 * 60) Document that these are in seconds. >@@ -1084,20 +1095,21 @@ nsSocketTransport::InitiateSocket() >+ mPollFlags = mTimeouts[TIMEOUT_CONNECT]; You want to be setting mPollTimeout here. >+nsSocketTransport::SetTimeout(PRUint32 type, PRUint32 value) >+ mTimeouts[type] = >+ (PRUint16) (value > PR_UINT16_MAX ? PR_UINT16_MAX : value); Any reason not to use PR_MIN here? >Index: src/nsSocketTransportService2.h >+ PRInt32 Poll(PRUint32 *interval); // calls PR_Poll Document what "interval" is (and that it's an out param). sr=bzbarsky with those changes/nits.

Attachment #171328 - Flags: superreview?(bzbarsky) → superreview+

Darin Fisher

Assignee

Comment 19

•

20 years ago

fixed-on-trunk

Status: ASSIGNED → RESOLVED

Closed: 20 years ago

Resolution: --- → FIXED

Ruediger Lahl

Comment 20

•

20 years ago

Its look like this patch cause this popup http://lahls.de/temp/pic/moz-alert.png (4KB)in mailnews. This appears first with my Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b) Gecko/20050126 today. This popup comes up every two minutes, when I don't do anything in mailnews. When mailnews is open in the background of the browser, Mozilla switch back to mailnews, when the popup opens :-( I use a local newsserver(Hamster) and an other guy who runs a local leafnode and using Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b) Gecko/20050126 tells exact the same about his Mozilla. Maybe we only have to do something in about:config?

Darin Fisher

Assignee

Comment 21

•

20 years ago

bienvenu: any thoughts? the mailnews code isn't using this API yet, right? perhaps i should push the default timeouts back to the maximum value.

Volker Moell

Comment 22

•

20 years ago

(In reply to comment #20) > an other guy who runs a local leafnode and > using Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b) Gecko/20050126 tells > exact the same about his Mozilla. I am the "other guy". :-) But on my mozilla the alert box comes up only once after doing nothing about ca. 10 minutes (not every two minutes).

Darin Fisher

Assignee

Comment 23

•

20 years ago

OK, 10 minutes is the new default timeout value, so that surely explains the problem. Perhaps we are continuously polling on some mailnews socket. That seems reasonable, so it would make sense to bump the default timeouts back up to unlimited. Patch coming up...

Darin Fisher

Assignee

Comment 24

•

20 years ago

Attached patch followup patch (deleted) — Details — Splinter Review

bz: this is a fairly straightforward patch. the one thing that is missing is documentation in nsISocketTransport.idl for the magic timeout value that means no timeout. i was about to document that, but then i realized that it is a bit awkward to say that PR_UINT16_MAX (or any value above it) is that magic value since the interface uses PRUint32 for the timeout value. the 16-bit variable size is a detail of the implementation. so, then i thought hmm... i wonder if it would be better for the socket transport code to deal with timeout values in milliseconds instead of seconds. i could imagine some fancy applications wanting sub-second timeouts. ok, but that's a much larger change, which belongs in another bug. for now, i just want to fix the mailnews regression :-)

Attachment #172505 - Flags: superreview?(bzbarsky)

Attachment #172505 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 25

•

20 years ago

I'm not sure I understand the purpose of the change to nsServerSocket::OnSocketReady... Won't we still get values of -1 for outFlags? And won't that rather confuse the following bit-test?

Darin Fisher

Assignee

Comment 26

•

20 years ago

> Won't we still get values of -1 for outFlags? > And won't that rather confuse the following bit-test? no, because the changes to nsSocketTransportService make it so that OnSocketReady is not called when mPollTimeout is set to PR_UINT16_MAX. in other words, if you say that your socket should never timeout (which was our old default), then we should not bother calling OnSocketReady (as before). So, since the default constructor for nsASocketHandler assigns PR_UINT16_MAX to mPollTimeout, there is no reason for nsServerSocket to do any special timeout handling.

Boris Zbarsky [:bzbarsky]

Comment 27

•

20 years ago

Comment on attachment 172505 [details] [diff] [review] followup patch Oh, so there's no way to change the timeout on an nsServerSocket? If so, please add an assert to that effect in OnSocketReady? That is, assert that aFlags is not -1? r+sr=bzbarsky with that. For documentation... we could simply document that a value of PR_UINT32_MAX will guarantee no timeout and say that certain implementations may treat some smaller values as also meaning no timeout... Or we could just make the api use PRUint16.

Attachment #172505 - Flags: superreview?(bzbarsky)

Attachment #172505 - Flags: superreview+

Attachment #172505 - Flags: review?(bzbarsky)

Attachment #172505 - Flags: review+

David :Bienvenu

Comment 28

•

20 years ago

The mailnews code isn't deliberately polling on any sockets, afaik, but we do cache nntp and imap connections but when we're not using them, and I suppose necko would be polling on those sockets when the connections are cached, even though we're not running any urls...so I suppose we need to turn off the timeouts when we're caching a connection or ignore the timeout error...is it safe to simply ignore the timeout in that case? Is the socket still useable after the timeout?

Volker Moell

Comment 29

•

20 years ago

BTW: Should this bug left as "RESOLVED FIXED" further on?

Darin Fisher

Assignee

Comment 30

•

20 years ago

(In reply to comment #28) > ... is it safe to simply ignore the timeout in that case? Is the socket still > useable after the timeout? no, unfortunately the socket transport API doesn't give consumers a way to ignore timeout events. implementations of nsASocketHandler can choose to ignore timeouts, but that isn't exposed beyond the bowels of necko. i'm going to proceed with the latest patch just to get things working again. we can then play around with adjusting timeouts for the various consumers as appropriate.

Darin Fisher

Assignee

Comment 31

•

20 years ago

'followup patch' has been checked in on the trunk. please check tomorrows builds to verify that the problem has been solved. thanks!

Volker Moell

Comment 32

•

20 years ago

(In reply to comment #31) > 'followup patch' has been checked in on the trunk. please check tomorrows > builds to verify that the problem has been solved. It seems to work: I got no timeout since 1h. :-)

Christian :Biesinger (don't email me, ping me on IRC)

Comment 33

•

20 years ago

(In reply to comment #28) > The mailnews code isn't deliberately polling on any sockets hm... not even for IMAP with IDLE? (or maybe I misunderstood how IDLE works)

Darin Fisher

Assignee

Comment 34

•

20 years ago

> It seems to work: I got no timeout since 1h. :-) great! i'm glad to hear that it worked.

David :Bienvenu

Comment 35

•

20 years ago

Attached patch make setTimeout take effect immediately (obsolete) (deleted) — Details — Splinter Review

When imap connections are cached after finishing a url, the imap code needs to turn off the timeout or else the cached connections will timeout - but it didn't quite work, so per Darin's advice, post an event to the background thread to set mPollTimeout appropriately.

Attachment #172775 - Flags: review?(darin)

Boris Zbarsky [:bzbarsky]

Comment 36

•

20 years ago

Can't that trigger some of the asserts the other patch has in it? In particular: + PRUint32 r = s.mHandler->mPollTimeout - s.mElapsedTime; + NS_ASSERTION(r <= s.mHandler->mPollTimeout, "oops");

David :Bienvenu

Comment 37

•

20 years ago

Boris, do you mean you might hit that assertion if you shrink the timeout to a value smaller than the time we've been idle? My test case wouldn't trigger that because I set the timeout to max_uint32 when idle, but I suppose it's theoretically possible for code to do that.

Boris Zbarsky [:bzbarsky]

Comment 38

•

20 years ago

Yes, that's what I mean. I don't see anything preventing that, certainly. That would have the effect of "r" becoming very large, meaning that shortening the timeout on something that has been idle for longer than the new value would effectively block it for a long time, unless there is something else in the queue to trigget a smaller return from PollTimeout().

David :Bienvenu

Comment 39

•

20 years ago

Attached patch patch addressing bzbarsky's comment (obsolete) (deleted) — Details — Splinter Review

with this patch, in that situation, we'll set the timeout to the new, lower timeout, which seems as valid a value as any.

Attachment #172775 - Attachment is obsolete: true

Attachment #172969 - Flags: superreview?(bzbarsky)

Attachment #172969 - Flags: review?(darin)

David :Bienvenu

Updated

•

20 years ago

Attachment #172775 - Flags: review?(darin)

Boris Zbarsky [:bzbarsky]

Comment 40

•

20 years ago

I'd think that 0 would be a much more reasonable value, no? I just don't know whether a value of 0 works here.

David :Bienvenu

Comment 41

•

20 years ago

I thought about 0...0 was definitely a possibility in the existing code, but I'd like to hear from Darin if it should be OK before using it. From the imap point of view, using mPollTimeout makes sense, because whenever I set the timeout, I expect that to be the timeout from now going forward. I won't ever be idle for five minutes and then set the timeout to 10 seconds, and expect it to fire immediately. If that happened, I'd need to figure some way around it because my idle connection could timeout before I could close it gracefully. I'd have to send the data to end the idle and then change the timeout. I guess I'll have to do that anyway because my socket could timeout if we service any other active socket between the time I change the timeout and the time I read or write data to the socket. But then I won't get the correct timeout on the write...I could make setTimeout clear mElapsedTime as well, which would solve my problem and then I wouldn't care if we used 0 or mPollTimeout.

Darin Fisher

Assignee

Comment 42

•

20 years ago

hmm... david: but in that case, wouldn't you have written something to the server before you start waiting for a response? in that case mElapsedTime would have been reset to zero, so setting mPollTimeout to 10 sec should be fine. the idea is that mPollTimeout is always relative to mElapsedTime, which is a measure of how long the socket has been polling. so, it should be an edge case that mElapsedTime is ever greater than mPollTimeout. so, i think bz's suggestion of using 0 in that case is reasonable and perhaps the most consistent.

David :Bienvenu

Comment 43

•

20 years ago

In the IMAP IDLE case, we send the IDLE command, and then we do an async wait for data. Assume no data comes back for 10 minutes, and then the user shuts down. At that point, I want to set the read+write timeout to a very short value (5 seconds or so) and then write DONE, read the response, write logout, read the response. I could write the DONE, and then change the timeout before reading the response, assuming that a WRITE timeout isn't that useful.

Darin Fisher

Assignee

Comment 44

•

20 years ago

> I could write the DONE, and then change the timeout before reading the > response, assuming that a WRITE timeout isn't that useful. This seems like the right approach to me. The newly set timeout would apply to the WRITE operation because of the PostEvent call in SetTimeout, so I think it should be the right solution :)

David :Bienvenu

Comment 45

•

20 years ago

Attached patch fix using 0 interval... (deleted) — Details — Splinter Review

Attachment #172969 - Attachment is obsolete: true

Attachment #173043 - Flags: superreview?(darin)

Attachment #173043 - Flags: review?(bzbarsky)

David :Bienvenu

Updated

•

20 years ago

Attachment #172969 - Flags: superreview?(bzbarsky)

Attachment #172969 - Flags: review?(darin)

Boris Zbarsky [:bzbarsky]

Comment 46

•

20 years ago

Comment on attachment 173043 [details] [diff] [review] fix using 0 interval... r=bzbarsky

Attachment #173043 - Flags: review?(bzbarsky) → review+

Darin Fisher

Assignee

Updated

•

20 years ago

Attachment #173043 - Flags: superreview?(darin) → superreview+

Volker Moell

Comment 47

•

20 years ago

With a CVS build (checkout after comment #46) I get timeouts again...

Tobias Fischer

Comment 48

•

20 years ago

Using Builds with checked in Patch, I have massive Time-Out-Popups with different news-Servers, first seen with Tinderbox/Creature Build 2005013118 and using now 2005013123. Direct Connection to News-Server (no Hamster etc.) using Online-Mode

Frank Wein [:mcsmurf]

Comment 49

•

20 years ago

Looks like fall-out from Bug 189363 (a patch was checked in there)?

David :Bienvenu

Comment 50

•

20 years ago

fix for making setTimeout interval apply immediately checked in. I'll look into the news problem.

David :Bienvenu

Comment 51

•

20 years ago

basically, news has to change the timeout before caching connections, so now that I've checked in the last patch, I can fix news.

Volker Moell

Comment 52

•

20 years ago

JFYI: With the latest Build I got no timeouts in news any more.

You need to log in before you can comment on or make changes to this bug.