On Tue, 09 Sep 2008 11:59:51 -0700, A. W. Dunstan < XXXX@XXXXX.COM > wrote:

Fundamentally, the issue is that you're relying on a property (Connected)
that isn't reliable. It doesn't return the instantaneous state of the
socket, but rather the state known at some previous time. Even if it did
return the
instantaneous state, that state could change between the time you check it
and the time you try to rely on it.

When Connect() returns, the socket _has_ successfully connected to the
remote endpoint. However, you can't make any assumptions beyond that.
The internal data associated with the Connected property might not be
updated yet, or the connection could be reset immediately after it is
established. You simply need to write your code so that it can deal
with errors during later operations.

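To make "deal with errors during later operations" concrete, here is a minimal sketch in Python rather than .NET (the socket calls map closely). The class name ReconnectingSender and its max_attempts parameter are hypothetical, made up for illustration. It never asks the socket whether it is connected; it just attempts the send and reconnects only when the operation itself fails.

```python
import socket


class ReconnectingSender:
    """Keeps one TCP connection open; reconnects only after a failed operation."""

    def __init__(self, address, max_attempts=3):
        self.address = address            # (host, port) tuple
        self.max_attempts = max_attempts  # hypothetical retry limit
        self.sock = None
        self.connect_count = 0            # how many times we had to connect

    def _ensure_connected(self):
        # Connect lazily; no "is it connected?" check on an existing socket.
        if self.sock is None:
            self.sock = socket.create_connection(self.address, timeout=5)
            self.connect_count += 1

    def _drop(self):
        # Discard a socket that has failed (or that we are done with).
        if self.sock is not None:
            try:
                self.sock.close()
            finally:
                self.sock = None

    def send(self, payload: bytes):
        last_error = None
        for _ in range(self.max_attempts):
            try:
                self._ensure_connected()
                self.sock.sendall(payload)
                return
            except OSError as exc:
                # The operation itself told us the connection is bad.
                last_error = exc
                self._drop()
        raise ConnectionError(
            f"giving up after {self.max_attempts} attempts"
        ) from last_error

    def close(self):
        self._drop()
```

Note that a retry after a failed send may deliver part of the payload twice, so a real protocol would need some way for the receiver to detect or tolerate duplicates.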
Given that, it really doesn't make sense to keep recreating connections
just because of a reliability issue. If anything, the extra overhead
_increases_ your exposure to reliability issues. Every time any data is
moved from one endpoint to the other, you run the risk of a reliability
problem causing an error. The only thing that recreating connections does
is cause _more_ data to be moved, increasing the risk of an error.

Likewise, using the KeepAlive option will also have only that effect.
Given that the connections appear to be transient anyway, it's not clear
why you're using that. But given that you are, all it does is increase
your chances of having an error occur when one otherwise would not have.
Keep-alives can be useful in some very specific scenarios, but they're not an
appropriate approach to error management. That's not what they're there for.

Setting the Blocking property to "true" is superfluous. That's the default.

Setting the NoDelay option is generally a bad idea, and it is especially
so in your particular scenario. It's bad generally because it decreases
efficiency of the connection. It's especially bad in your case for the
same reason that recreating the connection is: it has the effect of
_adding_ to the amount of data that needs to be moved, which increases
your odds of getting an error on the unreliable connection.

Finally, if you care about reliability, you should not rely on the
LingerState of the socket. Leave the socket as the default, and then
after you've called Shutdown(), call Receive() until it returns 0
(assuming the remote endpoint isn't actually going to send you anything,
that will happen as soon as the remote endpoint has received all of your
data). Unless you do that, you have no assurance that the remote endpoint
has actually received all of the data you sent, even with LingerState set.

In other words, I would take out almost all of the implementation details
you've posted, including the part of the design in which a new connection
is repeatedly reestablished. Just design the code so that it can recover
gracefully from a reset connection and otherwise leave the socket
connected. Don't bother with lingering, keep-alives, or disabling Nagle
(i.e. "NoDelay"), and do make sure you do a true graceful shutdown by
calling Receive() after Shutdown().
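As a rough illustration of the graceful shutdown described above, here is a Python sketch (not .NET; shutdown(SHUT_WR) plays the role of Shutdown() on the send side, and recv() the role of Receive()). The function name graceful_close and its timeout parameter are hypothetical. It assumes the remote endpoint reads to end-of-stream and only then closes its own side.

```python
import socket


def graceful_close(sock, timeout=10.0):
    """Half-close the connection, then read until the peer closes its side.

    Returns any bytes the peer sent before closing (normally empty).
    """
    sock.settimeout(timeout)  # hypothetical guard so we never wait forever
    leftover = bytearray()
    try:
        # Tell the peer we are done sending; we can still receive.
        sock.shutdown(socket.SHUT_WR)
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                # Empty result: the peer has closed its side as well.
                break
            leftover.extend(chunk)
    finally:
        sock.close()
    return bytes(leftover)
```

If recv() raises instead of returning an empty result (a timeout or a reset), you know the shutdown was not clean and can treat the last data sent as possibly undelivered.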
Right now, the code is written in a way that is likely to maximize the
number of errors. You're working against TCP, rather than with it.
That's never a good approach anyway, but especially wh