I'm running a simulation with one client machine and four servers
(all on the same LAN and running Fedora Core 2 with kernel
2.6.5-1.358smp). The client sends about 1.2 million requests (each
432 bytes) over TCP connections to the servers, and the servers read
them.
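
To make the request framing concrete, here is a minimal Python sketch of sending and reading fixed-size 432-byte requests over a TCP stream. It is only an illustration, not the actual simulation code: the names `REQUEST_SIZE`, `send_request`, and `recv_request` are hypothetical. The point it shows is that TCP is a byte stream, so the reader has to loop until a full request has arrived.

```python
import socket
from typing import Optional

REQUEST_SIZE = 432  # bytes per request, as stated in the setup above


def send_request(sock: socket.socket, payload: bytes) -> None:
    """Send one fixed-size request; sendall() keeps writing until all bytes are sent."""
    if len(payload) != REQUEST_SIZE:
        raise ValueError("request must be exactly %d bytes" % REQUEST_SIZE)
    sock.sendall(payload)


def recv_request(sock: socket.socket) -> Optional[bytes]:
    """Read exactly one request, or return None if the peer closed between requests."""
    chunks = []
    remaining = REQUEST_SIZE
    while remaining > 0:
        chunk = sock.recv(remaining)  # may return fewer bytes than asked for
        if not chunk:
            if remaining == REQUEST_SIZE:
                return None  # clean end of stream on a request boundary
            raise ConnectionError("peer closed in the middle of a request")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

A reader would call `recv_request` in a loop until it returns `None`.
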
In my first simulation, the client randomly distributes each request to
one of the four servers, and it works fine. However, in my 2nd simulation,
where the client sends all the requests to a central distributor (running
on one of the four servers) and the central distributor then forwards each
request to one of the four servers, the TCP connection between the client
and the distributor seems to hang after some time (from a few minutes to
half an hour). The client stops writing requests to the socket and the
central distributor stops reading from the socket.
However, if I open any other TCP connection (e.g., telnet
xx.xx.xx.xx 80) to the central distributor machine, the program resumes
from where it hung (the client starts writing to the socket and the
central distributor starts reading from the socket again), although it
hangs again after a while unless I open yet another TCP connection to
that machine.
Could anyone provide a clue/hint to solve this problem? Thanks. BTW, I
observe that there are about 12 TCP connections in the TIME_WAIT state
on the central distributor server. They come from another thread of the
server process, which periodically opens a new socket, sends a
performance report through that socket to a remote machine, and then
closes the socket immediately. I guess this should not be the cause of
the above problem, but I am not quite sure.
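
For reference, the reporting pattern described above looks roughly like the following Python sketch. It is an illustration under assumptions, not the real server code: `send_report`, `start_reporter`, the host/port arguments, and the report format are all hypothetical. Since this end closes the connection first, it is the side that normally ends up in TIME_WAIT, which would explain the handful of TIME_WAIT entries.

```python
import socket
import threading


def send_report(host: str, port: int, report: bytes, timeout: float = 5.0) -> None:
    """Open a fresh connection, send one report, and close right away."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(report)
    # Leaving the "with" block closes the socket. Because this side closes
    # first (active close), its end of the connection normally lingers in
    # TIME_WAIT for a while afterwards.


def start_reporter(host, port, make_report, interval, stop_event):
    """Run send_report() every `interval` seconds until stop_event is set."""
    def loop():
        # Event.wait() returns False on timeout, True once the event is set.
        while not stop_event.wait(interval):
            send_report(host, port, make_report())

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

With one short-lived connection per report, a steady count of around a dozen TIME_WAIT sockets is what this pattern would be expected to produce on its own.
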