by Adrian H » Tue, 31 Aug 2010 17:13:29
Oddly enough, I'd just modified my test client to connect but not send
anything, then ran the same test again while strace'ing the server. Sure
enough, here's what strace logged:
5258 15:35:52.064705 gettimeofday({1283153752, 64746}, NULL) = 0
5258 15:35:52.064803 select(1024, [3 4 5 ... 1021 1022 1023], [], [], {6, 71149}) = 1 (in [3], left {6, 71149})
5258 15:35:52.066910 gettimeofday({1283153752, 66967}, NULL) = 0
5258 15:35:52.067069 accept(3, {sa_family=AF_INET, sin_port=htons(35598), sin_addr=inet_addr("192.168.1.8")}, [16]) = 1024
5258 15:35:52.067281 fcntl64(1024, F_SETFD, FD_CLOEXEC) = 0
5258 15:35:52.067624 gettimeofday({1283153752, 67672}, NULL) = 0
5258 15:35:52.067887 write(2, "20100830.153552 CONNECT: 192.168"..., 55) = 55
5258 15:35:52.068081 write(2, "\n", 1) = 1
5258 15:35:52.068199 gettimeofday({1283153752, 68240}, NULL) = 0
5258 15:35:52.068297 select(1025, [3 4 5 ... 1021 1022 1023 1024], [0], [1024], {6, 67655}) = 2 (in [3 1024], left {6, 67655})
5258 15:35:52.070400 gettimeofday({1283153752, 70456}, NULL) = 0
5258 15:35:52.070732 recv(1024,
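Looks like the select() call only manipulated the first 1024 fd bits
(an fd_set holds FD_SETSIZE = 1024 bits, i.e. descriptors 0 through
1023), so whatever it reported for fd 1024 can't be trusted. Hence the
call to recv() on an FD with nothing to read, which then blocks...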
So, until/unless Tcl switches to poll() and its ilk, forcing a
1024-open-FD limit is clearly the right thing to do.
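For what it's worth, here is a minimal sketch of one way to force that limit from inside the process, written in Python for brevity rather than Tcl or C. It lowers the soft RLIMIT_NOFILE so the kernel never hands out a descriptor numbered 1024 or higher. The helper name cap_open_fds is made up for illustration, and the value 1024 for FD_SETSIZE is an assumption that holds for glibc on Linux.

```python
import resource

FD_SETSIZE = 1024  # assumption: glibc's compile-time fd_set size on Linux


def cap_open_fds(limit=FD_SETSIZE):
    """Lower the soft RLIMIT_NOFILE so new descriptors stay below `limit`.

    Hypothetical helper. RLIMIT_NOFILE is one greater than the highest
    descriptor number the process may open, so a soft limit of 1024
    keeps new descriptors in the range 0..1023.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    candidates = [limit]
    # RLIM_INFINITY means "no cap", so only finite values compete.
    if soft != resource.RLIM_INFINITY:
        candidates.append(soft)
    if hard != resource.RLIM_INFINITY:
        candidates.append(hard)
    new_soft = min(candidates)
    # Only the soft limit changes; the hard limit is left alone.
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft
```

With the cap in place, accept() should fail with EMFILE once descriptors 0 through 1023 are in use, instead of returning 1024. Descriptors that were already open above the limit are not closed by this.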
I now have the unhappy task of asking the client to either re-engineer
their protocol to limit the number of simultaneous connections, or write
a poll()-based proxy. 8-)
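As a rough illustration of why poll() sidesteps the problem: it takes descriptors as plain integers rather than as bits in a fixed-size fd_set, so a descriptor numbered 1024 or above is no different from any other. The sketch below is Python, not the proxy itself, and the function name readable_fds is invented for the example. It assumes a platform where select.poll() exists (Linux and most Unixes, not Windows).

```python
import select


def readable_fds(socks, timeout_ms=0):
    """Return the descriptors in `socks` that poll() reports as readable.

    Hypothetical helper. `socks` is any iterable of objects that have a
    fileno() method, such as sockets.
    """
    poller = select.poll()
    for s in socks:
        # Registration is by descriptor number, so there is no
        # FD_SETSIZE-style ceiling on the value.
        poller.register(s.fileno(), select.POLLIN)
    ready = poller.poll(timeout_ms)  # list of (fd, event mask) pairs
    return sorted(fd for fd, events in ready if events & select.POLLIN)
```

A client that connects but sends nothing, as in the test above, simply never shows up in the returned list, so the server has no reason to call recv() on it.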
Best Regards,
Adrian