close() of active socket does not work on FreeBSD 6

Post by arne » Tue, 12 Dec 2006 23:49:04


I've had problems with some tests hanging on FreeBSD 6/amd64. This happens
both with diablo-1.5.0_07-b01 and the java/jdk15 compiled from ports.

After much digging we've determined that the root cause is that
the guarantee in the socket.close() API, see the documentation at
http://www.yqcomputer.com/ #close()
isn't fulfilled - the thread blocked in I/O on the socket doesn't wake up.

Here's a pretty small test program that demonstrates the problem (given
that you're running sshd on port 22, if not change the port number to
something that the program can connect to). Is this a known problem?
Does it happen for everybody on FreeBSD 6?



import java.io.*;
import java.net.*;
import java.util.*;
import java.util.logging.*;

public class FooConn extends Thread {
    private boolean alive;
    public final int port;
    private Socket socket = null;

    public FooConn(int port) {
        super("FooConn:" + port);
        this.port = port;
        this.alive = true;
    }

    private void connect() {
        while (socket == null) {
            try {
                socket = new Socket("localhost", port);
            } catch (IOException e) {
                System.err.println("Connect failed: " + e);
                try { Thread.sleep(1000); } catch (InterruptedException ie) {}
            }
        }
    }

    public void disconnect() throws IOException, InterruptedException {
        alive = false;
        System.out.println("closing socket");
        socket.close();
        System.out.println("calling join");
        join();
    }

    public void run() {
        while (alive) {
            if (socket == null) {
                System.out.println("socket null, connect");
                connect();
            }
            try {
                int b = socket.getInputStream().read();
                System.out.println("got byte " + b);
            } catch (IOException e) {
                System.out.println("IOException, set socket to null");
                socket = null; // triggers reconnect
            } catch (RuntimeException e) {
                System.err.println("RuntimeException " + e);
                return;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try {
            FooConn conn = new FooConn(22);
            conn.start();
            Thread.sleep(1000);
            conn.disconnect();
        } catch (InterruptedException ie) {}
    }
}
_______________________________________________
XXXX@XXXXX.COM mailing list
http://www.yqcomputer.com/
To unsubscribe, send any mail to " XXXX@XXXXX.COM "
 
 
 


Post by arne » Wed, 13 Dec 2006 00:08:14


Looking at the Java VM source code, it does some tricks with dup2() to
reopen the close()'d file descriptor, making it point to a file descriptor
that's pre-connected to a closed socket.

A small C program that duplicates this (using pipes to make it a bit
simpler) follows. I'm not sure if any standards demand that this
works like it used to on FreeBSD 4 / libc_r, but since Java uses it, it
would be really nice if this could be made to work in FreeBSD 6 (libthr
and libpthread). Or maybe somebody has another suggestion on how to
implement the Java close() semantics?

Anyway, the following C program works as intended on FreeBSD 4,
hangs on FreeBSD 6 (amd64), compiled with:
cc -Wall -pthread read_dup2.c -o read_dup2


#include <fcntl.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

int p[2];

/* Reader thread: blocks in read() on the read end of the pipe. */
void *run(void *arg) {
    ssize_t res;
    char tmp[128];
    fprintf(stderr, "reading...\n");
    res = read(p[0], tmp, sizeof(tmp));
    fprintf(stderr, "read result: %d\n", (int)res);
    if (res < 0) {
        perror("read");
    }
    return arg;
}

int main(int argc, char **argv) {
    pthread_t t;
    int d;
    if (pipe(p) != 0) {
        perror("pipe");
        return 1;
    }
    if (pthread_create(&t, NULL, run, NULL) != 0) {
        perror("thread create");
        return 1;
    }
    sleep(1); /* give the reader time to block in read() */
    d = open("/dev/null", O_RDONLY);
    if (d < 0) {
        perror("open dev null");
        exit(1);
    }
    /* Point p[0] at /dev/null while the reader is still blocked on it. */
    if (dup2(d, p[0]) < 0) {
        perror("dup2");
        exit(1);
    }
    /* On FreeBSD 6 the reader never wakes up, so this join hangs. */
    if (pthread_join(t, NULL) != 0) {
        perror("thread join");
        exit(1);
    }
    return 0;
}

Post by achil » Wed, 13 Dec 2006 00:32:54

On Monday 11 December 2006 16:46, Arne H. Juul wrote:

On my systems,
1.4.2-p7 and diablo-1.5.0_07-b00 have this problem.
However, with linux 1.4.2_12-b03, right after socket.close() an IOException is
thrown and caught by the FooConn thread.

--
Achilleas Mantzios
_______________________________________________
XXXX@XXXXX.COM mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-java
To unsubscribe, send any mail to " XXXX@XXXXX.COM "
 
 
 


Post by achil » Wed, 13 Dec 2006 00:49:11

On Monday 11 December 2006 17:07, Arne H. Juul wrote:


I forgot to mention that all my tests were on 386 (so most probably it's not
amd64 related).
And indeed, in FreeBSD 6, after mapping libpthread.so.2 to libc_r.so.6,
FooConn seems to work correctly.
The problem exists only with libthr.so.2 / libpthread.so.2.


--
Achilleas Mantzios

Post by kostikbe » Wed, 13 Dec 2006 02:12:26



I think that -arch@ is the proper mailing list to discuss this issue.

Your test example hangs because read() takes one more hold count on the
file descriptor it operates on. As a result, when close() is called, f_count
of the rpipe (aka p[0]) is 2; close() decrements it, and f_count becomes
1. Since f_count > 0, fdrop_locked simply returns instead of calling
fo_close (see kern_descrip.c).
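
The hold-count reasoning above can be illustrated with a small userland model. This is only a sketch: `toy_file`, `toy_fhold` and `toy_fdrop` are hypothetical names invented for the illustration and are not the kernel's actual data structures or functions. It only shows why a drop that leaves the count above zero never reaches the close routine.

```c
#include <stdbool.h>

/* Toy stand-in for an open file object (hypothetical, not the kernel struct). */
struct toy_file {
    int  f_count;  /* number of holds currently taken on the file */
    bool closed;   /* set once the last hold is dropped */
};

/* Take one more hold, as an in-progress read() is described as doing. */
void toy_fhold(struct toy_file *fp)
{
    fp->f_count++;
}

/*
 * Drop one hold. The "close routine" (modelled by setting fp->closed)
 * only runs when the count reaches zero. Returns true if it ran.
 */
bool toy_fdrop(struct toy_file *fp)
{
    if (--fp->f_count > 0)
        return false;      /* someone else still holds the file */
    fp->closed = true;
    return true;
}
```

In this model, a file that starts with one hold (the descriptor table entry) and gains a second one from a blocked reader survives the first `toy_fdrop()`; only the second drop, which in the real scenario would happen when read() returns, marks it closed.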

I cannot find a statement in SUSv3 that would require interruption of
the read() upon close() from another thread; this looks like undefined
behaviour from the standard's point of view.

I think that the JVM is the more appropriate place for a fix, but others may
have a different point of view.



Post by arne » Wed, 13 Dec 2006 07:53:06


The best authority I've found says that the standards are silent (so
the current FreeBSD 6 behaviour is allowed). I'm asking whether it is
best practice, and why it has changed since FreeBSD 4.


If it was just the JVM I would agree, but any threaded program that uses
blocking I/O in some threads will probably need the same kind of handling
at some point. And if you think about what that handling looks like,
it's not exactly pretty:

* when calling any potentially blocking system call (read/readv,
write/writev, recv/recvfrom/recvmsg, send/sendto/sendmsg, accept,
connect, poll, select, maybe others that I didn't think of) the
application must:

** take a mutex
** remember in some structure (linked list or similar) keyed off
the file descriptor that "this thread will now do blocking I/O"
** release the mutex
** perform the actual operation
** take the mutex again
** check if the operation was interrupted in a special way, if so
return with EBADF
** release the mutex

* instead of calling close() and dup2() the application must:
** take the mutex
** for each thread in the FD-associated structure, interrupt it
in some special way (I'm guessing that setting a special flag
and then sending SIGIO should work).
** actually do the close() / dup2()
** release the mutex

This is exactly the sort of issue that should be solved by the
thread library / kernel threads implementation and not in every
threaded application that needs it, in my view.

- Arne H. J.

Post by davidx » Wed, 13 Dec 2006 09:17:21


<snip>
It should not be done in the new thread library; do you want a bloated
and error-prone thread library? Instead, if this semantic is really
necessary, it should be done in the kernel.


David Xu

Post by arne » Wed, 13 Dec 2006 09:26:36

On Mon, 11 Dec 2006, Arne H. Juul wrote:

After more hours of digging in standards, mailing list archives, and
bug tickets, it looks like the best thing for now is to just copy the
workaround already used in the Java VM for Linux. Something like this
should work:

diff -ruN jdk-1_5_0_09.b3/j2se/make/java/net/FILES_c.gmk jdk-1_5_0_09.b3-ahj8/j2se/make/java/net/FILES_c.gmk
--- jdk-1_5_0_09.b3/j2se/make/java/net/FILES_c.gmk Sun Oct 15 12:44:55 2006
+++ jdk-1_5_0_09.b3-ahj8/j2se/make/java/net/FILES_c.gmk Mon Dec 11 23:38:44 2006
@@ -26,3 +26,7 @@
ifeq ($(PLATFORM), linux)
FILES_c += $(CTARGDIR)linux_close.c
endif
+
+ifeq ($(PLATFORM), bsd)
+ FILES_c += $(CTARGDIR)bsd_close.c
+endif
diff -ruN jdk-1_5_0_09.b3/j2se/src/solaris/native/java/net/net_util_md.h jdk-1_5_0_09.b3-ahj8/j2se/src/solaris/native/java/net/net_util_md.h
--- jdk-1_5_0_09.b3/j2se/src/solaris/native/java/net/net_util_md.h Sun Oct 15 12:48:14 2006
+++ jdk-1_5_0_09.b3-ahj8/j2se/src/solaris/native/java/net/net_util_md.h Mon Dec 11 23:40:51 2006
@@ -39,7 +39,7 @@
#endif


-#ifdef __linux__
+#if defined(__linux__) || defined(_ALLBSD_SOURCE)
extern int NET_Timeout(int s, long timeout);
extern int NET_Read(int s, void* buf, size_t len);
extern int NET_RecvFrom(int s, void *buf, int len, unsigned int flags,
diff -ruN jdk-1_5_0_09.b3/j2se/src/solaris/native/java/net/bsd_close.c jdk-1_5_0_09.b3-ahj8/j2se/src/solaris/native/java/net/bsd_close.c
--- jdk-1_5_0_09.b3/j2se/src/solaris/native/java/net/bsd_close.c Thu Jan 1 01:00:00 1970
+++ jdk-1_5_0_09.b3-ahj8/j2se/src/solaris/native/java/net/bsd_close.c Mon Dec 11 23:39:45 2006
@@ -0,0 +1,367 @@
+/*
+ * @(#)bsd_close.c 1.7 03/12/19
+ *
+ * Copyright 2004 Sun Microsystems, Inc. All rights reserved.
+ * SUN PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
+ */
+
+/* XXXBSD: almost exact copy of linux_close.c */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <pthread.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+#include <sys/uio.h>
+#include <unistd.h>
+#include <errno.h>
+
+#include <sys/poll.h>
+
+/*
+ * Stack allocated by thread when doing blocking operation
+ */
+typedef struct threadEntry {
+ pthread_t thr; /* this thread */
+ struct threadEntry *next; /* next thread */
+ int intr; /* interrupted */
+} threadEntry_t;
+
+/*
+ * Heap allocated during initialized - one entry per fd
+ */
+typedef struct {
+ pthread_mutex_t lock; /* fd lock */
+ threadEntry_t *threads; /* threads blocked on fd */
+} fdEntry_t;
+
+/*
+ * Signal to unblock thread
+ */
+static int sigWakeup = SIGIO;
+
+/*
+ * The fd table and the number of file descriptors
+ */
+static fdEntry_t *fdTable;
+static int fdCount;
+
+/*
+ * Null signal handler
+ */
+static void sig_wakeup(int sig) {
+}
+
+/*
+ * Initialization routine (executed when library is loaded)
+ * Allocate fd tables and sets up signal handler.
+ */
+static void __attribute((constructor)) init() {
+ struct rlimit nbr_files;
+ sigset_t sigset;
+ struct sigaction sa;
+
+ /*
+ * Allocate table based on the maximum number of
+ * file descriptors.
+ */
+ getrlimit(RLIMIT_NOFILE, &nbr_files);
+ fdCount = nbr_files.rlim_max;
+ fdTable = (fdEntry_t *)calloc(fdCount, sizeo
 
 
 


Post by arne » Wed, 13 Dec 2006 09:56:26


Well, it depends on the alternatives.
If a clean kernel implementation is possible - yes please, of course.
If only a complex, error-prone kernel implementation is possible,
I would prefer to have the complexity in the thread library.

That's better than having it in the kernel and (IMHO) better than having N
implementations in various applications, especially since the applications
don't necessarily know enough about the internals of the thread library
and kernel interactions to get it right, much less to do it efficiently.

That said, copying the linux_close.c workaround in the Java VM seems to
solve my immediate problem, even if I think it's a bit ugly. But I have
confidence that you can do a better and cleaner solution :-)

- Arne H. J.

Post by davidx » Wed, 13 Dec 2006 10:17:24


The thread library only manages POSIX threads; it has nothing to do with how
the user uses files. Sorry, I will not mess up the thread library.

David Xu

Post by deische » Wed, 13 Dec 2006 10:21:38


Hacking libthr or libpthread to do this for you is not
an option. They would then look like libc_r, since all
fd accesses would need to be wrapped. If this needs
to be done, it must be in the kernel.

Common sense leads me to think that a close() should release
threads in I/O operations (reads/writes/selects/polls) and
return EBADF or something appropriate, at least when the behavior
is not dictated by POSIX or other historical/de facto behavior.

--
DE

Post by bde » Wed, 13 Dec 2006 14:56:19


It's probably a nightmare in the kernel too. close() starts looking
like revoke(), and revoke() has large problems and bugs in this area.

At higher levels, revoke() has no support for either waking up or synchronizing
with threads in I/O operations on the revoked file; it only tries to
force a close on revoked files that are open, but due to reference
counting problems it sometimes gets even this wrong.

At lower levels, I think only the tty driver even partly understands
that a device close() can occur while an (other) thread is in another
operation on the device. Of course, most revokes are of ttys so the
tty driver needs to understand this more than most. It uses a generation
count to detect changes of the open instance. It doesn't wake up the
other threads and depends on them checking the generation count. The
check occurs mainly in ttysleep() where it is fundamentally incomplete
on SMP systems -- there is no synchronization, so after a revoke(),
threads running on other CPUs just blunder on like they do in other
drivers. Giant locking of the tty driver reduces the problem.
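
The generation-count idea described above can be sketched in a few lines. This is a toy userland model, not the tty driver's code: `toy_tty`, `toy_tty_invalidate`, `toy_tty_before_sleep`, `toy_tty_after_sleep` and the `TOY_ERESTART` value are all made up for the illustration. Like the scheme it models, it does no wakeup and no synchronization; the sleeper only notices the change if it happens to check.

```c
/* Stand-in error code for this sketch only. */
#define TOY_ERESTART 1

/* Toy device state: a counter bumped whenever the open instance changes. */
struct toy_tty {
    unsigned gen;
};

/* What a revoke or forced close would do in this model. */
void toy_tty_invalidate(struct toy_tty *tp)
{
    tp->gen++;
}

/* A thread about to sleep on the device remembers the current generation. */
unsigned toy_tty_before_sleep(const struct toy_tty *tp)
{
    return tp->gen;
}

/*
 * After waking for whatever reason, the thread compares generations.
 * Returns 0 if the open instance is unchanged, TOY_ERESTART otherwise.
 * Nothing here wakes the sleeper, and nothing orders this read against
 * a concurrent toy_tty_invalidate() on another CPU.
 */
int toy_tty_after_sleep(const struct toy_tty *tp, unsigned saved_gen)
{
    return tp->gen == saved_gen ? 0 : TOY_ERESTART;
}
```

The weakness pointed out above is visible in the model: a thread that never comes back through `toy_tty_after_sleep()`, or that reads `gen` while another CPU is changing it, carries on as if nothing had happened.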

Bruce

Post by phk » Wed, 13 Dec 2006 15:45:25

In message < XXXX@XXXXX.COM >, Bruce Evans writes:



There is the distinctive difference that revoke() operates on a name
and close() on a filedescriptor, but otherwise I agree.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
XXXX@XXXXX.COM | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

Post by deische » Wed, 13 Dec 2006 22:22:33


It also couldn't be entirely solved by fixing it in the
threads library. You could still have a non-threaded
application that waits on a read operation, but receives
a signal and closes the socket in the signal handler.

--
DE

Post by kostikbe » Wed, 13 Dec 2006 23:06:58


This is not the problem. The read (the syscall being executed) is aborted
when the signal is delivered. The original poster considered the situation
where read() is active (in particular, f_count of struct file has been
incremented by fget, which caused the reported behaviour).
