I will analyse some memory dumps to identify possible causes. Maybe it's only related to threads ...

On Aug 26, 2013, at 19:02, Ralf Buschermöhle <[log in to unmask]> wrote:

> Hi,
>
> sorry, it seems I missed the exception in the log. After around 2000 connects & disconnects I receive
>
> Exception in thread "SlaveMonitor:: " java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:657)
>         at ec.eval.SlaveConnection.buildThreads(SlaveConnection.java:161)
>         at ec.eval.SlaveConnection.<init>(SlaveConnection.java:82)
>         at ec.eval.SlaveMonitor.registerSlave(SlaveMonitor.java:220)
>         at ec.eval.SlaveMonitor$1.run(SlaveMonitor.java:192)
>         at java.lang.Thread.run(Thread.java:679)
>
> In the test case I started and shut down the slaves immediately (after 2 s), but I receive the same exception in the logs of the cluster nodes, where there are usually hours between starting and stopping the slaves.
>
> Greetings,
>
> Ralf
>
>
> P.S. Just for completeness:
>
> The other exceptions (regarding the sockets) are:
>
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:185)
>         at java.net.SocketInputStream.read(SocketInputStream.java:199)
>         at java.io.DataInputStream.readByte(DataInputStream.java:265)
>         at ec.eval.SlaveConnection.readLoop(SlaveConnection.java:260)
>         at ec.eval.SlaveConnection$1.run(SlaveConnection.java:150)
>
> and, at "dataOut.writeByte(Slave.V_SHUTDOWN)":
>
> java.net.SocketException: Broken pipe
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:132)
>         at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
>         at ec.eval.SlaveConnection.shutdown(SlaveConnection.java:99)
>         at ec.eval.SlaveConnection.readLoop(SlaveConnection.java:327)
>         at ec.eval.SlaveConnection$1.run(SlaveConnection.java:150)
>
> On Aug 26, 2013, at 17:20, Sean Luke <[log in to unmask]> wrote:
>
>> On Aug 26, 2013, at 10:27 AM, Ralf Buschermöhle wrote:
>>
>>> These are just the running nodes. Previously there have been a few thousand connections from a cluster (handled successfully).
>>
>> Some more. Try executing the following command on your BSD box to see how many sockets and files (combined) you can have open at one time:
>>
>>     sysctl kern.maxfilesperproc
>>
>> On my Mac (a BSD box) I get around 10K.
>>
>> I wonder if ECJ isn't properly closing the sockets, and so you're hitting a socket limit by repeatedly adding and removing clients. It looks correct to me though.
>>
>> Sean
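
[Editor's note] One way to confirm the thread-leak hypothesis discussed above is to watch the JVM's own thread counters rather than waiting for a heap dump. The following is a minimal, hypothetical monitor (not part of ECJ; class name and polling interval are my own) that uses the standard java.lang.management.ThreadMXBean. If threads are created per slave connection but never terminated, the live and peak counts climb across connect/disconnect cycles until "unable to create new native thread" appears.

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    // Hypothetical standalone monitor: run it (or the equivalent code inside the
    // master JVM) and watch whether the live thread count keeps growing as
    // slaves connect and disconnect.
    public class ThreadCountMonitor {
        public static void main(String[] args) throws InterruptedException {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            while (true) {
                System.out.printf("live=%d peak=%d totalStarted=%d%n",
                    threads.getThreadCount(),          // threads alive right now
                    threads.getPeakThreadCount(),      // high-water mark
                    threads.getTotalStartedThreadCount()); // ever started
                Thread.sleep(5000);                    // poll every 5 seconds
            }
        }
    }

The open-descriptor side of the question can be checked the same way from the shell, e.g. "lsof -p <pid> | wc -l" against the master's process, compared with the kern.maxfilesperproc value Sean mentions.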
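[Editor's note] Sean's suspicion (sockets possibly not being closed) and the thread-related OutOfMemoryError point at the same per-connection cleanup obligation. The sketch below is purely illustrative and makes no claim about ECJ's actual SlaveConnection code; it only shows the teardown pattern that prevents both symptoms: close the socket so the descriptor is released and any blocked read is unblocked, then join the per-connection thread so it actually exits.

    import java.io.IOException;
    import java.net.Socket;

    // Hypothetical connection wrapper (not ECJ code).
    public class Connection {
        private final Socket socket;
        private final Thread reader;

        public Connection(Socket socket) {
            this.socket = socket;
            this.reader = new Thread(this::readLoop, "Connection-reader");
            this.reader.start();
        }

        private void readLoop() {
            try {
                // read until the stream ends or the socket is closed
                while (socket.getInputStream().read() != -1) { /* handle bytes */ }
            } catch (IOException e) {
                // expected when the peer disconnects or shutdown() closes the socket
            }
        }

        public void shutdown() throws InterruptedException {
            try {
                socket.close();   // releases the file descriptor, unblocks read()
            } catch (IOException ignored) { }
            reader.join();        // ensure the per-connection thread terminates
        }
    }

If either step is skipped, repeated connect/disconnect cycles accumulate native threads or open descriptors, which matches both the OutOfMemoryError and the maxfilesperproc concern above.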