Where to start looking:

When a slave makes an incoming connection, the SlaveMonitor creates a SlaveConnection object to handle it.  This object has two threads: a read thread and a write thread.  When the object is torn down because the connection has broken (the shutdown() method), these threads are interrupted, which causes them to drop out of their loops.  But they're not joined or set to null: it's assumed that the threads will die on their own and that the SlaveConnection object will get GC'd, so null-setting is unnecessary.

Perhaps one of these assumptions is wrong, so we're piling up threads.  Try changing the SlaveConnection.shutdown() synchronization section to look like this:

            // notify my threads now that I've closed stuff in case they're still waiting

            // stuff added by Sean
            reader = writer = null;  // let GC

Notice I added in joining and null-setting.  This will probably have no effect, but if things magically start working it tells us that a thread isn't ever finishing or isn't being set to null.  From there you might try to figure out why.
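
In case it helps to see the pattern in isolation, here is a minimal standalone sketch of interrupt, then join, then null-setting.  It is not the ECJ code: ConnectionSketch, buildLoopThread, liveThreadsNamed and the thread names are all made up for illustration, and the timed join is my own assumption so that a thread which never finishes shows up as a "false" result instead of hanging shutdown().

```java
// Hypothetical stand-in for an object that owns a reader and a writer thread.
// None of these names come from ECJ; this only illustrates the pattern.
class ConnectionSketch {
    private final Object lock = new Object();
    private Thread reader;
    private Thread writer;

    ConnectionSketch() {
        reader = buildLoopThread("sketch-reader");
        writer = buildLoopThread("sketch-writer");
        reader.start();
        writer.start();
    }

    // Builds a thread that loops until it is interrupted.
    private static Thread buildLoopThread(String name) {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(50);  // stands in for blocking work
                }
            } catch (InterruptedException e) {
                // interrupted while blocked: fall through so run() returns
            }
        }, name);
        t.setDaemon(true);
        return t;
    }

    // Interrupts both threads, waits up to timeoutMillis for each to die,
    // and drops the references.  Returns true if both threads are dead.
    boolean shutdown(long timeoutMillis) {
        Thread r, w;
        synchronized (lock) {
            r = reader;
            w = writer;
            reader = writer = null;  // let GC
        }
        if (r == null || w == null) return true;  // already shut down
        r.interrupt();
        w.interrupt();
        try {
            r.join(timeoutMillis);
            w.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the caller's interrupt
            return false;
        }
        return !r.isAlive() && !w.isAlive();
    }

    // Counts live threads with the given name, to check whether they pile up.
    static int liveThreadsNamed(String name) {
        int count = 0;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().equals(name)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        ConnectionSketch c = new ConnectionSketch();
        System.out.println("threads exited: " + c.shutdown(2000));
        System.out.println("readers still alive: " + liveThreadsNamed("sketch-reader"));
    }
}
```

If shutdown() returns false in a setup like this, or the live-thread count keeps climbing across connects and disconnects, that points at a thread that is not leaving its loop.  Note that the sketch joins outside the synchronized block; whether that matters for SlaveConnection depends on what its threads lock, which I haven't checked here.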


On Aug 26, 2013, at 3:17 PM, Ralf Buschermöhle wrote:

> I will analyse some memory dumps to identify possible causes. Maybe it's only related to threads ...
> On Aug 26, 2013, at 19:02, Ralf Buschermöhle wrote:
>> Hi,
>> Sorry, it seems I missed the exception in the log. After roughly 2000 connects and disconnects I receive:
>> Exception in thread "SlaveMonitor::    " java.lang.OutOfMemoryError: unable to create new native thread
>> 	at java.lang.Thread.start0(Native Method)
>> 	at java.lang.Thread.start(
>> 	at ec.eval.SlaveConnection.buildThreads(
>> 	at ec.eval.SlaveConnection.<init>(
>> 	at ec.eval.SlaveMonitor.registerSlave(
>> 	at ec.eval.SlaveMonitor$
>> 	at