I think a better approach would be to use a thread pool rather than keeping the threads static like this. But it'll take a while to whip up.
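
As a rough illustration of the pool idea, here is a minimal sketch using the standard java.util.concurrent executor. It is not ECJ code: the class name WriterPoolSketch, the method sendAll, and the counter standing in for "send a job to a slave" are all hypothetical, and the pool size is an arbitrary assumption. The point is only that a fixed set of worker threads is reused for every slave, so the thread count stays bounded no matter how many slaves connect and disconnect.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not part of ECJ: one shared pool instead of one
// long-lived writer thread per slave connection.
class WriterPoolSketch {

    // Runs one placeholder "send" task per slave on a pool of poolSize
    // threads and returns how many tasks completed.
    static int sendAll(int slaveCount, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger sent = new AtomicInteger();

        for (int i = 0; i < slaveCount; i++) {
            // Placeholder for the real work of writing a job to a slave.
            pool.execute(() -> sent.incrementAndGet());
        }

        // Stop accepting tasks and let the queued ones finish; the worker
        // threads exit afterwards, so nothing is left behind.
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // interrupt anything still running
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return sent.get();
    }

    public static void main(String[] args) {
        System.out.println("jobs sent: " + sendAll(1000, 4));
    }
}
```

With this shape, 1000 slaves coming and going would still use only the four pool threads, rather than leaving a writer thread behind for each connection.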

Sean

On Sep 6, 2013, at 9:43 AM, Ralf Buschermöhle wrote:

> Hi,
> 
> A slightly more elegant and less performance-hungry solution was to change writeLoop() so that it returns false when waitOnMonitor returns false (that is, when it receives an interrupt):
> 
> boolean writeLoop()
>        {
>        Job job = null;
> 
>        try
>            {
>            synchronized(jobs)
>                {
>                // check for an unsent job
>                if ((job = oldestUnsentJob()) == null)  // automatically marks as sent
>                    {
>                    // failed -- wait and drop out of the loop and come in again
>                    debug("" + Thread.currentThread().getName() + "Waiting for a job to send" );
>                    if (!slaveMonitor.waitOnMonitor(jobs))  // changed
>                        return false;                       // new
>                    }
>    ...
> 
> This avoids the busy waiting on writer.isInterrupted() of the previous version.
> 
> 	Ralf
> 
> On Sep 6, 2013, at 14:41, Ralf Buschermöhle wrote:
> 
>> Hi,
>> 
>> I added "!writer.isInterrupted()" as the condition of the write loop, and now the threads are stopped.
>> 
>> In context, in SlaveConnection.java:
>> 
>> writer = new Thread()
>>           {
>>           public void run()
>>               {
>>               while (!writer.isInterrupted())
>>                   writeLoop();
>>               }
>>           };
>> 
>> Greetings,
>> 
>> 	Ralf
>> 
>> On Sep 6, 2013, at 11:59, Ralf Buschermöhle wrote:
>> 
>>> Hi,
>>> 
>>> Unfortunately, adding memory does not solve the problem.
>>> 
>>> It does not seem to be heap related. I ran jstack after 56 slaves, again after adding and removing 500 slaves (also waiting for the disconnect messages), and after the next 500 I received the exception.
>>> 
>>> The number of threads did not decrease after waiting.
>>> 
>>> I attached the thread dumps.
>>> 
>>> Is it the slaveMonitor that does not receive an interrupt?
>>> 
>>> public boolean waitOnMonitor(Object monitor)
>>>      {
>>>      try
>>>          {
>>>          monitor.wait();
>>>          }
>>>      catch (InterruptedException e) { return false; }
>>>      return true;
>>>      }
>>> 
>>> Greetings,
>>> 
>>> 	Ralf
>>> 
>>> 
>>> <56.stack><56+-500.stack><Exception.stack>
>>> 
>>> 
>> 
>