
Hi Sean,

On May 15, 2013, at 02:50, Sean Luke <[log in to unmask]> wrote:

> That one took a bit for me to nail down: it turned out to be an error in the serialization mechanism of MasterProblem (on the *master* side).
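The thread does not say what the actual serialization error in MasterProblem was. The sketch below is only a generic, hypothetical illustration of one way a checkpoint round-trip can silently change behaviour. A field marked `transient` is skipped by Java serialization, so after a restore it comes back as its default value. The class `CheckpointSketch`, its nested `Problem`, and the fields `jobSize`/`batchSize` are invented for illustration and are not ECJ code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical illustration only: NOT ECJ's MasterProblem and NOT the bug that was fixed.
class CheckpointSketch {

    static class Problem implements Serializable {
        private static final long serialVersionUID = 1L;
        int jobSize;              // serialized normally, survives a checkpoint
        transient int batchSize;  // skipped by serialization, restored as 0

        Problem(int jobSize, int batchSize) {
            this.jobSize = jobSize;
            this.batchSize = batchSize;
        }

        // Hypothetical clamp: an unrestored (0) field degrades to one-individual batches.
        int effectiveBatchSize() {
            return Math.max(1, batchSize);
        }
    }

    // Simulate writing a checkpoint and reading it back.
    static Problem roundTrip(Problem p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return (Problem) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Problem before = new Problem(8, 8);
        Problem after = roundTrip(before);
        System.out.println("before checkpoint: " + before.effectiveBatchSize()); // 8
        System.out.println("after restore:     " + after.effectiveBatchSize());  // 1
    }
}
```

The constructor does not run during deserialization, so nothing re-initializes `batchSize`. In a real class, a custom `readObject` method or an explicit re-setup step after loading a checkpoint is the usual way to restore such state.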

Ah, ok - yes I thought it was on the master side ... but did not know where to start searching.

> I have committed a new version of MasterProblem.java, which I think will fix things.  Also I have committed a new version of Evaluator.java which cleans up some gunk in creating MasterProblems which I didn't like (came across it while trying to hunt down your error).

It fixed the bug. Thanks (in particular for the fast help)! :)

Greetings,

	Ralf


> On May 14, 2013, at 6:54 PM, Ralf Buschermöhle wrote:
> 
>> Hi Sean,
>> 
>> no problem. Everything works fine - a pleasure to use the framework.
>> 
>> However, one problem occurs after restarting the master from a checkpoint and reconnecting the slaves: each slave then evaluates in single-threaded mode only (despite evalthreads > 1).
>> 
>> The problem seems to lie in the source of the value read at line 473 of "Slave.java":
>> 
>> public static void evaluateSimpleProblemForm( final EvolutionState state, boolean returnIndividuals,
>>       DataInputStream dataIn, DataOutputStream dataOut, String[] args )
>>       {
>>       ParameterDatabase params=null; 
>> 
>>       // first load the individuals
>>       int numInds=1; 
>>       try
>>           {
>> 473:            numInds = dataIn.readInt();
>>           }
>>       catch (IOException e)
>>           {
>>           state.output.fatal("Unable to read the number of individuals from the master:\n"+e);
>>           }
>> 
>> because after a checkpoint restart, "numInds" always receives a "1" from the master, in contrast to a run without checkpointing.
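To see why a count of 1 produces exactly the single-threaded behaviour described, it helps to model a length-prefixed batch: the master writes an int count followed by that many individuals, and the slave splits whatever it received among its evaluation threads. This is a minimal sketch under stated assumptions. It is not ECJ's actual wire format or threading code. The class `BatchProtocolSketch`, the methods `writeBatch`, `readBatch` and `chunkSizes`, the int payloads standing in for individuals, and the choice of 4 threads are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical sketch, not ECJ's actual wire format or threading code.
class BatchProtocolSketch {

    // Master side: send a length-prefixed batch (int payloads stand in for individuals).
    static void writeBatch(DataOutputStream out, int[] individuals) throws IOException {
        out.writeInt(individuals.length); // the count the slave reads first
        for (int ind : individuals) out.writeInt(ind);
        out.flush();
    }

    // Slave side: read the count, then exactly that many payloads.
    static int[] readBatch(DataInputStream in) throws IOException {
        int numInds = in.readInt();
        int[] individuals = new int[numInds];
        for (int i = 0; i < numInds; i++) individuals[i] = in.readInt();
        return individuals;
    }

    // Split numInds individuals into contiguous chunks, one per evaluation thread,
    // and return how many individuals each thread gets.
    static int[] chunkSizes(int numInds, int evalThreads) {
        int[] sizes = new int[evalThreads];
        for (int t = 0; t < evalThreads; t++) {
            int start = (int) ((long) numInds * t / evalThreads);
            int end = (int) ((long) numInds * (t + 1) / evalThreads);
            sizes[t] = end - start;
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        for (int batchSize : new int[] {1, 8}) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            writeBatch(new DataOutputStream(buf), new int[batchSize]);
            int[] received = readBatch(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            System.out.println("batch of " + received.length + " -> per-thread work "
                + Arrays.toString(chunkSizes(received.length, 4)));
        }
    }
}
```

Running it prints `batch of 1 -> per-thread work [0, 0, 0, 1]` and `batch of 8 -> per-thread work [2, 2, 2, 2]`. When the master sends only one individual per message, three of the four threads have nothing to do, which matches the symptom.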
>> 
>> Any suggestions? 
>> 
>> Greetings,
>> 
>> 	Ralf
>> 
>> 
>> On May 13, 2013, at 19:26, Sean Luke <[log in to unmask]> wrote:
>> 
>>> Ralf, I'm sorry, I had misinterpreted your earlier posting as saying you had figured it out.
>>> 
>>> You are correct, while Master-Slave allows for dynamic reintroduction of slave units, ECJ's island model does not allow dynamic reintroduction of island units.  There's not any real reason for this, except that (1) the island model code was written first [hindsight is 20/20] and (2) dynamic reintroduction is less useful for island models, since islands are supposed to be long-running stateful processes.  So we always figured the right strategy in the case of an island failure was to kill everything and restart from checkpoint (which they can do).  Perhaps we might add in dynamic reintroduction but it's somewhat complex to do right and a bit low in priority.
>>> 
>>> Sean
>>> 
>>> On May 13, 2013, at 6:32 AM, Ralf Buschermöhle wrote:
>>> 
>>>> Hi,
>>>> 
>>>> it seems that Master-Slave already does the trick :). 
>>>> 
>>>> I had ignored Master-Slave because the ReadMe says "Slaves run in single-threaded mode ...", which contradicts the manual, and I have a large in-memory database on each node/slave.
>>>> 
>>>> Greetings,
>>>> 
>>>> 	Ralf
>>>> 
>>>> 
>>>> On May 9, 2013, at 11:32, Ralf Buschermöhle <[log in to unmask]> wrote:
>>>> 
>>>>> Please ignore async connection ... :)
>>>>> 
>>>>> On May 9, 2013, at 11:28, Ralf Buschermöhle <[log in to unmask]> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> is it possible for islands to rejoin after their connection to the server has been lost? (async connection)
>>>>>> 
>>>>>> This would allow computation nodes to be introduced more flexibly (e.g., when they are not being used by other users).
>>>>>> 
>>>>>> Greetings,
>>>>>> 
>>>>>> 	Ralf
>>>>>