ECJ-INTEREST-L Archives

June 2014

ECJ-INTEREST-L@LISTSERV.GMU.EDU

From:     Eric 'Siggy' Scott <[log in to unmask]>
Date:     Fri, 6 Jun 2014 00:47:47 -0400
Sender:   ECJ Evolutionary Computation Toolkit <[log in to unmask]>
Reply-To: ECJ Evolutionary Computation Toolkit <[log in to unmask]>

PS: I've confirmed that this also occurs with SimpleEvolutionState, not
just SteadyStateEvolutionState, so it has nothing to do with asynchronous
evolution proper.


On Fri, Jun 6, 2014 at 12:35 AM, Eric 'Siggy' Scott <[log in to unmask]> wrote:

> I'm running asynchronous evolution with the latest SVN revision.  This is
> a toy experiment, so all my slaves are running on the same machine as the
> master.
>
> All the slaves are supposed to shut themselves down at the end of the run
> upon a command from the master (in non-daemon mode), or keep purring in the
> background (in daemon mode).
>
> What I see, however, is that in both cases a handful of the slaves shut
> down, a handful do not, and then the master hangs.
>
> For instance, the following is a case where I have 10 slaves running in
> daemon mode, and the master executing multiple jobs in sequence.  Job 0 ran
> fine -- I got 10 "connected successfully" messages at the beginning and 10
> "shut down" messages at the end.  But here is the output of job 1:
>
>> Threads:  breed/1 eval/1
>> Seed: 1869276809
>> Job: 1
>> Setting up
>> WARNING:
>> You've chosen to use Steady-State Evolution, but your statistics does not
>> implement the SteadyStateStatisticsForm.
>> PARAMETER: stat.child.0
>> Initializing Generation 0
>> Slave /127.0.0.1/1402028546585 connected successfully.
>> Slave /127.0.0.1/1402028546798 connected successfully.
>> Slave /127.0.0.1/1402028546699 connected successfully.
>> Slave /127.0.0.1/1402028546658 connected successfully.
>> Slave /127.0.0.1/1402028546690 connected successfully.
>> Slave /127.0.0.1/1402028546773 connected successfully.
>> Slave /127.0.0.1/1402028546763 connected successfully.
>> Slave /127.0.0.1/1402028546753 connected successfully.
>> Slave /127.0.0.1/1402028546794 connected successfully.
>> Slave /127.0.0.1/1402028546755 connected successfully.
>> Generation 1 Evaluations 50
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 2 Evaluations 100
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 3 Evaluations 150
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 4 Evaluations 200
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 5 Evaluations 250
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 6 Evaluations 300
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 7 Evaluations 350
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 8 Evaluations 400
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 9 Evaluations 450
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 10 Evaluations 500
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Subpop 0 best fitness of run: Fitness: 1.0
>> Slave /127.0.0.1/1402028546585 shut down.
>> Slave /127.0.0.1/1402028546798 shut down.
>> Slave /127.0.0.1/1402028546699 shut down.
>> Slave /127.0.0.1/1402028546658 shut down.
>> Slave /127.0.0.1/1402028546690 shut down.
>
>
> After shutting down a fraction of the slaves, the master hangs.  I have to
> control-C to exit.
>
> The failure appears to be random -- sometimes it occurs at the end of the
> 0th job, and sometimes it does not show up until the 4th or 5th job.  It
> still occurs when the slaves are launched in non-daemon mode, and the
> number of slaves that succeed or fail to shut down appears to be arbitrary.
> In short, we have all the signs of a race condition.
>
> When there is a failure, some of the slaves (but not necessarily the same
> number of slaves that succeeded or failed to shut down) print the message:
>
>> FATAL ERROR:
>> Unable to read individual from master.java.net.SocketException: Broken
>> pipe
>
>
> I ran a debugger on a non-daemon slave that failed to shut down -- it
> seemed to be stuck happily waiting to receive a message from the master, as
> if it'd never been told to shut down at all.
>
> Besides that, I don't know what's going on.
>
> Siggy
>
> --
>
> Ph.D. student in Computer Science
> George Mason University
> http://mason.gmu.edu/~escott8/
>
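
For concreteness, here is a minimal, hypothetical Java sketch of the kind of
blocking receive loop a slave runs (it is not ECJ's actual ec.eval.Slave code;
the host, port, and command bytes are invented for illustration).  A slave
built this way that never receives the shutdown command on its own socket
simply blocks forever on the read, which matches what the debugger showed for
the stuck slaves, and a write attempted after the master has gone away is the
kind of thing that surfaces as the "Broken pipe" SocketException.

    // Hypothetical sketch only, not ECJ's actual ec.eval.Slave code.
    // The host, port, and command bytes are invented for illustration.
    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.IOException;
    import java.net.Socket;

    public class ToySlave {
        static final byte CMD_EVALUATE = 0;   // hypothetical "evaluate an individual" command
        static final byte CMD_SHUTDOWN = 1;   // hypothetical "shut yourself down" command

        public static void main(String[] args) throws IOException {
            try (Socket master = new Socket("127.0.0.1", 15000);
                 DataInputStream in = new DataInputStream(master.getInputStream())) {
                while (true) {
                    byte command;
                    try {
                        // Blocks until the master sends something.  If the master
                        // hangs before writing the shutdown command to this
                        // particular socket, the slave sits here forever and never
                        // prints "shut down", like the stuck slaves described above.
                        command = in.readByte();
                    } catch (EOFException e) {
                        // The master closed the connection without a shutdown command.
                        System.err.println("Connection closed by master.");
                        break;
                    }
                    if (command == CMD_SHUTDOWN) {
                        System.out.println("Slave shut down.");
                        break;   // a daemon-mode slave would loop back and wait for a new master instead
                    }
                    // Otherwise: read the individual, evaluate it, and write the
                    // fitness back.  A write attempted after the master has gone
                    // away is what surfaces as java.net.SocketException: Broken pipe.
                }
            }
        }
    }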



-- 

Ph.D. student in Computer Science
George Mason University
http://mason.gmu.edu/~escott8/

