Date: Fri, 6 Jun 2014 00:47:47 -0400
PS: I've confirmed that this also occurs with SimpleEvolutionState, not
just SteadyStateEvolutionState, so it has nothing to do with asynchronous
evolution proper.
On Fri, Jun 6, 2014 at 12:35 AM, Eric 'Siggy' Scott wrote:
> I'm running asynchronous evolution with the latest SVN revision. This is
> a toy experiment, so all my slaves are running on the same machine as the
> master.
>
> All the slaves are supposed to shut themselves down at the end of the run
> upon a command from the master (in non-daemon mode), or keep purring in the
> background (in daemon mode).
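A minimal Java sketch of the shutdown handshake described above: the master writes a shutdown command to each slave's connection, and the slave, which sits blocked reading that connection, either exits (non-daemon) or goes back to waiting for a new master (daemon). The opcodes OP_EVALUATE and OP_SHUTDOWN, the class ShutdownSketch, and the byte array standing in for a socket are all hypothetical; this is an illustration under those assumptions, not ECJ's actual slave protocol or API.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch of a master-to-slave shutdown command.
// Opcodes and names are illustrative assumptions, not ECJ's API.
public class ShutdownSketch {
    static final int OP_EVALUATE = 1; // hypothetical: "here is work"
    static final int OP_SHUTDOWN = 2; // hypothetical: "the run is over"

    /** Master side: tell one slave that the run has ended. */
    static void sendShutdown(DataOutputStream toSlave) throws IOException {
        toSlave.writeInt(OP_SHUTDOWN);
        toSlave.flush(); // without this, the command can sit in the master's buffer
    }

    /** Slave side: block on the connection until the master ends the run. */
    static String slaveLoop(DataInputStream fromMaster, boolean daemon) throws IOException {
        while (true) {
            int op = fromMaster.readInt(); // blocks until the master sends something
            if (op == OP_SHUTDOWN) {
                return daemon ? "wait for next master" : "exit";
            }
            // OP_EVALUATE handling elided; this sketch sends no payload with it
        }
    }

    public static void main(String[] args) throws IOException {
        // A byte array stands in for the socket between master and slave.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream master = new DataOutputStream(new BufferedOutputStream(wire));
        master.writeInt(OP_EVALUATE);
        sendShutdown(master);

        DataInputStream slave = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(slaveLoop(slave, false)); // prints "exit"
    }
}
```

The explicit flush() matters in this sketch because the master's stream is buffered: a command that is written but never flushed stays on the master's side, and over a real socket the slave would simply keep waiting.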
>
> What I see, however, is that in both cases a handful of slaves shut down
> and a handful do not, and then the master hangs.
>
> For instance, the following is a case where I have 10 slaves running in
> daemon mode, and master executing multiple jobs in sequence. Job 0 ran
> fine -- I got 10 "connected successfully" messages at the beginning and 10
> "shut down" messages at the end. But here is the output of job 1:
>
>> Threads: breed/1 eval/1
>> Seed: 1869276809
>> Job: 1
>> Setting up
>> WARNING:
>> You've chosen to use Steady-State Evolution, but your statistics does not
>> implement the SteadyStateStatisticsForm.
>> PARAMETER: stat.child.0
>> Initializing Generation 0
>> Slave /127.0.0.1/1402028546585 connected successfully.
>> Slave /127.0.0.1/1402028546798 connected successfully.
>> Slave /127.0.0.1/1402028546699 connected successfully.
>> Slave /127.0.0.1/1402028546658 connected successfully.
>> Slave /127.0.0.1/1402028546690 connected successfully.
>> Slave /127.0.0.1/1402028546773 connected successfully.
>> Slave /127.0.0.1/1402028546763 connected successfully.
>> Slave /127.0.0.1/1402028546753 connected successfully.
>> Slave /127.0.0.1/1402028546794 connected successfully.
>> Slave /127.0.0.1/1402028546755 connected successfully.
>> Generation 1 Evaluations 50
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 2 Evaluations 100
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 3 Evaluations 150
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 4 Evaluations 200
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 5 Evaluations 250
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 6 Evaluations 300
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 7 Evaluations 350
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 8 Evaluations 400
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 9 Evaluations 450
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 10 Evaluations 500
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Subpop 0 best fitness of run: Fitness: 1.0
>> Slave /127.0.0.1/1402028546585 shut down.
>> Slave /127.0.0.1/1402028546798 shut down.
>> Slave /127.0.0.1/1402028546699 shut down.
>> Slave /127.0.0.1/1402028546658 shut down.
>> Slave /127.0.0.1/1402028546690 shut down.
>
>
> After shutting down a fraction of the slaves, it hangs. I have to
> control-C to exit.
>
> The failure appears to be random -- sometimes it occurs at the end of the
> 0th job. It still occurs when the slaves are launched in non-daemon mode.
> Sometimes the failure does not show up until the 4th or 5th job. The
> number of slaves that succeed or fail to shut down appears to be arbitrary.
> In short, we have all the signs of a race condition.
>
> When there is a failure, some of the slaves (but not necessarily the same
> number of slaves that succeeded or failed to shut down) print the message:
>
>> FATAL ERROR:
>> Unable to read individual from master.java.net.SocketException: Broken pipe
>
>
> I ran a debugger on a non-daemon slave that failed to shut down -- it
> seemed to be stuck happily waiting to receive a message from the master, as
> if it'd never been told to shut down at all.
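One way to catch this state without attaching a debugger is to give the slave's socket a read timeout, so a slave that never receives the shutdown command reports it instead of blocking forever. This is a diagnostic sketch only, not a fix for the race; the class SlaveReadTimeout, the method masterWentSilent, and the 200 ms timeout are assumptions. It relies on the standard java.net.Socket.setSoTimeout, after which read() throws SocketTimeoutException while the socket itself remains usable.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Diagnostic sketch: a slave whose reads time out instead of blocking forever.
// Class name, method name, and timeout value are hypothetical.
public class SlaveReadTimeout {

    /** Returns true if the master sent nothing for longer than timeoutMillis. */
    static boolean masterWentSilent(Socket toMaster, int timeoutMillis) throws IOException {
        toMaster.setSoTimeout(timeoutMillis); // 0 (the default) means wait forever
        InputStream in = toMaster.getInputStream();
        try {
            in.read(); // a byte, or -1 on orderly close: the master did respond
            return false;
        } catch (SocketTimeoutException e) {
            return true; // nothing arrived in time; the socket is still valid
        }
    }

    public static void main(String[] args) throws IOException {
        // A local "master" that accepts the connection but never sends anything.
        try (ServerSocket master = new ServerSocket(0);
             Socket slave = new Socket("127.0.0.1", master.getLocalPort());
             Socket accepted = master.accept()) {
            System.out.println(masterWentSilent(slave, 200)); // prints "true"
        }
    }
}
```

In a real slave the timeout would have to exceed the longest legitimate gap between messages from the master, or a slow generation would be mistaken for a lost shutdown command.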
>
> Besides that, I don't know what's going on.
>
> Siggy
>
> --
>
> Ph.D student in Computer Science
> George Mason University
> http://mason.gmu.edu/~escott8/
>