PS: I've confirmed that this also occurs with SimpleEvolutionState, not just
SteadyStateEvolutionState, so it has nothing to do with asynchronous evolution
proper.

On Fri, Jun 6, 2014 at 12:35 AM, Eric 'Siggy' Scott <[log in to unmask]> wrote:

> I'm running asynchronous evolution with the latest SVN revision. This is
> a toy experiment, so all my slaves are running on the same machine as the
> master.
>
> All the slaves are supposed to shut themselves down at the end of the run
> upon a command from the master (in non-daemon mode), or keep purring in the
> background (in daemon mode).
>
> What I see, however, is that in both cases a handful of slaves shut down,
> a handful do not, and then the master hangs.
>
> For instance, the following is a case where I have 10 slaves running in
> daemon mode, and the master executing multiple jobs in sequence. Job 0 ran
> fine -- I got 10 "connected successfully" messages at the beginning and 10
> "shut down" messages at the end. But here is the output of job 1:
>
>> Threads: breed/1 eval/1
>> Seed: 1869276809
>> Job: 1
>> Setting up
>> WARNING:
>> You've chosen to use Steady-State Evolution, but your statistics does not
>> implement the SteadyStateStatisticsForm.
>> PARAMETER: stat.child.0
>> Initializing Generation 0
>> Slave /127.0.0.1/1402028546585 connected successfully.
>> Slave /127.0.0.1/1402028546798 connected successfully.
>> Slave /127.0.0.1/1402028546699 connected successfully.
>> Slave /127.0.0.1/1402028546658 connected successfully.
>> Slave /127.0.0.1/1402028546690 connected successfully.
>> Slave /127.0.0.1/1402028546773 connected successfully.
>> Slave /127.0.0.1/1402028546763 connected successfully.
>> Slave /127.0.0.1/1402028546753 connected successfully.
>> Slave /127.0.0.1/1402028546794 connected successfully.
>> Slave /127.0.0.1/1402028546755 connected successfully.
>> Generation 1 Evaluations 50
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 2 Evaluations 100
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 3 Evaluations 150
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 4 Evaluations 200
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 5 Evaluations 250
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 6 Evaluations 300
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 7 Evaluations 350
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 8 Evaluations 400
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 9 Evaluations 450
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Generation 10 Evaluations 500
>> Subpop 0 best fitness of generation Fitness: 1.0
>> Subpop 0 best fitness of run: Fitness: 1.0
>> Slave /127.0.0.1/1402028546585 shut down.
>> Slave /127.0.0.1/1402028546798 shut down.
>> Slave /127.0.0.1/1402028546699 shut down.
>> Slave /127.0.0.1/1402028546658 shut down.
>> Slave /127.0.0.1/1402028546690 shut down.
>
> After shutting down a fraction of the slaves, it hangs. I have to
> control-C to exit.
>
> The failure appears to be random -- sometimes it occurs at the end of the
> 0th job. It still occurs when the slaves are launched in non-daemon mode.
> Sometimes the failure does not show up until the 4th or 5th job. The
> number of slaves that succeed or fail to shut down appears to be arbitrary.
> In short, we have all the signs of a race condition.
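For anyone trying to reproduce the "Broken pipe" half of this, here is a
minimal sketch (plain java.net sockets, not ECJ code -- the class name and
setup are mine) of how a slave can hit that exception: the "master" end
closes its connection first, and the slave's later writes land on a dead
socket.

```java
import java.io.*;
import java.net.*;

// Sketch of the "Broken pipe" symptom: the master end closes the
// connection while the slave still tries to write data back.
public class BrokenPipeDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);          // "master" side
        Socket slave = new Socket("127.0.0.1", server.getLocalPort());
        Socket masterEnd = server.accept();

        masterEnd.close();   // master drops the connection first
        Thread.sleep(200);   // let the close propagate

        try (OutputStream out = slave.getOutputStream()) {
            // The first write is usually absorbed by the local buffer;
            // once the peer's RST arrives, a later write throws
            // (on Linux/Mac typically "Broken pipe").
            for (int i = 0; i < 100; i++) {
                out.write(new byte[8192]);
                out.flush();
            }
            System.out.println("no error");
        } catch (IOException e) {
            System.out.println("IOException: " + e.getMessage());
        } finally {
            slave.close();
            server.close();
        }
    }
}
```

The exact message is platform-dependent ("Broken pipe" vs. "Connection
reset"), but some IOException is guaranteed once the peer has fully closed.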
>
> When there is a failure, some of the slaves (but not necessarily the same
> number of slaves that succeeded or failed to shut down) print the message:
>
>> FATAL ERROR:
>> Unable to read individual from master.
>> java.net.SocketException: Broken pipe
>
> I ran a debugger on a non-daemon slave that failed to shut down -- it
> seemed to be stuck happily waiting to receive a message from the master, as
> if it'd never been told to shut down at all.
>
> Besides that, I don't know what's going on.
>
> Siggy
>
> --
> Ph.D student in Computer Science
> George Mason University
> http://mason.gmu.edu/~escott8/

--
Ph.D student in Computer Science
George Mason University
http://mason.gmu.edu/~escott8/
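The other symptom -- a slave blocked forever in a read because the shutdown
message never arrived -- can also be made visible without a debugger. A
possible diagnostic (again plain sockets, not ECJ's actual slave loop; the
class name and timeout value are mine) is to put SO_TIMEOUT on the slave's
socket, which turns the silent hang into an exception:

```java
import java.io.*;
import java.net.*;

// Sketch of the hang: the master accepts the connection but never sends
// anything, so a plain read() blocks indefinitely. SO_TIMEOUT makes the
// stall observable instead of silent.
public class HangingReadDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket master = new ServerSocket(0);
        Socket slave = new Socket("127.0.0.1", master.getLocalPort());
        Socket masterEnd = master.accept();  // master stays silent

        slave.setSoTimeout(500);  // without this, read() blocks forever
        try {
            int b = slave.getInputStream().read();
            System.out.println("read: " + b);
        } catch (SocketTimeoutException e) {
            System.out.println("timed out waiting for master");
        } finally {
            slave.close();
            masterEnd.close();
            master.close();
        }
    }
}
```

That matches what the debugger showed: the slave isn't wedged, it is just
sitting in a blocking read for a shutdown command that was never delivered.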