I am sorry, I did not understand your problem well enough at first.
If you start a single-generation run, give the slaves enough time to
connect to the server. If the run is too short for all of them to
connect, you may have to start it several times.
But I was also serious about the script: I have one for our cluster
that lets me kill all of them. Assuming I do not have any other Java
programs running on the cluster (and given that I cannot kill other
people's processes), I type:
    forall killall java
where forall is a script that looks like this:
    #!/bin/sh
    # forall: run the given command on every node, one node at a time.

    # On Ctrl-C, set a flag so the loop stops after the current node.
    trap_int()
    {
        thatsallfolks=1
    }

    trap trap_int INT
    for a in $(seq 1 30)
    do
        echo "Node: $a"
        ssh "node$a" "$@"
        if [ -n "$thatsallfolks" ]
        then
            exit
        fi
    done
Our cluster's nodes are named node1 to node30; adjust the names and
the range to match your cluster.
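If your node names or counts differ, the same loop can take them as
arguments. This is only a minimal sketch, not something from our
cluster: the function name forall_nodes and the REMOTE_SHELL variable
(which lets you substitute echo for ssh to do a dry run) are
hypothetical names chosen for illustration.

```shell
#!/bin/sh
# forall_nodes PREFIX FIRST LAST COMMAND...
# Runs COMMAND on hosts PREFIX<FIRST> .. PREFIX<LAST>, one at a time.
# REMOTE_SHELL is a hypothetical override; it defaults to ssh.
forall_nodes() {
    prefix=$1
    first=$2
    last=$3
    shift 3
    i=$first
    while [ "$i" -le "$last" ]
    do
        echo "Node: $prefix$i"
        # </dev/null keeps the remote command from consuming our stdin.
        ${REMOTE_SHELL:-ssh} "$prefix$i" "$@" </dev/null
        i=$((i + 1))
    done
}
```

For example, "forall_nodes node 1 30 killall java" does the same as
the script above. If other Java programs of yours are running, and
assuming the slaves were started with ec.eval.Slave on their command
line, "forall_nodes node 1 30 pkill -f ec.eval.Slave" should kill
only the slaves.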
Best regards,
Liviu.
On Apr 21, 2006, at 3:53 PM, I Jonyer wrote:
> Yes, that is what I've been doing. But NOW they are running without
> that code. I think I'll just make a run with a single generation
> and hope they will exit.
>
> Istvan
>
>
> From: Liviu Panait <[log in to unmask]>
> Reply-To: ECJ Evolutionary Computation Toolkit <ECJ-INTEREST-
> [log in to unmask]>
> To: [log in to unmask]
> Subject: Re: Master/Slave
> Date: Fri, 21 Apr 2006 15:36:59 -0400
> >If you kill the master explicitly, then it does not get to send the
> >V_SHUTDOWN message. What you can do instead is modify the Slave
> >class such that it exits once an IOException is generated because of
> >a failed socket. Look into the ec.eval.Slave class and insert
> >such a condition when an exception occurs.
> >
> >Hope it helps.
> >
> >Best regards,
> >
> >Liviu.
> >
> >On Apr 21, 2006, at 2:39 PM, I Jonyer wrote:
> >
> >>I use ecj 14.
> >>
> >>I usually kill the master with Ctrl-C, and make the clients exit
> >>when the connection is lost.
> >>
> >>Istvan
> >>
> >>
> >>From: Liviu Panait <[log in to unmask]>
> >>Reply-To: ECJ Evolutionary Computation Toolkit <ECJ-INTEREST-
> >>[log in to unmask]>
> >>To: [log in to unmask]
> >>Subject: Re: Master/Slave
> >>Date: Fri, 21 Apr 2006 12:24:47 -0400
> >> >Dear Istvan,
> >> >
> >> >>Is there any way to kill all the slave processes after the master
> >> >>goes down? I modified the previous version so that the slaves would
> >> >>exit, but after upgrading I forgot to do this, and now I have
> >> >>slaves running on my entire cluster and they would not exit. Any
> >> >>way that I would not have to log into all nodes and kill them all
> >> >>one-by-one?
> >> >Of course there is a way, you can always write a script.... ;-)
> >> >
> >> >On a more serious note, which version are you using? I tried
> >> >version 15, and the slaves seem to exit when the master shuts down.
> >> >The way this is implemented is that all slaves are sent a
> >> >V_SHUTDOWN message when the master is about to exit. When the
> >> >slaves receive this message, they close their sockets, and then
> >> >they exit (this is implemented via a return call from the main
> >> >function). Is it possible that the return call is commented out in
> >> >your version for some reason?
> >> >
> >> >Best regards,
> >> >
> >> >Liviu.
> >> >
> >> >>From: Sean Luke <[log in to unmask]>
> >> >>Reply-To: ECJ Evolutionary Computation Toolkit <ECJ-INTEREST-
> >> >>[log in to unmask]>
> >> >>To: [log in to unmask]
> >> >>Subject: ECJ 14/15 and MASON 11 released
> >> >>Date: Tue, 4 Apr 2006 01:10:38 -0400
> >> >> >The George Mason University Evolutionary Computation Laboratory and
> >> >> >Center for Social Complexity announce a new release of the ECJ
> >> >> >evolutionary computation library and MASON multiagent simulation
> >> >> >toolkit. Both systems have seen major improvements and revisions
> >> >> >since the last release approximately eight months ago. The two
> >> >> >systems are also being re-licensed under the Academic Free License
> >> >> >version 3.0.
> >> >> >
> >> >> >ECJ is being released in two versions: a backward-compatible version
> >> >> >(14) and a non-backward-compatible version (15) with significant
> >> >> >framework revisions. The dual release will (hopefully) give people
> >> >> >some extra time to convert to the new version. ECJ 14/15 also has
> >> >> >numerous bug-fixes, speed improvements, and a new package (spatial
> >> >> >embedding).
> >> >> >
> >> >> >ECJ can be found here:
> >> >> > http://cs.gmu.edu/~eclab/projects/ecj/
> >> >> >
> >> >> >ECJ CVS access is also available at SourceForge, but sourceforge.net
> >> >> >has experienced a major hardware failure this past week and CVS
> >> >> >access is not expected for several days at the earliest.
> >> >> >
> >> >> >
> >> >> >MASON 11 is a major revision of our multiagent simulator. It sports
> >> >> >a new charting and tracking facility, several new problem domains,
> >> >> >and a very large number of bug fixes and improvements.
> >> >> >
> >> >> >MASON can be found here:
> >> >> > http://cs.gmu.edu/~eclab/projects/mason/
> >> >> >
> >> >> >Sean Luke