Good thoughts, Eric.
However:
1. I could specify multicore VMs on the cloud service, BUT that would be
wasteful of paid-for CPU. For example: suppose I set the Test N Times to
4, and used a 4 core VM ... because of the very substantial variance in run
times of the simulations, there would be unused cores as the evaluations
that ran to the shorter end of the duration curve finished. Economic
efficiency is one of the key design goals for the project: only pay for
cores that you are actively running.
My plan is to use single core VMs. Interestingly, the cloud services charge
"per core" and a 4 core VM costs exactly 4 times the cost of a 1 core VM.
To minimize paying for unused cores it sure seems that 1 core VMs make
the most sense. I've also heard anecdotally that there is sometimes
inadequate CPU<->Memory bandwidth available in the multicore VMs, and that
could be a choke point for my compute- and memory-access-bound simulator.
Another problem is that it is likely I'd want to gradually increase the
Test N Times number as evolution progresses, making efficient core use even
more difficult.
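
To put a rough number on the idle-core concern in point 1, here is a minimal
sketch. It assumes run times follow a triangular distribution from 80 to 1000
seconds with its mode at 120 seconds (chosen only so the mean comes out at 400
seconds, matching the figures quoted below; the real distribution is unknown),
and that an N core VM stays paid for until the slowest of its N tests
finishes, with no refilling of freed cores. The function name is hypothetical.

```python
import random


def idle_fraction(n_cores, trials=10_000, seed=0):
    """Estimate the share of paid core-seconds that sit idle.

    Assumptions (not from the simulator itself): run times are drawn from
    a triangular distribution on [80, 1000] s with mode 120 s, and the VM
    is billed for all cores until the slowest test in the batch finishes.
    """
    rng = random.Random(seed)
    busy = 0.0  # core-seconds spent actually running a test
    paid = 0.0  # core-seconds billed
    for _ in range(trials):
        times = [rng.triangular(80, 1000, 120) for _ in range(n_cores)]
        busy += sum(times)
        paid += n_cores * max(times)
    return 1.0 - busy / paid


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "cores:", round(idle_fraction(n), 3))
```

Under these assumptions a 1 core VM wastes nothing by construction, and the
wasted share grows with the core count, because the batch always waits on its
slowest member.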
2. I could try to use a large population instead of doing multiple
tests, but I hate to restrict my search space for Evolutionary Algorithms
that way.
I currently have my own home-brew Python evolver that is parallelized to
the extent that it can run multiple evaluations in parallel on a single
multi-core machine. It would not be too hard to extend that to multiple
systems, BUT it would be nice to get all the other stuff that comes with ECJ.
Perhaps another approach would be to extend (i.e. hack) the single machine
steady-state system to make it asynchronous and write my own distributed
evaluation runner.
I don't need any of the fancy distributed evolution that the ECJ slaves can
do: just a simple running of a shell level command and the capture and
parse of the console output and then return the Win/Lose/Draw data back to
the Master. If I did it myself, rather than using sockets for
communication (always a potential source of trouble!), I'd use one of the
queue services that the Cloud Platform provides.
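
A minimal sketch of that kind of evaluation runner, under stated assumptions:
the simulator is assumed to print a line such as "RESULT: WIN" to the console
(the real output format will differ), and Python's in-process queue.Queue
stands in for whichever cloud queue service is used. All names here
(parse_outcome, run_evaluation, worker) are hypothetical.

```python
import queue
import re
import shlex
import subprocess

# Assumed console format: a line containing "RESULT: WIN", "RESULT: LOSE"
# or "RESULT: DRAW". Adjust the pattern to the simulator's real output.
RESULT_RE = re.compile(r"RESULT:\s*(WIN|LOSE|DRAW)")


def parse_outcome(console_text):
    """Return 'WIN', 'LOSE' or 'DRAW', or None if no result line is found."""
    match = RESULT_RE.search(console_text)
    return match.group(1) if match else None


def run_evaluation(command, timeout_s=1200):
    """Run one shell-level command and parse its captured console output."""
    completed = subprocess.run(
        shlex.split(command),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return parse_outcome(completed.stdout)


def worker(jobs, results):
    """Drain (individual_id, command) jobs and post (individual_id, outcome).

    `jobs` and `results` are queue.Queue objects here; a cloud queue client
    would take their place in a real deployment.
    """
    while True:
        try:
            individual_id, command = jobs.get_nowait()
        except queue.Empty:
            return
        results.put((individual_id, run_evaluation(command)))
```

Each single core VM would run one such worker. The Master only ever sees
(individual_id, outcome) pairs, so it can aggregate repeated tests of the same
individual as they arrive, in any order.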
Anybody have any opinions on how hard it would be to make the ECJ
steady-state mechanism asynchronous?
On Tue, Oct 25, 2016 at 2:29 PM, Eric 'Siggy' Scott <[log in to unmask]> wrote:
> Jim,
>
> Tis an interesting problem you raise. It's true that the "hack" ECJ's
> generational algorithms use for re-evaluation doesn't make any sense in a
> steady-state model.
>
> I don't know how people have handled multiple tests in parallel
> steady-state EAs before. I don't recall seeing any discussion of it in the
> literature.
>
> It seems to me that there are two options, though, that could save you the
> trouble of implementing a distributed multiple-testing scheme:
>
> 1. If your evaluation function doesn't exhaust all of a node's
> resources, run multiple tests in parallel on the same node. This is easy
> to do inside your implementation of the Problem class.
>
> Your heavy-duty simulations probably eat up all your nodes'
> processors, though, so this might not help your application.
>
> 2. Ramp up the population size. In some cases, given the same
> computational resources, using a large population can be just as effective
> at washing out the effects of noise as multiple testing.
>
> You can see if your application falls into this category by using a
> fixed budget of fitness evaluations and seeing if it makes more progress
> with a big population, or with multiple testing. If the latter truly works
> much better, then that's a sign that it could be worth your effort to
> modify ECJ's steady-state master-slave model to support distributed
> multiple testing.
>
> Just my two cents. Sean et al. will be more familiar with what it might
> take to implement the feature itself.
>
> Siggy
>
> On Tue, Oct 25, 2016 at 1:34 PM, Jim Rutt <[log in to unmask]> wrote:
>
>> I've been evaluating ECJ for possible use in a large scale cloud
>> computing based evolutionary computation project for the optimization of
>> AIs in highly complex wargames.
>>
>> What makes this a hard problem is that:
>>
>> 1. The evaluations are expensive - a mean of 400 seconds per evaluation
>> on a one core 3.5 GHz processor.
>> 2. The evaluations are noisy - a better AI can still lose to a worse AI,
>> and often does
>> 3. The evaluation run times also have a large variance from
>> approximately 80 seconds up to 1000 seconds.
>>
>> As evolutionary approaches, I'm leaning to steady-state EDA type
>> algorithms as a seemingly good fit for the problem domain.
>>
>> All was looking good in the evaluation of ECJ until what seems like a
>> fatal problem in the last sentence of section 6.1.6 Noisy Distributed
>> Problems in the ECJ Owner's Manual:
>>
>> "There’s no equivalent to this hack in Asynchronous Evolution: you’ll
>> just have to ask a machine to test the individual 5 times."
>>
>> Unfortunately that would seem to significantly reduce the ability to fan
>> out evaluations to reduce elapsed clock time per evaluation which would
>> significantly increase "time travel" - i.e. where evaluated individuals
>> re-enter a population as candidates for inclusion at a much later time
>> than they were created for evaluation.
>>
>> Is another hack possible to spread out evaluations where one needs to run
>> multiple tests to get a good-enough estimator of an individual? I might
>> even be willing to do the hacking.
>>
>>
>>
>>
>> --
>> Jim Rutt
>> JPR Ventures
>>
>
>
>
> --
>
> Ph.D student in Computer Science, George Mason University
> CFO and Web Director, Journal of Mason Graduate Research
> http://mason.gmu.edu/~escott8/
>
--
===========================
Jim Rutt
JPR Ventures