Good thoughts, Eric.
However:
1. I could specify multicore VMs on the cloud service, BUT that would be wasteful of paid-for CPU. For example: suppose I set Test N Times to 4 and used a 4-core VM. Because of the very substantial variance in the run times of the simulations, cores would sit unused as the evaluations at the shorter end of the duration curve finished. Economic efficiency is one of the key design goals for the project: only pay for cores that you are actively running.
My plan is to use single-core VMs. Interestingly, the cloud services charge "per core", and a 4-core VM costs exactly 4 times as much as a 1-core VM. To minimize paying for unused cores, it sure seems that 1-core VMs make the most sense. I've also heard anecdotally that there is sometimes inadequate CPU<->memory bandwidth available in the multicore VMs, and that could be a choke point for my compute- and memory-access-bound simulator.
Another problem is that it is likely I'd want to gradually increase the Test N Times number as evolution progresses, making efficient core use even more difficult.
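To put rough numbers on the idle-core argument in point 1, here is a minimal sketch in Python. It assumes billing is strictly proportional to cores times wall-clock time (no minimum billing increment, no VM start-up cost), that the multicore VM is kept until its slowest evaluation finishes, and that the four run times are made up for illustration. The function names are hypothetical.

```python
def paid_core_seconds_multicore(durations, cores):
    """Core-seconds billed when one VM with `cores` cores runs all jobs at once.

    Assumes len(durations) <= cores, so every job starts immediately and the
    VM is billed for all of its cores until the slowest job finishes.
    """
    if len(durations) > cores:
        raise ValueError("this sketch assumes at most one job per core")
    return cores * max(durations)


def paid_core_seconds_single(durations):
    """Core-seconds billed when each job runs on its own 1-core VM."""
    return sum(durations)


def utilization(durations, cores):
    """Fraction of the billed core time that actually ran a simulation."""
    return sum(durations) / paid_core_seconds_multicore(durations, cores)


# Hypothetical run times (seconds) for four evaluations of one individual.
example = [40, 55, 90, 200]

if __name__ == "__main__":
    print(paid_core_seconds_multicore(example, 4))  # billed on one 4-core VM
    print(paid_core_seconds_single(example))        # billed on four 1-core VMs
    print(utilization(example, 4))                  # share of paid time in use
```

With these made-up durations the 4-core VM is billed for 800 core-seconds while only 385 are spent simulating, so a bit under half of the paid time does useful work. The wider the spread in run times, the worse that ratio gets.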
2. I could try to use a large population instead of doing multiple tests, but I hate to restrict my search space for Evolutionary Algorithms that way.
I currently have my own home-brew Python evolver that is parallelized to the extent that it can run multiple evaluations in parallel on a single multi-core machine. It would not be too hard to extend that to multiple systems, BUT it would be nice to get all the other stuff that comes with ECJ.
Perhaps another approach would be to extend (i.e. hack) the single-machine steady-state system to make it asynchronous and write my own distributed evaluation runner.
I don't need any of the fancy distributed evolution that the ECJ slaves can do: just run a shell-level command, capture and parse the console output, and return the Win/Lose/Draw data to the Master. If I did it myself, rather than using sockets for communication (always a potential source of trouble!), I'd use one of the queue services that the Cloud Platform provides.
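The worker side of that could be as small as the sketch below. It is only an illustration under stated assumptions: the console format (a line such as "RESULT: WIN") is hypothetical, as are the names parse_result and run_evaluation, and the queue service is left out entirely, since the call that fetches a job and posts the result depends on the platform.

```python
import re
import shlex
import subprocess

# Hypothetical console format: the simulator prints a line like "RESULT: WIN".
RESULT_PATTERN = re.compile(r"^RESULT:\s*(WIN|LOSE|DRAW)\s*$", re.MULTILINE)


def parse_result(console_output):
    """Return 'WIN', 'LOSE' or 'DRAW' from the output, or None if absent."""
    match = RESULT_PATTERN.search(console_output)
    return match.group(1) if match else None


def run_evaluation(command, timeout_seconds=None):
    """Run one simulation as a shell-level command and parse its result.

    `command` is a single command-line string; it is split with shlex so no
    shell is involved. A timeout raises subprocess.TimeoutExpired.
    """
    completed = subprocess.run(
        shlex.split(command),
        capture_output=True,   # collect stdout and stderr instead of printing
        text=True,             # decode bytes to str
        timeout=timeout_seconds,
        check=False,           # report the exit code rather than raising
    )
    return {
        "returncode": completed.returncode,
        "result": parse_result(completed.stdout),
    }
```

A worker loop would pull a command from the queue, call run_evaluation, and push the returned dictionary back for the Master to collect. A result of None signals that the run produced no recognizable outcome and should probably be retried or flagged.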
Anybody have any opinions on how hard it would be to make the ECJ steady-state mechanism asynchronous?