I have two notes:
SGE (at least on the cloud) was the most lightweight solution I could find. It currently takes me less than an hour to build a complete HPC cluster from scratch (default AMIs, no pre-configuration). Since you asked about "experience": I recently evaluated Apache Spark for managing distributed computing, and installing it (again, on the cloud) was a total disaster; basically it only works once you have removed all the bugs from the installation scripts, and there are many of them.
For small jobs (up to 40 parallel processes) my best approach is the following:
- spin up a c4.* (up to 36 vCPUs & 60 GB RAM) or m4.* (up to 40 vCPUs & 160 GB RAM) instance
- run the processes in parallel, e.g. by looping over a bash command such as
nohup java -server -cp some.jar package.Main $i > logs$i.csv 2>error$i.txt &
Of course, the number of processes should match the number of available cores.
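A minimal sketch of that loop (some.jar, package.Main, and the single integer argument are taken from the command above; the PID file and the use of nproc are my own additions):

```shell
#!/usr/bin/env bash
# Launch one worker per available core, recording each PID so that
# individual processes can be killed or inspected later.
: > pids.txt                       # start with an empty PID file
CORES=$(nproc)                     # number of available cores
for i in $(seq 1 "$CORES"); do
  nohup java -server -cp some.jar package.Main "$i" \
    > "logs$i.csv" 2> "error$i.txt" &
  echo $! >> pids.txt              # PID of the worker just launched
done
wait                               # block until all workers finish
```

The PID file is what makes the "kill and resume just the process I need" style of control easy later on.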
In this scenario each process is responsible for running its share of simulation repetitions one after another, which is not perfect, but it is the simplest approach and is suitable for many production scenarios.
I do not like using multi-threading, for two reasons. First, I feel better having control over separate processes (e.g. I can kill or resume exactly the one I need). Second, multi-threading is more error-prone: e.g. if I get a model written by someone else and that person used a static variable in the wrong place, it creates bugs that are hard to track down. The obvious disadvantage of multi-processing is a bigger memory footprint, but usually I do not care :-)
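As a sketch of that per-process control: standard signals let you pause, resume, or terminate a single worker without touching the others (here a `sleep` stands in for one simulation process; with real workers you would take the PID from a file written at launch time):

```shell
#!/usr/bin/env bash
sleep 300 & PID=$!        # stand-in for one long-running simulation worker
kill -STOP "$PID"         # pause just this worker (SIGSTOP)
kill -CONT "$PID"         # resume it where it left off (SIGCONT)
kill -TERM "$PID"         # or terminate it for good (SIGTERM)
```

None of this is possible at the same granularity with threads inside a single JVM, which is exactly the control I was referring to.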
Maybe someone else could comment on their HPC experience with MASON?