Thank you all for sharing your experiences.

What I have on my table right now is a long simulation that I would like to speed up, for example by dividing the simulation space into 8 sub-spaces. Multi-threading looks like the most straightforward approach, although I have not ruled out multi-processing completely. Regarding the issues raised by Przemyslaw below, does anyone have best practices to share?
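For what it is worth, here is a minimal sketch of the sub-space idea in plain Java. It does not use any MASON API. It assumes a hypothetical 2D grid of doubles and a made-up update rule (each cell becomes the mean of itself and its vertical neighbours), splits the rows into 8 strips, and updates each strip on a thread pool. Every task reads only the current grid and writes only its own rows of a fresh grid (double buffering), so strips can read across their borders without races. A model whose agents move between sub-spaces would need more coordination than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class StripDecomposition {

    /** Hypothetical update rule: each cell becomes the mean of itself and its vertical neighbours. */
    static void updateRows(double[][] cur, double[][] next, int rowFrom, int rowTo) {
        int height = cur.length;
        for (int y = rowFrom; y < rowTo; y++) {
            int up = Math.max(y - 1, 0);            // clamp at the top edge
            int down = Math.min(y + 1, height - 1); // clamp at the bottom edge
            for (int x = 0; x < cur[y].length; x++) {
                next[y][x] = (cur[up][x] + cur[y][x] + cur[down][x]) / 3.0;
            }
        }
    }

    /**
     * One synchronous step: the rows are split into `parts` contiguous strips and each
     * strip is updated by its own task. Tasks only read `cur` and only write their own
     * rows of `next`, so they never race, even at strip borders.
     */
    public static double[][] step(double[][] cur, int parts, ExecutorService pool) {
        int height = cur.length;
        double[][] next = new double[height][cur[0].length];
        List<Future<?>> futures = new ArrayList<>();
        for (int p = 0; p < parts; p++) {
            int from = p * height / parts;       // integer division spreads leftover rows
            int to = (p + 1) * height / parts;
            futures.add(pool.submit(() -> updateRows(cur, next, from, to)));
        }
        try {
            for (Future<?> f : futures) {
                f.get(); // barrier: the step ends only when every strip is done
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return next;
    }

    public static void main(String[] args) {
        double[][] serial = new double[100][50];
        for (int y = 0; y < serial.length; y++) Arrays.fill(serial[y], y);
        double[][] parallel = serial;
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            for (int s = 0; s < 10; s++) {
                serial = step(serial, 1, pool);     // one strip = plain sequential update
                parallel = step(parallel, 8, pool); // eight sub-spaces updated in parallel
            }
        } finally {
            pool.shutdown();
        }
        System.out.println(Arrays.deepEquals(serial, parallel)); // true
    }
}
```

Comparing a 1-strip run against an 8-strip run, as main does, is a cheap sanity check for any decomposition: if the results differ, some task is touching state it does not own.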

Thank you once more,

Luís



-------- Original Message --------
Subject: Re: MASON on multi-core systems
Local Time: May 11, 2016 3:21 AM
UTC Time: May 11, 2016 1:21 AM
From: [log in to unmask]
To: [log in to unmask]

Dear Luis,

I have two notes:
SGE (Sun Grid Engine), at least on the cloud, was the most lightweight solution I could find. It now takes me less than 1 hour to build a complete HPC cluster from scratch (default AMIs, no pre-configuration). Since you asked about "experience": I have recently evaluated Apache Spark for managing distributed computing, and installing it was a total disaster (again on the cloud). Basically, it works only if you remove all the bugs from the installation scripts, and there are many of them.

For small jobs (up to 40 parallel processes), my preferred setup is the following:
- spin up a c4.* (up to 36 vCPUs and 60 GB RAM) or m4.* (up to 40 vCPUs and 160 GB RAM) instance;
- run the processes in parallel, e.g. by looping over a bash command such as
    nohup java -server -cp some.jar package.Main $i > logs$i.csv 2>error$i.txt &
  Of course, the number of processes should match the number of available cores.
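The same launch loop can be sketched in Java with ProcessBuilder, in case a bash shell is not at hand. The jar name and package.Main are the placeholders from the command above, and the class name Launcher is hypothetical. It starts one JVM per processor reported by availableProcessors() (vCPUs on a cloud instance) and redirects each worker's stdout and stderr to logs<i>.csv and error<i>.txt. Unlike the nohup version, this launcher stays in the foreground and waits for all workers to finish.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class Launcher {

    /** Builds the command for worker i, mirroring: java -server -cp some.jar package.Main $i */
    public static List<String> command(String jar, String mainClass, int i) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-server");
        cmd.add("-cp");
        cmd.add(jar);
        cmd.add(mainClass);
        cmd.add(Integer.toString(i));
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // One worker process per available core (vCPU).
        int n = Runtime.getRuntime().availableProcessors();
        List<Process> workers = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ProcessBuilder pb = new ProcessBuilder(command("some.jar", "package.Main", i));
            pb.redirectOutput(new File("logs" + i + ".csv"));  // like > logs$i.csv
            pb.redirectError(new File("error" + i + ".txt"));  // like 2>error$i.txt
            workers.add(pb.start());
        }
        // Wait for every worker to exit.
        for (Process p : workers) {
            p.waitFor();
        }
    }
}
```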
In this setup, each process runs its own series of simulation repetitions, one after another. This is not perfect, but it is the simplest approach, and it is adequate for many production scenarios.
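A worker following this pattern might look like the sketch below. Worker stands in for package.Main, runRepetition is a hypothetical placeholder for one simulation run, and the default of 10 repetitions per worker is arbitrary. Each worker derives a global repetition id from its index, seeds the run with it, and prints one CSV row per repetition to stdout, which the launch command redirects to its log file.

```java
import java.util.Random;

public class Worker {

    /** Hypothetical placeholder for a single simulation run; a real model would be built and stepped here. */
    public static double runRepetition(long seed) {
        Random rng = new Random(seed);
        double sum = 0.0;
        for (int k = 0; k < 1000; k++) {
            sum += rng.nextDouble();
        }
        return sum / 1000;
    }

    /** Global id of local repetition r of worker w, so ids never overlap between workers. */
    public static long repetitionId(int worker, int r, int repsPerWorker) {
        return (long) worker * repsPerWorker + r;
    }

    public static void main(String[] args) {
        int worker = Integer.parseInt(args[0]);                      // the $i from the launch loop
        int reps = args.length > 1 ? Integer.parseInt(args[1]) : 10; // hypothetical default
        System.out.println("repetition,result");
        for (int r = 0; r < reps; r++) {
            long id = repetitionId(worker, r, reps);
            // Seeding with the global id gives every repetition a distinct, reproducible seed.
            System.out.println(id + "," + runRepetition(id));
        }
    }
}
```

Because a repetition's seed depends only on its id, a worker that was killed can be rerun with the same index and will produce the same rows.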

I do not like using multi-threading, for two reasons. First, I feel more comfortable having control over separate processes (e.g. I can kill and resume each one whenever I need to). Second, multi-threading is more error-prone, e.g. when I get a model written by someone else who has used a static variable in the wrong place. This creates bugs that are hard to track down. The obvious disadvantage of multi-processing is a bigger memory footprint, but usually I do not care :-)
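To illustrate the static-variable pitfall, here is a small hypothetical example; the model classes are made up and not taken from MASON. When each simulation runs in its own JVM, a static counter is private to that process and does no harm. Once two model instances share one JVM on different threads, they silently share the field, and the unsynchronized increments can also lose updates. Moving the state into an instance field fixes it.

```java
public class StaticPitfall {

    /** Hypothetical model that (wrongly) keeps per-run state in a static field. */
    public static class BuggyModel {
        static int agentsCreated = 0; // shared by ALL instances in this JVM

        public void createAgents(int n) {
            for (int k = 0; k < n; k++) agentsCreated++; // unsynchronized read-modify-write
        }

        public int count() { return agentsCreated; }
    }

    /** The same model with the counter moved to an instance field. */
    public static class FixedModel {
        private int agentsCreated = 0; // one counter per simulation instance

        public void createAgents(int n) {
            for (int k = 0; k < n; k++) agentsCreated++;
        }

        public int count() { return agentsCreated; }
    }

    /** Runs two tasks on two threads and waits for both to finish. */
    static void runInParallel(Runnable a, Runnable b) throws InterruptedException {
        Thread ta = new Thread(a);
        Thread tb = new Thread(b);
        ta.start();
        tb.start();
        ta.join();
        tb.join();
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 1_000_000;

        BuggyModel b1 = new BuggyModel();
        BuggyModel b2 = new BuggyModel();
        runInParallel(() -> b1.createAgents(n), () -> b2.createAgents(n));
        // Both report the same shared total: up to 2,000,000, lower if increments were lost.
        System.out.println("buggy: " + b1.count() + " / " + b2.count());

        FixedModel f1 = new FixedModel();
        FixedModel f2 = new FixedModel();
        runInParallel(() -> f1.createAgents(n), () -> f2.createAgents(n));
        System.out.println("fixed: " + f1.count() + " / " + f2.count()); // fixed: 1000000 / 1000000
    }
}
```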

Maybe someone else could comment on their HPC experience with MASON?

All best,
Przemyslaw
https://szufel.pl/en/