Regarding the issues raised by Przemyslaw below, would anyone have best practices to share?

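A piece of standard advice: use immutable data structures whenever possible (i.e., whenever it isn't hideously computationally expensive to do so). They are automatically thread-safe, because state that never changes after construction can be read from any number of threads without locking.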
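To make the immutability advice concrete, here is a minimal sketch in plain Java. The class name Position and its method moved are hypothetical, chosen only for illustration; they are not part of MASON.

```java
// Hypothetical immutable value type: every field is final and there are no setters,
// so an instance can be shared between threads without synchronization.
final class Position {
    private final int x;
    private final int y;

    Position(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }

    // "Changing" a position returns a new object and leaves this one untouched.
    Position moved(int dx, int dy) {
        return new Position(x + dx, y + dy);
    }
}
```

An agent that wants to move would replace its own reference with the result of moved(...); any other thread still holding the old Position keeps seeing a consistent value.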

In MASON, the big thing is to make sure that your agents (Steppables) aren't stepping on each other's toes, i.e. that two agents running at the same time never write to the same state. Beyond that, it brings the same considerations as any other multithreaded application, so much of what you read in standard Java references on the topic will be useful to you.
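One common way to keep concurrent updates from colliding is double buffering: every worker reads from the previous state and writes only to its own slice of the next state. The sketch below uses only java.util.concurrent and deliberately does not use the MASON API; the class PartitionedStep, the ring-shaped int[] world, and the neighbour-sum update rule are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PartitionedStep {
    // Computes one time step. 'current' is only read during the step; each task
    // writes to a disjoint index range of 'next', so no two threads touch the same cell.
    static int[] step(int[] current, int parts) {
        final int n = current.length;
        final int[] next = new int[n];
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int p = 0; p < parts; p++) {
            final int from = p * n / parts;       // first index of this sub-space
            final int to = (p + 1) * n / parts;   // one past its last index
            tasks.add(() -> {
                for (int i = from; i < to; i++) {
                    // Illustrative rule: new value is the sum of the two ring neighbours.
                    next[i] = current[(i + n - 1) % n] + current[(i + 1) % n];
                }
                return null;
            });
        }
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            // invokeAll waits for every task; get() rethrows any failure from a worker.
            for (Future<Void> done : pool.invokeAll(tasks)) {
                done.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return next;
    }
}
```

Because the result does not depend on how the work is split, calling step(world, 1) and step(world, 8) on the same input should give identical arrays, which makes for a cheap regression check when parallelising an existing model.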

Siggy

On Wed, May 11, 2016 at 3:26 AM, Luís de Sousa wrote:
Thank you all for sharing your experiences.

What I have on my table right now is a long simulation that I would like to speed up, e.g. by dividing the simulation space into 8 sub-spaces. Multi-threading looks like the most straightforward approach, though I have not ruled out multi-processing completely. Regarding the issues raised by Przemyslaw below, would anyone have best practices to share?

Thank you once more,

Luís



-------- Original Message --------
Subject: Re: MASON on multi-core systems
Local Time: May 11, 2016 3:21 AM
UTC Time: May 11, 2016 1:21 AM

Dear Luis,

I have two notes:
SGE (at least on the cloud) was the most lightweight solution that I could find. Currently it takes me less than 1 hour to build a complete HPC cluster starting from scratch (default AMIs, no pre-configuration). Since you asked about "experience": I recently looked into Apache Spark for managing distributed computing, and installing it was a total disaster (again, on the cloud). Basically it works only if you fix all the bugs in the installation scripts, and there are many of them.

For small jobs (up to 40 parallel processes) my best scenario is the following:
- spin up a c4.* (up to 36 vCPUs & 60GB RAM) or m4.* (up to 40 vCPUs & 160GB RAM) instance
- run the processes in parallel, e.g. by looping over a bash command such as
nohup java -server -cp some.jar package.Main $i > logs$i.csv 2>error$i.txt &
Of course, the number of processes should match the number of available cores.
In this scenario each process is responsible for executing successive simulation repetitions, which is not perfect, but it is the simplest approach and is suitable for many production scenarios.

I do not like using multi-threading, for two reasons. Firstly, I feel better having control over separate processes (e.g. I can kill and resume them just when I need to). Secondly, multi-threading is more error-prone, e.g. if I get a model written by someone else and that person has used a static variable in the wrong place. This creates bugs that are hard to track down. The disadvantage of multi-processing is obviously a bigger memory footprint, but usually I do not care :-)
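The static-variable pitfall can be sketched as follows. This is an illustrative example only: LeakyModel, IsolatedModel and StaticStateDemo are hypothetical names, not classes from MASON or from any model discussed in this thread.

```java
// BUG: 'births' is static, so every model instance in the JVM shares one counter,
// and concurrent births++ calls can also lose updates.
final class LeakyModel {
    static int births = 0;
    void step() { births++; }
    int births() { return births; }
}

// Fix: keep the counter in an instance field so each simulation run owns its state.
final class IsolatedModel {
    private int births = 0;
    void step() { births++; }
    int births() { return births; }
}

final class StaticStateDemo {
    // Runs two independent models in two threads and returns their final counts.
    static int[] runIsolated(int steps) {
        IsolatedModel a = new IsolatedModel();
        IsolatedModel b = new IsolatedModel();
        inParallel(() -> { for (int i = 0; i < steps; i++) a.step(); },
                   () -> { for (int i = 0; i < steps; i++) b.step(); });
        return new int[] {a.births(), b.births()};
    }

    // Same experiment with the buggy model: both instances report the shared counter.
    static int[] runLeaky(int steps) {
        LeakyModel a = new LeakyModel();
        LeakyModel b = new LeakyModel();
        inParallel(() -> { for (int i = 0; i < steps; i++) a.step(); },
                   () -> { for (int i = 0; i < steps; i++) b.step(); });
        return new int[] {a.births(), b.births()};
    }

    private static void inParallel(Runnable first, Runnable second) {
        Thread t1 = new Thread(first);
        Thread t2 = new Thread(second);
        t1.start();
        t2.start();
        try {
            t1.join();   // join() makes the workers' writes visible to the caller
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

With runIsolated each model counts exactly its own steps. With runLeaky the two "independent" runs report the same shared number, and its value can vary from run to run, which is what makes this kind of bug hard to track down. Separate processes sidestep the problem because each JVM has its own copy of every static field.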

Maybe someone else could comment on their HPC experience with MASON?

All best,
Przemyslaw




--

Ph.D student in Computer Science, George Mason University
CFO and Web Director, Journal of Mason Graduate Research
http://mason.gmu.edu/~escott8/