Yes, it seems to be due to item (2): memory access with a poor memory
controller. There are big timing variations between identical
multi-threaded runs on my machine: evaluation time ranges from 19 s to
33 s, while it stays fairly constant at around 30 s when using only one
thread. Thus, running two threads on a Core2 CPU 6600 @ 2.40GHz can
even be slower than using one thread...
Denis
Sean Luke wrote:
> There are no dependencies between breedthreads and evalthreads. It's
> actually quite simple if you're doing plain generational evolution,
> it's roughly:
>
> Eval:
>     For each thread
>         Create a Problem for that thread
>         Fork a thread to do PopSize/NumThreads evals
> Breed:
>     For each thread
>         Create a BreedingPipeline for that thread
>         Fork a thread to do PopSize/NumThreads new individuals
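The eval half of the outline above can be sketched in plain Java as follows. This is not ECJ code: `ToyProblem` and `evaluateAll` are hypothetical names, the genome is just a `double[]`, and fitness is a toy sum of squares. It shows the two points Sean makes: each thread gets its own problem object, and each thread evaluates a disjoint slice of roughly PopSize/NumThreads individuals, so no locking is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ThreadedEvalSketch {

    // Hypothetical stand-in for an ECJ Problem. Each eval thread gets its own
    // instance, so the mutable scratch field is never shared and needs no lock.
    static class ToyProblem {
        private double scratch;

        double evaluate(double[] genome) {
            scratch = 0.0;
            for (double g : genome) scratch += g * g; // toy fitness: sum of squares
            return scratch;
        }
    }

    // Evaluates every individual using numThreads threads. Thread t gets a
    // contiguous slice; the first (popSize % numThreads) threads take one extra
    // individual, so each individual is evaluated exactly once.
    static double[] evaluateAll(double[][] pop, int numThreads) {
        int popSize = pop.length;
        double[] fitness = new double[popSize];
        List<Thread> threads = new ArrayList<>();
        int base = popSize / numThreads;
        int extra = popSize % numThreads;
        int start = 0;
        for (int t = 0; t < numThreads; t++) {
            final int from = start;
            final int to = from + base + (t < extra ? 1 : 0);
            final ToyProblem problem = new ToyProblem(); // one Problem per thread
            Thread worker = new Thread(() -> {
                for (int i = from; i < to; i++) {
                    fitness[i] = problem.evaluate(pop[i]); // disjoint indices: no lock
                }
            });
            threads.add(worker);
            worker.start(); // fork
            start = to;
        }
        for (Thread worker : threads) {
            try {
                worker.join(); // wait for every slice (eval must finish before breeding)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for eval threads", e);
            }
        }
        return fitness;
    }

    public static void main(String[] args) {
        double[][] pop = new double[10][];
        for (int i = 0; i < pop.length; i++) pop[i] = new double[] {i, 1.0};
        System.out.println(Arrays.toString(evaluateAll(pop, 3)));
    }
}
```

The breed half would follow the same fork/join pattern, with a per-thread pipeline object producing its slice of new individuals instead of a per-thread problem scoring existing ones.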
>
> There's no locking in ECJ's basic eval or breed facilities (which is
> why we need multiple RNGs). Most threading performance problems are
> due to (1) GC, (2) memory access with a poor memory controller, or
> (3) a synchronization you put in there but forgot about :-). My guess
> is #2 -- it bites us in MASON too. Basically, although the cores can
> run full-blast, if you're doing lots of fetches from cold memory (as
> ECJ constantly does, since it scans across populations), there's only
> *one* memory and cache controller on the machine, and that becomes
> the bottleneck.
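The "multiple RNGs" remark can be illustrated with a small sketch: each worker thread owns a private generator seeded from a base seed plus its thread index, so threads never touch shared RNG state and each thread's random stream is reproducible. This sketch uses `java.util.Random` as an assumption for illustration, not whatever generator ECJ itself uses; `sumDraws` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Random;

public class PerThreadRngSketch {

    // Each worker owns a private Random seeded from baseSeed + its index.
    // Nothing is shared, so there is no lock or contended atomic, and each
    // thread's stream is reproducible no matter how the OS schedules threads.
    // (java.util.Random is thread-safe, but a single shared instance would make
    // all threads contend on its internal atomic seed.)
    static long[] sumDraws(long baseSeed, int numThreads, int drawsPerThread) {
        long[] sums = new long[numThreads];
        Thread[] workers = new Thread[numThreads];
        for (int t = 0; t < numThreads; t++) {
            final int id = t;
            final Random rng = new Random(baseSeed + id);
            workers[t] = new Thread(() -> {
                long s = 0;
                for (int i = 0; i < drawsPerThread; i++) s += rng.nextInt(100);
                sums[id] = s; // each thread writes only its own slot
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sumDraws(42L, 2, 1_000)));
    }
}
```

Note that removing RNG contention does nothing for the memory-controller bottleneck described above: private generators remove locking, but all threads still share the same path to main memory.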
>
> That being said, ECJ will get about 40% improvement on a two-core
> Intel chip. For example, when I run ecsuite with 1000 individuals,
> here are some rough wall-clock times I get on my Macbook Pro:
>
>   breedthreads  evalthreads  wall-clock time
>   1             1            24 s
>   2             1            21 s
>   1             2            19 s
>   2             2            17 s
>
> Note that eval gives you a bigger boost than breed in this example.
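To make the "about 40%" figure concrete, here is a small sketch that turns the wall-clock times from the ecsuite table into speedups (baseline time divided by run time) and fractions of time saved. The class and method names are illustrative only.

```java
public class SpeedupSketch {

    // Speedup relative to the baseline: how many times faster a run is.
    static double speedup(double baselineSecs, double secs) {
        return baselineSecs / secs;
    }

    // Fraction of wall-clock time saved relative to the baseline.
    static double timeSaved(double baselineSecs, double secs) {
        return (baselineSecs - secs) / baselineSecs;
    }

    public static void main(String[] args) {
        String[] configs = {"1 breed 1 eval", "2 breed 1 eval", "1 breed 2 eval", "2 breed 2 eval"};
        double[] secs = {24, 21, 19, 17}; // wall-clock times from the ecsuite table
        for (int i = 0; i < secs.length; i++) {
            System.out.printf("%s: %.2fx speedup, %.0f%% less time%n",
                    configs[i], speedup(secs[0], secs[i]), 100 * timeSaved(secs[0], secs[i]));
        }
    }
}
```

For the 2 breed / 2 eval row this gives 24 / 17 ≈ 1.41, which matches the roughly 40% improvement quoted; expressed as time saved, it is (24 - 17) / 24 ≈ 29% less wall-clock time.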
>
> You might try fooling with the GC parameters (-Xms and -Xmx to set
> the heap size, -verbose:gc to see what the collector is doing), though
> it probably won't be a big deal for you.
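Alongside -verbose:gc, one way to check whether GC (item 1) rather than memory bandwidth (item 2) is eating the multi-threaded gain is to measure how much of a run's wall-clock time is spent collecting. The sketch below uses the standard `java.lang.management` API; `measure` and the allocation loop in `main` are hypothetical placeholders for wrapping an actual evolutionary run. If the GC share stays small with both 1 and 2 threads while wall time varies widely, that points away from GC.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcShareSketch {

    // Total accumulated collection time (ms) over all collectors in this JVM.
    // getCollectionTime() returns -1 when a collector does not report it; skip those.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;
        }
        return total;
    }

    // Runs the work and returns {wall-clock ms, GC ms spent during it}.
    static long[] measure(Runnable work) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        work.run();
        long wallMs = (System.nanoTime() - start) / 1_000_000;
        long gcMs = totalGcMillis() - gcBefore;
        return new long[] {wallMs, gcMs};
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for an evolutionary run: it just
        // allocates many short-lived arrays.
        long[] r = measure(() -> {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) sum += new byte[64].length;
            if (sum < 0) System.out.println(sum); // only there so the result is consumed
        });
        System.out.printf("wall %d ms, GC %d ms, max heap %d MB%n",
                r[0], r[1], Runtime.getRuntime().maxMemory() >> 20);
    }
}
```

Running this under different -Xms/-Xmx settings shows whether a larger heap actually reduces the GC share for a given workload.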
>
> Sean
>
> On Jun 2, 2008, at 10:16 AM, Denis Robilliard wrote:
>
>> Hi,
>>
>> I just performed some experiments with the "breedthreads" and
>> "evalthreads" parameters on the tutorial regression problem. On a
>> dual-core machine, I observed that performance improves
>> (computing time roughly halved) when breedthreads = 2 and
>> evalthreads = 1, but there is no gain when breedthreads = 1 and
>> evalthreads = 2. However, the stat file shows that most of the running
>> time is spent in evaluation (as expected). Are there any
>> dependencies between the breedthreads and evalthreads values?
>>
>> --
>> Denis Robilliard
>> L.I.L.
>> Université du Littoral
>> 50 rue F. Buisson
>> 62100 Calais
>> France
>