Yes, we could do that, and it is a bit simpler than my solution.
As a final attempt, I created my own version of ParallelSequence
that gives each thread its own MersenneTwisterFast. That makes
synchronization on the shared generator unnecessary and improves speed:
with 4 threads it now runs 50% faster than with a single thread.
Much better than before, but in my view not worth the effort :-(
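For anyone who wants to try the per-thread idea, here is a minimal sketch. It does not reproduce my ParallelSequence changes. It assumes plain java.util.concurrent workers, and it uses java.util.SplittableRandom as a stand-in for MersenneTwisterFast so it runs without MASON on the classpath. The class name PerThreadRandomDemo and the work() method are hypothetical. Each worker gets its own generator, split off a seeded master, so no generator is shared and no lock is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SplittableRandom;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo: one private RNG per worker instead of one shared, synchronized RNG.
public class PerThreadRandomDemo {

    // Hypothetical workload: draws from the worker's own generator, no locking needed.
    static long work(SplittableRandom rng, int draws) {
        long sum = 0;
        for (int i = 0; i < draws; i++) {
            sum += rng.nextInt(100); // value in [0, 99]
        }
        return sum;
    }

    public static long run(int threads, int drawsPerThread, long seed) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Stand-in for MersenneTwisterFast: split independent generators from one seeded master.
            // Splitting happens on this thread, in order, so results are reproducible for a given seed.
            SplittableRandom master = new SplittableRandom(seed);
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final SplittableRandom rng = master.split(); // owned by exactly one task
                Callable<Long> task = () -> work(rng, drawsPerThread);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // wait for each worker and combine its result
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(4, 1_000_000, 42L));
    }
}
```

Because every generator is confined to one task, the total depends only on the seed and not on thread scheduling. With MASON itself, the analogous step would be constructing one MersenneTwisterFast per thread with a distinct seed, which is an assumption about how you would wire it into your own code.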