AForge has a nice solution for constraining the length of genomes (some number of bits less than or equal to the data type representation).
In a tree representation, it might make sense to borrow techniques from graph theory or network theory to constrain the search space. For example, you can find a good representation in the DAG (Directed Acyclic Graph) literature, or you might want to investigate similar constraints imposed by compiler tools in the form of AST (Abstract Syntax Trees).
In the latter case, it is obvious that there are infinitely many variations in code that are valid, and infinitely many that are not. It doesn't make any sense to create a population with any large amount of invalid solutions because there will be enough genetic anomolies in the viable candidates to feed a greedy evolutionary system forever (at least with the gene pool available in any non-trivial programming grammar, including any useful DSL that you might be parsing). Besides, the fitness will be completely arbitrary if you can't measure one bad apple from another bad orange.

In short, I think the reasonable thing to do is express your problem as a context-free grammar (if possible). Even the attempt to do so will probably better illuminate the source of the problem you might be facing. Next, try and prototype a solution space that is smaller, but that can be scaled up without changing the topological semantics.
Finally, the number of variables sounds similar to what one might encounter in neural network architectures. Even if that has nothing to do with it, you could try to apply the general rules of such architecture as constraints for producing valid DataFlow networks. The Axons and Synapses can be configured to perform virtually any type of transform you might require. As a "backbone" you can have a root Axon, several constrained layers (perhaps with some form of randomized arity), and an output Axon feeding the result(s) back to your evalution logic.
On Wed, Mar 9, 2011 at 12:38 PM, adil raja <[log in to unmask]> wrote:
This is a very good idea, but it may make the whole algorithm hefty.

Best Regards,
Adil Raja

----- Original Message ----
Sent: Wed, March 9, 2011 6:30:27 PM
Subject: Re: Help on a general GA issue

A while back, i was using a GA to evolve logic puzzles, and it created many
invalid puzzles.  But I didn't exclude invalid puzzles from the search, because
I was afraid that they would include good building blocks for valid puzzles.
So, my algorithm pulled error counts into the fitness function as a way to guide
the invalid puzzles towards valid ones.

I wonder if there is a good way to determine if a search is better or worse off
by doing this.


On Mar 9, 2011, at 11:56 AM, Robert Baruch wrote:

> I, too, am interested in the answer to this question.
> In my own work, I've modified the crossover or mutation algorithm to explicitly
>generate valid individuals. For example, if a given element in a GA individual
>must be within a certain range, I could clip the value to the upper or lower
>limit if the value goes out of range.
> GP has retries when it generates individuals that are too big -- it just tries
>again. If it can't generate a valid individual after a certain number of
>retries, it gives up and copies the parent.
> On the other hand, sometimes you don't know that an individual is valid until
>you evaluate it. For example, perhaps an individual based on code will throw an
>exception. Then you just have to score that individual as very poor.
> --Rob
> On Mar 9, 2011, at 11:33 AM, Paul Fisher wrote:
>> Hello everyone
>> This is a plea for help on a general point regarding the genetic algorithm
>>method (rather than a technical ECJ issue).  I hope my question makes sense to
>>someone who can point me in the right direction.
>> Simply put, if you have a large solution space (c.9000 element matrix) with a
>>set of constraints that make a large proportion of the possible solutions
>>invalid, how do you treat invalid solutions generated by the reproduction
>>process to ensure the population is composed of only (or mostly) valid
>>solutions?  For example, what happens when you take two good solutions from the
>>initial population, cross them over and mutate them according to some standard
>>method, and the result is two solutions which happen to violate the constraints
>>of the solution space and therefore render the new solutions invalid?  How can
>>you take two good solutions and mate them in such a way to produce only valid
>>solutions according to the problem constraints?
>> The only way I have known how to treat invalid solutions so far is to tolerate
>>them in the population but score them out of the selection process.  The problem
>>with this is that 9 times out of 10 the population will be swamped by these duds
>>and never get going.
>> Any suggestions please folks?
>> Thanks
>> Paul