Ah ha!  I read 5.2.12, Parsimony Pressure.  Thank you.  You're constraining the system, and possibly limiting good regressions as well.  This brings up the notion of an engineering trade-off among a number of variables (often just two, but possibly more) that have a weak or strong relation to some constant.  It is the mathematical way of saying you can't have everything, and it is a common relationship in many dynamic systems, from biology to physics (my original training).

It just occurred to me that evolutionary algorithms might be yet another application of Arrow's impossibility theorem outside its original context.  Like duct tape, it has many uses.  Evolution can be thought of as a social group interaction.  Now that I think about it, ECJ, or any EP system, would have a lot in common with scheduling algorithms, computerized or not, to which Arrow's theorem also applies.  Apologies; what passes for my mind wanders like this.

Tournament selection, and likely other selection methods too, can produce similar regression expressions, for some definition of similar.  Eureqa has an interesting way of showing this with a Pareto graph.  You get to see leaps, if you will, in progress toward the fitness goal.  Knowing where those leaps are, you can go back to a log of expressions (per generation or whatever) around each leap and see what happened.
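To make the Pareto idea concrete, here is a minimal Python sketch, assuming a log of candidate expressions where each entry records an expression string, a size (node count), and an error.  The log format, the `pareto_front` name, and the sample numbers are all made up for illustration; this is not Eureqa's or ECJ's actual output or API.

```python
from typing import List, Tuple

# (expression, size, error): a hypothetical log entry, lower is better for both numbers.
Candidate = Tuple[str, int, float]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate beats on both size and error."""
    front: List[Candidate] = []
    # Walk from smallest to largest expression; ties on size are ordered by error.
    for cand in sorted(candidates, key=lambda c: (c[1], c[2])):
        # Keep a larger expression only if it strictly lowers the best error so far.
        if not front or cand[2] < front[-1][2]:
            front.append(cand)
    return front


# Illustrative, invented log entries.
log = [
    ("x", 1, 9.0),
    ("x + 1", 3, 4.0),
    ("x * x", 3, 5.0),
    ("x*x + x", 5, 4.5),
    ("x*x + x + 1", 7, 0.5),
]

front = pareto_front(log)
for expr, size, error in front:
    print(f"size={size:2d}  error={error:4.1f}  {expr}")
```

Each step along the front is one of those leaps: a point where allowing a bigger expression bought a real drop in error.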

Does ECJ have a general way of showing this kind of thing?  Maybe I haven't found it yet?

Chris,

You're describing the problem of "overfitting" to the data.

I'm not a GP guy, but I know that it's normal to use a training, test, and validation set when doing symbolic regression (just like you would with other regression/classification methods).
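For what it's worth, the three-way split might look something like this in Python.  It is only a sketch: the `split_dataset` name, the 60/20/20 proportions, and the fixed seed are my assumptions, not anything ECJ prescribes.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    rows: Sequence[T],
    train: float = 0.6,
    validation: float = 0.2,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the rows and cut them into training, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    # Whatever is left after training and validation becomes the test set.
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


train_set, validation_set, test_set = split_dataset(range(10))
```

You evolve against the training set, use the validation set to decide when to stop or which expression to keep, and look at the test set only once at the end.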

If you find that you are overfitting, you might want to add some kind of parsimony pressure to your GP method (effectively limiting the size of your polynomial or expression).  p. 185 of the manual talks about parsimony pressure, FWIW.
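One simple form of parsimony pressure is a linear size penalty added to the error.  The Python sketch below shows that idea only; the `penalized_error` name and the 0.01 weight are assumptions for illustration, and the methods the ECJ manual describes may work differently.

```python
def penalized_error(error: float, size: int, weight: float = 0.01) -> float:
    """Linear parsimony pressure: charge each tree node a small, fixed cost."""
    return error + weight * size


# Hypothetical individuals: a large tree that fits slightly better,
# and a small tree that fits slightly worse.
big_tree = penalized_error(error=0.10, size=40)
small_tree = penalized_error(error=0.15, size=5)

# With this weight the small tree ranks better (lower is better).
print(small_tree < big_tree)
```

The weight is the trade-off knob: too high and selection favors tiny expressions that fit poorly, too low and bloat comes back.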

Siggy

--

 Chris Johnson, Ex SysAdmin, now writer. If sex is a pain in the ass, then you’re doing it wrong… (Rodney Dangerfield)