Ha ah! Read 5.2.12, Parsimony Pressure. Thank you. You're constraining the
system, possibly limiting good regression as well. This brings up the notion
of an engineering trade-off between a number of variables (often just two,
but possibly more) with a weak or strong relation to some constant. It's the
mathematical way of saying you can't have everything, and a common
relationship in many dynamic systems, from biology to physics (my original
training). It just occurred to me that an evolutionary algorithm might be yet
another application of Arrow's impossibility theorem beyond its original
context. Like duct tape, it has many uses. Evolution can be thought of as a
social group interaction. Now that I think on it, ECJ, or any EP system,
would have a lot in common with scheduling algorithms, computer or otherwise,
to which Arrow's theorem also applies. Apologies; what passes for my mind
wanders like this.

Tournament selection, and likely other selection methods too, can result in similar regression strings, for some definition of "similar". Eureqa has an interesting way of showing this with a Pareto graph: you get to see leaps, if you will, in progress toward the fitness goal. Knowing where those leaps are, you can go back through a log of expressions, per generation or whatever, around the leap and see what happened.

Does ECJ have a general way of showing this kind of thing? Maybe I haven't found it yet?
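The Pareto graph Eureqa draws is just the set of logged expressions that are non-dominated in (complexity, error). A minimal sketch of extracting that front yourself, assuming you log a (size, error) pair for each expression (plain Python, not anything built into ECJ or Eureqa):

```python
def pareto_front(points):
    """Return the (complexity, error) pairs dominated by no other point:
    a point is dominated if another is <= in both coordinates and < in one."""
    front = []
    for c, e in points:
        dominated = any((c2 <= c and e2 <= e) and (c2 < c or e2 < e)
                        for c2, e2 in points)
        if not dominated:
            front.append((c, e))
    return sorted(set(front))

# Hypothetical logged (tree size, error) pairs from a run:
pts = [(3, 0.9), (5, 0.4), (5, 0.6), (9, 0.35), (12, 0.34), (7, 0.5)]
print(pareto_front(pts))  # [(3, 0.9), (5, 0.4), (9, 0.35), (12, 0.34)]
```

Plotting that front generation by generation is where the "leaps" show up: a new point appearing well below the old front marks a jump worth going back into the logs for.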


Chris,

You're describing the problem of "overfitting" to the data.

I'm not a GP guy, but I know that it's normal to use a training, test, and validation set when doing symbolic regression (just like you would with other regression/classification methods).
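The split itself is simple; the discipline is in what you use each piece for. A minimal sketch, assuming your data is just a list of samples (names and ratios here are illustrative, not from any library):

```python
import random

def split(data, train=0.6, val=0.2, seed=0):
    """Shuffle and partition into train/validation/test; test takes the rest.
    Fit on train, choose model complexity on validation, report error on test
    exactly once -- otherwise the test set quietly becomes a validation set."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    n_tr = int(train * len(data))
    n_va = int(val * len(data))
    tr = [data[i] for i in idx[:n_tr]]
    va = [data[i] for i in idx[n_tr:n_tr + n_va]]
    te = [data[i] for i in idx[n_tr + n_va:]]
    return tr, va, te

tr, va, te = split(list(range(100)))
print(len(tr), len(va), len(te))  # 60 20 20
```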

If you find that you are overfitting, you might want to add some kind of parsimony pressure to your GP method (effectively limiting the size of your polynomial or expression). p. 185 of the manual talks about parsimony pressure, FWIW.
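The idea behind one common form (lexicographic parsimony pressure) is just a tweak to tournament selection: compare on fitness first, and break ties by tree size. A toy sketch of that, in plain Python rather than ECJ's actual classes (ECJ's own parsimony operators are the ones the manual describes):

```python
import random

def lexicographic_tournament(pop, k=7, rng=random):
    """Draw k tournament entrants and return the winner: lowest error,
    with ties broken by smaller tree size (lexicographic parsimony).
    Note: sample() draws without replacement; classic tournament selection
    usually draws with replacement -- fine for illustrating the ordering."""
    entrants = rng.sample(pop, min(k, len(pop)))
    return min(entrants, key=lambda ind: (ind["error"], ind["size"]))

# Hypothetical individuals: two tie on error, one is much smaller.
pop = [{"error": 0.5, "size": 20},
       {"error": 0.5, "size": 8},
       {"error": 0.9, "size": 3}]
winner = lexicographic_tournament(pop, k=3)
print(winner)  # {'error': 0.5, 'size': 8}
```

Because size only matters on fitness ties, this leans on bloat without sacrificing accuracy the way a hard size cap can.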

Siggy

--

Chris Johnson | [log in to unmask] |

Ex-SysAdmin, now writer | If sex is a pain in the ass, then you’re doing it wrong… (Rodney Dangerfield) |