Chris,

You're describing the problem of "overfitting" to the data.

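I'm not a GP guy, but I know that it's normal to use separate training, test, and validation sets when doing symbolic regression (just like you would with other regression/classification methods). A model that has merely memorized the training points will score badly on data it was not fit to.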

If you find that you are overfitting, you might want to add some kind of parsimony pressure to your GP method (effectively penalizing or limiting the size of your polynomial or expression). p. 185 of the manual talks about parsimony pressure, FWIW.
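
To make the idea concrete, here is a rough sketch of one common form of parsimony pressure: add a size penalty to the raw error. This is not ECJ's API. The nested-tuple tree encoding, the names tree_size and penalized_fitness, and the weight alpha are all made up for illustration.

```python
# Hypothetical sketch of linear parsimony pressure. Nothing here is ECJ code;
# the tuple encoding, function names, and alpha value are illustrative assumptions.

def tree_size(expr):
    """Count nodes in an expression written as nested tuples,
    e.g. ('+', 'x', ('*', 'x', 'x')) has 5 nodes."""
    if not isinstance(expr, tuple):
        return 1  # a terminal: variable or constant
    return 1 + sum(tree_size(child) for child in expr[1:])

def penalized_fitness(raw_error, expr, alpha=0.01):
    """Lower is better: raw error plus a penalty proportional to tree size."""
    return raw_error + alpha * tree_size(expr)

# A small expression with slightly worse raw error...
small = ('+', 'x', 1.0)
# ...versus a bloated one with slightly better raw error.
large = ('+', ('*', 'x', ('*', 'x', 'x')), ('-', 'x', ('*', 2.0, 'x')))

small_score = penalized_fitness(0.10, small)
large_score = penalized_fitness(0.08, large)
print(tree_size(small), tree_size(large))
print(small_score < large_score)
```

With these made-up numbers the smaller expression wins even though its raw error is a bit higher, which is the whole point. How large alpha should be is problem dependent.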

Siggy

On Sat, May 6, 2017 at 3:34 PM, Chris Johnson wrote:
Ok.  Went through all four ECJ tutorials.  Played with them a little.  Learned a bit more about Java.  Still don't like it.

I do have a question.  If this question belongs elsewhere, please point me there and there I will ask it.

I had to wait until tutorial4 because I'm getting into symbolic regression modeling.  You have a bunch of data, these days quite often peta-data or more.  You want a mathematical model that describes the data and, hopefully, makes successful predictions.

Here's my issue.  If I remember correctly, it is possible to come up with a polynomial of degree at most n-1, where n is the number of data points (with distinct x values), that passes precisely through every data point in your data set.  However, the odds of such a polynomial telling you anything true about the data, let alone having predictive capabilities, are pretty small as a rule.
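
A small sketch of that worry, using plain Lagrange interpolation. The data is invented: eight points from an assumed true model y = x with a fixed +/-0.3 wiggle added, plus two held-out points just past the training range. The helper names are hypothetical too.

```python
# Illustrative sketch only: the data, the assumed true model y = x,
# and all helper names are made up for this example.

def lagrange(xs, ys):
    """Return the unique polynomial of degree <= n-1 through the n points."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

def mse(model, xs, ys):
    """Mean squared error of model over the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def line(x):
    """The simple candidate model y = x."""
    return x

# Training data: y = x plus a fixed alternating wiggle standing in for noise.
train_x = [float(i) for i in range(8)]
train_y = [x + 0.3 * (-1) ** i for i, x in enumerate(train_x)]

# Held-out data from the same assumed true model, without noise.
held_x = [7.5, 8.0]
held_y = [7.5, 8.0]

poly = lagrange(train_x, train_y)

train_mse_poly = mse(poly, train_x, train_y)
train_mse_line = mse(line, train_x, train_y)
held_mse_poly = mse(poly, held_x, held_y)
held_mse_line = mse(line, held_x, held_y)

print("train:", train_mse_poly, train_mse_line)
print("held-out:", held_mse_poly, held_mse_line)
```

On the training points the interpolating polynomial looks perfect and the straight line looks worse. On the held-out points the ranking flips, and by a wide margin, which is why the error has to be measured on data the model was not fit to.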

What you want is probably something more in the way of a spline function, at the least, with the wonderful piecewise continuous differentiable hoo-ha yada yada they taught back in the Precambrian era when I studied math.

I googled Koza fitness tests.  I've seen similar for symbolic regression.  Many look a lot like a statistical variance.  Maybe I'm missing something here, probably am.  Looks to me like my aforementioned n-1 degree polynomial would fit like the proverbial glove with a 0 fitness measure.  What's to prevent such a symbolic regression system, ECJ or other, from simply coming up with a useless polynomial?

Thanks.

--

Chris Johnson
Ex SysAdmin, now, writer

A bargain is something you don’t need at a price you can’t resist.
(Franklin Jones)



--

Ph.D student in Computer Science, George Mason University
CFO and Web Director, Journal of Mason Graduate Research
http://mason.gmu.edu/~escott8/