Chris,

You're describing the problem of "overfitting" to the data.

I'm not a GP guy, but I know that it's normal to use training, test, and validation sets when doing symbolic regression (just like you would with other regression/classification methods).

If you find that you are overfitting, you might want to add some kind of parsimony pressure to your GP method (effectively limiting the size of your polynomial or expression). p. 185 of the manual <https://cs.gmu.edu/~eclab/projects/ecj/docs/manual/manual.pdf> talks about parsimony pressure, FWIW.

Siggy

On Sat, May 6, 2017 at 3:34 PM, Chris Johnson wrote:

> Ok. Went through all four ECJ tutorials. Played with them a little.
> Learned a bit more about Java. Still don't like it.
>
> I do have a question. If this question belongs elsewhere, please point me
> there and there I will ask it.
>
> I had to wait until tutorial4 because I'm getting into symbolic regression
> modeling. You have a bunch of data, these days quite often peta-data or
> more. You want a mathematical model that has the attributes of describing
> the data and, hopefully, making successful predictions.
>
> Here's my issue. If I remember correctly, it is possible to come up with
> a polynomial of degree n-1, where n is the number of data points, that
> precisely passes through every data point in your data set. However, the
> odds of such a polynomial having any descriptive truths about the data, let
> alone predictive capabilities, are pretty small as a rule.
>
> What you want is probably something more in the way of a spline function,
> at the least, with the wonderful piecewise continuous differential hoo-ha
> yada yada they taught back in the Precambrian era when I studied math.
>
> I googled Koza fitness tests. I've seen similar for symbolic regression.
> Many look a lot like a statistical variance. Maybe I'm missing something
> here, probably am.
> Looks to me like my aforementioned n-1 degree
> polynomial would fit like the proverbial glove with a 0 fitness measure.
> What's to prevent such a symbolic regression system, ECJ or other, from
> simply coming up with a useless polynomial?
>
> Thanks.
>
> --
>
> Chris Johnson
> Ex SysAdmin, now, writer
>
> *A bargain is something you don’t need at a price you can’t resist.*
> (Franklin Jones)

--
Ph.D student in Computer Science, George Mason University
CFO and Web Director, Journal of Mason Graduate Research
http://mason.gmu.edu/~escott8/
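
To make the overfitting point in this thread concrete, here is a minimal Python sketch (not ECJ code, and not ECJ's API). It builds the degree n-1 polynomial through n noisy points by Lagrange interpolation, compares it with a least-squares line on held-out points, and then applies a simple linear parsimony penalty. The data set, the +/-0.5 "noise", the choice of coefficient count as the size measure, and the weight of 1.0 are all assumptions made up for illustration.

```python
# Synthetic illustration only: the data, the noise amplitude, and the
# parsimony weight are assumptions, not anything taken from ECJ.

def lagrange_eval(xs, ys, x):
    """Evaluate at x the unique degree n-1 polynomial through the n points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def sum_sq_error(model, xs, ys):
    """Sum of squared errors of model over the given points (lower is better)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))

def penalized_fitness(error, size, weight):
    """Linear parsimony pressure: raw error plus a per-term size penalty."""
    return error + weight * size

# Training data: the trend y = 2x + 1 plus alternating +/-0.5 "noise".
train_x = [float(i) for i in range(10)]
train_y = [2 * x + 1 + (0.5 if i % 2 == 0 else -0.5)
           for i, x in enumerate(train_x)]

# Held-out data: noise-free points of the same trend, between training points.
test_x = [x + 0.5 for x in train_x[:-1]]
test_y = [2 * x + 1 for x in test_x]

def interp(x):
    return lagrange_eval(train_x, train_y, x)

a, b = fit_line(train_x, train_y)

def line(x):
    return a * x + b

train_err_interp = sum_sq_error(interp, train_x, train_y)
train_err_line = sum_sq_error(line, train_x, train_y)
heldout_err_interp = sum_sq_error(interp, test_x, test_y)
heldout_err_line = sum_sq_error(line, test_x, test_y)

# Size = number of coefficients: 10 for the degree-9 polynomial, 2 for the line.
WEIGHT = 1.0  # hypothetical tuning constant
score_interp = penalized_fitness(train_err_interp, len(train_x), WEIGHT)
score_line = penalized_fitness(train_err_line, 2, WEIGHT)

print("training error  interp/line:", train_err_interp, train_err_line)
print("held-out error  interp/line:", heldout_err_interp, heldout_err_line)
print("penalized score interp/line:", score_interp, score_line)
```

The interpolating polynomial wins on training error alone, which is exactly the "0 fitness" case raised in the question. It loses once error is measured on points it never saw, or once size is charged against it. A GP system can use either signal, or both; the parsimony methods the ECJ manual describes may differ from the simple additive penalty used here.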