Chris,
You're describing the problem of "overfitting" to the data.
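To make the overfitting concrete, here is a minimal Java sketch. It is not ECJ code, and the class name, the linear trend y = 2x + 1, the noise level, and the seed are all illustrative assumptions. It builds the degree n-1 Lagrange interpolating polynomial through n noisy points. Its error on those points is zero, but its values between and beyond them need not follow the underlying trend at all.

```java
import java.util.Random;

// Sketch (hypothetical names): the degree-(n-1) interpolant through n noisy
// points has zero training error but can stray far from the real trend.
public class InterpolationOverfit {

    // Evaluate the Lagrange interpolating polynomial through (xs[i], ys[i]) at x.
    // Requires distinct xs; the polynomial has degree at most n-1.
    static double lagrange(double[] xs, double[] ys, double x) {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double term = ys[i];
            for (int j = 0; j < xs.length; j++) {
                if (j != i) {
                    term *= (x - xs[j]) / (xs[i] - xs[j]);
                }
            }
            sum += term;
        }
        return sum;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int n = 12;
        double[] xs = new double[n];
        double[] ys = new double[n];
        // Training cases: assumed trend y = 2x + 1 plus Gaussian noise.
        for (int i = 0; i < n; i++) {
            xs[i] = i;
            ys[i] = 2 * xs[i] + 1 + rng.nextGaussian();
        }

        // Sum of absolute errors on the very points the polynomial was built from.
        double trainError = 0.0;
        for (int i = 0; i < n; i++) {
            trainError += Math.abs(ys[i] - lagrange(xs, ys, xs[i]));
        }
        System.out.printf("training error: %.3e%n", trainError);

        // Off-sample: between the outer points and just past the last one.
        for (double x : new double[] {0.5, 10.5, 12.0, 13.0}) {
            System.out.printf("x=%5.1f  interpolant=%12.2f  trend=%6.2f%n",
                    x, lagrange(xs, ys, x), 2 * x + 1);
        }
    }
}
```

The printed numbers depend on the seed. What matters is that a training error of 0 says nothing about the off-sample rows.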
I'm not a GP guy, but I know it's standard to use separate training,
validation, and test sets when doing symbolic regression (just as you would
with other regression/classification methods).
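A shuffled three-way split of the fitness cases might look like the following minimal Java sketch. It is not an ECJ facility, and the 60/20/20 proportions, the seed, and the names are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch (hypothetical names): split n fitness cases into disjoint
// training, validation, and test index sets.
public class HoldoutSplit {

    static final class Split {
        final List<Integer> train;
        final List<Integer> validation;
        final List<Integer> test;

        Split(List<Integer> train, List<Integer> validation, List<Integer> test) {
            this.train = train;
            this.validation = validation;
            this.test = test;
        }
    }

    static Split split(int n, double trainFrac, double validFrac, long seed) {
        int nTrain = (int) Math.round(n * trainFrac);
        int nValid = (int) Math.round(n * validFrac);
        if (nTrain < 0 || nValid < 0 || nTrain + nValid > n) {
            throw new IllegalArgumentException("fractions do not fit in n cases");
        }
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            idx.add(i);
        }
        // Shuffle first so any ordering in the data (e.g. sorted by x) doesn't leak into the split.
        Collections.shuffle(idx, new Random(seed));
        return new Split(
                new ArrayList<>(idx.subList(0, nTrain)),
                new ArrayList<>(idx.subList(nTrain, nTrain + nValid)),
                new ArrayList<>(idx.subList(nTrain + nValid, n)));
    }

    public static void main(String[] args) {
        Split s = split(100, 0.6, 0.2, 7L);
        // prints train=60 validation=20 test=20
        System.out.println("train=" + s.train.size()
                + " validation=" + s.validation.size()
                + " test=" + s.test.size());
    }
}
```

In a GP run you would evolve against the training cases only. The validation cases help you choose which individual (or run) to keep, and you score the test cases once, at the very end.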
If you find that you are overfitting, you might want to add some kind of
parsimony pressure to your GP method (effectively limiting the size of your
polynomial or expression). Page 185 of the ECJ manual
<https://cs.gmu.edu/~eclab/projects/ecj/docs/manual/manual.pdf> talks about
parsimony pressure, FWIW.
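The ECJ manual describes the parsimony methods that ECJ itself provides. The two basic ideas can also be sketched in plain Java, independently of ECJ's classes. Linear parsimony pressure adds a per-node penalty to the error. Lexicographic parsimony compares error first and uses size only to break ties. `Candidate`, the 0.05 coefficient, and the example sizes are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Sketch (hypothetical names): two simple forms of parsimony pressure.
public class Parsimony {

    // An evaluated individual: training error (lower is better) and tree size in nodes.
    static final class Candidate {
        final String expr;
        final double error;
        final int size;

        Candidate(String expr, double error, int size) {
            this.expr = expr;
            this.error = error;
            this.size = size;
        }
    }

    // Linear parsimony pressure: every node costs a fixed amount of "error".
    static double penalizedError(Candidate c, double coefficient) {
        return c.error + coefficient * c.size;
    }

    // Lexicographic parsimony: compare by error, break ties by smaller size.
    static final Comparator<Candidate> LEXICOGRAPHIC =
            Comparator.comparingDouble((Candidate c) -> c.error)
                      .thenComparingInt(c -> c.size);

    // Tournament selection that uses the lexicographic ordering.
    static Candidate tournament(List<Candidate> pop, int tournamentSize, Random rng) {
        Candidate best = pop.get(rng.nextInt(pop.size()));
        for (int i = 1; i < tournamentSize; i++) {
            Candidate c = pop.get(rng.nextInt(pop.size()));
            if (LEXICOGRAPHIC.compare(c, best) < 0) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Candidate small = new Candidate("2*x + 1", 0.80, 5);
        Candidate huge = new Candidate("(degree-11 polynomial)", 0.00, 70);
        Candidate tie = new Candidate("x + x + 1", 0.80, 7);

        double coefficient = 0.05;
        System.out.println("penalized small: " + penalizedError(small, coefficient));
        System.out.println("penalized huge:  " + penalizedError(huge, coefficient));
        System.out.println("lexicographic prefers small over tie: "
                + (LEXICOGRAPHIC.compare(small, tie) < 0));
    }
}
```

The coefficient is a tuning knob. If it is too large, the search collapses to tiny expressions; if it is too small, it has no effect. Lexicographic pressure on its own would still prefer a zero-error interpolant over a slightly worse small expression, because size only matters on ties.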
Siggy
On Sat, May 6, 2017 at 3:34 PM, Chris Johnson wrote:
> Ok. Went through all four ECJ tutorials. Played with them a little.
> Learned a bit more about Java. Still don't like it.
>
> I do have a question. If this question belongs elsewhere, please point me
> there and I will ask it there.
>
> I had to wait until tutorial4 because I'm getting into symbolic regression
> modeling. You have a bunch of data, these days quite often peta-data or
> more. You want a mathematical model that describes the data and, hopefully,
> makes successful predictions.
>
> Here's my issue. If I remember correctly, it is possible to come up with
> a polynomial of degree n-1, where n is the number of data points (with
> distinct x values), that passes exactly through every data point in your
> data set. However, the
> odds of such a polynomial having any descriptive truths about the data, let
> alone predictive capabilities, are pretty small as a rule.
>
> What you want is probably something more in the way of a spline function,
> at the least, with the wonderful piecewise, continuously differentiable
> hoo-ha yada yada they taught back in the Precambrian era when I studied math.
>
> I googled Koza fitness measures. I've seen similar ones for symbolic
> regression. Many look a lot like a statistical variance. Maybe I'm missing
> something here; I probably am. Looks to me like my aforementioned degree n-1
> polynomial would fit like the proverbial glove, with a fitness of 0.
> What's to prevent such a symbolic regression system, ECJ or other, from
> simply coming up with a useless polynomial?
>
> Thanks.
>
> --
>
> Chris Johnson
> Ex-SysAdmin, now writer
>
> *A bargain is something you don’t need at a price you can’t resist.* (Franklin Jones)
>
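The question at the end of the quoted message can be illustrated with a minimal Java sketch of a Koza-style error measure. It is not ECJ's own fitness classes, and the names, the data, and the 0.01 hit threshold are assumptions. Standardized fitness here is the sum of absolute errors over the fitness cases, so anything that reproduces those cases exactly scores a perfect 0, even a lookup table. Only scoring held-out cases tells a memorizer apart from a model that captures the trend.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Sketch (hypothetical names): Koza-style standardized fitness, hits, and
// adjusted fitness, computed on training and held-out cases.
public class KozaStyleFitness {

    // Error at or below this counts as a "hit" (assumed threshold).
    static final double HIT_THRESHOLD = 0.01;

    static final class Score {
        final double standardized; // sum of absolute errors; 0 is perfect
        final int hits;            // cases predicted within HIT_THRESHOLD

        Score(double standardized, int hits) {
            this.standardized = standardized;
            this.hits = hits;
        }

        // Adjusted fitness 1 / (1 + standardized): in (0, 1], 1 is perfect.
        double adjusted() {
            return 1.0 / (1.0 + standardized);
        }
    }

    static Score evaluate(DoubleUnaryOperator model, double[] xs, double[] ys) {
        double sum = 0.0;
        int hits = 0;
        for (int i = 0; i < xs.length; i++) {
            double err = Math.abs(ys[i] - model.applyAsDouble(xs[i]));
            sum += err;
            if (err <= HIT_THRESHOLD) {
                hits++;
            }
        }
        return new Score(sum, hits);
    }

    // A deliberately useless "model" that memorizes the training cases.
    static DoubleUnaryOperator memorizer(double[] xs, double[] ys) {
        Map<Double, Double> table = new HashMap<>();
        for (int i = 0; i < xs.length; i++) {
            table.put(xs[i], ys[i]);
        }
        return x -> table.getOrDefault(x, 0.0);
    }

    static void report(String name, DoubleUnaryOperator m,
                       double[] trainX, double[] trainY, double[] testX, double[] testY) {
        Score train = evaluate(m, trainX, trainY);
        Score test = evaluate(m, testX, testY);
        System.out.printf("%-9s train: std=%.3f hits=%d adj=%.3f | test: std=%.3f%n",
                name, train.standardized, train.hits, train.adjusted(), test.standardized);
    }

    public static void main(String[] args) {
        double[] trainX = {0, 1, 2, 3};
        double[] trainY = {1.2, 2.9, 5.1, 7.0};
        double[] testX = {4, 5};
        double[] testY = {9.1, 10.8};

        report("line", x -> 2 * x + 1, trainX, trainY, testX, testY);
        report("memorizer", memorizer(trainX, trainY), trainX, trainY, testX, testY);
    }
}
```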
--
Ph.D. student in Computer Science, George Mason University
CFO and Web Director, Journal of Mason Graduate Research
http://mason.gmu.edu/~escott8/