Chris,

You're describing the problem of "overfitting" to the data.

I'm not a GP guy, but I know it's standard practice to split your data into
training, validation, and test sets when doing symbolic regression (just as
you would with other regression/classification methods).
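Here is a small standalone Java sketch of Chris's n-1 degree polynomial problem. It is plain Java, not ECJ code. Both models are scored with a Koza-style error: the sum of absolute errors over the fitness cases, where 0 is a perfect fit.

The data are made up for illustration:

- six training points lying near y = x, each with a small fixed offset;
- two held-out points that neither model sees during fitting.

The degree-5 Lagrange polynomial through all six training points scores exactly 0 on them. At x = 6, though, it misses the held-out point by more than 9. An ordinary least-squares line does slightly worse on the training points and far better on the held-out ones.

```java
import java.util.function.DoubleUnaryOperator;

// Standalone illustration (not ECJ): an interpolating polynomial vs. a straight line.
public class OverfitDemo {

    // Hypothetical training data: y = x plus small fixed "noise".
    static final double[] TRAIN_X = {0, 1, 2, 3, 4, 5};
    static final double[] TRAIN_Y = {0.1, 0.8, 2.15, 2.9, 4.2, 4.85};
    // Hypothetical held-out data, never used for fitting.
    static final double[] TEST_X = {2.5, 6};
    static final double[] TEST_Y = {2.5, 6};

    // Lagrange form of the unique degree-(n-1) polynomial through all n points.
    static double lagrange(double[] xs, double[] ys, double x) {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double term = ys[i];
            for (int k = 0; k < xs.length; k++) {
                if (k != i) {
                    term *= (x - xs[k]) / (xs[i] - xs[k]);
                }
            }
            sum += term;
        }
        return sum;
    }

    // Ordinary least-squares line; returns {intercept, slope}.
    static double[] fitLine(double[] xs, double[] ys) {
        int n = xs.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) {
            mx += xs[i];
            my += ys[i];
        }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (xs[i] - mx) * (ys[i] - my);
            sxx += (xs[i] - mx) * (xs[i] - mx);
        }
        double slope = sxy / sxx;
        return new double[] {my - slope * mx, slope};
    }

    // Koza-style raw error: sum of absolute errors over the cases (0 = perfect fit).
    static double sumAbsError(DoubleUnaryOperator model, double[] xs, double[] ys) {
        double err = 0.0;
        for (int i = 0; i < xs.length; i++) {
            err += Math.abs(model.applyAsDouble(xs[i]) - ys[i]);
        }
        return err;
    }

    // The degree-5 polynomial passing through every training point.
    static DoubleUnaryOperator interpolant() {
        return x -> lagrange(TRAIN_X, TRAIN_Y, x);
    }

    // The least-squares line fitted to the same training points.
    static DoubleUnaryOperator line() {
        double[] ab = fitLine(TRAIN_X, TRAIN_Y);
        return x -> ab[0] + ab[1] * x;
    }

    public static void main(String[] args) {
        System.out.printf("degree-5 polynomial: train %.4f, test %.4f%n",
                sumAbsError(interpolant(), TRAIN_X, TRAIN_Y),
                sumAbsError(interpolant(), TEST_X, TEST_Y));
        System.out.printf("least-squares line:  train %.4f, test %.4f%n",
                sumAbsError(line(), TRAIN_X, TRAIN_Y),
                sumAbsError(line(), TEST_X, TEST_Y));
    }
}
```

The point is that error on data held out from fitting, not training error, should decide which model you keep. Scoring candidates on a validation set is what catches the "perfect" polynomial.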

If you find that you are overfitting, you might want to add some kind of
parsimony pressure to your GP run (effectively limiting the size of your
polynomial or expression).  Page 185 of the ECJ manual
<https://cs.gmu.edu/~eclab/projects/ecj/docs/manual/manual.pdf> talks about
parsimony pressure, FWIW.
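Here is a standalone sketch of two common forms of parsimony pressure. It is again plain Java, not ECJ's own classes or API; the manual section describes what ECJ actually provides.

- **Linear parsimony** adds a penalty proportional to expression size.
- **Lexicographic parsimony** ranks by error and uses size only to break ties.

The candidates, their errors and sizes, and the per-node penalty of 0.05 are all hypothetical.

```java
import java.util.Comparator;
import java.util.List;

// Standalone illustration of parsimony pressure; not ECJ's classes or API.
public class ParsimonySketch {

    // Hypothetical candidate expression: training error and size (node count).
    static final class Candidate {
        final String expr;
        final double error;
        final int size;

        Candidate(String expr, double error, int size) {
            this.expr = expr;
            this.error = error;
            this.size = size;
        }
    }

    // Assumed penalty per node for linear parsimony; in practice this needs tuning.
    static final double PENALTY_PER_NODE = 0.05;

    // Linear parsimony: error plus a cost proportional to expression size.
    static double penalizedError(Candidate c) {
        return c.error + PENALTY_PER_NODE * c.size;
    }

    // No parsimony: rank by training error alone.
    static final Comparator<Candidate> RAW =
            Comparator.comparingDouble((Candidate c) -> c.error);

    static final Comparator<Candidate> LINEAR =
            Comparator.comparingDouble(ParsimonySketch::penalizedError);

    // Lexicographic parsimony: lower error wins; smaller size only breaks ties.
    static final Comparator<Candidate> LEXICOGRAPHIC =
            Comparator.comparingDouble((Candidate c) -> c.error).thenComparingInt(c -> c.size);

    // Hypothetical population: a big zero-error polynomial and two small, equally good fits.
    static final List<Candidate> POP = List.of(
            new Candidate("interpolating polynomial", 0.00, 41),
            new Candidate("x", 0.45, 1),
            new Candidate("x + 0", 0.45, 3));

    // The "best" candidate under a given ordering (smallest first).
    static Candidate best(List<Candidate> pop, Comparator<Candidate> order) {
        return pop.stream().min(order).orElseThrow();
    }

    public static void main(String[] args) {
        System.out.println("raw error:     " + best(POP, RAW).expr);
        System.out.println("linear:        " + best(POP, LINEAR).expr);
        System.out.println("lexicographic: " + best(POP, LEXICOGRAPHIC).expr);
        System.out.println("lexicographic, tied pair: "
                + best(POP.subList(1, 3), LEXICOGRAPHIC).expr);
    }
}
```

In this sketch the lexicographic ordering still picks the zero-error polynomial, because it only breaks ties among equally fit candidates; it mainly curbs bloat. The linear penalty trades accuracy for size, so its weight matters. A held-out validation set remains the direct check on overfitting.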

Siggy

On Sat, May 6, 2017 at 3:34 PM, Chris Johnson <[log in to unmask]> wrote:

> Ok.  Went through all four ECJ tutorials.  Played with them a little.
> Learned a bit more about Java.  Still don't like it.
>
> I do have a question.  If this question belongs elsewhere, please point me
> there and I will ask it there.
>
> I had to wait until tutorial4 because I'm getting into symbolic regression
> modeling.  You have a bunch of data, these days quite often peta-data or
> more.  You want a mathematical model that describes the data and,
> hopefully, makes successful predictions.
>
> Here's my issue.  If I remember correctly, it is possible to come up with
> a polynomial of degree n-1, where n is the number of data points, that
> precisely passes through every data point in your data set.  However, the
> odds of such a polynomial having any descriptive truths about the data, let
> alone predictive capabilities, are pretty small as a rule.
>
> What you want is probably something more along the lines of a spline
> function, at the least, with the wonderful piecewise continuous
> differentiability hoo-ha yada yada they taught back in the Precambrian era
> when I studied math.
>
> I googled Koza fitness tests.  I've seen similar for symbolic regression.
> Many look a lot like a statistical variance.  Maybe I'm missing something
> here, probably am.  Looks to me like my aforementioned n-1 degree
> polynomial would fit like the proverbial glove with a 0 fitness measure.
> What's to prevent such a symbolic regression system, ECJ or other, from
> simply coming up with a useless polynomial?
>
> Thanks.
>
> --
>
> Chris Johnson [log in to unmask]
> Ex SysAdmin, now, writer
>
> *A bargain is something you don’t need at a price you can’t resist.*
> (Franklin Jones)
>



-- 

Ph.D student in Computer Science, George Mason University
CFO and Web Director, Journal of Mason Graduate Research
http://mason.gmu.edu/~escott8/