Friday, March 28, 2014
With the explosion of data in biology, great emphasis is being placed on developing systematic approaches that integrate diverse types and sources of data to build models of complex biological processes and diseases.
In this talk, I will discuss our efforts to model complex biomedical phenotypes by applying predictive modeling approaches to large genome-wide data sets. The first part presents experiences and results from a collaborative-competitive effort to model and predict survival for breast cancer patients using a recently published data set of gene expression, copy number aberration, and clinical features.
In the second part, I will present our analysis of heterogeneous ensemble methods, which generally achieve the best performance on complex biomedical prediction problems. By leveraging both the consensus and the diversity among hundreds or even thousands of heterogeneous base predictors, these methods typically outperform even the best homogeneous ensemble methods, such as boosting and random forests.
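To make the idea of a heterogeneous ensemble concrete, here is a minimal, illustrative sketch in plain Python. Everything in it is an assumption for illustration, not the methods analysed in the talk. The toy data, the three hypothetical base predictors (`threshold_rule`, `knn_score`, `centroid_score`) and the unweighted averaging combiner are all invented here. The base predictors are deliberately of different types (a rule, an instance-based scorer, a prototype-based scorer), so they make different errors. Averaging their scores captures their consensus, and `disagreement` gives one simple measure of their diversity.

```python
from math import dist
from statistics import mean

# Hypothetical toy training data: (features, label). Illustrative only.
TRAIN = [((0.1, 0.2), 0), ((0.3, 0.1), 0), ((0.2, 0.4), 0),
         ((0.8, 0.9), 1), ((0.9, 0.7), 1), ((0.7, 0.8), 1)]


def threshold_rule(x, feature=0, cutoff=0.5):
    """Rule-based predictor: 1.0 if a single feature exceeds a cutoff."""
    return 1.0 if x[feature] > cutoff else 0.0


def knn_score(x, train=TRAIN, k=3):
    """Instance-based predictor: fraction of the k nearest neighbours that are positive."""
    nearest = sorted(train, key=lambda item: dist(x, item[0]))[:k]
    return sum(label for _, label in nearest) / k


def centroid_score(x, train=TRAIN):
    """Prototype-based predictor: higher when x is closer to the positive-class centroid."""
    def centroid(cls):
        points = [f for f, label in train if label == cls]
        return tuple(mean(coord) for coord in zip(*points))
    d_pos = dist(x, centroid(1))
    d_neg = dist(x, centroid(0))
    return d_neg / (d_pos + d_neg)


# A heterogeneous pool: base predictors of different model types.
BASE_PREDICTORS = [threshold_rule, knn_score, centroid_score]


def ensemble_score(x, predictors=BASE_PREDICTORS):
    """Consensus via simple unweighted averaging of the base predictors' scores."""
    return mean(p(x) for p in predictors)


def ensemble_predict(x, cutoff=0.5):
    """Hard label from the averaged ensemble score."""
    return int(ensemble_score(x) >= cutoff)


def disagreement(p, q, points, cutoff=0.5):
    """Diversity measure: fraction of points where two predictors' hard labels differ."""
    return sum((p(x) >= cutoff) != (q(x) >= cutoff) for x in points) / len(points)


if __name__ == "__main__":
    for point in [(0.85, 0.85), (0.15, 0.2), (0.55, 0.3)]:
        print(point, round(ensemble_score(point), 3), ensemble_predict(point))
```

On the point `(0.55, 0.3)`, the threshold rule alone votes positive, but the other two predictors score it low, so the averaged ensemble labels it negative. This is a small instance of diverse base predictors correcting one another. Real heterogeneous ensembles combine far more base predictors, and more elaborate combiners such as stacking are also common.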