Dissertation Defense Announcement
To:  The George Mason University Community

Candidate: KanakaDurga Addepalli

Program: PhD in Bioinformatics & Computational Biology

Date:   Friday, April 25, 2014

Time:   2:00 p.m.

Place:  George Mason University
             Prince William Campus

             Occoquan Bldg., Room 203

Title: "Models Predicting Effects of Missense Mutations in Oncogenesis"

Thesis Director: Dr. Iosif Vaisman

Thesis Committee:  Dr. Ancha Baranova, Dr. James D. Willett

A copy of the dissertation will be available in the Mercer Library.  All are invited to attend the defense.

The recent avalanche of high-throughput genotyping, next-generation sequencing, and re-sequencing of cancer genomes has changed the way cancer is studied. Identifying and characterizing the mutations these technologies uncover, and predicting their effects, has become one of the major goals of cancer research. We present a computational geometry approach based on a four-body statistical potential function, derived by Delaunay tessellation directly from the atomic coordinates of high-resolution X-ray crystallographic protein structures. Proteins and their mutants are characterized by potential topological scores and profiles, which measure the relative change in overall sequence-structure compatibility. Residual scores and profiles are generated to quantify environmental perturbations from wild-type amino acids at every mutational position. We also present an integrated database of human cancer missense mutations linked to their 3D structures, created to serve as a one-stop resource of human missense mutation data sets large and versatile enough for training and testing machine learning methods. We apply supervised learning to training sets of protein mutants taken from our database and generate models that make statistically meaningful predictions of the effects of missense mutations on cancer proteins. This approach is further shown to be useful for improving feature selection in cancer survival time models trained on gene expression data.