Dissertation Defense Announcement
To:  The George Mason University Community

Candidate: Saad Almasuod

Program: PhD in Bioinformatics and Computational Biology

Date:   Tuesday, November 28, 2017

Time:   11:00 AM
Place:  Colgan Hall, Room 328-H
              George Mason University
             Science & Tech campus

Title: "Machine Learning Modeling of Survival in Cancer Patients"
Committee Chair: Dr. Iosif Vaisman
Committee Members:  Dr. Saleet Jafri, Dr. James D. Willett

All are invited to attend the defense.
Bioinformatics plays an important role in cancer research: its methodologies, software, computational tools, and databases can be used to explore the molecular mechanisms of cancer and to identify both novel biomarkers and network markers, which could lead to individualized, patient-specific modalities of cancer treatment. In this research, we consider the use of machine learning algorithms to build models for predicting recurrence-free survival of cancer patients based on genomic data. Algorithms such as Random Forest, Naïve Bayes, Support Vector Machines, and the AdaBoost.M1 meta-classifier were applied, among other methods, to a variety of gene expression data sets from cancer patients. Our models were applied to two data sets. The first consists of data from Villanueva et al. (2011). They used 287 patients, including 244 who had single nodules as well as tumor genomic data and 201 who had both tumor and non-tumor (cirrhotic) gene expression data. The second data set was previously used by Roessler et al. (2010). They used a total of 153 gene signatures, of which 146 were available for our research. The gene expression data were extracted from a total of 481 patients, including 242 with tumors and 239 tumor-free patients. Our analysis shows prediction accuracy reaching 85% in some cases. Such promising results open the door for future work: further investigating the cohort of genes included in the gene signature, including relevant clinical data to improve classification, and then identifying which algorithm works best and recommending its use in determining individualized treatment.