Dissertation Defense Announcement:
To:  The George Mason University Community

Candidate: Baris E. Suzek
Program: PhD Bioinformatics & Computational Biology 

Date:   Tuesday, November 27, 2012
Time:   11:00 A.M. - 1:00 P.M.
Place:  George Mason University
        Occoquan Bldg., Room #110-A
        Prince William campus
Dissertation Chair: Dr. Iosif Vaisman
Committee Members: Dr. Donald Seto, Dr. James D. Willett


The dissertation is on reserve in the Johnson Center Library, Fairfax campus.
The doctoral project will not be read at the meeting, but should be read in advance. 
All members of the George Mason University community are invited to attend.

Advances in high-throughput sequencing and molecular profiling techniques have led to the identification of a large volume of single amino acid polymorphisms (SAPs). This volume makes it prohibitive in time and cost to experimentally test and validate the disease-causing, or pathogenic, impact of each SAP individually. Hence, accurate computational methods to classify disease-causing SAPs are needed, and they can play a clinical role as a core part of a system that supports clinicians’ decision making in diagnosis. This thesis describes several components that together form a pipeline for classifying disease-causing SAPs in humans. These components include (1) a data collection and integration pipeline that gathers a set of SAPs along with sequence and structure information; (2) a non-redundant protein sequence cluster database used to assess the evolutionary conservation of amino acids; (3) a methodology that uses evolutionary conservation information to classify disease-causing SAPs; and (4) an ensemble classifier that combines several machine learning classifiers for the different sequential and structural contexts in which SAPs are located. This approach is comprehensive: it uses several sequence annotation and structure information resources to provide sequential and structural context for SAPs and to create features for the classifiers. It is extensible: the resources and classifiers can be updated as new sequence annotations or structure information become available. It is specific: it assesses the impact of a SAP within its sequential and structural context, using a unique blend of evolutionary information. The resulting classification system performs comparably to or better than existing SAP classifiers. In the long run, this system is intended to become a core part of a system that supports clinicians’ decision making for diagnosis, identification of treatment protocols, and preventive care.