Dissertation Defense Announcement

To: The George Mason University Community

Candidate: Tugba Onal-Suzek
Program: PhD Bioinformatics & Computational Biology

Date: Friday July 20, 2012
Time: 11:00 a.m.
Place: George Mason University, Occoquan Bldg., Room 203, Prince William Campus

Dissertation Director: Dr. Jeffrey L. Solka
Committee members: Dr. James D. Willett, Dr. Donald Seto

Title: "The Text-Mining Based Neighboring and Automated Annotation of PubChem BioAssays"

The dissertation is on reserve in the Johnson Center Library, Fairfax campus. The doctoral project will not be read at the meeting, but should be read in advance.

All members of the George Mason University community are invited to attend.

ABSTRACT:

The number of high-throughput screening (HTS) assays, namely BioAssays, deposited in PubChem has grown quickly in recent years. The currently available grouping tools and single-retrieval analysis approaches for BioAssays have become impractical given the rapidly increasing volume of data. In this work, a text-mining based approach is proposed for the automated neighboring and annotation of BioAssays using their unstructured text descriptions. Our results from assay neighbor clustering analysis, compared with the existing assay neighboring methods, suggest that strong correlations among BioAssays can be identified from their conceptual relevance and that these correlations complement the existing neighboring methods in PubChem. A novel method to extract keywords from a single unstructured text document is described, and its comparative performance when applied to the BioAssay descriptions is reported. Finally, results of the automated biomedical annotation of the BioAssay text descriptions towards the discovery of chemical substances that satisfy the probe criteria are presented.