*Correction:*

> Thesis Defense Announcement
> To The George Mason University Community
>
> *Candidate: Philip Goetz
> Program: Master of Science in Bioinformatics & Computational Biology*
>
> *Date: Thursday December 1, 2011
> Time: 1:00 p.m.
> Place: George Mason University, Prince William campus <http://www.gmu.edu/resources/visitors/findex.html>
> Discovery Hall Room 224*
>
> *Thesis Chair: Dr. Iosif Vaisman*
>
> *Title: "AUTOMATED CONSTRUCTION OF A RANKING SYSTEM FOR AUTOMATIC FUNCTIONAL
> GENE ANNOTATION"*
>
> A copy of the thesis is on reserve in the Johnson Center Library,
> Fairfax campus. The thesis will not be read at the meeting, but
> should be read in advance.
> All members of the George Mason University community are invited to
> attend.
>
> *ABSTRACT:*
>
> One key method of automatic functional annotation of a gene is finding
> BLAST hits to the gene in question that have functional annotations,
> choosing the best single hit, and copying the annotation from that hit
> if it is of sufficient quality as measured by a p-value or other
> criterion. In the JCVI prokaryote automatic functional annotation
> system, the best hit is chosen by looking up categories in a
> manually-constructed table stating how reliable the annotation is
> depending on who made the annotation, what percent identity the BLAST
> hit had, and what percentage of the query gene and the hit gene were
> involved in the match.
>
> Constructing this table is labor-intensive; and humans are incapable
> of processing enough data to construct it correctly. I therefore
> reduced the data requirements by breaking the table into orthogonal
> components; and I developed an iterative method to minimize the
> least-squares error of the table on a training set. I also
> constructed a validation set of 50,000 manually-annotated proteins
> from JCVI data, and developed a protein name thesaurus and ontology to
> make it possible to tell when two names meant the same thing, or when
> one name was a more-specific refinement of another name. Training on
> 9/10 of the validation set, and testing on the held-out 1/10, showed
> an improvement in accuracy from 71.8% to 77.7%.
>
> ###
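
The abstract does not say how the table is split into components or how the iterative fit works, so the following is only a rough sketch of the general idea, not the method from the thesis. It assumes the reliability of a hit can be approximated as a sum of one term per factor (annotation source, percent-identity bin, coverage bin), and it fits those terms by backfitting: each factor's terms are repeatedly reset to the mean residual left by the other factors, which lowers the squared error at every step. The function names, the additive form, and the binning are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_additive_table(examples, n_iters=50):
    """Fit one lookup table per factor by backfitting (hypothetical sketch).

    examples: list of (key, target) pairs, where key is a tuple such as
    (source, identity_bin, coverage_bin) and target is the observed
    reliability (for instance 1.0 if the copied annotation was correct).
    """
    n_factors = len(examples[0][0])
    # One table per factor; levels never seen in training contribute 0.0.
    components = [defaultdict(float) for _ in range(n_factors)]

    def predict(key):
        # Assumed additive model: the score is the sum of per-factor terms.
        return sum(components[f][key[f]] for f in range(n_factors))

    for _ in range(n_iters):
        for f in range(n_factors):
            sums = defaultdict(float)
            counts = defaultdict(int)
            for key, target in examples:
                # Residual with factor f's own contribution removed.
                others = predict(key) - components[f][key[f]]
                sums[key[f]] += target - others
                counts[key[f]] += 1
            # The least-squares optimum for each level is the mean residual.
            for level in sums:
                components[f][level] = sums[level] / counts[level]
    return components, predict


def squared_error(predict, examples):
    """Sum of squared differences between targets and predictions."""
    return sum((target - predict(key)) ** 2 for key, target in examples)


def best_hit(predict, hit_keys):
    """Pick the hit whose factor combination has the highest fitted score."""
    return max(hit_keys, key=predict)
```

With tables fitted on the training 9/10, `best_hit` would be applied to the candidate BLAST hits of each held-out protein. The point of the decomposition in this sketch is that it needs one parameter per factor level rather than one per combination of levels, which is one plausible reading of "reduced the data requirements".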