LISTSERV - BIOSCIENCES-L Archives

*Correction:*
> Thesis Defense Announcement
> To The George Mason University Community
>
> Candidate: Philip Goetz
> Program: Master of Science in Bioinformatics & Computational Biology
>
> Date:   Thursday, December 1, 2011
> Time:   1:00 p.m.
> Place:  George Mason University, Prince William campus <http://www.gmu.edu/resources/visitors/findex.html>
>         Discovery Hall, Room 224
>
> Thesis Chair:  Dr. Iosif Vaisman
>
> Title: "AUTOMATED CONSTRUCTION OF A RANKING SYSTEM FOR AUTOMATIC FUNCTIONAL
> GENE ANNOTATION"
>
> A copy of the thesis is on reserve in the Johnson Center Library, 
> Fairfax campus.  The thesis will not be read at the meeting, but 
> should be read in advance.
> All members of the George Mason University community are invited to 
> attend.
>
> ABSTRACT:
>
> One key method of automatic functional annotation of a gene is to find
> BLAST hits to the gene that have functional annotations, choose the
> best single hit, and copy that hit's annotation if it is of sufficient
> quality as measured by a p-value or other criterion.  In the JCVI
> prokaryote automatic functional annotation system, the best hit is
> chosen by looking up its category in a manually constructed table that
> states how reliable an annotation is, depending on who made the
> annotation, the percent identity of the BLAST hit, and what percentage
> of the query gene and of the hit gene is covered by the match.
>
> Constructing this table is labor-intensive, and humans cannot process
> enough data to construct it correctly.  I therefore reduced the data
> requirements by breaking the table into orthogonal components, and
> developed an iterative method to minimize the table's least-squares
> error on a training set.  I also constructed a validation set of
> 50,000 manually annotated proteins from JCVI data, and developed a
> protein-name thesaurus and ontology to determine when two names mean
> the same thing or when one name is a more specific refinement of
> another.  Training on 9/10 of the validation set and testing on the
> held-out 1/10 showed an improvement in accuracy from 71.8% to 77.7%.
>   
> ###
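The best-hit selection described in the abstract's first paragraph can be pictured as a table lookup. This is a minimal sketch, not the JCVI code: the bin edges, the `Hit` fields, the reliability values, and the E-value cutoff (standing in for the abstract's "p-value or other criterion") are all hypothetical. Each hit is mapped to a category (annotation source, identity bin, query-coverage bin, hit-coverage bin), the hit whose category is most reliable wins, and its name is copied only if it passes the quality cutoff.

```python
from dataclasses import dataclass

# Hypothetical bin edges (percent); the abstract does not give the real categories.
IDENTITY_EDGES = (30.0, 50.0, 70.0, 90.0)
COVERAGE_EDGES = (50.0, 80.0)


def bin_index(value, edges):
    """Number of bin edges that `value` meets or exceeds (0 .. len(edges))."""
    return sum(value >= edge for edge in edges)


@dataclass
class Hit:
    name: str              # functional annotation carried by the hit
    source: str            # who made that annotation, e.g. "manual" or "automatic"
    percent_identity: float
    query_coverage: float  # percent of the query gene covered by the alignment
    hit_coverage: float    # percent of the hit gene covered by the alignment
    evalue: float          # stands in for the "p-value or other criterion"


def category(hit):
    """Map a hit to its table key: (source, identity bin, query bin, hit bin)."""
    return (
        hit.source,
        bin_index(hit.percent_identity, IDENTITY_EDGES),
        bin_index(hit.query_coverage, COVERAGE_EDGES),
        bin_index(hit.hit_coverage, COVERAGE_EDGES),
    )


def choose_annotation(hits, table, max_evalue=1e-10):
    """Pick the hit whose category is most reliable in `table` (unlisted
    categories score 0.0), then copy its name only if it passes the cutoff."""
    if not hits:
        return None
    best = max(hits, key=lambda h: table.get(category(h), 0.0))
    return best.name if best.evalue <= max_evalue else None


if __name__ == "__main__":
    # Hypothetical reliability table and hits, for illustration only.
    table = {("manual", 4, 2, 2): 0.95, ("automatic", 2, 1, 1): 0.60}
    hits = [
        Hit("hypothetical protein", "automatic", 55.0, 60.0, 60.0, 1e-20),
        Hit("DNA gyrase subunit A", "manual", 95.0, 90.0, 85.0, 1e-50),
    ]
    print(choose_annotation(hits, table))  # prints: DNA gyrase subunit A
```

Note that the cutoff is applied after the best hit is chosen, matching the order in the abstract; filtering hits by E-value first would be an equally plausible design.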
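The abstract does not say how the table was broken into orthogonal components or which iterative method minimized its least-squares error. One plausible reading, assumed here purely for illustration, is an additive model: a category's reliability is a baseline plus one learned term per factor (annotation source, identity bin, and the two coverage bins), fitted by backfitting, a standard block-coordinate-descent method for additive least squares. The appeal for data requirements is that an additive table needs only as many values as there are factor levels in total (for example 2 + 5 + 3 + 3 = 13) instead of one per combination (2 × 5 × 3 × 3 = 90). The function name and data layout below are hypothetical.

```python
from collections import defaultdict


def fit_additive_table(examples, labels, n_iters=100):
    """Fit score(x) = mu + sum_k effects[k][x[k]] to labels by least squares.

    examples: list of equal-length tuples, one categorical level per factor,
              e.g. (source, identity_bin, query_cov_bin, hit_cov_bin)
    labels:   list of floats, e.g. 1.0 if copying that hit's name was correct
    Backfitting: each pass refits one factor while holding the others fixed.
    """
    n_factors = len(examples[0])
    mu = sum(labels) / len(labels)
    effects = [defaultdict(float) for _ in range(n_factors)]

    def predict(x):
        return mu + sum(effects[k][x[k]] for k in range(n_factors))

    for _ in range(n_iters):
        for k in range(n_factors):
            sums, counts = defaultdict(float), defaultdict(int)
            for x, y in zip(examples, labels):
                # Partial residual: what factor k still has to explain.
                sums[x[k]] += y - (predict(x) - effects[k][x[k]])
                counts[x[k]] += 1
            # The per-level mean of the partial residuals is the exact
            # least-squares update for factor k given the other factors.
            for level, n in counts.items():
                effects[k][level] = sums[level] / n
    return mu, effects, predict
```

Under this assumption, the fitted `predict` plays the role of the reliability table when ranking hits, and accuracy can be estimated as the abstract describes: fit on 9/10 of the labeled proteins and score the held-out 1/10.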