Date: Tue, 29 Nov 2011 10:42:08 -0500

*Correction:*
> Thesis Defense Announcement
> To The George Mason University Community
>
> *Candidate:* Philip Goetz
> *Program:* Master of Science in Bioinformatics & Computational Biology
>
> *Date:* Thursday, December 1, 2011
> *Time:* 1:00 p.m.
> *Place:* George Mason University, Prince William campus <http://www.gmu.edu/resources/visitors/findex.html>
> Discovery Hall Room 224
>
> *Thesis Chair:* Dr. Iosif Vaisman
>
> *Title:* "AUTOMATED CONSTRUCTION OF A RANKING SYSTEM FOR AUTOMATIC FUNCTIONAL
> GENE ANNOTATION"
>
>
> A copy of the thesis is on reserve in the Johnson Center Library,
> Fairfax campus. The thesis will not be read at the meeting, but
> should be read in advance.
> All members of the George Mason University community are invited to
> attend.
>
> *ABSTRACT:*
>
> One key method of automatic functional annotation of a gene is finding
> BLAST hits to the gene in question that have functional annotations,
> choosing the best single hit, and copying the annotation from that hit
> if it is of sufficient quality as measured by a p-value or other
> criterion. In the JCVI prokaryote automatic functional annotation
> system, the best hit is chosen by looking up categories in a
> manually-constructed table stating how reliable the annotation is
> depending on who made the annotation, what percent identity the BLAST
> hit had, and what percentage of the query gene and the hit gene were
> involved in the match.
>
> Constructing this table is labor-intensive, and humans are incapable
> of processing enough data to construct it correctly. I therefore
> reduced the data requirements by breaking the table into orthogonal
> components, and I developed an iterative method to minimize the
> least-squares error of the table on a training set. I also
> constructed a validation set of 50,000 manually-annotated proteins
> from JCVI data, and developed a protein name thesaurus and ontology to
> make it possible to tell when two names meant the same thing, or when
> one name was a more specific refinement of another name. Training on
> 9/10 of the validation set, and testing on the held-out 1/10, showed
> an improvement in accuracy from 71.8% to 77.7%.
>
> ###
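
The abstract does not say how the table was decomposed or which iterative method was used. As an illustration only, the sketch below assumes one possible reading: each table cell is approximated as a baseline plus one additive term per factor (for example annotation source, percent-identity bin, coverage bin), and the terms are fitted by cycling through the factors and setting each level to its mean residual, which lowers the squared error at every step. The function names, factor names, and sample data are hypothetical and are not taken from the thesis or from the JCVI system.

```python
from collections import defaultdict


def predict_score(baseline, effects, levels):
    """Reliability estimate for one cell: baseline plus one term per factor."""
    return baseline + sum(effects[i].get(lv, 0.0) for i, lv in enumerate(levels))


def fit_additive_table(samples, n_iters=50):
    """Fit an additive approximation of a reliability table.

    samples: list of (levels, target) pairs, where levels is a tuple holding
    one categorical level per factor and target is the observed reliability
    (for example 1.0 if the copied annotation was correct, else 0.0).
    Returns (baseline, effects), with effects[i][level] the term for factor i.
    """
    n_factors = len(samples[0][0])
    baseline = sum(target for _, target in samples) / len(samples)
    effects = [dict() for _ in range(n_factors)]

    for _ in range(n_iters):
        for i in range(n_factors):
            sums = defaultdict(float)
            counts = defaultdict(int)
            for levels, target in samples:
                # Prediction with factor i's own term left out.
                partial = predict_score(baseline, effects, levels) \
                    - effects[i].get(levels[i], 0.0)
                sums[levels[i]] += target - partial
                counts[levels[i]] += 1
            # The least-squares optimum for each level, holding the other
            # factors fixed, is the mean residual over its samples.
            for lv in sums:
                effects[i][lv] = sums[lv] / counts[lv]
    return baseline, effects


# Hypothetical training data: (annotation source, identity bin) -> reliability.
demo_samples = [
    (("curated", "high"), 0.8),
    (("curated", "low"), 0.6),
    (("automatic", "high"), 0.4),
    (("automatic", "low"), 0.2),
]

if __name__ == "__main__":
    baseline, effects = fit_additive_table(demo_samples)
    for levels, target in demo_samples:
        print(levels, target, round(predict_score(baseline, effects, levels), 3))
```

Under this assumption the number of parameters grows with the sum of the factor sizes rather than their product, which is one way the data requirements of a full table could be reduced.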