Dissertation Defense Announcement:
To:  The George Mason University Community


William Anderson von Canon
PhD Bioinformatics & Computational Biology Candidate

Date:   Tuesday, April 12, 2011
Time:   4:30 p.m. 
Place:  George Mason University
        Occoquan Bldg., Room 203
        Prince William campus
  
Dissertation Chair: Dr. Jeffrey Solka
Committee members: Dr. James D. Willett, Dr. Jason Kinser
Title: "Enhancement of Literature Based Discovery Using Advanced
Computational Techniques and Evaluation of Potential Discoveries
Related to Amyotrophic Lateral Sclerosis (ALS)"

The dissertation is on reserve in the Johnson Center Library, Fairfax campus.
The doctoral project will not be read at the meeting, but should be read in advance. 
All members of the George Mason University community are invited to attend.


ABSTRACT: 
Literature-based discovery (LBD) has been
used successfully to support analysis of medical literature and to identify
potentially unrelated topics that could support novel discoveries in disease
research. As the concept of LBD expanded to the wider scientific community, its
potential value grew, but its statistical underpinnings needed further
refinement. The foundational research conducted by Don Swanson established a
potential link between “fish oil” and “Raynaud’s disease” by analyzing
literature inferences that linked the concepts indirectly through associated
words that were not common to the direct topic literature sources. The
identification of “joining words” that were common to each literature source
helped link the two concepts of “disease” and “potential cure” that otherwise
may not have been identified. This dissertation focused on enhancing LBD with
new probabilistic techniques that could increase the accuracy and efficiency of
discovering literature similarities while enforcing a more stringent
statistical foundation for the technique. The initial LBD research conducted by
Gordon and Dumais on Raynaud’s disease, using the Latent Semantic Indexing (LSI)
technique, was recreated as part of this analysis. Two new statistical
techniques, Probabilistic Latent Semantic Analysis (PLSA) and Non-Negative
Matrix Factorization (NMF), were evaluated and showed increased precision over
the original LSI technique in both intermediate literature identification and
literature-inferred discovery. These techniques were then used to
conduct LBD on another disease, Amyotrophic Lateral Sclerosis (ALS). The
investigation of NMF and PLSA as new computational approaches validated
quantifiable enhancements over the previous LSI technique. While they enhanced
the accuracy and efficiency of both literature B and C discovery, the
ability to use each computational technique to reinforce the other
method’s findings proved particularly interesting and will give
researchers stronger statistical LBD models in support of current and future
scientific discovery.

###