Please join Jeff Solka, PhD, for the Colloquium on Tuesday, January 29, 2013, at 4:30 pm. The presentation will be held in room 256 of Bull Run Hall on the PW campus. The speaker will be Dr. Elizabeth Hohman (NSWCDD).

TITLE: Statistical Methods in Text Analysis

ABSTRACT: This talk is structured as a mini-tutorial on text analysis using the R programming language and environment. We use PubMed to download an example corpus and perform the parsing, classification, and clustering in R. Instead of using R text packages such as tm, we represent the documents as a matrix and apply some standard classification and clustering techniques. All code is included in the slides and can be run on your own PubMed download. The focus is on understanding the math behind the techniques, not on efficiency. After understanding basics such as the TFIDF (term frequency inverse document frequency) representation of a corpus, one is better prepared to use the available text mining packages.
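
For readers unfamiliar with the TFIDF representation mentioned in the abstract, here is a minimal sketch of the idea. It is not the speaker's code: the talk uses R, while this illustration is in Python. The function name `tfidf_matrix`, the three-document toy corpus, and the weighting (raw term count times log(N / document frequency)) are all assumptions, and that weighting is only one common variant among several.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] is the TF-IDF weight of vocab[j] in docs[i].

    Hypothetical helper for illustration only. Uses whitespace tokenization
    and the weighting tf * log(N / df), one common variant.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    # Inverse document frequency: terms found in fewer documents weigh more.
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)  # raw term frequency within this document
        matrix.append([counts[term] * idf[term] for term in vocab])
    return vocab, matrix


# Toy corpus, invented for this example (not a PubMed download).
docs = [
    "gene expression in tumor cells",
    "tumor growth and gene mutation",
    "clustering of text documents",
]
vocab, matrix = tfidf_matrix(docs)
```

Each row of `matrix` is one document and each column one vocabulary term, so standard classification and clustering routines can operate on the rows directly.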