Distinguished Lecture Series: Probabilistic Topic Models and User Behavior
Friday, September 26, 2014
Research Hall Room 163
Probabilistic topic models provide a suite of tools for analyzing large document collections.
Topic modeling algorithms discover the latent themes that underlie the documents and identify how each document exhibits those themes. Topic modeling can be used to help explore, summarize, and form predictions about documents. Topic modeling ideas have been
adapted to many domains, including images, music, networks, genomics, and neuroscience.
Traditional topic modeling algorithms analyze a document collection and estimate its latent
thematic structure. However, many collections contain an additional type of data: how people use the documents. For example, readers click on articles in a newspaper website, scientists place articles in their personal libraries, and lawmakers vote on a collection
of bills. Behavior data is essential both for making predictions about users and for understanding how a collection is organized.
I will review the basics of topic modeling and describe our recent research on collaborative
topic models, models that simultaneously analyze a collection of texts and its corresponding user behavior. We studied collaborative topic models on scientists' libraries (from Mendeley) and scientists' click data (from the arXiv). Collaborative topic models
enable interpretable recommendation systems, capturing scientists' preferences and pointing them to articles of interest. Further, they help organize the articles according to the discovered patterns of readership. For example, we can identify articles that
are important within a field, and articles that transcend disciplinary boundaries.
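As a rough illustration of how text and readership might be combined, the sketch below scores an article for a reader. It assumes, purely for illustration, that each article is represented by its topic proportions plus a readership-driven offset, and that a reader's predicted interest is the dot product of that vector with the reader's topic preferences. The function name, the three topics, and all numbers are hypothetical; this is not the model or the data presented in the talk.

```python
# A toy sketch under stated assumptions; names and numbers are hypothetical.

def predict_score(user_prefs, topic_props, offsets):
    """Score one article for one user.

    user_prefs  : the user's affinity for each topic
    topic_props : the article's topic proportions, as estimated from its text
    offsets     : per-topic adjustment reflecting who actually reads the article
    """
    # Article representation = what the text says + how readers use it.
    article_vec = [t + o for t, o in zip(topic_props, offsets)]
    # Predicted interest = dot product of user preferences and article vector.
    return sum(u * v for u, v in zip(user_prefs, article_vec))


# Three assumed topics, in order: statistics, biology, physics.
user = [0.9, 0.1, 0.0]          # a reader mostly interested in statistics
stats_paper = [0.8, 0.1, 0.1]   # text is mostly about statistics
bio_paper = [0.1, 0.8, 0.1]     # text is mostly about biology
no_offset = [0.0, 0.0, 0.0]

score_stats = predict_score(user, stats_paper, no_offset)
score_bio = predict_score(user, bio_paper, no_offset)

# Suppose the biology paper turns out to be widely read by statisticians:
# the readership offset moves its representation toward the statistics topic.
crossover_offset = [0.5, 0.0, 0.0]
score_bio_crossover = predict_score(user, bio_paper, crossover_offset)

print(score_stats, score_bio, score_bio_crossover)
```

In this toy setting the offset is what lets a biology article surface for a statistics reader, which is one way to picture an article that crosses disciplinary boundaries: its readership differs from what its text alone would suggest.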
More broadly, topic modeling is a case study in the large field of applied probabilistic
modeling. I will survey some recent advances in this field. I will show how modern probabilistic modeling gives data scientists a rich language for expressing statistical assumptions and scalable algorithms for uncovering hidden patterns in massive data.
David Blei is a Professor of Statistics and Computer Science at Columbia University. His
research is in statistical machine learning, involving probabilistic topic models, Bayesian nonparametric methods, and approximate posterior inference. He works on a variety of applications, including text, images, music, social networks, user behavior, and
David earned his Bachelor's degree in Computer Science and Mathematics from Brown University
(1997) and his PhD in Computer Science from the University of California, Berkeley (2004). Before joining Columbia, he was an Associate Professor of Computer Science at Princeton University. He has received several awards for his research, including a
Sloan Fellowship (2010), Office of Naval Research Young Investigator Award (2011), Presidential Early Career Award for Scientists and Engineers (2011), Blavatnik Faculty Award (2013), and ACM-Infosys Foundation Award (2013).