Computer Science Colloquium: Solving the Search for Source Code

Thursday, February 28, 2013
Nguyen Engineering 4201
Kathryn Stolee


Programmers frequently use keyword searches to find source code in large repositories. However, to do this effectively, programmers must specify keyword queries that capture implementation details of their desired code. I propose that code search should be about behavior, not about keywords. 

In this talk, I will present an approach to code search that allows programmers to provide inputs and outputs that define the behavior of their desired code. This approach indexes source code repositories by symbolically analyzing programs and program fragments and transforming them into constraints that represent their behavior. Results are identified using an SMT solver, which, given an input/output specification and the constraint representation of a program fragment, determines whether the fragment matches the desired behavior. Beyond promoting code reuse, my approach enables reuse where it was not possible before: the constraints can be relaxed to identify code that approximately matches the specification, and the solver can then guide the instantiation of that code to produce the desired behavior. I will illustrate the generality of the approach by showing its instantiation in subsets of three languages: the Java String library, Yahoo! Pipes mashups, and SQL select statements. I will conclude by sharing my vision for new research directions related to this semantic approach to code search.
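
As a rough illustration of searching by behavior rather than by keywords, the toy sketch below matches input/output examples against a small set of string fragments. It is not the approach described in the talk: it simply executes each candidate on the example inputs, whereas the talk's approach encodes fragments as constraints and queries an SMT solver. The fragment names, the FRAGMENTS table, and the search function are all hypothetical and exist only for this example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical "repository" of string fragments (illustrative only).
FRAGMENTS: Dict[str, Callable[[str], str]] = {
    "to_upper": lambda s: s.upper(),
    "strip_ws": lambda s: s.strip(),
    "file_extension": lambda s: s[s.rfind(".") + 1:],
    "first_word": lambda s: s.split(" ")[0],
}


def search(examples: List[Tuple[str, str]]) -> List[str]:
    """Return the names of fragments that agree with every (input, output) example.

    Assumption: candidates are checked by running them directly, a simple
    stand-in for solving constraints derived from symbolic analysis.
    """
    return [
        name
        for name, fragment in FRAGMENTS.items()
        if all(fragment(inp) == out for inp, out in examples)
    ]


# The query describes behavior only; no keyword such as "extension" is needed.
matches = search([("report.pdf", "pdf"), ("archive.tar.gz", "gz")])
print(matches)
```

Here the query consists only of example inputs and their expected outputs, so the programmer does not need to guess how the desired code was named or implemented.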

Speaker's Bio

Kathryn Stolee is a Ph.D. candidate and NSF Graduate Research Fellow in the Department of Computer Science and Engineering at the University of Nebraska-Lincoln. She has been awarded an ESEM Distinguished Paper Award and two departmental outstanding research awards. Her research is in software engineering with a focus on program analysis. Extracting useful information from software artifact repositories is a broad theme of her research, most recently through semantic code search.