CS Colloquium: Searching for Relevant Functions and Their Usages in Millions of Lines of Code

Thursday, March 28, 2013
ENGR 4201 

Denys Poshyvanyk


Several studies show that programmers are more interested in finding definitions of functions and their uses than in finding variables, statements, or arbitrary code fragments. Developers therefore need support in finding relevant functions and determining how those functions are used. Unfortunately, existing code search engines provide little of this support, which reduces the effectiveness of code reuse. We provide this support in a code search system called Portfolio, which retrieves and visualizes relevant functions and their usages. We built Portfolio using a combination of models that capture the navigation (surfing) behavior of programmers and the sharing of related concepts among functions. Portfolio is currently instantiated on two large source code repositories containing thousands of projects and comprising 270 million lines of C/C++ code and 440 million lines of Java code. To evaluate Portfolio, we conducted two experiments. In the first, 49 professional C/C++ programmers compared Portfolio with Google Code Search and Koders, using a standard methodology for evaluating Information Retrieval-based engines. In the second, 19 Java programmers compared Portfolio with Koders. The results show, with strong statistical significance, that users find more relevant functions, with higher precision, using Portfolio than using Google Code Search or Koders. We also demonstrate that by using PageRank, Portfolio is able to rank returned relevant functions more efficiently.
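The abstract notes that Portfolio uses PageRank to rank the functions it returns. As a hedged illustration only, the sketch below computes PageRank by power iteration over a small, hypothetical function call graph. It assumes that an edge points from a caller to the function it invokes. The graph, the function names, and the damping factor of 0.85 (the value commonly used for PageRank) are assumptions, not details of Portfolio's implementation.

```python
# Minimal PageRank by power iteration over a HYPOTHETICAL function call graph.
# Assumption: an edge caller -> callee means "caller invokes callee", so
# widely used functions accumulate rank. Graph, names, and damping = 0.85 are
# illustrative choices, not Portfolio's actual data or settings.

def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Return a dict mapping each node to its PageRank score (scores sum to 1)."""
    # Collect every node, including callees that never call anything.
    nodes = set(graph)
    for callees in graph.values():
        nodes.update(callees)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by dangling nodes (no outgoing calls) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each caller splits its damped rank evenly among the functions it calls.
        for caller, callees in graph.items():
            if callees:
                share = damping * rank[caller] / len(callees)
                for callee in callees:
                    new_rank[callee] += share
        # Stop once the total change between iterations is negligible.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical call graph: several functions call a shared helper, log.
    calls = {
        "main": ["parse_args", "run"],
        "run": ["read_file", "log"],
        "read_file": ["log"],
        "parse_args": ["log"],
    }
    scores = pagerank(calls)
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name:12s} {score:.4f}")
```

In this toy graph, `log` is called from three places and receives the highest score, while `main`, which nothing calls, receives only the baseline score. A search engine could combine such a score with a textual relevance score; how Portfolio weighs the two is not described in this abstract.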

Speaker's Bio

Dr. Denys Poshyvanyk is an Assistant Professor at the College of William and Mary in Virginia. He received his Ph.D. degree in Computer Science from Wayne State University in 2008. He also obtained his M.S. and M.A. degrees in Computer Science from the National University of Kyiv-Mohyla Academy, Ukraine, and Wayne State University in 2003 and 2006, respectively. Since 2010, he has served on the steering committee of the International Conference on Program Comprehension (ICPC), and in 2012 he was elected chair of the ICPC steering committee. He is serving as a program co-chair for the 21st IEEE International Conference on Program Comprehension (ICPC'13). He also served as a program co-chair for the 18th and 19th International Working Conferences on Reverse Engineering (WCRE 2011 and WCRE 2012). Dr. Poshyvanyk received an NSF CAREER award and several Best Paper Awards, including at ICPC'06, ICPC'07, ICSM'10, and SCAM'10. His research interests are in software engineering, software maintenance and evolution, program comprehension, reverse engineering, software repository mining, source code analysis, and metrics.