Invited Talks:

Tao Xie, University of Illinois at Urbana-Champaign

Software Analytics: Towards Software Mining that Matters

Abstract: A wealth of data exists across the software life cycle, including source code, feature specifications, bug reports, test cases, execution traces and logs, and real-world user feedback. Data plays an essential role in modern software development because hidden in it is information about the quality of software and services as well as the dynamics of software development. To leverage such data for various software engineering tasks, researchers have actively pursued software mining for more than a decade. With substantial groundwork already done in software mining, it is important for the research community to pay attention to the impact that software mining research has on industrial practice.
In fact, the research community of machine learning, a field related to software mining, has recently started discussions on the practical impact of machine learning research. An example paper is "Machine Learning that Matters", presented by Kiri Wagstaff at ICML 2012. In this keynote talk, I will initiate similar discussions to explore how to conduct software mining research that can have an impact on industrial practice, drawing on some successful technology-transfer examples in the emerging field of software analytics. Software analytics uses data-driven approaches to enable software practitioners to explore and analyze data in order to obtain insightful and actionable information for completing various tasks around software systems, software users, and the software development process. The viewpoints and example work described in this talk result primarily from collaborative efforts with the Software Analytics group of Microsoft Research Asia.

Bio: Tao Xie has been an Associate Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign, USA, since July 2013. Before that, he was an Associate Professor in the Department of Computer Science at North Carolina State University. He received his Ph.D. in Computer Science from the University of Washington in 2005, advised by David Notkin. He has worked as a visiting researcher at Microsoft Research Redmond and Microsoft Research Asia. His research interests are in software engineering, focusing on software testing, program analysis, and software analytics. He has served as the ACM SIGSOFT History Liaison in the SIGSOFT Executive Committee as well as a member of the ACM History Committee (ACM History SGB Liaison). He received an NSF CAREER Award in 2009. He also received a 2011 Microsoft Research Software Engineering Innovation Foundation (SEIF) Award; IBM Faculty Awards in 2008, 2009, and 2010; and a 2008 IBM Jazz Innovation Award. His homepage is at


Ling Huang, Intel Labs

Machine Learning, Program Analysis and Large-Scale Systems

Abstract: Today’s distributed systems (e.g., those for cloud computing and Big Data) usually consist of hundreds of software components running on thousands of computers. In such complex and flexible systems, mining patterns in the software system to develop automatic tools for task scheduling, performance modeling, and problem detection becomes imperative. In this talk, I will present two examples of our work that combine machine learning and program analysis to model the software system and develop automated tools to aid critical operations in large-scale systems.
In the first example, I will present a framework for predicting the performance of programs on given inputs automatically, accurately, and efficiently. Our framework synergistically combines techniques from program instrumentation, program analysis, and machine learning. It constructs concise performance models by choosing, from many program execution features, only a handful that are most correlated with the program’s execution time yet can be evaluated efficiently from the program’s input. It also automatically generates executable code snippets for efficiently evaluating those features. We have implemented our framework and applied it to four real-world Java programs. Our approach predicts their execution time with at most 10% estimation error by executing a lightweight predictor that costs less than 5% of their execution time.
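As a rough illustration of the feature-selection idea described above (not the framework's actual implementation), the following Python sketch ranks candidate execution features by their Pearson correlation with measured run time, discards features that are too costly to evaluate, and fits a one-variable least-squares line on the best remaining feature. The feature names, the measurements, the cost figures, and the use of plain correlation ranking are all assumptions made for illustration; the abstract does not specify the selection algorithm.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no signal
    return cov / sqrt(var_x * var_y)


def select_features(features, times, costs, budget, k):
    """Keep features whose evaluation cost fits the budget, then
    return the k names most correlated (in absolute value) with time."""
    cheap = [name for name in features if costs[name] <= budget]
    ranked = sorted(
        cheap,
        key=lambda name: abs(pearson(features[name], times)),
        reverse=True,
    )
    return ranked[:k]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Hypothetical measurements from four training runs of one program.
features = {
    "loop_count": [10, 20, 30, 40],
    "branch_taken": [1, 0, 1, 0],
    "bytes_read": [100, 210, 290, 405],
}
times = [1.0, 2.0, 3.0, 4.0]  # measured execution time in seconds
# Hypothetical cost of evaluating each feature, as a fraction of run time.
costs = {"loop_count": 0.02, "branch_taken": 0.01, "bytes_read": 0.30}

chosen = select_features(features, times, costs, budget=0.05, k=1)
slope, intercept = fit_line(features[chosen[0]], times)
print(chosen, slope, intercept)
```

In this toy data, `bytes_read` tracks run time closely but is excluded because its assumed evaluation cost exceeds the budget, which mirrors the trade-off between predictive power and cheap evaluation that the abstract describes.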
In the second example, I will talk about a novel approach to system problem detection by mining console logs. Console logs are generated by applications and contain information that the developers believed would be useful in debugging or monitoring those applications. We first combine log parsing and text mining with source code analysis to extract structure from the console logs. We then generate features from the structured information in order to detect anomalous patterns in the logs using a method based on Principal Component Analysis (PCA). Finally, we use a decision tree to distill the detection results into a format readily understandable by domain experts (e.g., developers, integrators, and operators) who need not be familiar with the anomaly detection algorithms. Our methodology works on textual console logs of any size and requires no changes to the service software, no human input, and no knowledge of the software's internals. We validate our approach using real systems, where we detect numerous real problems with high accuracy and few false positives.
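The PCA-based detection step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the talk: each row is a hypothetical vector of event counts extracted from one log session, only the first principal component is kept (found by power iteration), and the threshold on the squared prediction error is an arbitrary value chosen for this toy data.

```python
from math import sqrt


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def top_component(centered, iterations=200):
    """Leading eigenvector of the scatter matrix via power iteration."""
    d = len(centered[0])
    scatter = [
        [sum(r[i] * r[j] for r in centered) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0 / sqrt(d)] * d  # assumed start: not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(scatter[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def squared_prediction_errors(rows):
    """Squared distance of each row from the first principal axis."""
    centered = center(rows)
    v = top_component(centered)
    errors = []
    for r in centered:
        projection = sum(a * b for a, b in zip(r, v))
        errors.append(sum(a * a for a in r) - projection * projection)
    return errors


# Hypothetical per-session counts of ("open", "close") log messages.
# Normal sessions close everything they open; the last one does not.
count_vectors = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [3, 0]]

THRESHOLD = 1.0  # illustrative cutoff, not a statistically derived one
errors = squared_prediction_errors(count_vectors)
flagged = [i for i, e in enumerate(errors) if e > THRESHOLD]
print(flagged)
```

The normal sessions lie close to the dominant direction of variation, so their residual is small, while the session that opens without closing falls far from it. A real deployment would choose the number of components and the threshold from the data rather than hard-coding them.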

Bio: Ling Huang is a research scientist at Intel Labs. He is currently a member of the Intel Science and Technology Center on Secure Computing at UC Berkeley. His research interests are in machine learning, systems, and security, especially efficient machine learning methods for mobile perception, system problem detection, and security. Ling joined Intel Labs Berkeley in October 2007, immediately after receiving his Ph.D. in Computer Science from the University of California at Berkeley. During his Ph.D. studies, he was affiliated with RadLab. Prior to UC Berkeley, he obtained B.S. and M.S. degrees from Beijing University of Aeronautics and Astronautics (BUAA) in China, and worked for more than three years as a system architect and project manager at CAXA, the No.1 CAD/CAM software company in China.