The Value of Summer Research

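For students planning graduate study: an overseas recommendation letter helps you win offers from top overseas universities while you finish your graduation project, and demonstrates your research and learning ability;
For students whose departments have requirements: for example, the Department of Electronic Engineering at Tsinghua University requires a production internship in the summer of the junior year, which can be fulfilled by interning at a company or by doing research at an overseas laboratory;
For students recommended for direct admission to master's or PhD programs: gain overseas experience, learn about life abroad, and broaden your knowledge and horizons.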

University of California, Los Angeles (UCLA)

Software Engineering Research Group: Miryung Kim

Research Interests

My research focuses on software engineering. My research group, the Software Engineering and Analysis Laboratory (SEAL), develops program analysis algorithms and development tools that make it easier to develop and evolve large-scale software systems. To improve programmer productivity and program correctness, we design, implement, and evaluate automated software analysis algorithms and tools. We also conduct user studies with professional software engineers and carry out statistical analysis of open source project data to enable data-driven decisions when designing novel software engineering tools. These days, my research focuses on software engineering support for big data systems and on understanding how data scientists work in software development organizations. In particular, our focus is to improve the productivity of data science work.

Please check out the new UCLA-UC Irvine project website on synergistic software customization, supported by the Office of Naval Research TPCP program. You can read more about current research projects at SEAL. Many of my publications and software tools are available online. I am actively recruiting strong students who are interested in software engineering. I am interested in new colleagues at all levels: undergraduates, graduate students, and post-docs.

 

Interactive and Automated Debugging for Big Data Analytics 

An abundance of data in science, engineering, national security, and health care has led to the emerging field of big data analytics. To process massive quantities of data, developers leverage data-intensive scalable computing (DISC) systems in the cloud, such as Google's MapReduce, Apache Hadoop, and Apache Spark. However, the current cloud computing model lacks the kinds of expressive and interactive debugging features found in traditional desktop computing. We seek to address these challenges by providing interactive debugging primitives and tool-assisted fault localization services for big data analytics. We showcase data provenance and optimized incremental computation features that support interactive debugging effectively and efficiently, and we investigate new research directions on how to automatically pinpoint and repair the root cause of errors in large-scale distributed data processing. The Big Data Debugging project has a separate project site. This project is led by my PhD student Muhammad Ali Gulzar.
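As a rough illustration of what data provenance means in this setting, the sketch below runs a toy word count and records which input lines contributed to each output count, so a suspicious result can be traced back to its inputs. It is a single-machine sketch under stated assumptions, not the project's actual tooling; the function name `word_count_with_provenance` and the sample data are hypothetical.

```python
from collections import defaultdict


def word_count_with_provenance(lines):
    """Count words and remember which input line numbers fed each count.

    Hypothetical helper for illustration only; a real DISC system would
    track provenance across distributed stages.
    """
    counts = defaultdict(int)
    provenance = defaultdict(set)
    for line_no, line in enumerate(lines):
        for word in line.split():
            counts[word] += 1
            provenance[word].add(line_no)
    # Convert to plain dicts with sorted line numbers for easy inspection.
    return dict(counts), {w: sorted(ids) for w, ids in provenance.items()}


# Made-up input records.
lines = ["spark hadoop", "spark", "mapreduce spark"]
counts, provenance = word_count_with_provenance(lines)
```

Looking up `provenance["hadoop"]` then yields the line numbers to inspect when the count for that word looks wrong.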

 

 

Data Scientists in Software Teams: Backgrounds, Activities, Tools, Challenges and Best Practices

The demand for analyzing large-scale telemetry, machine, and quality data is rapidly increasing in the software industry. Data scientists are becoming common within software teams. Facebook, LinkedIn, and Microsoft are creating a new career path for data scientists.

We have conducted an in-depth study on the emerging roles of data scientists using semi-structured interviews, and identified distinct working styles of data scientists and a set of strategies that they employ to increase the impact and actionability of their work. As a follow-up, we conducted a large-scale survey with 793 professional data scientists at Microsoft to understand their educational backgrounds, the problem topics they work on, their tool usage, and their activities. We cluster these data scientists based on the time spent on various activities and identify 9 distinct clusters of data scientists and their corresponding characteristics. We also discuss the challenges that they face and the best practices they share with other data scientists. This project is conducted in collaboration with Microsoft Research.

 

Mining, Assessing, and Visualizing Code Examples at Scale

Programmers often consult an online Q&A forum such as Stack Overflow (SO) to learn new APIs. We design ExampleCheck, an API usage mining framework that extracts patterns from over 380K Java repositories on GitHub. ExampleCheck subsequently reports potential API usage violations in 217K SO posts. We find that 31% of these posts may have potential API usage violations that could produce unexpected behavior such as program crashes and resource leaks.
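To make the notion of an API usage violation concrete, here is a minimal sketch that checks whether a snippet's sequence of API calls contains a required pattern in order. This is not ExampleCheck's actual algorithm; the function `follows_pattern`, the pattern, and the call names `open`, `read`, and `close` are assumptions chosen for illustration.

```python
def follows_pattern(calls, pattern):
    """Return True if `pattern` occurs in `calls` as an ordered subsequence.

    Hypothetical check for illustration: each step of the pattern must be
    found after the previous one, with other calls allowed in between.
    """
    remaining = iter(calls)
    # `any` consumes the iterator up to the first match, so the next step
    # is searched only in the calls that come afterwards.
    return all(any(call == step for call in remaining) for step in pattern)


# Assumed pattern: a resource that is opened must later be closed.
pattern = ["open", "close"]

ok_snippet = ["open", "read", "close"]
leaky_snippet = ["open", "read"]  # never closes: a potential resource leak

ok_result = follows_pattern(ok_snippet, pattern)
leaky_result = follows_pattern(leaky_snippet, pattern)
```

A snippet for which the check returns `False` would be flagged as a potential violation for a human to review.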

There is often a massive number of related code examples, and it is difficult for a user to understand the commonalities and variations among them while still being able to drill down to concrete details. We introduce an interactive visualization, called Examplore, that summarizes hundreds of code examples in one synthetic code skeleton with statistical distributions for canonicalized statements and structures enclosing an API call. This project is led by my PhD student Tianyi Zhang.

 

Coping with Code Duplication in Software Systems

Code duplication created by copy and paste is common in large software. Our research on how to cope with code duplication has enabled me to lead a new research team to address software debloating and delayering, which must be urgently addressed to secure our nation's cyber infrastructure. I am the PI of an Office of Naval Research (ONR) project, Synergistic Software Customization. Below are the details on code duplication search, differential testing, and clone removal refactoring.

 

Analysis and Automation of Systematic Software Changes

Extension of existing software often requires systematic and pervasive edits: programmers apply similar, but not identical, enhancements, refactorings, and bug fixes to many similar methods. The vision of this research is to produce a novel example-based program transformation approach. Our key insight is that by learning abstract transformations from examples, we can automate systematic edits in a flexible and easy-to-use manner. In our evaluation on real-world bug fixes, our approach LASE found fix locations with 99% precision and 89% recall, and applied fixes with 91% correctness. It also fixed locations missed by human developers, correcting errors of omission. This project is sponsored by a National Science Foundation CAREER Award: Analysis and Automation of Systematic Software Modifications.
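For readers unfamiliar with the metrics, precision is the fraction of reported fix locations that are correct, and recall is the fraction of true fix locations that were reported. The sketch below computes both from two sets of locations; the helper `precision_recall` and the method names in the sample data are made up for illustration and are not taken from the LASE evaluation.

```python
def precision_recall(reported, actual):
    """Compute (precision, recall) for reported vs. actual fix locations."""
    reported, actual = set(reported), set(actual)
    true_positives = len(reported & actual)
    # Guard against division by zero when either set is empty.
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical data: three locations reported, four truly need the fix.
reported = {"A.foo", "B.bar", "C.baz"}
actual = {"A.foo", "B.bar", "D.qux", "E.quux"}

precision, recall = precision_recall(reported, actual)
```

Here two of the three reported locations are correct (precision 2/3) and two of the four true locations are found (recall 1/2).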
