Software developers make many decisions during development. The reasoning behind these decisions, known as design rationale, is valuable information. Researchers have tried to extract and exploit this information automatically; however, prior techniques apply only to specific contexts, and there has been insufficient progress toward an automated end-to-end rationale extraction and management system. In this research project, we propose to use Natural Language Processing (NLP) and Machine Learning (ML) techniques to create a system for the automated extraction, structuring, and management of design rationale. This system would support the consistency and coherence of the development process.
Leveraging Artificial Intelligence on Binary Code Comprehension
Understanding binary code is an essential but complex software engineering task for reverse engineering, malware analysis, and compiler optimization. Unlike source code, binary code carries limited semantic information, which makes it hard for humans to comprehend. At the same time, compiling source code to binary code, or transpiling among different programming languages (PLs), can provide a way to introduce external knowledge into binary comprehension. We propose to develop Artificial Intelligence (AI) models that aid human comprehension of binary code. Specifically, we propose to incorporate domain knowledge from large corpora of source code (e.g., variable names, comments) to build AI models that capture a generalizable representation of binary code. Lastly, we will use human studies of comprehension to investigate metrics for assessing the performance of models applied to binary code.
Yifan is a researcher focusing on AI for Software Engineering (AI4SE), Graph Data Mining, and Domain Generalization. He is currently pursuing a Ph.D. in Computer Science at Vanderbilt University, affiliated with the Institute for Software Integrated Systems.
Background: Automated intelligent toolchains are widely used in software engineering to deploy automated program repair techniques, and in software security to identify vulnerabilities. Overall Research Problem: Most studies of automated intelligent toolchains report uncertainty and evaluations only for the individual components of the chain. How do we calculate the uncertainty and error propagation across the overall automated toolchain? Approach: I plan to replicate research case studies to collect data and to design a methodology that reconstructs the overall correctness metrics of the toolchains or identifies missing variables. Further confirmatory experiments with humans will be performed. Finally, I will implement an artifact to automate the overall assessment of automated toolchains. Current Status: A preliminary validation of published studies showed promising results.
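To make the research problem concrete, the following is a minimal sketch of how per-component results might be combined into a chain-level estimate. It is not the methodology proposed in the abstract. It assumes that every stage must succeed for the chain to succeed, that stages are statistically independent, and that each stage has a nonzero success rate with a known standard error; the function names `chain_success` and `chain_std_error` are hypothetical. The uncertainty is propagated with the standard first-order (delta-method) formula for a product of independent quantities.

```python
import math


def chain_success(stage_rates):
    """Overall success rate of a chain (hypothetical helper).

    Assumes every stage must succeed and stages are independent,
    so the chain-level rate is the product of the stage rates.
    """
    return math.prod(stage_rates)


def chain_std_error(stage_rates, stage_errors):
    """First-order standard error of the chain-level rate (hypothetical helper).

    For a product of independent quantities, the relative variances add:
    (se / P)^2 ~= sum((se_i / p_i)^2). Requires every rate to be nonzero.
    """
    overall = chain_success(stage_rates)
    relative_variance = sum(
        (err / rate) ** 2 for rate, err in zip(stage_rates, stage_errors)
    )
    return overall * math.sqrt(relative_variance)


if __name__ == "__main__":
    # Illustrative numbers only: a detector followed by a repair tool.
    rates = [0.9, 0.8]
    errors = [0.09, 0.08]
    print(chain_success(rates), chain_std_error(rates, errors))
```

In practice the independence assumption is often the weak point: if the second stage performs worse on exactly the inputs the first stage handles poorly, the product overestimates chain-level success, which is one reason component-level evaluations alone can be misleading.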