Machine learning (ML) has transformed various fields, highlighting the importance of early defect detection in ML programs without executing the code. While static analysis presents opportunities, existing studies have limitations. Meanwhile, notebooks have become a popular platform for developing ML prototypes. Notably, notebooks offer valuable run-time information, which can potentially enhance static analysis. In this project, we propose a semi-static analysis approach that will leverage available notebook run-time information. Our techniques will incorporate abstract interpretation with ML-based methods and support both notebooks and scripts. Our goal is to deliver efficient and effective semi-static analysis methodologies and open-source tools for the early detection of defects during coding, to enhance the productivity of ML development and the quality of ML programs.
Smart contracts have the potential to revolutionize various industries, but their implementation requires thorough testing because of the associated financial risks. Mutation testing is a powerful technique that can boost the fault-detection capabilities of a test suite, and it can also foster a deeper understanding of smart contract behavior. This work proposes using mutation testing throughout the smart contract auditing process to support code inspection activities.
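As a minimal illustration of the general idea behind mutation testing, the sketch below uses plain Python rather than a smart contract language. The `withdraw` function, the two test suites, and the `killed` helper are hypothetical names invented for this example and do not come from the proposed work. A mutant is created by changing one operator, and a test suite "kills" the mutant if it passes on the original code but fails on the mutated version.

```python
import operator


def make_withdraw(cmp):
    """Build a withdraw function whose balance check uses the given comparison."""
    def withdraw(balance, amount):
        # Allow the withdrawal only if the balance check passes.
        if cmp(balance, amount):
            return balance - amount
        raise ValueError("insufficient funds")
    return withdraw


original = make_withdraw(operator.ge)  # original check: balance >= amount
mutant = make_withdraw(operator.gt)    # mutated check:  balance >  amount


def weak_suite(withdraw):
    # Only exercises a comfortable margin, so it cannot tell >= from >.
    return withdraw(100, 30) == 70


def strong_suite(withdraw):
    # Adds the boundary case where the balance equals the amount.
    if withdraw(100, 30) != 70:
        return False
    try:
        return withdraw(50, 50) == 0
    except ValueError:
        return False


def killed(suite, mutant_fn):
    """A mutant is killed when a suite that passes on the original fails on it."""
    return not suite(mutant_fn)


print("weak suite kills mutant:  ", killed(weak_suite, mutant))
print("strong suite kills mutant:", killed(strong_suite, mutant))
```

A surviving mutant, as with the weak suite here, points to a behavior that the tests never exercise. This is the kind of signal that could direct a reviewer's attention during code inspection.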
Automatic test generation is an important software engineering task. Various test generation paradigms (e.g., code-coverage-based, model-based) have been proposed. One major objective of these paradigms is to generate test cases that achieve high coverage of a program’s components/functionalities. Despite many advances, there are still two outstanding challenges for these paradigms: 1) high coverage is not necessarily correlated with bug-revealing capabilities, and 2) constructing a test oracle is often an undecidable problem. To address these challenges, we plan to study the paradigm of failure-based testing, which focuses on constructing failure-inducing test cases. We observe that large language models (LLMs) have several desirable characteristics that can address these challenges in finding failure-inducing test cases.
Static analysis cannot provide fully sound results due to the undecidability of dynamic features in programming languages. We propose an approach that complements static analysis with relevant information from dynamic analysis. Our goal is to provide a framework that defers part of the execution of a static analysis at unsound code points in order to collect relevant values at runtime. We will evaluate the approach on real-world benchmarks to assess its practical feasibility.
Automatic code generation is an advanced application of program understanding and is considered a crucial method for improving the automation level and quality of software development. Researchers have recently extended the application of large language models (LLMs) to code generation, with impressive results. However, the code generated by these models might not always align with developers’ specific requirements, and it is challenging to make the necessary modifications to the model, since LLMs are often black boxes and require huge computational resources. To address this problem, I plan to post-process the output of LLMs. In this proposal, I first review the literature in the field and then propose two potential directions for mitigating this problem.
My name is Jinhao Dong, currently a PhD student at Peking University, scheduled to graduate in 2025. My research interests primarily lie in deep learning and software testing. My PhD research focuses on collaborative software development, which is essential to improve productivity when working on large-scale projects. I have introduced fine-grained structured representations for code changes in commit message generation (ICSE22) and conflict resolutions in merge conflict resolution (ASE23). Additionally, I have proposed specialized neural networks, including a graph neural network and dual copy mechanism for commit message generation (ICSE22) and generative models for merge conflict resolution (ASE23). Moreover, I have proposed a pattern-based approach to evaluate generated commit messages by matching the patterns to reflect their details and distribution (ICSE23).
Furthermore, I have devised a generative adversarial network called MarginGAN, which leverages the margin theory to enhance the accuracy of semi-supervised classifiers (NeurIPS19). I have also proposed a new direction to accelerate regression testing by reusing program states and skipping unnecessary program executions (ASE20 NIER Track). Other areas of my research include fault localization (FSE21) and test case reduction (ISSRE20).
We aim to address a critical research problem regarding the improvement of methodologies and tool-support systems for the comprehensive analysis and seamless transformation of imperative Deep Learning (DL) programs. DL frameworks have traditionally embraced deferred execution-style DL code. While scalable, such development tends to produce code that is error-prone. More natural, less error-prone imperative DL frameworks that encourage eager execution have emerged at the expense of run-time performance. Though hybrid approaches aim for the “best of both worlds,” using them effectively requires subtle considerations to make code amenable to safe, accurate, and efficient graph execution, avoiding performance bottlenecks and semantically inequivalent results. Our proposed research aims to bridge this gap by comprehensively investigating scalable and reliable imperative DL programming, focusing on the development of novel methodologies and advanced tool-support mechanisms. In initial work, we analyze the challenges of migrating DL programs to graph execution and report our progress toward developing automated refactoring of imperative DL programs to graph execution.
- Ph.D. student at The Graduate School and University Center of the City University of New York (CUNY).
- Member of the PONDER lab at Hunter College.