
Registered user since Wed 22 Jan 2020
Contributions
[Workshop] CPAchecker
Mon 11 Sep 2023 10:30 - 10:50 at Room PT - Latest Analysis Techniques around CPAchecker

Generative language models have made tremendous progress in software engineering tasks such as code generation, code refinement, and debugging. Many of these tasks require strong reasoning skills over code and code executions. This raises the question of whether we can utilize the reasoning skills of ChatGPT in the formal setting of software verification and complement existing automatic verifiers such as CPAchecker. To answer this question, we study the abilities of ChatGPT in the context of automatic software verification. More specifically, we focus on the task of loop invariant generation. Loop invariant generation requires reasoning over the code semantics to infer invariants that hold across all loop executions; good loop invariants can significantly simplify the verification process. Our initial investigation of the SV-COMP Loops category shows that ChatGPT can effectively annotate C programs with valid loop invariants, many of which are useful for automatic software verification. Interestingly, we find that many of the generated loop invariants are far from trivial and are of similar quality to invariants produced by a human annotator.
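To make the task concrete, the sketch below shows the kind of annotated C program the abstract describes: a simple loop in the style of the SV-COMP Loops category, with a loop invariant written as an ACSL-style comment. The annotation syntax and the __VERIFIER_assert harness are assumptions for illustration; the abstract does not fix a concrete annotation format.

```c
#include <assert.h>

// SV-COMP-style verification harness; defined with a plain assert
// here so the example also runs as an ordinary program.
void __VERIFIER_assert(int cond) { assert(cond); }

int main(void) {
    unsigned int n = 100;
    unsigned int i = 0;
    unsigned int sum = 0;

    /*@ loop invariant 0 <= i <= n;
      @ loop invariant 2 * sum == i * (i - 1);
      @*/
    while (i < n) {
        sum = sum + i;   // sum of 0 .. i-1 so far
        i = i + 1;
    }

    // Given the invariant above, a verifier can establish the
    // postcondition at loop exit (i == n) without unrolling the loop.
    __VERIFIER_assert(sum == n * (n - 1) / 2);
    return 0;
}
```

Note that the invariant 2 * sum == i * (i - 1) is exactly the kind of non-trivial, semantics-dependent fact the abstract refers to: it cannot be read off the loop syntax and must relate the accumulator to the loop counter.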
Real bug fixes found in open-source repositories seem to be the perfect source for learning to localize and repair real bugs. Yet the scale of existing bug-fix collections is typically too small for training data-intensive neural approaches. Neural bug detectors are hence almost exclusively trained on artificial bugs, which are produced by mutating existing source code and are thus easily obtainable at large scale. However, neural bug detectors trained on artificial bugs usually underperform when faced with real bugs. To address this shortcoming, we set out to explore the impact of training on real bug fixes at scale. Our systematic study compares neural bug detectors trained on real bug fixes, on artificial bugs, and on mixtures of both, at various dataset scales and with varying training techniques. Based on the insights gained from training on a novel dataset of 33k real bug fixes, we identify a training setting that improves the performance of existing neural bug detectors by up to 170% on simple bugs in Python. In addition, our evaluation shows that further gains can be expected from increasing the size of the real bug-fix dataset or of the code corpus used for generating artificial bugs. To facilitate future research on neural bug detection, we release our real bug-fix dataset, trained models, and code.
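As a hedged illustration of the artificial-bug generation the abstract contrasts with real bug fixes: a mutation-based generator takes correct code and applies a small syntactic change, such as flipping a comparison operator, yielding a "buggy" training sample with a known bug location. The function below is hypothetical, and the study itself targets Python bugs; the mutation idea is language-agnostic and is sketched here in C.

```c
#include <assert.h>
#include <stddef.h>

// Correct reference implementation: index of the maximum element.
static size_t argmax(const int *a, size_t n) {
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (a[i] > a[best]) {
            best = i;
        }
    }
    return best;
}

// Artificial bug produced by a single operator mutation (> became <).
// A mutation-based generator would emit this variant, labeled with the
// mutated location, as a buggy training sample for a bug detector.
static size_t argmax_mutated(const int *a, size_t n) {
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (a[i] < a[best]) {   // mutated comparison: now tracks the minimum
            best = i;
        }
    }
    return best;
}

int main(void) {
    int xs[] = {3, 7, 1, 7, 2};
    assert(argmax(xs, 5) == 1);          // correct: first maximum at index 1
    assert(argmax_mutated(xs, 5) == 2);  // mutant returns the minimum's index
    return 0;
}
```

Such mutants are cheap to produce at scale, which explains their dominance as training data, but they only approximate the distribution of real bugs, matching the performance gap the abstract describes.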