Root Cause Localization for Unreproducible Builds via Causality Analysis over System Call Tracing
Localizing the root causes of unreproducible builds is an important yet challenging task in software maintenance. The major challenges lie in the limited runtime traces available from build processes and the high diversity of build environments. To address these challenges, in this paper we propose RepTrace, a framework that identifies the root causes of unreproducible builds from system call traces collected while the build commands execute. RepTrace leverages the uniform interface of system call tracing to monitor executed build commands across diverse build environments. From the collected traces, RepTrace's causality analysis builds a dependency graph starting from a build artifact that is inconsistent across two builds. The graph uses two types of dependencies: read/write dependencies among processes and parent/child process dependencies. RepTrace then searches the graph to find the processes that cause the inconsistencies. To handle massive noisy dependencies and uncertain parent/child dependencies, RepTrace includes two novel techniques: (1) difference analysis over multiple builds to reduce the search space of read/write dependencies, and (2) similarity computation over runtime values to filter out noisy parent/child process dependencies. Evaluation results on a set of real-world software packages show that RepTrace effectively finds not only the root-cause commands responsible for unreproducible builds but also the files to patch to fix the unreproducible issues. Among its Top-10 identified commands and files, RepTrace achieves accuracies of 90.00% and 90.56%, respectively, in identifying the root causes.
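To make the causality analysis more concrete, here is a minimal Python sketch of a backward search over such a dependency graph. It relies on simplifying assumptions that do not come from RepTrace itself. Each traced process is summarized as a hypothetical `Process` record listing the files it read and wrote, with no event ordering. `diff_files` stands in for the output of difference analysis, meaning the files whose contents differ across builds. `difflib.SequenceMatcher` similarity over command strings, with an arbitrary `sim_threshold`, stands in for the runtime-value similarity used to filter parent/child edges.

```python
from collections import deque
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Process:
    """Hypothetical per-process summary distilled from a system call trace."""
    pid: int
    ppid: Optional[int]
    cmd: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def similarity(a: str, b: str) -> float:
    # Stand-in for runtime-value similarity: ratio in [0, 1] of matching characters.
    return SequenceMatcher(None, a, b).ratio()


def locate_root_causes(procs: list, artifact: str, diff_files: set,
                       sim_threshold: float = 0.5) -> list:
    """Walk backward from an inconsistent artifact; return suspect commands in BFS order."""
    by_pid = {p.pid: p for p in procs}
    writers = {}  # file path -> pids of processes that wrote it
    for p in procs:
        for f in p.writes:
            writers.setdefault(f, []).append(p.pid)

    suspects, seen_pids, seen_files = [], set(), {artifact}
    queue = deque(writers.get(artifact, []))
    while queue:
        pid = queue.popleft()
        if pid in seen_pids:
            continue
        seen_pids.add(pid)
        proc = by_pid[pid]
        suspects.append(proc.cmd)
        # Read/write dependencies, pruned by difference analysis:
        # only follow inputs whose contents differ across the two builds.
        for f in sorted(proc.reads & diff_files):
            if f not in seen_files:
                seen_files.add(f)
                queue.extend(writers.get(f, []))
        # Parent/child dependency, kept only if parent and child look similar.
        parent = by_pid.get(proc.ppid)
        if parent is not None and similarity(parent.cmd, proc.cmd) >= sim_threshold:
            queue.append(parent.pid)
    return suspects


# Toy trace (invented for illustration): a build that embeds the current date.
EXAMPLE_TRACE = [
    Process(1, None, "make", reads={"Makefile"}),
    Process(2, 1, "date > stamp.h", reads={"/etc/localtime"}, writes={"stamp.h"}),
    Process(3, 1, "cc -c main.c -o main.o", reads={"main.c", "stamp.h"}, writes={"main.o"}),
    Process(4, 1, "cc main.o -o app", reads={"main.o"}, writes={"app"}),
]

print(locate_root_causes(EXAMPLE_TRACE, "app", {"stamp.h", "main.o", "app"}))
```

On this toy trace, the search walks back from `app` through `main.o` to the `date > stamp.h` command. That command reads no differing input, so it is the likeliest origin of the nondeterminism. Difference analysis keeps `main.c` out of the search, and the `make` parent is dropped because its command string is too dissimilar from its children's to pass the threshold.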