A Study of Oracle Approximations in Testing Deep Learning Libraries
Due to the popularity of deep learning (DL) applications, testing DL libraries is becoming increasingly important. Unlike traditional testing, in which an output is asserted exactly (e.g., an output is compared with an oracle for equality), testing deep learning libraries often requires oracle approximations, i.e., the output is only required to fall within a restricted range of the oracle. However, oracle approximations have not been studied in prior empirical work, which focuses on traditional testing practices. The prevalence, common practices, evolution, and maintenance challenges of oracle approximations remain unknown. In this work, we study oracle approximation assertions implemented in four popular deep learning libraries. Our study shows that oracle approximation assertions account for a significant portion of all assertions in the test suites of deep learning libraries. Through a comprehensive manual study, we identify the oracle types that are commonly used when approximations are performed on oracles. In addition, we find that developers frequently modify oracle approximation code, i.e., using a different approximation API, modifying the oracle or the output of the code under test, or using a different threshold value. Finally, we perform in-depth studies to understand the reasons behind the evolution of oracle approximation assertions, and our findings reveal maintenance challenges.
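The sketch below illustrates the distinction the abstract draws between an exact assertion and an oracle approximation assertion. It is a minimal, hypothetical example (the softmax function and the chosen rtol/atol values are illustrative assumptions, not taken from the studied test suites), using NumPy's testing API to stand in for the approximation APIs found in DL library tests.

```python
# Illustrative sketch (not from the studied libraries): an exact assertion
# versus an oracle approximation assertion with a tolerance threshold.
import numpy as np

def softmax(x):
    # Hypothetical code under test: a numerically stable softmax.
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

output = softmax(np.array([1.0, 2.0, 3.0]))
oracle = np.array([0.09003057, 0.24472847, 0.66524096])  # precomputed oracle

# Traditional (exact) assertion: would fail whenever floating-point rounding
# makes the output differ from the oracle in the last few bits.
# np.testing.assert_array_equal(output, oracle)

# Oracle approximation assertion: the output only has to fall within a
# restricted range of the oracle, controlled by the rtol/atol thresholds.
np.testing.assert_allclose(output, oracle, rtol=1e-6, atol=1e-8)
```

The threshold values (rtol/atol) and the choice of approximation API are exactly the elements the study finds developers modifying most often during test evolution.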