Automatically Tagging the “AAA” Pattern in Unit Test Cases Using Machine Learning Models
The AAA pattern, i.e., Arrangement, Action, and Assertion, is a common and natural layout for a test case. Following this pattern in test cases may benefit comprehension, debugging, and maintenance. However, the AAA structure of real-life test cases may not be explicit because of their high complexity, and manually labeling AAA statements in test cases is tedious. Thus, an automated approach for labeling AAA statements in existing test cases could benefit new developers as well as projects that practice collective code ownership and test-driven development.
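To make the three roles concrete, the following is a minimal sketch of a test case laid out in the AAA pattern. It is written in Python for brevity, and the `ShoppingCart` class and its methods are hypothetical names invented for this illustration; they do not come from the studied projects.

```python
import unittest


class ShoppingCart:
    """Hypothetical class under test, invented for this illustration."""

    def __init__(self):
        self.items = []

    def add(self, name, price):
        self.items.append((name, price))

    def total(self):
        return sum(price for _, price in self.items)


class ShoppingCartTest(unittest.TestCase):
    def test_total_sums_item_prices(self):
        # Arrangement: build the object under test and its inputs.
        cart = ShoppingCart()
        cart.add("pen", 2)
        cart.add("notebook", 5)

        # Action: invoke the behavior being tested.
        total = cart.total()

        # Assertion: check the observed outcome.
        self.assertEqual(total, 7)
```

In this small example the comments make the structure explicit; in real-life test cases such markers are usually absent and the three kinds of statements may be interleaved, which is what makes tagging them nontrivial.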
This study contributes an automatic approach based on machine learning models. The “secret sauce” of this approach is a set of three learning features, based on the semantic, syntactic, and contextual information in test cases, that are derived from the manual tagging process. Thus, our approach mimics how developers may manually tag the AAA pattern of a test case. We assess the precision, recall, and F1 score of our approach on 449 test cases, containing about 16,612 statements, across 4 Apache open source projects. To achieve the best performance, we explore the use of six machine learning models; the contribution of the SMOTE data balancing technique; a comparison of the three learning features; and a comparison of five different methods for calculating the semantic feature. The results show that our approach is able to identify Arrangement, Action, and Assertion statements with a precision upwards of 92% and a recall of up to 74%. Our experiments also provide empirical insights into how best to leverage machine learning for software engineering tasks.
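As an illustration of the evaluation metrics named above, the sketch below computes precision, recall, and F1 score separately for each of the three labels from statement-level predictions. The function name `per_label_scores` and the example labels are assumptions made for this illustration; the exact evaluation and averaging procedure used in the study is not specified here.

```python
from collections import Counter

LABELS = ("Arrangement", "Action", "Assertion")


def per_label_scores(true_labels, predicted_labels):
    """Return {label: (precision, recall, f1)} for statement-level tags.

    Hypothetical helper for illustration; not the study's evaluation code.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label

    scores = {}
    for label in LABELS:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Made-up tags for four statements of one test case.
truth = ["Arrangement", "Arrangement", "Action", "Assertion"]
guess = ["Arrangement", "Action", "Action", "Assertion"]
scores = per_label_scores(truth, guess)
```

With these made-up tags, one Arrangement statement is mislabeled as Action, so Arrangement keeps perfect precision but loses recall, while Action keeps perfect recall but loses precision.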