Journal-first Papers
Wed 12 Oct 2022 17:20 - 17:40 at Banquet A - Technical Session 18 - Testing II Chair(s): Darko Marinov

Fault Localization (FL) is an important first step in software debugging and is mostly manual in current practice. Many methods have been proposed over the years to automate the FL process, including information retrieval (IR)-based techniques. These methods localize the fault based on the similarity between the bug report and the source code. Newer variations of IR-based FL (IRFL) techniques also look into the history of bug reports and leverage them during localization. However, all existing IRFL techniques limit themselves to the current project's data (local data). In this study, we introduce an IRFL framework consisting of methods that use models pre-trained on global data (extracted from open-source benchmark projects). Within this framework, we investigate two heuristics: (a) the effect of global data on a state-of-the-art IRFL technique, and (b) the application of a Word Embedding technique (Doc2Vec) together with global data. Our large-scale experiment on 51 software projects shows that using global data improves that technique by 6.6% and 4.8% on average in terms of MRR (Mean Reciprocal Rank) and MAP (Mean Average Precision), respectively, with improvements of over 14% in a majority of the cases (64% and 54% in terms of MRR and MAP, respectively). This amount of improvement is significant compared to the improvement rates that five other state-of-the-art IRFL tools provide over the same technique. In addition, training the models globally is a one-time offline task with no overhead on run-time fault localization. Our study, however, shows that a Word Embedding-based global solution did not further improve the results.
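The two metrics reported above, MRR and MAP, can be illustrated with a minimal sketch. It is not the paper's evaluation code: the function names, the file names, and the choice to normalize average precision by the total number of faulty files are all assumptions made for illustration. Each query is one bug report, paired with the ranked list of files an IRFL tool returned and the set of files that were actually faulty.

```python
from typing import List, Sequence, Set, Tuple

# One query = (ranked file list from the tool, set of truly faulty files).
Query = Tuple[Sequence[str], Set[str]]


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1 / rank of the first faulty file, or 0.0 if none is ranked."""
    for rank, name in enumerate(ranked, start=1):
        if name in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average the precision at each rank where a faulty file appears.

    Assumption: normalized by the total number of faulty files.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, name in enumerate(ranked, start=1):
        if name in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_reciprocal_rank(queries: List[Query]) -> float:
    """Mean of the reciprocal ranks over all bug reports."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def mean_average_precision(queries: List[Query]) -> float:
    """Mean of the average precisions over all bug reports."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Hypothetical rankings for two bug reports (file names are made up).
queries: List[Query] = [
    (["a.java", "b.java", "c.java"], {"b.java"}),
    (["x.java", "y.java", "z.java"], {"x.java", "z.java"}),
]

mrr = mean_reciprocal_rank(queries)
map_score = mean_average_precision(queries)
print(f"MRR = {mrr:.3f}, MAP = {map_score:.3f}")
```

In this toy input the first report has its only faulty file at rank 2 and the second has faulty files at ranks 1 and 3, so both scores reward placing faulty files near the top of the list, and MAP additionally accounts for every faulty file rather than only the first.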