
Registered user since Thu 1 Jun 2017
Contributions
Late Breaking Results
Tue 11 Oct 2022 14:30 - 14:40 at Banquet A - Technical Session 6 - Source Code Manipulation Chair(s): Collin McMillan
Transformer networks such as CodeBERT already achieve very good results for code clone detection on benchmark datasets, so one could assume that this task has already been solved. However, code clone detection is not a trivial task, and semantic code clones in particular are difficult to detect. By evaluating two different subsets of Java code clones from BigCloneBench, we show that the generalizability of CodeBERT decreases: the F1 score drops significantly when we evaluate on code snippets and functionality IDs different from those used for model building.
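The evaluation idea in this abstract (testing on functionality IDs that were not used for model building) can be illustrated with a minimal sketch. It is not the authors' evaluation code: the dictionary key `functionality_id`, the function names, and the toy labels are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

ClonePair = Dict[str, int]  # hypothetical record: {"functionality_id": ..., "label": ...}


def split_by_functionality(
    pairs: List[ClonePair], held_out_ids: Set[int]
) -> Tuple[List[ClonePair], List[ClonePair]]:
    """Separate pairs whose functionality ID is held out from all others."""
    seen = [p for p in pairs if p["functionality_id"] not in held_out_ids]
    unseen = [p for p in pairs if p["functionality_id"] in held_out_ids]
    return seen, unseen


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 for clone (1) versus non-clone (0) predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Computing `f1_score` separately on the `seen` and `unseen` partitions makes a generalization gap visible, which a single score over a mixed test set would hide.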
Research Papers
Thu 13 Oct 2022 17:30 - 17:50 at Room 128 - Technical Session 30 - Builds and Dependencies Chair(s): Christian Kästner
Traceability establishes trace links among software artifacts (e.g., requirements and code) based on whether two artifacts relate to the same part of the system's functionality. These trace links are valuable for the software development process, but are difficult to obtain manually. To cope with costly and fallible manual recovery, researchers have proposed many automated approaches that recover trace links through the textual similarities among software artifacts, such as approaches based on Information Retrieval (IR). However, the low quality and low quantity of artifact texts negatively impact the calculated textual similarities, greatly hindering the performance of IR-based approaches. In this study, we propose to extract co-occurring word pairs from the text structures of both requirements and code (i.e., consensual biterms) to improve IR-based traceability recovery. Specifically, we first collect a set of biterms based on the part-of-speech of requirement texts, and then filter them through the code texts. We then use these consensual biterms both to enrich the input corpus for IR techniques and to enhance the calculation of IR values. An empirical evaluation based on nine real-world systems shows that our approach not only outperforms baseline approaches, but also achieves a significant complementary effect with other enhancing strategies from different perspectives.
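The two steps named in this abstract (collect biterms from requirement text, then filter them through code text) can be sketched roughly as follows. This is not the paper's implementation: the abstract selects biterms by part-of-speech, whereas this sketch substitutes plain adjacent word pairs to stay dependency-free, and all function names and the `a_b` token format are hypothetical.

```python
import re
from typing import List, Set, Tuple

Biterm = Tuple[str, str]


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; camelCase identifiers are split into words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", spaced)]


def candidate_biterms(requirement: str) -> Set[Biterm]:
    """Assumption: adjacent word pairs stand in for POS-based selection."""
    tokens = tokenize(requirement)
    return set(zip(tokens, tokens[1:]))


def consensual_biterms(requirement: str, code: str) -> Set[Biterm]:
    """Keep only biterms whose two terms both occur in the code text."""
    code_terms = set(tokenize(code))
    return {
        (a, b)
        for a, b in candidate_biterms(requirement)
        if a in code_terms and b in code_terms
    }


def enrich(tokens: List[str], biterms: Set[Biterm]) -> List[str]:
    """Append each biterm as one extra token to an artifact's token list."""
    return tokens + [f"{a}_{b}" for a, b in sorted(biterms)]
```

The enriched token lists could then be passed to any IR model (for example, a vector space model) in place of the raw artifact text; how the biterms should be weighted in the similarity calculation is left open here.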