Research Papers
Wed 12 Oct 2022 17:40 - 18:00 at Ballroom C East - Technical Session 19 - Formal Methods and Models I Chair(s): Michalis Famelis

Code clone detection aims to find functionally similar code fragments and is becoming increasingly important in software engineering. Many code clone detection methods have been proposed, among which tree-based methods are able to handle semantic code clones. However, these methods are difficult to scale to big code due to the complexity of tree structures. In this paper, we design Amain, a scalable tree-based semantic code clone detector built on Markov chain models. Specifically, we propose a novel method to transform the complex original tree into simple Markov chains and compute the similarity of all states in these chains. After obtaining all similarity scores, we feed them into a machine learning classifier to train a code clone detector. To examine the effectiveness of Amain, we evaluate it on two widely used datasets, namely Google Code Jam and BigCloneBench. Experimental results show that Amain is superior to five state-of-the-art code clone detection tools (i.e., SourcererCC, Deckard, RtvNN, ASTNN, and SCDetector). Furthermore, compared to the recent tree-based code clone detector ASTNN, Amain is more than 160 times faster in predicting semantic code clones.
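
The abstract does not say how a tree is turned into Markov chains or which similarity measures are computed, so the following Python sketch is only a rough illustration of the general idea, not Amain's implementation. It assumes that each AST node type is a state, that transition probabilities are estimated from parent-to-child edges, and that two programs are compared state by state using cosine similarity. The function names (transition_rows, cosine, state_similarities), the use of Python's ast module, and the choice of cosine similarity are all assumptions made for this example.

```python
import ast
import math
from collections import Counter, defaultdict


def transition_rows(source):
    """Map each AST node type (state) to its child-type transition probabilities.

    Assumption: states are node types and transitions are parent -> child edges.
    """
    counts = defaultdict(Counter)
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            counts[type(parent).__name__][type(child).__name__] += 1
    rows = {}
    for state, successors in counts.items():
        total = sum(successors.values())
        rows[state] = {name: n / total for name, n in successors.items()}
    return rows


def cosine(p, q):
    """Cosine similarity of two sparse probability rows; 0.0 if either is empty."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def state_similarities(src_a, src_b):
    """One similarity score per state found in either program."""
    a, b = transition_rows(src_a), transition_rows(src_b)
    return {s: cosine(a.get(s, {}), b.get(s, {})) for s in sorted(a.keys() | b.keys())}


# Two functionally similar fragments written differently (hypothetical inputs).
loop_sum = "def f(xs):\n    t = 0\n    for x in xs:\n        t += x\n    return t\n"
builtin_sum = "def g(values):\n    return sum(values)\n"
features = state_similarities(loop_sum, builtin_sum)
```

In a pipeline like the one the abstract describes, a per-state score dictionary such as features would be flattened into a fixed-order vector and passed, together with a clone or non-clone label, to an off-the-shelf classifier for training.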