Namesake: A Checker of Lexical Similarity in Identifier Names
Virtual
Identifier naming is one of the main sources of information in program comprehension, where a significant portion of software development time is spent. Previous research suggests that similarity among identifier names can hinder code comprehension and, subsequently, code maintenance and evolution. In this paper, we present an open-source tool for assessing confusing naming combinations in Python programs. The tool, which we call Namesake, flags identifier naming combinations that are confusingly similar in orthography (word form), phonology (pronunciation), or semantics (meaning). Namesake extracts identifier names from the abstract syntax tree of a program, splits compound names, and evaluates the similarity of each pair along these three dimensions. Problematic identifier combinations are reported to programmers along with their line numbers. In combination with existing coding style checkers, Namesake can provide programmers with an additional resource for improving identifier naming quality. The tool can be easily integrated into DevOps pipelines for automated checking and appraisal of identifier names.
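The pipeline described in the abstract (extract names from the abstract syntax tree, split compound names, compare each pair) can be illustrated with a minimal sketch. This is not Namesake's implementation: the function names, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as a stand-in for orthographic similarity are all assumptions made for illustration, and the phonological and semantic checks are omitted.

```python
import ast
import difflib
import itertools
import re


def extract_identifiers(source):
    """Map each identifier in the source to the first line where it appears."""
    first_seen = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            name = node.id
        elif isinstance(node, ast.arg):
            name = node.arg
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            name = node.name
        else:
            continue
        # ast.walk is not ordered by line, so keep the smallest line number.
        if name not in first_seen or node.lineno < first_seen[name]:
            first_seen[name] = node.lineno
    return first_seen


def split_compound(name):
    """Split snake_case and camelCase names into lowercase words."""
    words = []
    for part in name.split("_"):
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def orthographic_similarity(a, b):
    """Assumed stand-in metric: difflib's ratio in [0, 1], not the tool's measure."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def flag_similar_pairs(source, threshold=0.8):
    """Return (name_a, line_a, name_b, line_b, score) for pairs above the threshold."""
    names = extract_identifiers(source)
    flagged = []
    for a, b in itertools.combinations(sorted(names), 2):
        # Joining the split words normalizes naming styles before comparison.
        score = orthographic_similarity(
            " ".join(split_compound(a)), " ".join(split_compound(b))
        )
        if score >= threshold:
            flagged.append((a, names[a], b, names[b], round(score, 2)))
    return flagged


if __name__ == "__main__":
    sample = "total_count = 1\ntotal_counts = 2\nindex = 3\n"
    for pair in flag_similar_pairs(sample):
        print(pair)
```

In this sketch, splitting a compound name before comparison means that `totalCount` and `total_count` are treated as the same sequence of words, so differences in naming style alone do not mask a near-duplicate. A real checker would likely need a more careful similarity measure and a tuned threshold.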
The names of classes, methods, and variables play an important role in code readability. To investigate how developers choose names, Feitelson et al. conducted an empirical survey and suggested a method to improve naming quality. We replicated their study, but limited the survey subjects to university students. Specifically, we conducted two experiments involving 341 students, from freshmen to seniors. The aim of the first experiment was to investigate the characteristics of the names given by students. The results showed that name length and the number of words per name increased with grade level, and that students' understanding of names was ambiguous. The aim of the second experiment was to verify whether Feitelson et al.'s naming method can help improve the quality of names given by students. The experimental data showed an improvement in name quality in 70% of cases, which confirms the validity of the method for university students.