
NIER Track
Wed 12 Oct 2022 16:00 - 16:10 at Banquet B - Technical Session 17 - SE for AI Chair(s): Tim Menzies

Bytecode is used in software analysis and other approaches because of advantages such as high availability and a simple specification. To leverage these advantages when training language models on bytecode, it is important to understand the naturalness of bytecode and its characteristics. However, the naturalness of bytecode has not been actively explored.
In this paper, we experimentally show the naturalness of bytecode instructions and investigate its characteristics by empirically assessing 10 open-source Java projects. We demonstrate that, at the method level, bytecode instructions are more natural than source code representations and less natural than abstract syntax tree representations. Furthermore, we found no correlation between the naturalness of bytecode instructions and that of source code representations at the method level. Based on these findings, the characteristics of the naturalness of bytecode instructions need further exploration. We expect the findings of this paper to be helpful for future work on automated software engineering tasks that use bytecode models.
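The abstract does not say how naturalness is measured. In the naturalness literature it is commonly quantified as the cross-entropy of a language model over token sequences, where lower cross-entropy means more predictable, that is, more natural, text. The following is a minimal sketch under that assumption, not the authors' implementation: the bigram model, the add-one smoothing, the function names `train_bigram` and `cross_entropy`, and the toy instruction sequences are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical start-of-method marker


def train_bigram(sequences):
    """Count bigrams and their contexts over a list of token sequences."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for seq in sequences:
        tokens = [START] + list(seq)
        vocab.update(seq)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def cross_entropy(seq, model):
    """Average negative log2 probability per token (bits per token).

    Uses add-one smoothing; this is an assumption for the sketch,
    not a technique named in the abstract.
    """
    if not seq:
        raise ValueError("sequence must contain at least one token")
    context_counts, bigram_counts, vocab = model
    vocab_size = len(vocab) + 1  # one extra slot for unseen tokens
    tokens = [START] + list(seq)
    total_bits = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        total_bits -= math.log2(prob)
    return total_bits / len(seq)


# Toy method-level instruction sequences (made up for illustration).
training_methods = [
    ["aload_0", "getfield", "ireturn"],
    ["aload_0", "getfield", "areturn"],
    ["aload_0", "invokevirtual", "ireturn"],
]
model = train_bigram(training_methods)

familiar = cross_entropy(["aload_0", "getfield", "ireturn"], model)
unusual = cross_entropy(["ireturn", "aload_0", "getfield"], model)
print(familiar < unusual)
```

Computing such a score per method for each representation (bytecode, source tokens, abstract syntax tree) gives the kind of method-level values that a comparison or correlation analysis like the one described above would operate on.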