The decompiler is one of the most common tools for examining binaries without corresponding source code. It transforms binaries into high-level code, reversing the compilation process. However, compilation loses information contained within the original source code (e.g., structure, type information, and variable names). Semantically meaningful variable names are known to increase code understandability, but they generally cannot be recovered by decompilers. We propose the Decompiled Identifier Renaming Engine (DIRE), a novel probabilistic technique for variable name recovery that uses both lexical and structural information. We also present a technique for generating corpora suitable for training and evaluating models of decompiled code renaming, which we use to create a corpus of 164,632 unique x86-64 binaries generated from C projects mined from GitHub. Our results show that on this corpus DIRE can predict variable names identical to the names in the original source code up to 74.3% of the time.
Wed 13 Nov
16:00 - 17:40: Papers - API and Renaming at Cortez 2&3. Chair(s): Massimiliano Di Penta (University of Sannio)
16:00 - 16:20 Talk | CodeKernel: A Graph Kernel based Approach to the Selection of API Usage Examples | Xiaodong Gu (The Hong Kong University of Science and Technology), Hongyu Zhang (The University of Newcastle), Sunghun Kim (Hong Kong University of Science and Technology)
16:20 - 16:40 Talk | Machine Learning Based Automated Method Name Recommendation: How Far Are We | Lin Jiang (Beijing University of Posts and Telecommunications), Hui Liu (Beijing Institute of Technology), He Jiang (School of Software, Dalian University of Technology)
16:40 - 17:00 Talk | MARBLE: Mining for Boilerplate Code to Identify API Usability Problems | Daye Nam (Carnegie Mellon University), Amber Horvath (Carnegie Mellon University), Andrew Macvean (Google, Inc.), Brad Myers (Carnegie Mellon University), Bogdan Vasilescu (Carnegie Mellon University)
17:00 - 17:20 Talk | DIRE: A Neural Approach to Decompiled Identifier Renaming | Jeremy Lacomis (Carnegie Mellon University), Pengcheng Yin (Carnegie Mellon University), Edward J. Schwartz (Carnegie Mellon University Software Engineering Institute), Miltiadis Allamanis (Microsoft Research, Cambridge), Claire Le Goues (Carnegie Mellon University), Graham Neubig (Carnegie Mellon University), Bogdan Vasilescu (Carnegie Mellon University)
17:20 - 17:40 Talk | Automatic Detection and Update Suggestion for Outdated API Names in Documentation | Seonah Lee (Gyeongsang National University), Rongxin Wu (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology), Shing-Chi Cheung (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology), Sungwon Kang (Korea Advanced Institute of Science and Technology)