Research Papers
Wed 12 Oct 2022 13:30 - 13:50 at Banquet B - Technical Session 15 - Compilers and Languages Chair(s): Lingming Zhang

Binary analysis, or the ability to analyze binary code, is an important capability required for many security and software engineering applications. Consequently, there are many binary analysis techniques and tools with varied capabilities. However, testing these tools requires a large, varied binary dataset with corresponding source-level information. In this paper, we present Cornucopia, an architecture-agnostic automated framework that can generate a large number of semantically equivalent binaries from program source code. We exploit compiler optimizations and use feedback-guided learning to maximize the generation of unique binaries that correspond to the same program. Our evaluation shows that Cornucopia was able to generate 309K binaries across four architectures (x86, x64, ARM, MIPS) with an average of 403 binaries for each program. Our experiments also revealed a large number (∼300) of issues with the LLVM optimization scheduler resulting in compiler crashes. Our evaluation of four popular binary analysis tools, angr, Ghidra, ida, and radare, using Cornucopia-generated binaries, revealed various issues with these tools. Specifically, we found 263 crashes in angr and one memory corruption issue in ida. Our differential testing on the analysis results revealed various semantic bugs in these tools. We also tested machine learning tools, Asm2Vec, SAFE, and Debin, that claim to capture binary semantics and show that they perform very poorly (e.g., Debin F1 score dropped to 12.9% from reported 63.1%) on Cornucopia-generated binaries. In summary, our exhaustive evaluation shows that Cornucopia is an effective mechanism to generate binaries that can be used to test binary analysis techniques effectively.
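To make the idea of feedback-guided generation of unique binaries concrete, here is a minimal sketch. It is not Cornucopia's implementation: the abstract does not describe the learning mechanism, so this example substitutes a simple mutate-and-keep loop in which a flag set is retained whenever it yields a binary with a previously unseen hash. The names `generate_unique_binaries`, `toy_compile`, and the flag strings are all hypothetical, and `toy_compile` is a stand-in for invoking a real compiler.

```python
import hashlib
import random
from typing import Callable, Dict, FrozenSet, List


def generate_unique_binaries(
    compile_fn: Callable[[FrozenSet[str]], bytes],
    flags: List[str],
    budget: int,
    seed: int = 0,
) -> Dict[str, FrozenSet[str]]:
    """Map the SHA-256 of each distinct binary to a flag set that produced it."""
    rng = random.Random(seed)
    seen: Dict[str, FrozenSet[str]] = {}
    # Flag sets worth mutating further; starts from "no flags".
    corpus: List[FrozenSet[str]] = [frozenset()]
    for _ in range(budget):
        parent = rng.choice(corpus)
        # Mutate by toggling one flag on or off.
        child = parent ^ {rng.choice(flags)}
        digest = hashlib.sha256(compile_fn(child)).hexdigest()
        # Feedback: only flag sets that yield a new binary join the corpus.
        if digest not in seen:
            seen[digest] = child
            corpus.append(child)
    return seen


def toy_compile(flag_set: FrozenSet[str]) -> bytes:
    """Hypothetical compiler stand-in: only two flags change the output."""
    effective = sorted(flag_set & {"pass-a", "pass-b"})
    return ("binary:" + ",".join(effective)).encode()


if __name__ == "__main__":
    found = generate_unique_binaries(
        toy_compile, ["pass-a", "pass-b", "pass-c"], budget=200
    )
    for digest, flag_set in found.items():
        print(digest[:12], sorted(flag_set))
```

In a real setting, `compile_fn` would run a compiler on the program source with the chosen options and return the bytes of the resulting binary; hashing the output is only one possible notion of uniqueness.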