Methods for randomized testing of compilers to find miscompilation bugs typically require a way to generate programs that are free from undefined behaviour (UB). Tools such as Csmith achieve UB-freedom by heavily restricting the form of generated programs. This leads to highly idiomatic programs, and we hypothesise that this limits the thoroughness with which compilers are tested. Our idea is that researchers should investigate ways to generate less restricted programs that are still UB-free: programs that get closer to the edge of undefined behaviour without crossing it. We present experiments investigating one instance of this idea via a prototype tool, CsmithEdge, which uses a simple dynamic analysis to detect where Csmith has been too conservative in its use of “safe math” wrappers that guarantee UB-freedom for arithmetic operations, and eliminates the redundant wrappers. By reducing the use of safe math wrappers, CsmithEdge was able to discover two new miscompilation bugs in GCC that could not be found via intensive testing using regular Csmith, and it achieved substantially different code coverage on GCC compared with regular Csmith.
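The idea of detecting redundant safe math wrappers dynamically can be illustrated with a minimal sketch. It is not CsmithEdge's implementation: it simulates 32-bit signed addition in Python, and the names `safe_add`, `guard_fired`, `program`, and `redundant_sites` are hypothetical. It also assumes that the generated program is deterministic and takes no input, so that a single run shows whether a guard is ever needed.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1

# Per wrapper call site: did the guard ever have to intervene?
guard_fired = {}


def safe_add(site, a, b):
    """Guarded addition on simulated 32-bit signed integers.

    Assumption for this sketch: when a + b would overflow (UB in C),
    the wrapper returns the first operand instead of adding.
    """
    overflow = (b > 0 and a > INT_MAX - b) or (b < 0 and a < INT_MIN - b)
    guard_fired[site] = guard_fired.get(site, False) or overflow
    return a if overflow else a + b


def program():
    """Stand-in for a generated test program with two wrapped additions."""
    x = safe_add("s1", 10, 20)       # never overflows in this run
    y = safe_add("s2", INT_MAX, x)   # would overflow, so the guard is needed
    return x, y


def redundant_sites():
    """Run the program once and report call sites whose guard never fired."""
    guard_fired.clear()
    program()
    return sorted(site for site, fired in guard_fired.items() if not fired)


print(redundant_sites())
```

In this sketch, site `s1` could be rewritten as a plain `a + b` without introducing UB, while the wrapper at `s2` has to stay.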
Junjie Chen (Tianjin University, China), Haoyang Ma (College of Intelligence and Computing, Tianjin University), Lingming Zhang (University of Illinois at Urbana-Champaign)
A key challenge in automatic Web testing is the generation of syntactically and semantically valid input values that can exercise the many functionalities that impose constraints on the validity of the inputs. Existing test case generation techniques either rely on manually curated catalogs of values, or extract values from external data sources, such as the Web or publicly available knowledge bases. Unfortunately, relying on manual effort is generally too expensive for most practical applications, while domain-specific and application-specific data can hardly be found either on the Web or in general-purpose knowledge bases. This paper proposes DBInputs, a novel approach that reuses the data from the database of the target Web application to automatically identify domain-specific and application-specific inputs, and to effectively fulfil the validity constraints present in the tested Web pages. DBInputs is well suited to system testing and maintenance testing, since databases are naturally and inexpensively available in those phases. To extract valid inputs from the application database, DBInputs exploits the syntactic and semantic similarity between the identifiers of the input fields and those of the database tables, automatically resolving the mismatch between the user interface and the database schema. Our experiments provide initial evidence that DBInputs can outperform both random input selection and LINK, a state-of-the-art approach for searching inputs from knowledge bases.
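The matching between input-field identifiers and database columns might look roughly like the following sketch. It is not the DBInputs algorithm: it covers only syntactic similarity (via `difflib.SequenceMatcher`) and leaves out the semantic part, and the schema, the field identifiers, and the helper names `normalize`, `list_columns`, and `inputs_for_field` are all hypothetical.

```python
import re
import sqlite3
from difflib import SequenceMatcher


def normalize(identifier):
    """Split camelCase / snake_case identifiers into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", identifier)
    return " ".join(w for w in re.split(r"[\s_\-.]+", spaced.lower()) if w)


def similarity(a, b):
    """Syntactic similarity in [0, 1] between two identifiers."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def list_columns(conn):
    """Return every (table, column) pair in the SQLite database."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    # Table names come from the schema itself, so they are trusted here.
    return [(t, col[1]) for t in tables
            for col in conn.execute(f"PRAGMA table_info({t})")]


def inputs_for_field(conn, field_id, limit=5):
    """Pick the most similar column and reuse its stored values as inputs."""
    table, column = max(list_columns(conn),
                        key=lambda tc: similarity(field_id, tc[1]))
    rows = conn.execute(
        f'SELECT DISTINCT "{column}" FROM "{table}" LIMIT ?', (limit,))
    return [r[0] for r in rows]


# Hypothetical application database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (first_name TEXT, email_address TEXT)")
conn.execute("INSERT INTO users VALUES ('Ann', 'ann@example.org')")

print(inputs_for_field(conn, "userEmail"))
```

A purely syntactic measure such as this one would miss pairs like `surname` and `last_name`, which is where the semantic similarity described in the abstract would come in.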
Compiler bugs can be disastrous since they could affect all the software systems built on the buggy compilers. Meanwhile, diagnosing compiler bugs is extremely challenging, since usually only limited debugging information is available and a large number of compiler files can be suspicious. More specifically, when compiling a given bug-triggering test program, hundreds of compiler files are usually involved, and all of them can be treated as suspicious. To facilitate compiler debugging, in this paper we propose the first reinforcement compiler bug isolation approach via structural mutation, called RecBi. For a given bug-triggering test program, RecBi first augments traditional local mutation operators with structural ones to transform it into a set of passing test programs. Since not all the passing test programs can help isolate compiler bugs effectively, RecBi further leverages reinforcement learning to intelligently guide the process of passing test program generation. Then, RecBi ranks all the suspicious files by analyzing the compiler execution traces of the generated passing test programs and the given failing test program, following the practice of compiler bug isolation. The experimental results on 120 real bugs from the two most popular open-source C compilers, i.e., GCC and LLVM, show that RecBi is able to isolate about 23%/58%/78% of the bugs within the Top-1/Top-5/Top-10 compiler files, and significantly outperforms the state-of-the-art compiler bug isolation approach, improving isolation effectiveness by 92.86%/55.56%/25.68% in terms of Top-1/Top-5/Top-10 results.
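The final ranking step can be sketched as spectrum-based fault localization at file granularity. The abstract does not name a formula; the Ochiai score used here is an assumption, as are the function name `rank_files` and the file names. With a single failing program, a file's score drops each time a passing program also exercises it.

```python
from math import sqrt


def rank_files(failing_trace, passing_traces):
    """Rank compiler files touched by the failing program, most suspicious first.

    failing_trace: set of files executed when compiling the failing program.
    passing_traces: list of such sets, one per generated passing program.
    Score (Ochiai, an assumption of this sketch): ef / sqrt(F * (ef + ep)),
    where F = 1 failing run, ef = 1 for every file in the failing trace, and
    ep = number of passing runs that also execute the file.
    """
    scores = {}
    for path in failing_trace:
        ep = sum(1 for trace in passing_traces if path in trace)
        scores[path] = 1 / sqrt(1 * (1 + ep))
    # Sort by descending score; break ties by name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical traces, for illustration only.
failing = {"opt_a.c", "opt_b.c", "common.c"}
passing = [{"opt_b.c", "common.c"}, {"common.c"}]

for path, score in rank_files(failing, passing):
    print(f"{score:.3f}  {path}")
```

Here `opt_a.c` ranks first because no passing program reaches it, which shows why passing programs that avoid only a few of the files on the failing trace are the most useful ones to generate.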
Software testing is an important and time-consuming task that is often done manually. In the last decades, researchers have come up with techniques to generate input data (e.g., fuzzing) and automate the process of generating test cases (e.g., search-based testing). However, these techniques are known to have their own limitations: search-based testing does not generate highly-structured data; grammar-based fuzzing does not generate test case structures. To address these limitations, we combine these two techniques. Applying grammar-based mutations to the input data gathered by the search-based testing algorithm allows us to co-evolve both aspects of test case generation. We evaluate our approach by performing an empirical study on 20 Java classes from the three most popular JSON parsers across multiple search budgets. Our results show that the proposed approach on average improves branch coverage for JSON-related classes by 15% (with a maximum increase of 50%) without negatively impacting other classes.
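A grammar-based mutation of JSON input data could be sketched as follows. This is an illustration in Python rather than the Java tooling evaluated in the paper, and the simplified grammar, the mutation probability, and the names `random_value`, `mutate`, and `mutate_json` are assumptions. The mutation replaces a randomly chosen subtree with a value freshly derived from the grammar, so the result remains well-formed JSON.

```python
import json
import random


def random_value(rng, depth=0):
    """Derive a random value from a simplified JSON grammar."""
    kinds = ["null", "bool", "number", "string"]
    if depth < 2:  # bound nesting so derivations terminate quickly
        kinds += ["array", "object"]
    kind = rng.choice(kinds)
    if kind == "null":
        return None
    if kind == "bool":
        return rng.choice([True, False])
    if kind == "number":
        return rng.randint(-100, 100)
    if kind == "string":
        return "".join(rng.choice('ab"\\ ') for _ in range(rng.randint(0, 4)))
    if kind == "array":
        return [random_value(rng, depth + 1) for _ in range(rng.randint(0, 2))]
    return {f"k{i}": random_value(rng, depth + 1)
            for i in range(rng.randint(0, 2))}


def mutate(value, rng, p=0.3):
    """Replace one randomly selected subtree with a fresh derivation."""
    is_container = isinstance(value, (list, dict)) and len(value) > 0
    if not is_container or rng.random() < p:
        return random_value(rng)
    if isinstance(value, list):
        i = rng.randrange(len(value))
        return value[:i] + [mutate(value[i], rng, p)] + value[i + 1:]
    key = rng.choice(sorted(value))
    return {**value, key: mutate(value[key], rng, p)}


def mutate_json(text, rng):
    """Parse a JSON string, mutate its tree, and serialize it again."""
    return json.dumps(mutate(json.loads(text), rng))


seed_input = '{"a": [1, 2, {"b": null}]}'
print(mutate_json(seed_input, random.Random(0)))
```

In a combined setup such as the one the abstract describes, a mutation like this would be applied to the string arguments that the search-based algorithm has already collected, while the search continues to evolve the sequence of method calls.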