Accurate String Constraints Solution Counting with Weighted Automata
As an important extension of symbolic execution (SE), probabilistic symbolic execution (PSE) computes execution probabilities of program paths. Using this information, PSE can prioritize path exploration strategies. To calculate the probability of a path PSE relies on solution counting approaches for the path constraint. The correctness of a solution counting approach depends on the methodology used to count solutions and whether a path constraint maintains a one-to-one relation with program input values. This work focuses on the latter aspect of the solution counting correctness for string constraints.
In general, maintaining a one-to-one relation is not always possible, especially in a presence of non-linear constraints. To deal with this issue, researchers that work on PSE for numerical domains either analyze programs with linear constraints, or develop novel techniques to handle solution counting of non-linear constraints. For the string domain, however, previous work on PSE mainly focuses on efficient and accurate solution counting for automata-based string models and has not investigated whether a one-to-one relationship between the strings encoded by automata and input string values is preserved. In this work we demonstrate that traditional automata-base string models fail to maintain one-to-one relations and propose to use the weighted automata model, which preserves the one-to-one relation between the path constraint it encodes and the input string values. We use this model to implement a string constraint solver and show its correctness on a set of non-trivial synthetic benchmarks. We also present an empirical evaluation of traditional and proposed automata solvers on real-world string constraints. The evaluations show that while being less efficient than traditional automata models, the weighted automata model maintains correct solution counts.